AI’nt That Easy #1: Chat with Multiple PDFs using RAG with HuggingFaceHub and Langchain
Jan 23, 2024
Welcome to the first post in our series, AI’nt That Easy, where we dive into practical AI applications and break down the code behind them. Today, we’ll use RAG (Retrieval-Augmented Generation) to chat with multiple PDFs, turning them into interactive knowledge sources.
The Scenario:
Imagine a pile of research papers, filled with valuable information, but locked away in text format. Wouldn’t it be amazing to have a conversation with them, ask questions, and receive insightful answers?
RAG makes it possible!
The Code Breakdown:
Let’s dissect the code snippet and understand how it works:
- Data Preparation:
get_pdf_text: This function extracts text from all uploaded PDFs, merging them into one big pool of knowledge.
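from PyPDF2 import PdfReader  # assumption: PyPDF2 (pypdf exposes the same PdfReader class)

def get_pdf_text(pdf_docs):
    text = ""
    for pdf in pdf_docs:
        pdf_reader = PdfReader(pdf)
        for page in pdf_reader.pages:
            # extracting text from each page; fall back to "" if nothing is extractable
            text += page.extract_text() or ""
    return text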
get_text_chunks: We break down the text into smaller chunks (1000 characters with overlap)…
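To make the idea of overlapping chunks concrete, here is a minimal, library-free sketch. It is not the post's own get_text_chunks; the function name split_text_with_overlap is hypothetical, and the overlap of 200 characters is an assumed value chosen only for illustration (the 1000-character chunk size comes from the text above).

```python
def split_text_with_overlap(text: str, chunk_size: int = 1000, chunk_overlap: int = 200) -> list[str]:
    """Split text into character windows of chunk_size that share chunk_overlap characters.

    Hypothetical helper for illustration; chunk_overlap=200 is an assumed default.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current window already reaches the end of the text
        if start + chunk_size >= len(text):
            break
    return chunks


if __name__ == "__main__":
    sample = "abcdefghij" * 250  # 2500 characters of dummy text
    pieces = split_text_with_overlap(sample)
    print(len(pieces), [len(p) for p in pieces])
```

Because each window starts before the previous one ends, a sentence that straddles a chunk boundary still appears whole in at least one chunk, which tends to help retrieval later on.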