RAGent: A PDF Whisperer Built on LangChain + LangGraph

Author: Dwipayan Bandyopadhyay

Originally published on Towards AI.

Retrieval-Augmented Generation (RAG) is a well-known approach in the generative AI field. It usually consists of a linear flow: chunking a document, storing the chunks in a vector database, retrieving the relevant chunks for the user's query, and feeding them to an LLM for the final answer. Recently, the term "Agentic AI" has taken the internet by storm. In simple terms, it refers to breaking a problem into smaller parts, assigning each part to an "agent" capable of handling that specific task, and combining these smaller agents into a complex flow. What happens if we combine this agentic approach with Retrieval-Augmented Generation? In this article, we explain such a concept/architecture, which we developed with the help of LangGraph, FAISS, and OpenAI.

Source: photo by the author

We will not examine AI agents and how they work in depth in this article; otherwise, it would become a full book. But to give a short overview of what "AI agents" are: we can think of an AI agent as an assistant, someone or something that is a master of one task. Multiple agents with different capabilities are combined into a full agentic workflow graph, in which the agents can communicate with one another.

In our approach, we divided the concept of "Retrieval-Augmented Generation" into three different tasks and created one agent for each task: one agent handles the retrieval part, another handles the augmentation part, and the last one handles the generation part. Then we combined all three agents into one end-to-end agentic workflow. Let's dive deep into the coding section.

The coding section begins

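First of all, we install all the necessary packages. The best practice is to create a virtual environment first and then install the packages that the code below imports: langchain, langchain-openai, langchain-community, langgraph, faiss-cpu, pypdf, python-dotenv, and streamlit.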

After installing them successfully, we import all the necessary packages to first create a retriever agent.

Coding the Retriever Agent:

from langchain_openai import ChatOpenAI
from langchain_community.vectorstores.faiss import FAISS
from langchain_openai import OpenAIEmbeddings
from langchain.docstore.document import Document
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.prompts import ChatPromptTemplate, MessagesPlaceholder
from pypdf import PdfReader
import re
from dotenv import load_dotenv
import streamlit as st

load_dotenv()

LLM = ChatOpenAI(model_name="gpt-4o", temperature=0.0)

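def extract_text_from_pdf(pdf_path):
    """Return a list of (cleaned_text, page_number) tuples, one per PDF page."""
    try:
        pdf = PdfReader(pdf_path)
        output = []
        for i, page in enumerate(pdf.pages, 1):
            text = page.extract_text()
            # Re-join words that were hyphenated across a line break
            text = re.sub(r"(\w+)-\n(\w+)", r"\1\2", text)
            # Replace single newlines with spaces, keeping newline-whitespace-newline gaps
            text = re.sub(r"(?<!\n\s)\n(?!\s\n)", " ", text.strip())
            # Collapse blank-line runs into one paragraph break
            text = re.sub(r"\n\s*\n", "\n\n", text)
            output.append((text, i))  # Tuple of (text, page number)
        return output
    except Exception as e:
        st.error(f"Error reading PDF: {e}")
        return []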

def text_to_docs(text_with_pages):
    """Split each page's text into chunks and wrap every chunk in a Document."""
    docs = []
    text_splitter = RecursiveCharacterTextSplitter(chunk_size=4000, chunk_overlap=200)
    for text, page_num in text_with_pages:
        chunks = text_splitter.split_text(text)
        for chunk in chunks:
            doc = Document(
                page_content=chunk,
                metadata={"source": f"page-{page_num}", "page_num": page_num},
            )
            docs.append(doc)
    return docs

def create_vectordb(pdf_path):
    text_with_pages = extract_text_from_pdf(pdf_path)
    if not text_with_pages:
        raise ValueError("No text extracted from PDF.")
    docs = text_to_docs(text_with_pages)
    embeddings = OpenAIEmbeddings()
    return FAISS.from_documents(docs, embeddings)

# Define Tools
def retrieve_from_pdf(query: str, vectordb) -> dict:
    """Retrieve the most relevant text and page number using similarity search."""
    # Fetch the top 3 chunks; only the first (most similar) one is used below
    docs = vectordb.similarity_search(query, k=3)
    if docs:
        doc = docs[0]
        content = f"Page {doc.metadata['page_num']}: {doc.page_content}"
        page_num = doc.metadata["page_num"]
        return {"content": content, "page_num": page_num}
    return {"content": "No content retrieved.", "page_num": None}

RETRIEVE_PROMPT = ChatPromptTemplate.from_messages([
    ("system", """
You are the Retrieve Agent. Your task is to fetch the most relevant text from a PDF based on the user's query.
- Use the provided retrieval function to get content and a single page number.
- Return the content directly with the page number included (e.g., 'Page X: text').
- If no content is found, return "No content retrieved."
"""
    ),
    MessagesPlaceholder(variable_name="chat_history"),
    ("human", "{query}"),
])

Explanation of the code –

In the Retriever agent code, we first import all the necessary modules and classes. We store our credentials, such as the OpenAI API key, in a .env file, which is why the dotenv module is used here together with the load_dotenv function call. Then we initialize the LLM with the required arguments, such as the model name and temperature.

Function descriptions

extract_text_from_pdf is used to read and extract the PDF content and clean it up a bit: it repairs line breaks that split a word into two pieces, turns single newlines into spaces unless they are part of paragraph spacing, and so on. The cleanup is done page by page, which is why the loop uses the enumerate function. Finally, the function returns the cleaned, extracted content together with its page number as a list of tuples. Any unexpected error is handled by a simple try-except block, which ensures the code keeps running without breaking on errors.
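To make the cleanup concrete, here is a minimal sketch of the three regex passes applied to a made-up page string. The helper name clean_page_text and the sample text are assumptions for illustration, and the second pattern is one common way to write the "single newline to space" rule rather than a confirmed copy of the article's code.

```python
import re

def clean_page_text(text: str) -> str:
    """Apply three cleanup passes to the raw text of one PDF page."""
    # 1. Re-join words hyphenated across a line break: "data-\nbase" -> "database"
    text = re.sub(r"(\w+)-\n(\w+)", r"\1\2", text)
    # 2. Turn single newlines into spaces, skipping newline-whitespace-newline gaps
    text = re.sub(r"(?<!\n\s)\n(?!\s\n)", " ", text.strip())
    # 3. Collapse each blank-line gap into exactly one paragraph break
    text = re.sub(r"\n\s*\n", "\n\n", text)
    return text

# Hypothetical raw page text with a split word, a soft line break, and a paragraph gap
sample = "A data-\nbase stores\nrecords.\n \nSQL queries them."
cleaned = clean_page_text(sample)
print(repr(cleaned))  # 'A database stores records.\n\nSQL queries them.'
```

Note that the paragraph gap in the sample is a newline, a space, and a newline; that is the shape the second pattern is written to leave alone.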

text_to_docs is used for chunking. Here the RecursiveCharacterTextSplitter class from LangChain is used, with a chunk size of 4,000 and an overlap of 200. The loop runs over the text_with_pages argument, which receives the output of the previous function, extract_text_from_pdf, in the form of a list of tuples. Two loop variables are used to unpack both elements of each tuple. The cleaned text is then split into chunks, and each chunk is turned into a Document object, which is later converted to embeddings. Besides the page content, the Document object holds the page number and a string label containing the page number as metadata. Each document is appended to a list, which is returned.
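The effect of chunk size and overlap is easier to see on a tiny example. The sketch below uses a plain fixed-size sliding window, which is a simplification and not how RecursiveCharacterTextSplitter actually picks split points (it prefers separators such as paragraph breaks). The function split_with_overlap and the plain-dict "documents" are hypothetical stand-ins.

```python
def split_with_overlap(text: str, chunk_size: int, chunk_overlap: int) -> list:
    """Cut text into fixed-size windows; neighbours share chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the window already reached the end of the text
    return chunks

# One fake page: (text, page number), mirroring the list of tuples described above
pages = [("abcdefghijklmnopqrstuvwxyz", 1)]

# Plain dicts stand in for Document objects, with the same metadata keys
docs = [
    {"page_content": chunk, "metadata": {"source": f"page-{page_num}", "page_num": page_num}}
    for text, page_num in pages
    for chunk in split_with_overlap(text, chunk_size=10, chunk_overlap=2)
]

for doc in docs:
    print(doc["metadata"]["source"], doc["page_content"])
```

Each chunk starts with the last two characters of the previous one, which is what keeps a sentence that straddles a boundary visible in both chunks.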

create_vectordb uses the two functions above to create embeddings and store them in the FAISS vector store (Facebook AI Similarity Search). It is a lightweight vector store that keeps the index locally and makes similarity search easy. This function simply creates and returns the vector database. That's all.

retrieve_from_pdf performs a similarity search and fetches the top 3 chunks. If any are found, we consider only the first one, since it holds the most similar content, and return it along with its page number as a dictionary.

RETRIEVE_PROMPT is a ChatPromptTemplate consisting of instructions, i.e. a system message for the LLM describing its task as the retriever agent. It also takes in the whole chat history of the current session and accepts the user's query as the human input.
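The "fetch the top 3, keep the first" logic can be sketched without a real vector store. In the example below, toy two-dimensional vectors stand in for OpenAI embeddings and brute-force cosine similarity stands in for the FAISS index (which may rank by a different distance measure). The names cosine, top_k, and toy_index are assumptions made for this illustration.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def top_k(query_vec, index, k=3):
    """Return the k entries of index most similar to query_vec, best first."""
    ranked = sorted(index, key=lambda item: cosine(query_vec, item[0]), reverse=True)
    return ranked[:k]

# Each entry: (toy embedding, page number, chunk text)
toy_index = [
    ([0.0, 1.0], 9, "Indexes speed up lookups."),
    ([1.0, 0.0], 4, "Normalization removes redundancy."),
    ([0.9, 0.1], 5, "Third normal form builds on second normal form."),
]

hits = top_k([1.0, 0.0], toy_index, k=3)
if hits:
    _, page_num, text = hits[0]  # only the most similar chunk is used
    result = {"content": f"Page {page_num}: {text}", "page_num": page_num}
else:
    result = {"content": "No content retrieved.", "page_num": None}

print(result)
```

Fetching three candidates while using one leaves room to experiment later, for example by merging several chunks, without changing the search call.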

Coding the Augmentor Agent

from langchain.prompts import ChatPromptTemplate, MessagesPlaceholder
from typing import Optional

def augment_with_context(content: str, page_num: Optional[int]) -> str:
    """Augment retrieved content with source context."""
    if content != "No content retrieved." and page_num:
        return f"{content}\n\nAdditional context: Sourced from page {page_num}."
    return f"{content}\n\nAdditional context: No specific page identified."

AUGMENT_PROMPT = ChatPromptTemplate.from_messages([
    ("system", """
You are the Augment Agent. Enhance the retrieved content with additional context.
- If content is available, append a note with the single page number.
- If no content is retrieved, return "No augmented content."
"""
    ),
    MessagesPlaceholder(variable_name="chat_history"),
    ("human", "Retrieved content: {retrieved_content}\nPage number: {page_num}"),
])

Explanation of functions

augment_with_context takes a very simple approach to enriching the content returned by the retriever agent with its source context. If content was retrieved and a page number is available, a note naming the source page is appended to the original retrieved content; otherwise, the content is returned with a note that no specific page was identified.

AUGMENT_PROMPT is again very simple: it instructs the LLM to enhance the content retrieved by the retriever agent. It also takes chat_history into account, and the variables retrieved_content and page_num are filled in automatically at run time, when the chain is invoked.
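A quick way to see both branches is to call the function once with a hit and once with a miss. The sketch below repeats the function so that it runs on its own; the sample strings are made up.

```python
from typing import Optional

def augment_with_context(content: str, page_num: Optional[int]) -> str:
    """Append a note about where the retrieved content came from."""
    if content != "No content retrieved." and page_num:
        return f"{content}\n\nAdditional context: Sourced from page {page_num}."
    return f"{content}\n\nAdditional context: No specific page identified."

# Branch 1: content was retrieved and the page is known
found = augment_with_context("Page 12: A primary key uniquely identifies a row.", 12)
# Branch 2: the retriever came back empty-handed
missing = augment_with_context("No content retrieved.", None)

print(found)
print("---")
print(missing)
```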

Coding the Generator Agent

from langchain.prompts import ChatPromptTemplate, MessagesPlaceholder

GENERATE_PROMPT = ChatPromptTemplate.from_messages([
    ("system", """
You are the Generate Agent. Create a detailed response based on the augmented content.
- Focus on DBMS and SQL content.
- Append "Source: Page X" at the end if a page number is available.
- If the user query consists of terms like "explain", "simple", "simplify" etc. or relatable, then do not return any page number, otherwise return the proper page number.
- If the question is not DBMS-related, reply "Not applicable."
- Use the chat history to maintain context.
"""
    ),
    MessagesPlaceholder(variable_name="chat_history"),
    ("human", "{query}\nAugmented content: {augmented_content}"),
])

The generator agent consists only of a prompt template with instructions for generating the final response based on the retrieved content and the additional information from the previous two steps.

After creating all these separate agents, it is time to bring them under one umbrella and build the whole end-to-end workflow with LangGraph.

Creating the graph using LangGraph

import streamlit as st
from langgraph.graph import StateGraph, END
from typing import TypedDict, List, Optional
import re
from IPython.display import display, Image
from retriever import (LLM,extract_text_from_pdf,text_to_docs,create_vectordb,retrieve_from_pdf,RETRIEVE_PROMPT)
from augmentation import augment_with_context,AUGMENT_PROMPT
from generation import GENERATE_PROMPT
from dotenv import load_dotenv

load_dotenv()

PDF_FILE_PATH = "dbms_notes.pdf"

# Define the Agent State
class AgentState(TypedDict):
    query: str
    chat_history: List[dict]
    retrieved_content: Optional[str]
    page_num: Optional[int]  # Single page number instead of a list
    augmented_content: Optional[str]
    response: Optional[str]

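def format_for_display(text):
    def replace_latex(match):
        # Defined for LaTeX handling but not called by the substitution below
        latex_expr = match.group(1)
        return f"$${latex_expr}$$"  # Use $$ for Streamlit Markdown to render LaTeX
    # Wrap \frac{a}{b} in $...$ so Streamlit renders it as math
    text = re.sub(r'\\frac\{([^}]+)\}\{([^}]+)\}', r'$\\frac{\1}{\2}$', text)
    return text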

# Define Multi-Agent Nodes
def retrieve_agent(state: AgentState) -> AgentState:
    chain = RETRIEVE_PROMPT | LLM
    retrieved = retrieve_from_pdf(state["query"], st.session_state.vectordb)
    # The LLM reply is not used below; the state is filled from the search result
    response = chain.invoke({"query": state["query"], "chat_history": state["chat_history"]})
    return {
        "retrieved_content": retrieved["content"],
        "page_num": retrieved["page_num"],
    }

def augment_agent(state: AgentState) -> AgentState:
    chain = AUGMENT_PROMPT | LLM
    if state["retrieved_content"] and state["retrieved_content"] != "No content retrieved.":
        # Prepare input for the LLM
        input_data = {
            "retrieved_content": state["retrieved_content"],
            "page_num": str(state["page_num"]) if state["page_num"] else "None",
            "chat_history": state["chat_history"],
        }
        # Invoke the LLM to generate augmented content
        response = chain.invoke(input_data)
        augmented_content = response.content  # Use the LLM's output
    else:
        augmented_content = "No augmented content."
    return {"augmented_content": augmented_content}

def generate_agent(state: AgentState) -> AgentState:
    chain = GENERATE_PROMPT | LLM
    response = chain.invoke({
        "query": state["query"],
        "augmented_content": state["augmented_content"] or "No augmented content.",
        "chat_history": state["chat_history"],
    })

    return {"response": response.content}

# Define Conditional Edge Logic
def decide_augmentation(state: AgentState) -> str:
    if state["retrieved_content"] and state["retrieved_content"] != "No content retrieved.":
        return "augmentation"
    return "generation"

workflow = StateGraph(AgentState)
workflow.add_node("retrieve_agent", retrieve_agent)
workflow.add_node("augment_agent", augment_agent)
workflow.add_node("generate_agent", generate_agent)

workflow.set_entry_point("retrieve_agent")
workflow.add_conditional_edges(
    "retrieve_agent",
    decide_augmentation,
    {
        "augmentation": "augment_agent",
        "generation": "generate_agent",
    },
)
workflow.add_edge("augment_agent", "generate_agent")
workflow.add_edge("generate_agent", END)

agent = workflow.compile()

# display(Image(agent.get_graph().draw_mermaid_png(output_file_path="tutor_agent.png")))

st.set_page_config(page_title="🤖 RAGent", layout="wide")
st.title("🤖 RAGent : Your Personal Teaching Assistant")
st.markdown("Ask any question from your book and get detailed answers with a single source page!")

# Initialize session state for vector database
if "vectordb" not in st.session_state:
    with st.spinner("Loading PDF content... This may take a minute."):
        try:
            st.session_state.vectordb = create_vectordb(PDF_FILE_PATH)
        except Exception as e:
            st.error(f"Failed to load PDF: {e}")
            st.stop()

# Initialize chat history in session state
if "messages" not in st.session_state:
    st.session_state.messages = []

# Display chat history
for message in st.session_state.messages:
    with st.chat_message(message["role"]):
        st.markdown(message["content"])

# User input
user_input = st.chat_input("Ask anything from the PDF")

if user_input:
    # Add user message to chat history
    st.session_state.messages.append({"role": "user", "content": user_input})
    with st.chat_message("user"):
        st.markdown(user_input)

    # Display assistant response
    with st.chat_message("assistant"):
        message_placeholder = st.empty()

        # Prepare chat history for the agent
        chat_history = [
            {"type": "human", "content": msg["content"]} if msg["role"] == "user" else
            {"type": "ai", "content": msg["content"]}
            for msg in st.session_state.messages[:-1]  # Exclude current input
        ]

        # Prepare initial state
        initial_state = {
            "query": user_input,
            "chat_history": chat_history,
            "retrieved_content": None,
            "page_num": None,
            "augmented_content": None,
            "response": None,  # Filled in by the generate agent
        }

        # Run the agent with a spinner
        with st.spinner("Processing..."):
            final_state = agent.invoke(initial_state)
            answer = final_state["response"]
            formatted_answer = format_for_display(answer)

        # Display response
        message_placeholder.markdown(formatted_answer)

        # Update chat history
        st.session_state.messages.append({
            "role": "assistant",
            "content": formatted_answer,
        })

Code explanation

AgentState class: In this class, we define a schema that is applied consistently to what the agents return, so the whole "state" keeps the same structure throughout the workflow. It is passed as an argument when creating the StateGraph.
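One way to picture the shared state: each node returns only the keys it changed, and those values overwrite the matching keys of the running state. The sketch below models that idea in plain Python. The apply_update helper is hypothetical, and its rejection of undeclared keys is an assumption of this sketch, not a statement about how LangGraph reacts.

```python
from typing import List, Optional, TypedDict

class AgentState(TypedDict):
    query: str
    chat_history: List[dict]
    retrieved_content: Optional[str]
    page_num: Optional[int]
    augmented_content: Optional[str]
    response: Optional[str]

def apply_update(state: dict, update: dict) -> dict:
    """Hypothetical helper: overwrite only the keys that a node returned."""
    unknown = set(update) - set(AgentState.__annotations__)
    if unknown:
        # Sketch-only safeguard: keep the state limited to the declared schema
        raise KeyError(f"keys not declared in AgentState: {sorted(unknown)}")
    merged = dict(state)
    merged.update(update)
    return merged

state = {
    "query": "What is a foreign key?",
    "chat_history": [],
    "retrieved_content": None,
    "page_num": None,
    "augmented_content": None,
    "response": None,
}

# A retrieval node would return just these two keys
state = apply_update(state, {"retrieved_content": "Page 3: A foreign key references...", "page_num": 3})
print(state["page_num"], state["response"])  # 3 None
```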

format_for_display function: This function contains a nested function intended to handle LaTeX-based output. We use it because the document may contain fractions that might not render correctly, so it serves as an extra precaution.

retrieve_agent function: This uses the retrieve_from_pdf function that we defined earlier. First, we create a chain from RETRIEVE_PROMPT and the LLM. Then we invoke it with the user's query and the entire chat_history, and finally return the retrieved content and page number.

augment_agent function: Here, we again create a chain, this time using AUGMENT_PROMPT, and check whether the retriever agent returned any content. If it did, we invoke the chain with the retrieved content, the page number, and chat_history, and return the content of the LLM's reply as the augmented content. (The imported augment_with_context helper is not called in this node.)

generate_agent function: Finally, we provide the augmented content, the user query, and the chat history so that the LLM can generate the final response from the augmented information, which is then displayed to the user.

decide_augmentation function: This is a conditional step that checks whether the augmentation agent is needed or not.
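As an illustration of the fraction handling, the sketch below wraps every \frac{...}{...} expression in dollar signs so that a Markdown renderer with math support treats it as inline math. The pattern is a plausible reading of the intent described above; the sample sentence is made up.

```python
import re

def format_for_display(text: str) -> str:
    """Wrap LaTeX fractions in $...$ so they render as inline math."""
    # \\frac matches a literal backslash followed by "frac";
    # each [^}]+ group captures one brace-delimited argument.
    return re.sub(r"\\frac\{([^}]+)\}\{([^}]+)\}", r"$\\frac{\1}{\2}$", text)

raw = r"The selectivity of a unique key is \frac{1}{n} per lookup."
print(format_for_display(raw))
# The selectivity of a unique key is $\frac{1}{n}$ per lookup.
```

Text without fractions passes through unchanged, so the function is safe to apply to every answer.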

After creating all the necessary agents, it is time to combine them into an end-to-end workflow, which is done using LangGraph's StateGraph class. When initializing StateGraph, we pass the AgentState class that we defined earlier as its parameter, indicating that throughout the workflow these are the only keys present in the state, nothing more. Then we add nodes to the StateGraph to build the workflow, manually set the entry point so it is clear which node runs first, add edges between the nodes to define the order of the workflow, and add a conditional edge; a node reached through a conditional edge may or may not be called during a run.

Finally, we compile the whole workflow to check that everything works and that the graph is built as intended. We can display the graph using the IPython module and the Mermaid PNG method (draw_mermaid_png). If everything goes correctly, the graph looks like the one below.
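The routing described here can be imitated with a few dictionaries, which helps when reasoning about which nodes run for a given query. The sketch below is a plain-Python stand-in, not LangGraph itself: NODES, EDGES, CONDITIONAL, and run are hypothetical names, and the three agents are stubs that make no LLM calls.

```python
END = "END"  # stand-in for langgraph.graph.END

def retrieve_agent(state: dict) -> dict:
    # Stub retrieval: only queries that mention "key" find something
    if "key" in state["query"].lower():
        return {"retrieved_content": "Page 7: A primary key identifies a row.", "page_num": 7}
    return {"retrieved_content": "No content retrieved.", "page_num": None}

def augment_agent(state: dict) -> dict:
    note = f"Sourced from page {state['page_num']}."
    return {"augmented_content": f"{state['retrieved_content']} {note}"}

def generate_agent(state: dict) -> dict:
    context = state.get("augmented_content") or "No augmented content."
    return {"response": f"Answer based on: {context}"}

def decide_augmentation(state: dict) -> str:
    content = state.get("retrieved_content")
    if content and content != "No content retrieved.":
        return "augmentation"
    return "generation"

NODES = {
    "retrieve_agent": retrieve_agent,
    "augment_agent": augment_agent,
    "generate_agent": generate_agent,
}
# Fixed edges: the node that always follows
EDGES = {"augment_agent": "generate_agent", "generate_agent": END}
# Conditional edge: a router function plus a label-to-node mapping
CONDITIONAL = {
    "retrieve_agent": (
        decide_augmentation,
        {"augmentation": "augment_agent", "generation": "generate_agent"},
    )
}

def run(query: str, entry: str = "retrieve_agent"):
    """Walk the graph from the entry node to END, merging each node's output."""
    state = {"query": query}
    visited = []
    current = entry
    while current != END:
        visited.append(current)
        state.update(NODES[current](state))
        if current in CONDITIONAL:
            router, mapping = CONDITIONAL[current]
            current = mapping[router(state)]
        else:
            current = EDGES[current]
    return state, visited

print(run("What is a primary key?")[1])
print(run("Tell me a joke")[1])
```

The second query skips augment_agent entirely, which is the "may or may not be called" behavior of a node behind a conditional edge.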

Source: photo by the author

The rest of the code is entirely Streamlit. Users can design the UI however they like. We took a very basic approach to the UI so that it stays user-friendly. We also keep some session state to maintain the chat history, user queries, and so on. Nothing runs without user input: until the user submits a query, the workflow does not start.

Screenshots of the application in action –
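The hand-off between the stored chat messages and the agent's chat_history can be sketched as a small conversion function. The name to_agent_history is hypothetical; the role and type labels follow the ones used in the code above.

```python
def to_agent_history(messages: list) -> list:
    """Map role-based UI messages to the human/ai dicts passed as chat_history."""
    return [
        {"type": "human" if msg["role"] == "user" else "ai", "content": msg["content"]}
        for msg in messages[:-1]  # the last entry is the question being asked now
    ]

# Messages as they might sit in the session state after the user's second question
messages = [
    {"role": "user", "content": "What is normalization?"},
    {"role": "assistant", "content": "Normalization reduces redundancy. Source: Page 4"},
    {"role": "user", "content": "Explain it more simply."},
]

history = to_agent_history(messages)
for entry in history:
    print(entry["type"], "->", entry["content"])
```

Leaving out the last message avoids sending the current question twice, since it already reaches the prompts through the query field.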

Source: photo by the author
Source: photo by the author

This article was written in cooperation with Biswajit Das

Published via Towards AI
