LLM and AI Agent Applications with LangChain and LangGraph – Part 29: Agnostic Model Pattern and LLM API Gateway

Author(s): Michał Zarnecki

Originally published in Towards Artificial Intelligence.

Hi! In this part we move from experiments and prototyping to the real world – production implementations.

Because the truth is: building a working notebook or proof of concept is just the beginning. The real challenges begin when the application must support hundreds or thousands of users, operate reliably 24/7, and still stay within budget.

Let's start with the first foundation: a model-independent approach.

Model-agnostic from day one

Many teams building AI applications quickly lock themselves into a single vendor – just OpenAI or just Anthropic. It's understandable: it's faster to pick one API and focus. But in the long run it is a huge risk. If a provider increases prices, goes out of business, or changes license terms, the entire application may come to a halt.

That's why it's worth thinking about a model-independent gateway layer from the very beginning.

In practice, this means that your code does not communicate directly with one specific model. Instead, it invokes an abstraction:

  • “give me a chat LLM”, or
  • “give me an embedding generator”.

Only the gateway decides whether, under the hood, it should call GPT-5, Claude 4.5 Sonnet, or a local LLaMA running on your own infrastructure.
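A minimal sketch of what such an abstraction can look like; the function names get_chat_model and get_embeddings are illustrative, not part of any library, and the provider/model names come from environment variables as an example:

import os
from langchain_openai import ChatOpenAI, OpenAIEmbeddings

def get_chat_model():
    """Return a chat model; the rest of the app never imports a provider directly."""
    provider = os.getenv("LLM_PROVIDER", "openai")
    if provider == "openai":
        return ChatOpenAI(model=os.getenv("LLM_MODEL", "gpt-4o-mini"), temperature=0)
    # elif provider == "anthropic":
    #     return ChatAnthropic(model=os.getenv("LLM_MODEL", "claude-sonnet-4-5"), temperature=0)
    raise ValueError(f"Unknown LLM provider: {provider}")

def get_embeddings():
    """Return an embedding generator behind the same abstraction."""
    return OpenAIEmbeddings(model="text-embedding-3-small")

Swapping providers then means changing this one module (or an environment variable), not every call site in the application.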

API gateway + routing + fallback

The second foundation is an API gateway.

Imagine you expose a simple endpoint like POST /v1/chat where users send requests. In a header, e.g. X-Model, the client specifies which model to use.

The gateway can run multiple models in parallel and can also implement fallback logic: if the primary model fails to respond within a certain time, the gateway automatically switches to a backup model, such as an open-source model running locally.

This pattern not only improves reliability, but also opens the door to experimentation.

You can send 1% of your traffic to a new model and see how it performs compared to the previous one without changing the entire system.
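LangChain's Runnable interface already supports the fallback part via with_fallbacks. A minimal sketch, assuming a local Ollama server for the backup model (langchain-ollama is an extra dependency beyond the pip install shown later, and the model names are just examples):

from langchain_openai import ChatOpenAI
from langchain_ollama import ChatOllama

# Primary cloud model with a short timeout; backup is a local open-source model.
primary = ChatOpenAI(model="gpt-4o-mini", temperature=0, timeout=10)
backup = ChatOllama(model="llama3.1", temperature=0)

# If the primary call fails (timeout, rate limit, outage), the backup is tried.
model = primary.with_fallbacks([backup])

print(model.invoke("Summarize the benefits of an API gateway in one sentence.").content)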

Cost monitoring and control

The third foundation – often neglected – is cost monitoring and control.

In a prototype, you just say “it works”. In production you will be asked more difficult questions:

  • How much does it cost per day?
  • What is our hallucination rate?
  • How often do we reject results?

This is where tools like LangSmith help – but even a simple internal logging system can work.

We measure latency (because users don't want to wait 30 seconds), we measure cost, and we measure quality – for example: how many responses were rejected by guardrails or grading.

We can also set very simple but effective alerts:

  • if daily cost exceeds $50 → send notification,
  • if the average response time exceeds 5 seconds → trigger another alert.

This gives you real insight into what is happening inside the system.
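A minimal sketch of such internal logging, using LangChain's OpenAI callback to collect token counts and cost (langchain-community is an additional dependency; send_alert is a placeholder, and the thresholds are the example values from above):

import time
from langchain_community.callbacks import get_openai_callback
from langchain_openai import ChatOpenAI

DAILY_COST_LIMIT_USD = 50.0   # example threshold from the text
LATENCY_LIMIT_S = 5.0
daily_cost = 0.0

def send_alert(message: str):
    print(f"[ALERT] {message}")  # placeholder: e-mail, Slack, PagerDuty, etc.

model = ChatOpenAI(model="gpt-4o-mini", temperature=0)

def monitored_call(prompt: str) -> str:
    global daily_cost
    start = time.perf_counter()
    # get_openai_callback reports tokens and cost for OpenAI calls made inside the block
    with get_openai_callback() as cb:
        response = model.invoke(prompt)
    latency = time.perf_counter() - start
    daily_cost += cb.total_cost
    print(f"latency={latency:.2f}s tokens={cb.total_tokens} cost=${cb.total_cost:.5f}")
    if daily_cost > DAILY_COST_LIMIT_USD:
        send_alert(f"Daily cost exceeded ${DAILY_COST_LIMIT_USD}")
    if latency > LATENCY_LIMIT_S:
        send_alert(f"Slow response: {latency:.1f}s")
    return response.content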

These three elements – a model-independent gateway, an API gateway, and monitoring – are not “nice to have.” They are the foundations. If you take them seriously, your application will not only work in production, but will also remain resilient to changes in the market and technology.

Now let's move on to the code.

Install the libraries and load the environment variables

!pip install -U langchain langchain-openai langgraph fastapi uvicorn
from dotenv import load_dotenv
load_dotenv()

Human in the loop

from langchain_openai import ChatOpenAI
from langchain_core.tools import tool
from langchain.agents import create_agent
from langchain.agents.middleware import HumanInTheLoopMiddleware
from langgraph.checkpoint.memory import MemorySaver
from langgraph.types import Command

@tool
def risky_operation(secret: str) -> str:
    """Perform a risky operation that requires manual approval."""
    return f"Executed risky operation with: {secret}"

tools = [risky_operation]
model = ChatOpenAI(model="gpt-4o-mini", temperature=0)

# Pause the agent before risky_operation runs and wait for a human decision.
hitl = HumanInTheLoopMiddleware(
    interrupt_on={
        "risky_operation": {"allowed_decisions": ["approve", "edit", "reject"]}
    },
    description_prefix="Manual approval required for risky operation:"
)

checkpointer = MemorySaver()
agent = create_agent(
    model=model,
    tools=tools,
    middleware=[hitl],
    checkpointer=checkpointer,
    debug=True
)

# The thread_id lets the checkpointer resume this conversation after the interrupt.
config = {"configurable": {"thread_id": "hitl-demo-1"}}

result = agent.invoke(
    {"messages": [{"role": "user", "content": "Please run the risky operation with secret code $%45654@."}]},
    config=config,
)

Output:

(values) {'messages': (HumanMessage(content='Please run the risky operation with secret code $%45654@.', additional_kwargs={}, response_metadata={}, id='589244c7-9860-48fa-b68a-eca595510a73'))}
(updates) {'model': {'messages': (AIMessage(content='', additional_kwargs={'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 19, 'prompt_tokens': 60, 'total_tokens': 79, 'completion_tokens_details': {'accepted_prediction_tokens': 0, 'audio_tokens': 0, 'reasoning_tokens': 0, 'rejected_prediction_tokens': 0}, 'prompt_tokens_details': {'audio_tokens': 0, 'cached_tokens': 0}}, 'model_provider': 'openai', 'model_name': 'gpt-4o-mini-2024-07-18', 'system_fingerprint': 'fp_560af6e559', 'id': 'chatcmpl-CaJj7md4CRaAN2mcI1ju8uek8BJti', 'service_tier': 'default', 'finish_reason': 'tool_calls', 'logprobs': None}, id='lc_run--35ad04bd-5d01-4649-a64c-d8c583ffe3aa-0', tool_calls=({'name': 'risky_operation', 'args': {'secret': '$%45654@'}, 'id': 'call_dK786IhVaO3Z4VssPOI1cM6y', 'type': 'tool_call'}), usage_metadata={'input_tokens': 60, 'output_tokens': 19, 'total_tokens': 79, 'input_token_details': {'audio': 0, 'cache_read': 0}, 'output_token_details': {'audio': 0, 'reasoning': 0}}))}}
(values) {'messages': (HumanMessage(content='Please run the risky operation with secret code $%45654@.', additional_kwargs={}, response_metadata={}, id='589244c7-9860-48fa-b68a-eca595510a73'), AIMessage(content='', additional_kwargs={'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 19, 'prompt_tokens': 60, 'total_tokens': 79, 'completion_tokens_details': {'accepted_prediction_tokens': 0, 'audio_tokens': 0, 'reasoning_tokens': 0, 'rejected_prediction_tokens': 0}, 'prompt_tokens_details': {'audio_tokens': 0, 'cached_tokens': 0}}, 'model_provider': 'openai', 'model_name': 'gpt-4o-mini-2024-07-18', 'system_fingerprint': 'fp_560af6e559', 'id': 'chatcmpl-CaJj7md4CRaAN2mcI1ju8uek8BJti', 'service_tier': 'default', 'finish_reason': 'tool_calls', 'logprobs': None}, id='lc_run--35ad04bd-5d01-4649-a64c-d8c583ffe3aa-0', tool_calls=({'name': 'risky_operation', 'args': {'secret': '$%45654@'}, 'id': 'call_dK786IhVaO3Z4VssPOI1cM6y', 'type': 'tool_call'}), usage_metadata={'input_tokens': 60, 'output_tokens': 19, 'total_tokens': 79, 'input_token_details': {'audio': 0, 'cache_read': 0}, 'output_token_details': {'audio': 0, 'reasoning': 0}}))}
(updates) {'__interrupt__': (Interrupt(value={'action_requests': ({'name': 'risky_operation', 'args': {'secret': '$%45654@'}, 'description': "Manual approval required for risky operation:nnTool: risky_operationnArgs: {'secret': '$%45654@'}"}), 'review_configs': ({'action_name': 'risky_operation', 'allowed_decisions': ('approve', 'edit', 'reject')})}, id='a3abdfe342bd7c8be8b1b586ee9f8815'),)}

Interrupt handling:

if "__interrupt__" in result:
print("Interrupt detected!")
decisions = ({"type": "approve"})

result = agent.invoke(
Command(resume={"decisions": decisions}),
config=config,
)

Output:

(values) {'messages': (HumanMessage(content='Please run the risky operation with secret code $%45654@.', additional_kwargs={}, response_metadata={}, id='589244c7-9860-48fa-b68a-eca595510a73'), AIMessage(content='', additional_kwargs={'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 19, 'prompt_tokens': 60, 'total_tokens': 79, 'completion_tokens_details': {'accepted_prediction_tokens': 0, 'audio_tokens': 0, 'reasoning_tokens': 0, 'rejected_prediction_tokens': 0}, 'prompt_tokens_details': {'audio_tokens': 0, 'cached_tokens': 0}}, 'model_provider': 'openai', 'model_name': 'gpt-4o-mini-2024-07-18', 'system_fingerprint': 'fp_560af6e559', 'id': 'chatcmpl-CaJj7md4CRaAN2mcI1ju8uek8BJti', 'service_tier': 'default', 'finish_reason': 'tool_calls', 'logprobs': None}, id='lc_run--35ad04bd-5d01-4649-a64c-d8c583ffe3aa-0', tool_calls=({'name': 'risky_operation', 'args': {'secret': '$%45654@'}, 'id': 'call_dK786IhVaO3Z4VssPOI1cM6y', 'type': 'tool_call'}), usage_metadata={'input_tokens': 60, 'output_tokens': 19, 'total_tokens': 79, 'input_token_details': {'audio': 0, 'cache_read': 0}, 'output_token_details': {'audio': 0, 'reasoning': 0}}), ToolMessage(content='Executed risky operation with: $%45654@', name='risky_operation', id='13109032-38fb-4d94-920c-90026acc41f3', tool_call_id='call_dK786IhVaO3Z4VssPOI1cM6y'))}

Model-agnostic API gateway

To run the sample code below as a model-agnostic API gateway:

1. Place the code below in a file app.py:

from fastapi import FastAPI, Header
from pydantic import BaseModel
from langchain_core.runnables import RunnableLambda
from langchain_core.messages import AIMessage
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI

class ChatRequest(BaseModel):
    message: str

class ChatResponse(BaseModel):
    provider: str
    model: str
    answer: str

prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a helpful assistant."),
    ("human", "{message}")
])

def build_model(x_model: str):
    """
    x_model format:
    - 'openai:gpt-4o-mini'
    """
    if ":" in x_model:
        provider, model_name = x_model.split(":", 1)
    else:
        provider, model_name = "openai", x_model

    provider = provider.lower().strip()

    if provider == "openai":
        return provider, model_name, ChatOpenAI(model=model_name, temperature=0)

    # if provider == "anthropic":  # support for another LLM API provider
    #     return provider, model_name, ChatAnthropic(model=model_name, temperature=0)

    # Unknown provider: return a simple echo runnable instead of failing hard.
    # The lambda receives the formatted prompt value from the chain, so we echo
    # the last (human) message back.
    def _unknown(prompt_value):
        last = prompt_value.to_messages()[-1].content
        return AIMessage(content=f"(unknown provider) Echo: {last}")
    return "unknown", x_model, RunnableLambda(_unknown)

app = FastAPI(title="Model-Agnostic LangChain Gateway")

@app.post("/chat", response_model=ChatResponse)
def chat_endpoint(
    req: ChatRequest,
    x_model: str = Header(default="openai:gpt-4o-mini", alias="X-Model"),
):
    provider, model_name, model = build_model(x_model)
    chain = prompt | model | StrOutputParser()
    answer: str = chain.invoke({"message": req.message})
    return ChatResponse(provider=provider, model=model_name, answer=answer)

2. Start the server:

uvicorn app:app --reload

3. Send a request:

curl -X POST 'http://127.0.0.1:8000/chat' \
  -H 'Content-Type: application/json' \
  -H 'X-Model: openai:gpt-5-mini' \
  -d '{"message":"List 3 advantages of Python."}'

curl -X POST 'http://127.0.0.1:8000/chat' \
  -H 'Content-Type: application/json' \
  -H 'X-Model: openai:gpt-4o-mini' \
  -d '{"message":"List 3 advantages of Python."}'

The future of GenAI

This brings us to the second part of this episode: the future of GenAI.

What will this industry look like in the next few years? No one has a crystal ball – but some trends are already very clear.

Trend No. 1: Multimodality

Models such as GPT-5 or Claude 4.5 can already analyze images, audio and video. This will soon be standard.

When you build apps, you need to assume that users won't just send text. They will send screenshots, photos of documents, audio recordings. Your architecture must be ready for this.
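In LangChain, multimodal input is expressed as a message whose content is a list of blocks rather than a plain string. A minimal sketch, assuming an example vision-capable model and a placeholder image URL:

from langchain_core.messages import HumanMessage
from langchain_openai import ChatOpenAI

model = ChatOpenAI(model="gpt-4o-mini")

# A single user message mixing text and an image (the URL is a placeholder).
msg = HumanMessage(content=[
    {"type": "text", "text": "What is shown in this screenshot?"},
    {"type": "image_url", "image_url": {"url": "https://example.com/screenshot.png"}},
])

print(model.invoke([msg]).content)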

Trend No. 2: Agentic workflows

Classic APIs and linear workflows are not enough when the process is complex and dynamic.

Instead of encoding the conditions in traditional code, we will declare agent state graphs – Researcher, Critic, Expert – and let the system iterate based on state and quality signals.
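A minimal LangGraph sketch of this idea; the researcher and critic node bodies are placeholders (a real system would call an LLM in each), and the quality check is a toy signal:

from typing import TypedDict
from langgraph.graph import StateGraph, START, END

class ReviewState(TypedDict):
    draft: str
    quality: float

def researcher(state: ReviewState) -> dict:
    return {"draft": state["draft"] + " [revised]"}   # placeholder: revise the draft

def critic(state: ReviewState) -> dict:
    return {"quality": state["quality"] + 0.3}        # placeholder quality signal

def good_enough(state: ReviewState) -> str:
    return "done" if state["quality"] >= 0.8 else "revise"

graph = StateGraph(ReviewState)
graph.add_node("researcher", researcher)
graph.add_node("critic", critic)
graph.add_edge(START, "researcher")
graph.add_edge("researcher", "critic")
# Loop back to the researcher until the critic is satisfied.
graph.add_conditional_edges("critic", good_enough, {"revise": "researcher", "done": END})

app = graph.compile()
print(app.invoke({"draft": "first draft", "quality": 0.0}))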

With these trends in mind, we can prepare our applications for the next generation of even more efficient AI models.

That's all for this chapter on Model Agnostic Pattern, LLM API Gateway, and Future AI Trends.

See next chapter

See previous chapter

See the full code from this article in the GitHub repository

Published via Towards AI
