LangGraph
LangGraph is where agents stop being demos and become products. Korely is
the memory those products run on: typed bi-temporal facts that resolve
their own contradictions, point-in-time as_of recall, and
semantic vector search over everything the user told you — all behind one
endpoint. There are two patterns depending on how you want the memory to
surface:
- Prompt-injected context (the moat path). Call korely.get_context() before the LangGraph run and prepend the result to the system prompt. It assembles the user's active typed facts plus the most relevant memories into one prompt-ready block, so the agent never has to decide whether to look. Best for structured pipelines with deterministic recall.
- Tool-wrapped SDK calls. Wrap korely.search and korely.add as LangChain tools and let the agent decide when to call them. Best for chat agents where the agent needs to reason about whether a memory lookup is warranted. search here is semantic vector recall over raw memories.
Requirements
pip install korely-memory langgraph "langchain[openai]"
Any chat model works; the examples use openai:gpt-4.1. Swap
in anthropic:... or google_genai:... with the
matching extra installed. You also need a Korely API key: sign up at
korely.ai/agents and copy the kor_live_
key from the dashboard.
Set your API key
export KORELY_API_KEY="kor_live_..."
Tool-wrapped approach
Wrap korely.search and korely.add as LangChain
tools and pass them to the agent. The agent calls them when it decides a
memory lookup or write is needed:

    import asyncio
    import os

    from langchain.agents import create_agent
    from langchain_core.tools import tool
    from korely_memory import Korely

    korely = Korely()  # reads KORELY_API_KEY from the environment

    @tool
    def recall(query: str) -> str:
        """Search this user's memory before answering."""
        hits = korely.search(query, limit=5)
        return "\n".join("- " + (h.snippet or "") for h in hits) or "No memories yet."

    @tool
    def remember(content: str) -> str:
        """Save a durable fact to memory."""
        m = korely.add(content, agent_id="assistant")
        return "Saved as " + m.id

    agent = create_agent(
        model="openai:gpt-4.1",
        tools=[recall, remember],
        system_prompt=(
            "You are an assistant with persistent memory. "
            "Call recall before answering questions about the user's past. "
            "Call remember when the user tells you something durable."
        ),
    )

    async def main() -> None:
        result = await agent.ainvoke(
            {
                "messages": [
                    {
                        "role": "user",
                        "content": "What did I decide about the lease renewal?",
                    }
                ]
            }
        )
        print(result["messages"][-1].content)

    asyncio.run(main())

Why the system prompt matters. A LangGraph agent only
has what you give it, so the search-before-answer rule lives in
system_prompt. With it, the agent reaches for
recall reliably; without it, the model sometimes answers
from its own weights.
Example run
What the loop looks like. The agent calls recall, which
calls korely.search under the hood, gets retrieval results
back, and reasons over them with its own model:
$ python agent.py
→ tool call recall(query="lease renewal decision")
← korely.search returns 2 hits:
- Renewal deadline is July 1. Anna confirmed a 3% increase,
1,200 to 1,236 EUR per month starting August. She wants the
signed copy by post.
- Lease, insurance certificate, meter readings, deposit receipt...
stdout › You decided to renew. The deadline is July 1: on the
May 28 call Anna confirmed a 3% increase to 1,236 EUR per month
starting in August, and she asked for the signed copy by post.
What the agent receives is pure retrieval: scored hits from semantic vector
search, each carrying a snippet of the matched memory. No generative model
composes output on the read path; your agent's own model does the
reasoning. That is also why read quotas are an order of magnitude more
generous than write quotas. When you want the resolved facts instead of raw
snippets — typed, bi-temporal, contradiction-free — reach for
get_context or get_facts.
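Because the read path returns raw hits, the tool wrapper decides how much of them the model actually sees. The following is a minimal sketch of a formatter that drops empty or weak hits and caps the size of the block. Hit is a hypothetical stand-in for whatever korely.search returns: only snippet appears in the examples on this page, and the score field name and its scale are assumptions.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class Hit:
    """Hypothetical stand-in for a search hit; `score` is an assumed field name."""
    snippet: Optional[str]
    score: float


def format_hits(hits: Iterable[Hit], min_score: float = 0.0, max_chars: int = 1200) -> str:
    """Render hits as a bulleted block, best first, within a character budget."""
    lines = []
    used = 0
    for hit in sorted(hits, key=lambda h: h.score, reverse=True):
        text = (hit.snippet or "").strip()
        if not text or hit.score < min_score:
            continue  # skip empty snippets and hits below the threshold
        line = "- " + text
        if used + len(line) > max_chars:
            break  # stop before the block exceeds the budget
        lines.append(line)
        used += len(line) + 1  # +1 for the newline that joins the lines
    return "\n".join(lines) or "No memories yet."


demo = [Hit("Renewal deadline is July 1.", 0.82), Hit(None, 0.40), Hit("Meter readings", 0.10)]
print(format_hits(demo, min_score=0.2))
```

A recall tool could return format_hits(korely.search(query, limit=5)) in place of the inline join, provided the real hit objects expose the fields assumed here.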
On create_react_agent: stacks pinned to
earlier LangGraph releases use the equivalent factory from
langgraph.prebuilt:
create_react_agent(model, tools) with the same tools list
and the same ainvoke shape. Newer LangChain releases name
it create_agent, as in the example above. Both produce a
LangGraph graph.
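If one codebase has to run on both generations of the stack, a small import shim can pick whichever factory is available. This is a sketch, not part of either library: resolve_agent_factory is a hypothetical helper, and the assumption that create_react_agent accepts its system prompt through a prompt keyword should be checked against the release you have pinned.

```python
import importlib
from typing import Any, Optional, Tuple


def resolve_agent_factory() -> Tuple[Optional[Any], Optional[str]]:
    """Return (factory, prompt_keyword) for the first importable agent factory."""
    candidates = [
        # (module, attribute, keyword that carries the system prompt)
        ("langchain.agents", "create_agent", "system_prompt"),
        ("langgraph.prebuilt", "create_react_agent", "prompt"),  # keyword is an assumption
    ]
    for module_name, attr, prompt_kwarg in candidates:
        try:
            module = importlib.import_module(module_name)
        except ImportError:
            continue  # package not installed; try the next candidate
        factory = getattr(module, attr, None)
        if factory is not None:  # the module can exist without the newer factory
            return factory, prompt_kwarg
    return None, None


factory, prompt_kwarg = resolve_agent_factory()
print(prompt_kwarg)
```

The agent would then be built as factory("openai:gpt-4.1", tools=[recall, remember], **{prompt_kwarg: "..."}), which keeps the tools list and the ainvoke call unchanged on either stack.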
Building a product on Korely memory
The single-user example above writes to one shared namespace, which is
exactly right for a personal assistant or an internal ops tool. A product
that serves many people needs one more dimension:
per-end-user scoping. That is what user_id
is for. Every write and every read carries the identifier of the end user
your agent is serving, and each end user gets an isolated memory space.
End users are unlimited on every plan: quotas count memories and queries,
never people.
The pattern in LangGraph is a per-request agent factory. Bind the end
user's user_id into the tools at request time:

    import os

    from langchain.agents import create_agent
    from langchain_core.tools import tool
    from korely_memory import Korely

    korely = Korely()  # reads KORELY_API_KEY from the environment; EU-hosted on every plan

    def build_agent_for(user_id: str):
        @tool
        def recall(query: str) -> str:
            """Search everything this user has told us before."""
            hits = korely.search(query, user_id=user_id, limit=5)
            return "\n".join("- " + hit.snippet for hit in hits) or "No memories yet."

        @tool
        def remember(content: str) -> str:
            """Save a durable fact about this user."""
            memory = korely.add(content, agent_id="support-bot", user_id=user_id)
            return "Saved as " + memory.id

        return create_agent(
            model="openai:gpt-4.1",
            tools=[recall, remember],
            system_prompt=(
                "Call recall before answering anything about this customer. "
                "Call remember when they tell you something durable."
            ),
        )

    # One agent per request, scoped to the end user in the session
    agent = build_agent_for(user_id="customer-4812")

Always pass user_id on reads in multi-tenant
products. A search without it spans every end user in the
namespace, which is what you want for an internal ops agent and not
what you want inside a customer-facing chat. And when a customer asks
to be forgotten, korely.delete_all(user_id=...) erases
every memory for that user with one call.
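One way to make a missing user_id impossible rather than merely discouraged is to bind it once in a thin wrapper and hand only the wrapper to request-handling code. The sketch below assumes nothing beyond the three calls shown on this page (search, add, delete_all, with the keyword arguments used above); ScopedMemory and RecordingClient are hypothetical names, and RecordingClient exists only so the example runs without network access.

```python
from typing import Any, List


class ScopedMemory:
    """Binds one end user's id so no read or write can omit it."""

    def __init__(self, client: Any, user_id: str) -> None:
        if not user_id:
            raise ValueError("user_id is required in a multi-tenant product")
        self._client = client
        self._user_id = user_id

    def search(self, query: str, limit: int = 5) -> List[Any]:
        return self._client.search(query, user_id=self._user_id, limit=limit)

    def add(self, content: str, agent_id: str) -> Any:
        return self._client.add(content, agent_id=agent_id, user_id=self._user_id)

    def forget(self) -> Any:
        # Erase every memory for this end user, as described above.
        return self._client.delete_all(user_id=self._user_id)


class RecordingClient:
    """Hypothetical stand-in that records the keyword arguments it receives."""

    def __init__(self) -> None:
        self.calls = []

    def search(self, query, **kwargs):
        self.calls.append(("search", kwargs))
        return []

    def add(self, content, **kwargs):
        self.calls.append(("add", kwargs))

    def delete_all(self, **kwargs):
        self.calls.append(("delete_all", kwargs))


fake = RecordingClient()
memory = ScopedMemory(fake, user_id="customer-4812")
memory.search("lease renewal")
memory.forget()
print(fake.calls)
```

In the factory above, build_agent_for could construct ScopedMemory(korely, user_id) once and have both tools call it, so the user_id is written in exactly one place.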
If you would rather inject memory into the prompt than expose tools,
korely.get_context(query, user_id, token_budget) assembles a
prompt-ready context block in one call. The full client surface
(add, search, get_facts with point-in-time as_of queries, and
batch import) is on the Python SDK page, and the wire
contract is in the API reference.
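For the prompt-injection path, the work happens before the agent runs: fetch the context block, prepend it to the system prompt, and build the agent with that prompt. The sketch below shows only the prompt assembly. It assumes get_context accepts user_id and token_budget as keywords (matching the signature quoted above) and returns something that converts cleanly to a string; FakeKorely, BASE_PROMPT, and build_system_prompt are hypothetical names used for illustration.

```python
from typing import Any

BASE_PROMPT = "You are a support assistant. Use the memory block above when it is relevant."


def build_system_prompt(client: Any, query: str, user_id: str, token_budget: int = 800) -> str:
    """Fetch a context block and prepend it to the base system prompt."""
    # Assumption: the return value is (or converts cleanly to) a plain string.
    raw = client.get_context(query, user_id=user_id, token_budget=token_budget)
    context = str(raw or "").strip()
    if not context:
        return BASE_PROMPT  # nothing stored yet; fall back to the bare prompt
    return context + "\n\n" + BASE_PROMPT


class FakeKorely:
    """Hypothetical stand-in so the sketch runs without network access."""

    def get_context(self, query, user_id, token_budget):
        return "Known facts about this user:\n- Lease renewal deadline: July 1"


prompt = build_system_prompt(FakeKorely(), "lease renewal", user_id="customer-4812")
print(prompt)
```

With a real client, the returned string would be passed as system_prompt to create_agent, and the recall tool can be dropped because the memory is already in the prompt.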
Troubleshooting
| Symptom | Fix |
|---|---|
| 401 Unauthorized from api.korely.ai | Key missing or revoked. Check that KORELY_API_KEY is set in the environment and starts with kor_live_. You can also pass it explicitly: Korely(api_key="kor_live_..."). |
| ImportError: No module named 'korely_memory' | Run pip install korely-memory (note the dash, not underscore). The import name is korely_memory. |
| Agent answers without calling memory tools | Keep the search-before-answer rule in system_prompt. If the model still ignores the tools, simplify: expose only the tools relevant to the current node and keep tool descriptions short and action-oriented. |
| ImportError on langchain.agents | Your stack predates the create_agent factory. Use from langgraph.prebuilt import create_react_agent with the same tools list, or upgrade langchain and langgraph. |
| korely.get_facts() returns nothing | Facts are extracted asynchronously after a write, so a freshly added memory may not have facts yet. Give extraction a moment, then re-read. get_facts works on every plan including Hobby; it needs only the memories:read scope. |
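For the asynchronous-extraction case in the last row, "give extraction a moment" can be made explicit with a short polling loop. This is a generic sketch: wait_for_facts is a hypothetical helper that takes any zero-argument callable, so it makes no assumption about the arguments get_facts accepts, and the demo uses a stand-in that returns nothing twice before returning one fact.

```python
import time
from typing import Callable, List, TypeVar

T = TypeVar("T")


def wait_for_facts(fetch: Callable[[], List[T]], attempts: int = 5, delay: float = 0.5) -> List[T]:
    """Call `fetch` until it returns a non-empty list or attempts run out."""
    for attempt in range(attempts):
        facts = fetch()
        if facts:
            return facts
        if attempt < attempts - 1:
            time.sleep(delay * (2 ** attempt))  # exponential backoff between tries
    return []


# Stand-in for a read that is empty until extraction has finished.
responses = iter([[], [], ["lease renewal deadline: July 1"]])
facts = wait_for_facts(lambda: next(responses), attempts=5, delay=0.01)
print(facts)
```

With a real client, fetch would be a lambda around korely.get_facts with whatever arguments your read normally uses. An empty result after the final attempt means either that extraction is still pending or that the memory yielded no facts, so treat it as "not yet" rather than as an error.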
Something not working? Email
[email protected] with your
korely-memory version and the traceback. We read
every message.