Korely

LangGraph

LangGraph is where agents stop being demos and become products. Korely is the memory those products run on: typed bi-temporal facts that resolve their own contradictions, point-in-time as_of recall, and semantic vector search over everything the user told you, all behind one endpoint. There are two integration patterns, depending on how you want memory to surface:

  • Prompt-injected context (the moat path). Call korely.get_context() before the LangGraph run and prepend the result to the system prompt. It assembles the user's active typed facts plus the most relevant memories into one prompt-ready block, so the agent never has to decide whether to look. Best for structured pipelines with deterministic recall.
  • Tool-wrapped SDK calls. Wrap korely.search and korely.add as LangChain tools and let the agent decide when to call them. Best for chat agents that need to reason about whether a memory lookup is warranted. In this pattern, search is semantic vector recall over raw memories.

Requirements

pip install korely-memory langgraph "langchain[openai]"

Any chat model works; the examples use openai:gpt-4.1. Swap in anthropic:... or google_genai:... with the matching extra installed. You also need a Korely API key: sign up at korely.ai/agents and copy the kor_live_ key from the dashboard.

Set your API key

export KORELY_API_KEY="kor_live_..."
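
A fail-fast check at startup can surface a missing or malformed key before the first request goes out. The sketch below is illustrative and not part of the korely-memory SDK: validate_api_key is a hypothetical helper, and the only facts it relies on are the KORELY_API_KEY variable name and the kor_live_ prefix described above.

```python
import os
from typing import Mapping, Optional


def validate_api_key(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the Korely API key from the environment, or raise early.

    Hypothetical helper, not an SDK function. Pass a mapping to check
    something other than os.environ (useful in tests).
    """
    env = os.environ if env is None else env
    key = env.get("KORELY_API_KEY", "").strip()
    if not key:
        raise RuntimeError("KORELY_API_KEY is not set")
    if not key.startswith("kor_live_"):
        raise RuntimeError("KORELY_API_KEY does not start with 'kor_live_'")
    return key
```

Calling validate_api_key() once before constructing Korely() turns a later 401 into an immediate, readable error.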

Tool-wrapped approach

Wrap korely.search and korely.add as LangChain tools and pass them to the agent. The agent calls them when it decides a memory lookup or write is needed:

import asyncio

from langchain.agents import create_agent
from langchain_core.tools import tool
from korely_memory import Korely

korely = Korely()  # reads KORELY_API_KEY from the environment


@tool
def recall(query: str) -> str:
    """Search this user's memory before answering."""
    hits = korely.search(query, limit=5)
    # One bullet per hit; fall back to a fixed message when nothing matches
    return "\n".join("- " + (h.snippet or "") for h in hits) or "No memories yet."


@tool
def remember(content: str) -> str:
    """Save a durable fact to memory."""
    m = korely.add(content, agent_id="assistant")
    return "Saved as " + m.id


agent = create_agent(
    model="openai:gpt-4.1",
    tools=[recall, remember],
    system_prompt=(
        "You are an assistant with persistent memory. "
        "Call recall before answering questions about the user's past. "
        "Call remember when the user tells you something durable."
    ),
)


async def main() -> None:
    result = await agent.ainvoke(
        {
            "messages": [
                {
                    "role": "user",
                    "content": "What did I decide about the lease renewal?",
                }
            ]
        }
    )
    # The last message in the returned state is the agent's final answer
    print(result["messages"][-1].content)


asyncio.run(main())

Why the system prompt matters. A LangGraph agent only has what you give it, so the search-before-answer rule lives in system_prompt. With it, the agent reaches for recall reliably; without it, the model sometimes answers from its own weights.

Example run

What the loop looks like. The agent calls recall, which calls korely.search under the hood, gets retrieval results back, and reasons over them with its own model:

$ python agent.py

→ tool call recall(query="lease renewal decision")

← korely.search returns 2 hits:

   - Renewal deadline is July 1. Anna confirmed a 3% increase,
     1,200 to 1,236 EUR per month starting August. She wants the
     signed copy by post.

   - Lease, insurance certificate, meter readings, deposit receipt...

stdout › You decided to renew. The deadline is July 1: on the
May 28 call Anna confirmed a 3% increase to 1,236 EUR per month
starting in August, and she asked for the signed copy by post.

What the agent receives is pure retrieval: scored hits from semantic vector search, each carrying a snippet of the matched memory. No generative model composes output on the read path; your agent's own model does the reasoning. That is also why read quotas are an order of magnitude more generous than write quotas. When you want resolved facts (typed, bi-temporal, contradiction-free) instead of raw snippets, reach for get_context or get_facts.

On create_react_agent: stacks pinned to earlier LangGraph releases use the equivalent factory from langgraph.prebuilt: create_react_agent(model, tools) with the same tools list and the same ainvoke shape. Newer LangChain releases name it create_agent, as in the example above. Both produce a LangGraph graph.
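
If one codebase has to run on both older and newer stacks, the choice of factory can be made at import time. The following is a minimal sketch using only the standard library: resolve_attr and AGENT_FACTORY_CANDIDATES are hypothetical names introduced here, and the two import paths are the ones named in the paragraph above. The keyword that carries the system prompt may differ between the two factories and across releases, so check the signature of whichever one comes back.

```python
import importlib
from typing import Any, Iterable, Tuple


def resolve_attr(candidates: Iterable[Tuple[str, str]]) -> Any:
    """Return the first attribute that can be imported from the candidates.

    Each candidate is a (module_name, attribute_name) pair, tried in order.
    Hypothetical helper; not part of LangChain, LangGraph, or Korely.
    """
    errors = []
    for module_name, attr_name in candidates:
        try:
            module = importlib.import_module(module_name)
            return getattr(module, attr_name)
        except (ImportError, AttributeError) as exc:
            # Module missing, or present but predating the attribute
            errors.append(f"{module_name}.{attr_name}: {exc}")
    raise ImportError("no candidate could be imported: " + "; ".join(errors))


# Newer factory first, older prebuilt factory as the fallback
AGENT_FACTORY_CANDIDATES = [
    ("langchain.agents", "create_agent"),
    ("langgraph.prebuilt", "create_react_agent"),
]
```

A call such as factory = resolve_attr(AGENT_FACTORY_CANDIDATES) then yields whichever factory the installed stack provides, and raises ImportError with both failure reasons if neither is available.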

Building a product on Korely memory

The single-user example above writes to one shared namespace, which is exactly right for a personal assistant or an internal ops tool. A product that serves many people needs one more dimension: per-end-user scoping. That is what user_id is for. Every write and every read carries the identifier of the end user your agent is serving, and each end user gets an isolated memory space. End users are unlimited on every plan: quotas count memories and queries, never people.

The pattern in LangGraph is a per-request agent factory. Bind the end user's user_id into the tools at request time:

from langchain.agents import create_agent
from langchain_core.tools import tool
from korely_memory import Korely

korely = Korely()  # reads KORELY_API_KEY from the environment; EU-hosted on every plan


def build_agent_for(user_id: str):
    # The tools are closures, so every call they make is bound to this user_id
    @tool
    def recall(query: str) -> str:
        """Search everything this user has told us before."""
        hits = korely.search(query, user_id=user_id, limit=5)
        # Guard against a missing snippet, as the single-user example does
        return "\n".join("- " + (hit.snippet or "") for hit in hits) or "No memories yet."

    @tool
    def remember(content: str) -> str:
        """Save a durable fact about this user."""
        memory = korely.add(content, agent_id="support-bot", user_id=user_id)
        return "Saved as " + memory.id

    return create_agent(
        model="openai:gpt-4.1",
        tools=[recall, remember],
        system_prompt=(
            "Call recall before answering anything about this customer. "
            "Call remember when they tell you something durable."
        ),
    )


# One agent per request, scoped to the end user in the session
agent = build_agent_for(user_id="customer-4812")

Always pass user_id on reads in multi-tenant products. A search without it spans every end user in the namespace, which is what you want for an internal ops agent and not what you want inside a customer-facing chat. And when a customer asks to be forgotten, korely.delete_all(user_id=...) erases every memory for that user with one call.
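
One way to make an unscoped read impossible in customer-facing code is to route every search through a thin wrapper that refuses an empty user_id. This is a sketch under assumptions: scoped_search is a hypothetical helper, not an SDK function, and the only client call it makes is search(query, user_id=..., limit=...) as used in the example above.

```python
from typing import Any, List


def scoped_search(client: Any, query: str, user_id: str, limit: int = 5) -> List[Any]:
    """Search one end user's memories, refusing to run without a user_id.

    `client` is any object exposing search(query, user_id=..., limit=...),
    for example the Korely client from the example above.
    """
    if not user_id or not user_id.strip():
        # An empty user_id would silently widen the search to the whole namespace
        raise ValueError("user_id is required for customer-facing reads")
    return list(client.search(query, user_id=user_id, limit=limit))
```

Internal ops agents that genuinely need a namespace-wide search can keep calling the client directly; the wrapper only guards the customer-facing path.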

If you would rather inject memory into the prompt than expose tools, korely.get_context(query, user_id, token_budget) assembles a prompt-ready context block in one call. The full client surface (add, search, get_facts with point-in-time as_of queries, and batch import) is on the Python SDK page, and the wire contract is in the API reference.
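
The prompt-injected pattern can be sketched as a small function that fetches the context block and prepends it to a base system prompt. Several things here are assumptions rather than documented behavior: build_system_prompt is a hypothetical helper, the keyword names user_id and token_budget are taken from the signature quoted above, the return value of get_context is assumed to render to prompt-ready text via str(), and the default budget of 1000 is arbitrary.

```python
from typing import Any


def build_system_prompt(
    client: Any,
    base_prompt: str,
    query: str,
    user_id: str,
    token_budget: int = 1000,  # arbitrary default, not taken from the Korely docs
) -> str:
    """Prepend a user's memory context to a base system prompt.

    `client` is any object exposing get_context(query, user_id=..., token_budget=...).
    """
    raw = client.get_context(query, user_id=user_id, token_budget=token_budget)
    # Assumption: the result converts to prompt-ready text with str()
    context = "" if raw is None else str(raw).strip()
    if not context:
        # No memories yet: fall back to the base prompt unchanged
        return base_prompt
    return f"{context}\n\n{base_prompt}"
```

The returned string can be passed as system_prompt when the agent is built for a request, so the model sees the user's context without deciding whether to call a tool.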

Troubleshooting

  • Symptom: 401 Unauthorized from api.korely.ai
    Fix: Key missing or revoked. Check that KORELY_API_KEY is set in the environment and starts with kor_live_. You can also pass it explicitly: Korely(api_key="kor_live_...").
  • Symptom: ModuleNotFoundError: No module named 'korely_memory'
    Fix: Run pip install korely-memory (the package name uses a hyphen, not an underscore). The import name is korely_memory.
  • Symptom: Agent answers without calling memory tools
    Fix: Keep the search-before-answer rule in system_prompt. If the model still ignores the tools, simplify: expose only the tools relevant to the current node and keep tool descriptions short and action-oriented.
  • Symptom: ImportError on langchain.agents
    Fix: Your stack predates the create_agent factory. Use from langgraph.prebuilt import create_react_agent with the same tools list, or upgrade langchain and langgraph.
  • Symptom: korely.get_facts() returns nothing
    Fix: Facts are extracted asynchronously after a write, so a freshly added memory may not have facts yet. Give extraction a moment, then re-read. get_facts works on every plan, including Hobby; it needs only the memories:read scope.
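
Because fact extraction is asynchronous, a read that follows a write may need a short retry loop. The helper below is a generic sketch, not an SDK feature: wait_for_nonempty is a hypothetical name, it takes any zero-argument callable so it makes no assumption about the get_facts signature, and the attempt count and delays are arbitrary starting points.

```python
import time
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def wait_for_nonempty(
    fetch: Callable[[], Sequence[T]],
    attempts: int = 5,           # arbitrary; tune to your latency budget
    initial_delay: float = 0.5,  # seconds; doubled after each empty read
    sleep: Callable[[float], None] = time.sleep,
) -> Sequence[T]:
    """Call fetch() until it returns a non-empty result or attempts run out."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    delay = initial_delay
    result: Sequence[T] = []
    for attempt in range(attempts):
        result = fetch()
        if result:
            return result
        if attempt < attempts - 1:
            sleep(delay)  # back off before the next read
            delay *= 2
    # Still empty after the last attempt: return it and let the caller decide
    return result
```

A call might look like wait_for_nonempty(lambda: korely.get_facts()), with whatever arguments you normally pass to get_facts placed inside the lambda.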

Something not working? Email [email protected] with your korely-memory version and the traceback. We read every message.