Korely

Memory as a function-calling tool

There are two ways to wire memory into an agent. You can orchestrate it yourself (recall before the turn, write after), as the other cookbooks do. Or you can hand the memory operations to the model as tools and let it decide when to remember and when to recall. This cookbook takes the second approach. It works with OpenAI function calling, Anthropic tool use, or any framework with a tool loop.

The snippets use the Python SDK (pip install korely-memory) and OpenAI. The dispatch is the same over the REST API or the Node SDK.

Three tools

Expose three: save_memory (write), recall_memory (an assembled context block, ideal as a tool result), and search_memory (ranked hits when the model wants raw matches).

    from korely_memory import Korely

    korely = Korely(api_key="kor_live_...", region="eu")

    TOOLS = [
        {
            "type": "function",
            "function": {
                "name": "save_memory",
                "description": "Save a durable fact, preference, or decision the user shared, so it is remembered in future sessions.",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "content": {"type": "string", "description": "The thing to remember, in plain language."}
                    },
                    "required": ["content"],
                },
            },
        },
        {
            "type": "function",
            "function": {
                "name": "recall_memory",
                "description": "Recall what is known about the user relevant to a query. Returns an assembled context block (active facts + relevant memories). Call this before answering anything personal.",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "query": {"type": "string", "description": "What you want to recall about the user."}
                    },
                    "required": ["query"],
                },
            },
        },
        {
            "type": "function",
            "function": {
                "name": "search_memory",
                "description": "Search the user's past memories and get back ranked snippets.",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "query": {"type": "string"},
                        "limit": {"type": "integer", "default": 5},
                    },
                    "required": ["query"],
                },
            },
        },
    ]

The dispatch

When the model calls a tool, route it to Korely and hand the result back. Every call is scoped to one user_id, so the same tool set serves every end user safely: one customer's memory can never surface for another.

    import json

    def dispatch(name: str, args: dict, user_id: str) -> dict:
        if name == "save_memory":
            mem = korely.add(args["content"], user_id=user_id)
            return {"saved": True, "memory_id": mem.id}
        if name == "recall_memory":
            ctx = korely.get_context(query=args["query"], user_id=user_id)
            return {"context": ctx.context}  # a ready block, not raw rows
        if name == "search_memory":
            hits = korely.search(args["query"], user_id=user_id, limit=args.get("limit", 5))
            return {"results": [{"id": h.id, "snippet": h.snippet, "score": h.score} for h in hits]}
        return {"error": f"unknown tool {name}"}
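Models occasionally emit arguments that are not valid JSON or leave out a required field, and a call to the memory backend can fail for network reasons. One way to keep the tool loop alive is to wrap dispatch so that every failure comes back as an error payload the model can read and react to. The following is a minimal sketch: `safe_dispatch` and its error messages are hypothetical and not part of the Korely SDK, and the dispatch function is passed in as a parameter so the wrapper does not depend on any particular client.

```python
import json
from typing import Callable


def safe_dispatch(dispatch_fn: Callable[[str, dict, str], dict],
                  name: str, raw_args: str, user_id: str) -> dict:
    """Parse the model's raw argument string and call dispatch_fn,
    turning any failure into an error dict instead of raising."""
    try:
        args = json.loads(raw_args or "{}")
    except json.JSONDecodeError as exc:
        return {"error": f"arguments for {name} are not valid JSON: {exc.msg}"}
    if not isinstance(args, dict):
        return {"error": f"arguments for {name} must be a JSON object"}
    try:
        return dispatch_fn(name, args, user_id)
    except KeyError as exc:
        # dispatch indexes required fields directly, e.g. args["content"].
        return {"error": f"missing required argument {exc} for {name}"}
    except Exception as exc:  # network errors, SDK errors, and so on
        return {"error": f"{name} failed: {type(exc).__name__}: {exc}"}
```

In a tool loop you would call `safe_dispatch(dispatch, call.function.name, call.function.arguments, user_id)` and serialize the returned dict as the tool result, whether it holds data or an error.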

The loop

Standard OpenAI tool loop: send the tools, run any tool calls the model makes through dispatch, feed the results back, let it answer.

    from openai import OpenAI

    oai = OpenAI()
    USER = "user-camila-22"

    messages = [
        {"role": "system", "content": (
            "You are an assistant with long-term memory. Call save_memory when the "
            "user shares a durable fact or preference. Call recall_memory before "
            "answering anything personal."
        )},
        {"role": "user", "content": "Remember I'm vegetarian and allergic to peanuts."},
    ]

    resp = oai.chat.completions.create(model="gpt-4o", messages=messages, tools=TOOLS)
    msg = resp.choices[0].message

    # Run whatever tools the model decided to call.
    if msg.tool_calls:
        # The assistant message that requested the tools is appended once,
        # ahead of the tool results that answer it.
        messages.append(msg)
        for call in msg.tool_calls:
            result = dispatch(call.function.name, json.loads(call.function.arguments), USER)
            messages.append({
                "role": "tool", "tool_call_id": call.id, "content": json.dumps(result),
            })
        # Feed the results back and let the model answer.
        resp = oai.chat.completions.create(model="gpt-4o", messages=messages, tools=TOOLS)
        msg = resp.choices[0].message

    # -> the model called save_memory("vegetarian; allergic to peanuts").
    # A week later, "what can I cook tonight?" makes it call recall_memory first,
    # and Korely hands back the vegetarian + peanut-allergy facts as a context block.
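The snippet above handles a single round of tool calls. A model may chain calls (recall first, then save), so in practice the loop usually repeats until the model answers without requesting a tool. The sketch below shows one way to write that. `run_tool_loop`, the `create` injection point, and the `max_rounds` cap are assumptions made for illustration; in real use you would pass `oai.chat.completions.create` as `create` and the `dispatch` function from this page as `dispatch_fn`.

```python
import json
from typing import Any, Callable


def run_tool_loop(create: Callable[..., Any],
                  dispatch_fn: Callable[[str, dict, str], dict],
                  messages: list, tools: list, user_id: str,
                  model: str = "gpt-4o", max_rounds: int = 5) -> str:
    """Call the model until it answers without tool calls.

    create      -- hypothetical injection point; pass oai.chat.completions.create
    dispatch_fn -- routes (name, args, user_id) to the memory backend
    max_rounds  -- assumed safety cap so a confused model cannot loop forever
    """
    for _ in range(max_rounds):
        resp = create(model=model, messages=messages, tools=tools)
        msg = resp.choices[0].message
        if not msg.tool_calls:
            return msg.content or ""
        # The assistant message goes in once, ahead of all of its tool results.
        messages.append(msg)
        for call in msg.tool_calls:
            args = json.loads(call.function.arguments)
            result = dispatch_fn(call.function.name, args, user_id)
            messages.append({
                "role": "tool",
                "tool_call_id": call.id,
                "content": json.dumps(result),
            })
    raise RuntimeError(f"no final answer after {max_rounds} tool rounds")
```

Passing the client call in as a parameter keeps the loop testable without network access; the cap of five rounds is arbitrary and worth tuning to your agent.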

Why Korely fits the tool pattern

  • recall_memory returns an answer, not homework. get_context hands back an assembled block (active facts plus relevant memories) fitted to a budget. The model drops it straight into its reasoning instead of post-processing a list of rows.
  • Typed facts mean resolved truth. If the user said "I eat meat" last year and "I'm vegetarian" today, the model doesn't see both and guess. The contradiction is resolved server-side; recall returns what's true now.
  • Reads are retrieval, not generation. No model runs on the read path, so recall is cheap. Let the agent call it liberally; that is why read quotas are an order of magnitude more generous than write quotas.
  • One tool set, every user. Scope each call by user_id and the same three tools serve all your end users, unlimited on every tier.

Anthropic tool use

The same three tools, in Anthropic's schema (input_schema instead of parameters); dispatch is unchanged. When the model returns a tool_use block, run it through dispatch and reply with a tool_result.

    tools = [
        {
            "name": "recall_memory",
            "description": "Recall what is known about the user relevant to a query.",
            "input_schema": {
                "type": "object",
                "properties": {"query": {"type": "string"}},
                "required": ["query"],
            },
        },
        # ... save_memory, search_memory in the same shape
    ]
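If you already maintain the OpenAI-style tool list, you can derive the Anthropic shape from it rather than writing it twice, and build the tool_result reply the same way for every tool. A minimal sketch follows; `to_anthropic_tools` and `tool_results_message` are hypothetical helper names, and the content blocks are modeled as plain dicts (with the Anthropic SDK you would read the same type, id, name, and input fields as attributes on each block).

```python
import json
from typing import Callable


def to_anthropic_tools(openai_tools: list) -> list:
    """Convert OpenAI-style function tools to Anthropic's tool shape."""
    converted = []
    for tool in openai_tools:
        fn = tool["function"]
        converted.append({
            "name": fn["name"],
            "description": fn.get("description", ""),
            # Anthropic calls the JSON Schema "input_schema" instead of "parameters".
            "input_schema": fn.get("parameters", {"type": "object", "properties": {}}),
        })
    return converted


def tool_results_message(content_blocks: list,
                         dispatch_fn: Callable[[str, dict, str], dict],
                         user_id: str) -> dict:
    """Build the user message that answers every tool_use block in a response."""
    results = []
    for block in content_blocks:
        if block.get("type") != "tool_use":
            continue  # skip text blocks
        result = dispatch_fn(block["name"], block["input"], user_id)
        results.append({
            "type": "tool_result",
            "tool_use_id": block["id"],
            "content": json.dumps(result),
        })
    return {"role": "user", "content": results}
```

Unlike OpenAI's arguments string, Anthropic delivers the tool input already parsed, so no json.loads is needed before dispatch.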

Already speak MCP? If your client is MCP-native (Claude Desktop, Cursor, a custom GPT), you don't hand-build these tools at all. Point it at Korely's MCP server, which exposes the same operations as ready-made tools.

One honest caveat: save_memory (which calls korely.add) extracts facts asynchronously, so they land a few seconds after the call returns. A fact the model saves mid-turn is reliably recallable on the next turn, not the same instant. Reads are immediate.
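If the same turn does need a fact it has just saved, one option is to keep a small per-session buffer of pending saves and append them to whatever recall returns. The sketch below is an illustration only: `PendingSaves` is a hypothetical helper, not part of the SDK, and the 30-second lifetime is an arbitrary assumption rather than a documented extraction latency.

```python
import time


class PendingSaves:
    """Per-session buffer of recently saved content that may not be extracted yet."""

    def __init__(self, ttl_seconds: float = 30.0, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._items = []  # list of (saved_at, content)

    def add(self, content: str) -> None:
        """Record content at the moment save_memory is dispatched."""
        self._items.append((self.clock(), content))

    def fresh(self) -> list:
        """Return content saved within the TTL, dropping older entries."""
        cutoff = self.clock() - self.ttl
        self._items = [(t, c) for t, c in self._items if t >= cutoff]
        return [c for _, c in self._items]

    def merge(self, context: str) -> str:
        """Append still-pending saves to a recalled context block."""
        pending = self.fresh()
        if not pending:
            return context
        lines = "\n".join(f"- {c}" for c in pending)
        return (f"{context}\n\nSaved earlier in this session "
                f"(may not be indexed yet):\n{lines}")
```

In dispatch, you would call `add` in the save_memory branch and wrap the recall_memory result with `merge`. Once the lifetime passes, the buffer empties itself and recall alone is the source of truth again.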

Where to go next

Get context is the call behind recall_memory. Korely MCP is the no-code version of this for MCP clients. The multi-session research cookbook has you orchestrate the same operations yourself instead of letting the model drive.