LlamaIndex
LlamaIndex is
the Python framework for building agents and data-backed LLM apps. Its
agents run on tools: plain Python functions wrapped as
FunctionTool and attached to a FunctionAgent.
Korely plugs in as the memory layer: define three thin
tools that call the korely-memory Python SDK, register them on
the agent, and the model decides when to recall, save, or look something up.
The agent itself stays stateless; the memory lives in Korely.
What you get
The differentiator is the recall tool. Most "memory" layers hand the model
raw text chunks to re-read every turn. Korely keeps
typed bi-temporal facts, the
(subject, predicate, object) triples it extracts from
everything you save, kept current through contradiction resolution. So the
recall tool does not return rows; get_context assembles the
user's active facts into a compact, prompt-ready block ("Luca
upgraded to Pro", "prefers async standups") with the superseded ones
already excluded. The model gets settled knowledge, not a pile of snippets.
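To make "bi-temporal" and "contradiction resolution" concrete, here is a toy model of the idea. It is an illustration only: the Fact fields, add_fact, and active_context are hypothetical names, and the rule of one active value per (subject, predicate) is a deliberate simplification, not Korely's actual schema or resolution logic.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import List, Optional


@dataclass
class Fact:
    """Hypothetical bi-temporal fact (illustration, not Korely's schema)."""
    subject: str
    predicate: str
    obj: str
    valid_from: datetime                      # when it became true in the world
    recorded_at: datetime                     # when the system learned it
    superseded_at: Optional[datetime] = None  # set once a newer fact replaces it


def add_fact(store: List[Fact], new: Fact) -> None:
    """Naive contradiction resolution: one active value per (subject, predicate)."""
    for old in store:
        same_key = (old.subject, old.predicate) == (new.subject, new.predicate)
        if same_key and old.superseded_at is None:
            old.superseded_at = new.recorded_at  # keep the history, mark it inactive
    store.append(new)


def active_context(store: List[Fact]) -> str:
    """Assemble only the active facts into a compact text block."""
    return "\n".join(
        f"{f.subject} {f.predicate} {f.obj}" for f in store if f.superseded_at is None
    )


t1 = datetime(2024, 1, 1, tzinfo=timezone.utc)
t2 = datetime(2024, 6, 1, tzinfo=timezone.utc)

store: List[Fact] = []
add_fact(store, Fact("Luca", "is on plan", "Free", t1, t1))
add_fact(store, Fact("Luca", "prefers", "async standups", t1, t1))
add_fact(store, Fact("Luca", "is on plan", "Pro", t2, t2))  # contradicts "Free"

print(active_context(store))
```

The superseded "Free" fact stays in the store with its timestamps, but it never reaches the assembled block. That is the property the recall tool relies on.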
Install
You need the Korely SDK, LlamaIndex core, and one LLM integration (each LLM is a separate package, here OpenAI):
    pip install korely-memory llama-index-core llama-index-llms-openai

You also need a Korely API key. Copy it from Settings → API Keys in the Korely app and export it (alongside your LLM provider key):

    export KORELY_API_KEY="kor_live_..."
    export OPENAI_API_KEY="sk-..."

Define the memory tools
A LlamaIndex tool is just a typed Python function plus a docstring. The
function name becomes the tool name and the docstring becomes the
description the model reads to decide when to call it, so write them for
the model. Wrap each with FunctionTool.from_defaults(fn=...).
We give the agent three: recall (the moat),
save, and a targeted search.
    import os

    from korely_memory import Korely
    from llama_index.core.tools import FunctionTool

    korely = Korely(api_key=os.environ["KORELY_API_KEY"], region="eu")


    def recall_memory(query: str) -> str:
        """Recall settled facts and relevant memories about the user.

        Reach for this FIRST, before answering anything about the user.
        Returns an assembled, prompt-ready block of the user's active
        bi-temporal facts, not raw rows.
        """
        ctx = korely.get_context(query=query, token_budget=800)
        return ctx.context


    def save_memory(content: str) -> str:
        """Save something worth remembering about the user.

        Pass a single, self-contained statement. Korely extracts the typed
        facts from it; you do not write the facts yourself.
        """
        memory = korely.add(content)
        return f"Saved memory {memory.id}."


    def search_memory(query: str) -> str:
        """Find the exact memory that mentioned something specific.

        Use this when you need the original wording, not settled facts.
        Pass a keyword-style query of 1 to 5 words.
        """
        hits = korely.search(query, limit=5)
        if not hits:
            return "No matching memories."
        return "\n".join(f"- ({h.score:.2f}) {h.snippet}" for h in hits)


    recall_tool = FunctionTool.from_defaults(fn=recall_memory)
    save_tool = FunctionTool.from_defaults(fn=save_memory)
    search_tool = FunctionTool.from_defaults(fn=search_memory)

Three tools, three jobs. recall_memory is
the one you want the model to reach for by default: it returns the
user's settled facts already assembled. search_memory is for
"find the exact note that mentioned X": raw semantic search over memory
snippets. save_memory writes back. The read tools cost
effectively nothing (no generative model runs on Korely's read path; your
LLM does the reasoning), which is why read quotas are an order of
magnitude more generous than write quotas.
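Because the tools are plain functions, you can exercise them without an LLM or a network call. The sketch below swaps the client for a stub. StubKorely is hypothetical and exposes only the attributes this page uses (ctx.context, memory.id, hit.score, hit.snippet); the real SDK's return objects may carry more, and its search is semantic rather than the keyword match faked here.

```python
from types import SimpleNamespace


class StubKorely:
    """Hypothetical in-memory stand-in for the Korely client (not the real SDK)."""

    def __init__(self):
        self.saved = []

    def add(self, content):
        self.saved.append(content)
        return SimpleNamespace(id=f"mem_{len(self.saved)}")

    def search(self, query, limit=5):
        # Crude keyword match standing in for semantic search.
        words = query.lower().split()
        hits = [
            SimpleNamespace(score=1.0, snippet=text)
            for text in self.saved
            if any(w in text.lower() for w in words)
        ]
        return hits[:limit]

    def get_context(self, query, token_budget=800):
        # Ignores query and token_budget; just returns everything saved.
        return SimpleNamespace(context="\n".join(self.saved))


korely = StubKorely()


def recall_memory(query: str) -> str:
    return korely.get_context(query=query, token_budget=800).context


def save_memory(content: str) -> str:
    memory = korely.add(content)
    return f"Saved memory {memory.id}."


def search_memory(query: str) -> str:
    hits = korely.search(query, limit=5)
    if not hits:
        return "No matching memories."
    return "\n".join(f"- ({h.score:.2f}) {h.snippet}" for h in hits)


print(search_memory("standups"))   # nothing saved yet
print(save_memory("Luca prefers async standups"))
print(search_memory("standups"))   # now finds the saved statement
```

This only checks the formatting and control flow of your tool functions. What Korely actually returns for a given query still needs a run against the real service.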
Attach the tools to an agent
Register the tools on a FunctionAgent (for LLMs with a native
tool-calling API) and run it. .run() is async, so it is
awaited inside an async function. The model calls
recall_memory on its own when the system prompt nudges it to.
    import asyncio

    from llama_index.core.agent.workflow import FunctionAgent
    from llama_index.llms.openai import OpenAI

    # memory_tools is the module holding the tool definitions from the previous section.
    from memory_tools import recall_tool, save_tool, search_tool

    agent = FunctionAgent(
        tools=[recall_tool, save_tool, search_tool],
        llm=OpenAI(model="gpt-4o-mini"),
        system_prompt=(
            "You are a helpful assistant with long-term memory. "
            "Recall memory before answering questions about the user, "
            "and save any durable new fact the user tells you."
        ),
    )


    async def main():
        # First turn: the user states a fact worth keeping.
        r1 = await agent.run(
            user_msg="I just upgraded to the Pro plan, and I prefer async standups."
        )
        print(r1)

        # Later turn (even a fresh process): the agent recalls it.
        r2 = await agent.run(user_msg="Which plan am I on, and how do I like standups?")
        print(r2)


    asyncio.run(main())

On the first turn the model calls save_memory; Korely extracts
the typed facts. On the second turn it calls recall_memory and
get_context returns the assembled block, so the model answers
"Pro plan, async standups" without you threading any state between calls.
Tools also accept plain functions. The
tools=[...] argument takes FunctionTool
instances or bare Python functions (LlamaIndex auto-wraps them).
So tools=[recall_memory, save_memory, search_memory] works
too; wrapping with from_defaults is what you reach for when
you want to override the name or description, or pass an
async_fn.
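To see what auto-wrapping has to work with, the stdlib-only sketch below pulls out the three things the model ends up reading: the function name, the docstring, and the typed parameters. This is not LlamaIndex's code; describe is a hypothetical helper that only approximates the information a wrapper can derive from a bare function.

```python
import inspect


def recall_memory(query: str) -> str:
    """Recall settled facts and relevant memories about the user.

    Reach for this FIRST, before answering anything about the user.
    """
    return ""  # body is irrelevant here; only the signature and docstring matter


def describe(fn) -> dict:
    """Collect what a tool wrapper can derive from a plain function."""
    sig = inspect.signature(fn)
    return {
        "name": fn.__name__,
        "description": inspect.getdoc(fn),
        "parameters": {
            name: param.annotation.__name__ for name, param in sig.parameters.items()
        },
    }


info = describe(recall_memory)
print(info["name"])
print(info["parameters"])
print(info["description"].splitlines()[0])
```

If the name or first docstring line printed here would not tell you when to call the tool, it probably will not tell the model either. That is the case for passing an explicit name or description through from_defaults.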
ReAct agents and multi-agent workflows
FunctionAgent needs an LLM with a tool-calling API. For any
other model, swap in ReActAgent. It takes the same tools and the same
.run() call, but drives tool use through ReAct prompting:
    from llama_index.core.agent.workflow import ReActAgent
    from llama_index.llms.openai import OpenAI

    agent = ReActAgent(
        tools=[recall_tool, save_tool, search_tool],
        llm=OpenAI(model="gpt-4o-mini"),
        system_prompt="Recall memory before answering about the user.",
    )
    # response = await agent.run(user_msg="What do you remember about me?")

To orchestrate several agents, wrap them in an
AgentWorkflow; the memory tools attach to whichever agent
should own recall and persistence:
    from llama_index.core.agent.workflow import AgentWorkflow

    workflow = AgentWorkflow(agents=[agent], root_agent=agent.name)
    # response = await workflow.run(user_msg="Catch me up on what you know.")

Fact extraction is asynchronous. add()
returns as soon as the memory is stored, but the typed facts are
extracted a few seconds later in the background. So a
recall_memory call fired immediately after a
save_memory in the same turn may not yet see the new
facts. Within a normal conversation this is invisible: by the next user
message the facts are live. Do not chain "save then recall the same thing"
inside one .run() and expect the fact to be already assembled.
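If a script or test really does need to read a fact right after saving it, one option is to poll recall with a short backoff instead of assuming the fact is there. This is a minimal sketch under stated assumptions: wait_for_fact, its attempt count and delays, and FakeRecall are all hypothetical, and this page gives no extraction time beyond "a few seconds".

```python
import time


def wait_for_fact(recall, query, needle, attempts=5, delay=1.0, sleep=time.sleep):
    """Poll recall(query) until needle appears in the result or attempts run out.

    Returns the context block on success, or None if it never showed up.
    """
    for attempt in range(attempts):
        context = recall(query)
        if needle.lower() in context.lower():
            return context
        if attempt < attempts - 1:
            sleep(delay)
            delay *= 2  # simple exponential backoff between polls
    return None


class FakeRecall:
    """Hypothetical stand-in: the fact only becomes visible on the third call."""

    def __init__(self):
        self.calls = 0

    def __call__(self, query):
        self.calls += 1
        return "Luca upgraded to Pro" if self.calls >= 3 else ""


fake = FakeRecall()
result = wait_for_fact(fake, "plan", "pro", sleep=lambda seconds: None)
print(result, fake.calls)
```

In a real script you would pass recall_memory as the recall argument and leave sleep at its default. Keep this outside the agent's tools: inside a conversation the delay resolves itself by the next user message.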
FunctionAgent is stateless across runs. Separate
.run() calls do not share conversation history by default,
which is exactly why Korely sits underneath. The chat turns can be
ephemeral; the durable knowledge lives in Korely and comes back through
recall_memory on the next turn, the next session, or a fresh
process.
Where to go next
- Memory as a tool: the full pattern for exposing recall, save and search to an agent, with prompt nudges and scoping by user_id.
- Get context: how the recall tool assembles active facts into a prompt-ready block, with the token_budget and source controls.
Something not working? Email
[email protected] with your
llama-index-core and korely-memory versions and
the error output. We read every message.