LlamaIndex

LlamaIndex is a Python framework for building agents and data-backed LLM apps. Its agents run on tools: plain Python functions wrapped as FunctionTool and attached to a FunctionAgent. Korely plugs in as the memory layer. You define three thin tools that call the korely-memory Python SDK and register them on the agent, and the model decides when to recall, save, or look something up. The agent itself stays stateless; the memory lives in Korely.

What you get

The differentiator is the recall tool. Most "memory" layers hand the model raw text chunks to re-read every turn. Korely stores typed, bi-temporal facts: the (subject, predicate, object) triples it extracts from everything you save, kept current through contradiction resolution. So the recall tool does not return rows. Instead, get_context assembles the user's active facts into a compact, prompt-ready block ("Luca upgraded to Pro", "prefers async standups") with superseded facts already excluded. The model gets settled knowledge, not a pile of snippets.
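To make "active" versus "superseded" concrete, here is a minimal local sketch of the idea. The Fact class, its valid_from/valid_to window, and assemble_context are hypothetical illustrations, not Korely's data model or the actual output of get_context. The sketch only models the valid-time half of bi-temporality, and its word budget is a crude stand-in for a token budget.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class Fact:
    """Hypothetical fact record: a (subject, predicate, object) triple with a validity window."""

    subject: str
    predicate: str
    obj: str
    valid_from: datetime
    valid_to: Optional[datetime] = None  # None means still current (not superseded)

    def is_active(self, at: datetime) -> bool:
        # Active if `at` falls inside [valid_from, valid_to).
        return self.valid_from <= at and (self.valid_to is None or at < self.valid_to)


def assemble_context(facts: list[Fact], at: datetime, word_budget: int = 50) -> str:
    """Render the facts active at `at`, newest first, stopping once the word budget is spent."""
    active = sorted(
        (f for f in facts if f.is_active(at)), key=lambda f: f.valid_from, reverse=True
    )
    lines: list[str] = []
    used = 0
    for f in active:
        line = f"- {f.subject} {f.predicate} {f.obj}"
        cost = len(line.split())  # crude word count standing in for tokens
        if used + cost > word_budget:
            break
        lines.append(line)
        used += cost
    return "\n".join(lines)


now = datetime(2025, 6, 1)
facts = [
    # Superseded: the plan changed on 2025-05-01, so this fact's window closed.
    Fact("Luca", "is on", "Free plan", datetime(2025, 1, 1), valid_to=datetime(2025, 5, 1)),
    Fact("Luca", "is on", "Pro plan", datetime(2025, 5, 1)),
    Fact("Luca", "prefers", "async standups", datetime(2025, 3, 1)),
]
print(assemble_context(facts, now))
# - Luca is on Pro plan
# - Luca prefers async standups
```

The Free-plan fact never reaches the output because its window closed when the Pro-plan fact took over; asking about an earlier date would surface it instead.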

Install

You need the Korely SDK, LlamaIndex core, and one LLM integration (each LLM is a separate package, here OpenAI):

pip install korely-memory llama-index-core llama-index-llms-openai

You also need a Korely API key. Copy it from Settings → API Keys in the Korely app and export it (alongside your LLM provider key):

export KORELY_API_KEY="kor_live_..."
export OPENAI_API_KEY="sk-..."
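The snippets in this guide read the keys with os.environ["KORELY_API_KEY"], which raises a bare KeyError if the variable is missing. A small guard like the one below reports every missing key at once at startup. missing_env and require_env are hypothetical helpers, not part of either SDK.

```python
import os
from typing import Mapping

# The two variables this guide exports.
REQUIRED_VARS = ("KORELY_API_KEY", "OPENAI_API_KEY")


def missing_env(names=REQUIRED_VARS, env: Mapping[str, str] = os.environ) -> list[str]:
    """Return the names in `names` that are unset or empty in `env`."""
    return [name for name in names if not env.get(name)]


def require_env(names=REQUIRED_VARS) -> None:
    """Raise one readable error listing every missing variable."""
    missing = missing_env(names)
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
```

Call require_env() once, before constructing Korely(...) or the LLM client.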

Define the memory tools

A LlamaIndex tool is just a typed Python function plus a docstring. The function name becomes the tool name and the docstring becomes the description the model reads to decide when to call it, so write them for the model. Wrap each with FunctionTool.from_defaults(fn=...). We give the agent three: recall (the moat), save, and a targeted search.

# memory_tools.py (imported by the agent script below)
import os
from korely_memory import Korely
from llama_index.core.tools import FunctionTool

korely = Korely(api_key=os.environ["KORELY_API_KEY"], region="eu")


def recall_memory(query: str) -> str:
    """Recall settled facts and relevant memories about the user.

    Reach for this FIRST, before answering anything about the user.
    Returns an assembled, prompt-ready block of the user's active
    bi-temporal facts, not raw rows.
    """
    ctx = korely.get_context(query=query, token_budget=800)
    return ctx.context


def save_memory(content: str) -> str:
    """Save something worth remembering about the user.

    Pass a single, self-contained statement. Korely extracts the
    typed facts from it; you do not write the facts yourself.
    """
    memory = korely.add(content)
    return f"Saved memory {memory.id}."


def search_memory(query: str) -> str:
    """Find the exact memory that mentioned something specific.

    Use this when you need the original wording, not settled facts.
    Pass a keyword-style query of 1 to 5 words.
    """
    hits = korely.search(query, limit=5)
    if not hits:
        return "No matching memories."
    return "\n".join(f"- ({h.score:.2f}) {h.snippet}" for h in hits)


recall_tool = FunctionTool.from_defaults(fn=recall_memory)
save_tool = FunctionTool.from_defaults(fn=save_memory)
search_tool = FunctionTool.from_defaults(fn=search_memory)
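Because the function name and docstring are what the model reads, it can help to see roughly what a wrapper pulls off a plain function. The describe_tool helper below is a simplified, hypothetical stdlib approximation for illustration only. FunctionTool.from_defaults builds its own description and parameter schema, and its exact format differs.

```python
import inspect
from typing import get_type_hints


def describe_tool(fn) -> dict:
    """Approximate the metadata a tool wrapper derives from a plain function.

    Simplified illustration only; not LlamaIndex's implementation.
    """
    hints = get_type_hints(fn)
    params = {
        # Fall back to "any" when a parameter has no type annotation.
        name: getattr(hints.get(name), "__name__", "any")
        for name in inspect.signature(fn).parameters
    }
    return {
        "name": fn.__name__,                      # becomes the tool name
        "description": inspect.getdoc(fn) or "",  # what the model reads
        "parameters": params,                     # argument names and types
    }


def recall_memory(query: str) -> str:
    """Recall settled facts and relevant memories about the user.

    Reach for this FIRST, before answering anything about the user.
    """
    return ""


print(describe_tool(recall_memory))
```

A vague docstring or an unannotated parameter shows up directly in what the model sees, which is why the tool docstrings above are written as instructions to the model.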

Three tools, three jobs. recall_memory is the one you want the model to reach for by default: it returns the user's settled facts, already assembled. search_memory is for "find the exact note that mentioned X": raw semantic search over memory snippets. save_memory writes back. The read tools cost effectively nothing (no generative model runs on Korely's read path; your LLM does the reasoning), which is why read quotas are an order of magnitude more generous than write quotas.
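To exercise the three tools without network calls or spending write quota, one option is to build them around an injected client instead of the module-level korely. make_memory_tools and FakeKorely are hypothetical names for this sketch. The fake implements only the calls and attributes the tool bodies above use (get_context().context, add().id, and search() hits with .score and .snippet), and the docstrings are shortened here.

```python
from types import SimpleNamespace


def make_memory_tools(client):
    """Return (recall_memory, save_memory, search_memory) bound to `client`.

    Hypothetical refactor: the tools close over an injected client instead of
    the module-level `korely`, so tests can pass a fake.
    """

    def recall_memory(query: str) -> str:
        """Recall settled facts and relevant memories about the user."""
        return client.get_context(query=query, token_budget=800).context

    def save_memory(content: str) -> str:
        """Save something worth remembering about the user."""
        return f"Saved memory {client.add(content).id}."

    def search_memory(query: str) -> str:
        """Find the exact memory that mentioned something specific."""
        hits = client.search(query, limit=5)
        if not hits:
            return "No matching memories."
        return "\n".join(f"- ({h.score:.2f}) {h.snippet}" for h in hits)

    return recall_memory, save_memory, search_memory


class FakeKorely:
    """In-memory stand-in implementing only what the tools call."""

    def __init__(self):
        self.memories: list[str] = []

    def add(self, content: str):
        self.memories.append(content)
        return SimpleNamespace(id=f"mem_{len(self.memories)}")

    def get_context(self, query: str, token_budget: int):
        # No fact extraction here: just echo every stored memory.
        return SimpleNamespace(context="\n".join(self.memories))

    def search(self, query: str, limit: int):
        # Naive keyword match standing in for semantic search.
        words = query.lower().split()
        matches = [m for m in self.memories if any(w in m.lower() for w in words)]
        return [SimpleNamespace(score=1.0, snippet=m) for m in matches[:limit]]


recall_memory, save_memory, search_memory = make_memory_tools(FakeKorely())
print(save_memory("Luca prefers async standups."))  # Saved memory mem_1.
print(search_memory("standups"))  # - (1.00) Luca prefers async standups.
```

In the app, pass the real Korely(...) client and wrap each returned function with FunctionTool.from_defaults(fn=...). The inner functions keep their own names and docstrings, so the model sees the same tool descriptions.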

Attach the tools to an agent

Register the tools on a FunctionAgent (for LLMs with a native tool-calling API) and run it. .run() is async, so it is awaited inside an async function. The model calls recall_memory on its own when the system prompt nudges it to.

import asyncio
from llama_index.core.agent.workflow import FunctionAgent
from llama_index.llms.openai import OpenAI

from memory_tools import recall_tool, save_tool, search_tool

agent = FunctionAgent(
    tools=[recall_tool, save_tool, search_tool],
    llm=OpenAI(model="gpt-4o-mini"),
    system_prompt=(
        "You are a helpful assistant with long-term memory. "
        "Recall memory before answering questions about the user, "
        "and save any durable new fact the user tells you."
    ),
)


async def main():
    # First turn: the user states a fact worth keeping.
    r1 = await agent.run(
        user_msg="I just upgraded to the Pro plan, and I prefer async standups."
    )
    print(r1)

    # Later turn (even a fresh process): the agent recalls it.
    r2 = await agent.run(user_msg="Which plan am I on, and how do I like standups?")
    print(r2)


asyncio.run(main())

On the first turn the model calls save_memory; Korely extracts the typed facts. On the second turn it calls recall_memory and get_context returns the assembled block, so the model answers "Pro plan, async standups" without you threading any state between calls.

Tools also accept plain functions. The tools=[...] argument takes FunctionTool instances or bare Python functions (LlamaIndex auto-wraps them). So tools=[recall_memory, save_memory, search_memory] works too, wrapping with from_defaults is what you reach for when you want to override the name, description, or pass an async_fn.
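from_defaults also accepts an async_fn for tools with a native async implementation. The examples above call the Korely client synchronously. Assuming that is the only client you have, one option is an async twin that offloads the blocking call to a worker thread with asyncio.to_thread (Python 3.9+), so several recalls can overlap. SlowStubClient is a hypothetical stand-in for the real client, and the from_defaults pairing is left as a comment because it needs LlamaIndex installed. Whether this helps depends on how your LlamaIndex version already runs sync-only tools, so measure before adopting it.

```python
import asyncio
import time
from types import SimpleNamespace


class SlowStubClient:
    """Hypothetical stand-in for a synchronous client whose calls block."""

    def get_context(self, query: str, token_budget: int):
        time.sleep(0.1)  # simulate network latency
        return SimpleNamespace(context=f"facts for: {query}")


korely = SlowStubClient()


def recall_memory(query: str) -> str:
    """Recall settled facts and relevant memories about the user."""
    return korely.get_context(query=query, token_budget=800).context


async def recall_memory_async(query: str) -> str:
    """Async twin of recall_memory: runs the blocking call in a worker thread."""
    return await asyncio.to_thread(recall_memory, query)


async def demo() -> list[str]:
    # Two recalls run concurrently in worker threads instead of back to back.
    return await asyncio.gather(recall_memory_async("plan"), recall_memory_async("standups"))


print(asyncio.run(demo()))  # ['facts for: plan', 'facts for: standups']

# With LlamaIndex installed (not run here):
# FunctionTool.from_defaults(fn=recall_memory, async_fn=recall_memory_async)
```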

ReAct agents and multi-agent workflows

FunctionAgent needs an LLM with a tool-calling API. For any other model, swap in ReActAgent: same tools, same .run(). It drives tool use through ReAct prompting instead of native tool calls:

from llama_index.core.agent.workflow import ReActAgent
from llama_index.llms.openai import OpenAI

from memory_tools import recall_tool, save_tool, search_tool

agent = ReActAgent(
    tools=[recall_tool, save_tool, search_tool],
    llm=OpenAI(model="gpt-4o-mini"),
    system_prompt="Recall memory before answering about the user.",
)
# response = await agent.run(user_msg="What do you remember about me?")
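ReActAgent's exact prompt and output parser are internal to LlamaIndex. The sketch below is a simplified, hypothetical illustration of the general ReAct pattern: the model writes its tool choice as plain text (Thought / Action / Action Input), and the loop parses that text and dispatches to an ordinary Python function, whose result goes back to the model as an Observation. The lambda is a stub standing in for recall_memory.

```python
import json
import re

# Matches a simplified ReAct step: "Action: <tool>" followed by "Action Input: {json}".
ACTION_RE = re.compile(r"Action:\s*(?P<tool>\w+)\s*Action Input:\s*(?P<args>\{.*\})", re.DOTALL)


def parse_action(llm_text: str):
    """Return (tool_name, kwargs) from a ReAct-formatted reply, or None if there is no action."""
    m = ACTION_RE.search(llm_text)
    if m is None:
        return None
    return m.group("tool"), json.loads(m.group("args"))


def run_step(llm_text: str, tools: dict) -> str:
    """Dispatch one parsed action to a plain function; its result becomes the Observation."""
    parsed = parse_action(llm_text)
    if parsed is None:
        return "No action requested."
    name, kwargs = parsed
    return f"Observation: {tools[name](**kwargs)}"


reply = 'Thought: I should check memory first.\nAction: recall_memory\nAction Input: {"query": "plan"}'
tools = {"recall_memory": lambda query: f"(stub) facts about {query}"}
print(run_step(reply, tools))  # Observation: (stub) facts about plan
```

This is why the same docstrings matter for both agent types: with ReAct, the model picks a tool by reading the descriptions in its prompt rather than through a native tool-calling API.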

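To orchestrate several agents, wrap them in an AgentWorkflow. Attach the memory tools to whichever agent should own recall and persistence. With more than one agent, give each a distinct name and a description so the workflow can route handoffs between them: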

from llama_index.core.agent.workflow import AgentWorkflow

workflow = AgentWorkflow(agents=[agent], root_agent=agent.name)
# response = await workflow.run(user_msg="Catch me up on what you know.")

Fact extraction is asynchronous. add() returns as soon as the memory is stored, but the typed facts are extracted a few seconds later in the background. So a recall_memory call fired immediately after a save_memory in the same turn may not see the new facts yet. Within a normal conversation this is invisible: by the next user message the facts are live. Do not chain "save then recall the same thing" inside one .run() and expect the fact to be assembled already.
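If a script or integration test genuinely needs to save and then read back the same fact, poll instead of assuming it is already there. wait_for_fact is a hypothetical helper, not part of korely-memory, that accepts any function with the recall_memory signature. DelayedRecall is a stub that mimics background extraction by returning the fact only after a few calls.

```python
import time
from typing import Callable


def wait_for_fact(
    recall: Callable[[str], str],
    query: str,
    expected: str,
    timeout: float = 10.0,
    interval: float = 0.5,
) -> bool:
    """Poll `recall(query)` until `expected` appears (case-insensitive) or `timeout` seconds pass.

    Hypothetical helper for scripts and tests; not part of korely-memory.
    """
    deadline = time.monotonic() + timeout
    while True:
        if expected.lower() in recall(query).lower():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


class DelayedRecall:
    """Stub whose fact becomes visible only after `ready_after` calls, mimicking background extraction."""

    def __init__(self, ready_after: int):
        self.calls = 0
        self.ready_after = ready_after

    def __call__(self, query: str) -> str:
        self.calls += 1
        return "- Luca is on Pro plan" if self.calls > self.ready_after else ""


recall = DelayedRecall(ready_after=2)
print(wait_for_fact(recall, "plan", "Pro plan", timeout=2.0, interval=0.01))  # True
```

Against the real service you would pass the recall_memory function defined earlier and keep the timeout generous, since extraction takes a few seconds. Inside the agent loop, skip this entirely and let the next user turn pick the facts up.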

FunctionAgent is stateless across runs. Separate .run() calls do not share conversation history by default, which is exactly why Korely sits underneath. The chat turns can be ephemeral; the durable knowledge lives in Korely and comes back through recall_memory on the next turn, the next session, or a fresh process.

Where to go next

Memory as a tool: the full pattern for exposing recall, save and search to an agent, with prompt nudges and scoping by user_id.
Get context: how the recall tool assembles active facts into a prompt-ready block, with the token_budget and source controls.

Something not working? Email [email protected] with your llama-index-core and korely-memory versions and the error output. We read every message.