The memory model

Agent memory in Korely is a three-layer, EU-hosted knowledge store. The layer that sets it apart is the facts layer: a durable set of typed (subject, predicate, object) facts that carry validity over time, supersede each other through a two-stage contradiction check, and can be replayed at any past instant with as_of. Beneath the facts sit a managed cloud store of memories (notes, documents, transcripts, tasks) with semantic vector recall, from which the facts are mined, and a session-scoped view of that store for thread-local working state. You write plain text; extraction, embedding, entity linking, and contradiction detection all happen automatically behind a single add() call. Your agent's primary recall path, get_context(), assembles the active facts plus the most relevant memories into a prompt-ready block: structured, current knowledge on demand, with no generative model on the read path. For an agent builder this means you never prompt-stuff stale context, never manually reconcile conflicting facts, and never build a separate memory layer from scratch.

Korely stores memory in three layers that form one stack, shared between the human, who reads and edits it in the desktop app, and every agent that writes to and queries it over the REST API or SDK. We did not build a separate "agent memory" silo next to the notes. The agent's memory is the user's knowledge base, plus two layers on top of it.

flowchart LR
    A["Layer 1 — Memory store<br/>add / search / get_context<br/>scoped by user_id"]
    B["Layer 2 — Session<br/>run_id tag<br/>thread-local scope"]
    C["Layer 3 — Facts<br/>typed triples<br/>valid_from · invalid_at · as_of"]

    A -->|"mined at write time"| C
    A -->|"run_id scopes a slice"| B
    B -->|"durable conclusions promoted"| C

Three-layer memory model: managed cloud store, session scoping, and typed bi-temporal facts

Layer	What it holds	Scope	Lifetime
1 — Memory store	Notes, documents, transcripts, tasks	User	Until deleted (it is the user's data)
2 — Session memory	Working state of one conversation	Session	The thread
3 — Cross-session facts	Typed (subject, predicate, object) triples	User	Bi-temporal: superseded, never silently lost

Layer 1 — The memory store

Layer 1 is a managed cloud store in the EU — Postgres with a vector index. Every memory your agent writes lands here, scoped to your account and an end-user namespace. You read and write it over the REST API or the SDK; there is nothing to host yourself, and you can export or erase everything for a user with one call.

Two mechanisms sit behind the memory store, and only the first is something your agent queries directly:

768-dimension vectors. Every memory is embedded as a 768-dimension vector, so search() finds semantically similar content by cosine distance even when no keywords overlap. The memory's embedding is computed once at write time; the query is embedded at read time.
Entity graph (behind the facts). Typed entities (people, organizations, products, places, concepts) are extracted from every memory on our own infrastructure and linked with typed edges using canonical predicates in 9 families. The graph is what powers the typed facts in Layer 3 — it is not a search mode the agent queries directly. Details in The graph.
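To make the vector recall step concrete, here is a minimal local sketch of ranking stored memories by cosine similarity to a query vector (highest similarity is the same as smallest cosine distance). It uses only the standard library and makes no claim about Korely's index: the 3-dimensional toy vectors stand in for real 768-dimension embeddings, and `cosine` and `rank` are hypothetical helper names, not SDK calls.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity: dot product divided by the product of the norms."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank(query_vec, memories, top_k=2):
    """Return (score, text) pairs for the top_k most similar memories.

    `memories` maps memory text to a vector computed once at write time;
    only the query vector is produced at read time.
    """
    scored = [(cosine(query_vec, vec), text) for text, vec in memories.items()]
    scored.sort(reverse=True)  # highest similarity first
    return scored[:top_k]


# Toy 3-dimensional vectors standing in for real 768-dimension embeddings.
memories = {
    "prefers email follow-ups": [0.9, 0.1, 0.0],
    "based in Turin": [0.1, 0.9, 0.1],
    "upgraded to the Advanced plan": [0.0, 0.2, 0.9],
}
query = [0.8, 0.2, 0.0]  # imagine an embedded "how should we contact her?"

for score, text in rank(query, memories):
    print(f"{score:.2f}  {text}")
```

Because the memory vectors are stored, a read only pays for embedding the query; the comparison itself is arithmetic.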

Your primary recall path is get_context() / GET /v1/context: it assembles the active typed facts for the user plus the most relevant memories into one prompt-ready block. That fact-first assembly — current values, contradictions already resolved — is the piece most agents are missing; it answers "what do we actually know about this person right now?" without the agent having to guess search keywords. Raw search() / POST /v1/memories/search is the secondary path when you want the underlying memory snippets ranked by semantic similarity.
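The fact-first idea can be pictured with a local sketch. It is not Korely's implementation and says nothing about the real layout of the GET /v1/context block: the `Fact` and `Memory` types, their fields, and `assemble_context` are hypothetical. It only shows the ordering: active facts (no invalid_at) first, then the highest-scoring memories.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Fact:  # hypothetical local type
    subject: str
    predicate: str
    object: str
    invalid_at: Optional[str] = None  # None means the fact is still valid


@dataclass
class Memory:  # hypothetical local type
    text: str
    score: float  # similarity to the current query


def assemble_context(facts: list[Fact], memories: list[Memory], top_k: int = 2) -> str:
    """Build a prompt block: active facts first, then the top_k memories."""
    active = [f for f in facts if f.invalid_at is None]
    best = sorted(memories, key=lambda m: m.score, reverse=True)[:top_k]
    lines = ["Known facts:"]
    lines += [f"- {f.subject} {f.predicate} {f.object}" for f in active]
    lines.append("Relevant memories:")
    lines += [f"- {m.text}" for m in best]
    return "\n".join(lines)


facts = [
    Fact("Giulia", "likes", "phone calls", invalid_at="2026-06-11"),
    Fact("Giulia", "likes", "email follow-ups"),
    Fact("Giulia", "lives_in", "Turin"),
]
memories = [
    Memory("Upgraded to the Advanced plan; based in Turin", 0.84),
    Memory("Giulia prefers email follow-ups", 0.91),
    Memory("Asked about invoice format", 0.40),  # toy low-relevance memory
]
print(assemble_context(facts, memories))
```

The superseded phone-call preference never reaches the prompt, which is the point of putting resolved facts ahead of raw snippets.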

Layer 2 — Session memory

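Session memory is per-conversation working state. When an agent runs a long thread (a research session, a multi-step refactor, a week-long project), it calls add() with a stable run_id to tag memories to that session. Listing memories filtered by the same run_id (GET /v1/memories?run_id=...) returns only that session's memories, which keeps the thread's working state separate from the rest of the user namespace. search() is deliberately not limited to one run: it works across sessions so relevant context surfaces regardless of which run created it.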

This is what keeps a thread coherent across context-window resets and client restarts: the agent re-reads its own session memories, picks up the state, and continues. It is scoped to the session by design. Durable conclusions belong in Layer 1 (a memory) or Layer 3 (a fact), not in the session log.
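A minimal local sketch of the session pattern, assuming nothing about Korely's internals: each memory carries an optional run_id tag, listing by run_id returns only that thread's memories in write order (how an agent would rebuild its working state after a restart), and a search-style call ignores the tag. `LocalStore` and its methods are hypothetical stand-ins, not SDK calls.

```python
from dataclasses import dataclass, field
from itertools import count
from typing import Optional


@dataclass
class StoredMemory:
    id: int
    text: str
    user_id: str
    run_id: Optional[str] = None


@dataclass
class LocalStore:
    """Hypothetical in-process stand-in for the memory store."""
    items: list[StoredMemory] = field(default_factory=list)
    _ids: count = field(default_factory=count)

    def add(self, text: str, user_id: str, run_id: Optional[str] = None) -> StoredMemory:
        mem = StoredMemory(next(self._ids), text, user_id, run_id)
        self.items.append(mem)
        return mem

    def list_session(self, run_id: str) -> list[StoredMemory]:
        # Session view: only memories tagged with this run_id, oldest first.
        return [m for m in self.items if m.run_id == run_id]

    def search_text(self, needle: str, user_id: str) -> list[StoredMemory]:
        # Search-style read: spans every session for the user; run_id is ignored.
        return [m for m in self.items if m.user_id == user_id and needle in m.text]


store = LocalStore()
store.add("Plan: refactor auth module in 3 steps", "u1", run_id="run-a")
store.add("Step 1 done: extracted token parser", "u1", run_id="run-a")
store.add("Unrelated research note on auth providers", "u1", run_id="run-b")

# After a restart, the agent re-reads its own thread to resume.
for m in store.list_session("run-a"):
    print(m.id, m.text)
```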

Layer 3 — Cross-session facts

Layer 3 is what most people mean by "agent memory": things the system knows about the user and their world, independent of any one note or conversation. We store them as typed (subject, predicate, object) triples, extracted automatically whenever content is written, and queryable via get_facts() in the SDK or GET /v1/facts in the REST API.

Every predicate belongs to one of 9 families:

Family	Example triple
preferences	(Marco, likes, peach fruit salad)
people	(Sara, reports_to, Marco)
places	(Marco, lives_in, Bologna)
work	(Sara, works_at, Acme GmbH)
ownership	(Marco, owns, a 2019 MacBook Pro)
health	(Marco, allergic_to, peanuts)
financial	(Aurora plan, costs, 50 euro per month)
events	(team offsite, scheduled_for, 2026-07-12)
other	catch-all for valid triples outside the 8 above
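Facts returned by Korely already carry their family (the end-to-end example below reads `f.predicate_family`), so a client only needs its own lookup when grouping local data. The sketch below is hypothetical: its mapping covers just the example predicates from the table (plus nothing else), Korely's real canonical predicate list is not reproduced, and anything unmapped falls back to `other`, mirroring the catch-all family.

```python
from collections import defaultdict

# The 9 predicate families named in this section.
FAMILIES = {
    "preferences", "people", "places", "work", "ownership",
    "health", "financial", "events", "other",
}

# Hypothetical mapping built from the example triples only.
PREDICATE_FAMILY = {
    "likes": "preferences",
    "reports_to": "people",
    "lives_in": "places",
    "works_at": "work",
    "owns": "ownership",
    "allergic_to": "health",
    "costs": "financial",
    "scheduled_for": "events",
}


def family_of(predicate: str) -> str:
    """Return the family for a predicate, falling back to the catch-all."""
    return PREDICATE_FAMILY.get(predicate, "other")


def group_by_family(triples):
    """Group (subject, predicate, object) tuples by predicate family."""
    groups = defaultdict(list)
    for subject, predicate, obj in triples:
        groups[family_of(predicate)].append((subject, predicate, obj))
    return dict(groups)


triples = [
    ("Marco", "lives_in", "Bologna"),
    ("Marco", "allergic_to", "peanuts"),
    ("Giulia", "has_plan", "Advanced"),
]
print(group_by_family(triples))
```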

Facts are bi-temporal

Every fact carries valid_from and invalid_at. When new information contradicts an existing fact, a two-stage contradiction check invalidates the old one and keeps it as history instead of deleting it. A chain after a price change looks like this:

(Aurora plan, costs, 40 euro per month)   valid_from 2026-05-18   invalid_at 2026-06-07
(Aurora plan, costs, 50 euro per month)   valid_from 2026-06-07   invalid_at —

By default, GET /v1/facts returns only the facts that are valid now (those with invalid_at null). Pass include_invalidated=true and superseded facts come back too, each carrying its invalid_at timestamp and the invalidated_by id of the fact that replaced it, so an agent can reason about how a value changed over time, not just what it is now. The full mechanics are in Temporal facts.
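The supersede-then-filter behavior can be sketched locally. This is a deliberate simplification: one rule (same subject and predicate, different object) stands in for Korely's two-stage contradiction check, which this page does not describe, and it treats every predicate as single-valued, which real preferences are not. The field names `valid_from`, `invalid_at`, and `invalidated_by` follow this page; the `Fact` type, the `supersede` and `read_facts` helpers, and the ids are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Fact:
    id: str
    subject: str
    predicate: str
    object: str
    valid_from: str                       # ISO date
    invalid_at: Optional[str] = None      # None while the fact is current
    invalidated_by: Optional[str] = None  # id of the replacing fact


def supersede(store: list[Fact], new: Fact) -> None:
    """Add `new`, retiring any current fact it contradicts.

    Simplified rule: a current fact with the same subject and predicate but a
    different object counts as contradicted. It is stamped, never deleted.
    """
    for old in store:
        if (old.invalid_at is None and old.subject == new.subject
                and old.predicate == new.predicate and old.object != new.object):
            old.invalid_at = new.valid_from
            old.invalidated_by = new.id
    store.append(new)


def read_facts(store: list[Fact], include_invalidated: bool = False) -> list[Fact]:
    """Mirror the default read: only current facts unless history is requested."""
    if include_invalidated:
        return list(store)
    return [f for f in store if f.invalid_at is None]


facts: list[Fact] = []
supersede(facts, Fact("f1", "Aurora plan", "costs", "40 euro per month", "2026-05-18"))
supersede(facts, Fact("f2", "Aurora plan", "costs", "50 euro per month", "2026-06-07"))

for f in read_facts(facts, include_invalidated=True):
    print(f.object, f.valid_from, f.invalid_at, f.invalidated_by)
```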

Point-in-time queries: the REST API exposes the same data with as-of slicing (GET /v1/facts?as_of=2026-05-15) and a prompt-ready GET /v1/context block. See the API reference.
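Point-in-time slicing reduces to one interval test: a fact is valid at instant t when valid_from <= t and either invalid_at is empty or t < invalid_at. The sketch below applies that test locally to the Aurora chain shown earlier. Treating invalid_at as an exclusive bound is an assumption of this sketch; the exact boundary semantics are defined in the API reference.

```python
from datetime import date

# (object, valid_from, invalid_at) rows for (Aurora plan, costs, ...).
AURORA_COSTS = [
    ("40 euro per month", date(2026, 5, 18), date(2026, 6, 7)),
    ("50 euro per month", date(2026, 6, 7), None),
]


def as_of(rows, t: date) -> list[str]:
    """Return the objects of facts valid at instant t (half-open interval)."""
    return [
        obj for obj, valid_from, invalid_at in rows
        if valid_from <= t and (invalid_at is None or t < invalid_at)
    ]


print(as_of(AURORA_COSTS, date(2026, 5, 20)))  # ['40 euro per month']
print(as_of(AURORA_COSTS, date(2026, 6, 7)))   # ['50 euro per month']
print(as_of(AURORA_COSTS, date(2026, 5, 15)))  # []
```

With only these two rows, an instant before 2026-05-18 predates both prices and returns nothing for the Aurora plan.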

The human is in the loop

Layer 3 is not a black box the agent maintains behind the user's back. The end user sees every fact in the Korely app. A Memory Panel lists them, lets them edit a wrong one or forget one entirely (erasure, with an audit cascade), and an Entity Profile drawer shows everything known about one person or company. Forgotten facts are excluded from agent reads by default. Korely is the only memory layer that ships a first-class memory UI for the end user. We think memory an agent collects about a person should be inspectable by that person. More in Human in the loop.

How a write flows

When an agent calls add() (or POST /v1/memories), the memory is persisted first and the rest of the pipeline runs asynchronously behind it:

The memory is persisted: stored in the index and visible to the human in the Korely app.
Entity extraction runs on our own infrastructure and wires typed entities into the graph.
Typed fact extraction pulls (subject, predicate, object) triples into Layer 3.
Each new fact passes the two-stage contradiction check; contradicted facts get invalid_at stamped.
The memory is embedded into the 768D vector index.

The write call itself returns once the memory is persisted, with facts empty on the immediate response; extraction then runs behind it and the typed facts populate shortly after. You never block on extraction. The write path is where the intelligence runs: the document embedding, entity extraction, typed-fact extraction with contradiction checking and bi-temporal validity. It costs about a tenth of a cent per memory, all included in your plan.
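Because the immediate write response has no facts, an agent that needs them in the same turn has to wait for extraction. A generic polling helper is sketched below. It assumes nothing about how Korely signals completion: `fetch_facts` is any zero-argument callable you supply (for example a wrapper around `korely.get_facts(user_id=..., agent_id=...)`), and the timeout and interval values are illustrative, not Korely recommendations.

```python
import time
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def wait_for_facts(
    fetch_facts: Callable[[], Sequence[T]],
    timeout_s: float = 10.0,
    interval_s: float = 0.5,
) -> Sequence[T]:
    """Call fetch_facts until it returns something non-empty or time runs out.

    Returns the last (possibly empty) result, so the caller can decide to
    proceed without facts instead of failing the turn.
    """
    deadline = time.monotonic() + timeout_s
    result = fetch_facts()
    while not result and time.monotonic() < deadline:
        time.sleep(interval_s)
        result = fetch_facts()
    return result


# Usage with a fake fetcher that "finishes extraction" on the third call.
calls = {"n": 0}


def fake_fetch():
    calls["n"] += 1
    return ["(Giulia, lives_in, Turin)"] if calls["n"] >= 3 else []


print(wait_for_facts(fake_fetch, timeout_s=2.0, interval_s=0.01))
```

Most turns should not need this: facts written in earlier sessions are already there, and get_context() can run without waiting on the latest write.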

Here is the same pipeline from the SDK. One add call with raw conversation text in, structured memory out:

```python
from korely_memory import Korely

korely = Korely(api_key="kor_live_...")

# One write call. Extraction runs asynchronously behind it.
memory = korely.add(
    "Giulia upgraded to the Advanced plan and asked us to stop "
    "calling her. She prefers email follow-ups. She is based in Turin.",
    user_id="customer-giulia-4812",
    agent_id="support-bot",
)

print(memory.id)  # e.g. mem_8f2c1a
```

Seconds later, the same raw sentence exists as a searchable memory, three graph entities, and three typed facts. One of the facts retires an older preference via the contradiction check:

Memory    mem_8f2c1a  (embedded for vector search)

Entities  Giulia (person) · Advanced plan (product) · Turin (place)

Facts     (Giulia, has_plan, Advanced)             family: other
          (Giulia, likes, email follow-ups)        family: preferences
            ↳ invalidated (Giulia, likes, phone calls)  2026-04-02 → 2026-06-11
          (Giulia, lives_in, Turin)                family: places

How a read flows

Reads are retrieval, not generation. Most read calls — get_facts(), get_profile(), users(), history() in the SDK, or their REST equivalents — are pure SQL lookups, zero AI calls. get_context() reuses the vectors stored at write time and assembles the user's active typed facts — no new embedding. search() embeds the query — a fraction of a hundredth of a cent — and can run a tiny query-understanding step that is skipped automatically for short keyword queries. Recall over the memory store is semantic vector similarity (cosine); facts and profile reads are deterministic SQL.

No generative model ever composes output on the read path. There is no reranker LLM and no answer synthesis. Your agent's own model is the only LLM in the loop: it gets raw, ranked data and does its own reasoning. That split is why read quotas are an order of magnitude more generous than write quotas; a read costs us a database query, not an inference call. Facts reads are deterministic and typically return in under 50 ms.

The intelligence runs at write time. Fact extraction and contradiction checks need a model; retrieval does not. The split is intentional: writes are rare and asynchronous, reads are frequent and latency-sensitive. Pay the inference cost once on the way in, never on the way out.

Scoping a read: user_id, agent_id

Reads accept scoping parameters. Filters are additive (AND): omit user_id to search the whole agent namespace. The run_id tag on writes lets you isolate session memories when listing or deleting, but search() operates across sessions by design so the relevant context surfaces regardless of which run created it.

Parameter	Scopes to	Example
`user_id`	The end user your agent serves. End users are unlimited on every tier.	`customer-giulia-4812`
`agent_id`	Your application. One workspace runs many agents with no accidental cross-reads.	`support-bot`
`run_id`	One session or agent run — used as a tag on writes and for listing/deleting session memories.	`2026-06-11-session-03`
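Additive scoping means each parameter you pass narrows the result and each one you omit widens it. The local sketch below shows that AND semantics on plain dictionaries; `scoped` is a hypothetical helper, the record shape is an assumption for illustration rather than Korely's storage format, and `sales-bot` is a made-up second application.

```python
from typing import Optional

RECORDS = [
    {"text": "prefers email follow-ups", "user_id": "customer-giulia-4812", "agent_id": "support-bot"},
    {"text": "based in Turin", "user_id": "customer-giulia-4812", "agent_id": "support-bot"},
    {"text": "asked about SSO", "user_id": "customer-4812", "agent_id": "support-bot"},
    # Hypothetical second application in the same workspace.
    {"text": "drafted onboarding email", "user_id": "customer-giulia-4812", "agent_id": "sales-bot"},
]


def scoped(records, user_id: Optional[str] = None, agent_id: Optional[str] = None):
    """Keep records matching every filter that was passed (AND); None means 'any'."""
    filters = {"user_id": user_id, "agent_id": agent_id}
    active = {k: v for k, v in filters.items() if v is not None}
    return [r for r in records if all(r[k] == v for k, v in active.items())]


# Both filters: one end user inside one application.
print(len(scoped(RECORDS, user_id="customer-giulia-4812", agent_id="support-bot")))
# user_id omitted: the whole support-bot namespace.
print(len(scoped(RECORDS, agent_id="support-bot")))
```

The sales-bot record never appears in a support-bot read, which is the "no accidental cross-reads" property from the table.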

```python
# Same client as above. Scope the read to one person and one app.
results = korely.search(
    "contact preferences",
    user_id="customer-giulia-4812",   # one end user
    agent_id="support-bot",           # your application
)

for r in results:
    print(r.score, r.snippet)

# Example output:
# 0.91 Giulia prefers email follow-ups; phone preference invalidated 2026-06-11
# 0.84 Upgraded to the Advanced plan; based in Turin
```

SDKs are available for Python and Node.js today. See pricing for plan details.

Which call touches which layer

The REST API and SDK give you one surface for all three layers. Which layer each call touches:

SDK method / REST endpoint	Layer	Direction
`get_context()` / `GET /v1/context`	1 + 3 — Active facts assembled with relevant memories (primary recall)	Read
`search()` / `POST /v1/memories/search`	1 — Memory store (semantic vector)	Read
`get_all()` / `GET /v1/memories`	1 — Memory store	Read
`add()` / `POST /v1/memories`	1 — Memory store (feeds Layer 3 extraction)	Write
`update()`, `delete()` / `PATCH /v1/memories/{id}`	1 — Memory store	Write
`add(run_id=...)` / `GET /v1/memories?run_id=...`	2 — Session memory (scoped view of Layer 1)	Read / Write
`get_facts()` / `GET /v1/facts`	3 — Cross-session facts	Read
`add_fact_triple()` / `POST /v1/facts`	3 — Cross-session facts	Write

Note the asymmetry: facts are normally extracted automatically from add() calls, so each one traces back to the memory it came from. When you need to write a fact directly, the SDK exposes add_fact_triple() and the REST API exposes POST /v1/facts. See the API reference for the full contract.

EU-hosted: all cloud data lives on our own infrastructure in Helsinki. No data leaves the EU on any tier.

End-to-end example: support bot with persistent memory

The following example shows all three layers interacting in a single support session. The agent writes a conversation turn, then reads back structured context for the next turn, including the typed facts that were extracted automatically from earlier sessions.

```python
from korely_memory import Korely

korely = Korely(api_key="kor_live_...")

# ── Layer 1 write: the agent stores a conversation turn.
# Extraction runs async: entities + typed facts are mined behind the call.
korely.add(
    "Customer-4812 called about her Developer subscription. "
    "She is based in Milan and wants all follow-ups by email. "
    "She is allergic to peanuts — mentioned it while chatting.",
    user_id="customer-4812",
    agent_id="support-bot",
    run_id="session-2026-06-16",   # Layer 2 tag: scopes this turn to the session
)

# ── Layers 1 + 3 read: fetch a prompt-ready context block for the next turn.
# get_context() assembles the user's active Layer 3 facts plus the most
# relevant Layer 1 memories into one block. No generative call on the read path.
# Facts mined from the add() above land asynchronously, typically within
# seconds; facts from earlier sessions are available immediately.
ctx = korely.get_context(
    query="preferred contact channel and account tier",
    user_id="customer-4812",
    agent_id="support-bot",
)
print(ctx)

# ── Layer 3 read: pull structured facts for an explicit check.
# Returns deterministic SQL rows, not a generated summary.
facts = korely.get_facts(
    user_id="customer-4812",
    agent_id="support-bot",
)

for f in facts:
    print(f.subject, "|", f.predicate, "|", f.object, "|", f.predicate_family)

# Example output (f.predicate is normalized; f.predicate_raw keeps the source verb):
# customer-4812 | lives_in | Milan | places
# customer-4812 | likes | email follow-ups | preferences
# customer-4812 | has_plan | Developer | other
# customer-4812 | allergic_to | peanuts | health
```

The agent wrote one conversation turn. By the time the next turn calls get_context(), four typed facts are already in Layer 3; "Milan", "Developer plan", and "customer-4812" exist as typed entities in the graph; and the preference fact supersedes any prior contact preference via the contradiction check. get_context() returns those active facts assembled into the prompt block. No post-processing, no prompt engineering to remember what was said three turns ago.

SDK parity: the Node.js SDK exposes the same surface with camelCase method names (getContext, getFacts, addFactTriple, deleteAll, batchStatus). The REST API is at https://api.korely.ai/v1 with Authorization: Bearer kor_live_.... All plans, including hobby (free), share the same three-layer stack.
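For callers without an SDK, the same fact reads go over plain HTTP. The sketch below only builds the requests with Python's standard library, using the base URL, the Bearer header, and the `include_invalidated` and `as_of` query parameters mentioned on this page. `facts_request` is a hypothetical helper; how other parameters such as `user_id` are passed is not covered here, so check the API reference before sending anything with `urllib.request.urlopen`.

```python
from urllib.parse import urlencode
from urllib.request import Request

BASE_URL = "https://api.korely.ai/v1"


def facts_request(api_key: str, **params: str) -> Request:
    """Build (but do not send) a GET /v1/facts request.

    Query parameters are passed through as given; include_invalidated and
    as_of are the two this page documents.
    """
    query = f"?{urlencode(params)}" if params else ""
    return Request(
        f"{BASE_URL}/facts{query}",
        headers={"Authorization": f"Bearer {api_key}"},
        method="GET",
    )


history = facts_request("kor_live_...", include_invalidated="true")
snapshot = facts_request("kor_live_...", as_of="2026-05-15")
print(history.full_url)
print(snapshot.full_url)
```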

See also