The memory model
Agent memory in Korely is a three-layer, EU-hosted knowledge
store. The layer that sets it apart is the bottom one: a durable
set of typed (subject, predicate, object) facts that
carry validity over time, supersede each other through a two-stage
contradiction check, and can be replayed at any past instant with
as_of. On top of those facts sits a managed cloud store of
memories (notes, documents, transcripts, tasks) with semantic vector
recall, and a session-scoped view for thread-local working state. You
write plain text; extraction, embedding, entity linking, and
contradiction detection all happen automatically behind a single
add() call. Your agent's primary recall path,
get_context(), assembles the active facts plus the most
relevant memories into a prompt-ready block — structured, current
knowledge on demand, with no generative model on the read path. For an
agent builder this means: never prompt-stuff stale context, never
manually reconcile conflicting facts, and never build a separate memory
layer from scratch.
Korely stores memory in three layers. One stack, shared between the human, who reads and edits it in the desktop app, and every agent that writes and queries it over the REST API or SDK. We did not build a separate "agent memory" silo next to the notes. The agent's memory is the user's knowledge base, plus two layers on top of it.
flowchart LR
A["Layer 1 — Memory store<br/>add / search / get_context<br/>scoped by user_id"]
B["Layer 2 — Session<br/>run_id tag<br/>thread-local scope"]
C["Layer 3 — Facts<br/>typed triples<br/>valid_from · invalid_at · as_of"]
A -->|"mined at write time"| C
A -->|"run_id scopes a slice"| B
B -->|"durable conclusions promoted"| C
| Layer | What it holds | Scope | Lifetime |
|---|---|---|---|
| 1 — Memory store | Notes, documents, transcripts, tasks | User | Forever (it is the user's data) |
| 2 — Session memory | Working state of one conversation | Session | The thread |
| 3 — Cross-session facts | Typed (subject, predicate, object) triples | User | Bi-temporal: superseded, never silently lost |
Layer 1 — The memory store
Layer 1 is a managed cloud store in the EU — Postgres with a vector index. Every memory your agent writes lands here, scoped to your account and an end-user namespace. You read and write it over the REST API or the SDK; there is nothing to host yourself, and you can export or erase everything for a user with one call.
For agents, recall over the memory store is semantic vector search:
- 768-dimension vectors. Every memory is embedded as a 768-dimension vector, so search() finds semantically similar content by cosine distance even when no keyword overlaps. The memory's embedding is computed once at write time; the query is embedded at read time.
- Entity graph (behind the facts). Typed entities (people, organizations, products, places, concepts) are extracted from every memory on our own infrastructure and linked with typed edges using canonical predicates in 9 families. The graph is what powers the typed facts in Layer 3; it is not a search mode the agent queries directly. Details in The graph.
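To make the cosine-based recall above concrete, here is a minimal sketch of ranking memories by cosine similarity. It is an illustration only, not Korely's implementation: the 3-dimensional vectors are toy stand-ins for real 768-dimension embeddings, and `cosine_similarity` and `rank_by_similarity` are hypothetical helper names.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def rank_by_similarity(query_vec, memories, top_k=2):
    """Return the top_k (score, text) pairs, most similar first."""
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in memories]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]

# Toy vectors (assumption): real embeddings have 768 dimensions and are
# computed by the service, once per memory at write time.
memories = [
    ("prefers email follow-ups", [0.9, 0.1, 0.0]),
    ("based in Turin", [0.0, 0.2, 0.9]),
    ("upgraded to the Advanced plan", [0.3, 0.9, 0.1]),
]
query = [1.0, 0.2, 0.0]  # hypothetical embedding of "contact preferences"

top = rank_by_similarity(query, memories)
for score, text in top:
    print(round(score, 2), text)
```

The point of the sketch is that no keyword overlap is needed: the ranking depends only on the angle between the query vector and each stored vector.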
Your primary recall path is get_context() /
GET /v1/context: it assembles the active typed facts for the
user plus the most relevant memories into one prompt-ready block. That
fact-first assembly — current values, contradictions already resolved —
is the piece most agents are missing; it answers "what do we actually
know about this person right now?" without the agent having to guess
search keywords. Raw search() /
POST /v1/memories/search is the secondary path when you want
the underlying memory snippets ranked by semantic similarity.
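The fact-first assembly can be pictured with a small sketch. The actual layout of the block returned by get_context() is not specified here, so the section labels and the helper `build_context_block` are assumptions made purely for illustration.

```python
def build_context_block(facts, memories, max_memories=2):
    """Render active facts first, then the highest-scoring memory snippets."""
    lines = ["Known facts:"]
    for subject, predicate, obj in facts:
        lines.append(f"- {subject} {predicate} {obj}")
    lines.append("Relevant memories:")
    # Memories arrive as (score, snippet); the most similar goes first.
    ranked = sorted(memories, key=lambda m: m[0], reverse=True)
    for _score, snippet in ranked[:max_memories]:
        lines.append(f"- {snippet}")
    return "\n".join(lines)

# Sample values reused from this page's examples.
active_facts = [
    ("Giulia", "likes", "email follow-ups"),
    ("Giulia", "lives_in", "Turin"),
]
scored_memories = [
    (0.84, "Upgraded to the Advanced plan; based in Turin"),
    (0.91, "Giulia prefers email follow-ups"),
]

block = build_context_block(active_facts, scored_memories)
print(block)
```

Because the facts section contains only currently valid values, the agent's model never has to arbitrate between an old and a new preference inside the prompt.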
Layer 2 — Session memory
Session memory is per-conversation working state. When an agent runs a
long thread (a research session, a multi-step refactor, a week-long
project), it calls add() with a stable run_id
to tag memories to that session. Listing memories with the same
run_id (GET /v1/memories?run_id=...) then retrieves only that session's
memories, keeping the thread coherent without polluting the broader user namespace.
This is what keeps a thread coherent across context-window resets and client restarts: the agent re-reads its own session memories, picks up the state, and continues. It is scoped to the session by design. Durable conclusions belong in Layer 1 (a memory) or Layer 3 (a fact), not in the session log.
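As a rough sketch of how an agent might re-read its own session after a restart, the snippet below builds (but does not send) a REST request that lists memories tagged with one run_id. The base URL, the Bearer header and the run_id query parameter come from this page; passing user_id as a query parameter and the helper name `session_listing_request` are assumptions.

```python
from urllib.parse import urlencode
from urllib.request import Request

BASE_URL = "https://api.korely.ai/v1"

def session_listing_request(api_key, user_id, run_id):
    """Build a GET request for the memories of one session (not sent here)."""
    # user_id as a query parameter is an assumption; run_id is documented.
    query = urlencode({"user_id": user_id, "run_id": run_id})
    return Request(
        f"{BASE_URL}/memories?{query}",
        headers={"Authorization": f"Bearer {api_key}"},
        method="GET",
    )

req = session_listing_request("kor_live_...", "customer-4812", "session-2026-06-16")
print(req.get_method(), req.full_url)
```

Sending the request (for example with urllib.request.urlopen) and parsing the response are left out, since the response schema is defined in the API reference rather than here.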
Layer 3 — Cross-session facts
Layer 3 is what most people mean by "agent memory": things the system
knows about the user and their world, independent of any one note or
conversation. We store them as typed
(subject, predicate, object) triples, extracted
automatically whenever content is written, and queryable via
get_facts() in the SDK or GET /v1/facts in
the REST API.
Every predicate belongs to one of 9 families:
| Family | Example triple |
|---|---|
| preferences | (Marco, likes, peach fruit salad) |
| people | (Sara, reports_to, Marco) |
| places | (Marco, lives_in, Bologna) |
| work | (Sara, works_at, Acme GmbH) |
| ownership | (Marco, owns, a 2019 MacBook Pro) |
| health | (Marco, allergic_to, peanuts) |
| financial | (Aurora plan, costs, 50 euro per month) |
| events | (team offsite, scheduled_for, 2026-07-12) |
| other | catch-all for valid triples outside the 8 above |
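If client code needs to branch on the family of a fact, the nine families can be mirrored as an enum. This is a client-side sketch only: the service assigns families itself, and the predicate lookup below covers just the example predicates shown on this page (`PredicateFamily` and `family_of` are hypothetical names).

```python
from enum import Enum

class PredicateFamily(str, Enum):
    PREFERENCES = "preferences"
    PEOPLE = "people"
    PLACES = "places"
    WORK = "work"
    OWNERSHIP = "ownership"
    HEALTH = "health"
    FINANCIAL = "financial"
    EVENTS = "events"
    OTHER = "other"

# Only the example predicates from this page; not an exhaustive mapping.
EXAMPLE_PREDICATES = {
    "likes": PredicateFamily.PREFERENCES,
    "reports_to": PredicateFamily.PEOPLE,
    "lives_in": PredicateFamily.PLACES,
    "works_at": PredicateFamily.WORK,
    "owns": PredicateFamily.OWNERSHIP,
    "allergic_to": PredicateFamily.HEALTH,
    "costs": PredicateFamily.FINANCIAL,
    "scheduled_for": PredicateFamily.EVENTS,
}

def family_of(predicate):
    """Look up the family of a predicate, falling back to the catch-all."""
    return EXAMPLE_PREDICATES.get(predicate, PredicateFamily.OTHER)

print(family_of("allergic_to").value)
print(family_of("has_plan").value)
```

A typical use is routing: an agent might surface health facts in every prompt but load financial facts only when the conversation turns to billing.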
Facts are bi-temporal
Every fact carries valid_from and invalid_at.
When new information contradicts an existing fact, a two-stage
contradiction check invalidates the old one and keeps it as history
instead of deleting it. A chain after a price change looks like this:
(Aurora plan, costs, 40 euro per month)   valid_from 2026-05-18   invalid_at 2026-06-07
(Aurora plan, costs, 50 euro per month)   valid_from 2026-06-07   invalid_at —
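The chain above can be reproduced with a small in-memory model. The internals of the two-stage contradiction check are not described on this page, so the sketch substitutes a deliberately naive rule (same subject and predicate, different object) and should be read as a stand-in, not as the service's logic. `Fact` and `add_fact` are hypothetical names.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional

@dataclass
class Fact:
    id: int
    subject: str
    predicate: str
    object: str
    valid_from: date
    invalid_at: Optional[date] = None
    invalidated_by: Optional[int] = None

def add_fact(store: List[Fact], subject: str, predicate: str, obj: str, when: date) -> Fact:
    """Append a fact and retire any active fact it contradicts."""
    new = Fact(len(store) + 1, subject, predicate, obj, when)
    for old in store:
        same_slot = (old.subject, old.predicate) == (subject, predicate)
        # Naive stand-in for the contradiction check (assumption).
        if same_slot and old.invalid_at is None and old.object != obj:
            old.invalid_at = when          # stamped, never deleted
            old.invalidated_by = new.id    # link to the replacement
    store.append(new)
    return new

store: List[Fact] = []
add_fact(store, "Aurora plan", "costs", "40 euro per month", date(2026, 5, 18))
add_fact(store, "Aurora plan", "costs", "50 euro per month", date(2026, 6, 7))

for f in store:
    print(f.object, f.valid_from, f.invalid_at, f.invalidated_by)
```

The key property is that the old fact stays in the store with its invalid_at stamped, so history remains queryable.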
By default, GET /v1/facts returns only the facts that are
valid now (those with invalid_at null). Pass
include_invalidated=true and superseded facts come back too,
each carrying its invalid_at timestamp and the
invalidated_by id of the fact that replaced it, so an agent
can reason about how a value changed over time, not just what it is now.
The full mechanics are in
Temporal facts.
Point-in-time queries: the REST API exposes the same
data with as-of slicing (GET /v1/facts?as_of=2026-05-15)
and a prompt-ready GET /v1/context block. See the
API reference.
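The default read, include_invalidated and as_of can all be understood as filters over the same rows. The sketch below models them locally; treating validity as a half-open interval (valid_from inclusive, invalid_at exclusive) is an assumption, and `current_facts` and `facts_as_of` are hypothetical helpers rather than SDK calls.

```python
from datetime import date

# The price-change chain from this section, as plain rows.
FACTS = [
    {"object": "40 euro per month", "valid_from": date(2026, 5, 18), "invalid_at": date(2026, 6, 7)},
    {"object": "50 euro per month", "valid_from": date(2026, 6, 7), "invalid_at": None},
]

def current_facts(facts, include_invalidated=False):
    """Default read: only rows with no invalid_at, unless history is requested."""
    return [f for f in facts if include_invalidated or f["invalid_at"] is None]

def facts_as_of(facts, as_of):
    """Rows valid at one instant, assuming the interval [valid_from, invalid_at)."""
    return [
        f for f in facts
        if f["valid_from"] <= as_of
        and (f["invalid_at"] is None or as_of < f["invalid_at"])
    ]

print([f["object"] for f in current_facts(FACTS)])
print([f["object"] for f in facts_as_of(FACTS, date(2026, 5, 20))])
print([f["object"] for f in facts_as_of(FACTS, date(2026, 5, 15))])
```

Note that a query dated before the first valid_from returns nothing for this subject: under this model the system did not yet know any price on 2026-05-15.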
The human is in the loop
Layer 3 is not a black box the agent maintains behind the user's back. The end user sees every fact in the Korely app. A Memory Panel lists them, lets them edit a wrong one or forget one entirely (erasure, with an audit cascade), and an Entity Profile drawer shows everything known about one person or company. Forgotten facts are excluded from agent reads by default. Korely is the only memory layer that ships a first-class memory UI for the end user. We think memory an agent collects about a person should be inspectable by that person. More in Human in the loop.
How a write flows
When an agent calls add() (or POST /v1/memories),
the pipeline runs asynchronously:
- The memory is persisted: stored in the index and visible to the human in the Korely app.
- Entity extraction runs on our own infrastructure and wires typed entities into the graph.
- Typed fact extraction pulls (subject, predicate, object) triples into Layer 3.
- Each new fact passes the two-stage contradiction check; contradicted facts get invalid_at stamped.
- The memory is embedded into the 768D vector index.
The write call itself returns once the memory is persisted, with
facts empty on the immediate response; extraction then runs
behind it and the typed facts populate shortly after. You never block on
extraction. The write path is where the intelligence runs: the document
embedding, entity extraction, typed-fact extraction with
contradiction checking and bi-temporal validity. It costs about a tenth
of a cent per memory, all included in your plan.
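Because facts are empty on the immediate response, code that needs them right after a write (a test, for instance) has to wait for extraction to finish. One possible approach is a bounded polling loop; the helper `wait_for_facts` below is hypothetical, and the fake fetcher stands in for a real call such as korely.get_facts(user_id=...).

```python
import time

def wait_for_facts(fetch_facts, timeout_s=10.0, interval_s=0.5, sleep=time.sleep):
    """Poll fetch_facts() until it returns a non-empty list or the timeout passes."""
    deadline = time.monotonic() + timeout_s
    while True:
        facts = fetch_facts()
        if facts:
            return facts
        if time.monotonic() >= deadline:
            return []  # extraction did not finish in time
        sleep(interval_s)

# Stand-in for the service: empty twice, then one extracted fact.
responses = iter([[], [], [("Giulia", "lives_in", "Turin")]])
calls = []

def fake_fetch():
    calls.append(1)
    return next(responses)

facts = wait_for_facts(fake_fetch, interval_s=0.0)
print(len(calls), facts)
```

In a live agent this is usually unnecessary: the next turn typically arrives after extraction has completed, and get_context() simply returns whatever is active at that moment.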
Here is the same pipeline from the SDK. One add call with
raw conversation text in, structured memory out:
from korely_memory import Korely
korely = Korely(api_key="kor_live_...")
# One write call. Extraction runs asynchronously behind it.
memory = korely.add(
"Giulia upgraded to the Advanced plan and asked us to stop "
"calling her. She prefers email follow-ups. She is based in Turin.",
user_id="customer-giulia-4812",
agent_id="support-bot",
)
print(memory.id)
mem_8f2c1a
Seconds later, the same raw sentence exists as a searchable memory, three graph entities, and three typed facts. One of the facts retires an older preference via the contradiction check:
Memory    mem_8f2c1a (embedded for vector search)
Entities  Giulia (person) · Advanced plan (product) · Turin (place)
Facts     (Giulia, has_plan, Advanced)            family: other
          (Giulia, likes, email follow-ups)       family: preferences
            ↳ invalidated (Giulia, likes, phone calls)  2026-04-02 → 2026-06-11
          (Giulia, lives_in, Turin)               family: places
How a read flows
Reads are retrieval, not generation. Most read calls —
get_facts(), get_profile(),
users(), history() in the SDK, or their REST
equivalents — are pure SQL lookups, zero AI calls.
get_context() reuses the vectors stored at write time and
assembles the user's active typed facts — no new embedding.
search() embeds the query — a fraction of a hundredth of a
cent — and can run a tiny query-understanding step that is skipped
automatically for short keyword queries. Recall over the memory store is
semantic vector similarity (cosine); facts and profile reads are
deterministic SQL.
No generative model ever composes output on the read path. There is no reranker LLM and no answer synthesis. Your agent's own model is the only LLM in the loop: it gets raw, ranked data and does its own reasoning. That split is why read quotas are an order of magnitude more generous than write quotas; a read costs us a database query, not an inference call. Facts reads are deterministic and typically return in under 50 ms.
The intelligence runs at write time. Fact extraction and contradiction checks need a model; retrieval does not. The split is intentional: writes are rare and asynchronous, reads are frequent and latency-sensitive. Pay the inference cost once on the way in, never on the way out.
Scoping a read: user_id, agent_id
Reads accept scoping parameters. Filters are additive (AND): omit
user_id to search the whole agent namespace. The
run_id tag on writes lets you isolate session memories
when listing or deleting, but search() operates across
sessions by design so the relevant context surfaces regardless of which
run created it.
| Parameter | Scopes to | Example |
|---|---|---|
| user_id | The end user your agent serves. End users are unlimited on every tier. | customer-giulia-4812 |
| agent_id | Your application. One workspace runs many agents with no accidental cross-reads. | support-bot |
| run_id | One session or agent run; used as a tag on writes and for listing/deleting session memories. | 2026-06-11-session-03 |
# Same client as above. Scope the read to one person and one app.
results = korely.search(
"contact preferences",
user_id="customer-giulia-4812", # one end user
agent_id="support-bot", # your application
)
for r in results:
print(r.score, r.snippet)
0.91 Giulia prefers email follow-ups; phone preference invalidated 2026-06-11
0.84 Upgraded to the Advanced plan; based in Turin
SDKs: Python and Node.js today. See pricing for plan details.
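The additive (AND) behaviour of the scoping parameters can be modelled in a few lines. This is a local illustration over plain dictionaries, not the service's query code; `scoped`, `billing-bot` and `customer-marco-0001` are hypothetical names introduced for the example.

```python
def scoped(memories, user_id=None, agent_id=None):
    """Keep memories matching every filter that was provided (AND semantics)."""
    selected = []
    for m in memories:
        if user_id is not None and m["user_id"] != user_id:
            continue
        if agent_id is not None and m["agent_id"] != agent_id:
            continue
        selected.append(m)
    return selected

MEMORIES = [
    {"id": "m1", "user_id": "customer-giulia-4812", "agent_id": "support-bot"},
    {"id": "m2", "user_id": "customer-giulia-4812", "agent_id": "billing-bot"},
    {"id": "m3", "user_id": "customer-marco-0001", "agent_id": "support-bot"},
]

one_user_one_app = scoped(MEMORIES, user_id="customer-giulia-4812", agent_id="support-bot")
whole_agent = scoped(MEMORIES, agent_id="support-bot")  # user_id omitted

print([m["id"] for m in one_user_one_app])
print([m["id"] for m in whole_agent])
```

Omitting a filter widens the scope and adding one narrows it, which is why leaving out user_id searches the whole agent namespace.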
Which call touches which layer
The REST API and SDK give you one surface for all three layers. Which layer each call touches:
| SDK method / REST endpoint | Layer | Direction |
|---|---|---|
| get_context() / GET /v1/context | 1 + 3 — Active facts assembled with relevant memories (primary recall) | Read |
| search() / POST /v1/memories/search | 1 — Memory store (semantic vector) | Read |
| get_all() / GET /v1/memories | 1 — Memory store | Read |
| add() / POST /v1/memories | 1 — Memory store (feeds Layer 3 extraction) | Write |
| update(), delete() / PATCH /v1/memories/{id} | 1 — Memory store | Write |
| add(run_id=...) / GET /v1/memories?run_id=... | 2 — Session memory (scoped view of Layer 1) | Read / Write |
| get_facts() / GET /v1/facts | 3 — Cross-session facts | Read |
| add_fact_triple() / POST /v1/facts | 3 — Cross-session facts | Write |
Note the asymmetry: facts are extracted automatically from every
add() call — the memory is always the traceable source.
For explicit fact writes the SDK exposes add_fact_triple()
and the REST API exposes POST /v1/facts. See the
API reference for the full
contract.
EU-hosted: all cloud data lives on our own infrastructure in Helsinki. No data leaves the EU on any tier.
End-to-end example: support bot with persistent memory
The following example shows all three layers interacting in a single support session. The agent writes a conversation turn, then reads back structured context for the next turn, including the typed facts that were extracted automatically from earlier sessions.
from korely_memory import Korely
korely = Korely(api_key="kor_live_...")
# ── Layer 1 write: the agent stores a conversation turn.
# Extraction runs async: entities + typed facts are mined behind the call.
korely.add(
"Customer-4812 called about her Developer subscription. "
"She is based in Milan and wants all follow-ups by email. "
"She is allergic to peanuts — mentioned it while chatting.",
user_id="customer-4812",
agent_id="support-bot",
run_id="session-2026-06-16", # Layer 2 tag — scopes this turn to the session
)
# ── Layers 1 + 3 read: fetch a prompt-ready context block for the next turn.
# get_context() assembles the user's active Layer 3 facts plus the most
# relevant Layer 1 memories into one block. No generative call on the read path.
ctx = korely.get_context(
query="preferred contact channel and account tier",
user_id="customer-4812",
agent_id="support-bot",
)
print(ctx)
# ── Layer 3 read: pull structured facts for an explicit check.
# Returns deterministic SQL rows, not a generated summary.
facts = korely.get_facts(
user_id="customer-4812",
agent_id="support-bot",
)
for f in facts:
print(f.subject, "|", f.predicate, "|", f.object, "|", f.predicate_family)
# Example output (f.predicate is normalized; f.predicate_raw keeps the source verb):
# customer-4812 | lives_in | Milan | places
# customer-4812 | likes | email follow-ups | preferences
# customer-4812 | has_plan | Developer | other
# customer-4812 | allergic_to | peanuts | health
The agent wrote one sentence. By the time the next turn calls
get_context(), four typed facts are already in Layer 3,
"Milan", "Developer plan", and "customer-4812" exist as typed entities
in the graph, and the preference fact supersedes any prior contact
preference via the contradiction check. get_context()
returns those active facts assembled into the prompt block. No
post-processing, no prompt engineering to remember what was said three
turns ago.
SDK parity: the Node.js SDK exposes the same surface
with camelCase method names (getContext, getFacts,
addFactTriple, deleteAll, batchStatus).
The REST API is at https://api.korely.ai/v1 with
Authorization: Bearer kor_live_.... All plans, including
hobby (free), share the same three-layer stack.
See also
- Temporal facts: how valid_from, invalid_at, and the two-stage contradiction check work in detail, with point-in-time as_of queries.
- The knowledge graph: entity extraction and typed edges, and how shared entities connect memories your agent did not explicitly link. Query the neighborhood of an entity with GET /v1/facts?entity=....
- Human in the loop: how end users see, edit, and forget the memory their agents collect about them, and why that boundary is enforced server-side.