SDK
korely-memory is a typed Python and Node.js client over the
REST API. Every method maps 1:1
onto an endpoint, so anything you can do with curl you can do with the SDK,
and the JSON shapes in the API reference are the attribute shapes here.
Same backend, same memory, as the
CLI.
# 1. Connect
from korely_memory import Korely
korely = Korely(api_key="kor_live_...")

# 2. Remember (typed facts + bi-temporal built-in)
korely.add("User prefers TypeScript with strict mode", user_id="dana")

# 3. The contradiction resolves itself: the new fact
# invalidates the old one, never deletes it
korely.add("Actually I switched to Rust", user_id="dana")

# 4. Recall: assembles the active facts into a prompt-ready block
ctx = korely.get_context(query="preferred language", user_id="dana")
print(ctx.context)
# → "## Known facts\n- dana likes Rust (since ...)"
# the superseded TypeScript fact is excluded automatically
Install
pip install korely-memory
Python 3.9 or later. The Node SDK needs Node 18+. No heavy dependencies: the SDK is a thin HTTP client. Embeddings, entity extraction, and fact extraction with contradiction checking all run server-side on our own infrastructure, so your install stays small and your process stays light.
Initialize
from korely_memory import Korely
korely = Korely(api_key="kor_live_...")
# Or read the key from the environment (KORELY_API_KEY)
korely = Korely()
Keys look like kor_live_... and are scoped per agent
namespace: a key minted for agent_id=support-bot only reads
and writes inside that namespace. All keys belong to the eu
region. Your data is stored and processed in the EU, on our own
infrastructure.
Core methods
Every method wraps exactly one REST endpoint:
| Method | REST endpoint | Path |
|---|---|---|
| korely.add(...) | POST /v1/memories | Write |
| korely.search(...) | POST /v1/memories/search | Read |
| korely.get_all(...) | GET /v1/memories | Read |
| korely.get(id) | GET /v1/memories/:id | Read |
| korely.update(id, ...) | PATCH /v1/memories/:id | Write |
| korely.delete(id) | DELETE /v1/memories/:id | Write |
| korely.delete_all(user_id=...) | DELETE /v1/users/:user_id/memories | Write |
| korely.add_fact_triple(...) | POST /v1/facts | Write |
| korely.get_facts(...) | GET /v1/facts | Read |
| korely.get_profile(user_id=...) | GET /v1/profile | Read |
| korely.get_context(...) | GET /v1/context | Read |
| korely.history(id) | GET /v1/memories/:id/history | Read |
| korely.users(...) | GET /v1/users | Read |
| korely.batch(...) | POST /v1/batch | Write |
| korely.batch_status(id) | GET /v1/batch/:id | Read |
add also accepts a list of chat messages (role / content
dicts), not just a string — they are joined into one block before storing,
so you can hand it a conversation as-is.
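As a rough picture of what "joined into one block" could look like, the sketch below flattens role/content dicts into a single string. The helper name `join_messages` and the `role: content` line layout are assumptions made for illustration; the service's actual join format is not specified here.

```python
from typing import Dict, List


def join_messages(messages: List[Dict[str, str]]) -> str:
    """Flatten chat messages into one text block, one 'role: content' line each.

    Hypothetical helper: the real join happens inside the service and may
    use a different separator or layout.
    """
    return "\n".join(f"{m['role']}: {m['content']}" for m in messages)


conversation = [
    {"role": "user", "content": "I moved to Lisbon last month."},
    {"role": "assistant", "content": "Noted, Lisbon it is."},
]
block = join_messages(conversation)
print(block)
```

With the SDK this helper should not be needed: the documented path is to pass the list itself, as in `korely.add(conversation, user_id="dana")`.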
Reads are retrieval, not generation. No generative model ever composes output on the read path. There is no reranker model and no answer synthesis; your agent's own model does the reasoning. The write path is where the intelligence runs: embeddings, entity extraction, typed-fact extraction with contradiction checking and bi-temporal validity, about a tenth of a cent per memory, all included. That is why read quotas are an order of magnitude more generous than write quotas.
add
Maps to POST /v1/memories. One call runs the full write
pipeline. The return value includes the facts that were extracted, and
which older facts the write superseded.
memory = korely.add(
    "Northwind Hosting costs 50 euro per month since the June upgrade.",
    agent_id="infra-bot",
    user_id="customer-4812",
    metadata={"source": "slack"},
)
print(memory.id)  # mem_8f2c1a
for fact in memory.facts:
    print(fact.subject, fact.predicate, fact.object)  # Northwind Hosting costs 50 euro per month
    print(fact.invalidated)  # ["fct_a774"], the old price, retired
search
Maps to POST /v1/memories/search. Semantic vector search:
cosine similarity over the memory embeddings. The only model call on the
read path is the query embedding, a fraction of a hundredth of a cent.
Keyword-style queries (1 to 5 words) work best. For recall you almost always
want get_context (below) instead — it assembles the active
typed facts into a prompt-ready block, which is the primary recall path.
search returns raw memory snippets and is the secondary path.
results = korely.search(
    "northwind pricing",
    user_id="customer-4812",
    limit=5,
)
for hit in results:
    print(hit.id, hit.score, hit.snippet)  # mem_8f2c1a 0.91 Northwind Hosting costs 50 euro per month...
Optional filters mirror the REST params: agent_id,
user_id, limit. Filters are additive (AND).
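To make the cosine-similarity ranking mentioned above concrete, here is a minimal pure-Python sketch. The three-dimensional vectors and the memory ids are toy values invented for the example, not real embeddings, and the function names are illustrative; the service's own index is not described in this page.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity: dot product divided by the product of the norms."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # a zero vector has no direction, treat it as no match
    return dot / (norm_a * norm_b)


def rank(
    query_vec: Sequence[float],
    memory_vecs: Dict[str, Sequence[float]],
    limit: int,
) -> List[Tuple[str, float]]:
    """Score every memory against the query and keep the top `limit`."""
    scored = [(mem_id, cosine(query_vec, vec)) for mem_id, vec in memory_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:limit]


# Toy vectors standing in for real embeddings.
memories = {
    "mem_pricing": [0.9, 0.1, 0.0],
    "mem_contact": [0.0, 0.2, 0.9],
    "mem_renewal": [0.6, 0.6, 0.1],
}
hits = rank([1.0, 0.0, 0.0], memories, limit=2)
print([mem_id for mem_id, _ in hits])
```

The score is bounded by 1.0 for identical directions, which matches the shape of the `hit.score` values shown in the example above.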
get_all
Maps to GET /v1/memories. List memories in a scope, newest
first, no search query needed. Pass user_id or
agent_id to narrow the scope, and page with
limit (default 50) and offset. Listing is plain
SQL on the read path, no model calls.
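The limit/offset paging described above can be wrapped in a small generator. This is a sketch under stated assumptions: `iter_all`, `FakePage`, and `fake_fetch` are names made up for the example, and the fake backend exists only so the loop can run without a network call. It relies on the page being iterable and exposing `.total`.

```python
from typing import Callable, Iterator


class FakePage(list):
    """Stand-in for a page object: iterable like a list, with .total."""

    def __init__(self, items, total: int):
        super().__init__(items)
        self.total = total


def iter_all(fetch_page: Callable[[int, int], FakePage], limit: int = 50) -> Iterator:
    """Yield every item by moving offset forward until total is reached."""
    offset = 0
    while True:
        page = fetch_page(limit, offset)
        items = list(page)
        yield from items
        offset += len(items)
        # Stop on an empty page too, so a shrinking total cannot loop forever.
        if not items or offset >= page.total:
            break


# Fake backend with 120 records, used only to exercise the loop.
RECORDS = [f"mem_{i:04d}" for i in range(120)]


def fake_fetch(limit: int, offset: int) -> FakePage:
    return FakePage(RECORDS[offset:offset + limit], total=len(RECORDS))


collected = list(iter_all(fake_fetch, limit=50))
print(len(collected))
```

Against the real client, `fetch_page` could be `lambda limit, offset: korely.get_all(user_id="customer-4812", limit=limit, offset=offset)`. A single page looks like this: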
page = korely.get_all(
    user_id="customer-4812",
    agent_id=None,
    limit=50,
    offset=0,
)
print(page.total)  # 218
for memory in page:
    print(memory.id, memory.created_at)  # mem_8f2c1a 2026-06-07T09:14:00Z
get
Maps to GET /v1/memories/:id. Full content, metadata, and the
facts extracted from this memory.
memory = korely.get("mem_8f2c1a")
print(memory.content)  # Northwind Hosting costs 50 euro per month...
print(memory.metadata)  # {"source": "slack"}
print(memory.created_at)  # 2026-06-07T09:14:00Z
update
Maps to PATCH /v1/memories/:id. Updating content re-runs
extraction, so facts stay in sync with what the memory says. Pass
expected_updated_at for optimistic concurrency: if another
writer got there first, the call raises instead of clobbering.
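The optimistic-concurrency check can be pictured with a toy in-memory store. Everything here is an assumption for illustration: `InMemoryStore`, `Record`, and `ConflictError` are hypothetical names, and this page does not say which error code the SDK raises on a stale `expected_updated_at`. The sketch only shows the compare-then-write idea and a re-read-and-retry response.

```python
from dataclasses import dataclass
from typing import Dict, Optional


class ConflictError(Exception):
    """Hypothetical error: the record changed since the caller last read it."""


@dataclass
class Record:
    content: str
    updated_at: str


class InMemoryStore:
    """Toy store that mimics the compare-then-write check."""

    def __init__(self) -> None:
        self._rows: Dict[str, Record] = {}
        self._clock = 0

    def _tick(self) -> str:
        self._clock += 1
        return f"t{self._clock}"

    def put(self, key: str, content: str) -> Record:
        self._rows[key] = Record(content, self._tick())
        return self._rows[key]

    def get(self, key: str) -> Record:
        return self._rows[key]

    def update(self, key: str, content: str, expected_updated_at: Optional[str] = None) -> Record:
        current = self._rows[key]
        if expected_updated_at is not None and current.updated_at != expected_updated_at:
            raise ConflictError(f"{key} changed at {current.updated_at}")
        self._rows[key] = Record(content, self._tick())
        return self._rows[key]


store = InMemoryStore()
first = store.put("mem_1", "costs 50 euro")
store.update("mem_1", "costs 52 euro")  # another writer gets there first
try:
    store.update("mem_1", "costs 55 euro", expected_updated_at=first.updated_at)
    outcome = "written"
except ConflictError:
    # Re-read, then retry against the fresh timestamp.
    fresh = store.get("mem_1")
    store.update("mem_1", "costs 55 euro", expected_updated_at=fresh.updated_at)
    outcome = "retried"
print(outcome, store.get("mem_1").content)
```

Whether to retry blindly or merge the two versions first is an application decision. The SDK call itself looks like this: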
memory = korely.update(
    "mem_8f2c1a",
    content="Northwind Hosting costs 55 euro per month after the storage add-on.",
    expected_updated_at="2026-06-07T09:14:00Z",
)
print(memory.facts[0].object)  # 55 euro per month
print(memory.facts[0].invalidated)  # ["fct_b91e"]
delete
Maps to DELETE /v1/memories/:id. Forget one memory.
Audited invalidation, not a hard row delete: the memory and its facts
drop out of every default read, and an audit stub records when and by
which key it was forgotten.
receipt = korely.delete("mem_8f2c1a")
print(receipt.status)  # forgotten
print(receipt.facts_invalidated)  # 1
print(receipt.audit_id)  # aud_3d0f
delete_all
Maps to DELETE /v1/users/:user_id/memories. Bulk erasure
for one end user: every memory and fact scoped to that
user_id is invalidated in a single call, with one audit
record. Deleting every memory for a user becomes one method call.
receipt = korely.delete_all(user_id="customer-4812")
print(receipt.memories_forgotten)  # 218
print(receipt.facts_invalidated)  # 64
print(receipt.audit_id)  # aud_91xb
get_facts
Maps to GET /v1/facts. Typed (subject, predicate, object)
triples with bi-temporal validity — the heart of the moat. Reads from the
fact store are deterministic SQL, no model calls, typically under 50 ms.
Returns a flat list of facts (use get_profile for the
grouped-by-family view). Pass as_of for a point-in-time query:
what was true on that date. Works on every tier, hobby included.
# Current state: only active facts
facts = korely.get_facts(entity="Northwind Hosting")
print(facts[0].object)  # 50 euro per month
print(facts[0].invalid_at)  # None (active)

# Point-in-time: what did we believe on June 1?
facts = korely.get_facts(entity="Northwind Hosting", as_of="2026-06-01")
print(facts[0].object)  # 40 euro per month
print(facts[0].invalid_at)  # 2026-06-07T09:14:00Z (superseded since)

# Full history chain, superseded facts included
facts = korely.get_facts(
    entity="Northwind Hosting",
    include_invalidated=True,
)
Filters mirror the REST contract: subject,
entity (matches either side of the triple),
predicate, predicate_family,
include_invalidated, as_of, limit.
See temporal facts for
how invalidation works.
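The `as_of` filter can be pictured as a window check on each fact. The sketch below is a simplified local model, not the service's query: it covers only the validity window (`valid_from` to `invalid_at`), treats the window as half-open, and uses an invented `valid_from` of 2026-01-01 for the older price. The `Fact` dataclass and `facts_as_of` are illustrative names.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import List, Optional


@dataclass
class Fact:
    subject: str
    predicate: str
    object: str
    valid_from: datetime
    invalid_at: Optional[datetime] = None  # None means still active


def facts_as_of(facts: List[Fact], as_of: datetime) -> List[Fact]:
    """Keep facts whose window [valid_from, invalid_at) contains as_of."""
    return [
        f for f in facts
        if f.valid_from <= as_of and (f.invalid_at is None or as_of < f.invalid_at)
    ]


def utc(year: int, month: int, day: int, hour: int = 0, minute: int = 0) -> datetime:
    return datetime(year, month, day, hour, minute, tzinfo=timezone.utc)


switch = utc(2026, 6, 7, 9, 14)  # moment the new price superseded the old one
timeline = [
    Fact("Northwind Hosting", "costs", "40 euro per month", utc(2026, 1, 1), switch),
    Fact("Northwind Hosting", "costs", "50 euro per month", switch),
]

june_first = facts_as_of(timeline, utc(2026, 6, 1))
july_first = facts_as_of(timeline, utc(2026, 7, 1))
print(june_first[0].object)
print(july_first[0].object)
```

Because superseded facts keep their rows and only gain an `invalid_at`, the same list answers both the current-state and the point-in-time question.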
get_context
Maps to GET /v1/context. The one-call method: it assembles a
prompt-ready context block (profile plus relevant facts plus relevant
memories) within a token budget. Assembly is deterministic retrieval and
formatting, not generation. Drop the returned string into your system
prompt. This is the method most agent frameworks use.
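One way to picture assembly within a token budget is a greedy packer that adds lines in priority order and stops when the next line would not fit. This is a sketch under loud assumptions: the whitespace word count is a crude stand-in for a real tokenizer, the `## Relevant memories` heading is invented, and the service's actual ordering and formatting rules are not documented on this page.

```python
from typing import List, Tuple


def count_tokens(text: str) -> int:
    """Crude estimate: whitespace-separated words (an assumption, not a tokenizer)."""
    return len(text.split())


def assemble(sections: List[Tuple[str, List[str]]], token_budget: int) -> Tuple[str, int]:
    """Pack section lines in order; stop at the first line that would exceed the budget."""
    out: List[str] = []
    used = 0
    for heading, items in sections:
        heading_added = False
        for item in items:
            line = f"- {item}"
            cost = count_tokens(line)
            if not heading_added:
                cost += count_tokens(heading)  # a heading is only paid for once
            if used + cost > token_budget:
                return "\n".join(out), used
            if not heading_added:
                out.append(heading)
                heading_added = True
            out.append(line)
            used += cost
    return "\n".join(out), used


sections = [
    ("## Known facts", ["Northwind Hosting costs 50 euro per month", "dana likes Rust"]),
    ("## Relevant memories", ["Asked to be contacted on Slack, not email."]),
]
context, used = assemble(sections, token_budget=20)
print(used)
print(context)
```

No model is involved at any step, which is the point of the "deterministic retrieval and formatting" claim above. The SDK call returns the finished block: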
ctx = korely.get_context(
    query="plan infra budget",
    user_id="customer-4812",
    token_budget=800,
)
print(ctx.tokens)  # 642
print(ctx.sources)  # ["fct_b91e", "mem_8f2c1a"]
messages = [
    {"role": "system", "content": f"You are a helpful assistant.\n\n{ctx.context}"},
    {"role": "user", "content": user_message},
]
batch
Maps to POST /v1/batch. Bulk import for migrations: up to 500
memory objects per call (same shape as add), processed
asynchronously. Items count against the memory quota.
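Since each call takes at most 500 items, a larger migration has to be split first. The helper below is a minimal sketch; `chunked` and `MAX_BATCH` are names chosen for the example, and the 1234-item history is made-up data.

```python
from typing import Dict, Iterator, List

MAX_BATCH = 500  # per-call limit stated above


def chunked(items: List[Dict], size: int = MAX_BATCH) -> Iterator[List[Dict]]:
    """Yield consecutive slices of at most `size` items."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(items), size):
        yield items[start:start + size]


history = [{"content": f"note {i}", "user_id": "customer-0001"} for i in range(1234)]
sizes = [len(chunk) for chunk in chunked(history)]
print(sizes)  # [500, 500, 234]
```

Each chunk would then go to `korely.batch(chunk)`, with `korely.batch_status(job.id)` polled per job. A single small job looks like this: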
job = korely.batch([
    {"content": "Prefers async standups over meetings.", "user_id": "customer-0001"},
    {"content": "Renewal date moved to October 1st.", "user_id": "customer-0002"},
])
print(job.status)  # processing
job = korely.batch_status(job.id)
print(job.status)  # completed
print(job.imported)  # 2
add_fact_triple
Maps to POST /v1/facts. Write a typed
(subject, predicate, object) fact directly, skipping extraction, when your
agent already has the structured form. The contradiction check still runs,
and the fact is bi-temporal — pass valid_from for a historical
fact.
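The supersede behaviour can be modelled locally with a deliberately naive rule: a new triple retires any active fact with the same subject and predicate but a different object. That rule is an assumption for illustration only; the real contradiction check runs server-side and this page does not describe its logic. `FactStore`, the `fct_N` ids, and the earlier employer "Initech" are all invented.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Fact:
    id: str
    subject: str
    predicate: str
    object: str
    valid_from: str
    invalid_at: Optional[str] = None
    invalidated: List[str] = field(default_factory=list)


class FactStore:
    """Toy store: facts are retired by setting invalid_at, never removed."""

    def __init__(self) -> None:
        self.facts: List[Fact] = []

    def add_triple(self, subject: str, predicate: str, obj: str, valid_from: str) -> Fact:
        new = Fact(f"fct_{len(self.facts) + 1}", subject, predicate, obj, valid_from)
        for old in self.facts:
            same_slot = old.subject == subject and old.predicate == predicate
            if same_slot and old.invalid_at is None and old.object != obj:
                old.invalid_at = valid_from  # retire the old fact, keep the row
                new.invalidated.append(old.id)
        self.facts.append(new)
        return new

    def active(self) -> List[Fact]:
        return [f for f in self.facts if f.invalid_at is None]


store = FactStore()
store.add_triple("Marco", "works_at", "Initech", "2025-01-01")
latest = store.add_triple("Marco", "works_at", "Acme GmbH", "2026-06-01")
print(latest.invalidated)
print([f.object for f in store.active()])
print(len(store.facts))
```

The old row stays in the store with an `invalid_at`, which is what keeps point-in-time queries answerable. The SDK call for a direct triple write: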
fact = korely.add_fact_triple(
    "Marco",
    "works_at",
    "Acme GmbH",
    user_id="customer-4812",
    subject_type="person",
    valid_from="2026-06-01",
)
print(fact.invalidated)  # ids of any facts this one superseded
get_profile
Maps to GET /v1/profile. The assembled profile of one end user:
the active facts known about them, the end user's own facts first, grouped
by family. Pass as_of for the profile as it stood on a past date.
profile = korely.get_profile(user_id="customer-4812")
print(profile.total)  # 7
print(list(profile.by_family))  # ["places", "work", "preferences"]

# The profile as it stood on March 1st
past = korely.get_profile(user_id="customer-4812", as_of="2026-03-01")
history
Maps to GET /v1/memories/:id/history. The lifecycle of one
memory, keyed on the mem_ id: the events
created, updated, fact_extracted,
and fact_invalidated, each timestamped. (To walk a single
fact's supersede chain, use get_facts with
include_invalidated=True instead.)
h = korely.history("mem_8f2c1a")
for event in h.events:
    print(event.event, event.at)  # created ... / fact_extracted ...
users
Maps to GET /v1/users. The end users you've stored data for,
each with active memory and fact counts. Returns a page: iterable like a
list, with .total for pagination.
page = korely.users()
print(page.total)  # 1
for u in page:
    print(u.user_id, u.memories, u.facts)
Migrating from another memory API? The migration guide maps the request shapes side by side.
Method signatures at a glance
In the signatures below, every parameter listed after the * is keyword-only in
Python and is passed as the second positional object in Node; a trailing ?
marks an optional parameter. Node method names are camelCase versions of the
Python names (get_all becomes getAll,
delete_all becomes deleteAll, and so on).
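The naming rule is mechanical enough to express in a few lines. This sketch applies it to some of the Python method names on this page; apart from `getAll` and `deleteAll`, the Node names it prints are inferred from the stated rule rather than confirmed against the Node SDK.

```python
def snake_to_camel(name: str) -> str:
    """Convert a snake_case name to camelCase: first part as-is, the rest capitalized."""
    head, *rest = name.split("_")
    return head + "".join(part.capitalize() for part in rest)


for python_name in ["get_all", "delete_all", "add_fact_triple", "batch_status", "add"]:
    print(python_name, "->", snake_to_camel(python_name))
```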
Memory
| Method (Python) | Parameters | Returns |
|---|---|---|
| add(content, *, user_id?, agent_id?, run_id?, metadata?) | content: str or list of role/content dicts. Scope with any combination of user_id, agent_id, run_id. metadata: free-form dict stored alongside the memory. | Memory with id, facts, invalidated |
| search(query, *, user_id?, agent_id?, limit?) | query: str, keyword or natural-language. Filters are AND-combined. limit: default 15, max 50. | list of SearchHit with id, score, snippet, user_id, agent_id, metadata |
| get_all(*, user_id?, agent_id?, limit?, offset?) | No query. Lists by recency. limit: default 50. | MemoryPage with memories, total |
| get(id) | id: str, e.g. mem_8f2c1a | Full Memory object, including facts |
| update(id, *, content, expected_updated_at?) | Re-runs extraction on new content. expected_updated_at enables optimistic concurrency. | Updated Memory |
| delete(id) | id: str | DeleteReceipt with status, facts_invalidated, audit_id |
| delete_all(*, user_id) | user_id: str. Erases every memory for one end user. | BulkReceipt with memories_forgotten, facts_invalidated, audit_id |
| history(id) | id: str | MemoryHistory with list of events |
| batch(items) | items: list of add-shaped dicts, max 500 | BatchJob with id, status, received |
| batch_status(id) | id: str, the job id from batch() | BatchJob with status, received, imported, failed, errors |
Facts
| Method (Python) | Parameters | Returns |
|---|---|---|
| get_facts(*, entity?, subject?, predicate?, predicate_family?, user_id?, agent_id?, as_of?, include_invalidated?, limit?) | All filters optional. entity matches either side of the triple. predicate is normalized server-side (the raw verb is returned as predicate_raw); predicate_family is a limited taxonomy, so many predicates map to other; filter by entity when in doubt. as_of: ISO-8601 date string for point-in-time queries. include_invalidated: bool, default false. Works on every tier, hobby included. | flat list of Fact with id, subject, predicate, predicate_raw, object, predicate_family, valid_from, invalid_at, invalidated_by, source_memory_id (a fact is live when invalid_at is null) |
| add_fact_triple(subject, predicate, object, *, user_id?, agent_id?, subject_type?, valid_from?) | Writes a typed triple directly, skipping NLP extraction. Contradiction check still runs. valid_from: ISO-8601 for historical back-dating. | Fact with invalidated list |
Context and profile
| Method (Python) | Parameters | Returns |
|---|---|---|
| get_context(*, query, user_id?, agent_id?, token_budget?) | query: the current user turn or topic. token_budget: default 800. Assembly is deterministic retrieval, no generation. | Context with context (str ready for system prompt), tokens, sources |
| get_profile(*, user_id, as_of?) | user_id: required. as_of: ISO-8601 for a historical snapshot. | Profile with total, by_family dict |
Admin
| Method (Python) | Parameters | Returns |
|---|---|---|
| users(*, limit?, offset?) | Lists end users you have stored data for. | UserPage with users, total |
End-to-end example
A support bot that remembers each customer across sessions. On every turn it
pulls a prompt-ready context block, calls the LLM, then stores what the user
said. Three calls per turn: get_context, the LLM call
(model.generate_content in this example), and add.
"""Support bot with persistent per-customer memory.

Uses korely-memory for context recall and fact storage.
"""
import os

import google.generativeai as genai
from korely_memory import Korely, APIError

korely = Korely(api_key=os.environ["KORELY_API_KEY"])
genai.configure(api_key=os.environ["GOOGLE_API_KEY"])
model = genai.GenerativeModel("gemini-2.0-flash")

AGENT = "support-bot"


def chat(user_id: str, user_message: str) -> str:
    # 1. Assemble memory context for this user + query
    ctx = korely.get_context(
        query=user_message,
        agent_id=AGENT,
        user_id=user_id,
        token_budget=600,
    )
    system = (
        "You are a concise support assistant. "
        "Use the memory context below to personalise your reply.\n\n"
        + ctx.context
    )

    # 2. Call the LLM
    reply = model.generate_content(
        [{"role": "user", "parts": [system + "\n\nUser: " + user_message]}]
    ).text

    # 3. Persist what the user said (extracts facts, detects contradictions)
    try:
        korely.add(
            user_message,
            agent_id=AGENT,
            user_id=user_id,
            metadata={"turn": "user"},
        )
    except APIError as err:
        if err.code != "quota_exceeded":
            raise
        pass  # over the write quota: log and continue, the reply was already generated

    return reply


# --- run a two-turn session ---
uid = "customer-4812"

print(chat(uid, "Hi, I am on the Developer plan and I prefer async updates."))
# context is empty on first turn: the bot greets + confirms

print(chat(uid, "What plan am I on again?"))
# second turn: context block contains the plan + preference facts from turn 1
# → bot answers correctly without the user repeating themselves
What happens inside add on turn 1: Korely extracts two typed
facts — (customer-4812, subscribed_to, Developer plan) and
(customer-4812, likes, async updates) (the raw verb "prefers"
is normalized to likes, and kept verbatim in
predicate_raw) — and stores them alongside the full memory
text. Extraction runs server-side and is asynchronous, so
memory.facts may be empty on the immediate response and fill in
a moment later. On turn 2, get_context retrieves both: the
context block your system prompt receives contains the plan and preference
before the LLM sees the query. No prompt engineering required beyond
dropping in ctx.context.
One agent, unlimited customers. The same
agent_id="support-bot" can serve thousands of
user_id values. Quotas count writes and queries, never the
number of end users you remember.
Scoping
Three identifiers, three levels of scope. They are the same parameters everywhere: SDK, REST, and CLI.
| Param | What it identifies | Example |
|---|---|---|
| agent_id | Your application or agent. One namespace per product surface. | "support-bot" |
| user_id | Your end user. Free-form string, you choose the identifier. Scopes memory to one person your agent serves. | "customer-4812" |
| run_id | One session or agent run. Sub-scope inside a user. | "session-2026-06-11" |
End users are unlimited on every tier. One support-bot
agent can remember thousands of distinct customers; quotas count memories
and queries, never people. A typical product setup is one
agent_id per surface and one user_id per
customer:
# Write: scoped to this customer
korely.add(
    "Asked to be contacted on Slack, not email.",
    agent_id="support-bot",
    user_id="customer-4812",
    run_id="session-2026-06-11",
)

# Read: only this customer's memory comes back
results = korely.search("contact preference", user_id="customer-4812")
Always pass user_id on reads in multi-tenant
products. Filters are additive (AND). A search without
user_id spans every end user in the namespace, which is
what you want for an internal ops agent and not what you want inside a
customer-facing chat.
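The AND semantics, and what happens when `user_id` is left out, can be shown with a local filter over toy records. This is only a model of the behaviour described above: `in_scope`, `select`, and the three records are invented for the example, and real filtering happens server-side.

```python
from typing import Dict, List, Optional


def in_scope(record: Dict[str, str], **filters: Optional[str]) -> bool:
    """True when the record matches every filter that was actually supplied."""
    return all(
        record.get(key) == value
        for key, value in filters.items()
        if value is not None  # an omitted filter does not narrow the scope
    )


# Toy records, not real data.
MEMORIES = [
    {"id": "mem_1", "agent_id": "support-bot", "user_id": "customer-4812"},
    {"id": "mem_2", "agent_id": "support-bot", "user_id": "customer-0001"},
    {"id": "mem_3", "agent_id": "ops-bot", "user_id": "customer-4812"},
]


def select(**filters: Optional[str]) -> List[str]:
    return [m["id"] for m in MEMORIES if in_scope(m, **filters)]


print(select(user_id="customer-4812"))
print(select(agent_id="support-bot", user_id="customer-4812"))
print(select(agent_id="support-bot"))  # no user_id: spans every end user of that agent
```

The last call is the multi-tenant hazard: it returns another customer's record alongside the one you meant.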
Error handling
Every error response carries the same envelope —
{"code": "...", "message": "..."} — and the SDK
surfaces it as a single APIError exception with
.status, .code, and .message.
Branch on err.code to handle each case.
| Status | code | When |
|---|---|---|
| 401 | invalid_key | Missing, malformed, or revoked API key. Message: "Invalid or missing API key". |
| 404 | not_found | Memory id does not exist, or was forgotten. Message: "Memory not found". |
| 422 | invalid_request | Validation failure, e.g. "content: Field required". |
| 429 | quota_exceeded | Monthly write limit reached. The write-quota 429 has no Retry-After; only the per-second rate-limit 429 carries a Retry-After header in integer seconds. |
from korely_memory import Korely, APIError
korely = Korely(api_key="kor_live_...")
try:
    memory = korely.get("mem_8f2c1a")
except APIError as err:
    if err.code == "invalid_key":
        # 401: check the key, or rotate it in the dashboard
        raise
    elif err.code == "not_found":
        # 404: the memory does not exist, or an end user forgot it
        memory = None
    elif err.code == "quota_exceeded":
        # 429: monthly write limit reached; upgrade or wait for the reset
        memory = None
    else:
        raise
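The two kinds of 429 call for different reactions: a monthly quota will not clear in a few seconds, while a rate limit will. The pure function below sketches that decision. It takes raw values because this page does not say how the SDK exposes the Retry-After header, and the code string `"rate_limited"` used in the examples is a placeholder, since only `quota_exceeded` is documented.

```python
from typing import Optional


def retry_delay(status: int, code: str, retry_after: Optional[str]) -> Optional[int]:
    """Return seconds to wait before retrying, or None when a retry cannot help."""
    if status != 429:
        return None
    if code == "quota_exceeded":
        return None  # monthly quota: waiting a few seconds changes nothing
    if retry_after is None:
        return None
    try:
        return max(0, int(retry_after))  # header is documented as integer seconds
    except ValueError:
        return None  # unparseable header: do not guess a delay


print(retry_delay(429, "quota_exceeded", None))
print(retry_delay(429, "rate_limited", "2"))  # "rate_limited" is a placeholder code
print(retry_delay(404, "not_found", None))
```

A caller would sleep for the returned number of seconds and retry once, and treat `None` as "surface the error".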
There is no overage billing, ever. At 80% of quota you get an email and a
quota.warning webhook; past 100% there is a +10% soft cap so
a busy day does not break your agent; past that, writes return
429 with code: "quota_exceeded" (surfaced as
APIError) until the month rolls over or you upgrade.
Your bill is always exactly the tier price. See
the API reference for quotas per
tier.
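The thresholds in that paragraph reduce to simple arithmetic. In the sketch below, the state names, the function `quota_state`, and the 10,000-write limit are all invented for illustration, and the exact boundary handling (whether a threshold is inclusive) is an assumption. Integer comparisons avoid floating-point rounding at the edges.

```python
def quota_state(used: int, limit: int) -> str:
    """Classify usage against the 80% warning, 100% limit, and +10% soft cap."""
    if limit <= 0:
        raise ValueError("limit must be positive")
    if used * 10 > limit * 11:
        return "blocked"   # past 110%: writes would return 429 quota_exceeded
    if used > limit:
        return "soft_cap"  # between 100% and 110%: still accepted
    if used * 10 >= limit * 8:
        return "warning"   # 80% or more: email and quota.warning webhook
    return "ok"


LIMIT = 10_000  # hypothetical monthly write quota
for used in (7_999, 8_000, 10_000, 10_001, 11_000, 11_001):
    print(used, quota_state(used, LIMIT))
```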
Related
Explore further by topic:
| Topic | Where to go |
|---|---|
| Full REST contract | API reference — every endpoint, parameter, and response shape. The SDK is a thin wrapper; the reference is the source of truth. |
| CLI surface | CLI reference — same memory, same key, from the terminal. korely context --user-id alice "what do I know about billing?" |
| MCP surface | MCP reference — connect Claude, Cursor, or any MCP-compatible agent to your namespace over OAuth. |
| Bi-temporal facts | Temporal facts — how contradiction detection works, what as_of queries return, and how the fact timeline is stored. |
| End-to-end chatbot | Cookbook: chatbot that remembers — a fuller version of the example above, with conversation history and streaming. |
| Bulk import | Cookbook: bulk import — migrate an existing user history using batch(), with progress tracking and error handling. |
| Surfaces overview | When to use SDK vs CLI vs REST — a decision table for choosing the right surface per use case. |
| Migration from Mem0 | Migration guide — request shapes mapped side by side, and what the Korely fact graph adds. |
| Pricing and quotas | Pricing — hobby (free), developer (€19/mo), team (€79/mo), scale (€249/mo). Writes and queries count; end users are always unlimited. |