Surfaces
Korely has one memory and three ways to reach it. Every surface shares the
same account, the same scoping model (user_id /
agent_id / run_id), and the same permission model.
You don't have to pick one, and nothing you write through one surface is
invisible to another.
The three agent surfaces are: the REST API at
https://api.korely.ai/v1 (the foundation everything else
builds on), the SDK (korely-memory for Python
and Node, wraps the REST API), and the CLI (ships inside
the Python package, pipeable into any shell).
Install / connect
Every surface points at the same base URL:
https://api.korely.ai/v1, EU-hosted in Helsinki. Auth is a
single header on every request: Authorization: Bearer kor_live_....
Get a key from the dashboard after signup.
pip install korely-memory  # Python 3.9+; no heavy deps, embeddings and extraction run server-side
Write from anywhere
Any surface
- Your product via SDK
- A cron job via CLI
- A pipeline via REST
One account
Your memory store
- Cloud store (Postgres + pgvector), EU-hosted
- Bi-temporal facts
- One permission model
Read from anywhere
Every other surface
- CLI writes at 2am, SDK reads at 9am
- Your app writes a fact, your agent recalls it
Capability matrix
| Operation | CLI | SDK / REST |
|---|---|---|
| Search memories | ✓ | ✓ |
| Read a memory, related items | ✓ | ✓ |
| Typed facts (bi-temporal) ¹ | ✓ | ✓ |
| Write memories ² | ✓ | ✓ |
| Batch import ³ | — | ✓ |
| Point-in-time facts (as_of) | ✓ | ✓ |
| Delete all memories for a user_id | ✓ | ✓ |
| Answer generation ⁴ | — | — |
1. Typed facts, the bi-temporal core, are available on every tier, including the free Hobby tier. Reading facts costs only a query against your quota.
2. The CLI exposes korely add for writing a new memory and korely delete / korely delete-all for removal.
3. Bulk import (POST /v1/batch) is REST/SDK-only. Point-in-time fact queries (as_of) work on every surface, including the CLI (korely facts --as-of). See the API reference.
4. No surface generates answers, by design. Reads are retrieval, not generation: no model composes output on the read path, so your agent's own model does the reasoning and read quotas are an order of magnitude more generous than write quotas. Full detail in Architecture.
Python and Node.js are available now.
CLI
The scripting surface. Search, read, add, and delete memories from any terminal, pipeable into anything. Reach for it when you already know what to pull: cron jobs, CI prep, editor commands, shell pipelines.
$ korely search "pricing decisions" --limit 3
1. Pricing review June score 0.92
2. Plan change approval score 0.87
3. Q2 budget retro score 0.81
$ korely search "open action items" --json > weekly.json
$ jq '.results[0].snippet' weekly.json
"Open actions, week 24"
# Pipe results into any local model
$ korely search "customer feedback" | ollama run llama3.1 "summarize"
Read the CLI deep dive →
Install, every command, --json for scripts, and all flags.
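When a script needs more than jq, the same --json output can be consumed from Python. The sketch below assumes the CLI prints the search response shape listed in the REST endpoints table ({results:[...]} with a snippet per hit). The sample payload, the title and score field names, and the snippets helper are illustrative assumptions, not a documented schema.

```python
import json

# Hypothetical sample of what `korely search ... --json` might print.
# Only `results` and `snippet` come from the docs; `title` and `score` are assumed.
SAMPLE = """
{"results": [
  {"title": "Pricing review June", "score": 0.92, "snippet": "Open actions, week 24"},
  {"title": "Plan change approval", "score": 0.87, "snippet": "Approved the plan change"}
]}
"""


def snippets(raw: str, min_score: float = 0.0) -> list[str]:
    """Return the snippet of every hit whose (assumed) score passes the threshold."""
    payload = json.loads(raw)
    return [
        hit["snippet"]
        for hit in payload.get("results", [])
        if hit.get("score", 0.0) >= min_score
    ]


if __name__ == "__main__":
    for line in snippets(SAMPLE, min_score=0.9):
        print(line)
```

In a pipeline, the same function could read sys.stdin instead of the embedded sample.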
SDK / REST
The product surface. Per-customer memory inside your own application:
developer API keys, user_id scoping for your end users
(unlimited on every tier), agent_id namespaces for your app,
run_id for sessions. One call to write, one call to read.
from korely_memory import Korely
korely = Korely(api_key="kor_live_...") # EU-hosted, Helsinki
# Write: graph + typed-fact extraction included in the call
korely.add(
    "Prefers invoices as PDF, replies fastest before 10am CET",
    user_id="customer-4812",  # your end user, unlimited on every tier
    agent_id="billing-bot",   # your app's namespace
)
# Recall: fact-assembled context block, prompt-ready (the moat)
ctx = korely.get_context(query="invoice preferences", user_id="customer-4812")
print(ctx.context)  # active typed facts + relevant memories, no AI on the read path
Read the SDK deep dive →
Client setup, scoping, facts, and the full REST API reference behind it.
Which surface should I use?
- You are scripting or automating (cron, CI, shell pipelines, editor commands): use the CLI. Composable, --json output, no SDK dependency in the pipeline.
- You are building a product with memory per customer: use the SDK. API keys, user_id scoping, quotas.
- You want direct HTTP control (any language, any runtime): call the REST API directly. Same auth header, same scoping model.
| Use case | Best surface | Why |
|---|---|---|
| Per-customer memory inside your product | SDK / REST | user_id scoping, API keys, quotas — end users unlimited on every tier. |
| Pipe memory into CI / cron scripts | CLI | Composable, --json output, no SDK dependency in the pipeline. |
| A support bot serving 10,000 customers | SDK / REST | One agent_id, 10,000 user_id values. Filters are additive. |
| Automation workflows in n8n | SDK / REST | HTTP calls, full scoping model, nothing to host yourself. |
| Audit or correct what an agent remembered about me | Memory Panel (in-app) | Human-in-the-loop: see, edit, forget any fact. |
| Migrating off Mem0 | SDK / REST | Same user_id / agent_id / run_id scoping. See the API reference. |
Use them together
The surfaces aren't competing options. They are entry points into the same
account. A setup we run ourselves: the SDK embedded in a product so each
customer gets their own memory under their own user_id, the
CLI in a nightly cron that adds a digest memory, and the REST API hit
directly from a webhook handler. Three surfaces, one memory store. The
memory the cron job writes at 2am is in the agent's search results at 9am.
One scoping model everywhere. user_id is
your end user's identifier (free-form, e.g. "customer-4812",
unlimited on every tier). agent_id is your application's
namespace (e.g. "support-bot"). run_id is one
session. The same three keys mean the same thing on every surface. Full
detail in Memory model.
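The way the three keys narrow a read can be pictured with a small in-memory model. This is only a sketch of the semantics, assuming "additive" means each supplied key further restricts the result (a logical AND). The in_scope helper and the sample records are hypothetical; the real filtering happens server-side.

```python
from typing import Optional

# In-memory stand-in for stored memories; the real store lives server-side.
MEMORIES = [
    {"content": "Prefers PDF invoices", "user_id": "customer-4812",
     "agent_id": "billing-bot", "run_id": "s1"},
    {"content": "Asked about refunds", "user_id": "customer-4812",
     "agent_id": "support-bot", "run_id": "s2"},
    {"content": "Upgraded to Team", "user_id": "customer-9001",
     "agent_id": "support-bot", "run_id": "s3"},
]


def in_scope(memories, user_id: Optional[str] = None,
             agent_id: Optional[str] = None, run_id: Optional[str] = None):
    """Keep memories matching every scope key that was supplied (assumed AND semantics)."""
    wanted = {"user_id": user_id, "agent_id": agent_id, "run_id": run_id}
    active = {key: value for key, value in wanted.items() if value is not None}
    return [m for m in memories if all(m.get(k) == v for k, v in active.items())]


if __name__ == "__main__":
    # One agent namespace, many end users: adding user_id narrows the agent-wide view.
    print(len(in_scope(MEMORIES, agent_id="support-bot")))
    print(in_scope(MEMORIES, agent_id="support-bot", user_id="customer-4812"))
```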
Why reads feel free. Most read operations are pure SQL lookups, zero AI calls. Search embeds the query (a fraction of a hundredth of a cent) and retrieves by semantic vector similarity (cosine over embeddings). Facts reads are deterministic and typically return in under 50 ms. The intelligence runs on the write path: embeddings, entity extraction, typed-fact extraction with contradiction checking, about a tenth of a cent per memory, all included. Details in Architecture.
REST endpoints
Every SDK method and CLI subcommand maps to exactly one endpoint. The base
URL is https://api.korely.ai/v1. Auth is always
Authorization: Bearer kor_live_....
| Method | Path | What it does |
|---|---|---|
| POST | /memories | Write a memory. Runs the full pipeline: embed, entity extract, fact extract with contradiction check. |
| GET | /memories | List memories in a scope, newest first. Filters: user_id, agent_id, limit, offset. |
| GET | /memories/{id} | Full content, metadata, and extracted facts for one memory. |
| PATCH | /memories/{id} | Update content. Re-runs extraction. Pass expected_updated_at for optimistic concurrency (409 on stale write). |
| DELETE | /memories/{id} | Forget one memory. Audited invalidation, not a hard delete. Returns an audit id. |
| GET | /memories/{id}/history | Lifecycle of one memory: created, updated, forgotten, plus every fact it produced. |
| POST | /memories/search | Semantic vector search (cosine over embeddings). Returns {results:[...]} with a snippet per hit. Body: query, user_id, agent_id, limit. |
| GET | /facts | Typed (subject, predicate, object) triples, the bi-temporal core. Filters: entity, subject, predicate, predicate_family, as_of, include_invalidated, limit. Available on every tier. Returns flat JSON {facts:[...],total}. |
| POST | /facts | Write a typed fact directly. Contradiction check still runs. Bi-temporal: pass valid_from for historical facts. |
| GET | /context | The moat recall path: a fact-assembled, prompt-ready context block within a token budget. Query: query, user_id, agent_id, token_budget (default 800). |
| POST | /batch | Bulk import up to 500 memory objects, processed async. Returns {id, status, received}. |
| GET | /batch/{id} | Status and result counts for a batch job: {id, status, received, imported, failed, errors}. |
| GET | /users | End users with stored data, as objects {user_id, memories, facts, last_active}. Paginated. |
| DELETE | /users/{end_user}/memories | GDPR bulk forget: every memory and fact for one end user, one audit record. Returns 200 + {user_id, memories_forgotten, facts_invalidated, audit_id}. |
| GET | /agents | Agent namespaces tied to the key, with memory counts. |
| DELETE | /agents/{id} | Delete an agent namespace and all its memories. |
| GET | /ping | Auth check. Requires the auth header like every endpoint. Returns {"ok":true,"tier":"...","scopes":[...]}. |
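For callers skipping the SDK, a request against the table above can be assembled with the Python standard library alone. The sketch below builds, but does not send, a POST /memories/search request. The build_search_request helper is hypothetical, and sending the body as JSON with a Content-Type: application/json header is an assumption the page does not state explicitly.

```python
import json
import urllib.request

BASE_URL = "https://api.korely.ai/v1"


def build_search_request(api_key: str, query: str, user_id: str,
                         limit: int = 5) -> urllib.request.Request:
    """Assemble a search request using the body fields listed in the endpoint table."""
    body = json.dumps({"query": query, "user_id": user_id, "limit": limit})
    return urllib.request.Request(
        f"{BASE_URL}/memories/search",
        data=body.encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",  # assumed, not stated in the docs
        },
    )


if __name__ == "__main__":
    req = build_search_request("kor_live_...", "invoice preferences", "customer-4812")
    print(req.get_method(), req.full_url)
    # Sending it would be urllib.request.urlopen(req); that needs a real key and network access.
```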
SDK method reference
Python uses snake_case; Node.js uses camelCase
for the same methods. The table below gives the Python name first; the Node.js
equivalent follows after a slash where it differs.
| Method (Python / Node.js) | Signature (key params) | REST |
|---|---|---|
| add() | content, *, user_id, agent_id, run_id, metadata | POST /memories |
| search() | query, user_id, agent_id, limit | POST /memories/search |
| get_all() / getAll() | user_id, agent_id, limit, offset | GET /memories |
| get(id) | id | GET /memories/{id} |
| update(id, ...) | id, content, expected_updated_at | PATCH /memories/{id} |
| delete(id) | id | DELETE /memories/{id} |
| delete_all() / deleteAll() | user_id | DELETE /users/{end_user}/memories |
| history(id) | id | GET /memories/{id}/history |
| get_facts() / getFacts() | entity, subject, predicate, predicate_family, as_of, include_invalidated, limit | GET /facts |
| add_fact_triple() / addFactTriple() | subject, predicate, object, *, user_id, agent_id, subject_type, valid_from | POST /facts |
| get_context() / getContext() | query, user_id, agent_id, token_budget | GET /context |
| get_profile() / getProfile() | user_id, as_of | GET /profile |
| batch() | items: list of add-shaped objects | POST /batch |
| batch_status() / batchStatus() | job_id | GET /batch/{id} |
| users() | agent_id, limit, offset | GET /users |
Every method is documented with examples in the SDK deep dive. The REST shapes are in the API reference.
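The Python-to-Node.js naming rule in the table is mechanical, which helps when porting snippets between the two SDKs. The to_camel helper below is a hypothetical illustration of that rule, not part of either package.

```python
def to_camel(name: str) -> str:
    """Convert a snake_case method name to the camelCase form used in the table."""
    head, *rest = name.split("_")
    return head + "".join(part.capitalize() for part in rest)


if __name__ == "__main__":
    for method in ("add", "get_all", "add_fact_triple", "batch_status"):
        print(f"{method} -> {to_camel(method)}")
```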
CLI command reference
The korely command ships inside the Python package. All
subcommands accept --user-id, --agent-id, and
--json unless noted. The facts subcommand has
additional filter flags.
| Command | Key flags | One-line example |
|---|---|---|
| korely auth | --api-key | korely auth (verifies the key and prints the base URL and end-user count) |
| korely add "..." | --user-id --agent-id --run-id | korely add "Prefers PDF invoices" --user-id customer-4812 |
| korely search "..." | --user-id --agent-id --limit --json | korely search "invoice prefs" --user-id customer-4812 --limit 5 |
| korely context "..." | --user-id --agent-id --json | korely context "budget decisions" --user-id alice \| ollama run llama3.1 "summarize" |
| korely facts | --user-id --agent-id --entity --subject --family --as-of --include-invalidated --limit --json | korely facts --user-id customer-4812 --family preferences --json |
| korely profile | --user-id --json | korely profile --user-id customer-4812 |
| korely get <id> | --json | korely get mem_8f2c1a |
| korely users | --agent-id --limit --json | korely users --json \| jq '.users[].user_id' |
| korely delete <id> | --json | korely delete mem_8f2c1a |
| korely delete-all | --user-id --yes --json | korely delete-all --user-id customer-4812 --yes |
Every subcommand is documented with all flags and worked examples in the CLI deep dive.
Error codes
Every error response carries the same flat envelope:
{"code":"<slug>","message":"<text>"} — never an
error or detail field. The SDK raises a typed
exception per status (with APIError as the base for anything
else); the CLI exits non-zero and prints the error JSON to stderr.
| Status | Code | SDK exception | When it happens |
|---|---|---|---|
| 401 | invalid_key | AuthenticationError | Missing, malformed, or revoked API key. |
| 403 | agent_cap_exceeded | NamespaceForbiddenError | A new agent_id would exceed your plan's agent limit. |
| 404 | not_found | NotFoundError | Memory or fact id does not exist, or was forgotten. |
| 409 | stale_write | StaleWriteError | update with an expected_updated_at older than the current record. |
| 422 | invalid_request | APIError | Request body failed schema validation. Message is flat, e.g. "content: Field required". |
| 429 | quota_exceeded | QuotaExceededError | Monthly write or query quota exceeded the +10% grace period. The write-quota 429 carries no Retry-After; only a rate-limit 429 sends Retry-After (integer seconds, header). |
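For code that calls the REST API without the SDK, the table can be mirrored as a small status-to-exception map over the flat envelope. The classes below are local stand-ins that reuse the names from the table; they are not imported from korely_memory. The error_from_response helper and the sample message text are hypothetical.

```python
import json


class APIError(Exception):
    """Local stand-in for the SDK's base error; carries the flat envelope fields."""

    def __init__(self, status: int, code: str, message: str):
        super().__init__(f"{status} {code}: {message}")
        self.status = status
        self.code = code
        self.message = message


class AuthenticationError(APIError): pass
class NamespaceForbiddenError(APIError): pass
class NotFoundError(APIError): pass
class StaleWriteError(APIError): pass
class QuotaExceededError(APIError): pass


# Statuses missing from the map (422 included) fall back to APIError, as in the table.
STATUS_TO_ERROR = {
    401: AuthenticationError,
    403: NamespaceForbiddenError,
    404: NotFoundError,
    409: StaleWriteError,
    429: QuotaExceededError,
}


def error_from_response(status: int, body: str) -> APIError:
    """Parse {"code": ..., "message": ...} and pick the exception class by status."""
    envelope = json.loads(body)
    cls = STATUS_TO_ERROR.get(status, APIError)
    return cls(status, envelope["code"], envelope["message"])


if __name__ == "__main__":
    sample = '{"code": "stale_write", "message": "record changed"}'  # hypothetical message
    err = error_from_response(409, sample)
    print(type(err).__name__, err.code)
```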
There is no overage billing. At 80% of quota you receive an email and a
quota.warning webhook. Past 100%, writes and searches that
exceed the soft cap (10% above the tier limit) return 429 until the month
rolls over or you upgrade. Your bill is always exactly the tier price.
End-to-end example
A customer support agent that remembers preferences from one conversation and recalls them in the next. One agent namespace, one end user, three calls total.
from korely_memory import Korely
korely = Korely(api_key="kor_live_...", region="eu")
# ── Conversation 1: store what the customer told us ──────────────────────────
memory = korely.add(
    "Prefers invoices as PDF, replies fastest before 10am CET",
    user_id="customer-4812",
    agent_id="support-bot",
    metadata={"source": "support-chat"},
)
print(memory.id)          # mem_8f2c1a
print(len(memory.facts))  # 2, e.g. "prefers PDF invoices", "active before 10am CET"

# ── Conversation 2 (next day): assemble context before the model call ────────
ctx = korely.get_context(
    query="billing preferences",
    user_id="customer-4812",
    agent_id="support-bot",
    token_budget=400,
)
messages = [
    {"role": "system", "content": f"You are a helpful billing assistant.\n\n{ctx.context}"},
    {"role": "user", "content": "Can you send me the invoice?"},
]
# → model sees: "customer-4812 prefers invoices as PDF (from 2026-06-11)"
# → model replies: "I'll send the invoice to you as a PDF right away."

# ── GDPR: customer asks to be forgotten ─────────────────────────────────────
receipt = korely.delete_all(user_id="customer-4812")
print(receipt.memories_forgotten)  # 1
print(receipt.facts_invalidated)   # 2
print(receipt.audit_id)            # aud_3d0f
Pricing
Four tiers, all priced in EUR. Quotas reset monthly. End users
(user_id values) are unlimited on every tier.
| Tier | Price | Write quota | Query quota | Agent cap |
|---|---|---|---|---|
| Hobby | Free | 1,000 writes/mo | 25,000 queries/mo | 2 agents |
| Developer | €19/mo | 5,000 writes/mo | 250,000 queries/mo | 10 agents |
| Team | €79/mo | 25,000 writes/mo | 1,000,000 queries/mo | 100 agents |
| Scale | €249/mo | 75,000 writes/mo | 10,000,000 queries/mo | 500 agents |
No overage billing. A soft +10% grace period applies past the monthly quota
limit; requests that exceed that cap return 429. Typed facts
(get_facts, add_fact_triple, as_of
queries) — the bi-temporal core — are available on every tier, including
the free Hobby tier. Reading them costs only a query against your quota.
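The quota rules reduce to two thresholds per tier: a warning at 80% of the limit and a soft cap 10% above it. The sketch below works them out for the write quotas in the table. The quota_state helper and its state names are hypothetical, and the exact boundary handling (whether hitting the cap exactly is still allowed) is an assumption.

```python
# Monthly write quotas copied from the pricing table.
WRITE_QUOTAS = {"Hobby": 1_000, "Developer": 5_000, "Team": 25_000, "Scale": 75_000}


def thresholds(limit: int) -> dict:
    """Warning point (80% of the limit) and soft cap (limit + 10%), in whole requests."""
    return {"warning_at": limit * 80 // 100, "soft_cap": limit * 110 // 100}


def quota_state(used: int, limit: int) -> str:
    """Classify usage; boundary handling here is an assumption, not documented behavior."""
    t = thresholds(limit)
    if used > t["soft_cap"]:
        return "blocked"  # further requests would return 429
    if used > limit:
        return "grace"    # inside the +10% grace period
    if used >= t["warning_at"]:
        return "warning"  # email and quota.warning webhook territory
    return "ok"


if __name__ == "__main__":
    for tier, limit in WRITE_QUOTAS.items():
        print(tier, thresholds(limit))
```

For the Developer tier this gives a warning at 4,000 writes and a soft cap of 5,500.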
Related
- SDK deep dive: every method with full signatures, scoping rules, error handling, and worked examples.
- CLI deep dive: every subcommand with all flags, JSON output, and shell pipeline patterns.
- API reference: the full REST contract, request and response shapes, endpoint by endpoint.
- Memory model: how user_id, agent_id, and run_id scoping works across all surfaces.
- Temporal facts: the bi-temporal model behind get_facts and as_of.
- Architecture: why writes cost and reads are nearly free.