Migrate from Zep
If you built on Zep, the concepts transfer almost one to one. A Zep user
becomes a user_id. A thread becomes a run_id.
Graph search becomes memory search plus a typed facts read. And the part
that usually makes migrations lossy is not lossy here: both systems are
bi-temporal, so the idea that a fact has a validity interval (it became true
at one point and stopped being true at another) carries over intact. We
cite the Zep team's paper on our
research page; the two temporal models
are close relatives.
The short version: three steps. Export your episodes
per user with graph.episode.get_by_user_id, import them
with POST /v1/batch, then run a search and a facts read
against the migrated corpus. Entity extraction, graph edges, and typed
facts rebuild automatically on ingest. There is nothing to enable and
nothing to configure.
Concept mapping
| Zep concept | Korely equivalent |
|---|---|
| User, created with user.add | user_id: free-form string, no registration call, end users are unlimited on every tier |
| Thread, created with thread.create | run_id: session scope, no create call, pass it on writes |
| Episode: raw data ingested verbatim | A memory: POST /v1/memories |
| Graph edge carrying a fact | Typed (subject, predicate, object) fact: GET /v1/facts |
| valid_at / invalid_at on an edge | valid_from / invalid_at on a fact |
| Context block from thread.get_user_context | Prompt-ready block from GET /v1/context |
One structural difference: Zep asks you to register the user and create the thread before adding messages. Korely scoping identifiers are free-form strings that exist the moment you first write with them, so two of your setup calls simply disappear.
Call mapping
Every core Zep Cloud call has a Korely equivalent, in the Python SDK and over REST. The full REST contract is in the API reference.
| Zep call | Korely SDK | Korely REST |
|---|---|---|
| client.user.add(user_id="u1") | not needed: users exist on first write | n/a |
| client.thread.create(thread_id="t1", user_id="u1") | not needed: pass run_id="t1" on writes | n/a |
| client.thread.add_messages(thread_id, messages=[...]) | korely.add("...", user_id="u1", run_id="t1") | POST /v1/memories |
| client.graph.add(user_id="u1", type="text", data="...") | korely.add("...", user_id="u1") | POST /v1/memories |
| client.thread.get_user_context(thread_id) | korely.get_context(query="...", user_id="u1", token_budget=800) | GET /v1/context |
| client.graph.search(query, user_id="u1", scope="episodes") | korely.search(query, user_id="u1") | POST /v1/memories/search |
| client.graph.search(query, user_id="u1", scope="edges") | korely.get_facts(entity="...") | GET /v1/facts |
| client.graph.edge.get_by_user_id(user_id="u1") | korely.get_facts(include_invalidated=True) | GET /v1/facts?include_invalidated=true |
| client.graph.episode.get_by_user_id(user_id="u1", lastn=50) | korely.get_all(user_id="u1") | GET /v1/memories?user_id=u1 |
| client.user.delete(user_id="u1") | korely.delete_all(user_id="u1") | DELETE /v1/users/:user_id/memories |
Node-neighborhood exploration, the scope="nodes" style of
query, maps to GET /v1/facts?entity=... (SDK
korely.get_facts(entity="...")): you read every typed fact
touching an entity, as pure SQL lookups with zero model calls. That is the
deterministic graph read. For prompt-ready recall, reach for
GET /v1/context, which assembles the entity's active typed
facts and the most relevant memories into one block. See
the graph.
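As a rough sketch of the deterministic graph read, the request URL can be built with the standard library alone. The host is the one used in this guide's curl examples; the helper name `facts_url` is hypothetical, and only the `entity` and `include_invalidated` query parameters from the mapping table are assumed to exist.

```python
from urllib.parse import urlencode

BASE_URL = "https://api.korely.ai"  # host as used in this guide's curl examples


def facts_url(entity, include_invalidated=False):
    """Build a GET /v1/facts URL for one entity (hypothetical helper)."""
    params = {"entity": entity}
    if include_invalidated:
        # Serialized as the lowercase string shown in the mapping table.
        params["include_invalidated"] = "true"
    return f"{BASE_URL}/v1/facts?{urlencode(params)}"


print(facts_url("invoices", include_invalidated=True))
```

Building the query string with `urlencode` keeps entity names that contain spaces or punctuation safely escaped.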
In code, the switch looks like this:
# before
from zep_cloud.client import Zep
from zep_cloud.types import Message
zep = Zep(api_key="z_...")
zep.user.add(user_id="customer-4812")
zep.thread.create(thread_id="support-118", user_id="customer-4812")
zep.thread.add_messages("support-118", messages=[
    Message(role="user", content="Prefers invoices as PDF"),
])
# after
from korely_memory import Korely
korely = Korely(api_key="kor_live_...")
korely.add("Prefers invoices as PDF",
           user_id="customer-4812", run_id="support-118")
results = korely.search("invoice preferences", user_id="customer-4812")
The Python and Node SDKs are live: pip install korely-memory
and npm install korely-memory. API keys are available now on
the free Hobby plan.
Gotchas
- Backdating: REST yes, SDK add() no. Zep lets you backdate an episode by passing created_at to the graph add call. The Korely SDK add() convenience method does not expose a timestamp, but the underlying POST /v1/memories accepts a timestamp field that is stored as the fact's valid_from. If your history order matters, you can either backdate per memory over REST, or replay episodes sorted oldest-first through POST /v1/batch (no per-item timestamp there): write-time contradiction checking processes them in arrival order, so chronological replay is enough to reconstruct the supersede chain. Alternatively, use POST /v1/facts with an explicit valid_from for structured triples you already hold.
- No include_history on search. Zep's search accepts a scope parameter that toggles between episodes, edges, and nodes. Korely keeps retrieval and facts separate: POST /v1/memories/search returns memories; GET /v1/facts returns typed triples. Pass include_invalidated=true to the facts endpoint for the full temporal history including superseded edges.
- Agent cap vs user cap. Zep bills by user count. Korely bills by write quota and agent count. End users (user_id) are unlimited on every plan. Agents (agent_id) are capped per plan: 2 on Hobby, 10 on Developer, 100 on Team, 500 on Scale. Exceeding the agent cap returns 403 agent_cap_exceeded. If you fan out one agent per end user in Zep today, map those to a single agent_id plus per-user user_id scoping instead.
- Write quota, not request quota. The 429 quota_exceeded response triggers when your monthly write count passes the plan limit. The body is the standard {"code": "quota_exceeded", "message": "..."} envelope; the write-quota 429 carries no Retry-After header (only the separate rate-limit 429 does, as integer seconds). Read calls (search, get_context, get_facts, history) do not count against writes.
- Graph rebuilds automatically; nothing to configure. In Zep you pick node types and configure edge schemas. In Korely, entity extraction, typed-fact derivation, and bi-temporal contradiction checking all run at write time without schema setup. If you import raw episode text, the graph and fact store materialize from it.
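The two kinds of 429 described above call for different handling, so a client may want to tell them apart before deciding whether to retry. The following is a minimal sketch under that description only: the function name `classify_429` and the returned labels are hypothetical, and it takes the already-parsed headers and JSON body rather than any particular HTTP library's response object.

```python
def classify_429(headers, body):
    """Classify a 429 response as a write-quota or a rate-limit error.

    headers: mapping of response headers (use a case-insensitive mapping
             such as the one your HTTP client provides, if available).
    body:    parsed JSON error envelope, e.g. {"code": ..., "message": ...}.
    Returns a (kind, retry_seconds) tuple.
    """
    if body.get("code") == "quota_exceeded":
        # Monthly write quota: no Retry-After header, so waiting a few
        # seconds will not help. Surface this to the operator instead.
        return ("write_quota", None)
    retry_after = headers.get("Retry-After")
    if retry_after is not None:
        # Rate limit: Retry-After is given as integer seconds.
        return ("rate_limit", int(retry_after))
    # Neither signal present; how to treat this case is an assumption.
    return ("unknown", None)


print(classify_429({}, {"code": "quota_exceeded", "message": "..."}))
print(classify_429({"Retry-After": "3"}, {}))
```

During a large import, a `write_quota` result is a signal to stop the batch loop, while a `rate_limit` result is safe to sleep on and retry.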
Step 1: export from Zep
Episodes are the raw data Zep ingested verbatim, so they are the right
thing to replay. List them per user with
graph.episode.get_by_user_id. Each episode carries its
content and created_at; keep both.
import json
from zep_cloud.client import Zep

zep = Zep(api_key="z_...")
user_ids = ["customer-4812", "customer-5113"]  # your own user list

export = []
for uid in user_ids:
    response = zep.graph.episode.get_by_user_id(user_id=uid, lastn=1000)
    for ep in response.episodes:
        export.append({
            "user_id": uid,
            "content": ep.content,
            "created_at": ep.created_at,
        })

json.dump(export, open("zep_export.json", "w"))
print(len(export), "episodes exported")
If parts of your graph were built from sources you no longer have, also
export the edges: graph.edge.get_by_user_id returns every
edge with its fact sentence, valid_at, and
invalid_at. Step 2 shows how to replay them so the history
survives.
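Because replay order determines how the supersede chain is rebuilt, it can help to sort the export oldest-first before importing. This sketch assumes each exported `created_at` is an ISO 8601 string carrying a timezone (for example a trailing `Z`); that format is an assumption about your export, and the helper names are hypothetical.

```python
from datetime import datetime


def parse_ts(value):
    """Parse an ISO 8601 timestamp; map a trailing 'Z' to +00:00 so that
    Python versions before 3.11 accept it."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def sort_oldest_first(export):
    """Return the exported items ordered by created_at, oldest first."""
    return sorted(export, key=lambda item: parse_ts(item["created_at"]))


# Illustrative items in the same shape as zep_export.json.
export = [
    {"user_id": "customer-4812", "content": "second", "created_at": "2026-05-03T09:00:00Z"},
    {"user_id": "customer-4812", "content": "first", "created_at": "2026-05-02T08:10:00Z"},
]
print([item["content"] for item in sort_oldest_first(export)])
```

Parsing the timestamps rather than comparing raw strings avoids misordering when episodes were recorded with different UTC offsets.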
Step 2: import with the batch endpoint
POST /v1/batch accepts up to 500 memory objects per request,
same shape as POST /v1/memories, processed asynchronously.
You get a job id back and poll it until it completes. Each imported item
runs the full write pipeline: document and chunk embeddings, entity
extraction on our own infrastructure, typed-fact extraction with
contradiction checking and bi-temporal validity. That is about a tenth of a
cent of intelligence per memory, all included in your plan.
import json, requests

export = json.load(open("zep_export.json"))

memories = [
    {
        "content": item["content"],
        "user_id": item["user_id"],
        "metadata": {"source": "zep_export", "created_at": item["created_at"]},
    }
    for item in export
]

# POST /v1/batch takes up to 500 memories per request
for i in range(0, len(memories), 500):
    r = requests.post(
        "https://api.korely.ai/v1/batch",
        headers={"Authorization": "Bearer kor_live_..."},
        json={"memories": memories[i : i + 500]},
    )
    print(r.json())  # {"id": "job_7c20d1", "status": "processing", "received": 500}

Poll the job until it reports completed:

curl https://api.korely.ai/v1/batch/job_7c20d1 \
  -H "Authorization: Bearer kor_live_..."

# 200 OK
{"id": "job_7c20d1", "status": "completed", "received": 500, "imported": 500, "failed": 0, "errors": []}
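In a script, the polling step might look like the sketch below. It only relies on the two status values shown in this guide (`processing` and `completed`); treating any other status as an error is an assumption, as are the helper name `wait_for_job` and its defaults. The HTTP call is injected as `fetch_status` so the loop stays independent of any client library.

```python
import time


def wait_for_job(fetch_status, job_id, interval=2.0, max_attempts=30):
    """Poll a batch job until it reports "completed" and return its body.

    fetch_status(job_id) must return the parsed JSON of
    GET /v1/batch/{job_id}, e.g. via requests.get(...).json().
    """
    for _ in range(max_attempts):
        job = fetch_status(job_id)
        if job["status"] == "completed":
            return job
        if job["status"] != "processing":
            # Only "processing" and "completed" appear in this guide;
            # anything else is treated as unexpected (assumption).
            raise RuntimeError(f"unexpected job status: {job['status']}")
        time.sleep(interval)
    raise TimeoutError(f"job {job_id} still processing after {max_attempts} polls")
```

After the job completes, check `failed` and `errors` in the returned body before moving on to the next batch.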
If your episodes are complete, stop there: extraction rebuilds the graph
and the fact store from the raw data. If you exported edges in step 1,
replay their fact sentences sorted by valid_at,
oldest first. Write-time contradiction checking then reconstructs the
supersede chain in order: the old fact lands, the newer one invalidates
it, and the invalidated version stays queryable as history. Agents that
already hold the structured form can skip extraction entirely and write
triples with POST /v1/facts, which accepts an explicit
valid_from.
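The edge replay described above could be prepared along these lines. The sketch assumes you saved each exported edge as a dict with `user_id`, `fact`, `valid_at`, and `invalid_at` keys, and that timestamps are ISO 8601 strings with a timezone; placing edges with no `valid_at` first is a choice made here, not documented behavior. The helper names are hypothetical.

```python
from datetime import datetime, timezone

# Sort key for edges without a valid_at: treated as oldest (assumption).
EARLIEST = datetime.min.replace(tzinfo=timezone.utc)


def _valid_at(edge):
    raw = edge.get("valid_at")
    if raw is None:
        return EARLIEST
    return datetime.fromisoformat(raw.replace("Z", "+00:00"))


def edges_to_memories(edges):
    """Turn exported edges into memory payloads, ordered oldest first."""
    ordered = sorted(edges, key=_valid_at)
    return [
        {
            "content": edge["fact"],
            "user_id": edge["user_id"],
            "metadata": {"source": "zep_export"},
        }
        for edge in ordered
    ]


# Illustrative data, not taken from a real export.
edges = [
    {"user_id": "customer-4812", "fact": "Prefers invoices as PDF",
     "valid_at": "2026-05-02T08:10:00Z", "invalid_at": None},
    {"user_id": "customer-4812", "fact": "Prefers invoices on paper",
     "valid_at": "2025-11-20T10:00:00Z", "invalid_at": "2026-05-02T08:10:00Z"},
]
print([m["content"] for m in edges_to_memories(edges)])
```

The resulting list has the same shape as the `memories` payload used for the batch import, so it can be sent in chunks the same way.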
Keep the import inspectable. Write migrated memories
into a dedicated folder by setting
"metadata": {"folder": "Agents/ZepImport"} on each item.
Your memory store stays organized and you can audit exactly what came over.
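One way to apply the folder setting is to tag every payload just before sending it. This is a minimal sketch; the helper name `with_import_folder` is hypothetical, and it copies each item so the original export data is left untouched.

```python
def with_import_folder(memories, folder="Agents/ZepImport"):
    """Return copies of the memory payloads with metadata.folder set."""
    tagged = []
    for memory in memories:
        # Copy the metadata so existing keys (source, created_at) survive
        # and the caller's dicts are not mutated.
        metadata = dict(memory.get("metadata", {}))
        metadata["folder"] = folder
        tagged.append({**memory, "metadata": metadata})
    return tagged


memories = [
    {"content": "Prefers invoices as PDF", "user_id": "customer-4812",
     "metadata": {"source": "zep_export"}},
]
print(with_import_folder(memories)[0]["metadata"])
```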
Step 3: verify with context and facts
Where Zep gave you a context block from thread.get_user_context,
Korely gives you GET /v1/context: it assembles the end user's
active typed facts and the most relevant memories into one prompt-ready
block. This is the recall path to verify first, because it is the one your
agent will use in production. No generative model composes the output; it is
fact assembly plus retrieval. Your agent's own model reasons over the block.
curl "https://api.korely.ai/v1/context?query=invoice%20preferences&user_id=customer-4812" \
  -H "Authorization: Bearer kor_live_..."

# 200 OK
{
  "context": "## Known facts\n- customer-4812 likes invoices as PDF (since 2026-05-02)\n\n## Relevant memories\n- Prefers invoices as PDF, replies fastest before 10am CET.",
  "tokens": 34,
  "sources": ["fct_2d4a", "mem_5b80e2"]
}
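A migration check can assert on this response automatically instead of eyeballing it. The sketch below works on the parsed JSON body and uses only the three fields shown in the example response (`context`, `tokens`, `sources`); the function name and the idea of comparing `tokens` against a budget are assumptions for illustration.

```python
def context_problems(response, expected_phrases, token_budget=None):
    """Return a list of problems found in a /v1/context response.

    An empty list means every check passed.
    """
    problems = []
    context = response.get("context", "")
    if not context.strip():
        problems.append("context block is empty")
    for phrase in expected_phrases:
        # Case-insensitive substring match against the assembled block.
        if phrase.lower() not in context.lower():
            problems.append(f"missing expected phrase: {phrase!r}")
    if not response.get("sources"):
        problems.append("no sources listed")
    if token_budget is not None and response.get("tokens", 0) > token_budget:
        problems.append("context exceeds token budget")
    return problems
```

Running it for a handful of representative users, with phrases you know were in their Zep history, gives a quick signal that recall survived the move.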
For raw retrieval, POST /v1/memories/search ranks an end user's
memories by semantic vector similarity (cosine over embeddings). The only
model call on the read path is the query embedding, a fraction of a
hundredth of a cent. The response is a flat results list; each
item carries a snippet (not the full body), a score,
and the scoping ids.
curl -X POST https://api.korely.ai/v1/memories/search \
  -H "Authorization: Bearer kor_live_..." \
  -H "Content-Type: application/json" \
  -d '{"query": "invoice preferences", "user_id": "customer-4812", "limit": 5}'

# 200 OK
{
  "results": [
    {"id": "mem_5b80e2", "score": 0.69, "snippet": "Prefers invoices as PDF, replies fastest before 10am CET.", "user_id": "customer-4812", "agent_id": null, "metadata": {"source": "zep_export"}}
  ]
}
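To confirm that search is actually surfacing migrated data, you could filter the results on the `source` metadata written during import. This is a small sketch over the parsed response; it assumes the `metadata` you sent at import time is echoed back as in the example above, and the helper name and `min_score` threshold are hypothetical.

```python
def migrated_hits(response, min_score=0.0):
    """Return ids of search results that came from the Zep import."""
    hits = []
    for result in response["results"]:
        metadata = result.get("metadata") or {}  # tolerate a null metadata field
        if metadata.get("source") == "zep_export" and result["score"] >= min_score:
            hits.append(result["id"])
    return hits
```

An empty list for a user you know had history in Zep suggests the import for that user should be rechecked.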
Then check the temporal layer the way you would have read edges in Zep.
Where you searched with scope="edges" and read
fact, valid_at, and invalid_at,
you now filter facts and read valid_from and
invalid_at. The response is a flat
{"facts": [...], "total": N} list. A fact is live when
its invalid_at is null. Korely normalizes the
predicate verb, so the raw verb you wrote (prefers) is stored
on predicate_raw while predicate holds the
canonical form (likes):
curl "https://api.korely.ai/v1/facts?entity=invoices&include_invalidated=true" \
  -H "Authorization: Bearer kor_live_..."

# 200 OK
{
  "facts": [
    {"id": "fct_2d4a", "subject": "customer-4812", "predicate": "likes", "predicate_raw": "prefers", "object": "invoices as PDF", "predicate_family": "preferences", "confidence": 0.9, "valid_from": "2026-05-02T08:10:00Z", "invalid_at": null, "invalidated_by": null, "source_memory_id": "mem_5b80e2"}
  ],
  "total": 1
}
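Since a fact is live exactly when its `invalid_at` is null, separating current facts from history in a response with `include_invalidated=true` is a short filter. The sketch below operates on the parsed `facts` list; the function name is hypothetical.

```python
def split_by_liveness(facts):
    """Split facts into (live, superseded) based on invalid_at."""
    live = [fact for fact in facts if fact["invalid_at"] is None]
    superseded = [fact for fact in facts if fact["invalid_at"] is not None]
    return live, superseded


facts = [
    {"id": "fct_2d4a", "subject": "customer-4812", "predicate": "likes",
     "object": "invoices as PDF", "valid_from": "2026-05-02T08:10:00Z",
     "invalid_at": None},
]
live, superseded = split_by_liveness(facts)
print(len(live), len(superseded))
```

Comparing the superseded count against the number of invalidated edges you exported from Zep is a reasonable sanity check on the rebuilt history.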
Point-in-time reads work the way you expect from a bi-temporal store:
pass as_of with an ISO date and you get what was true on
that date. The full model, including the two-stage contradiction check at
write time, is in
temporal facts.
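To make the point-in-time rule concrete, here is a local sketch of the validity check you can run against exported facts when comparing results. It is not the server's implementation: it assumes the interval is half-open (a fact stops being true at `invalid_at` itself) and that a bare date means midnight UTC. Both are assumptions to confirm against the temporal facts documentation.

```python
from datetime import datetime, timezone


def _ts(value):
    """Parse an ISO date or timestamp; naive values are taken as UTC."""
    parsed = datetime.fromisoformat(value.replace("Z", "+00:00"))
    if parsed.tzinfo is None:
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed


def true_at(fact, as_of):
    """Return True if the fact's validity interval covers as_of."""
    moment = _ts(as_of)
    if _ts(fact["valid_from"]) > moment:
        return False  # not yet true at that moment
    # Live facts have no end; otherwise the end bound is exclusive (assumption).
    return fact["invalid_at"] is None or moment < _ts(fact["invalid_at"])


fact = {"valid_from": "2026-05-02T08:10:00Z", "invalid_at": None}
print(true_at(fact, "2026-06-01"), true_at(fact, "2026-05-01"))
```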
What your agents gain
A managed cloud store behind the API
Every memory your agents write lands in a managed cloud store
(Postgres + pgvector, EU-hosted), organized in named folders and
readable through the Korely app. Memory stops being an opaque index you
can only query through an API and becomes a corpus you can open, skim,
and audit. The Agents/ZepImport folder from step 2 is
exactly that: your migrated graph, browsable in the app.
Memory your end users can see
Korely ships a desktop and web app where the human sees the same memory your agents use. The Memory Panel lists every fact; each one can be edited or forgotten, and an Entity Profile drawer shows everything known about one entity. A fact a user erases there does not come back to agents, on any read path, and one call deletes every memory for a user. If agents carry long-term memory about people, the people get the delete button. Read human in the loop for the full model.
In both cases: reads are retrieval, not generation. The intelligence runs once, at write time, which is why read quotas are an order of magnitude more generous than write quotas. Quotas for each plan are on the pricing page.
Next steps
- Quickstart: wire Korely into your stack and run your first search in five minutes.
- API reference: the complete REST contract behind every call in the mapping table.
- Architecture: where the write-time intelligence runs and why the read path stays deterministic.
- Questions about a large graph or an unusual export shape? Email [email protected]. We read every message.