Migrate from Zep

If you built on Zep, the concepts transfer almost one-to-one. A Zep user becomes a user_id. A thread becomes a run_id. Graph search becomes memory search plus a typed facts read. The part that usually makes migrations lossy is not lossy here: both systems are bi-temporal, so the idea that a fact has a validity interval (it became true at one point and stopped being true at another) carries over intact. We cite the Zep team's paper on our research page; the two temporal models are close relatives.

The short version: three steps. Export your episodes per user with graph.episode.get_by_user_id, import them with POST /v1/batch, then run a search and a facts read against the migrated corpus. Entity extraction, graph edges, and typed facts rebuild automatically on ingest. There is nothing to enable and nothing to configure.

Concept mapping

Zep concept	Korely equivalent
User, created with `user.add`	`user_id`: free-form string, no registration call, end users are unlimited on every tier
Thread, created with `thread.create`	`run_id`: session scope, no create call, pass it on writes
Episode: raw data ingested verbatim	A memory: `POST /v1/memories`
Graph edge carrying a `fact`	Typed (subject, predicate, object) fact: `GET /v1/facts`
`valid_at` / `invalid_at` on an edge	`valid_from` / `invalid_at` on a fact
Context block from `thread.get_user_context`	Prompt-ready block from `GET /v1/context`

One structural difference: Zep asks you to register the user and create the thread before adding messages. Korely scoping identifiers are free-form strings that exist the moment you first write with them, so two of your setup calls simply disappear.

Call mapping

Every core Zep Cloud call has a Korely equivalent, in the Python SDK and over REST. The full REST contract is in the API reference.

Zep call	Korely SDK	Korely REST
`client.user.add(user_id="u1")`	not needed: users exist on first write	—
`client.thread.create(thread_id="t1", user_id="u1")`	not needed: pass `run_id="t1"` on writes	—
`client.thread.add_messages(thread_id, messages=[...])`	`korely.add("...", user_id="u1", run_id="t1")`	`POST /v1/memories`
`client.graph.add(user_id="u1", type="text", data="...")`	`korely.add("...", user_id="u1")`	`POST /v1/memories`
`client.thread.get_user_context(thread_id)`	`korely.get_context(query="...", user_id="u1", token_budget=800)`	`GET /v1/context`
`client.graph.search(query, user_id="u1", scope="episodes")`	`korely.search(query, user_id="u1")`	`POST /v1/memories/search`
`client.graph.search(query, user_id="u1", scope="edges")`	`korely.get_facts(entity="...")`	`GET /v1/facts`
`client.graph.edge.get_by_user_id(user_id="u1")`	`korely.get_facts(include_invalidated=True)`	`GET /v1/facts?include_invalidated=true`
`client.graph.episode.get_by_user_id(user_id="u1", lastn=50)`	`korely.get_all(user_id="u1")`	`GET /v1/memories?user_id=u1`
`client.user.delete(user_id="u1")`	`korely.delete_all(user_id="u1")`	`DELETE /v1/users/:user_id/memories`

Node-neighborhood exploration, the scope="nodes" style of query, maps to GET /v1/facts?entity=... (SDK korely.get_facts(entity="...")): you read every typed fact touching an entity, as pure SQL lookups with zero model calls. That is the deterministic graph read. For prompt-ready recall, reach for GET /v1/context, which assembles the entity's active typed facts and the most relevant memories into one block. See the graph.

In code, the switch looks like this:

# before
from zep_cloud.client import Zep
from zep_cloud.types import Message
zep = Zep(api_key="z_...")
zep.user.add(user_id="customer-4812")
zep.thread.create(thread_id="support-118", user_id="customer-4812")
zep.thread.add_messages("support-118", messages=[
    Message(role="user", content="Prefers invoices as PDF"),
])

# after
from korely_memory import Korely
korely = Korely(api_key="kor_live_...")
korely.add("Prefers invoices as PDF",
           user_id="customer-4812", run_id="support-118")

results = korely.search("invoice preferences", user_id="customer-4812")

The Python and Node SDKs are live: pip install korely-memory and npm install korely-memory. API keys are available now on the free Hobby plan.

Gotchas

Backdating: REST yes, SDK add() no. Zep lets you backdate an episode by passing created_at to the graph add call. The Korely SDK add() convenience method does not expose a timestamp, but the underlying POST /v1/memories accepts a timestamp field that is stored as the fact's valid_from. If your history order matters, you can either backdate per memory over REST, or replay episodes sorted oldest-first through POST /v1/batch (no per-item timestamp there): write-time contradiction checking processes them in arrival order, so chronological replay is enough to reconstruct the supersede chain. Alternatively, use POST /v1/facts with an explicit valid_from for structured triples you already hold.
No include_history on search. Zep's search accepts a scope parameter that toggles between episodes, edges, and nodes. Korely keeps retrieval and facts separate: POST /v1/memories/search returns memories; GET /v1/facts returns typed triples. Pass include_invalidated=true to the facts endpoint for the full temporal history including superseded edges.
Agent cap vs user cap. Zep bills by user count. Korely bills by write quota and agent count. End users (user_id) are unlimited on every plan. Agents (agent_id) are capped per plan: 2 on Hobby, 10 on Developer, 100 on Team, 500 on Scale. Exceeding the agent cap returns 403 agent_cap_exceeded. If you fan out one agent per end user in Zep today, map those to a single agent_id plus per-user user_id scoping instead.
Write quota, not request quota. The 429 quota_exceeded response triggers when your monthly write count passes the plan limit. The body is the standard {"code": "quota_exceeded", "message": "..."} envelope; the write-quota 429 carries no Retry-After header (only the separate rate-limit 429 does, as integer seconds). Read calls (search, get_context, get_facts, history) do not count against writes.
Graph rebuilds automatically; nothing to configure. In Zep you pick node types and configure edge schemas. In Korely, entity extraction, typed-fact derivation, and bi-temporal contradiction checking all run at write time without schema setup. If you import raw episode text, the graph and fact store materialize from it.
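
The two kinds of 429 call for different handling, so it can help to tell them apart in one place. The sketch below is a minimal illustration and not part of the SDK: retry_delay_for_429 is a hypothetical helper, it relies only on the behaviour described in the write-quota gotcha above, and the one-second fallback when no Retry-After header is present is our own assumption.

```python
from typing import Mapping, Optional


def retry_delay_for_429(headers: Mapping[str, str],
                        body: Mapping[str, object]) -> Optional[int]:
    """Return seconds to wait before retrying a 429, or None if retrying will not help.

    Hypothetical helper, not a Korely SDK function.
    """
    # Write-quota 429: the monthly write count is used up, so a retry cannot succeed.
    if body.get("code") == "quota_exceeded":
        return None
    # Rate-limit 429: Retry-After is an integer number of seconds.
    # HTTP header names are case-insensitive, so normalise before the lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    retry_after = lowered.get("retry-after")
    if retry_after is None:
        # Assumption: with no header to go on, fall back to a short fixed pause.
        return 1
    return int(retry_after)
```

A caller would sleep for the returned number of seconds and retry, or surface the quota error to an operator when the result is None.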

Step 1: export from Zep

Episodes are the raw data Zep ingested verbatim, so they are the right thing to replay. List them per user with graph.episode.get_by_user_id. Each episode carries its content and created_at; keep both.

import json
from zep_cloud.client import Zep

zep = Zep(api_key="z_...")
user_ids = ["customer-4812", "customer-5113"]  # your own user list

export = []
for uid in user_ids:
    response = zep.graph.episode.get_by_user_id(user_id=uid, lastn=1000)
    for ep in response.episodes:
        export.append({
            "user_id": uid,
            "content": ep.content,
            "created_at": ep.created_at,
        })

# Use a context manager so the file is flushed and closed before step 2 reads it.
with open("zep_export.json", "w") as f:
    json.dump(export, f)
print(len(export), "episodes exported")

If parts of your graph were built from sources you no longer have, also export the edges: graph.edge.get_by_user_id returns every edge with its fact sentence, valid_at, and invalid_at. Step 2 shows how to replay them so the history survives.
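
An edge export can mirror the episode export. The sketch below is an illustration under assumptions: it takes whatever list of edges graph.edge.get_by_user_id gives you (check the exact return shape in your zep_cloud version) and reads only the three fields named above. The helper names and the record layout are ours, not Zep's.

```python
from types import SimpleNamespace  # only used for the stand-in edge in the demo


def _as_text(value):
    """Return None, an ISO string for datetime-like values, or str(value)."""
    if value is None:
        return None
    if hasattr(value, "isoformat"):
        return value.isoformat()
    return str(value)


def edges_to_records(edges, user_id):
    """Flatten Zep edges into plain dicts that json.dump can write."""
    return [
        {
            "user_id": user_id,
            "fact": edge.fact,
            "valid_at": _as_text(edge.valid_at),
            "invalid_at": _as_text(edge.invalid_at),
        }
        for edge in edges
    ]


# Demo with a stand-in object shaped like an edge (made-up data).
demo_edge = SimpleNamespace(fact="Prefers invoices as PDF",
                            valid_at="2026-05-02T08:10:00Z", invalid_at=None)
records = edges_to_records([demo_edge], "customer-4812")
```

In a real export you would call this once per user and dump the combined records to a file next to zep_export.json.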

Step 2: import with the batch endpoint

POST /v1/batch accepts up to 500 memory objects per request, same shape as POST /v1/memories, processed asynchronously. You get a job id back and poll it until it completes. Each imported item runs the full write pipeline: document and chunk embeddings, entity extraction on our own infrastructure, typed-fact extraction with contradiction checking and bi-temporal validity. About a tenth of a cent of intelligence per memory, all included in your plan.

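import json, requests

with open("zep_export.json") as f:
    export = json.load(f)

# /v1/batch has no per-item timestamp, so replay oldest-first:
# write-time contradiction checking follows arrival order.
# (ISO 8601 strings in one consistent format sort chronologically.)
export.sort(key=lambda item: item["created_at"])

memories = [
    {
        "content": item["content"],
        "user_id": item["user_id"],
        "metadata": {"source": "zep_export", "created_at": item["created_at"]},
    }
    for item in export
]

# POST /v1/batch takes up to 500 memories per request
for i in range(0, len(memories), 500):
    r = requests.post(
        "https://api.korely.ai/v1/batch",
        headers={"Authorization": "Bearer kor_live_..."},
        json={"memories": memories[i : i + 500]},
    )
    r.raise_for_status()  # stop on 4xx/5xx, e.g. 429 quota_exceeded
    print(r.json())  # {"id": "job_7c20d1", "status": "processing", "received": 500}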
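
The id in each response is what you poll. In a script, that poll can be a short loop. The sketch below is one way to write it. The two-second interval, the attempt limit, and the choice to treat any status other than "processing" as final are assumptions, not documented behaviour. The fetch function is passed in so the loop can be exercised without network access.

```python
import time


def wait_for_job(job_id, fetch, interval=2.0, max_attempts=150, sleep=time.sleep):
    """Poll a batch job until it leaves the "processing" state.

    fetch(job_id) must return the decoded JSON body of GET /v1/batch/<job_id>.
    """
    for _ in range(max_attempts):
        job = fetch(job_id)
        if job["status"] != "processing":
            return job
        sleep(interval)
    raise TimeoutError(f"batch job {job_id} still processing after {max_attempts} polls")


# Real use with requests (kept as a comment so this block runs offline):
# job = wait_for_job("job_7c20d1", lambda jid: requests.get(
#     f"https://api.korely.ai/v1/batch/{jid}",
#     headers={"Authorization": "Bearer kor_live_..."}).json())

# Offline demo: a stub that reports "processing" twice, then "completed".
_replies = iter([
    {"id": "job_7c20d1", "status": "processing", "received": 500},
    {"id": "job_7c20d1", "status": "processing", "received": 500},
    {"id": "job_7c20d1", "status": "completed", "received": 500,
     "imported": 500, "failed": 0, "errors": []},
])
job = wait_for_job("job_7c20d1", lambda jid: next(_replies), sleep=lambda s: None)
if job["failed"]:
    print("failed items:", job["errors"])
```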

Poll the job until it reports completed:

curl https://api.korely.ai/v1/batch/job_7c20d1 \
  -H "Authorization: Bearer kor_live_..."

# 200 OK
{"id": "job_7c20d1", "status": "completed", "received": 500,
 "imported": 500, "failed": 0, "errors": []}

If your episodes are complete, stop there: extraction rebuilds the graph and the fact store from the raw data. If you exported edges in step 1, replay their fact sentences sorted by valid_at, oldest first. Write-time contradiction checking then reconstructs the supersede chain in order: the old fact lands, the newer one invalidates it, and the invalidated version stays queryable as history. Agents that already hold the structured form can skip extraction entirely and write triples with POST /v1/facts, which accepts an explicit valid_from.
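
Ordering the edge replay is a sort plus the same 500-item chunking used for episodes. The sketch below assumes the edges were saved as dicts with user_id, fact, and valid_at keys (that layout is ours, not Zep's). It places edges without a valid_at last, which is an arbitrary choice you may want to change, and the "zep_edge_export" source label is hypothetical.

```python
from datetime import datetime, timezone


def parse_ts(value):
    """Parse an ISO 8601 timestamp, accepting a trailing 'Z' on older Pythons."""
    dt = datetime.fromisoformat(value.replace("Z", "+00:00"))
    # Treat naive timestamps as UTC so every sort key is comparable.
    return dt if dt.tzinfo else dt.replace(tzinfo=timezone.utc)


def edges_to_batches(edge_records, batch_size=500):
    """Sort edge facts oldest-first and chunk them into /v1/batch payloads."""
    dated = [e for e in edge_records if e.get("valid_at")]
    undated = [e for e in edge_records if not e.get("valid_at")]
    ordered = sorted(dated, key=lambda e: parse_ts(e["valid_at"])) + undated
    memories = [
        {
            "content": e["fact"],
            "user_id": e["user_id"],
            "metadata": {"source": "zep_edge_export", "valid_at": e.get("valid_at")},
        }
        for e in ordered
    ]
    return [{"memories": memories[i:i + batch_size]}
            for i in range(0, len(memories), batch_size)]


# Made-up edges: the older preference must land first so the newer one supersedes it.
edges = [
    {"user_id": "customer-4812", "fact": "Prefers invoices by email",
     "valid_at": "2026-05-02T08:10:00Z"},
    {"user_id": "customer-4812", "fact": "Prefers invoices by post",
     "valid_at": "2025-11-20T09:00:00Z"},
]
batches = edges_to_batches(edges)
```

Each element of batches is a body you can POST to /v1/batch in order.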

Keep the import inspectable. Write migrated memories into a dedicated folder by setting "metadata": {"folder": "Agents/ZepImport"} on each item. Your memory store stays organized and you can audit exactly what came over.

Step 3: verify with context and facts

Where Zep gave you a context block from thread.get_user_context, Korely gives you GET /v1/context: it assembles the end user's active typed facts and the most relevant memories into one prompt-ready block. This is the recall path to verify first, because it is the one your agent will use in production. No generative model composes the output; it is fact assembly plus retrieval. Your agent's own model reasons over the block.

curl "https://api.korely.ai/v1/context?query=invoice%20preferences&user_id=customer-4812" \
  -H "Authorization: Bearer kor_live_..."

# 200 OK
{
  "context": "## Known facts\n- customer-4812 likes invoices as PDF (since 2026-05-02)\n\n## Relevant memories\n- Prefers invoices as PDF, replies fastest before 10am CET.",
  "tokens": 34,
  "sources": ["fct_2d4a", "mem_5b80e2"]
}

For raw retrieval, POST /v1/memories/search ranks an end user's memories by semantic vector similarity (cosine over embeddings). The only model call on the read path is the query embedding, a fraction of a hundredth of a cent. The response is a flat results list; each item carries a snippet (not the full body), a score, and the scoping ids.

curl -X POST https://api.korely.ai/v1/memories/search \
  -H "Authorization: Bearer kor_live_..." \
  -H "Content-Type: application/json" \
  -d '{"query": "invoice preferences", "user_id": "customer-4812", "limit": 5}'

# 200 OK
{
  "results": [
    {"id": "mem_5b80e2", "score": 0.69,
     "snippet": "Prefers invoices as PDF, replies fastest before 10am CET.",
     "user_id": "customer-4812", "agent_id": null,
     "metadata": {"source": "zep_export"}}
  ]
}
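
A quick way to confirm that search is returning migrated data is to look for hits tagged by the import. The sketch works on the response shape shown above. migrated_hits is a hypothetical helper, and the 0.5 score cut-off is an arbitrary illustration, not a recommended threshold.

```python
def migrated_hits(response, source="zep_export", min_score=0.5):
    """Return search results that came from the Zep import and clear a score floor."""
    return [
        hit for hit in response.get("results", [])
        if (hit.get("metadata") or {}).get("source") == source
        and hit.get("score", 0.0) >= min_score
    ]


# The sample response from above, as a Python dict.
response = {
    "results": [
        {"id": "mem_5b80e2", "score": 0.69,
         "snippet": "Prefers invoices as PDF, replies fastest before 10am CET.",
         "user_id": "customer-4812", "agent_id": None,
         "metadata": {"source": "zep_export"}},
    ]
}
hits = migrated_hits(response)
```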

Then check the temporal layer the way you would have read edges in Zep. Where you searched with scope="edges" and read fact, valid_at, and invalid_at, you now filter facts and read valid_from and invalid_at. The response is a flat {"facts": [...], "total": N} list. A fact is live when its invalid_at is null. Korely normalizes the predicate verb, so the raw verb you wrote (prefers) is stored on predicate_raw while predicate holds the canonical form (likes):

curl "https://api.korely.ai/v1/facts?entity=invoices&include_invalidated=true" \
  -H "Authorization: Bearer kor_live_..."

# 200 OK
{
  "facts": [
    {"id": "fct_2d4a", "subject": "customer-4812", "predicate": "likes",
     "predicate_raw": "prefers", "object": "invoices as PDF",
     "predicate_family": "preferences", "confidence": 0.9,
     "valid_from": "2026-05-02T08:10:00Z", "invalid_at": null,
     "invalidated_by": null, "source_memory_id": "mem_5b80e2"}
  ],
  "total": 1
}
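
Because a fact is live exactly when its invalid_at is null, splitting a response into current and superseded facts takes a few lines. The sketch below assumes the response shape shown above. The second fact in the sample, including its id and dates, is invented to illustrate a superseded entry.

```python
def split_facts(response):
    """Partition a GET /v1/facts response into (live, superseded) lists."""
    live, superseded = [], []
    for fact in response.get("facts", []):
        # invalid_at is null (None) for facts that are still true.
        (live if fact.get("invalid_at") is None else superseded).append(fact)
    return live, superseded


sample = {
    "facts": [
        {"id": "fct_2d4a", "subject": "customer-4812", "predicate": "likes",
         "object": "invoices as PDF",
         "valid_from": "2026-05-02T08:10:00Z", "invalid_at": None},
        # Invented superseded fact, for illustration only.
        {"id": "fct_0a11", "subject": "customer-4812", "predicate": "likes",
         "object": "invoices by post",
         "valid_from": "2025-11-20T09:00:00Z", "invalid_at": "2026-05-02T08:10:00Z"},
    ],
    "total": 2,
}
live, superseded = split_facts(sample)
```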

Point-in-time reads work the way you expect from a bi-temporal store: pass as_of with an ISO date and you get what was true on that date. The full model, including the two-stage contradiction check at write time, is in temporal facts.
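
A point-in-time read is then one more query parameter. This sketch only builds the request URL with the standard library. It assumes as_of is passed to GET /v1/facts alongside entity, which the text above implies but does not spell out, so confirm it against the API reference. facts_url is a hypothetical helper.

```python
from urllib.parse import urlencode

BASE_URL = "https://api.korely.ai/v1"


def facts_url(entity, as_of=None, include_invalidated=False):
    """Build a GET /v1/facts URL; as_of placement is an assumption (see above)."""
    params = {"entity": entity}
    if as_of is not None:
        params["as_of"] = as_of  # ISO date, e.g. "2026-05-01"
    if include_invalidated:
        params["include_invalidated"] = "true"
    return f"{BASE_URL}/facts?{urlencode(params)}"


url = facts_url("invoices", as_of="2026-05-01")
```

Send the resulting URL with the same Authorization header as the other reads.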

What your agents gain

A managed cloud store behind the API

Every memory your agents write lands in a managed cloud store (Postgres + pgvector, EU-hosted), organized in named folders and readable through the Korely app. Memory stops being an opaque index you can only query through an API and becomes a corpus you can open, skim, and audit. The Agents/ZepImport folder from step 2 is exactly that: your migrated graph, browsable in the app.

Memory your end users can see

Korely ships a desktop and web app where the human sees the same memory your agents use. The Memory Panel lists every fact; each one can be edited or forgotten, and an Entity Profile drawer shows everything known about one entity. A fact a user erases there does not come back to agents, on any read path, and one call deletes every memory for a user. If agents carry long-term memory about people, the people get the delete button. Read human in the loop for the full model.

In both cases: reads are retrieval, not generation. The intelligence runs once, at write time, which is why read quotas are an order of magnitude more generous than write quotas. Quotas for each plan are on the pricing page.

Next steps

Quickstart: wire Korely into your stack and run your first search in five minutes.
API reference: the complete REST contract behind every call in the mapping table.
Architecture: where the write-time intelligence runs and why the read path stays deterministic.
Questions about a large graph or an unusual export shape? Email [email protected]. We read every message.