Migrate from DIY RAG
If your agent remembers things, somewhere in your codebase there is a vector table, an embedding call, and a similarity query. It probably took an afternoon to build and it works. This page is for the moment after that: it lays out what the pipeline asks of you over the following months, shows the same outcomes as a handful of API calls, and walks through importing the corpus you already have.
The short version: korely.add replaces
your write pipeline, korely.search replaces your retrieval
query, korely.get_facts and korely.get_context
give you layers a vector table does not have. Migration is a
SELECT from your store and a POST /v1/batch
into ours, scoped by user_id. There is also an honest
section at the end about when you should keep your own pipeline.
Concept mapping
Every DIY retrieval pipeline has the same moving parts under different names. Here is how they map to Korely concepts.
| DIY concept | Korely equivalent | Notes |
|---|---|---|
| Vector row / embedding document | Memory (mem_...) | Korely chunks, embeds, and stores the canonical text. You never touch raw vectors. |
| Tenant column (WHERE user_id = ...) | user_id parameter | Passed as a parameter on every read and write. End users are unlimited on every plan; the isolation is enforced at the API layer. |
| Application / service namespace | agent_id parameter | Optional second scope. Use it if you run multiple agents sharing one API key and want to keep their memories separate. |
| Session / conversation window | run_id parameter | Sub-scopes a single run inside agent_id + user_id. Useful for chatbots that want to separate distinct conversations. |
| Metadata dict / extra columns | metadata field on add / search | Arbitrary JSON attached at write time, returned in search results. Not indexed for semantic search; use it for source tags, timestamps, or any structured label you want back on retrieval. |
| Entity table / knowledge graph | Entity graph (automatic) | Built at write time via GLiNER extraction. Queried via get_related in the MCP surface and via the fact store. No separate write step required. |
| Assertion / slot store | Fact (fct_...) | Typed (subject, predicate, object) triple with bi-temporal validity. Extracted automatically on add; queried via get_facts. Superseded facts are retired, not deleted; time-travel queries (as_of) see the old value. |
| DELETE FROM memories WHERE user_id = ... | delete_all(user_id=...) | Invalidates all memories and facts in scope and returns an audit record. One call, one audit trail. |
| Bulk import / ETL job | batch / POST /v1/batch | Up to 500 objects per request, processed asynchronously. Poll status via batch_status(batch_id) or GET /v1/batch/{id}. |
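As a rough illustration of the mapping above, the sketch below turns one DIY row into the arguments for add. The column names tenant_id, service, and conversation_id are hypothetical stand-ins for whatever your schema calls them; the keyword names (user_id, agent_id, run_id, metadata) are the ones this page lists for add.

```python
# Hypothetical DIY column names; rename to match your own schema.
SCOPE_COLUMNS = {"tenant_id", "service", "conversation_id"}
DROPPED_COLUMNS = {"content", "embedding"}  # raw vectors are not imported


def to_korely_scope(row):
    """Translate one DIY row (a dict) into (content, kwargs) for add()."""
    kwargs = {"user_id": row["tenant_id"]}
    if row.get("service"):
        kwargs["agent_id"] = row["service"]
    if row.get("conversation_id"):
        kwargs["run_id"] = row["conversation_id"]
    # Every remaining column travels as metadata.
    extras = {
        key: value
        for key, value in row.items()
        if key not in SCOPE_COLUMNS | DROPPED_COLUMNS
    }
    if extras:
        kwargs["metadata"] = extras
    return row["content"], kwargs


row = {
    "tenant_id": "customer-4812",
    "service": "support-bot",
    "conversation_id": "conv-31",
    "content": "Prefers invoices as PDF.",
    "embedding": [0.12, -0.03],
    "source": "crm",
}
content, kwargs = to_korely_scope(row)
print(content)
print(kwargs)
```

With a real client the result would be passed as korely.add(content, **kwargs). The embedding column is dropped on purpose, since Korely embeds on ingest.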
Call mapping
If you have existing code and want to see the direct substitution, the table below maps the most common DIY operations to their Korely SDK and REST equivalents.
| DIY operation | Python SDK | REST |
|---|---|---|
| INSERT a new memory | korely.add(content, user_id=...) | POST /v1/memories |
| Similarity search | korely.search(query, user_id=...) | POST /v1/memories/search |
| Fetch one memory by id | korely.get("mem_8f2c1a") | GET /v1/memories/mem_8f2c1a |
| List all memories for a user | korely.get_all(user_id=...) | GET /v1/memories?user_id=... |
| Update a memory's text | korely.update("mem_8f2c1a", content=...) | PATCH /v1/memories/mem_8f2c1a |
| Delete one memory | korely.delete("mem_8f2c1a") | DELETE /v1/memories/mem_8f2c1a |
| Delete all data for a user | korely.delete_all(user_id=...) | DELETE /v1/users/{end_user}/memories |
| History / audit of a memory | korely.history("mem_8f2c1a") | GET /v1/memories/mem_8f2c1a/history |
| Read typed assertions / slots | korely.get_facts(entity=..., user_id=...) | GET /v1/facts?entity=...&user_id=... |
| Assemble prompt context | korely.get_context(query=..., user_id=...) | GET /v1/context?query=...&user_id=... |
| Bulk import | korely.batch([...]) + korely.batch_status(batch_id) | POST /v1/batch + GET /v1/batch/{id} |
Node.js SDK naming: methods use camelCase equivalents
of the Python names. get_context becomes
getContext, delete_all becomes
deleteAll, batch_status becomes
batchStatus, and so on. The parameters and return shapes
are identical.
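The naming rule is mechanical enough to express in a few lines. This is only an illustration of the convention, written in Python; the SDK reference remains the authority on the actual Node.js method names, and getFacts here is inferred from the rule rather than quoted from it.

```python
def snake_to_camel(name: str) -> str:
    """Convert a snake_case method name to its camelCase equivalent."""
    head, *tail = name.split("_")
    return head + "".join(part.capitalize() for part in tail)


for python_name in ("get_context", "delete_all", "batch_status", "get_facts"):
    print(f"{python_name} -> {snake_to_camel(python_name)}")
# get_context -> getContext
# delete_all -> deleteAll
# batch_status -> batchStatus
# get_facts -> getFacts
```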
What a DIY memory pipeline involves over time
None of this is exotic. It is ordinary engineering, and every team that runs its own retrieval stack does some version of it. The point is not that any single item is hard; the point is that together they are a standing maintenance contract, and the work recurs as your corpus, your models, and your user base change.
| Concern | What it takes over time |
|---|---|
| Chunking strategy | Pick split sizes and overlap, special-case headings, code blocks, and tables. When retrieval misses, the fix is often a re-chunk, and a re-chunk means a re-embed of everything downstream. |
| Embedding refresh | Embedding models get deprecated and replaced. A model swap means re-embedding the whole corpus, rebuilding the index, and versioning vectors so old and new never mix inside one query. |
| Hybrid search tuning | Vector-only search misses exact identifiers, ticket numbers, and names; full-text-only misses paraphrase. Fusing both introduces rank-fusion weights that you tune, and re-tune as the corpus grows. |
| Entity dedup | "ACME", "Acme Corp", and "acme-corp" are three different strings. Without alias canonicalization they become three disconnected islands of memory, and the dedup job becomes its own queue. |
| Contradiction handling | "The price is now 55" should supersede "the price is 50", not sit next to it with a similar score. Detecting the conflict, retiring the old statement, and keeping it as history is a write-time system of its own. |
| Per-user isolation | Every read needs a tenant filter that can never be forgotten. One missing WHERE user_id = ... clause puts one customer's memory into another customer's prompt. |
| Deletion flows | Deleting a user means rows, chunks, vectors, index entries, extracted facts, and caches, plus a record that proves it happened. "DELETE FROM memories" is the easy 20 percent. |
If memory is a feature of your product rather than the product itself, this table is overhead between you and the thing you are actually building.
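To make the "hybrid search tuning" row concrete, here is a minimal sketch of one common fusion scheme, weighted reciprocal rank fusion. It is an example of the kind of code a DIY stack ends up owning, not a description of how Korely or any particular library ranks results; the weights, the constant k, and the memory ids are all illustrative.

```python
from collections import defaultdict


def fuse_rankings(vector_hits, text_hits, vector_weight=1.0, text_weight=1.0, k=60):
    """Weighted reciprocal rank fusion over two ranked lists of ids."""
    scores = defaultdict(float)
    for weight, hits in ((vector_weight, vector_hits), (text_weight, text_hits)):
        for rank, doc_id in enumerate(hits, start=1):
            scores[doc_id] += weight / (k + rank)
    # Highest fused score first; ties broken by id so the order is stable.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


vector_hits = ["m2", "m7", "m1"]  # paraphrase matches from the vector leg
text_hits = ["m1", "m9"]          # exact-identifier matches from full-text

balanced = fuse_rankings(vector_hits, text_hits)
text_heavy = fuse_rankings(vector_hits, text_hits, text_weight=2.0)
print(balanced)    # ['m1', 'm2', 'm7', 'm9']
print(text_heavy)  # ['m1', 'm9', 'm2', 'm7']
```

Changing one weight reorders the results, which is why the weights need an evaluation set and a re-tune whenever the corpus shifts.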
The same outcomes in four calls
Korely runs that table as a service. The intelligence runs once, at write
time: korely.add chunks the content, embeds it, extracts
entities into a typed graph, and extracts (subject, predicate, object)
facts with contradiction checking and bi-temporal validity. That work
costs about a tenth of a cent per memory and is included in your plan.
Reads are retrieval, not generation. The differentiator is on the read side:
get_context assembles your end user's active typed facts into
a prompt-ready block, and the fact store answers point-in-time questions
with deterministic SQL, all already scoped by user_id.
Semantic search (cosine over embeddings) is the secondary
recall path.
Here is the before and after. First, a lean version of the DIY path:
# before: your write + read path (pgvector + psycopg)
import psycopg
from pgvector.psycopg import register_vector

conn = psycopg.connect("postgresql://localhost/agentdb")
register_vector(conn)

def remember(text: str, user_id: str):
    for chunk in split_into_chunks(text):  # your chunking strategy
        vec = embed(chunk)                 # your embedding provider
        conn.execute(
            "INSERT INTO memories (user_id, content, embedding)"
            " VALUES (%s, %s, %s)",
            (user_id, chunk, vec),
        )
    conn.commit()

def recall(query: str, user_id: str, limit: int = 5):
    vec = embed(query)
    return conn.execute(
        """
        SELECT content, 1 - (embedding <=> %s) AS score
        FROM memories
        WHERE user_id = %s          -- the filter you can never forget
        ORDER BY embedding <=> %s
        LIMIT %s
        """,
        (vec, user_id, vec, limit),
    ).fetchall()

And that is the simple version: vector-only, no full-text leg, no entity graph, no contradiction handling, no deletion audit. Each row of the table above adds code here. The same write and read with Korely:
# after: the whole pipeline, in four lines
from korely_memory import Korely
korely = Korely(api_key="kor_live_...")
korely.add("Prefers invoices as PDF, replies fastest before 10am CET.", user_id="customer-4812")
results = korely.search("invoice preferences", user_id="customer-4812")
The Python and Node SDKs are live: pip install korely-memory /
npm install korely-memory. See the
SDK reference for the full method list.
Two more calls cover the layers the DIY snippet never had. The fact store answers point-in-time questions with deterministic SQL, typically under 50 ms, no model calls on the read path:
# typed facts with bi-temporal validity (works on every plan, incl. Hobby)
facts = korely.get_facts(entity="Northwind Hosting", user_id="customer-4812")
print(facts[0].subject, facts[0].predicate, facts[0].object)
# Northwind Hosting costs 55 euro per month
print(facts[0].invalid_at)  # None: this is the live value
# the superseded 50-euro fact is still readable with include_invalidated=True

# point-in-time read: the value that was valid on 2026-05-01
facts = korely.get_facts(entity="Northwind Hosting", user_id="customer-4812", as_of="2026-05-01")
print(facts[0].object)  # 50 euro per month
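If the supersede-and-retire behaviour is new to you, a toy in-memory model may help. The sketch below is not Korely's implementation: it tracks only the validity axis (valid_from and invalid_at, the field names used on this page), and the two dates are hypothetical values chosen so that the as_of example above lands on the old price.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Fact:
    subject: str
    predicate: str
    object: str
    valid_from: date
    invalid_at: Optional[date] = None  # None means the fact is still live


def supersede(facts, new_fact):
    """Retire any live fact with the same subject and predicate, then append."""
    for fact in facts:
        same_slot = (fact.subject, fact.predicate) == (new_fact.subject, new_fact.predicate)
        if same_slot and fact.invalid_at is None:
            fact.invalid_at = new_fact.valid_from  # retired, not deleted
    facts.append(new_fact)


def facts_as_of(facts, when):
    """Return the facts that were valid on the given date."""
    return [
        fact
        for fact in facts
        if fact.valid_from <= when and (fact.invalid_at is None or when < fact.invalid_at)
    ]


store = []
supersede(store, Fact("Northwind Hosting", "costs", "50 euro per month", date(2026, 1, 15)))
supersede(store, Fact("Northwind Hosting", "costs", "55 euro per month", date(2026, 6, 1)))

live = [fact for fact in store if fact.invalid_at is None]
print(live[0].object)                                  # 55 euro per month
print(facts_as_of(store, date(2026, 5, 1))[0].object)  # 50 euro per month
print(len(store))                                      # 2
```

Both rows survive the update; the old one simply carries an invalid_at, which is what makes the point-in-time read possible.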
And get_context assembles a prompt-ready block, profile plus
relevant facts plus relevant memories, within a token budget. Assembly is
deterministic retrieval and formatting, not generation; your agent's own
model does the reasoning:
ctx = korely.get_context(
    query="plan infra budget",
    user_id="customer-4812",
    token_budget=800,
)

print(ctx.tokens)   # 642
print(ctx.sources)  # ["fct_b91e", "mem_8f2c1a"]

messages = [
    {"role": "system", "content": f"You are a helpful assistant.\n\n{ctx.context}"},
    {"role": "user", "content": user_message},
]

Migrate your existing corpus
Your data is already in a database you control, so the export is a
SELECT. The import is POST /v1/batch: up to 500
memory objects per request, same shape as POST /v1/memories,
processed asynchronously. Each imported item runs the full write
pipeline, so the graph and the fact store fill in as the batch completes.
Import source documents, not your chunks. Korely chunks on ingest. If your table stores pre-split chunks, group them back into their parent document first; chunking chunks a second time fragments meaning and degrades retrieval. Import at the granularity a human would call "one memory" or "one document".
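If your table only holds pre-split chunks, the regrouping step might look like the sketch below. It assumes each chunk row carries a parent document id and an ordering column; doc_id and chunk_index are hypothetical names, and the sample rows are invented for illustration.

```python
from collections import defaultdict


def regroup_chunks(chunk_rows, separator="\n\n"):
    """Rebuild parent documents from (user_id, doc_id, chunk_index, content) rows."""
    grouped = defaultdict(list)
    for user_id, doc_id, chunk_index, content in chunk_rows:
        grouped[(user_id, doc_id)].append((chunk_index, content))

    documents = []
    for (user_id, doc_id), parts in grouped.items():
        parts.sort(key=lambda part: part[0])  # restore the original chunk order
        documents.append(
            {
                "content": separator.join(text for _, text in parts),
                "user_id": user_id,
                "metadata": {"source": "diy_migration", "doc_id": doc_id},
            }
        )
    return documents


chunk_rows = [
    ("customer-4812", "doc-1", 1, "Replies fastest before 10am CET."),
    ("customer-4812", "doc-1", 0, "Prefers invoices as PDF."),
    ("customer-77", "doc-9", 0, "Uses the annual plan."),
]
documents = regroup_chunks(chunk_rows)
print(len(documents))           # 2
print(documents[0]["content"])
```

If your chunker used overlapping windows, joining chunks verbatim repeats the overlapped text, so trim the overlap before joining. The export and batch import then run over document-shaped rows: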
rows = conn.execute(
    "SELECT user_id, content, metadata FROM documents"
).fetchall()

memories = [
    {
        "content": content,
        "user_id": user_id,
        "metadata": (metadata or {}) | {"source": "diy_migration"},
    }
    for user_id, content, metadata in rows
]

# up to 500 per request, async, returns a batch id
for i in range(0, len(memories), 500):
    job = korely.batch(memories[i : i + 500])
    print(job.id, job.status, job.received)  # batch_4e1aa0 processing 500

job = korely.batch_status("batch_4e1aa0")
print(job.status, job.imported, job.failed)  # completed 500 0
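Because batches are processed asynchronously, a single batch_status call right after submission will usually still report processing. A small polling helper, sketched here under assumptions, covers the gap: it treats any status other than "processing" as terminal (this page only shows "processing" and "completed"), and it uses a stub client so the example runs without network access.

```python
import time
from types import SimpleNamespace


def wait_for_batch(client, batch_id, poll_seconds=2.0, timeout_seconds=600.0, sleep=time.sleep):
    """Poll batch_status until the job stops reporting "processing"."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        job = client.batch_status(batch_id)
        # Assumption: any status other than "processing" is terminal.
        if job.status != "processing":
            return job
        if time.monotonic() >= deadline:
            raise TimeoutError(f"{batch_id} still processing after {timeout_seconds}s")
        sleep(poll_seconds)


class StubClient:
    """Stand-in for the real client so the sketch runs offline."""

    def __init__(self, polls_until_done):
        self.remaining = polls_until_done

    def batch_status(self, batch_id):
        self.remaining -= 1
        status = "processing" if self.remaining > 0 else "completed"
        return SimpleNamespace(id=batch_id, status=status, imported=500, failed=0)


job = wait_for_batch(StubClient(polls_until_done=3), "batch_4e1aa0", sleep=lambda _: None)
print(job.status, job.imported, job.failed)  # completed 500 0
```

With the real SDK you would pass your korely client in place of the stub and keep the default sleep. Check job.failed before you treat the import as done.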
For a small corpus, or if you want per-item results synchronously, a
plain add loop does the same thing one memory at a time:
for user_id, content, metadata in rows:
    memory = korely.add(content, user_id=user_id, metadata=metadata or {})
    print(memory.id, len(memory.facts))  # mem_3fa1c9 2

Then verify with a query you know the old corpus can answer, scoped to one end user:
curl -X POST https://api.korely.ai/v1/memories/search \
  -H "Authorization: Bearer kor_live_..." \
  -H "Content-Type: application/json" \
  -d '{"query": "invoice preferences", "user_id": "customer-4812", "limit": 5}'

# 200 OK
{
  "results": [
    {"id": "mem_3fa1c9", "score": 0.91, "snippet": "Prefers invoices as PDF, replies fastest before 10am CET.", "metadata": {"source": "diy_migration"}}
  ]
}
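One spot check is a start; a short list of known-answer queries makes the verification repeatable. The helper below is a sketch: it takes any search callable that returns result dicts shaped like the REST response above (a "snippet" key per hit), so wrapping the SDK or the REST call in that shape is left to you. The stub search and the second user id are hypothetical.

```python
def verify_migration(search, checks, limit=5):
    """Run known-answer queries; return the (user_id, query) pairs that failed.

    search: callable(query, user_id=..., limit=...) -> list of dicts with a "snippet" key
    checks: list of (user_id, query, expected_substring) tuples
    """
    failures = []
    for user_id, query, expected in checks:
        results = search(query, user_id=user_id, limit=limit)
        if not any(expected in hit["snippet"] for hit in results):
            failures.append((user_id, query))
    return failures


def stub_search(query, user_id, limit):
    """Offline stand-in that returns the sample response for one end user only."""
    corpus = {
        "customer-4812": [
            {
                "id": "mem_3fa1c9",
                "score": 0.91,
                "snippet": "Prefers invoices as PDF, replies fastest before 10am CET.",
                "metadata": {"source": "diy_migration"},
            }
        ]
    }
    return corpus.get(user_id, [])[:limit]


checks = [
    ("customer-4812", "invoice preferences", "invoices as PDF"),
    ("customer-0001", "invoice preferences", "invoices as PDF"),
]
failures = verify_migration(stub_search, checks)
print(failures)  # [('customer-0001', 'invoice preferences')]
```

The second check fails on purpose: a query scoped to a different user_id should not surface customer-4812's memory, so a "failure" there is the isolation working.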
The scoping model carries over directly from your tenant column:
user_id is your end user (free-form, and end users are
unlimited on every tier), agent_id namespaces your
application, run_id sub-scopes one session. Every read
filtered by user_id returns only that user's memory, and
deleting a user is one call, korely.delete_all(user_id=...),
which invalidates every memory and fact in that scope and returns an
audit record. Your users can see and erase what the agent knows about
them; the full model is in
human in the loop.
Migration gotchas
Things that catch teams during the first week. None are surprising once you know them.
- 401 on every call. The header must be Authorization: Bearer kor_live_.... A missing Bearer prefix, a trailing space in the key, or passing the key as a query parameter all return 401 {"code":"invalid_key","message":"Invalid or missing API key"}. Check with GET /v1/ping first (auth required); a 200 returns {"ok":true,"tier":"...","scopes":[...]} and confirms the key is valid.
- 403 agent_cap_exceeded after the first few agent IDs. Each plan has an agent cap: Hobby 2, Developer 10, Team 100, Scale 500. If you used fine-grained agent_id strings in your DIY stack (one per feature, one per deployment, one per environment), you may hit the cap quickly. Consolidate to one agent_id per logical application and use run_id for sub-scoping.
- 429 during bulk import. The Hobby plan allows 1 000 writes per month; Developer 5 000; Team 20 000. A corpus import counts toward that quota. Import on the right plan or spread the import across months. When you hit the cap, writes return 429 {"code":"quota_exceeded","message":"Monthly memory write limit reached (1000). Upgrade to add more."}. The write-quota 429 carries no Retry-After header (the only fix is to upgrade or wait for the monthly reset); only the per-second rate-limit 429 returns a Retry-After header in seconds. Reads are not affected.
- 409 stale_write on concurrent updates. If two processes call update on the same memory ID at roughly the same time, the second write returns 409 stale_write. This is an optimistic concurrency guard. In the DIY world this was a silent last-write-wins; here it surfaces explicitly. Fetch the current version first and pass expected_updated_at, or retry the update after a short back-off.
- The SDK add() convenience does not expose a timestamp. Its parameters are content, user_id, agent_id, run_id, and metadata. If you are backfilling a corpus and want to preserve original creation times, call POST /v1/memories directly: the REST body accepts a timestamp, which Korely stores as the fact's valid_from (so you can backdate via the API). The batch import has no per-item timestamp; if you need backdating at scale, loop direct REST add calls with a timestamp each.
- search() has no time_filter or include_history flag. Parameters are query, user_id, agent_id, and limit. Point-in-time queries belong to the fact store via get_facts(as_of=...), not to the memory search surface.
- A superseded fact has vanished from your reads. Facts that have been superseded by a contradiction are marked invalidated (invalid_at set), not deleted, and they drop out of the default get_facts result. There is no per-id fact route; to see retired values pass include_invalidated=True to get_facts (or GET /v1/facts?entity=...&include_invalidated=true), which surfaces the full supersede chain. A fact only disappears for good when the end user is forgotten via a GDPR delete.
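The status codes above suggest a small decision function for an import script. The sketch below maps a response to a coarse next step. It assumes agent_cap_exceeded and stale_write arrive in the body's code field, the way invalid_key and quota_exceeded do in the examples above; the action names are made up for illustration.

```python
def next_step(status, body=None, headers=None):
    """Map an error response to (action, wait_seconds), following the list above."""
    body = body or {}
    # HTTP header names are case-insensitive, so normalise before the lookup.
    headers = {key.lower(): value for key, value in (headers or {}).items()}
    code = body.get("code")

    if status == 401:
        return ("fix_credentials", None)
    if status == 403 and code == "agent_cap_exceeded":
        return ("consolidate_agent_ids", None)
    if status == 409 and code == "stale_write":
        return ("refetch_and_retry", None)
    if status == 429:
        retry_after = headers.get("retry-after")
        # The monthly write-quota 429 has no Retry-After, so retrying will not help.
        if code == "quota_exceeded" or retry_after is None:
            return ("upgrade_or_wait_for_reset", None)
        return ("retry", float(retry_after))
    return ("raise", None)


print(next_step(429, {"code": "quota_exceeded"}))       # ('upgrade_or_wait_for_reset', None)
print(next_step(429, headers={"Retry-After": "2"}))     # ('retry', 2.0)
print(next_step(409, {"code": "stale_write"}))          # ('refetch_and_retry', None)
```

The key distinction is between the two kinds of 429: only the rate-limit variant is worth an automatic retry.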
When DIY is the right call
Sometimes owning the pipeline is correct, and pretending otherwise would make this page useless. Keep building your own when:
- Your data cannot leave your infrastructure. Korely's API is EU-hosted and your data stays in the EU, but it is our infrastructure, not yours. If the requirement is your own racks or an air-gapped network, a hosted memory API is the wrong shape no matter whose it is.
- Search is your core product. If ranking quality is the thing your customers pay you for, you want to own the ranker, the eval harness, and every tuning knob. A general-purpose memory layer is built for agents that need good retrieval, not for teams whose moat is retrieval itself.
- You need custom rankers or exotic retrieval. Domain-specific scoring functions, learned sparse retrieval, or ranking signals that come from your own product data are all reasons to keep the query path in your codebase.
If none of these describe you, the table under "What a DIY memory pipeline involves over time" is the maintenance contract you would be signing. Four calls are cheaper.
Next steps
- Quickstart: wire Korely into your stack and run your first search in five minutes.
- Cookbook: a chatbot that remembers: get_context plus add in a working chat loop.
- API reference: the complete REST contract behind every call on this page.
- Architecture: where the write-time intelligence runs and why the read path stays deterministic.
- Migrating a large corpus or an unusual schema? Email [email protected]. We read every message.