Migrate from DIY RAG

If your agent remembers things, somewhere in your codebase there is a vector table, an embedding call, and a similarity query. It probably took an afternoon to build and it works. This page is for the moment after that: it lays out what the pipeline asks of you over the following months, shows the same outcomes as a handful of API calls, and walks through importing the corpus you already have.

The short version: korely.add replaces your write pipeline, korely.search replaces your retrieval query, and korely.get_facts and korely.get_context give you layers a vector table does not have. Migration is a SELECT from your store and a POST /v1/batch into ours, scoped by user_id. There is also an honest section at the end about when you should keep your own pipeline.

Concept mapping

Every DIY retrieval pipeline has the same moving parts under different names. Here is how they map to Korely concepts.

DIY concept	Korely equivalent	Notes
Vector row / embedding document	Memory (`mem_...`)	Korely chunks, embeds, and stores the canonical text. You never touch raw vectors.
Tenant column (`WHERE user_id = ...`)	`user_id` parameter	Passed on every scoped read and write (in the query string on GET routes, in the JSON body on POST routes). End users are unlimited on every plan; the isolation is enforced at the API layer.
Application / service namespace	`agent_id` parameter	Optional second scope. Use it if you run multiple agents sharing one API key and want to keep their memories separate.
Session / conversation window	`run_id` parameter	Sub-scopes a single run inside `agent_id` + `user_id`. Useful for chatbots that want to separate distinct conversations.
Metadata dict / extra columns	`metadata` field on add / search	Arbitrary JSON attached at write time, returned in search results. Not indexed for semantic search; use it for source tags, timestamps, or any structured label you want back on retrieval.
Entity table / knowledge graph	Entity graph (automatic)	Built at write time via GLiNER extraction. Queried via `get_related` in the MCP surface and via the fact store. No separate write step required.
Assertion / slot store	Fact (`fct_...`)	Typed (subject, predicate, object) triple with bi-temporal validity. Extracted automatically on `add`; queried via `get_facts`. Superseded facts are retired, not deleted; time-travel queries (`as_of`) see the old value.
DELETE FROM memories WHERE user_id = ...	`delete_all(user_id=...)`	Invalidates all memories and facts in scope and returns an audit record. One call, one audit trail.
Bulk import / ETL job	`batch` / `POST /v1/batch`	Up to 500 objects per request, processed asynchronously. Poll status via `batch_status(batch_id)` or `GET /v1/batch/{id}`.

Call mapping

If you have existing code and want to see the direct substitution, the table below maps the most common DIY operations to their Korely SDK and REST equivalents.

DIY operation	Python SDK	REST
INSERT a new memory	`korely.add(content, user_id=...)`	`POST /v1/memories`
Similarity search	`korely.search(query, user_id=...)`	`POST /v1/memories/search`
Fetch one memory by id	`korely.get("mem_8f2c1a")`	`GET /v1/memories/mem_8f2c1a`
List all memories for a user	`korely.get_all(user_id=...)`	`GET /v1/memories?user_id=...`
Update a memory's text	`korely.update("mem_8f2c1a", content=...)`	`PATCH /v1/memories/mem_8f2c1a`
Delete one memory	`korely.delete("mem_8f2c1a")`	`DELETE /v1/memories/mem_8f2c1a`
Delete all data for a user	`korely.delete_all(user_id=...)`	`DELETE /v1/users/{end_user}/memories`
History / audit of a memory	`korely.history("mem_8f2c1a")`	`GET /v1/memories/mem_8f2c1a/history`
Read typed assertions / slots	`korely.get_facts(entity=..., user_id=...)`	`GET /v1/facts?entity=...&user_id=...`
Assemble prompt context	`korely.get_context(query=..., user_id=...)`	`GET /v1/context?query=...&user_id=...`
Bulk import	`korely.batch([...])` + `korely.batch_status(batch_id)`	`POST /v1/batch` + `GET /v1/batch/{id}`

Node.js SDK naming: methods use camelCase equivalents of the Python names. get_context becomes getContext, delete_all becomes deleteAll, batch_status becomes batchStatus, and so on. The parameters and return shapes are identical.
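The naming rule is mechanical, so it can be written down as a small helper. This is only an illustration of the convention described above: `to_camel` is a hypothetical name, not part of either SDK, and the sketch assumes every Python method name is plain snake_case.

```python
def to_camel(name: str) -> str:
    """Convert a snake_case method name to its camelCase form."""
    head, *rest = name.split("_")
    return head + "".join(part.capitalize() for part in rest)


# Names taken from the Python SDK column of the call mapping table.
python_names = ["add", "search", "get_context", "delete_all", "batch_status"]
node_names = [to_camel(name) for name in python_names]
print(node_names)
```

Single-word methods such as add and search come out unchanged, which matches the rule that only multi-word names differ between the two SDKs.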

What a DIY memory pipeline involves over time

None of this is exotic. It is ordinary engineering, and every team that runs its own retrieval stack does some version of it. The point is not that any single item is hard; the point is that together they are a standing maintenance contract, and the work recurs as your corpus, your models, and your user base change.

Concern	What it takes over time
Chunking strategy	Pick split sizes and overlap, special-case headings, code blocks, and tables. When retrieval misses, the fix is often a re-chunk, and a re-chunk means a re-embed of everything downstream.
Embedding refresh	Embedding models get deprecated and replaced. A model swap means re-embedding the whole corpus, rebuilding the index, and versioning vectors so old and new never mix inside one query.
Hybrid search tuning	Vector-only search misses exact identifiers, ticket numbers, and names; full-text-only misses paraphrase. Fusing both introduces rank-fusion weights that you tune, and re-tune as the corpus grows.
Entity dedup	"ACME", "Acme Corp", and "acme-corp" are three different strings. Without alias canonicalization they become three disconnected islands of memory, and the dedup job becomes its own queue.
Contradiction handling	"The price is now 55" should supersede "the price is 50", not sit next to it with a similar score. Detecting the conflict, retiring the old statement, and keeping it as history is a write-time system of its own.
Per-user isolation	Every read needs a tenant filter that can never be forgotten. One missing `WHERE user_id = ...` clause puts one customer's memory into another customer's prompt.
Deletion flows	Deleting a user means rows, chunks, vectors, index entries, extracted facts, and caches, plus a record that proves it happened. "DELETE FROM memories" is the easy 20 percent.

If memory is a feature of your product rather than the product itself, this table is overhead between you and the thing you are actually building.
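To make one row of that table concrete, here is a minimal sketch of the hybrid search tuning problem on the DIY side. It uses weighted reciprocal rank fusion, a common way to merge a vector ranking with a full-text ranking; this is an illustration of what you would own yourself, not a description of how Korely ranks results. The function name, the memory ids, and the default k=60 are assumptions chosen for the example.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, weights=None, k=60):
    """Fuse ranked id lists: each list adds weight / (k + rank) to an id's score."""
    weights = weights or [1.0] * len(rankings)
    scores = defaultdict(float)
    for ranking, weight in zip(rankings, weights):
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += weight / (k + rank)
    # Highest fused score first; ties broken by id so the order is stable.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


vector_hits = ["m3", "m1", "m7"]   # hypothetical ids from the embedding leg
keyword_hits = ["m9", "m3", "m1"]  # hypothetical ids from the full-text leg

fused = reciprocal_rank_fusion([vector_hits, keyword_hits])
keyword_only = reciprocal_rank_fusion([vector_hits, keyword_hits], weights=[0.0, 1.0])
print(fused)
print(keyword_only)
```

The `weights` list and `k` are exactly the knobs the table refers to: they have no universally right values, so they get revisited whenever the corpus or the query mix shifts.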

The same outcomes in four calls

Korely runs that table as a service. The intelligence runs once, at write time: korely.add chunks the content, embeds it, extracts entities into a typed graph, and extracts (subject, predicate, object) facts with contradiction checking and bi-temporal validity. That write-time work costs about a tenth of a cent per memory and is included in your plan. Reads are retrieval, not generation. The differentiator is on the read side: get_context assembles your end user's active typed facts into a prompt-ready block, and the fact store answers point-in-time questions with deterministic SQL, both already scoped by user_id. Semantic search (cosine over embeddings) is the secondary recall path.

Here is the before and after. First, a lean version of the DIY path:

# before: your write + read path (pgvector + psycopg)
import psycopg
from pgvector.psycopg import register_vector

conn = psycopg.connect("postgresql://localhost/agentdb")
register_vector(conn)

def remember(text: str, user_id: str):
    for chunk in split_into_chunks(text):     # your chunking strategy
        vec = embed(chunk)                    # your embedding provider
        conn.execute(
            "INSERT INTO memories (user_id, content, embedding)"
            " VALUES (%s, %s, %s)",
            (user_id, chunk, vec),
        )
    conn.commit()

def recall(query: str, user_id: str, limit: int = 5):
    vec = embed(query)
    return conn.execute(
        """
        SELECT content, 1 - (embedding <=> %s) AS score
        FROM memories
        WHERE user_id = %s          -- the filter you can never forget
        ORDER BY embedding <=> %s
        LIMIT %s
        """,
        (vec, user_id, vec, limit),
    ).fetchall()

And that is the simple version: vector-only, no full-text leg, no entity graph, no contradiction handling, no deletion audit. Each row of the table above adds code here. The same write and read with Korely:

# after: the whole pipeline, in four lines
from korely_memory import Korely

korely = Korely(api_key="kor_live_...")
korely.add("Prefers invoices as PDF, replies fastest before 10am CET.", user_id="customer-4812")
results = korely.search("invoice preferences", user_id="customer-4812")

The Python and Node SDKs are live: pip install korely-memory / npm install korely-memory. See the SDK reference for the full method list.

Two more calls cover the layers the DIY snippet never had. The fact store answers point-in-time questions with deterministic SQL, typically in under 50 ms and with no model calls on the read path:

# typed facts with bi-temporal validity (works on every plan, incl. Hobby)
facts = korely.get_facts(entity="Northwind Hosting", user_id="customer-4812")
print(facts[0].subject, facts[0].predicate, facts[0].object)
# Northwind Hosting costs 55 euro per month
print(facts[0].invalid_at)   # None — this is the live value
# the superseded 50-euro fact is still readable with include_invalidated=True

facts = korely.get_facts(entity="Northwind Hosting", user_id="customer-4812", as_of="2026-05-01")
print(facts[0].object)       # 50 euro per month
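The supersede and as_of behaviour shown above can be modelled in a few lines, which may help when reasoning about what a read will return. The sketch below is a toy in-memory model, not Korely's implementation: `ToyFactStore` and `assert_fact` are hypothetical names, the dates are made up, and it tracks only the validity timeline, so it is a simplification of full bi-temporal storage.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Fact:
    subject: str
    predicate: str
    object: str
    valid_from: date
    invalid_at: Optional[date] = None  # None means this is the live value


class ToyFactStore:
    def __init__(self):
        self.facts = []

    def assert_fact(self, subject, predicate, obj, valid_from):
        # A new value retires the live fact for the same slot instead of deleting it.
        for fact in self.facts:
            if (fact.subject, fact.predicate) == (subject, predicate) and fact.invalid_at is None:
                fact.invalid_at = valid_from
        self.facts.append(Fact(subject, predicate, obj, valid_from))

    def get_facts(self, subject, as_of=None, include_invalidated=False):
        matches = [f for f in self.facts if f.subject == subject]
        if as_of is not None:
            # Point-in-time read: the fact must have been valid on that day.
            return [f for f in matches
                    if f.valid_from <= as_of and (f.invalid_at is None or as_of < f.invalid_at)]
        if include_invalidated:
            return matches
        return [f for f in matches if f.invalid_at is None]


store = ToyFactStore()
store.assert_fact("Northwind Hosting", "costs", "50 euro per month", date(2026, 1, 1))
store.assert_fact("Northwind Hosting", "costs", "55 euro per month", date(2026, 6, 1))

live = store.get_facts("Northwind Hosting")
past = store.get_facts("Northwind Hosting", as_of=date(2026, 5, 1))
chain = store.get_facts("Northwind Hosting", include_invalidated=True)
print(live[0].object, "|", past[0].object, "|", len(chain))
```

The point of the model is the write path: nothing is overwritten, so the default read, the point-in-time read, and the full supersede chain are three filters over the same rows.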

And get_context assembles a prompt-ready block, profile plus relevant facts plus relevant memories, within a token budget. Assembly is deterministic retrieval and formatting, not generation; your agent's own model does the reasoning:

ctx = korely.get_context(
    query="plan infra budget",
    user_id="customer-4812",
    token_budget=800,
)

print(ctx.tokens)   # 642
print(ctx.sources)  # ["fct_b91e", "mem_8f2c1a"]

messages = [
    {"role": "system", "content": f"You are a helpful assistant.\n\n{ctx.context}"},
    {"role": "user", "content": user_message},
]
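Assembly within a token budget is easy to picture as greedy packing. The following is a rough sketch of that idea under stated assumptions, not the algorithm behind get_context: `pack_context` is a hypothetical helper, the priority order (profile, then facts, then memories) is taken from the description above, and the whitespace word count is a crude stand-in for a real tokenizer.

```python
def pack_context(sections, token_budget, count_tokens=lambda text: len(text.split())):
    """Keep items in priority order until the token budget is spent."""
    kept, used = [], 0
    for label, items in sections:  # sections arrive in priority order
        for item in items:
            line = f"[{label}] {item}"
            cost = count_tokens(line)
            if used + cost > token_budget:
                continue  # does not fit; a shorter later item still might
            kept.append(line)
            used += cost
    return "\n".join(kept), used


# Made-up sample content for the demo.
sections = [
    ("profile", ["Prefers invoices as PDF."]),
    ("fact", ["Northwind Hosting costs 55 euro per month"]),
    ("memory", ["Replies fastest before 10am CET.",
                "A much longer note that will not fit inside the small demo budget at all."]),
]

context, used = pack_context(sections, token_budget=20)
print(used)
print(context)
```

Because the procedure has no randomness and no model call, the same inputs always produce the same block, which is the property the paragraph above describes as deterministic retrieval and formatting.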

Migrate your existing corpus

Your data is already in a database you control, so the export is a SELECT. The import is POST /v1/batch: up to 500 memory objects per request, same shape as POST /v1/memories, processed asynchronously. Each imported item runs the full write pipeline, so the graph and the fact store fill in as the batch completes.

Import source documents, not your chunks. Korely chunks on ingest. If your table stores pre-split chunks, group them back into their parent document first; chunking chunks a second time fragments meaning and degrades retrieval. Import at the granularity a human would call "one memory" or "one document".
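If your table only holds pre-split chunks, the regrouping step might look like the sketch below. It assumes each chunk row carries a parent document id and a position, here as (user_id, doc_id, chunk_index, content); those column names are hypothetical, so adapt them to your schema.

```python
from itertools import groupby
from operator import itemgetter


def regroup_chunks(chunk_rows, separator="\n\n"):
    """Rebuild one document per (user_id, doc_id) from ordered chunk rows."""
    ordered = sorted(chunk_rows, key=itemgetter(0, 1, 2))
    documents = []
    for (user_id, doc_id), group in groupby(ordered, key=itemgetter(0, 1)):
        text = separator.join(row[3] for row in group)
        documents.append({
            "content": text,
            "user_id": user_id,
            "metadata": {"source": "diy_migration", "doc_id": doc_id},
        })
    return documents


# Rows may come back in any order; chunk_index restores the original sequence.
chunk_rows = [
    ("customer-4812", "doc-1", 1, "Replies fastest before 10am CET."),
    ("customer-4812", "doc-1", 0, "Prefers invoices as PDF."),
    ("customer-9001", "doc-7", 0, "Single-chunk document."),
]
documents = regroup_chunks(chunk_rows)
print(len(documents), documents[0]["content"])
```

If your chunker used overlapping windows, a plain join repeats the overlapped text; in that case it is usually cleaner to re-export from the original source documents when you still have them.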

rows = conn.execute(
    "SELECT user_id, content, metadata FROM documents"
).fetchall()

memories = [
    {
        "content": content,
        "user_id": user_id,
        "metadata": (metadata or {}) | {"source": "diy_migration"},
    }
    for user_id, content, metadata in rows
]

# up to 500 per request, async, returns a batch id
for i in range(0, len(memories), 500):
    job = korely.batch(memories[i : i + 500])
    print(job.id, job.status, job.received)  # batch_4e1aa0 processing 500

job = korely.batch_status("batch_4e1aa0")
print(job.status, job.imported, job.failed)  # completed 500 0
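Since batches are processed asynchronously, an import script usually wants to wait for each one before moving on. A minimal polling sketch follows; `wait_for_batch` is a hypothetical helper, it takes the status function as an argument (for example korely.batch_status) so it can be exercised here with a stand-in, and it assumes only the "processing" status shown above means the job is still running.

```python
import time
from types import SimpleNamespace


def wait_for_batch(get_status, batch_id, interval=2.0, timeout=600.0, sleep=time.sleep):
    """Poll get_status(batch_id) until the job leaves the 'processing' state."""
    deadline = time.monotonic() + timeout
    while True:
        job = get_status(batch_id)
        if job.status != "processing":
            return job
        if time.monotonic() >= deadline:
            raise TimeoutError(f"batch {batch_id} still processing after {timeout}s")
        sleep(interval)


# Stand-in for the real status call so the sketch runs without network access.
_states = iter(["processing", "processing", "completed"])


def fake_status(batch_id):
    return SimpleNamespace(id=batch_id, status=next(_states))


job = wait_for_batch(fake_status, "batch_demo", interval=0, sleep=lambda seconds: None)
print(job.id, job.status)
```

In a real import you would pass the SDK's status method instead of the stand-in and check the imported and failed counts on the returned job before sending the next slice.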

For a small corpus, or if you want per-item results synchronously, a plain add loop does the same thing one memory at a time:

for user_id, content, metadata in rows:
    memory = korely.add(content, user_id=user_id, metadata=(metadata or {}) | {"source": "diy_migration"})
    print(memory.id, len(memory.facts))  # mem_3fa1c9 2

Then verify with a query you know the old corpus can answer, scoped to one end user:

curl -X POST https://api.korely.ai/v1/memories/search \
  -H "Authorization: Bearer kor_live_..." \
  -H "Content-Type: application/json" \
  -d '{"query": "invoice preferences", "user_id": "customer-4812", "limit": 5}'

# 200 OK
{
  "results": [
    {"id": "mem_3fa1c9", "score": 0.91,
     "snippet": "Prefers invoices as PDF, replies fastest before 10am CET.",
     "metadata": {"source": "diy_migration"}}
  ]
}

The scoping model carries over directly from your tenant column: user_id is your end user (free-form, and end users are unlimited on every tier), agent_id namespaces your application, run_id sub-scopes one session. Every read filtered by user_id returns only that user's memory, and deleting a user is one call, korely.delete_all(user_id=...), which invalidates every memory and fact in that scope and returns an audit record. Your users can see and erase what the agent knows about them; the full model is covered in Human in the loop.

Migration gotchas

Things that catch teams during the first week. None are surprising once you know them.

401 on every call. The header must be Authorization: Bearer kor_live_.... A missing Bearer prefix, a trailing space in the key, or passing the key as a query parameter all return 401 {"code":"invalid_key","message":"Invalid or missing API key"}. Check with GET /v1/ping first (auth required); a 200 returns {"ok":true,"tier":"...","scopes":[...]} and confirms the key is valid.
403 agent_cap_exceeded after the first few agent IDs. Each plan has an agent cap: Hobby 2, Developer 10, Team 100, Scale 500. If you used fine-grained agent_id strings in your DIY stack (one per feature, one per deployment, one per environment), you may hit the cap quickly. Consolidate to one agent_id per logical application and use run_id for sub-scoping.
429 during bulk import. The Hobby plan allows 1 000 writes per month; Developer 5 000; Team 20 000. A corpus import counts toward that quota. Import on the right plan or spread the import across months. When you hit the cap, writes return 429 {"code":"quota_exceeded","message":"Monthly memory write limit reached (1000). Upgrade to add more."}. The write-quota 429 carries no Retry-After header (the only fix is to upgrade or wait for the monthly reset); only the per-second rate-limit 429 returns a Retry-After header in seconds. Reads are not affected.
409 stale_write on concurrent updates. If two processes call update on the same memory ID at roughly the same time, the second write returns 409 stale_write. This is an optimistic concurrency guard. In the DIY world this was a silent last-write-wins; here it surfaces explicitly. Fetch the current version first and pass expected_updated_at, or retry the update after a short back-off.
The SDK add() convenience does not expose a timestamp. Its parameters are content, user_id, agent_id, run_id, and metadata. If you are backfilling a corpus and want to preserve original creation times, call POST /v1/memories directly: the REST body accepts a timestamp, which Korely stores as the fact's valid_from (so you can backdate via the API). The batch import has no per-item timestamp; if you need backdating at scale, loop direct REST add calls with a timestamp each.
search() has no time_filter or include_history flag. Parameters are query, user_id, agent_id, and limit. Point-in-time queries belong to the fact store via get_facts(as_of=...), not to the memory search surface.
A superseded fact has vanished from your reads. Facts that have been superseded by a contradiction are marked invalidated (invalid_at set), not deleted, and they drop out of the default get_facts result. There is no per-id fact route; to see retired values pass include_invalidated=True to get_facts (or GET /v1/facts?entity=...&include_invalidated=true), which surfaces the full supersede chain. A fact only disappears for good when the end user is forgotten via a GDPR delete.
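The status codes in this list call for different reactions, and an import script is easier to maintain when that decision lives in one place. The sketch below is one way to encode it: `triage` and the action names are hypothetical, the error codes are the ones quoted above, and the per-second rate limit is recognised only by the presence of a Retry-After value because its error code is not stated here.

```python
def triage(status, code=None, retry_after=None):
    """Map an error response to (action, seconds_to_wait) per the gotchas above."""
    if status == 401:
        return ("fix_auth_header", None)             # retrying will not help
    if status == 403 and code == "agent_cap_exceeded":
        return ("consolidate_agent_ids", None)       # use run_id for sub-scoping
    if status == 409 and code == "stale_write":
        return ("refetch_and_retry", None)           # optimistic concurrency guard
    if status == 429:
        if code == "quota_exceeded" or retry_after is None:
            return ("upgrade_or_wait_for_reset", None)   # monthly write quota
        return ("sleep_then_retry", float(retry_after))  # per-second rate limit
    return ("raise", None)


decisions = [
    triage(401, code="invalid_key"),
    triage(429, code="quota_exceeded"),
    triage(429, retry_after="3"),
    triage(409, code="stale_write"),
]
print(decisions)
```

The distinction that matters most during a bulk import is between the two 429 cases: sleeping and retrying clears a rate limit, but it never clears an exhausted monthly quota.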

When DIY is the right call

Sometimes owning the pipeline is correct, and pretending otherwise would make this page useless. Keep building your own when:

Your data cannot leave your infrastructure. Korely's API is EU-hosted and your data stays in the EU, but it is our infrastructure, not yours. If the requirement is your own racks or an air-gapped network, a hosted memory API is the wrong shape no matter whose it is.
Search is your core product. If ranking quality is the thing your customers pay you for, you want to own the ranker, the eval harness, and every tuning knob. A general-purpose memory layer is built for agents that need good retrieval, not for teams whose moat is retrieval itself.
You need custom rankers or exotic retrieval. Domain-specific scoring functions, learned sparse retrieval, or ranking signals that come from your own product data are all reasons to keep the query path in your codebase.

If none of these describe you, the maintenance table earlier on this page is the contract you would be taking on. Four calls are cheaper.

Next steps

Quickstart: wire Korely into your stack and run your first search in five minutes.
Cookbook: a chatbot that remembers: get_context plus add in a working chat loop.
API reference: the complete REST contract behind every call on this page.
Architecture: where the write-time intelligence runs and why the read path stays deterministic.
Migrating a large corpus or an unusual schema? Email [email protected]. We read every message.