Korely

A chatbot that remembers every customer

Giulia opens your support chat for the third time this month. A stateless bot asks for her email again, asks what plan she is on again, and has no idea she already reported the same shipping problem twice. She types the whole story a third time.

With Korely in the loop, the same bot opens with: "Hi Giulia, I can see your Advanced plan and the open shipping issue from last week. I'll follow up by email as usual, unless you'd prefer otherwise." Same LLM, same prompt template. The difference is four API calls. This cookbook walks through all four at the function level.

The snippets below use the Python SDK (pip install korely-memory). The same four calls work over the REST API and the Node SDK (npm install korely-memory).

The four calls

1. On chat open: load who she is

Before the first LLM turn, fetch the customer's active facts. This is a deterministic read, pure SQL and graph lookups with no model in the path, typically under 50 ms. It fits inside your time-to-first-token budget.

from korely_memory import Korely
korely = Korely(api_key="kor_live_...", region="eu")
facts = korely.get_facts(user_id="customer-giulia-4812")

The response is a flat list of her active facts: typed (subject, predicate, object) triples the graph extracts automatically. A fact is live while its invalid_at is null; predicate is the normalized verb (the raw phrasing she used is kept in predicate_raw), and each fact carries a predicate_family for grouping. (Need them grouped by family instead? Call get_profile.)

{
  "facts": [
    { "id": "fct_a1", "subject": "customer-giulia-4812", "predicate": "has_plan",
      "object": "Advanced", "predicate_family": "other", "predicate_raw": "has_plan",
      "valid_from": "2026-05-02T09:14:00Z", "invalid_at": null,
      "invalidated_by": null, "source_memory_id": "mem_91a2" },
    { "id": "fct_a2", "subject": "customer-giulia-4812", "predicate": "has_open_issue",
      "object": "shipping delay, order #88412", "predicate_family": "events",
      "predicate_raw": "has_open_issue", "valid_from": "2026-06-03T16:40:00Z",
      "invalid_at": null, "invalidated_by": null, "source_memory_id": "mem_2b91d4" },
    { "id": "fct_a3", "subject": "customer-giulia-4812", "predicate": "likes",
      "object": "email follow-ups", "predicate_family": "preferences",
      "predicate_raw": "prefers", "valid_from": "2026-05-19T11:02:00Z",
      "invalid_at": null, "invalidated_by": null, "source_memory_id": "mem_5c0d" }
  ],
  "total": 3
}

Render that into your system prompt as a compact context block and the bot greets her like it knows her, because it does.
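One way to do that rendering, as a minimal sketch: the helper name render_context_block and the wording of the block are illustrative choices, not part of the SDK. It only relies on the predicate, object and invalid_at fields shown in the response above.

```python
def render_context_block(response: dict) -> str:
    """Turn a get_facts-style response into a compact system-prompt block."""
    lines = ["Known facts about this customer:"]
    for fact in response.get("facts", []):
        # Defensive check: get_facts is described as returning active facts
        # only, but skip anything that carries an invalid_at timestamp.
        if fact.get("invalid_at") is not None:
            continue
        predicate = fact["predicate"].replace("_", " ")
        lines.append(f"- {predicate}: {fact['object']}")
    return "\n".join(lines)


# Trimmed copy of the sample response, with only the fields the helper reads.
sample = {
    "facts": [
        {"predicate": "has_plan", "object": "Advanced", "invalid_at": None},
        {"predicate": "has_open_issue", "object": "shipping delay, order #88412",
         "invalid_at": None},
        {"predicate": "likes", "object": "email follow-ups", "invalid_at": None},
    ],
    "total": 3,
}

context_block = render_context_block(sample)
print(context_block)
```

Keeping the block to one short line per fact is what keeps it at a few hundred tokens even for long-standing customers.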

The flow at chat open:

  • Turn 0, Giulia opens the chat: third visit this month; your bot resolves her to user_id "customer-giulia-4812".
  • Korely, get_facts(user_id=...): active typed facts only (invalid_at = null); deterministic read, typically under 50 ms.
  • Turn 1, personalized greeting: "I can see your Advanced plan and the open shipping issue"; zero questions she already answered.

2. During the chat: search her history

When she mentions the shipping problem, don't make the LLM guess. Search her memories, scoped to her and only her:

hits = korely.search(
    "shipping complaint",
    user_id="customer-giulia-4812",
    limit=5,
)

What comes back is ranked retrieval, not generated text: semantic vector search (cosine similarity over embeddings) scoped to her user_id. The only model call on the read path is the query embedding, which costs a fraction of a hundredth of a cent. Your bot's own model does the reasoning over the results. Each hit has the shape {id, score, snippet, user_id, agent_id, metadata}; the snippet is a short excerpt (≤280 chars), not the full memory:

{
  "results": [
    { "id": "mem_2b91d4", "score": 0.93,
      "snippet": "Order #88412 delayed at the Bologna hub, second report. Promised an email update within 48h.",
      "user_id": "customer-giulia-4812", "agent_id": "support-bot", "metadata": {} },
    { "id": "mem_77c0ae", "score": 0.81,
      "snippet": "First shipping complaint for order #88412. Courier marked the address as incomplete.",
      "user_id": "customer-giulia-4812", "agent_id": "support-bot", "metadata": {} }
  ]
}

Filters are additive (AND): user_id alone searches everything stored about Giulia across every conversation; add run_id to narrow to one session. Another customer's memories can never leak into her results. The scope is enforced server-side, not by prompt discipline.
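Before handing hits to your own model, you will usually want to drop weak matches and flatten the rest into text. The sketch below assumes the response shape shown above; the helper name format_hits and the 0.75 cut-off are hypothetical and should be tuned against your own data.

```python
def format_hits(response: dict, min_score: float = 0.75) -> str:
    """Render search hits as prompt-ready lines, strongest match first."""
    kept = [hit for hit in response.get("results", []) if hit["score"] >= min_score]
    kept.sort(key=lambda hit: hit["score"], reverse=True)
    if not kept:
        return "No relevant history found."
    # Keep the memory id so the bot (or a human) can trace a claim to its source.
    return "\n".join(f"[{hit['id']}] {hit['snippet']}" for hit in kept)


sample_hits = {
    "results": [
        {"id": "mem_77c0ae", "score": 0.81,
         "snippet": "First shipping complaint for order #88412."},
        {"id": "mem_2b91d4", "score": 0.93,
         "snippet": "Order #88412 delayed at the Bologna hub, second report."},
        # Hypothetical low-scoring hit, added to show the cut-off at work.
        {"id": "mem_0000", "score": 0.40, "snippet": "Asked about invoice format."},
    ]
}

history_block = format_hits(sample_hits)
print(history_block)
```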

3. On new info: write it down

Giulia mentions she'd rather not be called on the phone. Store it:

result = korely.add(
    "Prefers email follow-ups, not phone",
    user_id="customer-giulia-4812",
    agent_id="support-bot",
)

The call returns the stored memory immediately. Fact extraction runs asynchronously, so the facts array is often empty on the immediate response and populates a few seconds later (read it back with get_facts at the next chat open).

This is the bi-temporal part: if Giulia previously preferred phone calls, that old fact is not deleted when the new one lands. It gets an invalid_at timestamp, stops being served by get_facts, and survives for audit and point-in-time queries (as_of). On the write shape, each extracted fact lists the ids it superseded in its invalidated array:

{
  "id": "mem_d3f7",
  "content": "Prefers email follow-ups, not phone",
  "user_id": "customer-giulia-4812",
  "agent_id": "support-bot",
  "run_id": null,
  "metadata": {},
  "created_at": "2026-06-11T10:22:00Z",
  "updated_at": "2026-06-11T10:22:00Z",
  "facts": [
    { "id": "fct_e1", "subject": "customer-giulia-4812", "predicate": "likes",
      "object": "email follow-ups", "predicate_family": "preferences",
      "valid_from": "2026-06-11T10:22:00Z", "invalidated": ["fct_b0"] }
  ]
}
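To make the point-in-time idea concrete, here is a local illustration of what an as_of view means over records that carry valid_from and invalid_at. It is not the SDK's as_of implementation: the function facts_as_of, the half-open interval rule, and the details of the superseded phone preference (fct_b0) are all assumptions made for the example.

```python
from datetime import datetime


def parse_ts(value: str) -> datetime:
    # datetime.fromisoformat() rejects a trailing "Z" before Python 3.11.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def facts_as_of(facts: list, when: str) -> list:
    """Return the facts that were valid at `when`.

    Assumes half-open validity intervals: [valid_from, invalid_at).
    """
    moment = parse_ts(when)
    visible = []
    for fact in facts:
        started = parse_ts(fact["valid_from"]) <= moment
        ended = (fact["invalid_at"] is not None
                 and parse_ts(fact["invalid_at"]) <= moment)
        if started and not ended:
            visible.append(fact)
    return visible


# Illustrative history; the fct_b0 record is hypothetical data.
history = [
    {"id": "fct_b0", "predicate": "likes", "object": "phone follow-ups",
     "valid_from": "2026-05-19T11:02:00Z", "invalid_at": "2026-06-11T10:22:00Z"},
    {"id": "fct_e1", "predicate": "likes", "object": "email follow-ups",
     "valid_from": "2026-06-11T10:22:00Z", "invalid_at": None},
]

before = [f["id"] for f in facts_as_of(history, "2026-06-01T00:00:00Z")]
after = [f["id"] for f in facts_as_of(history, "2026-06-12T00:00:00Z")]
print(before, after)
```

Nothing is ever removed from the list; the older record simply falls outside the window once its invalid_at has passed.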

4. On account change: let the contradiction engine work

You don't write supersede logic. When Giulia upgrades and the bot (or your billing webhook) writes "Giulia upgraded to the Advanced plan", the two-stage contradiction detector finds the existing "Giulia has plan Basic" fact (same predicate, conflicting object) and supersedes it. The write response shows the new fact and the id it invalidated:

{
  "id": "mem_e8a1",
  "content": "Giulia upgraded to the Advanced plan",
  "user_id": "customer-giulia-4812",
  "agent_id": "support-bot",
  "metadata": {},
  "created_at": "2026-06-11T10:25:00Z",
  "updated_at": "2026-06-11T10:25:00Z",
  "facts": [
    { "id": "fct_f2", "subject": "Giulia", "predicate": "has_plan",
      "object": "Advanced", "predicate_family": "other",
      "valid_from": "2026-06-11T10:25:00Z", "invalidated": ["fct_a0"] }
  ]
}

The write path is where the intelligence runs: document and chunk embeddings, entity extraction on our own infrastructure, typed-fact extraction with contradiction checking and bi-temporal validity. About a tenth of a cent per memory, all included. Nobody edits anything by hand.
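For intuition only, the "same predicate, conflicting object" rule can be modelled in a few lines. This is a deliberately naive stand-in, not Korely's two-stage detector: it assumes every predicate is single-valued, matches subjects by exact string, and uses a hypothetical apply_fact helper with illustrative data for the Basic-plan fact.

```python
def apply_fact(store: list, new_fact: dict) -> dict:
    """Append new_fact and invalidate active facts it contradicts (toy rule)."""
    superseded = []
    for old in store:
        same_slot = (old["subject"], old["predicate"]) == (
            new_fact["subject"], new_fact["predicate"])
        if same_slot and old["invalid_at"] is None and old["object"] != new_fact["object"]:
            # Bi-temporal: close the old fact's validity window, never delete it.
            old["invalid_at"] = new_fact["valid_from"]
            old["invalidated_by"] = new_fact["id"]
            superseded.append(old["id"])
    record = {**new_fact, "invalid_at": None, "invalidated_by": None,
              "invalidated": superseded}
    store.append(record)
    return record


# Illustrative starting state; the valid_from of the Basic fact is made up.
store = [
    {"id": "fct_a0", "subject": "customer-giulia-4812", "predicate": "has_plan",
     "object": "Basic", "valid_from": "2026-03-01T00:00:00Z",
     "invalid_at": None, "invalidated_by": None},
]

written = apply_fact(store, {
    "id": "fct_f2", "subject": "customer-giulia-4812", "predicate": "has_plan",
    "object": "Advanced", "valid_from": "2026-06-11T10:25:00Z",
})
active_ids = [f["id"] for f in store if f["invalid_at"] is None]
print(written["invalidated"], active_ids)
```

The toy rule would wrongly supersede multi-valued predicates such as likes, which is one reason you would not want to hand-roll this logic in your bot.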

Scoping: one agent, unlimited customers

The scoping model is three free-form strings, and it maps one-to-one onto the scoping you already use elsewhere (see the migration guide):

Parameter   What it identifies                                Example
user_id     Your end user. Free-form string, you choose it.   "customer-giulia-4812"
agent_id    Your application's namespace.                     "support-bot"
run_id      One session or conversation.                      "chat-2026-06-11-a"
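A small value object can keep the three strings together so they are never mixed up between customers. The Scope class below is a hypothetical convenience, not something the SDK ships; it only produces a keyword dict. Passing user_id and run_id to search follows the text above, while passing agent_id to search is an assumption to check against the API reference.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass(frozen=True)
class Scope:
    """The three free-form scoping strings, with only user_id required."""
    user_id: str
    agent_id: Optional[str] = None
    run_id: Optional[str] = None

    def as_filters(self) -> dict:
        # Omit unset fields so that fewer filters means a wider search (AND).
        return {key: value for key, value in asdict(self).items() if value is not None}


everything = Scope(user_id="customer-giulia-4812")
this_session = Scope(user_id="customer-giulia-4812", agent_id="support-bot",
                     run_id="chat-2026-06-11-a")

print(everything.as_filters())
print(this_session.as_filters())
# Intended use (not executed here): korely.search("shipping complaint", **everything.as_filters())
```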

The part that matters for your bill: a support bot serving 10,000 customers is one agent_id with 10,000 user_id values, and end users are unlimited on every tier, including the free one. What's metered is volume, counted as memories written / searches per month: Hobby 1k/25k, Developer (€19) 5k/250k, Team (€79) 25k/1M. Reads are retrieval, not generation, which is why the search quotas are 25 to 50 times the write quotas. No overage billing: we email you at 80% of quota, and past a +10% soft cap you get a clean 429, never a surprise invoice.
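The quota behaviour can be mirrored client-side if you want your own dashboards to anticipate the 429. This is a rough sketch: the function name, the state labels, and whether each threshold is inclusive are assumptions; only the 80% email and the +10% soft cap come from the text above.

```python
def quota_status(used: int, limit: int) -> str:
    """Classify monthly usage against a quota using integer arithmetic."""
    if used * 10 > limit * 11:   # strictly past the +10% soft cap
        return "blocked"         # requests would get a 429
    if used > limit:             # over quota but inside the soft cap
        return "over_limit"
    if used * 10 >= limit * 8:   # 80% or more of the quota
        return "warned"          # the warning email threshold
    return "ok"


HOBBY_WRITES = 1_000  # memories written per month on the Hobby tier

for used in (799, 800, 1_000, 1_001, 1_100, 1_101):
    print(used, quota_status(used, HOBBY_WRITES))
```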

Why not just stuff the history into the prompt?

Prompt-stuffing is the default pattern, and it makes chatbots expensive and forgetful at the same time. Three concrete reasons:

  • Token cost. Re-sending a 20-turn history is thousands of tokens, every turn, forever, and it grows with every message. A facts block from get_facts is a few hundred tokens, fetched once at chat open, and it grows with what's worth remembering, not with conversation length.
  • Staleness. A raw history contains both "I'm on Basic" (March) and "I upgraded to Advanced" (May), and you're trusting the LLM to pick the right one every single time. Bi-temporal facts resolve this before the LLM sees the context: get_facts returns only the facts that are valid now.
  • Forgetting. When Giulia says "forget my data", deleting her from conversation logs scattered across your own database is a project. With Korely it's one call, DELETE /v1/users/customer-giulia-4812/memories, and every memory and fact scoped to her user_id is forgotten.
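The bulk-delete call can be prepared with the standard library alone. In this sketch the path comes from the text above, while the base URL (a placeholder that does not resolve), the Bearer authorization scheme, and the helper name build_forget_request are assumptions; check the REST API reference for the real values. The request is built but not sent.

```python
from urllib.parse import quote
from urllib.request import Request


def build_forget_request(base_url: str, api_key: str, user_id: str) -> Request:
    """Build (but do not send) the bulk delete for one user's memories."""
    # Percent-encode the id: user_id is free-form and may contain "/" or spaces.
    path = f"/v1/users/{quote(user_id, safe='')}/memories"
    return Request(
        base_url.rstrip("/") + path,
        method="DELETE",
        headers={"Authorization": f"Bearer {api_key}"},  # assumed auth scheme
    )


request = build_forget_request(
    "https://api.example.invalid",   # placeholder host, not the real endpoint
    "YOUR_API_KEY",
    "customer-giulia-4812",
)
print(request.get_method(), request.full_url)
# To send it: urllib.request.urlopen(request), once base URL and auth are confirmed.
```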

The trust angle

This deployment takes two shapes, and the data-ownership story differs:

  • B2C2C (your end users are also Korely users, think personal assistants writing into someone's own memory store): the end user sees what the bot remembers. The Memory Panel lists every fact, editable and forgettable by the user themselves. Correcting and erasing what the bot knows are product features the user controls, not a support ticket.
  • Pure B2B2C (the Giulia scenario, she's your customer, not ours): you own the data, and the deletion surface is API-side. Your privacy policy, your deletion endpoint wiring, our one-call bulk delete underneath. Data is EU-hosted on our own infrastructure either way.

One honest caveat: the write path (korely.add) runs the extraction pipeline asynchronously, so newly extracted facts land shortly after the write returns, not within the same request. Write fire-and-forget during the chat and rely on get_facts at the next chat open. The read path is the deterministic, fast one.
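One way to keep writes off the chat's critical path is a small background executor. The sketch below is an assumption about how you might wire it, not a documented SDK feature: remember_later is a hypothetical helper, and a stub stands in for korely.add so the example runs without network access.

```python
import logging
from concurrent.futures import Future, ThreadPoolExecutor

_writer = ThreadPoolExecutor(max_workers=2)
_log = logging.getLogger("memory")


def _log_failure(future: Future) -> None:
    # A failed write should be visible in logs but must never break the chat turn.
    if future.cancelled():
        return
    error = future.exception()
    if error is not None:
        _log.warning("memory write failed: %s", error)


def remember_later(write_fn, content: str, **scope) -> Future:
    """Submit a memory write in the background and return immediately."""
    future = _writer.submit(write_fn, content, **scope)
    future.add_done_callback(_log_failure)
    return future


# Stub standing in for korely.add; in production pass korely.add itself.
stored = []


def fake_add(content: str, **scope) -> dict:
    stored.append((content, scope))
    return {"id": "mem_stub", "content": content, "facts": []}


pending = remember_later(fake_add, "Prefers email follow-ups, not phone",
                         user_id="customer-giulia-4812", agent_id="support-bot")
result = pending.result(timeout=5)  # only the demo waits; the chat loop would not
print(result["id"], result["facts"])
```

The chat loop never reads the returned facts; it relies on get_facts at the next chat open, which matches the asynchronous extraction described above.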

Where to go next

Temporal facts explains supersede and point-in-time queries under the hood; memory model covers the full scoping model; the REST API reference is the published contract the snippets above come from.