Architecture: reads without an LLM

Korely is a write-once, read-many memory store: all model work runs exactly once when you call add(), and every subsequent read is pure data retrieval with no generation step. The store holds three layers (vectors, a typed entity graph, and bi-temporal subject-predicate-object facts), all indexed at write time so that get_context(), get_facts(), and search() return structured data with no LLM in the loop. The primary recall path is get_context(): it assembles the active typed facts and the most relevant memories for a query into a ready-to-prompt block, which is what sets Korely apart from a plain vector store. For an agent builder this means predictable latency, no surprise model charges on the read path, and a read-to-write quota ratio that matches how agents actually behave: many retrievals for every new memory.

Korely never puts a model between your agent and its memory. No read pays for a generation, adds a second or two of latency, or returns prose that another model wrote. The principle is simple: reads are retrieval, not generation. Your agent already has a model. It does not need ours to read a database.

This page explains exactly what runs on the read path, what runs on the write path, and why that split makes read quotas an order of magnitude more generous than write quotas.

flowchart TD
    A["your agent calls add<br/>POST /v1/memories"] --> B["chunk + embed<br/>document and chunks"]
    A --> C["entity extraction<br/>into typed graph"]
    A --> D["fact extraction<br/>two-stage contradiction check<br/>supersede, never delete<br/>valid_from / invalid_at stamped"]
    B --> P[("persisted store<br/>vectors · graph · facts")]
    C --> P
    D --> P

    P --> R1
    P --> R2
    P --> R3

    subgraph READ ["READ PATH — no model"]
      R1["get_context · get_facts<br/>typed facts assembled from store<br/>deterministic · zero AI"]
      R2["search<br/>query embedding then<br/>vector similarity"]
      R3["get / get_all / users / agents<br/>pure SQL<br/>zero AI"]
    end

    R1 --> OUT["structured JSON<br/>returned to your agent"]
    R2 --> OUT
    R3 --> OUT

    style READ fill:#0f172a,stroke:#334155,color:#e2e8f0
    style P fill:#1e293b,stroke:#475569,color:#e2e8f0
    style OUT fill:#052e16,stroke:#166534,color:#bbf7d0

Write pipeline vs read path: no generative model runs on the read side

The read path: SQL, vectors you already paid for, and one tiny exception

Korely exposes read operations over the REST API (GET /v1/context, GET /v1/facts, GET /v1/memories, POST /v1/memories/search) and mirrors them 1:1 in the SDK. Every operation in the table below is a pure SQL lookup with zero AI calls; search, the one exception, is covered after it:

| Operation | What runs |
|---|---|
| `GET /v1/context` (SDK: `korely.get_context()`) | The primary recall path. Assembles the active typed facts and the relevant memories for a query into a ready-to-prompt block. Pure retrieval, no generation. |
| `GET /v1/facts` (SDK: `korely.get_facts()`) | Deterministic filter and sort over the typed fact store. Typically under 50 ms. |
| `GET /v1/memories/{id}` (SDK: `korely.get(id)`) | Primary-key lookup. One SQL query. |
| `GET /v1/memories/{id}/history` | Event log for a single memory. Pure SQL over the history table. |
| `GET /v1/memories` (SDK: `korely.get_all()`) | Paginated SQL with filters. |
| `GET /v1/users` (SDK: `korely.users()`) | Lists the end-user IDs seen in your namespace. Pure SQL. |
| `GET /v1/agents` | Lists active agent IDs. Pure SQL. |

The one exception is POST /v1/memories/search (SDK: korely.search(query, ...)). It embeds your query, which costs well under a hundredth of a cent. Retrieval is then semantic vector similarity (cosine) over the embeddings written at add() time. That query embedding is the only model call on the entire read surface.
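Once the query is embedded, retrieval is plain arithmetic over vectors that already exist. The sketch below makes some assumptions: stored embeddings are held in memory as Python lists, and the query has already been embedded (that embedding call is the one model step and is not shown). `top_k_by_cosine` is a hypothetical helper, not part of the Korely SDK, and the toy vectors are made up:

```python
import math
from typing import Dict, List, Tuple

def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

def top_k_by_cosine(query_vec: List[float],
                    stored: Dict[str, List[float]],
                    k: int = 3) -> List[Tuple[str, float]]:
    """Rank stored vectors (computed at write time) by similarity to the query.

    Hypothetical helper: no model runs here, only arithmetic over existing vectors.
    """
    scored = [(mem_id, cosine(query_vec, vec)) for mem_id, vec in stored.items()]
    # Highest score first; ties broken by id so the order is stable.
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored[:k]

# Made-up 3-dimensional vectors; real embeddings typically have hundreds of dimensions.
stored_vectors = {
    "mem_a": [1.0, 0.0, 0.0],
    "mem_b": [0.7, 0.7, 0.0],
    "mem_c": [0.0, 0.0, 1.0],
}
hits = top_k_by_cosine([1.0, 0.1, 0.0], stored_vectors, k=2)
```

A production store would use an approximate nearest-neighbour index rather than a linear scan. The ranking principle is the same either way, and so is the absence of any model call.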

No generative model ever composes output on the read path. There is no reranker LLM, no answer synthesis, no summarization layer. Every read endpoint returns structured JSON assembled directly from the store. Your agent's own model does the reasoning over what comes back, so you, not Korely, choose the model quality, the tone, and the prompt.
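Because reads return structured data, turning them into prompt text is deterministic string formatting on your side, not generation. The sketch below assumes facts and memory snippets arrive as plain dicts carrying the field names used in this page's worked examples. `build_prompt_block` and its layout are hypothetical; they do not reproduce the format get_context() actually returns, and the sample data is made up:

```python
from typing import Dict, List, Optional

def build_prompt_block(facts: List[Dict[str, Optional[str]]],
                       memories: List[Dict[str, str]],
                       max_memories: int = 3) -> str:
    """Format facts and memory snippets into a plain-text block for your own model.

    Hypothetical helper: pure string formatting, so identical inputs give an identical block.
    """
    lines = ["Known facts:"]
    for f in facts:
        # Skip facts that have been superseded (invalid_at is set).
        if f.get("invalid_at") is None:
            lines.append(f"- {f['subject']} {f['predicate']} {f['object']} "
                         f"(since {f['valid_from']})")
    lines.append("Relevant memories:")
    for m in memories[:max_memories]:
        lines.append(f"- [{m['id']}] {m['snippet']}")
    return "\n".join(lines)

# Made-up sample data shaped like the fact and search examples on this page.
facts = [
    {"subject": "Dana Reyes", "predicate": "role_is", "object": "VP Ops",
     "valid_from": "2026-04-03", "invalid_at": None},
    {"subject": "Dana Reyes", "predicate": "role_is", "object": "Director",
     "valid_from": "2025-01-10", "invalid_at": "2026-04-03"},
]
memories = [{"id": "mem_b774e0", "snippet": "Renewal window opens July 1."}]
block = build_prompt_block(facts, memories)
```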

Request flow


                  your agent (its own model)
                     |                  ^
           tool call |                  |  structured JSON
                     v                  |  (no LLM composed this)
       +----------------------------------------------+
       |                  READ PATH                   |
       |                                              |
       |  POST /v1/memories/search (korely.search)    |
       |    query embedding, then semantic vector     |
       |    similarity (cosine) over stored vectors   |
       |                                              |
       |  get_context / get_facts and all other       |
       |    read endpoints                            |
       |    pure SQL and fact assembly, zero AI       |
       +----------------------------------------------+
                     ^
                     |  reads only what writes indexed
       +----------------------------------------------+
       |                  WRITE PATH                  |
       |                                              |
       |  chunking -> document + chunk embeddings     |
       |  -> entity extraction -> typed-fact          |
       |  extraction with contradiction checking      |
       |  and bi-temporal validity stamps             |
       +----------------------------------------------+
                     ^
                     |  korely.add() / POST /v1/memories
                  your application

Worked round trip: a search

Your agent asks about a customer. The call, and the typed objects that come back:


# the call your agent makes
› hits = korely.search("acme renewal objections", limit=3)

# List[SearchHit]: structured objects, no generation
[
  SearchHit(id="mem_9f21ac", score=0.91,
            snippet=("Main objection is the per-seat price after the team grew to 40. "
                     "Dana asked for a usage-based quote before the board meeting...")),

  SearchHit(id="mem_b774e0", score=0.86,
            snippet=("Renewal window opens July 1. Champion: Dana Reyes (VP Ops). "
                     "Risk flagged: procurement now requires a security review upfront...")),

  SearchHit(id="mem_3c10d8", score=0.79,
            snippet=("When a customer cites per-seat cost after headcount growth, "
                     "lead with the usage tier comparison, not a discount...")),
]

Latency note: the query embedding is the only model call in this round trip. Everything after it is a semantic vector similarity lookup over the stored embeddings, returned as structured data. No reranker model runs, no answer is synthesized. Your agent reads the three snippets, decides that mem_9f21ac matters, and follows up with korely.get("mem_9f21ac") (REST: GET /v1/memories/mem_9f21ac), which is a single SQL query with zero AI calls.
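The agent's decision to follow up on a hit is ordinary code over typed results. The sketch below uses a local `SearchHit` stand-in that mirrors the fields in the search listing (id, score, snippet). A hypothetical `fetch` callable stands in for korely.get(), and the 0.9 cut-off is an arbitrary illustrative threshold, not a Korely default:

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

@dataclass(frozen=True)
class SearchHit:
    # Local stand-in mirroring the fields shown in the search listing.
    id: str
    score: float
    snippet: str

def follow_up(hits: List[SearchHit],
              fetch: Callable[[str], Dict[str, str]],
              min_score: float = 0.9) -> List[Dict[str, str]]:
    """Fetch full records only for hits at or above a similarity threshold.

    Each fetch is meant to be a primary-key read, so this step adds no model calls.
    """
    return [fetch(hit.id) for hit in hits if hit.score >= min_score]

hits = [
    SearchHit("mem_9f21ac", 0.91, "Main objection is the per-seat price..."),
    SearchHit("mem_b774e0", 0.86, "Renewal window opens July 1..."),
    SearchHit("mem_3c10d8", 0.79, "When a customer cites per-seat cost..."),
]

# Hypothetical in-memory store standing in for korely.get(id).
fake_store = {h.id: {"id": h.id, "content": h.snippet} for h in hits}
records = follow_up(hits, fake_store.__getitem__)
```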

Worked round trip: facts

Typed facts are the fastest read in the system because contradiction checking already happened at write time. A facts read is a deterministic filter and sort, typically under 50 ms even with thousands of facts in the store. GET /v1/facts is available on every plan, including the free Hobby tier.


# the call your agent makes
› facts = korely.get_facts(entity="Dana Reyes")

# List[Fact] — deterministic filter+sort, zero AI, typically <50 ms
[
  Fact(subject="Dana Reyes", predicate="works_at", object="Acme",
       valid_from="2026-02-12", invalid_at=None),
  Fact(subject="Dana Reyes", predicate="role_is", object="VP Ops",
       valid_from="2026-04-03", invalid_at=None),
  # predicate is normalized; the raw verb is kept in predicate_raw
  Fact(subject="Dana Reyes", predicate="likes", predicate_raw="prefers",
       object="usage-based pricing",
       valid_from="2026-06-04", invalid_at=None),
]

Latency note: no embedding, no model, no semantic search. The read is deterministic, so two agents issuing the same call against the same store state get byte-identical output. That makes facts safe to use inside tool chains that need reproducibility. How facts get their validity intervals is covered in temporal facts.
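Determinism of this kind comes from doing nothing but filtering and sorting under a total order. The sketch below assumes facts are plain dicts with the fields from the listing above, and `active_facts` is a hypothetical helper. It is not Korely's query, which runs as SQL inside the store, the sort key is an illustrative choice, and the sample data is made up:

```python
import json
from typing import Dict, List, Optional

Fact = Dict[str, Optional[str]]

def active_facts(facts: List[Fact], entity: str) -> List[Fact]:
    """Return currently valid facts about `entity` in a fixed order (hypothetical helper)."""
    selected = [
        f for f in facts
        if f["subject"] == entity and f["invalid_at"] is None
    ]
    # Primary order by valid_from; the canonical JSON of each fact breaks ties,
    # so the result does not depend on the order facts were stored in.
    selected.sort(key=lambda f: (f["valid_from"], json.dumps(f, sort_keys=True)))
    return selected

def to_bytes(facts: List[Fact]) -> bytes:
    # sort_keys plus fixed separators yields one canonical serialization.
    return json.dumps(facts, sort_keys=True, separators=(",", ":")).encode("utf-8")

store = [
    {"subject": "Dana Reyes", "predicate": "role_is", "object": "VP Ops",
     "valid_from": "2026-04-03", "invalid_at": None},
    {"subject": "Dana Reyes", "predicate": "role_is", "object": "Director",
     "valid_from": "2025-01-10", "invalid_at": "2026-04-03"},
    {"subject": "Dana Reyes", "predicate": "works_at", "object": "Acme",
     "valid_from": "2026-02-12", "invalid_at": None},
]
first = to_bytes(active_facts(store, "Dana Reyes"))
second = to_bytes(active_facts(list(reversed(store)), "Dana Reyes"))
```

Reversing the stored order leaves the serialized result unchanged. This is the property that makes byte-identical reads possible.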

The write path: where the intelligence runs

All of the model work happens once, when a memory is written. A single korely.add() (REST: POST /v1/memories) triggers the full pipeline:

Chunking and embeddings. The document is split and both document-level and chunk-level embeddings are computed and stored. These are the same vectors that POST /v1/memories/search reuses on every later retrieval, free of charge.
Entity extraction. People, companies, places, and concepts are extracted on our own infrastructure and wired into the typed knowledge graph. See the graph.
Typed-fact extraction. Subject-predicate-object triples are extracted, checked against existing facts for contradictions, and stamped with bi-temporal validity. Conflicts are resolved here, at write time, so reads never have to arbitrate between a stale fact and a fresh one.
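The supersede-never-delete rule can be sketched in a few lines. This is a simplified, single-stage illustration built on these assumptions:

- Every predicate is single-valued, so a new fact contradicts an existing one when subject and predicate match but the object differs.
- Predicates are normalized through a hypothetical synonym table, with the raw verb kept in `predicate_raw` as in the facts listing.

Korely's actual two-stage contradiction check and extraction models are not described on this page. `add_fact`, `CANONICAL`, and the sample facts are illustrative only:

```python
from dataclasses import dataclass, replace
from typing import List, Optional

# Hypothetical mapping from raw verbs to canonical predicates.
CANONICAL = {"prefers": "likes", "enjoys": "likes", "employed_by": "works_at"}

@dataclass(frozen=True)
class Fact:
    subject: str
    predicate: str
    object: str
    valid_from: str                      # ISO date the fact became true
    invalid_at: Optional[str] = None     # set when a later fact supersedes it
    predicate_raw: Optional[str] = None  # original verb, if it was normalized

def add_fact(store: List[Fact], subject: str, raw_predicate: str,
             obj: str, valid_from: str) -> List[Fact]:
    """Return a new store with the fact added; contradicted facts are closed, never deleted."""
    predicate = CANONICAL.get(raw_predicate, raw_predicate)
    updated: List[Fact] = []
    for f in store:
        contradicts = (f.subject == subject and f.predicate == predicate
                       and f.object != obj and f.invalid_at is None)
        # Stamp the end of validity instead of removing the old fact.
        updated.append(replace(f, invalid_at=valid_from) if contradicts else f)
    raw = raw_predicate if predicate != raw_predicate else None
    updated.append(Fact(subject, predicate, obj, valid_from, predicate_raw=raw))
    return updated

store: List[Fact] = []
store = add_fact(store, "Dana Reyes", "role_is", "Director", "2025-01-10")
store = add_fact(store, "Dana Reyes", "role_is", "VP Ops", "2026-04-03")
store = add_fact(store, "Dana Reyes", "prefers", "usage-based pricing", "2026-06-04")
```

Because conflicts are closed out here, a later read only has to filter on `invalid_at is None`. It never has to decide which of two facts is current.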

All of this costs about a tenth of a cent per memory, and it is all included in the per-memory write price. There are no surprise model charges on top.

Why read quotas are an order of magnitude more generous

The economics follow directly from the architecture. A write runs embeddings, entity extraction, and fact extraction. A read runs SQL, plus at most one tiny query embedding. So the pricing mirrors reality: write quotas are sized around the work writes actually do, and read quotas are an order of magnitude more generous, because reads are nearly free to serve.

For agent workloads this is the shape you want. A typical agent reads its memory many times for every time it writes, often dozens of retrievals per new memory. With Korely that ratio costs you almost nothing, and it never pushes you toward caching reads or batching lookups to stay inside a quota. See pricing for the exact numbers per plan.
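A back-of-the-envelope check makes the ratio concrete. The sketch takes this page's approximate figure for write-time model work, about 0.1 cent per memory. It pairs that with an assumed 0.001 cent per query embedding, chosen only to sit below the hundredth-of-a-cent bound stated earlier. Neither number is a price, and the workload (30 reads per write, half of them searches) is hypothetical:

```python
# Approximate model-compute figures, in cents. WRITE_MODEL_CENTS follows this page
# ("about a tenth of a cent per memory"); QUERY_EMBED_CENTS is an assumption.
WRITE_MODEL_CENTS = 0.1
QUERY_EMBED_CENTS = 0.001

def model_cents_per_memory(reads_per_write: int, search_share: float) -> float:
    """Model compute for one write plus all of its reads.

    Only searches embed a query; every other read is SQL with no model call.
    """
    searches = reads_per_write * search_share
    return WRITE_MODEL_CENTS + searches * QUERY_EMBED_CENTS

total = model_cents_per_memory(reads_per_write=30, search_share=0.5)
read_part = total - WRITE_MODEL_CENTS
```

Under these assumptions, even if all 30 reads were searches they would add about 0.03 cent. That is still less than a third of the write-side work.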

Everything described on this page (the API, the vector store, the graph, and the write-time models) runs on our own EU-hosted infrastructure.