Context & facts

Get context

Assemble a prompt-ready Markdown block, the end user's known facts plus query-relevant memories, under a token budget. Deterministic retrieval and formatting only; no LLM generation.

GET /v1/context

SDK: korely.get_context(query, ...). This is the one call you drop into a prompt. Korely cosine-ranks the end user's typed facts, searches the query-relevant memories, packs both under your token budget, and hands back a Markdown string you can paste straight into the model's context. Nothing is generated, the assembly is deterministic.

Authentication

HTTP header, required: Authorization: Bearer kor_live_.... The key must carry the memories:read scope.

Query parameters

Parameter	Type	Required	Description
`query`	string	Required	The query the context block is retrieved and ranked against, facts are cosine-ranked and memories are searched against it. `min_length=1`, `max_length=2000`.
`user_id`	string	Optional	The end user to build context for (maps to `end_user_id`). When omitted, facts are not filtered by end user and the memory search runs across all of the API key owner's end users. Default `null`.
`agent_id`	string	Optional	Optional agent scope filter applied to facts (`AgentFact.agent_id`). Default `null`.
`token_budget`	integer	Optional	Total token budget for the assembled block. Facts get roughly 50% of the budget; memories fill the rest. `ge=50`, `le=8000`. Default `800`.

Example request

curl -G https://api.korely.ai/v1/context \
  -H "Authorization: Bearer kor_live_..." \
  --data-urlencode "query=What does Giulia want for the weekly sync?" \
  --data-urlencode "user_id=customer-giulia-4812" \
  --data-urlencode "agent_id=support-bot" \
  --data-urlencode "token_budget=800"

from korely_memory import Korely

korely = Korely(api_key="kor_live_...", region="eu")

context = korely.get_context(
    query="What does Giulia want for the weekly sync?",
    user_id="customer-giulia-4812",
    agent_id="support-bot",
    token_budget=800,
)

print(context.context)   # paste straight into your prompt
print(context.tokens)    # 91
print(context.sources)   # ["fct_a1", "fct_a2", "mem_8f2c1a"]

import { Korely } from "korely-memory";

const korely = new Korely({ apiKey: "kor_live_..." });

const context = await korely.getContext({
  query: "What does Giulia want for the weekly sync?",
  user_id: "customer-giulia-4812",
  agent_id: "support-bot",
  token_budget: 800,
});

console.log(context.context); // paste straight into your prompt
console.log(context.tokens);  // 91
console.log(context.sources); // ["fct_a1", "fct_a2", "mem_8f2c1a"]

Response

200 OK. A prompt-ready Markdown block, its estimated token count, and the ordered source ids that contributed to it.

{
  "context": "_The facts below are a compact profile of the user; the memories are the verbatim source. If a fact does not directly answer the question, rely on the memories._

## Known facts
- Giulia prefers async standups (since 2026-05-12)
- Giulia works_at Acme Corp (since 2026-04-01)

## Relevant memories
- Giulia asked to move the weekly sync to Mondays and keep it under 30 minutes.",
  "tokens": 91,
  "sources": ["fct_a1", "fct_a2", "mem_8f2c1a"]
}

Field	Type	Description
`context`	string	The Markdown block: an optional reader-trust note, then `## Known facts` (top facts, cosine-ranked, capped at 10 or 50% of the budget), then `## Relevant memories`. Empty string when nothing fits.
`tokens`	integer	Estimated token count of the `context` string (a `char / 4` heuristic).
`sources`	array<string>	Ordered list of the source public ids that contributed to the block: fact ids (`fct_`) first, then memory ids (`mem_`).

Errors

Status	Code	Cause
`401`	`invalid_key`	The `Authorization: Bearer` credentials are missing, or the `kor_live_` key does not resolve to a live API key. Response carries `WWW-Authenticate: Bearer`.
`403`	`forbidden`	The API key does not carry the `memories:read` scope.
`429`	`rate_limit_exceeded`	Per-tier fixed-window minute/hour/day rate limit exceeded. Response carries `Retry-After`, `X-RateLimit-Limit`, and `X-RateLimit-Remaining` headers.
`429`	`quota_exceeded`	Monthly query quota reached (tier `queries_per_month` plus a 10% grace). Upgrade for more.
`422`	`invalid_request`	Request validation failed, `query` is missing or violates `min_length=1`/`max_length=2000`, or `token_budget` falls outside `ge=50`/`le=8000`.

Notes

Read-only. A plain GET, no soft-delete or pagination semantics on this endpoint. It is deterministic retrieval and formatting only; there is no LLM generation.
Budget split. Facts are capped at 50% of token_budget and at the top 10 by cosine relevance. Memories (the verbatim source) fill the remainder; the top memory may be truncated at a sentence boundary to fit.
Budget is clamped. token_budget is clamped server-side to [50, 8000] even though the query parameter already enforces the same bounds.
Reader-trust note. The leading guidance note is prepended only when facts exist and the budget is at least 150 tokens.
Graceful degradation. When embeddings are unavailable, fact ranking falls back to most-recent-active. If the memory search is unavailable, the block degrades to facts-only rather than erroring.
End-user scope. user_id is passed as the end-user scope (end_user_id); omitting it builds context across all of the API key owner's end users.
Counts as a read. Every successful call records usage and counts against your monthly read quota.

Get context, guide, the narrative walkthrough with context.
Get facts, the raw typed facts for an end user.
Search memories, the underlying memory search.
Add a memory, write the memories this block is built from.