Core operations

Get context

Retrieve a single, prompt-ready context block assembled from the most relevant memories and current facts — one call, right before your LLM generates.

get_context is the read call an agent makes immediately before sending a prompt to an LLM. It searches across all stored memories and current facts for the given query, assembles them into a single formatted string that fits inside your token budget, and returns the source ids so you can surface citations. The context string is ready to drop directly into a system prompt or user message — no post-processing required.

get_context is the primary recall path, and the call where Korely's typed-fact layer is most visible. It does not just rank raw text: it assembles the user's currently valid typed facts alongside the most relevant memories. Facts are (subject, predicate, object) triples with bi-temporal validity, so superseded facts are excluded and only what holds right now is included. The returned sources array mixes fact ids (fct_) and memory ids (mem_), showing exactly which items were assembled into the block.
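To make "bi-temporal validity" concrete, here is a minimal sketch of how a store could select only the facts that hold at a given moment. It is an illustration under assumptions, not Korely's implementation: the `Fact` class, its field names (`valid_from`, `valid_to`, `recorded_at`), the `currently_valid` helper, and the ids `fct_old` and `fct_new` are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class Fact:
    """Hypothetical shape of a typed fact; field names are illustrative only."""
    id: str
    subject: str
    predicate: str
    obj: str
    valid_from: datetime           # when the fact started to hold in the world
    valid_to: Optional[datetime]   # None means it still holds
    recorded_at: datetime          # when the store learned about it


def currently_valid(facts: list[Fact], now: datetime) -> list[Fact]:
    """Keep facts that were recorded by `now` and whose validity interval contains `now`."""
    return [
        f for f in facts
        if f.recorded_at <= now
        and f.valid_from <= now
        and (f.valid_to is None or now < f.valid_to)
    ]


def utc(year: int, month: int, day: int) -> datetime:
    return datetime(year, month, day, tzinfo=timezone.utc)


facts = [
    # Superseded: this fact stopped holding when the newer one took effect.
    Fact("fct_old", "user", "diet", "vegan", utc(2024, 1, 1), utc(2025, 6, 1), utc(2024, 1, 2)),
    Fact("fct_new", "user", "diet", "vegetarian", utc(2025, 6, 1), None, utc(2025, 6, 3)),
]

current = currently_valid(facts, now=utc(2025, 12, 1))
print([f.id for f in current])  # only the fact that holds on 2025-12-01
```

Keeping two time axes (when something was true, and when it was recorded) lets a store answer "what holds now" without deleting history, which matches the behavior described above where superseded facts drop out of the context block.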

Unlike search, which runs semantic vector retrieval and returns a list of raw memory hits, get_context ranks, deduplicates, and formats the most relevant material — facts first — into a cohesive block. Use it when you want memory with zero glue code. Use search when you need the raw results to apply your own ranking or filtering logic.

flowchart LR
    A([Agent]) -->|"GET /v1/context?query=..."| B[Korely API]
    B --> C[(Memories)]
    B --> D[(Facts)]
    C --> E[Rank + dedup]
    D --> E
    E --> F[Format to token budget]
    F -->|"{ context, tokens, sources }"| A
    A -->|"system prompt + context"| G([LLM])

get_context assembles ranked memories and current facts into one block before generation.
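The rank, dedup, and format steps in the diagram can be pictured with a small greedy packer. This is a sketch under assumptions rather than Korely's actual ranking: the `Candidate` class, the scores, and the `assemble` helper are hypothetical, and the token estimate uses the 4-characters-per-token approximation mentioned in the Notes section.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """Hypothetical retrieval hit; not Korely's internal data model."""
    id: str
    text: str
    score: float   # higher means more relevant
    is_fact: bool


def estimate_tokens(text: str) -> int:
    # Rough 4-characters-per-token estimate, rounded up.
    return (len(text) + 3) // 4


def assemble(candidates: list[Candidate], token_budget: int) -> dict:
    # Facts first, then by descending score.
    ranked = sorted(candidates, key=lambda c: (not c.is_fact, -c.score))
    seen: set[str] = set()
    parts: list[str] = []
    sources: list[str] = []
    used = 0
    for c in ranked:
        key = c.text.strip().lower()
        if key in seen:
            continue  # drop duplicates after case and whitespace normalisation
        cost = estimate_tokens(c.text)
        if used + cost > token_budget:
            continue  # does not fit; a later, shorter item may still fit
        seen.add(key)
        parts.append(c.text)
        sources.append(c.id)
        used += cost
    # `tokens` is the sum of the per-item estimates.
    return {"context": " ".join(parts), "tokens": used, "sources": sources}


result = assemble(
    [
        Candidate("mem_a", "Prefers high-protein meals.", 0.71, False),
        Candidate("fct_b", "User is vegetarian.", 0.64, True),
        Candidate("mem_c", "prefers high-protein meals.", 0.60, False),  # duplicate of mem_a
        Candidate("mem_d", "Recent sessions indicate interest in Mediterranean cuisine.", 0.55, False),
    ],
    token_budget=15,
)
print(result["context"])
print(result["tokens"], result["sources"])
```

With a budget of 15, the fact and the top memory fit, the duplicate is dropped, and the long memory is skipped. Raising `token_budget` would let it in, which is the trade-off the parameter controls.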

Request

Endpoint: GET /v1/context. SDK: korely.get_context(*, query, user_id=None, agent_id=None, token_budget=800).

Param	Type	Notes
`query`	string	Required. The text to retrieve relevant context for. Typically the user's latest message or the topic your agent is about to reason over.
`user_id`	string	Optional. Scopes retrieval to memories belonging to a specific end user. Pass your app's user identifier (e.g. `customer-4812`). If omitted, retrieval is not narrowed to one end user and draws from every end user in the agent namespace.
`agent_id`	string	Optional. Scopes retrieval to a specific agent within the namespace. Useful when multiple agent roles share the same API key.
`token_budget`	integer	Optional. Maximum approximate token count for the returned context string. Default: `800`. Korely packs the highest-scoring content that fits within this budget. Lower values keep context tight for small models; raise it for more coverage.
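If you call the endpoint without the SDK, the parameters above travel as a URL-encoded query string. The sketch below builds that URL with the Python standard library only. The `build_context_url` helper is hypothetical, the base URL is taken from the curl example on this page, and the client-side checks are an assumption that mirrors the table rather than the server's exact validation.

```python
from typing import Optional
from urllib.parse import parse_qs, urlencode, urlsplit

BASE_URL = "https://api.korely.ai/v1/context"  # as used in the curl example


def build_context_url(
    query: str,
    user_id: Optional[str] = None,
    agent_id: Optional[str] = None,
    token_budget: int = 800,  # documented default
) -> str:
    """Build the GET /v1/context URL, omitting optional parameters that are unset."""
    if not query:
        raise ValueError("query is required")
    if isinstance(token_budget, bool) or not isinstance(token_budget, int):
        raise ValueError("token_budget must be an integer")
    params = {"query": query, "token_budget": token_budget}
    if user_id is not None:
        params["user_id"] = user_id
    if agent_id is not None:
        params["agent_id"] = agent_id
    return f"{BASE_URL}?{urlencode(params)}"


url = build_context_url(
    "What are this user's dietary preferences?",
    user_id="customer-4812",
    token_budget=600,
)
print(url)
# Round-trip check: decoding the query string recovers the original values.
print(parse_qs(urlsplit(url).query))
```

Leaving `agent_id` out of the query string entirely, instead of sending an empty value, keeps the request aligned with the "Optional" semantics in the table.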

Example

from korely_memory import Korely

korely = Korely(api_key="kor_live_...")

# Call right before you build the LLM prompt
result = korely.get_context(
    query="What are this user's dietary preferences?",
    user_id="customer-4812",
    token_budget=600,
)

system_prompt = f"""You are a helpful nutrition assistant.

Relevant context about this user:
{result.context}

Answer using the context above when relevant."""

# result.tokens  → 312
# result.sources → ["mem_8f2c1a", "fct_b91e", "mem_3d0f44"]

import { Korely } from "korely-memory";

const korely = new Korely({ apiKey: "kor_live_..." });

// Call right before you build the LLM prompt
const result = await korely.getContext({
  query: "What are this user's dietary preferences?",
  user_id: "customer-4812",
  token_budget: 600,
});

const systemPrompt = `You are a helpful nutrition assistant.

Relevant context about this user:
${result.context}

Answer using the context above when relevant.`;

// result.tokens  → 312
// result.sources → ["mem_8f2c1a", "fct_b91e", "mem_3d0f44"]

# Assembled context block printed to stdout — pipe it into any prompt builder
korely context "What are this user's dietary preferences?" \
  --user-id customer-4812 \
  --token-budget 600

# With --json to get the full structured response
korely context "dietary preferences" \
  --user-id customer-4812 \
  --json

curl -G https://api.korely.ai/v1/context \
  -H "Authorization: Bearer kor_live_..." \
  --data-urlencode "query=What are this user's dietary preferences?" \
  --data-urlencode "user_id=customer-4812" \
  --data-urlencode "token_budget=600"

Response

{
  "context": "User is vegetarian and avoids gluten. Allergic to tree nuts (flagged 2025-11). Prefers high-protein meals. Recent sessions indicate interest in Mediterranean cuisine. Goals: maintain weight, increase energy.",
  "tokens": 312,
  "sources": [
    "mem_8f2c1a",
    "fct_b91e",
    "mem_3d0f44"
  ]
}

Field	Type	Description
`context`	string	A formatted, prompt-ready block of text containing the most relevant memories and facts. Ready to embed directly into a system or user message.
`tokens`	integer	Approximate token count of the returned context string. Always at or below the requested `token_budget`.
`sources`	string[]	Ids of the memories and facts included in the context block. Use these to power citations or to retrieve full objects via search or `GET /v1/memories/{id}`.
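Because fact ids start with `fct_` and memory ids start with `mem_`, the `sources` array can be split by prefix when you want to render citations or fetch the underlying objects. A minimal sketch, using the sample response above; the `split_sources` helper and the `unknown` bucket are assumptions for illustration.

```python
def split_sources(sources: list[str]) -> dict[str, list[str]]:
    """Group source ids by their documented prefix."""
    grouped: dict[str, list[str]] = {"facts": [], "memories": [], "unknown": []}
    for source_id in sources:
        if source_id.startswith("fct_"):
            grouped["facts"].append(source_id)
        elif source_id.startswith("mem_"):
            grouped["memories"].append(source_id)
        else:
            grouped["unknown"].append(source_id)  # defensive: an unrecognised prefix
    return grouped


# Sample response fields from this page.
response = {
    "tokens": 312,
    "sources": ["mem_8f2c1a", "fct_b91e", "mem_3d0f44"],
}

grouped = split_sources(response["sources"])
print(grouped["facts"])
print(grouped["memories"])

# A simple citation footer that keeps the order returned by the API.
footer = "Sources: " + ", ".join(response["sources"])
print(footer)
```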

Errors

Status	Code string	When it happens
`401`	`invalid_key`	The `Authorization` header is missing, malformed, or the key has been revoked. Re-authenticate and retry.
`403`	`agent_cap_exceeded`	The `agent_id` you passed is new and your plan has reached its agent-namespace limit (2 on Hobby, 10 on Developer, 100 on Team, 500 on Scale). Use an existing `agent_id` or upgrade your plan.
`422`	`invalid_request`	Request validation failed. Typically a missing or wrong-type `query` value, or a non-integer `token_budget`. Like every Korely error, the body is the flat envelope `{"code": "invalid_request", "message": "query: Field required"}`.
`429`	`quota_exceeded`	Your monthly query quota (including the +10% grace) is exhausted. Upgrade your plan or wait for the reset. The monthly-quota 429 does not carry a `Retry-After` header — only a transient per-minute rate-limit 429 would, as integer seconds.
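One way to act on the table above is to parse the flat error envelope and retry only when the response carries a `Retry-After` header. The sketch below is one reading of the table, not an official client: the `classify_error` helper and its return shape are hypothetical.

```python
import json
from typing import Optional


def classify_error(status: int, body: str, headers: dict[str, str]) -> dict:
    """Parse the flat {"code", "message"} envelope and decide whether to retry."""
    envelope = json.loads(body)
    # HTTP header names are case-insensitive, so normalise before the lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    retry_after: Optional[str] = lowered.get("retry-after")

    # Only a 429 that carries Retry-After is treated as transient. The monthly
    # quota 429 has no such header, and 401/403/422 need a changed request.
    transient = status == 429 and retry_after is not None
    return {
        "code": envelope.get("code", "unknown"),
        "message": envelope.get("message", ""),
        "retry": transient,
        "wait_seconds": int(retry_after) if transient else None,
    }


outcome = classify_error(
    422,
    '{"code": "invalid_request", "message": "query: Field required"}',
    {},
)
print(outcome)
```

Branching on the `code` string rather than the status alone keeps the two kinds of 429 apart: an exhausted monthly quota should surface to the operator, not spin in a retry loop.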

Notes

Idempotency. GET /v1/context is a pure read: calling it multiple times with the same parameters produces the same result, assuming no memories or facts have been written, superseded, or deleted in between. It is safe to retry on transient network errors without risk of duplicate writes.
Scoping. Pass user_id in customer-facing agents to restrict retrieval to one end user. A call without user_id draws from all end users in the namespace, which is correct for an internal ops agent and wrong for a per-customer chat. agent_id adds a second, narrowing scope on top of user_id — it does not broaden.
Token budget. The budget is an approximation based on a 4-character-per-token estimate. The actual token count your LLM sees may differ by a few percent depending on the model's tokenizer. Leave a 10-20% buffer below your model's hard context limit.
Rate limiting. get_context counts against your monthly query quota, the same pool as search. Hobby plans include 25k queries per month; Developer 250k; Team 1M; Scale 10M. There is no per-minute rate limit on reads, but hitting the monthly cap returns 429 quota_exceeded.
Empty context. If no memories or facts are stored for the given scope, context is an empty string, tokens is 0, and sources is an empty array. This is a valid 200 response, not an error. Agents should handle the empty-context case gracefully by generating a neutral reply rather than surfacing an error to the user.
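The empty-context case can be handled when the prompt is built, so the model never sees a dangling "Relevant context" header. A minimal sketch, reusing the prompt wording from the example earlier on this page; the `build_system_prompt` helper is hypothetical.

```python
BASE_PROMPT = "You are a helpful nutrition assistant."


def build_system_prompt(context: str) -> str:
    """Include the context section only when there is something to show."""
    if not context.strip():
        # Valid 200 response with nothing stored: fall back to the neutral prompt.
        return BASE_PROMPT
    return (
        f"{BASE_PROMPT}\n\n"
        "Relevant context about this user:\n"
        f"{context}\n\n"
        "Answer using the context above when relevant."
    )


# Shape of an empty result as described in the note above.
empty_response = {"context": "", "tokens": 0, "sources": []}
print(build_system_prompt(empty_response["context"]))
print(build_system_prompt("User is vegetarian and avoids gluten."))
```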

Related

Search memories: returns raw ranked hits when you need to apply your own ranking or filtering before building a prompt.
Add a memory: write new content and run the full extraction pipeline so future context calls include it.
Delete a memory: soft-delete a memory so it is excluded from future context blocks.
API reference: complete endpoint contract with all parameters and response shapes.