Core operations

Search memories

Semantic vector retrieval that returns the raw memories closest to a query in a single call. For the recall path that assembles typed facts into a ready-to-inject block, reach for get_context first.

Reach for get_context first. The differentiator is fact-assembled recall: get_context resolves your end user's active typed facts, which are bi-temporal (subject, predicate, object) triples with contradiction resolution, and folds them together with the most relevant memories into a single prompt block. Raw search, documented below, is the secondary path: it ranks individual memories by semantic similarity when you want the verbatim source rather than an assembled answer.

Search returns the memories most similar to a query so an agent can surface source context before composing a reply. Given a natural-language or keyword query, Korely embeds the query and ranks stored memories by cosine similarity over their embeddings, then returns the top hits. Each hit carries a relevance score and a short snippet so your agent can decide whether to fetch the full memory or pass the snippet directly into the prompt. Search is semantic vector retrieval only — it does not run a keyword index or a graph walk. The typed-fact graph is reached through get_context and get_facts, not through this endpoint.

An agent typically calls search when it needs the verbatim memory behind a topic, scoped to the current end user. To assemble a compact, fact-aware memory block for the system prompt in one call, use get_context. For a full time-ordered list of every memory you've stored for a user, see get_all.

flowchart LR
    Q([query]) --> EMB[Embed<br/>query]
    EMB --> VEC[Cosine similarity<br/>over memory embeddings]
    VEC --> R([ranked hits])

A query is embedded and ranked by cosine similarity into a single list of memory hits.
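
The ranking step in the diagram can be illustrated with a small, self-contained sketch. It is not Korely's implementation: the two-dimensional vectors are toy stand-ins for real embeddings, and `cosine_similarity` and `rank_memories` are hypothetical helper names introduced only for this example.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # A zero vector has no direction; treat it as unrelated.
        return 0.0
    return dot / (norm_a * norm_b)


def rank_memories(query_vec, memories, limit=15):
    """Return up to `limit` (memory_id, score) pairs, best score first."""
    scored = [
        (mem_id, cosine_similarity(query_vec, vec))
        for mem_id, vec in memories.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:limit]


# Toy embeddings (assumed values, for illustration only).
toy_memories = {
    "mem_a": [1.0, 0.0],
    "mem_b": [0.0, 1.0],
    "mem_c": [1.0, 1.0],
}
hits = rank_memories([1.0, 0.0], toy_memories, limit=2)
for mem_id, score in hits:
    print(mem_id, round(score, 2))
```

The memory pointing in the same direction as the query ranks first, and `limit` truncates the list after sorting, which mirrors the "ranked hits" output in the diagram.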

Request

Endpoint: POST /v1/memories/search. SDK: korely.search(query, *, user_id=None, agent_id=None, limit=15).

Parameter	Type	Required	Description
`query`	string	Required	The search query. Keyword-style (1-5 words) works best; longer conversational strings are also accepted. The query is embedded and ranked by cosine similarity against stored memories.
`user_id`	string	Optional	Strongly recommended in multi-tenant products. Scopes results to one end user. Without it, the search spans all end users in the namespace.
`agent_id`	string	Optional	Further narrows to a specific agent surface. Omit to search across all agents you own within the namespace.
`limit`	integer	Optional	Number of hits to return. Default `15`, maximum `50`. Hits are ordered by descending relevance score.

Always pass user_id in customer-facing agents. A search without user_id spans every end user stored in the namespace. That is correct for an internal ops tool and wrong for a per-customer chat. Filters are additive (AND): adding agent_id on top of user_id narrows the results further; it never broadens them.
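
One way to make the scoping rule hard to forget is to build the request body through a small guard function. The sketch below is a hypothetical client-side helper (`build_search_payload` and its `require_user` flag are not part of the SDK); it only applies the constraints stated in the parameter table and fails early when `user_id` is missing.

```python
def build_search_payload(query, user_id=None, agent_id=None, limit=15,
                         require_user=True):
    """Build the JSON body for POST /v1/memories/search.

    Hypothetical helper: raises ValueError before any network call when
    the request would be rejected or would span every end user.
    """
    if not isinstance(query, str) or not query.strip():
        raise ValueError("query must be a non-empty string")
    if not 1 <= limit <= 50:
        raise ValueError("limit must be between 1 and 50")
    if require_user and user_id is None:
        # Without user_id the search spans all end users in the namespace.
        raise ValueError("user_id is required for customer-facing searches")

    payload = {"query": query, "limit": limit}
    # Filters are additive (AND): each one only narrows the result set.
    if user_id is not None:
        payload["user_id"] = user_id
    if agent_id is not None:
        payload["agent_id"] = agent_id
    return payload


payload = build_search_payload(
    "northwind pricing", user_id="customer-4812", limit=5
)
```

An internal ops tool that really does want the namespace-wide search can pass `require_user=False` explicitly, which keeps the broad search a deliberate choice rather than an omission.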

Example

from korely_memory import Korely

korely = Korely(api_key="kor_live_...", region="eu")

results = korely.search(
    "northwind pricing",
    user_id="customer-4812",
    limit=5,
)

for hit in results:
    print(hit.id, hit.score, hit.snippet)
    # mem_8f2c1a  0.91  Northwind Hosting costs 50 euro per month since the June upgrade.
    # mem_3c77d9  0.74  The team agreed to renegotiate the hosting contract in Q3.

import { Korely } from "korely-memory";

const korely = new Korely({ apiKey: "kor_live_...", region: "eu" });

const results = await korely.search(
  "northwind pricing",
  { user_id: "customer-4812", limit: 5 },
);

for (const hit of results) {
  console.log(hit.id, hit.score, hit.snippet);
  // mem_8f2c1a  0.91  Northwind Hosting costs 50 euro per month since the June upgrade.
  // mem_3c77d9  0.74  The team agreed to renegotiate the hosting contract in Q3.
}

curl -X POST https://api.korely.ai/v1/memories/search \
  -H "Authorization: Bearer kor_live_..." \
  -H "Content-Type: application/json" \
  -d '{
    "query": "northwind pricing",
    "user_id": "customer-4812",
    "limit": 5
  }'

Response

{
  "results": [
    {
      "id": "mem_8f2c1a",
      "score": 0.91,
      "snippet": "Northwind Hosting costs 50 euro per month since the June upgrade.",
      "user_id": "customer-4812",
      "agent_id": "infra-bot",
      "metadata": { "source": "slack" }
    },
    {
      "id": "mem_3c77d9",
      "score": 0.74,
      "snippet": "The team agreed to renegotiate the hosting contract in Q3.",
      "user_id": "customer-4812",
      "agent_id": "infra-bot",
      "metadata": {}
    }
  ]
}

Response fields

Field	Type	Description
`results`	array	Ranked list of memory hits, best score first.
`results[].id`	string	Memory ID. Pass to `korely.get(id)` to retrieve the full content and extracted facts.
`results[].score`	float	Cosine-similarity relevance score from 0.0 to 1.0. Higher means closer to the query embedding.
`results[].snippet`	string	Short excerpt (up to 280 characters) of the stored memory, suitable for direct inclusion in a prompt or for display in a UI. For the complete text, fetch the memory by ID.
`results[].user_id`	string	The end user the memory belongs to.
`results[].agent_id`	string	The agent surface that wrote the memory, or `null` if none was set.
`results[].metadata`	object	The metadata object passed when the memory was written. Empty object if none was set.

The response contains only the results array. There is no total field and no memories wrapper. An empty result is {"results": []}.
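
If you call the endpoint over raw HTTP rather than through the SDK, the body can be decoded along these lines. This is a minimal sketch using only the standard library; `SearchHit` and `parse_search_response` are hypothetical names, and the field set is taken from the response table above.

```python
import json
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class SearchHit:
    id: str
    score: float
    snippet: str
    user_id: Optional[str] = None
    agent_id: Optional[str] = None  # null when no agent was set
    metadata: dict = field(default_factory=dict)


def parse_search_response(body: str) -> List[SearchHit]:
    """Decode the {"results": [...]} body into a list of hits."""
    payload = json.loads(body)
    hits = []
    for item in payload["results"]:
        hits.append(SearchHit(
            id=item["id"],
            score=item["score"],
            snippet=item["snippet"],
            user_id=item.get("user_id"),
            agent_id=item.get("agent_id"),
            metadata=item.get("metadata") or {},
        ))
    return hits


sample_body = """
{"results": [{"id": "mem_8f2c1a", "score": 0.91,
  "snippet": "Northwind Hosting costs 50 euro per month since the June upgrade.",
  "user_id": "customer-4812", "agent_id": "infra-bot",
  "metadata": {"source": "slack"}}]}
"""
hits = parse_search_response(sample_body)
no_hits = parse_search_response('{"results": []}')  # empty list, not an error
```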

The read path is zero-generation. Search embeds your query (which costs a fraction of a cent) and ranks stored memories by cosine similarity. No model composes or rewrites the snippets you receive; your agent's own model does the reasoning over what comes back.

Fetching full content

Snippets are truncated for speed. When a hit's score is high enough to include in the prompt, fetch the complete memory to avoid cutting off important context:

results = korely.search("northwind pricing", user_id="customer-4812", limit=3)

# Pull full content for the top hit
if results and results[0].score > 0.8:
    memory = korely.get(results[0].id)
    print(memory.content)
    # Northwind Hosting costs 50 euro per month since the June upgrade.
    # Contract renewed through December 2026. See invoice INV-2026-0611.

Timeline and fact history

Search returns only current, active memories. There is no include_history or time_filter parameter on this endpoint. For the full lifecycle of a single memory (edits, supersessions, deletion), use history. To query the typed fact store with temporal filters — including superseded facts — use get_facts(include_invalidated=True).

Errors

Search is a read-only operation — it counts against your monthly query quota, never the write quota. The table below lists every status code this endpoint can return.

Status	Code string	Cause
`200`	—	Success. Results may be an empty array if no memories match; this is not an error.
`401`	`invalid_key`	The `Authorization` header is missing, malformed, or the key has been revoked.
`422`	`invalid_request`	Request validation failed. The most common causes are a missing or empty `query` field and a `limit` outside the 1-50 range. The body uses the same flat envelope, for example `{"code": "invalid_request", "message": "query: Field required"}`.
`429`	`quota_exceeded`	Monthly query quota is exhausted (past the grace allowance). The quota 429 carries no `Retry-After` — it resets on your billing cycle date. (A separate per-second rate-limit 429 does carry a `Retry-After` header, in seconds.) Upgrade your plan or wait for the monthly reset.

Every error response uses the same envelope: {"code": "<slug>", "message": "<text>"}. A 401 is {"code": "invalid_key", "message": "Invalid or missing API key"}. There is no error or detail field, and quota information is never returned in the body.
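
The two kinds of 429 call for different handling, which the sketch below tries to make concrete. It is a hypothetical client-side helper, not SDK code: `classify_error` and its action labels are invented for illustration, and it assumes that the presence of a `Retry-After` header is what distinguishes the per-second rate limit from the monthly quota, as described in the table above.

```python
from typing import Mapping, Optional, Tuple


def classify_error(status: int,
                   headers: Mapping[str, str]) -> Tuple[str, Optional[float]]:
    """Map an error response to (action, wait_seconds).

    Header names are compared case-insensitively.
    """
    lowered = {name.lower(): value for name, value in headers.items()}

    if status == 429:
        retry_after = lowered.get("retry-after")
        if retry_after is not None:
            # Per-second rate limit: wait the advertised number of seconds.
            return ("retry_later", float(retry_after))
        # Monthly quota: retrying will not help until the billing reset.
        return ("quota_exhausted", None)
    if status == 401:
        return ("fix_credentials", None)
    if status == 422:
        return ("fix_request", None)
    return ("unexpected", None)


rate_limited = classify_error(429, {"Retry-After": "2"})
out_of_quota = classify_error(429, {})
```

A caller would sleep and retry only in the `retry_later` case; the other outcomes need a configuration or plan change rather than another attempt.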

Notes

Read-only, non-destructive. Search never modifies stored memories or facts. Calling it any number of times with the same parameters is safe and produces the same ranked result set given the same corpus.
Scoping is additive. Filters narrow results: user_id alone returns all memories for that user across every agent; adding agent_id on top narrows to that agent's memories for the user. You cannot broaden results by combining filters.
No cross-workspace access. The API key determines the namespace. A search with user_id="customer-4812" only touches end users stored under that key's workspace — it cannot reach another customer's data even if they share the same user_id string.
Semantic vector ranking. Hits are ranked by cosine similarity between the query embedding and stored memory embeddings. There is no keyword index and no graph walk on this endpoint; for typed-fact recall use get_context or get_facts.
Rate-limit behaviour. Each search call counts as one query against your plan's monthly query quota: Hobby (25 k/month), Developer (250 k/month), Team (1 M/month), Scale (10 M/month). When the monthly quota is exhausted, the API returns 429 quota_exceeded with no Retry-After (it resets on your billing cycle). A separate per-second rate limit can also return 429; that one carries a Retry-After header in seconds.
Empty results are not errors. A 200 response with "results": [] means no memories matched the query for the given scope. This is normal for a new end user or a very specific query.

Add a memory — store content and run the full write pipeline before searching for it.
Get context — the primary recall path: assembles a ready-to-inject prompt block from active typed facts plus the most relevant memories.
Delete a memory — forget a memory by id so it no longer appears in search results.
API reference — full endpoint contract including the GET /v1/memories/{id}/history and GET /v1/facts endpoints.