Core operations
Search memories
Semantic vector retrieval that returns the raw memories closest to a query — in one call. For the recall path that assembles typed facts into a ready-to-inject block, reach for get_context first.
Reach for get_context first.
The differentiator is fact-assembled recall: get_context resolves your end user's active typed facts, which are bi-temporal (subject, predicate, object) triples with contradiction resolution, and folds them, together with the most relevant memories, into a single prompt block. Raw search below is the secondary path: it ranks individual memories by semantic similarity when you want the verbatim source rather than an assembled answer.
Search returns the memories most similar to a query so an agent can surface
source context before composing a reply. Given a natural-language or keyword
query, Korely embeds the query and ranks stored memories by cosine similarity
over their embeddings, then returns the top hits. Each hit carries a relevance
score and a short snippet so your agent can decide whether to fetch the full
memory or pass the snippet directly into the prompt. Search is
semantic vector retrieval only — it does not run a keyword
index or a graph walk. The typed-fact graph is reached through
get_context and
get_facts, not through this endpoint.
An agent typically calls search when it needs the verbatim memory behind a topic, scoped to the current end user. To assemble a compact, fact-aware memory block for the system prompt in one call, use get_context. For a full time-ordered list of every memory you've stored for a user, see get_all.
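The ranking step described above can be illustrated with a small sketch. It is not Korely's implementation: the three-dimensional vectors stand in for real embeddings, and `cosine_similarity` and `rank_memories` are hypothetical helper names used only to show how cosine-similarity ranking with a `limit` behaves.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # assumption: a zero vector matches nothing
    return dot / (norm_a * norm_b)


def rank_memories(query_vec, memories, limit=15):
    """Return up to `limit` (memory_id, score) pairs, best score first."""
    scored = [
        (mem_id, cosine_similarity(query_vec, vec))
        for mem_id, vec in memories.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:limit]


# Toy "embeddings"; real embeddings have many more dimensions.
memories = {
    "mem_a": [1.0, 0.0, 0.0],
    "mem_b": [0.6, 0.8, 0.0],
    "mem_c": [0.0, 0.0, 1.0],
}
top = rank_memories([1.0, 0.0, 0.0], memories, limit=2)
for mem_id, score in top:
    print(mem_id, round(score, 2))
```

The memory pointing in the same direction as the query ranks first, and the orthogonal one drops out once `limit` is applied.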
flowchart LR
Q([query]) --> EMB[Embed<br/>query]
EMB --> VEC[Cosine similarity<br/>over memory embeddings]
VEC --> R([ranked hits])
Request
Endpoint: POST /v1/memories/search. SDK: korely.search(query, *, user_id=None, agent_id=None, limit=15).
| Parameter | Type | Required | Description |
|---|---|---|---|
| query | string | Required | The search query. Keyword-style (1-5 words) works best; longer conversational strings are also accepted. The query is embedded and ranked by cosine similarity against stored memories. |
| user_id | string | Optional | Strongly recommended in multi-tenant products. Scopes results to one end user. Without it, the search spans all end users in the namespace. |
| agent_id | string | Optional | Further narrows to a specific agent surface. Omit to search across all agents you own within the namespace. |
| limit | integer | Optional | Number of hits to return. Default 15, maximum 50. Hits are ordered by descending relevance score. |
Always pass user_id in customer-facing agents.
A search without user_id spans every end user stored in the
namespace. That is correct for an internal ops tool and wrong for a
per-customer chat. Filters are additive (AND): adding agent_id
on top of user_id narrows the results further; it never broadens them.
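If you call the endpoint without the SDK, a small client-side guard can catch the validation problems listed in the parameter table before the request is sent. The sketch below is an illustration only: `build_search_payload` is a hypothetical helper, and it assumes that the JSON body uses the same field names as the table and that unset filters are omitted rather than sent as null.

```python
def build_search_payload(query, user_id=None, agent_id=None, limit=15):
    """Build a JSON-ready body for POST /v1/memories/search (hypothetical helper)."""
    if not isinstance(query, str) or not query.strip():
        raise ValueError("query must be a non-empty string")
    if not 1 <= limit <= 50:
        raise ValueError("limit must be between 1 and 50")

    payload = {"query": query, "limit": limit}
    # Every filter that is set narrows the result set (AND semantics).
    if user_id is not None:
        payload["user_id"] = user_id
    if agent_id is not None:
        payload["agent_id"] = agent_id
    return payload


payload = build_search_payload(
    "northwind pricing",
    user_id="customer-4812",
    limit=5,
)
print(payload)
```

In a customer-facing agent you might go one step further and make `user_id` mandatory in your own wrapper, so that an unscoped search cannot be issued by accident.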
Example
from korely_memory import Korely
korely = Korely(api_key="kor_live_...", region="eu")
results = korely.search(
    "northwind pricing",
    user_id="customer-4812",
    limit=5,
)
for hit in results:
    print(hit.id, hit.score, hit.snippet)
# mem_8f2c1a 0.91 Northwind Hosting costs 50 euro per month since the June upgrade.
# mem_3c77d9 0.74 The team agreed to renegotiate the hosting contract in Q3.
Response
{
  "results": [
    {
      "id": "mem_8f2c1a",
      "score": 0.91,
      "snippet": "Northwind Hosting costs 50 euro per month since the June upgrade.",
      "user_id": "customer-4812",
      "agent_id": "infra-bot",
      "metadata": { "source": "slack" }
    },
    {
      "id": "mem_3c77d9",
      "score": 0.74,
      "snippet": "The team agreed to renegotiate the hosting contract in Q3.",
      "user_id": "customer-4812",
      "agent_id": "infra-bot",
      "metadata": {}
    }
  ]
}
Response fields
| Field | Type | Description |
|---|---|---|
| results | array | Ranked list of memory hits, best score first. |
| results[].id | string | Memory ID. Pass to korely.get(id) to retrieve the full content and extracted facts. |
| results[].score | float | Cosine-similarity relevance score from 0.0 to 1.0. Higher means closer to the query embedding. |
| results[].snippet | string | Short excerpt (up to 280 characters) of the stored memory, suitable for direct inclusion in a prompt or for display in a UI. For the complete text, fetch the memory by ID. |
| results[].user_id | string | The end user the memory belongs to. |
| results[].agent_id | string | The agent surface that wrote the memory, or null if none was set. |
| results[].metadata | object | The metadata object passed when the memory was written. Empty object if none was set. |
The response contains only the results array. There is no
total field and no memories wrapper. An empty
result is {"results": []}.
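For callers working with the raw HTTP response rather than the SDK, the documented shape can be parsed with a few lines of standard-library Python. This is a sketch under stated assumptions: `Hit`, `parse_search_response`, and `to_prompt_lines` are illustrative names, not SDK classes, and the sample body is the first hit from the response example above.

```python
import json
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Hit:
    """One entry of the `results` array (illustrative type, not an SDK class)."""
    id: str
    score: float
    snippet: str
    user_id: str
    agent_id: Optional[str] = None
    metadata: dict = field(default_factory=dict)


def parse_search_response(body: str) -> List[Hit]:
    """Parse the raw JSON body; `results` is the only top-level key."""
    data = json.loads(body)
    return [
        Hit(
            id=item["id"],
            score=item["score"],
            snippet=item["snippet"],
            user_id=item["user_id"],
            agent_id=item.get("agent_id"),
            metadata=item.get("metadata") or {},
        )
        for item in data["results"]
    ]


def to_prompt_lines(hits: List[Hit], min_score: float = 0.0) -> str:
    """Join the snippets at or above `min_score` into a plain-text block."""
    return "\n".join(f"- {h.snippet}" for h in hits if h.score >= min_score)


sample = json.dumps({
    "results": [
        {
            "id": "mem_8f2c1a",
            "score": 0.91,
            "snippet": "Northwind Hosting costs 50 euro per month since the June upgrade.",
            "user_id": "customer-4812",
            "agent_id": "infra-bot",
            "metadata": {"source": "slack"},
        }
    ]
})
hits = parse_search_response(sample)
print(to_prompt_lines(hits, min_score=0.8))
```

The `min_score` cutoff is a choice for your agent to make; the API itself returns every hit up to `limit`.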
The read path is zero-generation. Search embeds your query (a fraction of a cent) and ranks stored memories by cosine similarity. No model ever composes or rewrites the snippets you receive. Your agent's own model does the reasoning over what comes back. Reads are retrieval, not generation.
Fetching full content
Snippets are truncated for speed. When a hit's score is high enough to include in the prompt, fetch the complete memory to avoid cutting off important context:
results = korely.search("northwind pricing", user_id="customer-4812", limit=3)
# Pull full content for the top hit
if results and results[0].score > 0.8:
    memory = korely.get(results[0].id)
    print(memory.content)
    # Northwind Hosting costs 50 euro per month since the June upgrade.
    # Contract renewed through December 2026. See invoice INV-2026-0611.
Timeline and fact history
Search returns only current, active memories. There is no
include_history or time_filter parameter on this
endpoint. For the full lifecycle of a single memory (edits, supersessions,
deletion), use history. To query
the typed fact store with temporal filters — including superseded facts — use
get_facts(include_invalidated=True).
Errors
Search is a read-only operation — it counts against your monthly query quota, never the write quota. The table below lists every status code this endpoint can return.
| Status | Code string | Cause |
|---|---|---|
| 200 | (none) | Success. Results may be an empty array if no memories match; this is not an error. |
| 401 | invalid_key | The Authorization header is missing, malformed, or the key has been revoked. |
| 422 | invalid_request | Request validation failed. The most common causes are a missing or empty query field and a limit outside the 1-50 range. The body is the flat {"code": "invalid_request", "message": "query: Field required"} envelope. |
| 429 | quota_exceeded | Monthly query quota is exhausted (past the grace allowance). The quota 429 carries no Retry-After; it resets on your billing cycle date. (A separate per-second rate-limit 429 does carry a Retry-After header, in seconds.) Upgrade your plan or wait for the monthly reset. |
Every error response uses the same envelope: {"code": "<slug>", "message": "<text>"}.
A 401 is {"code": "invalid_key", "message": "Invalid or missing API key"}.
There is no error or detail field, and quota
information is never returned in the body.
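One way a client might act on these status codes is sketched below. `classify_search_error` is a hypothetical helper, not part of the SDK. Because the code string of the per-second rate-limit 429 is not documented here, the sketch tells the two 429 cases apart only by the presence of a Retry-After header.

```python
def classify_search_error(status, headers):
    """Map an error response to (action, wait_seconds). Hypothetical helper."""
    # HTTP header names are case-insensitive, so normalise before lookup.
    normalised = {name.lower(): value for name, value in headers.items()}

    if status == 401:
        return ("fix_credentials", None)
    if status == 422:
        return ("fix_request", None)
    if status == 429:
        retry_after = normalised.get("retry-after")
        if retry_after is not None:
            # Per-second rate limit: wait the advertised number of seconds.
            return ("retry_later", int(retry_after))
        # Monthly quota exhausted: retrying will not help until the reset.
        return ("quota_exhausted", None)
    return ("unexpected", None)


print(classify_search_error(429, {"Retry-After": "2"}))
print(classify_search_error(429, {}))
```

Only the rate-limit case is worth retrying automatically; the other outcomes need a configuration, request, or plan change.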
Notes
- Read-only, non-destructive. Search never modifies stored memories or facts. Calling it any number of times with the same parameters is safe and produces the same ranked result set given the same corpus.
- Scoping is additive. Filters narrow results: user_id alone returns all memories for that user across every agent; adding agent_id on top narrows to that agent's memories for the user. You cannot broaden results by combining filters.
- No cross-workspace access. The API key determines the namespace. A search with user_id="customer-4812" only touches end users stored under that key's workspace; it cannot reach another customer's data even if they share the same user_id string.
- Semantic vector ranking. Hits are ranked by cosine similarity between the query embedding and stored memory embeddings. There is no keyword index and no graph walk on this endpoint; for typed-fact recall use get_context or get_facts.
- Rate-limit behaviour. Each search call counts as one query against your plan's monthly query quota. Plan quotas: Hobby (25 k/month), Developer (250 k/month), Team (1 M/month), Scale (10 M/month). When the monthly quota is exhausted the API returns 429 quota_exceeded with no Retry-After (it resets on your billing cycle). A separate per-second rate limit can also return 429; that one carries a Retry-After header in seconds.
- Empty results are not errors. A 200 response with "results": [] means no memories matched the query for the given scope. This is normal for a new end user or a very specific query.
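Since every search call consumes one query, a rough budget check can help you choose a plan. The sketch below uses the monthly quotas listed in the notes. The helper names are hypothetical, the 30-day cycle is an assumption, and the grace allowance is ignored because its size is not stated here.

```python
# Monthly query quotas as listed in the notes above.
MONTHLY_QUERY_QUOTA = {
    "hobby": 25_000,
    "developer": 250_000,
    "team": 1_000_000,
    "scale": 10_000_000,
}


def queries_remaining(plan, used_this_cycle):
    """Queries left before the quota is reached (grace allowance ignored)."""
    return max(MONTHLY_QUERY_QUOTA[plan] - used_this_cycle, 0)


def average_daily_budget(plan, days_in_cycle=30):
    """Whole queries per day that fit the quota; the 30-day cycle is an assumption."""
    return MONTHLY_QUERY_QUOTA[plan] // days_in_cycle


print(queries_remaining("developer", 200_000))
print(average_daily_budget("hobby"))
```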
Related
- Add a memory — store content and run the full write pipeline before searching for it.
- Get context — the primary recall path: assembles a ready-to-inject prompt block from active typed facts plus the most relevant memories.
- Delete a memory — forget a memory by id so it no longer appears in search results.
- API reference — full endpoint contract including the
GET /v1/memories/{id}/history and GET /v1/facts endpoints.