Research Publication · 2026

Stop Overengineering
Agentic Memory

Why complex graph-based memory architectures may be overkill, and how simple, highly optimized RAG achieves state-of-the-art results on the major long-term memory benchmarks.

89.8% LongMemEval · State of the art
87.1% LoCoMo · State of the art
Vector Only Architecture · No graph databases
Zero Ingestion Tax · No LLM calls on write
Background

Why we built Memanto differently

The Problem

Long-term memory breaks at scale

AI systems today forget key facts over extended interactions, breaking context. When conversations span thousands of messages across dozens of distinct sessions, extracting the right fragments from that massive history to fit within a constrained context window remains the primary bottleneck for true agentic memory.

Even with today's massive context windows, indiscriminately dumping months of history degrades reasoning capabilities, leading to hallucinations and exorbitant inference costs.

Industry Response

The rush toward graph complexity

To solve retrieval at scale, competitors turned to complex graph databases. By explicitly mapping relationships between entities using Knowledge Graphs or hybrid semantic-graph engines, agents can traverse nodes to find answers.

While effective in theory, this approach introduces massive overhead: managing schemas, handling graph ingestion latency, and writing complex traversal queries. Every time new conversational data is ingested, the framework must invoke an LLM to extract entities and define relationships, turning the simple act of saving a memory into a compute-heavy, synchronous process.
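To make that overhead concrete, here is a minimal sketch of the write path such a framework implies. Every name below (llm, graph_db, vector_db, embed) is a hypothetical stand-in, not any specific vendor's API.

```python
import json

# Illustrative sketch of a graph-style ingestion path; all objects are
# hypothetical stand-ins supplied by the caller.
def save_memory_graph_style(message, llm, graph_db, vector_db, embed):
    # 1. Synchronous LLM call on every write: extract entities and relations.
    raw = llm.complete(
        "Extract entities and relationships as JSON triples "
        f"[[subject, relation, object], ...] from:\n{message}"
    )
    triples = json.loads(raw)

    # 2. Upsert nodes and edges under the graph schema.
    for subject, relation, obj in triples:
        graph_db.merge_node(subject)
        graph_db.merge_node(obj)
        graph_db.merge_edge(subject, relation, obj)

    # 3. Hybrid engines typically also write an embedding.
    vector_db.upsert(vector=embed(message), payload=message)

    # The LLM call in step 1 is the "ingestion tax": compute and latency
    # paid on every save, before the memory is even queryable.
```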

Our Approach

Challenging the status quo with pure semantics

At Moorcheh, we built a highly optimized Semantic Engine from the ground up to solve the "RAM Tax" of traditional vector databases. By moving away from bloated, in-memory HNSW + cosine-distance stacks in favor of information-theoretic retrieval, we created an engine capable of sub-40ms serverless latency across 100k+ namespaces.

Before Memanto, we asked a simple question: Is all this graph-database complexity really necessary? We wanted to see just how far we could push a standard RAG pipeline, stripped of extras and relying solely on high-quality semantic search.

Methodology

How we got to SOTA

A step-by-step benchmarking progression, each iteration building on the last, with no graph databases added at any stage.

01

Naive RAG Baseline

Simple semantic search against ingested memory chunks: a top-10 retrieval limit, a 0.15 Information Theoretic Score (ITS) similarity threshold, and Claude Sonnet 4 for inference. A sketch of this pipeline follows the scores below.

LongMemEval: 56.6%
LoCoMo: 76.2%
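The whole baseline fits in a single retrieve-then-answer call. The sketch below is a minimal illustration, with engine.search and llm.complete as generic stand-ins for the actual client APIs; the later steps only turn the top_k and threshold knobs.

```python
# Minimal sketch of the step-01 baseline; `engine` and `llm` are generic
# stand-ins, not Moorcheh's or Anthropic's actual client objects.
def answer(question, engine, llm, top_k=10, threshold=0.15):
    # Retrieve up to `top_k` chunks whose ITS similarity clears `threshold`.
    chunks = engine.search(question, top_k=top_k, threshold=threshold)
    context = "\n\n".join(chunk.text for chunk in chunks)
    return llm.complete(
        "Answer using only the retrieved memory below.\n\n"
        f"Memory:\n{context}\n\nQuestion: {question}"
    )

# Step 02 is the identical pipeline with relaxed retrieval:
#   answer(question, engine, llm, top_k=40, threshold=0.10)
```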
02

Relaxed Retrieval (k=40)

Multi-session questions reference events scattered across disjoint sessions. Raising the retrieval limit to 40 chunks and dropping the threshold to 0.10 delivered immediate gains.

Recall matters far more than precision for agentic memory.

LongMemEval: 77.0% (+20.4)
LoCoMo: 82.8% (+6.6)
03

Optimized Prompts

Adopted and modified prompts from the Hindsight evaluation framework to ensure an apples-to-apples comparison with other systems.

Prompt engineering is a band-aid for structural retrieval deficits.

LongMemEval: 79.2% (+2.2)
LoCoMo: 82.9% (+0.1)
04

Dynamic Retrieval (k=100)

Expanded the dynamic retrieval limit to a maximum of 100 chunks at a 0.05 threshold. Retrieval is dynamic via Moorcheh's Information Theoretic Score: the engine pulls up to 100 chunks only when they clear the threshold (see the sketch after the scores below).

Giving the semantic engine a wider runway surfaces disparate fragments more effectively than hyper-precise vector queries.

LongMemEval: 85.0% (+5.8)
LoCoMo: 86.3% (+3.4)
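The key detail is that k=100 is a cap, not a fixed fetch size. A minimal sketch of that selection rule, assuming the engine returns candidates already sorted by ITS (names hypothetical):

```python
# Dynamic retrieval as described in step 04: keep only candidates that
# clear the ITS threshold, up to a hard cap. `scored_candidates` is a
# hypothetical stand-in for the engine's scored result list, sorted by
# Information Theoretic Score in descending order.
def dynamic_retrieve(scored_candidates, max_k=100, threshold=0.05):
    selected = [c for c in scored_candidates if c.score >= threshold]
    return selected[:max_k]  # a sparse query may return far fewer than 100
```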
05

Gemini 3 Inference

Switched the underlying model from Claude Sonnet 4 to Gemini 3 for advanced multi-hop reasoning and complex context synthesis, establishing parity with other leading benchmarked systems.

State of the art. No graphs. No ingestion tax. Single unified query.

LongMemEval: 89.8% (+4.8)
LoCoMo: 87.1% (+0.8)
Results

Benchmark results

Final evaluation using Gemini 3 as the inference model, the same configuration used by competing systems.

[Chart: LoCoMo and LongMemEval scores by system, on a 60–100% scale. The full data appears in the comparison table below.]

Memanto leads both LoCoMo and LongMemEval benchmarks. Architecture details reflect each system's configuration at time of evaluation.

LongMemEval: 89.8% overall
Single-session User: 95.7%
Single-session Assistant: 100.0%
Single-session Preference: 93.3%
Knowledge Update: 93.6%
Temporal Reasoning: 88.0%
Multi-session: 81.2%

LoCoMo: 87.1% overall
Open Domain: 92.4%
Temporal: 85.4%
Single-Hop: 78.7%
Multi-Hop: 70.8%

Full system comparison

| System | LoCoMo | LongMemEval | Architecture | Retrieval | Query Method |
|---|---|---|---|---|---|
| Memanto (SOTA) | 87.1% | 89.8% | Vector Only | RAG | Single Query |
| EmergenceMem | – | 86.0% | Graph + Vector | Parallel | Multi-Query |
| Supermemory | – | 85.2% | Graph + Vector | Parallel | Multi-Query |
| Memobase | 75.8% | – | Graph + Vector | Parallel | Single Query |
| Zep | 75.1% | 71.2% | Graph + Vector | Parallel | Single Query |
| Letta | 74.0% | – | Local Filesystem | RAG | Recursive |
| Full context | 72.9% | 60.2% | Full Context | Full Context | Single Query |
| Mem0 G | 68.4% | – | Graph + Vector | Parallel | Single Query |
| Mem0 | 66.9% | – | Vector Only | Parallel | Single Query |
| LangMem | 58.1% | – | Vector Only | RAG | Single Query |
Conclusion

The future of agentic memory

Memanto's 89.8% on LongMemEval and 87.1% on LoCoMo with a vector-only architecture demonstrates that highly optimized retrieval can achieve state-of-the-art accuracy without forcing developers to pay for complex graph orchestration.

Recall over precision

In agentic memory, the semantic search recall/precision trade-off skews heavily toward recall. It is far better to retrieve noisy chunks and let a capable LLM filter context than to miss critical fragments entirely.
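One way to make the trade-off concrete, under the simplifying assumption that retrieval and generation are the only failure points:

P(correct answer) = P(critical fragment retrieved) × P(model answers correctly | fragment present)

A capable model keeps the second factor high even with noisy context, so end-to-end accuracy is dominated by the first factor, which is recall.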

Zero ingestion tax

Memanto ingests raw conversational chunks directly into the vector store, bypassing the LLM extraction overhead entirely. Memories are semantically available for retrieval immediately after write.
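For contrast with the graph-style write path sketched earlier, the zero-tax write reduces to a single embed-and-upsert; embed and vector_db remain generic stand-ins:

```python
# Zero-ingestion-tax write path: no LLM in the loop. `embed` and
# `vector_db` are hypothetical stand-ins for the actual client objects.
def save_memory(message, vector_db, embed, session_id):
    # One vector write; the memory is retrievable as soon as this returns.
    vector_db.upsert(
        vector=embed(message),
        payload={"text": message, "session": session_id},
    )
```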

Architecture simplicity wins

The next phase of agentic AI will not be won by the most complicated memory diagram. Strong, highly optimized semantic infrastructure can already deliver SOTA performance without graph orchestration.

Start building with Memanto

Explore the docs and get started with Memanto, the universal memory layer for agentic AI.