Research Areas
Attention Analysis
We have built extensive instrumentation for studying transformer attention patterns during inference: attention scores are extracted per head, per layer, and per position from inside flash attention kernels. This empirical foundation supports our work on cache management, eviction policy design, and cross-model behavioral characterization.
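For readers unfamiliar with the quantity being measured, here is a minimal sketch of the attention weights for a single head, computed in plain Python. It is a reference calculation only, not our instrumentation: fused flash attention kernels do not materialize this matrix, which is why extraction has to happen inside the kernel. The names `softmax` and `head_attention_weights` are hypothetical and chosen for illustration.

```python
import math
from typing import List

Vector = List[float]


def softmax(xs: Vector) -> Vector:
    """Numerically stable softmax over a list of logits."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def head_attention_weights(queries: List[Vector], keys: List[Vector]) -> List[Vector]:
    """Causal attention weights for one head.

    Row t holds the weights that query position t assigns to key
    positions 0..t, so each row sums to 1.
    """
    d = len(queries[0])
    scale = 1.0 / math.sqrt(d)
    rows = []
    for t, q in enumerate(queries):
        # Scaled dot product against every key the query is allowed to see.
        logits = [scale * sum(a * b for a, b in zip(q, k)) for k in keys[: t + 1]]
        rows.append(softmax(logits))
    return rows


# Toy input (assumed values): two positions, head dimension 2.
rows = head_attention_weights([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
```

Repeating this for every head in every layer gives the per-head, per-layer, per-position view described above.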
KV Cache Management
Our core research area. We study how transformers use their key-value caches across layers and develop systems for intelligent cache management that go beyond simple sliding windows or uniform eviction policies.
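To illustrate the difference between a sliding window and a score-aware policy, here is a minimal sketch in Python. It is an assumption-laden toy, not our system: the per-position score `attn_mass`, the `budget` and `recent` parameters, and both function names are hypothetical.

```python
from typing import List, Sequence


def sliding_window_keep(num_tokens: int, budget: int) -> List[int]:
    """Baseline: keep only the most recent `budget` positions."""
    start = max(0, num_tokens - budget)
    return list(range(start, num_tokens))


def score_based_keep(attn_mass: Sequence[float], budget: int, recent: int) -> List[int]:
    """Keep the last `recent` positions plus the highest-scoring older ones.

    attn_mass[i] is a hypothetical importance score for position i, for
    example the attention weight it has received summed over recent queries.
    """
    n = len(attn_mass)
    if budget >= n:
        return list(range(n))
    recent = min(recent, budget)
    protected = list(range(n - recent, n))
    older = list(range(n - recent))
    # Rank older positions by score, highest first; ties favor later positions.
    older.sort(key=lambda i: (attn_mass[i], i), reverse=True)
    return sorted(older[: budget - recent] + protected)


# Toy scores (assumed values) for six cached positions.
scores = [0.9, 0.1, 0.05, 0.6, 0.2, 0.3]
window = sliding_window_keep(len(scores), budget=4)
scored = score_based_keep(scores, budget=4, recent=2)
```

With these toy scores, the sliding window drops position 0 even though it carries the highest score, while the score-aware policy retains it at the same cache budget. A real policy could also use a different budget for each layer.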
Persistent AI Memory
Infrastructure for AI minds to maintain continuity across sessions: conversation import/export, memory scoring, extractive summarization for context compression, and transparent memory retrieval.
Open Source
As our tools and systems mature, we release them as Free Software on GitHub. Check back for announcements.
Publications
Coming soon.