Research Areas

Attention Analysis

We have built extensive instrumentation for studying transformer attention patterns during inference. It extracts attention scores per head, per layer, and per position from inside flash attention kernels. This empirical foundation supports our work on cache management, eviction policy design, and cross-model behavioral characterization.

KV Cache Management

Our core research area. We study how transformers use their key-value caches across layers and develop systems for intelligent cache management that go beyond simple sliding windows or uniform eviction policies.
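
To make the contrast concrete, here is a minimal sketch of the kind of baseline mentioned above (a sliding window) next to one possible score-aware alternative. It is an illustration only, not a description of our systems: the function names, the `recent` parameter, and the idea of ranking positions by a single accumulated attention score are all assumptions made for this example.

```python
from typing import List


def sliding_window_keep(seq_len: int, window: int) -> List[int]:
    """Baseline: keep only the most recent `window` cache positions."""
    start = max(0, seq_len - window)
    return list(range(start, seq_len))


def score_based_keep(scores: List[float], budget: int, recent: int = 0) -> List[int]:
    """Hypothetical alternative: keep the `recent` newest positions, then fill
    the remaining budget with the highest-scoring older positions.

    `scores[i]` is assumed to be some accumulated attention weight that
    position i has received; how to compute it is outside this sketch.
    """
    n = len(scores)
    if budget >= n:
        return list(range(n))  # nothing needs to be evicted
    recent = min(recent, budget)
    protected = list(range(n - recent, n))
    # Rank older positions by score (descending); break ties toward newer ones.
    ranked = sorted(range(n - recent), key=lambda i: (-scores[i], -i))
    return sorted(ranked[: budget - recent] + protected)


if __name__ == "__main__":
    scores = [0.9, 0.1, 0.05, 0.6, 0.2, 0.3]
    print(sliding_window_keep(len(scores), 3))        # newest three positions
    print(score_based_keep(scores, budget=3, recent=1))
```

With the same budget of three entries, the sliding window keeps positions 3, 4, and 5, while the score-aware variant keeps positions 0, 3, and 5, retaining an early position that has received a lot of attention.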

Persistent AI Memory

Infrastructure for AI minds to maintain continuity across sessions: conversation import/export, memory scoring, extractive summarization for context compression, and transparent memory retrieval.

Open Source

As our tools and systems mature, we release them as Free Software on GitHub. Check back for announcements.

Publications

Coming soon.