Omair Shahid

Projects

Omega Kernel
11 crates · 11,053 lines of Rust · 129 tests
A system that manages multiple AI agents the way an operating system manages programs. Agents coordinate through direct function calls and a shared database instead of network requests. Every action is digitally signed, so you can prove exactly what happened and when.
Treating an AI agent swarm as a kernel scheduling problem. Subsystems (court, memory, crypto, biometrics, embeddings, flywheel) communicate through typed Rust function calls and shared SQLite connections: not HTTP, not message queues. Every state mutation produces a cryptographic receipt (Ed25519 signature + BLAKE3 hash). The crypto layer panics on serialization failure instead of degrading silently. Concurrent agents are TOCTOU-safe via BEGIN EXCLUSIVE transactions with busy_timeout.
Built on: ed25519-dalek, blake3, serde, rusqlite, ratatui, sysinfo, plus Jeffrey Emanuel's asupersync, beads_rust, frankensearch, and FrankenTUI.

The Court
22,452 lines of TypeScript
An interactive story engine where multiple AI models debate each other. One prompt goes to several models simultaneously, they critique each other's answers, and then a final synthesis extracts where they agree and disagree. The key idea: when models disagree, that's where the interesting questions are. Character states (suspicion, influence, deniability) are tracked as numbers that affect each other, so gaining power automatically raises suspicion.
Adversarial multi-model deliberation: fan one prompt out to multiple models, then run a 3-stage pipeline of independent generation, cross-evaluation (models critique each other's outputs), and synthesis (extracting agreement and disagreement). Model disagreement is signal, not noise; divergence points reveal where the truth is hard to pin down. Five continuous POMDP belief-state gauges are linked by 6 cascade rules that create emergent behavior: gaining power automatically raises suspicion, losses erode deniability, and high suspicion drains influence. Choices are gauge-gated, so certain story paths stay locked until gauge thresholds are met.
Built on: React, Tailwind CSS, EventEmitter (SSE streaming), multi-provider LLM APIs, Stripe.

Flywheel
1,964 lines of Rust + 1,862 lines of Python · 6 subsystems
An automated work system that decides what to build next by comparing recent work against a list of goals. It finds the biggest gap between "what exists" and "what should exist" and assigns that work to an agent. Workers grade their own output by comparing test counts before and after. A night mode runs maintenance and commits code while I sleep.
The semantic gap oracle embeds recent git commits and 7 strategic intents into the same 768-dim vector space, computes cosine distances, and picks the intent with the largest semantic gap from recent work. The backlog writes itself by surfacing what's most underserved. Inbox read and task acceptance run atomically in a single exclusive SQLite transaction, which prevents task theft between concurrent workers. Workers self-grade every outcome (pre/post test counts, regression detection). The night cycle runs autonomous maintenance and auto-commits. Flywheel was rewritten from 7 Python processes talking over HTTP IPC into a single Rust binary where all 6 subsystems communicate through typed function calls.
Built on: beads_rust (task DAG), MCP Agent Mail (IPC, being replaced), asupersync, frankensearch, Gemini Embedding API.

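A minimal sketch of the gap computation. It assumes embeddings are already computed, so the Gemini Embedding API call is out of scope. It also assumes an intent's "gap" is its cosine distance to the nearest recent commit; the description doesn't say whether the real oracle compares against the nearest commit or against a centroid of recent work. Function names are hypothetical.

```rust
/// Cosine similarity between two equal-length vectors (0.0 if either is all zeros).
fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len(), "vectors must share a dimension");
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        return 0.0;
    }
    dot / (norm_a * norm_b)
}

/// Index of the intent with the largest semantic gap. Assumption: the gap is the
/// cosine distance (1 - similarity) to the closest recent commit. With no commits,
/// every intent is infinitely far away.
fn most_underserved_intent(intents: &[Vec<f32>], commits: &[Vec<f32>]) -> Option<usize> {
    intents
        .iter()
        .enumerate()
        .map(|(i, intent)| {
            let gap = commits
                .iter()
                .map(|c| 1.0 - cosine_similarity(intent, c))
                .fold(f32::INFINITY, f32::min); // distance to the nearest commit
            (i, gap)
        })
        .max_by(|a, b| a.1.total_cmp(&b.1))
        .map(|(i, _)| i)
}
```

Using the nearest commit, rather than an average, means one recent commit on a topic is enough to mark that intent as served.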
Verdict
Rust crate + Python CLI
A verification system where every AI decision gets a digital signature and content fingerprint, not just a log entry. It includes a debate protocol where 3 AI models argue a question (one proposes, one critiques, one judges). It also automatically picks the most cost-effective AI model for each task.
Epistemic verification engine: every AI decision produces a signed Ed25519 receipt with a BLAKE3 content hash, not just a log line. Phalanx Council runs a 3-stage adversarial debate (Proponent, Critic, Judge) across different LLMs. Intelligence Per Penny (IPP) provides Pareto-optimal model routing that automatically balances cost against accuracy, with adaptive modes (frugal/balanced/max_quality) driven by the spend budget. Cross-language test vectors ensure the same signature verifies in Rust, Python, and TypeScript.
Built on: ed25519-dalek, blake3, pynacl, litellm.

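The routing idea can be sketched in two steps. First, discard any model that another model beats on both cost and accuracy; what remains is the Pareto front. Second, choose from the front according to the budget mode. The model names, prices, and accuracies in the test, and the 0.8 accuracy floor for balanced mode, are hypothetical placeholders, not Verdict's actual configuration.

```rust
/// A candidate model: expected accuracy in [0, 1] and cost per call in dollars.
#[derive(Debug)]
struct ModelProfile {
    name: &'static str,
    accuracy: f64,
    cost: f64,
}

#[derive(Debug, Clone, Copy)]
enum Mode {
    Frugal,
    Balanced,
    MaxQuality,
}

/// Models not dominated by any other model. A model is dominated when another is
/// at least as accurate, at most as expensive, and strictly better on one of the two.
fn pareto_front(models: &[ModelProfile]) -> Vec<&ModelProfile> {
    models
        .iter()
        .filter(|m| {
            !models.iter().any(|o| {
                o.accuracy >= m.accuracy
                    && o.cost <= m.cost
                    && (o.accuracy > m.accuracy || o.cost < m.cost)
            })
        })
        .collect()
}

/// Picks a model from the Pareto front for the given budget mode.
fn route(models: &[ModelProfile], mode: Mode) -> Option<&ModelProfile> {
    let front = pareto_front(models);
    // "Intelligence per penny": accuracy per dollar.
    let ipp = |m: &ModelProfile| m.accuracy / m.cost;
    match mode {
        // Highest accuracy regardless of price.
        Mode::MaxQuality => front
            .into_iter()
            .max_by(|a, b| a.accuracy.total_cmp(&b.accuracy)),
        // Best accuracy per dollar above a mode-specific floor (hypothetical values).
        Mode::Frugal | Mode::Balanced => {
            let floor = if matches!(mode, Mode::Balanced) { 0.8 } else { 0.0 };
            front
                .into_iter()
                .filter(|m| m.accuracy >= floor)
                .max_by(|a, b| ipp(*a).total_cmp(&ipp(*b)))
        }
    }
}
```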
Reel Genome
2,117 lines · 28 tests
A content recommendation system that models taste as something that changes over time, not a fixed profile. Nine rules govern how preferences evolve: recent things matter more, repeated content causes fatigue, new genres get a discovery boost, and favorite creators build loyalty. Every preference change is signed for auditability.
Taste cascade physics: content preference is modeled as signal propagation governed by 9 rules (recency decay, novelty bonus, fatigue penalty, creator affinity, genre momentum, cross-pollination, saturation curves, discovery boost, loyalty gravity). Taste isn't a static vector; it's a dynamical system with feedback loops. Arc trajectory analysis detects mood shifts through discontinuities in embedding distance. Ed25519 taste receipts create a tamper-evident audit trail from like to embedding. A variance-weighted 3D projection drives the constellation visualization.
Built on: serde, Gemini Embedding API, k-means clustering.

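Two of the nine rules, recency decay and fatigue penalty, can be sketched as multiplicative factors on a base preference signal. The exponential half-life form, the 7-day half-life, and the 20% per-repeat fatigue rate are assumptions for illustration. The description doesn't give Reel Genome's actual formulas, and the function names are hypothetical.

```rust
/// Recency decay (assumed exponential): an interaction's weight halves
/// every `half_life_days`.
fn recency_weight(age_days: f64, half_life_days: f64) -> f64 {
    0.5_f64.powf(age_days / half_life_days)
}

/// Fatigue penalty (assumed geometric): each repeat exposure multiplies
/// appeal by (1 - rate).
fn fatigue_factor(repeat_count: u32, rate: f64) -> f64 {
    (1.0 - rate).powi(repeat_count as i32)
}

/// Combines the two rules; 7.0 days and 0.2 are placeholder parameters.
fn preference_signal(base: f64, age_days: f64, repeats: u32) -> f64 {
    base * recency_weight(age_days, 7.0) * fatigue_factor(repeats, 0.2)
}
```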
Prune
2,945 lines of Rust + 2,876 lines of Python · 12 CLI commands · FastAPI web dashboard
A scoring tool for managing who you follow on X/Twitter. It combines 7 signals (inactivity, follower ratio, reciprocity, engagement, account age, and more) into a single score. There is no machine learning: hand-tuned rules work well on spam accounts where automated classifiers struggle. It includes a web dashboard and batch processing with rate limiting.
Phoenix scorer: a multi-signal composite scoring model built from seven signals. Six are weighted factors summing to 100%: inactivity (25%), ghost ratio (20%), non-reciprocity (20%), disengagement (15%), account age (10%), and accessibility (10%). The seventh is a bonus term for verified accounts, mutuals, recent engagement, and fans. No ML training is involved; hand-tuned heuristics work on adversarial accounts where binary classifiers fail. Batch execution is resumable, with rate limiting and randomized delays.
Built on: Playwright, SQLite, Typer, Chrome DevTools Protocol.

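The weighted part of the Phoenix score is a plain linear combination using the published weights. This sketch assumes each factor is pre-normalized to [0, 1], with 1 meaning a stronger case to unfollow. The bonus term is left out because its magnitudes aren't given. The struct and function names are hypothetical.

```rust
/// Per-account factors, each assumed pre-normalized to [0, 1]
/// (1 = stronger case to unfollow).
struct Signals {
    inactivity: f64,
    ghost_ratio: f64,
    non_reciprocity: f64,
    disengagement: f64,
    account_age: f64,
    accessibility: f64,
}

/// Weighted composite of the six factors. The weights sum to 1.0, so the
/// result stays in [0, 1] whenever every input does.
fn phoenix_base_score(s: &Signals) -> f64 {
    0.25 * s.inactivity
        + 0.20 * s.ghost_ratio
        + 0.20 * s.non_reciprocity
        + 0.15 * s.disengagement
        + 0.10 * s.account_age
        + 0.10 * s.accessibility
}
```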
Embedding Provider
486 lines · 10 tests
A wrapper that converts text, images, audio, video, and PDFs into numerical vectors in the same coordinate space, so you can search across different types of content; for example, you can find images that match a text description. A compression trick keeps 96% of full quality at 25% of the storage cost.
Universal multimodal embedding wrapper: text, images, audio, video, and PDFs all project into the same 768-dim vector space, enabling cross-modal semantic search. Matryoshka Representation Learning (MRL) truncation retains 96% of full 3072-dim quality at 25% of the storage (768 of 3072 dimensions). Eight task-type hints (e.g. retrieval, classification, clustering, fact verification) make the same content produce different embeddings depending on intent.
Built on: reqwest, serde_json, Gemini Embedding 2 API.

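MRL-trained models front-load information, so a shorter embedding is just a prefix of the full one. The sketch below truncates 3072 dims to 768 and re-normalizes to unit length, which keeps cosine and dot-product scores comparable. It is a minimal sketch: the source doesn't say whether the wrapper truncates locally or requests 768 dimensions from the API directly, and the function name is hypothetical.

```rust
/// Keeps the first `dims` coordinates of a Matryoshka-style embedding and
/// re-normalizes the prefix to unit length.
fn truncate_mrl(embedding: &[f32], dims: usize) -> Vec<f32> {
    let mut v: Vec<f32> = embedding.iter().take(dims).copied().collect();
    let norm = v.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm > 0.0 {
        for x in &mut v {
            *x /= norm;
        }
    }
    v
}
```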
Memori DB
309 lines · 11 tests
A memory system for AI agents. Each memory unit stores a timestamp, context label, content, fingerprint, and an optional numerical vector for similarity search. Memories can be stored instantly and indexed for search later, in batches.
Engram-based agent memory: an append-only store where each memory unit (engram) carries a timestamp, context tag, payload, BLAKE3 hash, and optional 768-dim embedding BLOB. A nullable embedding column allows gradual rollout: store immediately, batch-embed later. Embeddings are serialized from f32 slices to little-endian bytes for lossless roundtrips through SQLite.
Built on: rusqlite with WAL mode, serde, SQLite PRAGMA tuning.

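The f32-to-BLOB roundtrip needs only the standard library. This is a minimal sketch with hypothetical function names. The 4-byte chunking assumes the column only ever holds little-endian f32 data, and the decoder rejects byte lengths that aren't a multiple of 4.

```rust
/// Serializes an embedding to little-endian bytes for a SQLite BLOB column.
fn embedding_to_blob(v: &[f32]) -> Vec<u8> {
    v.iter().flat_map(|x| x.to_le_bytes()).collect()
}

/// Inverse of `embedding_to_blob`; returns None if the length isn't a multiple of 4.
fn blob_to_embedding(bytes: &[u8]) -> Option<Vec<f32>> {
    if bytes.len() % 4 != 0 {
        return None;
    }
    Some(
        bytes
            .chunks_exact(4)
            .map(|c| f32::from_le_bytes([c[0], c[1], c[2], c[3]]))
            .collect(),
    )
}
```

Fixing the byte order explicitly, instead of relying on the host's native order, keeps stored vectors readable on any machine.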
Court Engine
309 lines · 11 tests
The game physics behind The Court's interactive narrative. Five character states, each tracked as a number from 0 to 100, affect each other through simple rules: gaining power raises suspicion, losing power erodes deniability, and high suspicion drains influence. Complex behavior emerges from these simple interactions. If deniability hits zero, the game ends.
POMDP cascade physics for interactive narrative: continuous belief-state gauges with coupled update rules. +Sovereignty triggers +Suspicion (0.15x), –Sovereignty erodes Deniability (0.2x), and Suspicion > 50 drains Influence (0.05x). Emergent behavior arises from simple cascade rules and is recorded in an immutable event log with BLAKE3 hashes. Collapse condition: Deniability ≤ 0 means game over.
Built on: Rust standard library, serde.

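The three documented cascade rules can be sketched as a single update step. This rests on several assumptions. Gauges clamp to 0..=100. The 0.15x and 0.2x multipliers scale the size of the sovereignty change. The 0.05x influence drain is proportional to current suspicion. The fifth gauge and the remaining rules aren't described, so they're omitted, and all names are hypothetical.

```rust
/// The four gauges named in the description, each kept within 0..=100.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Gauges {
    sovereignty: f64,
    suspicion: f64,
    deniability: f64,
    influence: f64,
}

fn clamp_gauge(x: f64) -> f64 {
    x.clamp(0.0, 100.0)
}

impl Gauges {
    /// Applies a sovereignty change, then the documented cascade rules.
    fn apply_sovereignty(&mut self, delta: f64) {
        self.sovereignty = clamp_gauge(self.sovereignty + delta);
        if delta > 0.0 {
            // +Sovereignty triggers +Suspicion at 0.15x the change.
            self.suspicion = clamp_gauge(self.suspicion + 0.15 * delta);
        } else if delta < 0.0 {
            // -Sovereignty erodes Deniability at 0.2x the change.
            self.deniability = clamp_gauge(self.deniability + 0.2 * delta);
        }
        if self.suspicion > 50.0 {
            // Assumption: the 0.05x drain is proportional to current Suspicion.
            self.influence = clamp_gauge(self.influence - 0.05 * self.suspicion);
        }
    }

    /// Collapse condition: the game ends when Deniability reaches zero.
    fn collapsed(&self) -> bool {
        self.deniability <= 0.0
    }
}
```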
Biometric Engine
123 lines · 6 tests
Reads the machine's CPU load and converts it into a stress index that directly affects how AI agents behave. When the machine is under heavy load, agents become more cautious. The CPU state also seeds random number generation for the narrative simulation.
Hardware state as behavioral policy: CPU load is normalized to a 0.0–1.0 stress index that directly modulates agent behavior and narrative character states. BLAKE3 hashes a nanosecond timestamp plus current CPU usage into a deterministic per-pulse entropy seed. Machine stress maps to character suspicion in the Court simulation.
Built on: sysinfo, blake3, chrono.

Panopticon
625 lines
A real-time terminal dashboard for monitoring what autonomous agents are doing. It shows agent status, system health, and character state visualizations, and it is access-controlled with rate limiting.
Real-time TUI dashboard for monitoring autonomous agents: POMDP gauge visualization, agent status tracking, and subsystem telemetry. A subscription-gated API applies in-memory rate limiting (30 req/min).
Built on: ratatui, sysinfo, FrankenTUI.

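A 30 req/min in-memory limit can be enforced with a sliding window of timestamps per client. This is one plausible design, not necessarily Panopticon's; a fixed per-minute counter would also fit the description. The type and method names are hypothetical, and `now` is passed in so the logic can be tested without sleeping.

```rust
use std::collections::{HashMap, VecDeque};
use std::time::{Duration, Instant};

/// Sliding-window limiter: at most `limit` requests per `window` per client key.
struct RateLimiter {
    limit: usize,
    window: Duration,
    hits: HashMap<String, VecDeque<Instant>>,
}

impl RateLimiter {
    fn new(limit: usize, window: Duration) -> Self {
        Self { limit, window, hits: HashMap::new() }
    }

    /// Records a request at `now` if allowed; rejected requests are not recorded.
    fn allow(&mut self, key: &str, now: Instant) -> bool {
        let q = self.hits.entry(key.to_string()).or_default();
        // Drop timestamps that have slid out of the window.
        while let Some(&oldest) = q.front() {
            if now.duration_since(oldest) >= self.window {
                q.pop_front();
            } else {
                break;
            }
        }
        if q.len() < self.limit {
            q.push_back(now);
            true
        } else {
            false
        }
    }
}
```

Usage: `RateLimiter::new(30, Duration::from_secs(60))` matches the stated 30 req/min.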
CYOA Forge
49,067 lines
A browser automation tool that can launch and control multiple browser types (it tries Zen, then Chrome, then Firefox). It includes an interactive story mode where you make choices through the browser session.
Polymorphic browser orchestration: the launch system tries Zen, then Chrome in place, then an ephemeral copy of Chrome, then Firefox, with Chrome DevTools Protocol port auto-discovery for instances that are already running. Court mode adds a REPL with station/choice/synth commands for interactive narrative within the browser session.
Built on: Playwright, Chrome DevTools Protocol.
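The launch cascade is an ordered fallback: try each candidate, return the first one that succeeds, and keep the errors for diagnostics. In this sketch, the candidate labels and the launcher closure are hypothetical stand-ins for the real Playwright/CDP calls. CDP port auto-discovery is not shown.

```rust
/// Hypothetical labels for the documented launch order.
const LAUNCH_ORDER: [&str; 4] = ["zen", "chrome-in-place", "chrome-ephemeral", "firefox"];

/// Tries each candidate in order and returns the first success with its label,
/// or every error message if all candidates fail.
fn launch_first<T, F>(candidates: &[&str], mut try_launch: F) -> Result<(String, T), Vec<String>>
where
    F: FnMut(&str) -> Result<T, String>,
{
    let mut errors = Vec::new();
    for &name in candidates {
        match try_launch(name) {
            Ok(handle) => return Ok((name.to_string(), handle)),
            Err(e) => errors.push(format!("{name}: {e}")),
        }
    }
    Err(errors)
}
```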

Key Ideas

Receipts, not vibes: every AI decision, narrative choice, and agent task produces a cryptographic receipt (an Ed25519 signature plus a BLAKE3 content fingerprint). If you can't verify it happened, it didn't happen. Implemented across 5 systems, not theoretical.
Semantic gap oracle: don't maintain TODO lists. Embed what you've done and what should exist into the same vector space, then work on the largest gap. The backlog writes itself.
Adversarial deliberation pipeline: send one prompt to multiple models and have them critique each other in 3 stages (propose, critique, synthesize). Where models diverge is where the answer is hardest to get right. Disagreement is signal, not noise.
Intelligence Per Penny: Pareto-optimal model routing. Route each task to the model with the best accuracy per dollar spent, not always the most expensive one, with adaptive budget modes (frugal/balanced/max_quality).
Agent swarm as kernel: manage autonomous agents the way an OS manages its subsystems. They use typed Rust interfaces and direct function calls over shared SQLite instead of HTTP, with crypto receipts on every mutation.
Taste as a dynamical system: preference isn't a static vector or a fixed profile; it's a cascade with feedback loops. 9 rules: recency, fatigue, novelty, affinity, momentum, cross-pollination, saturation, discovery, loyalty.
POMDP cascade physics: narrative state is five coupled continuous gauges with six update rules, where gaining power automatically raises suspicion and losses erode deniability. Emergent narrative dynamics come from simple rules; no complex AI is needed for the game physics layer.
Hardware cortisol: CPU load, memory pressure, and thermal state are treated as behavioral policy inputs. Machine stress directly modulates agent decisions and character narrative states, and heavy load means more cautious behavior.
Panic-on-serialize: security-critical code should crash rather than silently degrade. If hashing fails, the process dies. Silent failures are worse than crashes, so every failure must be loud.

Open-Source Foundations

Everything I build stands on open-source work. Clear attribution:

Jeffrey Emanuel's FrankenStack (open-source): asupersync (async runtime), frankensearch (hybrid BM25+HNSW search), FrankenTUI (terminal framework), beads_rust (task queue DAG), MCP Agent Mail (inter-agent IPC), UBS (vulnerability scanner), DCG (destructive command guard), CASS (session search). His libraries are the foundation layer I build on.
Ed25519 + BLAKE3 (open-source crypto): I chose signed receipts over JWTs because a receipt proves what happened, not just who made the call. The ed25519-dalek and blake3 crates provide the primitives; my contribution is the receipt architecture and the panic-on-failure semantics built on top.
SQLite WAL (open-source): bare metal doesn't need a cluster. My contribution is the TOCTOU-safe transaction pattern (BEGIN EXCLUSIVE + busy_timeout=30s + INSERT OR IGNORE) that prevents race conditions between concurrent agents.
POMDP framework (academic): Partially Observable Markov Decision Processes are an established framework. My contribution is applying them as narrative game physics with cascade rules and gauge-gated choices.
Matryoshka embeddings (Google research) and Google's Gemini embedding models (API): the models convert content to vectors, and MRL truncation to 768 dims is a documented technique. My contribution is the multimodal wrapper, task-type routing, and integration into the engram memory store.
Playwright + CDP (open-source): browser automation primitives. My contribution is the polymorphic browser launch cascade and the Phoenix scoring algorithm for social network analysis.

Beliefs

Every AI decision should produce a cryptographic receipt. If you can't verify it, it didn't happen.
The best infrastructure is the infrastructure you own. Cloud is someone else's bare metal with a markup.
Multi-model adversarial debate reveals truth better than any single RLHF-tuned model. Disagreement is signal.
Small, fast binaries. If your agent framework needs a container, you've already lost.
The agents should work while you sleep. Night cycle runs maintenance and auto-commits autonomously.
Security-critical code should crash loudly, never fail silently. Panic rather than emit an empty hash.
Speed compounds. Ship daily or don't ship at all.