Nwanguma Emmanuel

$ cat engineering_philosophy.txt

01_

Production Over Prototypes

"The gap between a notebook that runs and a system that performs is where most ML projects die. I design for reliability from the start: modular components, observable failure modes, and infrastructure that breaks explicitly rather than silently."

02_

Evaluation Over Accuracy

"Evaluation-driven ML systems outlast accurate ones. I build rigorous evaluation harnesses (regression detection, behavioral diffing, drift monitoring) because a model that can't be measured can't be trusted in production."

03_

Lineage and Observability

"A prediction without a lineage is a liability. I build systems where every inference can be traced to the exact data, pipeline, and model version that produced it — and where drift, latency, and token cost are visible before they become problems."

04_

Build for Other Engineers

"The real test of infrastructure isn't whether it works in your repo — it's whether other engineers adopt it in theirs. I publish to PyPI, npm, and Homebrew because tools that live only on a branch aren't tools yet."

$ ls projects/

// ML Infrastructure

// Applied AI

// Data Science

$ tree ai_system_stack/

Skills organized by system architecture layer.

ai_system_stack/

├──Advanced EDA & profiling

├──Feature engineering

├──Statistical testing & experiment design

├──Time series forecasting

├──SQL (PostgreSQL, MySQL, BigQuery)

└──ETL workflows

├──Supervised ML

├──Cross-validation

├──Hyperparameter optimization

├──Imbalanced data handling

├──SHAP interpretability

├──Error analysis

├──PyTorch & HuggingFace

└──LLM fine-tuning (QLoRA/PEFT)

├──FastAPI model serving

├──REST design

├──JWT authentication

├──Structured JSON outputs

└──Latency-aware inference

├──Docker

├──Redis

├──MLflow

├──Model versioning

├──GitHub Actions CI/CD

└──PgBouncer & connection pooling

├──Multi-provider LLM integration

├──Prompt engineering

├──System prompt design

├──Function/tool calling

├──RAG pipelines

├──Embeddings & semantic retrieval

├──Vector databases (pgvector, FAISS, ChromaDB)

├──Prompt regression testing

└──LLM evaluation frameworks

├──Multi-agent shared memory systems

├──Agent behavioral testing & assertions

├──MCP tool integration

├──Async task queues (Celery)

├──OSS SDK publishing (PyPI · npm · Homebrew)

├──Go CLI tooling

└──TypeScript SDK authoring

$ git log --contributions

aden-hive/hive · 10,000+ GitHub stars · 215 contributors

14 merged PRs in a project with 215 contributors · 458 tests added · 33 tool READMEs written · named contributor in the aden-hive/hive v0.7.0 release notes.

BigQuery MCP tool integration (new feature)

Credential exception handling (bug fix)

EventBus comprehensive test coverage

Security scanning test suites (7 tools)

Cross-platform CI improvements (Windows + macOS)

Resource leak prevention

API integration unit tests (3 tools)

$ curl medium_feed

// Recent Writing

Towards AI

The MLOps Component Nobody Builds in Their Portfolio (And Why It Matters Most)

Towards AI

Can You Trace Any Prediction Back to the Exact Data That Caused It?

NextGenAI

Everyone's Building AI Wrappers. Few Are Solving Real Data Infrastructure Problems.

// More from Medium

NextGenAI

Mar 11, 2026

I Tested 5 RAG Search Strategies on the Same Dataset. Here Are the Real Latency Numbers.

#python #data-science

DataDrivenInvestor

Mar 9, 2026

I Added Real-Time Drift Detection to My ML Classifier. Here’s What Actually Broke First.

#artificial-intelligence #machine-learning

AI in Plain English

Mar 3, 2026

I Trained My Own Financial AI From Scratch. Here’s What a 69% Improvement Actually Looks Like

#nlp #llm

NextGenAI

Feb 23, 2026

I Open-Sourced an LLM Regression Testing Framework. Here’s Why Every AI Team Needs One.

#python #artificial-intelligence

$ ls thoughts/

Why most ML projects fail in production

Most teams spend 80% of their time on the model and 20% on everything else. But production ML fails at the seams — training-serving skew, silent drift, predictions that can't be traced back to the data that caused them. The model is rarely the problem.
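One way to make a prediction traceable is to store a small lineage record next to it. The sketch below is illustrative only: `LineageRecord`, `fingerprint`, and `predict_with_lineage` are hypothetical names, and the toy model and version strings are assumptions, not part of any project listed here.

```python
import hashlib
import json
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class LineageRecord:
    """Hypothetical record stored alongside each prediction."""
    data_hash: str
    pipeline_version: str
    model_version: str
    prediction: float


def fingerprint(features: dict) -> str:
    """Stable SHA-256 hash of the input features (key order does not matter)."""
    payload = json.dumps(features, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def predict_with_lineage(features, model, pipeline_version, model_version):
    """Run the model and return the prediction together with its lineage."""
    return LineageRecord(
        data_hash=fingerprint(features),
        pipeline_version=pipeline_version,
        model_version=model_version,
        prediction=float(model(features)),
    )


def toy_model(features):
    """Toy stand-in for a real model: any callable mapping features to a score."""
    return 0.5 * features["tenure"]


record = predict_with_lineage(
    {"tenure": 4, "plan": "basic"}, toy_model, "pipe-1", "model-1"
)
print(asdict(record))
```

Hashing the serialized features with sorted keys means the same input always yields the same fingerprint, so a stored prediction can later be matched to the exact input that produced it.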

Evaluation > Accuracy

A model that scores 99% on a benchmark but silently degrades after a prompt change isn't 99% accurate — it's untested. Real evaluation means regression gates, not leaderboard metrics.
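A regression gate can be as simple as comparing per-case scores from a candidate run against a stored baseline. This is a minimal sketch under assumptions: the function name `regression_gate`, the case IDs, the scores, and the 0.02 tolerance are all made up for illustration and are not taken from evalflow or any other listed project.

```python
def regression_gate(baseline, candidate, tolerance=0.02):
    """Return the test cases whose score dropped by more than `tolerance`.

    A case missing from the candidate run is also treated as a regression.
    """
    regressions = {}
    for case_id, old_score in baseline.items():
        new_score = candidate.get(case_id)
        if new_score is None:
            regressions[case_id] = (old_score, None)
        elif old_score - new_score > tolerance:
            regressions[case_id] = (old_score, new_score)
    return regressions


# Illustrative scores only, not real results.
baseline = {"refund_policy": 0.91, "date_parsing": 0.88, "tone": 0.80}
candidate = {"refund_policy": 0.93, "date_parsing": 0.70, "tone": 0.79}

failed = regression_gate(baseline, candidate)
for case_id, (old, new) in failed.items():
    print(f"REGRESSION {case_id}: {old} -> {new}")
```

In a CI job, a non-empty result would typically fail the build. The average score can rise while an individual case gets worse, which is why the gate checks each case separately.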

Agents don't have a prompting problem

Most agent failures aren't prompt failures. They're memory failures — agents that forget context between sessions, lose nuance between handoffs, and drift from the original goal by step three. Better prompts don't fix broken memory architecture.

Your agent works in the notebook. That's the easy part.

Demo agents pass every test because demos don't have tool failures, prompt regressions, or infinite loops. Production agents do. The gap between an agent that works in a notebook and one that works for users is a testing framework, not a better model.
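As a small illustration of testing for one of these failure modes, the sketch below caps the number of steps an agent loop may take and raises instead of spinning forever. `run_agent`, `StepLimitExceeded`, and the two step functions are hypothetical stand-ins, not the API of playagent or any real agent framework.

```python
class StepLimitExceeded(RuntimeError):
    """Raised when the agent loop does not finish within the step budget."""


def run_agent(step_fn, state, max_steps=10):
    """Call step_fn until it reports done, failing loudly past max_steps."""
    for step in range(1, max_steps + 1):
        state, done = step_fn(state)
        if done:
            return state, step
    raise StepLimitExceeded(f"agent did not finish within {max_steps} steps")


def counting_step(state):
    """Toy step that finishes once the counter reaches 3."""
    state = state + 1
    return state, state >= 3


def stuck_step(state):
    """Toy step that never finishes, simulating an infinite loop."""
    return state, False


print(run_agent(counting_step, 0))

try:
    run_agent(stuck_step, 0, max_steps=5)
except StepLimitExceeded as exc:
    print(f"caught: {exc}")
```

The assertion is on the loop's behavior rather than the model's output: a stuck agent should surface as an explicit error that a test can catch.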

$ contact --info

Open to ML Engineer, AI Engineer, Data Scientist,
and MLOps Engineer roles — remote worldwide.

github.sh linkedin.log send_mail.exe

$ ls projects/

Production ML Feature Store: Dual-Store Architecture

ML Pipeline Lineage Tracker: End-to-End Audit Trail

Remembr: Shared Memory Layer for Multi-Agent AI Systems

evalflow: pytest for LLMs

playagent: pytest for AI Agents

Phi-4 Mini Finance Fine-tuning: +69% ROUGE-L

E-Commerce Product Classifier: Production NLP with Drift Detection

Production RAG Pipeline: Intelligent Document Q&A

locksmith: Dangerous Postgres Migration Checker

Heart Disease Risk Prediction: Production ML System

Telecom Churn Prediction: 80.4% Accuracy

Hybrid Recommendation System: Collaborative + Content-Based
