Data Scientist

Nwanguma Emmanuel


I build ML systems across the full stack — from data science and model fine-tuning to the production infrastructure that keeps them reliable: evaluation frameworks, feature stores, lineage tracking, and agent systems.


$ cat engineering_philosophy.txt

01_

Production Over Prototypes

"The gap between a notebook that runs and a system that performs is where most ML projects die. I design for reliability from the start: modular components, observable failure modes, and infrastructure that breaks explicitly rather than silently."

02_

Evaluation Over Accuracy

"Evaluation-driven ML systems outlast accurate ones. I build rigorous evaluation harnesses (regression detection, behavioral diffing, drift monitoring) because a model that can't be measured can't be trusted in production."

03_

Lineage and Observability

"A prediction without a lineage is a liability. I build systems where every inference can be traced to the exact data, pipeline, and model version that produced it — and where drift, latency, and token cost are visible before they become problems."

04_

Build for Other Engineers

"The real test of infrastructure isn't whether it works in your repo — it's whether other engineers adopt it in theirs. I publish to PyPI, npm, and Homebrew because tools that live only on a branch aren't tools yet."

$ ls projects/

// ML Infrastructure
// Applied AI
// Data Science

$ tree ai_system_stack/

Skills organized by system architecture layer.

ai_system_stack/
├── Advanced EDA & profiling
├── Feature engineering
├── Statistical testing & experiment design
├── Time series forecasting
├── SQL (PostgreSQL, MySQL, BigQuery)
└── ETL workflows

├── Supervised ML
├── Cross-validation
├── Hyperparameter optimization
├── Imbalanced data handling
├── SHAP interpretability
├── Error analysis
├── PyTorch & HuggingFace
└── LLM fine-tuning (QLoRA/PEFT)

├── FastAPI model serving
├── REST design
├── JWT authentication
├── Structured JSON outputs
└── Latency-aware inference

├── Docker
├── Redis
├── MLflow
├── Model versioning
├── GitHub Actions CI/CD
└── PgBouncer & connection pooling

├── Multi-provider LLM integration
├── Prompt engineering
├── System prompt design
├── Function/tool calling
├── RAG pipelines
├── Embeddings & semantic retrieval
├── Vector databases (pgvector, FAISS, ChromaDB)
├── Prompt regression testing
└── LLM evaluation frameworks

├── Multi-agent shared memory systems
├── Agent behavioral testing & assertions
├── MCP tool integration
├── Async task queues (Celery)
├── OSS SDK publishing (PyPI · npm · Homebrew)
├── Go CLI tooling
└── TypeScript SDK authoring

$ git log --contributions

aden-hive/hive · 10,000+ GitHub stars · 215 contributors

14 merged PRs in a 215-contributor project · 458 tests added · 33 tool READMEs written · Named contributor in the aden-hive/hive v0.7.0 release notes.

BigQuery MCP tool integration (new feature) · Credential exception handling (bug fix) · EventBus comprehensive test coverage · Security scanning test suites (7 tools) · Cross-platform CI improvements (Windows + macOS) · Resource leak prevention · API integration unit tests (3 tools)

$ curl medium_feed

// More from Medium

$ ls thoughts/

$ contact --info

Open to ML Engineer, AI Engineer, Data Scientist, and MLOps Engineer roles — remote worldwide.