AI & Data Systems

Senior Data & AI Engineer building reliable systems for data, documents, models, and automation.

I design and implement pipelines and applications that ingest messy information, structure it, validate it, enrich it with models or LLMs, and expose it through search, APIs, reports, dashboards, or publication workflows.


Core Capabilities

Data and AI Pipelines

I build reproducible pipelines for ingesting, cleaning, transforming, validating, and materializing data from heterogeneous sources.

Examples include:

  • JSONL and structured-data pipelines for document and conversation processing
  • Python and SQL workflows for analytical datasets
  • OCR and parsing workflows for financial and administrative documents
  • run records, logs, validation checks, and materialized artifacts for reproducibility
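
As an illustration of the last point, here is a minimal sketch of a JSONL step that validates records and emits a run record alongside its output. The schema (`id`, `text`), the function names, and the run-record fields are all hypothetical choices made for this example, not taken from a specific project.

```python
import hashlib
import json
from datetime import datetime, timezone

# Hypothetical schema for this sketch: field name -> expected Python type.
REQUIRED_FIELDS = {"id": str, "text": str}


def validate_record(record):
    """Return a list of problems; an empty list means the record is valid."""
    problems = []
    for field, expected_type in REQUIRED_FIELDS.items():
        if field not in record:
            problems.append(f"missing field: {field}")
        elif not isinstance(record[field], expected_type):
            problems.append(f"wrong type for field: {field}")
    return problems


def run_pipeline(lines):
    """Parse JSONL lines, keep valid records, and build a run record."""
    valid, rejected = [], []
    for line_no, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # skip blank lines without counting them as rejects
        try:
            record = json.loads(line)
        except json.JSONDecodeError as exc:
            rejected.append({"line": line_no, "problems": [f"invalid JSON: {exc.msg}"]})
            continue
        if not isinstance(record, dict):
            rejected.append({"line": line_no, "problems": ["not a JSON object"]})
            continue
        problems = validate_record(record)
        if problems:
            rejected.append({"line": line_no, "problems": problems})
        else:
            valid.append(record)

    # Serialize with sorted keys so identical input yields identical output bytes.
    output = "".join(json.dumps(r, sort_keys=True) + "\n" for r in valid)
    run_record = {
        "finished_at": datetime.now(timezone.utc).isoformat(),
        "n_valid": len(valid),
        "n_rejected": len(rejected),
        "rejected": rejected,
        "output_sha256": hashlib.sha256(output.encode("utf-8")).hexdigest(),
    }
    return output, run_record
```

Hashing the materialized output makes it possible to check whether a rerun on the same input reproduced the same artifact, while the list of rejected lines keeps failures inspectable instead of silently dropped.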

GenAI, RAG, and Knowledge Infrastructure

I build systems that use LLMs, embeddings, and retrieval to turn unstructured information into usable knowledge.

Examples include:

  • document chunking and metadata generation
  • embedding pipelines and vector stores such as ChromaDB and FAISS
  • semantic search and lightweight retrieval interfaces
  • LLM-assisted summarization, classification, routing, and digest generation
  • separation between deterministic processing and AI-assisted reasoning
Deployment: M.I. Journal
Quartz knowledge hub with tags, monthly logs, and semantic navigation.
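
The chunking and metadata step listed above is an example of the deterministic side of that separation: it can run, and be tested, before any embedding model or LLM is involved. A minimal sketch, assuming word-based windows with overlap; the function name, parameters, and metadata fields are hypothetical and chosen only for illustration.

```python
import hashlib


def chunk_document(doc_id, text, chunk_size=200, overlap=40):
    """Split text into overlapping word windows with stable metadata."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("require chunk_size > 0 and 0 <= overlap < chunk_size")

    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        window = words[start:start + chunk_size]
        chunk_text = " ".join(window)
        # The ID depends only on the inputs, so reruns produce the same IDs.
        digest = hashlib.sha1(f"{doc_id}:{start}:{chunk_text}".encode("utf-8"))
        chunks.append({
            "chunk_id": digest.hexdigest()[:12],
            "doc_id": doc_id,
            "start_word": start,
            "n_words": len(window),
            "text": chunk_text,
        })
        if start + chunk_size >= len(words):
            break  # the current window already reaches the end of the text
    return chunks
```

Because the chunk IDs are derived from the content, downstream embeddings or LLM-generated summaries can be keyed to them and recomputed only for chunks that actually changed.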

Analytics, Metrics, and Decision Support

I connect data infrastructure with analytical and product questions.

Examples include:

  • socioeconomic indicators and poverty measurement
  • financial data normalization and reporting
  • monitoring systems and operational dashboards
  • metric taxonomies for decision-making and evaluation

Automation and Operational Reliability

I care about systems that can be rerun, inspected, debugged, and maintained.

Examples include:

  • CLI wrappers and Makefile-based workflows
  • CI/CD and GitHub Actions
  • static-site and JSON API publication pipelines
  • runbooks, architecture notes, validation layers, and failure recovery patterns

Stack

Python, SQL, Pandas, NumPy, scikit-learn, LLM APIs, embeddings, RAG, ChromaDB, FAISS, SQLite, BigQuery/GCP, Docker, GitHub Actions, OCR pipelines, Docusaurus, Streamlit, FastAPI-style APIs, Markdown/MDX publishing workflows.


Focus

I am most interested in roles where data science, AI engineering, and software systems meet: AI Engineer, Data Engineer, Senior Data Scientist/Engineer, ML/LLMOps-oriented roles, and technical lead positions focused on reliable data and AI infrastructure.