reliability infra @ meta

Building distributed systems that stay up.

I'm Prabhav Nalhe - six years across backend and infrastructure, currently building service-dependency reliability at Meta. Interested in large-scale streaming systems and applying LLM agents to infrastructure problems.

Get in touch

GitHub LinkedIn Résumé

9-figure $ revenue exposure protected via fault-injection program

20 GB/s telemetry stream-fusion in Rust across six sources

1,000+ services adopted the dependency protection program

2-digit % share of top-severity incidents addressed by the platform

Experience

2024 - now

Fidelity Investments

Software Engineer

Built a 15M-document search and indexing pipeline (AWS Lambda, SQS, Solr) and a real-time reporting system for financial research.

2018 - 21

Quest Global

Software Engineer

Voice authentication and OTA update infrastructure for automotive ECUs, serving 10K+ users.

Selected Work

Dependency Service ↗

~20 GB/s

rust · streaming · graphs

Problem

No single source of truth for how services depend on each other - six telemetry systems, each partial and inconsistent.

Approach

Rust streaming graph fusing all six sources in real time, reconciling conflicts into one canonical graph.

Result

The platform's source of truth - powering notifications, protection, and risk classification fleet-wide.
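As a rough illustration of the fusion idea above (not the production system): each source reports the edges it can see, and an edge enters the canonical graph once enough independent sources agree on it. The `EdgeObservation` shape, the source names, and the `min_sources` quorum rule are all assumptions made for this sketch; the text above does not say how conflicts are actually reconciled.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// One edge sighting from one telemetry source (hypothetical shape).
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct EdgeObservation {
    pub source: &'static str,
    pub caller: String,
    pub callee: String,
}

/// Canonical graph: each (caller, callee) edge maps to the set of
/// sources that have reported it so far.
#[derive(Debug, Default)]
pub struct CanonicalGraph {
    edges: BTreeMap<(String, String), BTreeSet<&'static str>>,
}

impl CanonicalGraph {
    /// Fold one observation into the graph. Repeated sightings from the
    /// same source are deduplicated by the set.
    pub fn ingest(&mut self, obs: EdgeObservation) {
        self.edges
            .entry((obs.caller, obs.callee))
            .or_default()
            .insert(obs.source);
    }

    /// Edges reported by at least `min_sources` distinct sources,
    /// in sorted order. The quorum rule is an assumption of this sketch.
    pub fn confirmed_edges(&self, min_sources: usize) -> Vec<(&str, &str)> {
        self.edges
            .iter()
            .filter(|(_, sources)| sources.len() >= min_sources)
            .map(|((caller, callee), _)| (caller.as_str(), callee.as_str()))
            .collect()
    }
}
```

Feeding observations one at a time through `ingest` keeps the structure incremental, which is the property a streaming pipeline needs; a real system would also have to handle identity resolution and edge expiry, which this sketch leaves out.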

Dependency Explanations ↗

~$0.15 / pair

llm agents · evals

Problem

Raw dependency edges don't tell engineers why a dependency exists or whether it's safe to remove.

Approach

LLM agent harness over 16 purpose-built tools that investigates each edge and writes a human-readable explanation.

Result

Near-100% coverage - explanation and classification at fleet scale.
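The tool-dispatch half of a harness like the one described above can be sketched as a small registry: the model names a tool, the harness runs it against a dependency edge and returns text the model can read. This is only an assumed shape; the `Harness` type, the closure-based `Tool` alias, and the `(caller, callee)` arguments are hypothetical, and the model call itself is out of scope here.

```rust
use std::collections::HashMap;

/// A tool takes a (caller, callee) edge and returns evidence as text.
/// Hypothetical signature chosen for this sketch.
pub type Tool = Box<dyn Fn(&str, &str) -> String>;

/// Registry of purpose-built tools the agent is allowed to call.
#[derive(Default)]
pub struct Harness {
    tools: HashMap<&'static str, Tool>,
}

impl Harness {
    /// Register a tool under a fixed name.
    pub fn register(&mut self, name: &'static str, tool: Tool) {
        self.tools.insert(name, tool);
    }

    /// Run the named tool on an edge. Unknown names come back as an error
    /// so the caller can report them to the model instead of panicking.
    pub fn call(&self, name: &str, caller: &str, callee: &str) -> Result<String, String> {
        match self.tools.get(name) {
            Some(tool) => Ok(tool(caller, callee)),
            None => Err(format!("unknown tool: {}", name)),
        }
    }
}
```

Returning structured errors for unknown tool names matters in practice: a model will occasionally request a tool that does not exist, and the harness needs to report that back rather than crash.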

Fault-Injection Loop ↗

ground truth

chaos eng · ml labels

Problem

Dependency criticality labels were heuristic guesses - no ground truth about what actually breaks.

Approach

Chaos experiments injecting faults under controlled conditions, producing ground-truth labels for the classifier.

Result

A self-improving loop - every experiment sharpens the classifier and retires risky dependencies.
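One way to picture how an experiment becomes a label, under assumptions not stated above: compare the caller's error rate with and without the injected fault, and call the dependency critical if the increase exceeds a threshold. The `ExperimentResult` fields, the threshold rule, and the `agreement` helper are hypothetical names for this sketch.

```rust
/// Ground-truth label for a dependency edge.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Criticality {
    Critical,
    NonCritical,
}

/// Outcome of one fault-injection run on a single edge (hypothetical shape).
#[derive(Debug, Clone)]
pub struct ExperimentResult {
    pub caller: String,
    pub callee: String,
    pub baseline_error_rate: f64,
    pub faulted_error_rate: f64,
}

/// Label the edge critical if the caller's error rate rose by more than
/// `max_error_increase` while the callee was faulted.
pub fn label(result: &ExperimentResult, max_error_increase: f64) -> Criticality {
    if result.faulted_error_rate - result.baseline_error_rate > max_error_increase {
        Criticality::Critical
    } else {
        Criticality::NonCritical
    }
}

/// Fraction of predicted labels that match the ground truth.
/// Returns None for empty or mismatched inputs.
pub fn agreement(predicted: &[Criticality], ground_truth: &[Criticality]) -> Option<f64> {
    if predicted.is_empty() || predicted.len() != ground_truth.len() {
        return None;
    }
    let matches = predicted
        .iter()
        .zip(ground_truth)
        .filter(|(p, g)| p == g)
        .count();
    Some(matches as f64 / predicted.len() as f64)
}
```

Tracking `agreement` between the classifier's predictions and the experiment-derived labels gives a simple signal for whether each round of experiments is actually improving the classifier.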

Prefer to feel it instead? Break my website - a fault-injection sandbox where you take services down and watch the cascade.

Writing

The Service Graph Is a Lower Bound

The dependency graph you draw from RPC traffic is real but incomplete. The dependencies that take you down are…

5 min · 2026

Streaming Ingest in Rust

When you fuse many noisy event streams into one model, the hard problems are not throughput - they are identity and…

7 min · 2026

Closing the Loop

A pattern worth reusing: when an LLM makes a judgment at scale, do not let it grade itself. Pair a cheap predictor…

6 min · 2026

Agent Harness vs RAG

Two ways to feed a model context: retrieve fuzzy passages by similarity, or call purpose-built tools that return…

6 min · 2026

Browse the full graph · 13 interconnected notes · systems, reliability & llms

Skills & Background

Stack

languages: Python, Rust, C/C++, Java, TypeScript, SQL
systems: Distributed systems, streaming, fault injection, SLO/SLI
cloud / data: AWS, Kubernetes, Docker, Kafka, Redis, PostgreSQL, Solr
ai / ml: LLM agents, MCP, evals, prompt caching, inference routing

Education & Publication

Stony Brook University · M.S. Computer Science · GPA 3.92 · 2021-22

Pune Institute of Computer Technology · B.E. Computer Science · 2014-18

IEEE TASLP · 2022 · Co-author · Deep Learning Driven NL Text to SQL Query Conversion

ask me about: breaking things on purpose, professionally · the 2018 RNN that answered questions in SQL · how a live stream reaches a billion people

Quick Facts

nprabhav:~$ nprabhav --brief # for recruiters & hiring managers

role: Software Engineer - distributed systems & reliability infrastructure
now: Meta · Reliability Infra · 2024-present
experience: ~6 years - Meta · Fidelity Investments · Quest Global
location: San Jose, CA
core stack: Rust · Python · streaming systems · fault injection · LLM agents
proof: ~20 GB/s ingest · 1,000+ services protected · 9-figure exposure covered
education: M.S. CS, Stony Brook University · GPA 3.92
published: IEEE TASLP 2022 · NL text → SQL
start with: Dependency Service case study · the service-graph note

résumé pdf · linkedin

# the copy button drops a clean plaintext summary - built for ATS fields & intro emails

Have something reliable to build?

Let's talk.

Or just say hi - nprabhav111@gmail.com