reliability infra @ meta

Building distributed systems that stay up.

I'm Prabhav Nalhe, a software engineer with six years across backend and infrastructure, currently building service-dependency reliability at Meta. I'm interested in large-scale streaming systems and in applying LLM agents to infrastructure problems.

9-figure $ revenue exposure protected via fault-injection program
20 GB/s telemetry stream-fusion in Rust across six sources
1,000+ services adopted the dependency protection program
2-digit % share of top-severity incidents addressed by the platform

Experience

01
2024 — now

Meta

Software Engineer, Reliability Infra

Service-dependency reliability — a platform addressing a double-digit share of the company's top-severity incidents.

  • Built the canonical dependency graph in Rust — stream-fusion over six telemetry sources at ~20 GB/s.
  • Designed dependency notifications and a protection program adopted by 1,000+ services.
  • Led a fault-injection program that eliminated risky dependencies, protecting nine-figure revenue exposure.
  • Brought LLM agents into the stack to explain and classify dependency risk at fleet scale.
2023

Fidelity Investments

Software Engineer

Built a 15M-document search and indexing pipeline (AWS Lambda, SQS, Solr) and a real-time reporting system for financial research.

2018 — 21

Quest Global

Software Engineer

Voice authentication and OTA update infrastructure for automotive ECUs, serving 10K+ users.

Selected Work

02

Writing

03

The Service Graph Is a Lower Bound

The dependency graph you draw from RPC traffic is real but incomplete. The dependencies that take you down are…

5 min · 2026

Streaming Ingest in Rust

When you fuse many noisy event streams into one model, the hard problems are not throughput; they are identity and…

7 min · 2026

Closing the Loop

A pattern worth reusing: when an LLM makes a judgment at scale, do not let it grade itself. Pair a cheap predictor…

6 min · 2026

Agent Harness vs RAG

Two ways to feed a model context: retrieve fuzzy passages by similarity, or call purpose-built tools that return…

6 min · 2026

Skills & Background

04

Stack

languages
Python, Rust, C/C++, Java, TypeScript, SQL
systems
Distributed systems, streaming, fault injection, SLO/SLI
cloud / data
AWS, Kubernetes, Docker, Kafka, Redis, PostgreSQL, Solr
ai / ml
LLM agents, MCP, evals, prompt caching, inference routing

Education & Publication

Stony Brook University · M.S. Computer Science · GPA 3.92 · 2021–22
Pune Institute of Computer Technology · B.E. Computer Science · 2014–18

Have something reliable to build?

Let's talk.

Or just say hi — nprabhav111@gmail.com