I'm Prabhav Nalhe — six years across backend and infrastructure, currently building service-dependency reliability at Meta. Interested in large-scale streaming systems and applying LLM agents to infrastructure problems.
Software Engineer, Reliability Infra
Service-dependency reliability — a platform addressing a double-digit share of the company's top-severity incidents.
Software Engineer
Built a 15M-document search and indexing pipeline (AWS Lambda, SQS, Solr) and a real-time reporting system for financial research.
Software Engineer
Voice authentication and OTA update infrastructure for automotive ECUs, serving 10K+ users.
No single source of truth for how services depend on each other — six telemetry systems, each partial and inconsistent.
Rust streaming graph fusing all six sources in real time, reconciling conflicts into one canonical graph.
The platform's source of truth — powering notifications, protection, and risk classification fleet-wide.
Raw dependency edges don't tell engineers why a dependency exists or whether it's safe to remove.
LLM agent harness over 16 purpose-built tools that investigates each edge and writes a human-readable explanation.
Near-100% coverage — explanation and classification at fleet scale.
Dependency criticality labels were heuristic guesses — no ground truth about what actually breaks.
Chaos experiments injecting faults under controlled conditions, producing ground-truth labels for the classifier.
A self-improving loop — every experiment sharpens the classifier and retires risky dependencies.
The dependency graph you draw from RPC traffic is real but incomplete. The dependencies that take you down are…
When you fuse many noisy event streams into one model, the hard problems are not throughput; they are identity and…
A pattern worth reusing: when an LLM makes a judgment at scale, do not let it grade itself. Pair a cheap predictor…
Two ways to feed a model context: retrieve fuzzy passages by similarity, or call purpose-built tools that return…