Pradyumna Yadav

AI Engineer

AI Engineer specializing in agentic systems, AI infrastructure, and production-scale deployments. Currently at FuturePath AI building multi-modal AI products for Fortune 500 clients - from autonomous agents to real-time voice and knowledge systems.

Available for full-time roles and consulting projects

What I Work On

Explore my work

Agentic Systems

Multi-step agent workflows
Multi-tenant orchestration
Tool Use & Function Calling
Agent-as-a-Service platforms

RAG & Knowledge Systems

Real-time KB ingestion
Document clustering & routing
Semantic search & retrieval
Continuous sync pipelines

Voice AI

Real-time audio streaming
Cisco & Genesys Telephony
STT/TTS Pipelines
Post-call AI Summarization

GPU Inference

Multi-stream video inference
TensorRT Optimization
FP16/INT8 quantization
Edge Deployment (Jetson)

LLM Infrastructure

LLM gateway & routing
Multi-cloud LLM deployments
Prompt Engineering & DSPy
Observability & tracing

DevOps & Infra

Kubernetes (EKS/GKE/AKS)
Docker & GitHub Actions
Container security hardening
CI/CD & schema migrations

Experience

Where I've worked

LiteLLM

Feb 2026 – Present

Open Source Contributor

Contributing to LiteLLM's unified API gateway across 100+ LLM providers - fixing bugs and improving provider integrations for Anthropic, Bedrock, Vertex AI, and Moonshot Kimi.

FuturePath AI

Nov 2024 – Present

AI Engineer

Building real-time voice assistants for enterprise IT operations - phone agents that run on production Cisco and Genesys infrastructure. I also own the knowledge layer: RAG pipelines that stay in sync with live SharePoint and ServiceNow data so agents always have current context.

Graymatics

Apr 2023 – Nov 2024

AI Engineer

Took video intelligence from prototype to production - GPU pipelines processing 40+ concurrent streams on NVIDIA DeepStream, TensorRT-optimized to hit 117 QPS on YOLOv7. Also brought the same models to edge hardware via FP16 quantization on Jetson Nano.

Ensuredit

Apr 2022 – Aug 2022

Data Science Intern

Fine-tuned document understanding models on insurance policy PDFs and built a remote photoplethysmography (rPPG) system that reads vitals - heart rate and blood pressure - from nothing but a webcam feed.

NIT Trichy

Oct 2021 – Feb 2022

Student Research Intern

Researched travel-time prediction using GNNs on road network graphs, my first real exposure to applying ML to graph-structured data. Also built autoencoder pipelines for feature extraction on traffic datasets.

Projects

OpenClaw SaaS

2026

Managed SaaS that lets users run OpenClaw in the cloud instead of on their own machines. Container-per-tenant isolation on AWS ECS Fargate, wildcard ALB routing for personal subdomains, and EventBridge + Lambda for real-time container state sync.

AWS Fargate, ECS, Prisma, Next.js, Multi-tenant

LLM Gateway

2025

Enterprise AI chat platform with document-grounded RAG. All LLM requests route through LiteLLM Proxy for provider abstraction and rate limiting. Celery workers handle async document processing, Qdrant serves vector search, and Langfuse and Phoenix provide observability.

FastAPI, LiteLLM, Qdrant, Celery, RAG

Virtual Try-On

2024

Implemented virtual try-on using diffusion model checkpoints and CLIP prompt tuning; results matched competitive baselines.

Diffusion Models, CLIP, PyTorch

About

I build AI systems that ship - agentic workflows, RAG pipelines, GPU inference infrastructure, and the glue that holds them together in production. My work centers on closing the gap between what AI can do in a notebook and what it takes to run reliably at enterprise scale.

Currently at FuturePath AI, building autonomous agents and real-time AI products for Fortune 500 clients. Previously at Graymatics, building video intelligence infrastructure running 40+ concurrent streams on NVIDIA DeepStream.

Capabilities

  • Voice AI & Real-Time Systems
  • RAG & Knowledge Pipelines
  • GPU Inference at Scale
  • Agentic Systems & Orchestration
  • LLM Infrastructure & Multi-Provider Routing
  • Edge Deployment (Jetson, TensorRT)

Tools

  • Python, TypeScript, C++, CUDA
  • LlamaIndex, LangChain, LangGraph, DSPy
  • LiveKit, Cisco/Genesys Telephony
  • NVIDIA DeepStream, Triton, TensorRT, ONNX
  • FastAPI, Celery, Prisma
  • Docker, Kubernetes (EKS/GKE/AKS), GitHub Actions
  • OpenAI, Azure OpenAI, AWS Bedrock, Vertex AI
  • PostgreSQL, pgvector, Qdrant, Redis

Education

2023

IIIT Naya Raipur

B.Tech, Electronics & Communication Engineering

CGPA 8.25

Contact

Let's build something.

Currently available for full-time roles and consulting projects.

© 2026 Pradyumna Yadav