Pradyumna Yadav

AI Engineer

AI Engineer specializing in agentic systems, AI infrastructure, and production-scale deployments. Currently at FuturePath AI building multi-modal AI products for Fortune 500 clients - from autonomous agents to real-time voice and knowledge systems.

Available for full-time roles and consulting projects

What I Work On

Explore my work

Agentic Systems

Multi-step agent workflows

Multi-tenant orchestration

Tool Use & Function Calling

Agent-as-a-Service platforms

RAG & Knowledge Systems

Real-time KB ingestion

Document clustering & routing

Semantic search & retrieval

Continuous sync pipelines

Voice AI

Real-time audio streaming

Cisco & Genesys Telephony

STT/TTS Pipelines

Post-call AI Summarization

GPU Inference

Multi-stream video inference

TensorRT Optimization

FP16/INT8 quantization

Edge Deployment (Jetson)

LLM Infrastructure

LLM gateway & routing

Multi-cloud LLM deployments

Prompt Engineering & DSPy

Observability & tracing

DevOps & Infra

Kubernetes (EKS/GKE/AKS)

Docker & GitHub Actions

Container security hardening

CI/CD & schema migrations

Experience

Where I've worked

LiteLLM

Feb 2026 – Present

Open Source Contributor

Contributing to LiteLLM's unified API gateway across 100+ LLM providers - fixing bugs and improving provider integrations for Anthropic, Bedrock, Vertex AI, and Moonshot Kimi.

FuturePath AI

Nov 2024 – Present

AI Engineer

Building real-time voice assistants for enterprise IT operations - phone agents that run on production Cisco and Genesys infrastructure. Also own the knowledge layer: RAG pipelines that stay current with live SharePoint and ServiceNow data so agents always have the right context.

Graymatics

Apr 2023 – Nov 2024

AI Engineer

Took video intelligence from prototype to production - GPU pipelines processing 40+ concurrent streams on NVIDIA DeepStream, TensorRT-optimized to hit 117 QPS on YOLOv7. Also brought the same models to edge hardware via FP16 quantization on Jetson Nano.

Ensuredit

Apr 2022 – Aug 2022

Data Science Intern

Fine-tuned document understanding models on insurance policy PDFs and built an rPPG system that reads vitals - heart rate and blood pressure - from nothing but a webcam feed.

NIT Trichy

Oct 2021 – Feb 2022

Student Research Intern

Researched travel-time prediction using GNNs on road network graphs, my first real exposure to applying ML to graph-structured data. Also built autoencoder pipelines for feature extraction on traffic datasets.

Projects

OpenClaw SaaS

2026

Managed SaaS that lets users run OpenClaw in the cloud instead of on their own machines. Container-per-tenant isolation on AWS ECS Fargate, wildcard ALB routing for personal subdomains, and EventBridge + Lambda for real-time container state sync.

AWS Fargate, ECS, Prisma, Next.js, Multi-tenant

LLM Gateway

2025

Enterprise AI chat platform with document-grounded RAG. All LLM requests route through LiteLLM Proxy for provider abstraction and rate limiting. Celery workers handle async document processing, Qdrant provides vector search, and Langfuse and Phoenix provide observability.

FastAPI, LiteLLM, Qdrant, Celery, RAG

Virtual Try-On

2024

Implemented a virtual try-on pipeline using diffusion model checkpoints and CLIP prompt tuning; results matched competitive baselines.

Diffusion Models, CLIP, PyTorch

About

I build AI systems that ship - agentic workflows, RAG pipelines, GPU inference infrastructure, and the glue that holds them together in production. My work centers on closing the gap between what AI can do in a notebook and what it takes to run reliably at enterprise scale.

Currently at FuturePath AI, building autonomous agents and real-time AI products for Fortune 500 clients. Previously at Graymatics, building video intelligence infrastructure running 40+ concurrent streams on NVIDIA DeepStream.

Capabilities

•Voice AI & Real-Time Systems
•RAG & Knowledge Pipelines
•GPU Inference at Scale
•Agentic Systems & Orchestration
•LLM Infrastructure & Multi-Provider Routing
•Edge Deployment (Jetson, TensorRT)

Tools

•Python, TypeScript, C++, CUDA
•LlamaIndex, LangChain, LangGraph, DSPy
•LiveKit, Cisco/Genesys Telephony
•NVIDIA DeepStream, Triton, TensorRT, ONNX
•FastAPI, Celery, Prisma
•Docker, Kubernetes (EKS/GKE/AKS), GitHub Actions
•OpenAI, Azure OpenAI, AWS Bedrock, Vertex AI
•PostgreSQL, pgvector, Qdrant, Redis

Education

2023

IIIT Naya Raipur

B.Tech, Electronics & Communication Engineering

CGPA 8.25

Contact

Let's build something.

Currently available for full-time roles and consulting projects.

•Book a call
•pradyumna.aky@gmail.com
•GitHub
•LinkedIn
•X