Summary
I build ML infrastructure that runs on billions of devices.
Currently leading teams at Qualcomm working on LLM quantization and on-device AI deployment.
Previously,
Built AI Hub (SaaS platform) to enable model optimization and deployment for edge devices.
Helped shape Apple's CoreML stack, which powers all on-device ML use cases in the Apple ecosystem.
Worked on NVIDIA's optimizing and code-gen compiler that powers ML training and inference via CUDA.
I enjoy working at the intersection of compilers and machine learning - making models smaller, faster, and accessible to developers worldwide.
Work
Qualcomm
- Leading AIMET, a state-of-the-art quantization tool; bringing in advanced LLM quantization techniques
- Shipping LLM quantization recipes with AIMET; making them accessible to developers worldwide through AI Hub
- Improving the Qualcomm developer workflow: quantization, compilation, debugging, and deployment on Snapdragon
- Performance optimizations: latency, memory footprint, graph infrastructure for GenAI
Tetra AI (acquired by Qualcomm)
- Launched AI Hub - the first model zoo focused on on-device optimized models and deployment
- Led AI use cases for Microsoft Teams: segmentation, audio, and video codec
- Developed graph infrastructure and graph-to-graph transformations for the iOS platform on CoreML Tools
- Launched SaaS platform for model optimization and deployment
Apple
- Designed the MIL intermediate language - core to all CoreML model deployment
- Led the ONNX-CoreML converter; core contributor to CoreML Tools
- Enabled Stable Diffusion and GenAI models on-device
- Led the auto-upgrade tool for model format migration; onboarded Vision Pro
NVIDIA
- Developed compiler optimization controller for phase ordering and parameter tuning
NVIDIA
- LLVM compiler optimizations for Tegra Graphics and CUDA
- DWARF 2.0 debug frame support for CUDA 9.0
NVIDIA
- PBQP register allocator - improved 98% of graphics/compute use cases
Education
MS Computer Science
Stony Brook University
2017 - 2019
BTech Computer Engineering
VIT Pune
2011 - 2015
Skills
C++
Python
PyTorch
LLVM
ONNX
CoreML
TensorFlow
Quantization
Compilers
On-device ML