Summary
I build ML infrastructure across the entire stack - from apps to compilers.
My work runs on billions of devices worldwide.
Currently, I lead quantization at Qualcomm, powering on-device GenAI via AI Hub Workbench.
Previously,
Built AI Hub - the first on-device model zoo, SaaS platform, and apps.
Shaped Apple's on-device ML stack - conversion tools, CoreML Framework, ML Runtime.
Built GPU compilers at NVIDIA - CUDA and SPIR-V.
Excited about ML efficiency across the spectrum - on-device, edge, and data center scale.
Work
Qualcomm
- Leading AIMET, a state-of-the-art quantization tool; bringing advanced LLM quantization techniques into it
- Shipping LLM quantization recipes with AIMET; making them accessible to developers worldwide through AI Hub
- Improving the Qualcomm developer workflow: quantization, compilation, debugging, and deployment on Snapdragon
- Performance optimizations: latency, memory footprint, graph infrastructure for GenAI
Tetra AI (acquired by Qualcomm)
- Launched AI Hub - the first model zoo focused on on-device optimized models and deployment
- Led Microsoft Teams AI use cases: segmentation, audio, video codec
- Developed graph infrastructure and graph-to-graph transformations for iOS platform on CoreML Tools
- Launched SaaS platform for model optimization and deployment
Apple
- Designed the MIL intermediate language - core to all CoreML model deployment
- Led the ONNX-CoreML converter; core contributor to CoreML Tools
- Enabled Stable Diffusion and GenAI models on-device
- Led the auto-upgrade tool for model format migration; onboarded Vision Pro
NVIDIA
- Developed a compiler optimization controller for phase ordering and parameter tuning
- LLVM compiler optimizations for Tegra Graphics and CUDA
- DWARF 2.0 debug frame support for CUDA 9.0
- PBQP register allocator that improved 98% of graphics/compute use cases
Education
MS Computer Science
Stony Brook University
2017 - 2019
BTech Computer Engineering
VIT Pune
2011 - 2015
Skills
C++
Python
PyTorch
LLVM
ONNX
CoreML
TensorFlow
Quantization
Compilers
On-device ML