I am an ML Engineer with 3 years of experience specializing in making AI models lightweight, fast, and efficient. My primary focus is on Efficient AI, particularly Quantization. I optimize various models (Vision, Audio, LLM) for mobile, GPU, and NPU platforms.
06/2022 - Present
- Optimizing models for target hardware and platforms
- Improving accuracy-speed trade-offs through post-training quantization (PTQ) and quantization-aware training (QAT); see the sketch below
- Benchmarked vLLM and TensorRT-LLM serving performance
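
A minimal sketch of what a PTQ pass can look like in PyTorch eager mode (illustrative only; the toy model, `fbgemm` qconfig, and random calibration batches are assumptions, not the production flow):

```python
import torch
import torch.nn as nn
from torch.ao.quantization import QuantStub, DeQuantStub, get_default_qconfig, prepare, convert

class TinyNet(nn.Module):
    """Toy float model standing in for a real vision backbone (illustrative)."""
    def __init__(self):
        super().__init__()
        self.quant = QuantStub()      # fp32 -> int8 at the model boundary
        self.conv = nn.Conv2d(3, 8, 3)
        self.relu = nn.ReLU()
        self.dequant = DeQuantStub()  # int8 -> fp32 for downstream consumers

    def forward(self, x):
        return self.dequant(self.relu(self.conv(self.quant(x))))

model = TinyNet().eval()
model.qconfig = get_default_qconfig("fbgemm")   # observer / quantizer settings
prepared = prepare(model)                       # insert activation/weight observers

# Calibration: a few representative batches let the observers collect ranges
# (random tensors here purely for illustration).
for _ in range(8):
    prepared(torch.randn(1, 3, 32, 32))

quantized = convert(prepared)                   # fold observers into int8 kernels
```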
 
07/2021 - 08/2021
- Built a 3-tier web service on AWS using Terraform
 
08/2023 - Present
[Website] [GitHub] [OwLite Examples]
- Developed a framework that makes it easy to quantize PyTorch models and deploy them with TensorRT
- Implemented various quantization algorithms and simulations (a minimal fake-quantization sketch follows)
- Produced a range of examples and identified common optimization patterns
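
As an illustration of the simulation side (not OwLite's actual API), fake quantization runs the integer rounding and clamping in fp32 so the accuracy impact of a given bit-width can be estimated before building a TensorRT engine:

```python
import torch

def fake_quantize(x: torch.Tensor, num_bits: int = 8) -> torch.Tensor:
    """Simulate symmetric per-tensor integer quantization in fp32 (illustrative)."""
    qmax = 2 ** (num_bits - 1) - 1
    scale = x.abs().max().clamp(min=1e-8) / qmax
    q = torch.clamp(torch.round(x / scale), min=-qmax - 1, max=qmax)  # snap to the int grid
    return q * scale                                                  # dequantize back to fp32

w = torch.randn(64, 64)
w_q = fake_quantize(w)
print(f"max quantization error: {(w - w_q).abs().max():.5f}")  # bounded by ~scale/2
```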
 
02/2024 - 06/2024
[Website]
- Conducted comprehensive performance benchmarking of LLM serving frameworks (see the timing sketch below)
- Implemented the evaluation module
- Wrote the blog post [vLLM vs TensorRT-LLM] on weight-activation quantization
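
A hypothetical sketch of the kind of offline throughput timing loop involved, using vLLM's batch API (the model name, prompts, and sampling settings are placeholders; the actual benchmark compared vLLM and TensorRT-LLM under matched serving configurations):

```python
import time
from vllm import LLM, SamplingParams

llm = LLM(model="facebook/opt-125m")                 # small placeholder model
params = SamplingParams(temperature=0.0, max_tokens=128)
prompts = ["Summarize weight-activation quantization in one sentence."] * 64

start = time.perf_counter()
outputs = llm.generate(prompts, params)              # batched offline generation
elapsed = time.perf_counter() - start

generated = sum(len(o.outputs[0].token_ids) for o in outputs)
print(f"{generated / elapsed:.1f} generated tokens/s across {len(prompts)} requests")
```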
 
02/2024 - 06/2024
- Presented a poster, "RepTor: Re-parameterizable Temporal Convolution for Keyword Spotting via Differentiable Kernel Search", at Interspeech 2024
- Developed a CNN-based KWS model using structural re-parameterization (see the sketch below)
- Implemented latency-aware Neural Architecture Search
- Achieved 97.9% accuracy with 183 μs latency on a Galaxy S10 CPU
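
A sketch of the structural re-parameterization idea (RepVGG-style branch merging applied to temporal convolutions; the block below is illustrative, not the published RepTor architecture): parallel branches with different kernel sizes are trained jointly and then folded into a single convolution for inference, so the deployed model keeps single-branch latency.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RepTemporalBlock(nn.Module):
    """Training-time block with parallel temporal conv branches (illustrative)."""
    def __init__(self, channels: int):
        super().__init__()
        self.branch_k5 = nn.Conv1d(channels, channels, 5, padding=2, bias=False)
        self.branch_k3 = nn.Conv1d(channels, channels, 3, padding=1, bias=False)

    def forward(self, x):
        return self.branch_k5(x) + self.branch_k3(x)

    def reparameterize(self) -> nn.Conv1d:
        """Fold both branches into one k=5 conv by zero-padding and summing kernels."""
        merged = nn.Conv1d(self.branch_k5.in_channels, self.branch_k5.out_channels,
                           5, padding=2, bias=False)
        with torch.no_grad():
            # Zero-pad the k=3 kernel to length 5 so the weight tensors align, then sum.
            merged.weight.copy_(self.branch_k5.weight + F.pad(self.branch_k3.weight, (1, 1)))
        return merged

block = RepTemporalBlock(8).eval()
x = torch.randn(1, 8, 100)
fused = block.reparameterize().eval()
print(torch.allclose(block(x), fused(x), atol=1e-5))  # identical outputs, single conv
```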
 
- Bachelor's in IT Convergence Engineering
- 03/2016 - 09/2023
 
- 03/2014 - 02/2016
 