Projects

LLM Inference Optimization Engine

A from-scratch inference engine for transformer-based language models, implementing KV caching, paged attention, speculative decoding, FlashAttention, and low-precision matrix multiplication to increase throughput and reduce latency.

CUDA · Python · C++ · FlashAttention · Quantization
View project →
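As a sketch of the first technique the card names: KV caching stores each past token's key/value projections during autoregressive decoding so that each step computes projections only for the newest token, then attends over the full cache. A minimal NumPy illustration follows — the class and function names here are hypothetical, not the engine's actual API.

```python
import numpy as np

class KVCache:
    """Append-only cache of key/value vectors for one attention head.

    Hypothetical illustration of KV caching — not the project's actual code.
    """

    def __init__(self, head_dim):
        self.keys = np.empty((0, head_dim))
        self.values = np.empty((0, head_dim))

    def append(self, k, v):
        # k, v: (1, head_dim) projections for the newest token only;
        # older tokens' K/V are reused from the cache, never recomputed.
        self.keys = np.vstack([self.keys, k])
        self.values = np.vstack([self.values, v])
        return self.keys, self.values


def attend(q, keys, values):
    # Standard scaled dot-product attention for a single query vector.
    scores = q @ keys.T / np.sqrt(q.shape[-1])   # (1, seq_len)
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                     # softmax over cached tokens
    return weights @ values                      # (1, head_dim)


rng = np.random.default_rng(0)
cache = KVCache(head_dim=4)
for step in range(3):
    k = rng.normal(size=(1, 4))
    v = rng.normal(size=(1, 4))
    keys, values = cache.append(k, v)  # O(1) new work per decoded token
    q = rng.normal(size=(1, 4))
    out = attend(q, keys, values)      # attends over all cached tokens
```

Without the cache, every decoding step would recompute K/V for the entire prefix, making per-step cost grow with sequence length; with it, each step adds one row and reuses the rest.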

Machine Learning Projects

A collection of machine learning experiments spanning classification, data mining, and model prototyping — exploring techniques from traditional statistical learning through deep neural networks.

PyTorch · Python · scikit-learn · Data Mining
View project →