Efficient AI Inference
AI Systems / Efficient Inference / Hardware-Aware Execution
I study and build efficient AI systems across models, inference pipelines, and hardware-aware execution. My current interests include multimodal learning, CUDA optimization, runtime scheduling, and edge-oriented AI infrastructure.
Beyond model usage, toward systems thinking.
I am interested in how AI models actually run in practice, including inference efficiency, memory behavior, runtime scheduling, and the coupling between models and hardware.
My work spans CUDA kernel optimization, RAG-based code analysis, multimodal research, and system-level inference design.
Focus
My current focus is how modern AI models can run more efficiently in real systems, especially under constraints of memory, latency, and hardware resources.
01
I care about real deployment behavior, not only benchmark numbers. That includes latency, memory movement, and inference efficiency under practical constraints.
02
I am interested in treating quantization, operator fusion, code generation, and runtime scheduling as one connected systems problem.
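As one small, self-contained example of the quantization piece of that problem, here is a symmetric per-tensor int8 scheme. This is an illustrative sketch, not a description of any specific deployment; the function names and the per-tensor choice are assumptions for the example.

```python
import numpy as np

def quantize_int8(x: np.ndarray):
    """Symmetric per-tensor int8 quantization: x is approximated by scale * q.

    Illustrative sketch only; real pipelines often use per-channel scales
    and calibration rather than the raw max.
    """
    m = float(np.abs(x).max())
    scale = m / 127.0 if m > 0 else 1.0
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize_int8(q: np.ndarray, scale: float) -> np.ndarray:
    """Map int8 codes back to float32 values."""
    return q.astype(np.float32) * scale

x = np.linspace(-3, 3, 101, dtype=np.float32)
q, s = quantize_int8(x)
x_hat = dequantize_int8(q, s)
# Round-to-nearest keeps the reconstruction error within half a step.
assert np.max(np.abs(x - x_hat)) <= s / 2 + 1e-6
```

The systems angle is that the int8 codes are what move through memory, so the quantizer's choice of scale directly trades accuracy against bandwidth.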
03
My perspective is shaped by memory hierarchy, data locality, register pressure, instruction throughput, and the realities of edge and GPU platforms.
Work
These projects show how I approach AI systems problems through both research reasoning and hands-on implementation.
01
AI Systems
Implemented and optimized GEMM kernels with shared memory tiling, register blocking, and profiling-guided analysis to improve arithmetic intensity and execution performance.
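The tiling idea behind that kernel is independent of CUDA syntax. As an illustrative sketch (not the project's actual kernel), the same blocking can be written with NumPy: each tile of A and B is reused across a whole tile of C, which is exactly the reuse that shared memory provides on a GPU.

```python
import numpy as np

def tiled_gemm(A: np.ndarray, B: np.ndarray, tile: int = 32) -> np.ndarray:
    """Blocked matrix multiply: a CPU analogue of shared-memory tiling.

    Each (tile x tile) block of A and B contributes to a full block of C,
    so loaded data is reused tile-many times instead of once, raising
    arithmetic intensity. Slicing handles ragged edge tiles automatically.
    """
    M, K = A.shape
    K2, N = B.shape
    assert K == K2, "inner dimensions must match"
    C = np.zeros((M, N), dtype=np.result_type(A, B))
    for i in range(0, M, tile):
        for j in range(0, N, tile):
            for k in range(0, K, tile):
                C[i:i + tile, j:j + tile] += (
                    A[i:i + tile, k:k + tile] @ B[k:k + tile, j:j + tile]
                )
    return C
```

On a GPU the same loop structure maps to one thread block per C tile, with the A and B tiles staged in shared memory and the k-loop iterating over tiles of the reduction dimension.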
02
AI Systems
Built a repository analysis system that combines LLMs, AST-based chunking, vector retrieval, and CUDA-aware parsing for structured code understanding.
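A minimal sketch of the AST-based chunking step, using only Python's stdlib `ast` module: splitting a source file at function and class boundaries so that each retrieval unit is a syntactically complete definition. The function name and dict layout are illustrative; the real system's retrieval and CUDA-aware parsing are not shown here.

```python
import ast

def chunk_by_ast(source: str) -> list[dict]:
    """Split Python source into function/class-level chunks via the AST.

    Line-based chunking can cut a definition in half; walking the parsed
    tree guarantees each chunk is a complete, self-contained unit, which
    makes the downstream embeddings and retrieval far more coherent.
    """
    tree = ast.parse(source)
    chunks = []
    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            chunks.append({
                "name": node.name,
                "lineno": node.lineno,
                "text": ast.get_source_segment(source, node),
            })
    return chunks

for chunk in chunk_by_ast("def add(a, b):\n    return a + b\n\nclass Greeter:\n    pass\n"):
    print(chunk["name"], chunk["lineno"])
```

Each chunk's text would then be embedded and indexed; at query time the retriever returns whole definitions rather than arbitrary line windows.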
03
AI Systems
Designed a multimodal video-to-text pipeline with transformer-based alignment and an efficiency-oriented systems design perspective.
Profile
My background combines research-oriented study with practical engineering experience across data systems, maintenance, networking, and technical tool building.
Capability Thread
Currently focused on machine learning systems, low-level operator optimization, GPU programming, and AI infrastructure engineering.