Le Duc Anh Tuan

Hi, I’m Charles! I'm interested in LLM systems, with experience spanning AI research (ML, CV, speech) and applications (LLMs), and across the stack, from low-level GPU kernel optimization (CUDA, HIP) to high-level LLM system design, and from training to serving.

Industry:

Implemented OpenAI’s MoE GPT-OSS from scratch in pure HIP C++ on AMD GPUs; optimized model loading, continuous batching, multi-streaming, multi-GPU communication, MoE scheduling, CPU-GPU-SRAM memory access, FlashAttention, and MFMA GEMM kernels; achieved 30K TPS (20B) and 10K TPS (120B) on a single node with 8× AMD MI250 GPUs; the project was featured on r/LocalLLaMA.
Architected Leo, an end-to-end LLMOps system integrating offline and online pipelines for data, training, and inference; built offline pipelines for data retrieval, ETL, SFT, feature generation, and hybrid RAG ingestion enhanced by a contextual agent, with all steps managed by an orchestrator; implemented an agentic RAG-based online system served via API, with a React frontend and a proxy-connected backend, supporting scalable backend services, replica database nodes, and a vLLM inference engine; incorporated Redis prompt caching, short- and long-term memory modules, CI/CD triggers for RAG evaluation (retrieval and generation), and real-time token streaming via SSE.
Developed a multimodal, multi-agent conversational recommendation system with vision and speech-to-speech interaction; integrated AdaptiveICL, synthetic data generation, and retrieval-ranking pipelines; achieved Top 1 in the DSAI track of Viettel Digital Talent 2024.

Research:

Researched Speechless (an LLM in the Ichigo family), which aims to generate synthetic semantic audio representations from multimodal inputs and is trained on semantic tokens generated by Ichigo Whisper; the resulting paper was accepted at Interspeech 2025 (a CORE Rank A conference in speech processing).
Proposed “Efficient Continual Detection Transformer”, leveraging pseudo-labeling, knowledge distillation, and LoRA; achieved high performance on the COCO dataset with only 3% trainable parameters compared to RT-DETR.

Community:

Released and maintained open-source projects for developers, including gpt-oss-amd (150 ⭐️), nvims (100 ⭐️), leo, and gemino; contributed to leading projects such as Ichigo (2.4k ⭐️) and WhisperSpeech (4.5k ⭐️).
Organized multiple technical events for the developer community, such as VinAI Day, Google I/O Extended, Google DevFest, International Women’s Day x Flutter Forward Extended, and Google Build with AI; spoke at Google DevFest 2022 on “Detecting Cheating in Examinations” (highlighted on VTV24); certified by Google’s Global Headquarters.

News 📰

Feb 4, 2026	Served as a Member of the Machine Learning Graduation Thesis Evaluation Committee (Term 2026.1) at HUST
Oct 10, 2025	Introduced `gpt-oss-amd` - a pure HIP C++ implementation from scratch (no rocBLAS/hipBLAS) of OpenAI’s MoE GPT-OSS, achieving over 30k TPS (20B) and 10k TPS (120B) on a single node with 8× AMD MI250 GPUs; featured on r/LocalLLaMA.
Jun 22, 2025	Released nvims - a lightning-fast, AI-powered editor with a beautiful UI and VSCode vibes in the terminal, featured on J2Team, MLCB, MiAI, and LinkedIn.
May 20, 2025	Our paper, “Speech Instruction Training Without Speech for Low-Resource Languages”, was accepted at Interspeech, a CORE Rank A conference in speech processing.
Oct 10, 2024	Top 1 in the DSAI track of Viettel Digital Talent 2024
Sep 28, 2024	Graduated from Hanoi University of Science and Technology (HUST)
Jun 20, 2024	Honored to receive the Best Presentation Award in Round 1 of the Viettel Digital Talent program
Feb 17, 2024	Excited to receive the physical GDG organizer certificate from Google’s Global Headquarters
Oct 3, 2023	Thrilled to share that my SOTA “3DNeRV” paper has advanced to phase 2 of AAAI 2024