Le Duc Anh Tuan

Senior AI Engineer


Hi, I’m Charles! I’m interested in LLM systems, with experience spanning AI research (ML, CV, speech) to applications (LLMs), and across the stack: from low-level GPU kernel optimization (CUDA, HIP) to high-level LLM system design, and from training to serving.

Industry:

  • Implemented OpenAI’s MoE GPT-OSS from scratch in pure HIP C++ on AMD GPUs; optimized model loading, continuous batching, multi-streaming, multi-GPU communication, MoE scheduling, CPU-GPU-SRAM memory access, FlashAttention, and MFMA GEMM kernels; achieved 30K TPS (20B) and 10K TPS (120B) on a single node with 8× AMD MI250 GPUs. The project was featured on r/LocalLLaMA.

  • Architected Leo, an end-to-end LLMOps system integrating offline and online pipelines for data, training, and inference. Built offline pipelines for GET data, ETL, SFT, feature generation, and hybrid RAG ingestion enhanced by a contextual agent, with all steps managed by an orchestrator. Implemented an agentic RAG-based online system served via API, with a React front end and a proxy-connected back end, supporting scalable back-end services, replica database nodes, and a vLLM inference engine. Incorporated Redis prompt caching, short- and long-term memory modules, CI/CD triggers for RAG evaluation (retrieval and generation), and real-time token streaming via SSE.

  • Developed a multimodal, multi-agent conversational recommendation system with vision and speech-to-speech interaction; integrated AdaptiveICL, synthetic data generation, and retrieval-ranking pipelines; achieved Top 1 in the DSAI track at Viettel Digital Talent 2024.

Research:

Community:

News 📰

Feb 4, 2026 Served as a Member of the Machine Learning Graduation Thesis Evaluation Committee (Term 2026.1) at HUST
Oct 10, 2025 Introduced gpt-oss-amd, a pure HIP C++ implementation from scratch (no rocBLAS/hipBLAS) of OpenAI’s MoE GPT-OSS, achieving over 30k TPS (20B) and 10k TPS (120B) on a single node with 8× AMD MI250 GPUs; featured on r/LocalLLaMA.
Jun 22, 2025 Released nvims, a lightning-fast, AI-powered editor with a beautiful UI and VSCode vibes in the terminal; featured on J2Team, MLCB, MiAI, and LinkedIn.
May 20, 2025 Our paper, “Speech Instruction Training Without Speech for Low-Resource Languages”, was accepted at Interspeech, a CORE Rank A conference in speech processing.
Oct 10, 2024 Top 1 in the DSAI track of Viettel Digital Talent 2024
Sep 28, 2024 Graduated from Hanoi University of Science and Technology (HUST)
Jun 20, 2024 Honored to receive the Best Presentation Award in Round 1 of the Viettel Digital Talent program
Feb 17, 2024 Excited to receive the physical GDG organizer certificate from Google’s Global Headquarters
Oct 3, 2023 Thrilled to share that my SOTA “3DNeRV” paper has advanced to phase 2 in AAAI 2024