Le Duc Anh Tuan

Senior AI Engineer


Hi, I’m Charles! I’m interested in LLM systems, with experience spanning AI research (ML, CV, speech) to applications (LLMs), and across the stack - from low-level GPU kernel optimization (CUDA, HIP) to high-level LLM system design, and from training to serving.

Industry:

  • Implemented OpenAI’s MoE GPT-OSS from scratch in pure HIP C++ on AMD GPUs; optimized model loading, continuous batching, multi-streaming, multi-GPU communication, MoE scheduling, CPU-GPU–SRAM memory access, FlashAttention, and MFMA GEMM kernels; achieved 30K TPS (20B) and 10K TPS (120B) on a single node with 8× AMD MI250 GPUs. The project was featured on r/LocalLLaMA.

  • Architected Leo - an end-to-end LLMOps system integrating offline and online pipelines for data, training, and inference. Built offline pipelines for GET data, ETL, SFT, feature generation, and hybrid RAG ingestion enhanced by a contextual agent, with all steps managed by an orchestrator. Implemented an agentic RAG-based online system served via API, with a React FE and a proxy-connected BE, supporting scalable BE services, replica database nodes, and a vLLM inference engine. Incorporated Redis prompt caching, short- and long-term memory modules, CI/CD RAG evaluation triggers (retrieval & generation), and real-time token streaming via SSE.

  • Developed a multimodal, multi-agent conversational recommendation system with vision and speech-to-speech interaction; integrated AdaptiveICL, synthetic data generation, and retrieval-ranking pipelines; achieved Top 1 in the DSAI track at Viettel Digital Talent 2024.

Research:

Community:

News 📰

Oct 10, 2025 Introducing gpt-oss-amd - a pure HIP C++ implementation from scratch (no rocBLAS/hipBLAS) of OpenAI’s MoE GPT-OSS, achieving over 30k TPS (20B) and 10k TPS (120B) on a single node with 8× AMD MI250 GPUs, featured on r/LocalLLaMA.
Jun 22, 2025 Releasing nvims - a lightning-fast, AI-powered editor with a beautiful UI and VSCode vibes in the terminal, featured on J2Team, MLCB, MiAI, and LinkedIn.
May 20, 2025 Our paper, “Speech Instruction Training Without Speech for Low-Resource Languages”, was accepted at Interspeech, a CORE Rank A conference in speech processing.
Dec 23, 2024 Trained a speech tokenizer to support multiple Asian languages, achieving SOTA results on viVoice and LibriTTS-R
Oct 28, 2024 Joining Homebrew (Singapore) as an LLM Researcher
Oct 10, 2024 Top 1 in the DSAI track of Viettel Digital Talent 2024
Sep 28, 2024 Graduated from Hanoi University of Science and Technology (HUST)
Jul 26, 2024 Onboarding as a Data Scientist at the Data Analytics Center, Viettel Telecom
Jul 6, 2024 Thrilled to be ranked among the top in the DSAI track and in the overall Top 50 within Viettel Group in VDT 2024
Jun 21, 2024 Delighted to announce that my graduation thesis, ECOD, has achieved SOTA results on the COCO dataset with few trainable parameters
Jun 20, 2024 Honored to receive the Best Presentation Award in Round 1 of the Viettel Digital Talent program
Apr 4, 2024 Rejoining Viettel Group as a Digital Talent 2024
Feb 17, 2024 Honored to receive the physical GDG organizer certificate from Google’s Global Headquarters
Nov 11, 2023 Honored to be an Ambassador for MT Leader Nestlé
Oct 3, 2023 Thrilled to share that my SOTA “3DNeRV” paper has advanced to phase 2 in AAAI 2024
Oct 1, 2023 Proud to be an AI Mentor at the SheCodes Hackathon 2023
Sep 29, 2023 Embarking on an exciting journey as an Applied Scientist with VinBrain
Mar 3, 2023 Proudly stepping into the role of Computer Vision Researcher at the Camera Center, Viettel High Tech