Day 0 Developer Guide: hipBLASLt Offline GEMM Tuning Script — ROCm Blogs
Learn how to improve model performance with hipBLASLt offline tuning, using our easy-to-use Day 0 tool for developers to optimize GEMM efficiency.
November 04, 2025
High-Accuracy MXFP4, MXFP6, and Mixed-Precision Models on AMD GPUs — ROCm Blogs
Learn to leverage AMD Quark for efficient MXFP4/MXFP6 quantization on AMD Instinct accelerators with high accuracy retention.
October 28, 2025
Nitro-E: A 304M Diffusion Transformer Model for High Quality Image Generation — ROCm Blogs
Nitro-E is an extremely lightweight diffusion transformer model for high-quality image generation with only 304M parameters.
October 23, 2025
Making Telcos Run Leaner and AI-Ready with AMD
Learn how telecommunication companies are running leaner and becoming AI-ready with AMD processors.
October 20, 2025
AI in the Data Center: Strategies for Efficiency and Sustainability
Data centers already consume nearly 2% of global electricity, and AI adoption is set to accelerate that figure. Set your team up for success and scale…
October 17, 2025
Kimi-K2-Instruct: Enhanced Out-of-the-Box Performance on AMD Instinct MI355 Series GPUs — ROCm Blogs
Learn how AMD Instinct MI355 Series GPUs deliver competitive Kimi-K2 inference with faster TTFT, lower latency, and strong throughput.
October 15, 2025
Gumiho: A New Paradigm for Speculative Decoding — Earlier Tokens in a Draft Sequence Matter More — ROCm Blogs
Gumiho boosts LLM inference with early-token accuracy, blending serial + parallel decoding for speed, accuracy, and ROCm-optimized deployment.
October 13, 2025
GEMM Tuning within hipBLASLt – Part 2 — ROCm Blogs
Learn how to use hipblaslt-bench for offline GEMM tuning in hipBLASLt: benchmark, save, and apply custom-tuned kernels at runtime.
October 08, 2025