DeepSeek has launched FlashMLA, an open-source MLA decoding kernel optimized for Hopper GPUs. It supports BF16, offers high performance with 3000 GB/s memory ba…