Autoregressive Generation

19h

DeepSeek open sources DSpark, a new framework to speed up LLM inference by up to 85%

DSpark can make decoding faster, but acceptance quality still determines how much of that speedup the system actually realizes.

Developer Tech

NVIDIA: DFlash block diffusion accelerates autoregressive LLMs

Deploying DFlash block diffusion on NVIDIA hardware accelerates autoregressive LLMs during latency-sensitive inference.

19d

Google's DiffusionGemma generates 256 tokens in parallel and self-corrects as it goes

Google's open-source diffusion language model generates 256 tokens in parallel and self-corrects, hitting 4x speed on one GPU ...

techtimes

DeepSeek Releases DSpark: Speculative Decoding Makes V4 Up to 85 Percent Faster

In this photo illustration, the DeepSeek app is displayed on an iPhone screen on January 27, 2025 in San Anselmo, California. Newly launched Chinese AI app DeepSeek has surged to number one in Apple's ...

Forbes

Beyond Autoregression: A New Model For Text Generation

Every time a language model like GPT-4, Claude or Mistral generates a sentence, it does something deceptively simple: It picks one word at a time. This word-by-word approach is what gives ...

20d

Google DeepMind releases DiffusionGemma, a model that runs local AI 4x faster

Another day, another AI model from Google. This time, Google DeepMind has released a new member of the Gemma 4 open model ...

XDA Developers on MSN

I tried Google's new DiffusionGemma, and watching it generate text like an image is unlike any local LLM

Google recently released DiffusionGemma, and it's weird in the best way.
