Google released a new compression algorithm this week that it says can shrink the memory an AI model needs during inference by at least six times, and the announcement rattled one of the hottest trades in tech. Shares of SK Hynix fell as much as 6.4% on the Korea Exchange. Samsung dropped nearly 5%. Japan's Kioxia, which had surged more than 700% since August on AI-fuelled optimism, slid sharply. Micron and Sandisk both took hits in New York.

The two-day selloff cracked open a fault line that had been quietly building under memory stocks: the assumption that AI demand for chips would only ever go up.
The algorithm targets the part of AI that ‘remembers’ your conversation
The algorithm, called TurboQuant, goes after something called the key-value (KV) cache. Every time a large language model processes a conversation, it stores past calculations in this cache, a digital cheat sheet, so it doesn't have to recompute them from scratch. The longer the conversation, the heavier the cheat sheet gets, and the faster it eats into GPU memory.

TurboQuant compresses that cheat sheet aggressively using two techniques. The first, PolarQuant, converts the high-dimensional vectors that AI models rely on from standard XYZ coordinates into polar form: the difference between saying "Go 3 blocks East, 4 blocks North" and "Go 5 blocks at a 37-degree bearing." Same destination, expressed in a form that quantizes far more gracefully.

The second technique, called Quantized Johnson-Lindenstrauss (QJL), applies a 1-bit error-correction pass to clean up whatever inaccuracies PolarQuant leaves behind. The result, Google says: models can run at just 3-bit precision with no quality loss and no retraining. On Nvidia H100 accelerators, Google recorded an 8x speedup in computing attention logits, the step in which a model decides which parts of a prompt actually matter.
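The coordinate-change idea can be sketched in a few lines. This is a toy illustration of the Cartesian-to-polar step and of coarse angle quantization, not Google's actual implementation; the function names and the 3-bit grid are illustrative assumptions.

```python
import numpy as np

def to_polar(x, y):
    """Cartesian (x, y) -> polar (radius, angle from the x-axis in radians)."""
    return np.hypot(x, y), np.arctan2(y, x)

def quantize_uniform(value, lo, hi, bits):
    """Snap `value` onto a uniform grid with 2**bits levels over [lo, hi]."""
    levels = 2**bits - 1
    step = (hi - lo) / levels
    return lo + round((value - lo) / step) * step

# "3 blocks East, 4 blocks North" becomes "5 blocks at ~53 degrees
# from due East" (a ~37-degree bearing from North):
r, theta = to_polar(3.0, 4.0)

# Store the angle at 3-bit precision; the radius carries the vector's
# magnitude and can be kept at higher precision or quantized separately.
theta_q = quantize_uniform(theta, -np.pi, np.pi, bits=3)

# Reconstruct: a coarse angle introduces error, which a follow-up
# correction pass (the role the article assigns to QJL) must absorb.
# Note the magnitude survives the round trip exactly.
x_q, y_q = r * np.cos(theta_q), r * np.sin(theta_q)
```

The design point the sketch makes is separation of concerns: the magnitude is preserved losslessly in the radius, so aggressive quantization only has to contend with the angle.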
Flash memory got crushed; HBM walked away mostly fine
Not every corner of the memory market was equally spooked, and analysts were quick to make the distinction. TurboQuant's efficiency gains are specific to inference and the KV cache, which means the real threat falls on NAND flash memory, not high-bandwidth memory (HBM).

HBM is what sits inside Nvidia's AI accelerators and powers the training infrastructure at companies like Microsoft and Meta. Bloomberg Intelligence analyst Jake Silverman wrote that HBM demand, along with DRAM made by Micron, would "likely be unaffected." Morgan Stanley made the same call. It's Kioxia and Sandisk, both heavily exposed to NAND, that absorbed the worst of the damage.
Why some analysts think this actually means more chip demand, not less
Several analysts pushed back on the panic entirely, reaching for a 19th-century economic theory to make their case. The Jevons Paradox, originally formulated about coal, holds that making a resource more efficient tends to increase its consumption, because efficiency opens up use cases that didn't exist before.

JPMorgan's trading desk cited the paradox in a note, arguing there is no near-term threat to memory consumption. SemiAnalysis analyst Ray Wang told CNBC that fixing a bottleneck makes AI hardware more capable, and more capable models will eventually need more memory, not less. Quilter Cheviot's Ben Barringer put it plainly: TurboQuant is "evolutionary, not revolutionary."
Developers already running it locally—and it works
Away from the market noise, the algorithm has immediate practical value for anyone running AI outside a data centre. Google released TurboQuant publicly with no licensing restrictions and no retraining requirement, meaning it can be dropped into existing models immediately.

Within 24 hours, developers had already ported it to local AI frameworks, including MLX for Apple Silicon. One community benchmark ran the Qwen3.5-35B model across context lengths up to 64,000 tokens at 2.5-bit TurboQuant and found perfect accuracy across the board. For enterprises with strict data privacy needs, or for developers pushing the limits of on-device AI, this is the kind of software efficiency gain that quietly changes what's possible on existing hardware.
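The scale of the saving is easy to check with back-of-the-envelope arithmetic: dropping KV-cache entries from 16-bit to 2.5-bit storage is a 16/2.5 = 6.4x reduction, consistent with the "at least six times" headline figure. The model shape below is an assumption for illustration, not the real Qwen3.5-35B configuration.

```python
def kv_cache_bytes(layers, kv_heads, head_dim, seq_len, bits_per_value):
    """Total KV-cache size: keys and values for every layer and position."""
    values = 2 * layers * kv_heads * head_dim * seq_len  # 2 = keys + values
    return values * bits_per_value / 8

# Illustrative shape (assumed): 48 layers, 8 KV heads of dimension 128,
# at the 64,000-token context length from the community benchmark.
shape = dict(layers=48, kv_heads=8, head_dim=128, seq_len=64_000)

fp16 = kv_cache_bytes(**shape, bits_per_value=16)
tq25 = kv_cache_bytes(**shape, bits_per_value=2.5)

print(f"fp16 cache:    {fp16 / 2**30:.1f} GiB")
print(f"2.5-bit cache: {tq25 / 2**30:.1f} GiB ({fp16 / tq25:.1f}x smaller)")
```

For this assumed shape, a cache that would overflow a consumer GPU or a laptop's unified memory at fp16 fits comfortably after compression, which is why the on-device crowd picked the release up so quickly.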