Reading Time Estimate
10 min read
Listen on: Apple Podcasts · Spotify · Google Podcasts
1. Google's TurboQuant algorithm
  • On Tuesday, Google Research described a software breakthrough called TurboQuant that’s been shaking up parts of the AI ecosystem. TurboQuant is an AI memory compression algorithm that Google claims is dramatically more efficient than existing approaches. In Google’s words, TurboQuant “reduces LLM key-value cache memory by at least 6x and delivers up to 8x speedup, all with zero accuracy loss, redefining AI efficiency.” The CEO of CDN (content delivery network) Cloudflare, Matthew Prince, is calling this Google’s “DeepSeek moment.” Despite TurboQuant still being in the lab, it has already sent memory-chip stocks tumbling by nearly $100B this week amid a broader decline. Micron’s stock price alone is down 11% since Tuesday.
  • Background: AI generally works by manipulating high-dimensional vectors – data structures that serve as mathematical representations of complex information like a word or image. Because these vectors are so memory-intensive, a key-value (KV) cache is used during an AI chat as a running "digital cheat sheet" of prior context/attention – avoiding repeat calculations and allowing for faster output generation. The KV cache for large models can become massive as chats progress (see the sizing sketch below), which means it becomes a limiting factor as it uses up memory, slows down responsiveness, and raises costs (e.g. hardware, power). This is one of the reasons (but not the only one) why an AI chatbot can start breaking down during a long chat.
  • The process of vector quantization – mapping a larger input set (of continuous values) to a smaller output set (of discrete values) – can shrink high-dimensional vectors, but typically at a cost in accuracy, speed, and/or memory overhead (see the quantization sketch below). TurboQuant’s promise of zero accuracy loss, an 8x speedup, and a 6x reduction in needed memory seems to break this tradeoff.
  • To achieve its outcomes, TurboQuant leans on two other algorithms – PolarQuant and Quantized Johnson-Lindenstrauss (QJL). PolarQuant is used for high-quality compression, applying a clever technique in which the vector is converted into polar coordinates and randomly rotated in a way that maps the data onto a more predictable sphere-like grid (see the rotation sketch below for the general intuition). The other algorithm, QJL, is an efficient error-checker that addresses the small amount of error left over from PolarQuant.
  • According to Google, TurboQuant is particularly useful in two areas – (1) compressing the KV cache during inference, and (2) vector search. The first use case is getting the most attention right now, since the KV cache has become a key bottleneck in today’s AI production deployments. On the second, Google believes TurboQuant could enable faster and more efficient semantic search at its scale.
  • TurboQuant’s ability to compress the KV cache is particularly important during the decode stage of inference. During decode, a much smaller KV cache means less data needs to be moved from memory to the processors for every token generated, so the processors aren’t sitting idle as long waiting for data (see the decode-bandwidth sketch below). It doesn’t remove the memory bottleneck (i.e. decode doesn’t become compute-bound), but it does raise the ceiling.
  • Earlier this week, Menlo Ventures justified its investment in multi-silicon inference startup Gimlet Labs based on inference’s heterogeneous needs, saying: “Each stage requires different hardware: Prefill [translating queries into tokens] is compute-bound; decode [generating output] is memory-bound; and tool calls are network-bound.” People are now asking: will decode continue to be memory-bound in the future?
  • Driving down the cost of inference is critical for the AI industry’s economics. Inference (vs. training) will be the larger part of the AI market. By one estimate, as many as 80-90% of GPUs (graphics processing units) are already being used for inference – up from about 40% two years ago. TurboQuant comes at a time when the industry is experiencing major memory-chip shortages, which have been driving up the cost of AI chipsets as well as the price of electronics from laptops to PlayStation 5s. Earlier this month, memory-chip maker SK Group’s chairman indicated that he expected shortages to last for at least 4-5 years.
  • TurboQuant has the potential to impact the economics of the AI business, but maybe not as much as implied by the market reaction. First, it has less of an impact on training, as well as on the prefill stage of inference. In the decode stage of inference, if the KV cache is notionally 40-60% of memory in a datacenter, then a 6x reduction in the KV cache would roughly translate into a 2x reduction overall (see the arithmetic sketch below). (This would be even less if actual KV cache memory savings were more like 2.7x, as some analysts believe, rather than 6x.) While the cost of serving could fall and make deployment more profitable for AI players, memory would remain a bottleneck, despite the higher ceiling.
  • Also, keep in mind that memory-chip shortages – given the current supply-demand imbalance and the extended time to stand up more capacity – are likely to persist in the near term. And further out, cheaper inference could mean even more usage (Jevons Paradox) and demand – especially if compression algorithms allow larger models and long-context AI to be run on consumer edge devices like laptops and phones.
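To put the “massive” KV cache in concrete terms, the sizing sketch below works through the arithmetic for a hypothetical large transformer. The dimensions (layers, heads, head size, context length, fp16 storage) are illustrative assumptions, not figures from Google or any particular model.

```python
# Back-of-the-envelope KV-cache sizing (hypothetical model dimensions).

def kv_cache_bytes(num_layers, num_kv_heads, head_dim, seq_len, bytes_per_value=2):
    """Size of the key-value cache for a single sequence.

    The leading 2 accounts for storing both keys and values;
    bytes_per_value=2 assumes fp16/bf16 storage.
    """
    return 2 * num_layers * num_kv_heads * head_dim * seq_len * bytes_per_value

# Assumed dimensions for a large dense model (illustrative only):
cache = kv_cache_bytes(num_layers=80, num_kv_heads=64, head_dim=128, seq_len=32_000)
print(f"KV cache at a 32k-token context: {cache / 1e9:.1f} GB per sequence")
# ≈ 84 GB for a single long conversation; a 6x compression would cut this to ~14 GB.
```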
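The decode-bandwidth sketch shows why that size matters most during decode: every new token requires streaming the (growing) KV cache from memory to the processors. The numbers are rough, using the cache size above and an approximate bandwidth figure for a modern HBM-based accelerator; both are assumptions, not benchmarks.

```python
# Rough decode-stage memory-traffic math (illustrative, not a measurement).

kv_cache_gb = 84.0            # from the sizing sketch above (assumption)
hbm_bandwidth_gb_s = 3350.0   # approximate bandwidth of a modern HBM accelerator

time_per_token_ms = kv_cache_gb / hbm_bandwidth_gb_s * 1000
print(f"Uncompressed cache: ~{time_per_token_ms:.0f} ms of memory traffic per token")

compressed_ms = (kv_cache_gb / 6) / hbm_bandwidth_gb_s * 1000
print(f"With a 6x-smaller cache: ~{compressed_ms:.1f} ms per token")
# Decode stays memory-bound, but the ceiling on tokens/sec rises roughly
# in proportion to the compression ratio.
```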
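The quantization sketch illustrates the usual tradeoff with a generic 8-bit scalar quantization of a random high-dimensional vector. This is not TurboQuant; it simply shows how shrinking the representation normally introduces some reconstruction error.

```python
import numpy as np

# Generic 8-bit scalar quantization of a vector (not TurboQuant):
# fewer bits per value -> less memory, but some reconstruction error.

rng = np.random.default_rng(0)
v = rng.standard_normal(4096).astype(np.float32)   # a high-dimensional vector

scale = np.abs(v).max() / 127.0                     # map floats onto int8 levels
q = np.round(v / scale).astype(np.int8)             # 1 byte per value instead of 4
v_hat = q.astype(np.float32) * scale                # dequantize for use in compute

mem_saving = v.nbytes / q.nbytes
rel_error = np.linalg.norm(v - v_hat) / np.linalg.norm(v)
print(f"memory reduced {mem_saving:.0f}x, relative error {rel_error:.4f}")
# ~4x smaller with a small but nonzero error – the tradeoff TurboQuant claims to break.
```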
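The rotation sketch conveys the general intuition behind rotation-based quantizers: randomly rotating a vector spreads its energy evenly across coordinates, so a simple fixed grid captures it with less error. This is only that intuition in miniature; it is not the published PolarQuant or QJL algorithms.

```python
import numpy as np

# Intuition sketch: random rotation before quantization (not PolarQuant/QJL).

rng = np.random.default_rng(1)
d = 512
v = rng.standard_normal(d)
v[0] = 100.0                       # one dominant "spike" amid small values

# Random orthogonal rotation via QR decomposition of a Gaussian matrix
Q, _ = np.linalg.qr(rng.standard_normal((d, d)))
v_rot = Q @ v                      # same length, but energy spread across coordinates

def quantize_error(x, levels=16):
    """Relative error after rounding onto a uniform grid with `levels` levels."""
    scale = np.abs(x).max() / (levels / 2 - 1)
    x_hat = np.round(x / scale) * scale
    return np.linalg.norm(x - x_hat) / np.linalg.norm(x)

print(f"4-bit error without rotation: {quantize_error(v):.3f}")
print(f"4-bit error with rotation:    {quantize_error(v_rot):.3f}")
# The rotated vector quantizes with noticeably lower error at the same bit budget.
```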
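Finally, the arithmetic sketch shows the back-of-the-envelope math behind the "roughly 2x overall" figure, using the brief's own assumptions (KV cache at 40-60% of datacenter inference memory; 6x or 2.7x cache compression). The logic is Amdahl's-law-style: only the KV-cache share of memory shrinks.

```python
# How a KV-cache-only reduction translates into overall memory savings.
# Shares and compression ratios are the brief's assumptions, not measurements.

def overall_reduction(kv_share, kv_compression):
    remaining = (1 - kv_share) + kv_share / kv_compression
    return 1 / remaining

for share in (0.4, 0.5, 0.6):
    for compression in (6, 2.7):
        print(f"KV cache {share:.0%} of memory, {compression}x cache compression "
              f"-> {overall_reduction(share, compression):.1f}x overall")
# 6x cache compression yields roughly 1.5-2.0x overall; 2.7x yields roughly 1.3-1.6x.
```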
Related Content:
  • Mar 13 2026 (3 Shifts): Deal-making and inference chips
  • Dec 19 2025 (3 Shifts): Memory-chip shortages & higher prices
Become an All-Access Member to read the full brief here
All-Access Members get unlimited access to the full 6Pages Repository of 886 market shifts.
Become a Member
Disclosure: Contributors have financial interests in Meta, Microsoft, Alphabet, OpenAI, Anthropic, and Discord. Google and OpenAI are vendors of 6Pages.
Have a comment about this brief or a topic you'd like to see us cover? Send us a note at tips@6pages.com.