SearchAI Partner Ecosystem

SearchBlox SearchAI + Mercury Diffusion LLMs

Enterprise GenAI. Now at Diffusion Speed.

SearchBlox has integrated Mercury diffusion models from Inception as a premier inference partner. You can now choose a Mercury model to power your SearchAI platform, delivering private, on-premise generation at unprecedented speed.

Private LLM Data Sample

Up to 10× faster

Quicker generation than today’s speed-optimized LLMs

Lower Inference Cost

Higher GPU efficiency reduces per-request cost.

128k Context

Handles long contracts, PDFs, and multi-doc queries

Faster Search. Lower Costs. Powered by Diffusion.

Why "Diffusion" Changes Everything.

Why "Diffusion" Changes Everything.

Old Way: One Word at a Time (Autoregressive)

Traditional models (like GPT-4) act like a slow typist: they guess the next word, then the next, in a straight line. This is called autoregressive generation, and it creates a speed limit you can't break.
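
To make the bottleneck concrete, here is a minimal toy sketch of that one-word-at-a-time loop. ToyModel and ToyTokenizer are hypothetical stand-ins (not a SearchBlox, OpenAI, or Inception API); the point is only that every new token needs its own model pass and has to wait for the one before it.

```python
# Minimal, illustrative sketch of autoregressive (one-token-at-a-time) decoding.
# ToyModel and ToyTokenizer are hypothetical stand-ins, not a real vendor API.

class ToyTokenizer:
    eos_token_id = 0
    def encode(self, text):
        return [hash(word) % 1000 + 1 for word in text.split()]
    def decode(self, tokens):
        return " ".join(f"<{t}>" for t in tokens)

class ToyModel:
    def predict_next(self, tokens):
        # Pretend "inference": derive the next token from the running context.
        return (sum(tokens) % 999) + 1 if len(tokens) < 12 else 0  # 0 signals end of sequence

def autoregressive_generate(model, tokenizer, prompt, max_new_tokens=64):
    tokens = tokenizer.encode(prompt)
    for _ in range(max_new_tokens):
        next_token = model.predict_next(tokens)   # one full model pass per token
        tokens.append(next_token)                 # step N cannot start until step N-1 finishes
        if next_token == tokenizer.eos_token_id:
            break
    return tokenizer.decode(tokens)

print(autoregressive_generate(ToyModel(), ToyTokenizer(), "summarize this contract"))
```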

New Way: The Whole Idea at Once (Diffusion)

Mercury diffusion LLMs (dLLMs) generate multiple tokens in parallel instead of one at a time. The model starts from noisy text and refines it through a few denoising steps until the final answer is ready.
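
For contrast, here is a conceptual toy sketch of diffusion-style decoding, again using hypothetical stand-ins rather than Inception's actual Mercury implementation. The draft starts fully masked, each pass proposes tokens for every position at once, and only the most confident proposals are committed per pass.

```python
# Conceptual toy sketch of diffusion-style (parallel, iterative-refinement) decoding.
# This illustrates the general idea only; it is NOT Inception's Mercury implementation.
import random

MASK = None                      # placeholder for a still-"noisy" (undecided) position
LENGTH, STEPS = 16, 4

def toy_denoise(prompt, draft):
    # Pretend "inference": propose a token and a confidence for EVERY position in one pass.
    rng = random.Random(hash(prompt) + draft.count(MASK))
    proposals = [rng.randint(1, 999) for _ in draft]
    confidences = [rng.random() for _ in draft]
    return proposals, confidences

def diffusion_generate(prompt, length=LENGTH, steps=STEPS):
    draft = [MASK] * length                                   # start from pure noise: all positions masked
    for step in range(1, steps + 1):
        proposals, confidences = toy_denoise(prompt, draft)   # one parallel pass over the whole draft
        masked = [i for i, tok in enumerate(draft) if tok is MASK]
        quota = max(1, len(masked) // (steps - step + 1))     # commit a slice of positions each pass
        for i in sorted(masked, key=lambda i: confidences[i], reverse=True)[:quota]:
            draft[i] = proposals[i]                           # keep only the most confident proposals
    return draft                                              # a few passes total, not one pass per token

print(diffusion_generate("summarize this contract"))
```

The design difference that matters for latency: the number of model passes is fixed by the step count rather than growing with the length of the output.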

The Result: Answers appear instantly, not word by word.

Why Speed Matters.

Latency is the #1 friction point for Enterprise GenAI adoption.

Feature                                                                  | OpenAI (GPT-5.1) | Mercury Diffusion LLM (dLLM) from Inception
Document enrichment with metadata and automatic tagging (100 web pages)  | 13m 04s          | 9m 43s
Generation Architecture                                                  | Auto-Regressive  | Diffusion (Parallel)
Free Tokens                                                              | None             | 10 Million
Data Privacy                                                             | Public Cloud     | Private deployment in your VPC or private cloud
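
As a back-of-the-envelope reading of the table above (assuming both timings are wall-clock for the same 100-page enrichment run), the end-to-end numbers work out as follows; the headline "up to 10×" figure refers to raw generation speed versus speed-optimized LLMs, while this benchmark covers a full enrichment workflow.

```python
# Back-of-the-envelope reading of the benchmark table above.
gpt_seconds     = 13 * 60 + 4    # 13m 04s -> 784 s
mercury_seconds =  9 * 60 + 43   #  9m 43s -> 583 s

speedup = gpt_seconds / mercury_seconds   # ~1.34x faster end to end on this run
saved   = gpt_seconds - mercury_seconds   # 201 s saved per 100 pages
print(f"{speedup:.2f}x faster end to end, {saved} s saved per 100 pages")
```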

Where Speed Meets Action.

Speed is the next UX. Make it an advantage across customer and employee experiences.

Digital Document Files

SearchAI Assist powered by Mercury diffusion models accelerates legal workflows, delivering 50% faster review cycles while summarizing 100-page contracts side-by-side.

SearchAI ChatBot with Mercury diffusion models achieves 60–70% higher resolution rates by instantly answering customer tickets with zero-latency diffusion generation.

NLP Automation
Organizational Content Insights

SmartFAQs drives 40% more organic traffic by using high-speed diffusion to auto-generate schema-ready questions for 10,000+ pages overnight.

SmartSuggest reduces cart abandonment by up to 15% by leveraging diffusion speed to predict complex user intent mid-sentence as they type.

Customer Support Team

SearchAI Agents result in 30% fewer helpdesk tickets by using instant diffusion reasoning to route and resolve support issues autonomously.

Everything you'll need to know about SearchAI

Schedule A Demo

We work with industry leaders.

You’re in very good company.

More than 600 enterprises — some of the biggest names in government, healthcare, and financial services — use SearchBlox to power their insight engine.

Enhance your users’ digital experience.

Security & Compliance

Certifications

SearchAI is SOC 2 attested, HIPAA aligned, ISO/IEC 27001:2022 certified and ISO/IEC 42001:2023 certified.