Inception introduces Mercury 2, a reasoning language model built for speed in production AI environments. It addresses the latency bottleneck of autoregressive models by using diffusion for parallel token generation.
Key Features and Benefits
- Speed: Achieves 1,009 tokens/sec on NVIDIA Blackwell GPUs.
- Cost: Priced at $0.25/1M input tokens and $0.75/1M output tokens.
- Quality: Competitive with other speed-optimized models.
- Features: Tunable reasoning, a 128K context window, native tool use, and schema-aligned JSON output (see the sketch below).
Mercury 2 optimizes for responsiveness, low latency under high concurrency, consistent behavior, and stable throughput.
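To make the schema-aligned JSON output concrete, here is a minimal sketch of requesting structured output through an OpenAI-compatible client. The base URL, model name, and `response_format` support shown here are assumptions for illustration, not confirmed Mercury 2 values.

```python
# Minimal sketch: requesting schema-aligned JSON output through an
# OpenAI-compatible client. The base URL, model name, and parameter
# support below are placeholders, not confirmed Mercury 2 values.
from openai import OpenAI

client = OpenAI(
    base_url="https://api.inceptionlabs.ai/v1",  # placeholder endpoint
    api_key="YOUR_API_KEY",
)

response = client.chat.completions.create(
    model="mercury-2",  # placeholder model name
    messages=[
        {"role": "user", "content": "Extract the city and temperature from: 'It is 18C in Oslo.'"}
    ],
    response_format={
        "type": "json_schema",
        "json_schema": {
            "name": "weather_reading",
            "schema": {
                "type": "object",
                "properties": {
                    "city": {"type": "string"},
                    "temperature_c": {"type": "number"},
                },
                "required": ["city", "temperature_c"],
            },
        },
    },
)

print(response.choices[0].message.content)  # JSON conforming to the schema
```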
Use Cases
Mercury 2 is suited for latency-sensitive applications:
- Coding and Editing: Autocomplete, suggestions, and interactive code agents.
- Agentic Loops: Multi-step agent workflows where per-step latency compounds, such as optimizing campaign execution and delivery in real time (see the tool-call sketch after this list).
- Real-time Voice and Interaction: Enabling natural and human-like voice interfaces.
- Search and RAG Pipelines: Adding reasoning to search loops without exceeding latency budgets.
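Because latency compounds across every step of an agentic loop, fast generation matters most where the model is called repeatedly with tools. Below is a minimal sketch of a single tool-call step over an OpenAI-compatible API, assuming Mercury 2 accepts the standard `tools` parameter; the endpoint, model name, and `search_docs` tool are hypothetical.

```python
# Minimal sketch of a single agentic tool-call step over an
# OpenAI-compatible API. Endpoint, model name, and tool definition
# are placeholders, not confirmed Mercury 2 values.
import json
from openai import OpenAI

client = OpenAI(
    base_url="https://api.inceptionlabs.ai/v1",  # placeholder endpoint
    api_key="YOUR_API_KEY",
)

tools = [{
    "type": "function",
    "function": {
        "name": "search_docs",  # hypothetical tool exposed by the agent
        "description": "Search internal documentation for a query.",
        "parameters": {
            "type": "object",
            "properties": {"query": {"type": "string"}},
            "required": ["query"],
        },
    },
}]

response = client.chat.completions.create(
    model="mercury-2",  # placeholder model name
    messages=[{"role": "user", "content": "How do I rotate an API key?"}],
    tools=tools,
)

# If the model chose to call the tool, inspect the structured arguments.
for call in response.choices[0].message.tool_calls or []:
    print(call.function.name, json.loads(call.function.arguments))
```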
Getting Started
Mercury 2 is available now and is OpenAI API compatible, so existing integrations work without code rewrites. Inception partners with enterprises on evaluations, workload fit, and performance validation.
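Since compatibility is claimed at the OpenAI API level, one way to try Mercury 2 without touching application code is to redirect the OpenAI SDK through its environment variables. The endpoint and model name below are placeholders, not confirmed values.

```python
# Minimal sketch: reusing existing OpenAI-SDK code by redirecting it with
# environment variables. The endpoint and model name are placeholders,
# not confirmed Mercury 2 values.
import os
from openai import OpenAI

# In practice these would be set in deployment config rather than in code;
# the SDK reads them when the client is constructed.
os.environ["OPENAI_BASE_URL"] = "https://api.inceptionlabs.ai/v1"  # placeholder
os.environ["OPENAI_API_KEY"] = "YOUR_INCEPTION_API_KEY"

client = OpenAI()  # picks up the overrides above, no code changes elsewhere
response = client.chat.completions.create(
    model="mercury-2",  # placeholder model name
    messages=[{"role": "user", "content": "Say hello in one short sentence."}],
)
print(response.choices[0].message.content)
```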