
Introducing Mercury 2 – Inception

Published at 10:37 AM

Inception introduces Mercury 2, a reasoning language model built for speed in production AI environments. It addresses the latency bottleneck of autoregressive models by using diffusion for parallel token generation.

Key Features and Benefits

  • Speed: Achieves 1,009 tokens/sec on NVIDIA Blackwell GPUs.
  • Cost: Priced at $0.25/1M input tokens and $0.75/1M output tokens.
  • Quality: Competitive with other speed-optimized models.
  • Features: Tunable reasoning, 128K context window, native tool use, and schema-aligned JSON output.

Mercury 2 optimizes for responsiveness, low latency under high concurrency, consistent behavior, and stable throughput.
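A minimal sketch of what a request exercising these features might look like, assuming an OpenAI-style chat-completions payload. The model identifier, the "reasoning_effort" knob, and the schema shape below are illustrative assumptions, not documented Mercury 2 parameter names.

```python
import json

# Hypothetical request payload for an OpenAI-style chat-completions
# endpoint. The model id, "reasoning_effort", and schema contents are
# assumptions for illustration only.
payload = {
    "model": "mercury-2",           # assumed model identifier
    "messages": [
        {"role": "user", "content": "Summarize this stack trace."}
    ],
    "reasoning_effort": "low",      # tunable reasoning (assumed knob name)
    "response_format": {            # schema-aligned JSON output
        "type": "json_schema",
        "json_schema": {
            "name": "summary",
            "schema": {
                "type": "object",
                "properties": {"summary": {"type": "string"}},
                "required": ["summary"],
            },
        },
    },
}

# Serialize as it would be sent over the wire.
body = json.dumps(payload)
print(len(body) > 0)  # True
```

Constraining output to a schema like this is what makes the model's JSON safe to parse directly in a pipeline, rather than post-processing free-form text.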

Use Cases

Mercury 2 is suited for latency-sensitive applications:

  1. Coding and Editing: Autocomplete, suggestions, and interactive code agents.
  2. Agentic Loops: Running multi-step agent workflows in real time, where per-step latency compounds, such as optimizing campaign execution and delivery.
  3. Real-time Voice and Interaction: Enabling natural and human-like voice interfaces.
  4. Search and RAG Pipelines: Adding reasoning to search loops without exceeding latency budgets.

Getting Started

Mercury 2 is available now and is OpenAI API compatible, so existing integrations need no code rewrites. Inception offers enterprise partnerships covering evaluations, workload-fit assessment, and performance validation.
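Because the API is OpenAI-compatible, switching typically amounts to changing the base URL and API key while keeping the request shape unchanged. A minimal sketch using only the standard library; the base URL below is a placeholder, since the real endpoint is not given in this summary.

```python
import json
import urllib.request

# Placeholder base URL -- the actual Inception endpoint is not stated
# in this summary.
BASE_URL = "https://api.example.com/v1"

def build_chat_request(api_key: str, model: str, prompt: str) -> urllib.request.Request:
    """Build an OpenAI-style chat-completions request. Only the base
    URL and key differ from a stock OpenAI client configuration; the
    payload shape is the familiar messages array."""
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

# Construct (without sending) a request to inspect its shape.
req = build_chat_request("sk-test", "mercury-2", "Hello")
print(req.full_url)
```

The same drop-in pattern works with the official OpenAI SDKs, which accept a custom base URL at client construction.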
