Inception introduces Mercury 2, a reasoning language model built for speed in production AI environments. It addresses the latency bottleneck of autoregressive models by using diffusion for parallel token generation.
Key Features and Benefits
- Speed: Achieves 1,009 tokens/sec on NVIDIA Blackwell GPUs.
- Cost: Priced at $0.25/1M input tokens and $0.75/1M output tokens.
- Quality: Competitive with other speed-optimized models.
- Features: Tunable reasoning, a 128K context window, native tool use, and schema-aligned JSON output (see the sketch below).
Mercury 2 optimizes for responsiveness, low latency under high concurrency, consistent behavior, and stable throughput.
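To make the schema-aligned JSON output concrete, here is a minimal sketch of requesting structured output through an OpenAI-compatible client. The base URL, model name, and `response_format` support shown here are assumptions for illustration, not confirmed Mercury 2 values.

```python
# Minimal sketch: requesting schema-aligned JSON output through an
# OpenAI-compatible client. The base URL, model name, and parameter
# support below are placeholders, not confirmed Mercury 2 values.
from openai import OpenAI

client = OpenAI(
    base_url="https://api.inceptionlabs.ai/v1",  # placeholder endpoint
    api_key="YOUR_API_KEY",
)

response = client.chat.completions.create(
    model="mercury-2",  # placeholder model name
    messages=[
        {"role": "user", "content": "Extract the city and temperature from: 'It is 18C in Oslo.'"}
    ],
    response_format={
        "type": "json_schema",
        "json_schema": {
            "name": "weather_reading",
            "schema": {
                "type": "object",
                "properties": {
                    "city": {"type": "string"},
                    "temperature_c": {"type": "number"},
                },
                "required": ["city", "temperature_c"],
            },
        },
    },
)

print(response.choices[0].message.content)  # JSON conforming to the schema
```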
Use Cases
Mercury 2 is suited for latency-sensitive applications:
- Coding and Editing: Autocomplete, suggestions, and interactive code agents.
- Agentic Loops: Multi-step agent workflows where per-step latency compounds, such as optimizing campaign execution and delivery in real time (see the tool-call sketch after this list).
- Real-time Voice and Interaction: Enabling natural and human-like voice interfaces.
- Search and RAG Pipelines: Adding reasoning to search loops without exceeding latency budgets.
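Because latency compounds across every step of an agentic loop, fast generation matters most where the model is called repeatedly with tools. Below is a minimal sketch of a single tool-call step over an OpenAI-compatible API, assuming Mercury 2 accepts the standard `tools` parameter; the endpoint, model name, and `search_docs` tool are hypothetical.

```python
# Minimal sketch of a single agentic tool-call step over an
# OpenAI-compatible API. Endpoint, model name, and tool definition
# are placeholders, not confirmed Mercury 2 values.
import json
from openai import OpenAI

client = OpenAI(
    base_url="https://api.inceptionlabs.ai/v1",  # placeholder endpoint
    api_key="YOUR_API_KEY",
)

tools = [{
    "type": "function",
    "function": {
        "name": "search_docs",  # hypothetical tool exposed by the agent
        "description": "Search internal documentation for a query.",
        "parameters": {
            "type": "object",
            "properties": {"query": {"type": "string"}},
            "required": ["query"],
        },
    },
}]

response = client.chat.completions.create(
    model="mercury-2",  # placeholder model name
    messages=[{"role": "user", "content": "How do I rotate an API key?"}],
    tools=tools,
)

# If the model chose to call the tool, inspect the structured arguments.
for call in response.choices[0].message.tool_calls or []:
    print(call.function.name, json.loads(call.function.arguments))
```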
Getting Started
Mercury 2 is available now and is OpenAI API compatible, so existing integrations work without code rewrites. Inception partners with enterprises on evaluations, workload fit, and performance validation.
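Since compatibility is claimed at the OpenAI API level, one way to try Mercury 2 without touching application code is to redirect the OpenAI SDK through its environment variables. The endpoint and model name below are placeholders, not confirmed values.

```python
# Minimal sketch: reusing existing OpenAI-SDK code by redirecting it with
# environment variables. The endpoint and model name are placeholders,
# not confirmed Mercury 2 values.
import os
from openai import OpenAI

# In practice these would be set in deployment config rather than in code;
# the SDK reads them when the client is constructed.
os.environ["OPENAI_BASE_URL"] = "https://api.inceptionlabs.ai/v1"  # placeholder
os.environ["OPENAI_API_KEY"] = "YOUR_INCEPTION_API_KEY"

client = OpenAI()  # picks up the overrides above, no code changes elsewhere
response = client.chat.completions.create(
    model="mercury-2",  # placeholder model name
    messages=[{"role": "user", "content": "Say hello in one short sentence."}],
)
print(response.choices[0].message.content)
```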