Inference · OpenRouter

OpenRouter

One OpenAI-compatible API in front of 300+ models from every provider.

FREEMIUMCloudAPIWeb

A unified gateway that routes a single endpoint and API key to models from Anthropic, OpenAI, Google, Meta, DeepSeek, xAI, and more — swap models by changing one parameter, with automatic fallbacks and one consolidated bill. Pass-through token pricing plus dozens of free models.

Model support

Multi-model

Claude
GPT
Gemini
Llama
DeepSeek
Grok

Single OpenAI-compatible endpoint fronting 300+ models; routing + fallbacks.

Where it runs

Tags

#gateway
#routing
#multi-model
#fallbacks

Open OpenRouter Docs Pricing

Related in Inference

View Together AI details
InferenceFREEMIUMVetted
Together AI
Together
Fine-tuning + inference for open-weights models. Broad coverage.
Hosted inference and fine-tuning across hundreds of open-weights models (Llama, Mistral, DeepSeek, Qwen, etc.). Strong pricing for inference-at-scale; LoRA + full fine-tuning supported.
- inference
- fine-tuning
- open-weights
- lora
Open
View fal details
InferenceFREEMIUM
fal
fal
Serverless inference API for image, video, audio, and 3D models.
A generative-media inference platform exposing FLUX, Kling, Veo, Wan, Stable Diffusion, and 600+ image/video/audio/3D models through one fast, serverless API — no GPUs to manage and near-zero cold starts. Pay per output or per GPU-second; free starter credits to test. Popular as the production backend for AI media features.
- generative-media
- image-gen
- video-gen
- serverless
Open
View Groq details
InferenceFREEMIUM
Groq
Groq
Ultra-fast inference on custom LPU chips. Open-weights at 500+ tokens/sec.
GroqCloud serves open-weights models (Llama, DeepSeek, Qwen, Kimi) on Groq's purpose-built LPU hardware, hitting hundreds of tokens per second where GPUs manage tens. OpenAI-compatible API with a free tier; the default when token latency is the product.
- inference
- low-latency
- lpu
- open-weights
Open
View LM Studio details
InferenceFREE
LM Studio
LM Studio
Desktop app to discover, download, and run local LLMs privately.
A GUI for running open-weight models on your own hardware — browse and download GGUF/MLX models, chat offline, and expose an OpenAI- and Anthropic-compatible local server for your apps. Includes RAG over local files, MCP tool-use support, and dual llama.cpp + Apple MLX runtimes. Free for personal and commercial use; the app itself is proprietary.
- local
- llm-runner
- gui
- privacy
Open

Open OpenRouter

Multi-model

Together AI

fal

Groq

LM Studio