Skip to content

Inference · Ollama

Ollama

Run open-weight LLMs locally with one command. OpenAI-compatible API.

FREEMIUMOpen sourceLocalmacOSWindowsLinuxCLIAPI

The de-facto way to pull and run open-weight models (Llama, Qwen, Gemma, DeepSeek, gpt-oss) on your own machine — no API key, no data leaving the device. Ships native macOS/Windows/Linux apps, an OpenAI-compatible server, and official Python/JS libraries. MIT-licensed and free locally; an optional paid Ollama Cloud runs larger models.

Model support

Multi-model

  • Llama
  • Qwen
  • Gemma
  • DeepSeek
  • Mistral
  • gpt-oss

Runs open-weight models locally; OpenAI-compatible API. Optional cloud for larger models.

Where it runs

  • macOS
  • Windows
  • Linux
  • CLI
  • API

Tags

  • #local
  • #open-source
  • #llm-runner
  • #self-hosted
Open OllamaGitHubDocs

Related in Inference

  • View Together AI details
    InferenceFREEMIUMVetted

    Together AI

    Together

    Fine-tuning + inference for open-weights models. Broad coverage.

    Hosted inference and fine-tuning across hundreds of open-weights models (Llama, Mistral, DeepSeek, Qwen, etc.). Strong pricing for inference-at-scale; LoRA + full fine-tuning supported.

    • inference
    • fine-tuning
    • open-weights
    • lora
  • View fal details
    InferenceFREEMIUM

    fal

    fal

    Serverless inference API for image, video, audio, and 3D models.

    A generative-media inference platform exposing FLUX, Kling, Veo, Wan, Stable Diffusion, and 600+ image/video/audio/3D models through one fast, serverless API — no GPUs to manage and near-zero cold starts. Pay per output or per GPU-second; free starter credits to test. Popular as the production backend for AI media features.

    • generative-media
    • image-gen
    • video-gen
    • serverless
  • View Groq details
    InferenceFREEMIUM

    Groq

    Groq

    Ultra-fast inference on custom LPU chips. Open-weights at 500+ tokens/sec.

    GroqCloud serves open-weights models (Llama, DeepSeek, Qwen, Kimi) on Groq's purpose-built LPU hardware, hitting hundreds of tokens per second where GPUs manage tens. OpenAI-compatible API with a free tier; the default when token latency is the product.

    • inference
    • low-latency
    • lpu
    • open-weights
  • View LM Studio details
    InferenceFREE

    LM Studio

    LM Studio

    Desktop app to discover, download, and run local LLMs privately.

    A GUI for running open-weight models on your own hardware — browse and download GGUF/MLX models, chat offline, and expose an OpenAI- and Anthropic-compatible local server for your apps. Includes RAG over local files, MCP tool-use support, and dual llama.cpp + Apple MLX runtimes. Free for personal and commercial use; the app itself is proprietary.

    • local
    • llm-runner
    • gui
    • privacy