Loading…
Vision · M87 Labs
Tiny open vision-language model for efficient image understanding.
An open-weights family of small vision-language models for captioning, visual Q&A, pointing, counting, and object detection — small enough to run on-device (checkpoints down to 0.5B on Hugging Face). Run it locally with the Photon engine, or call Moondream Cloud's OpenAI-compatible API with a free monthly credit tier and pay-per-image pricing.
Model support
Ships its own open vision-language weights.
Where it runs
Tags