Vision Lab
Point your camera, ask a question. A vision-language model runs entirely on your device via WebGPU. Pick SmolVLM-500M for low-RAM machines or Moondream2 for richer answers. Frames never leave the browser.
One-time setup
- Downloads the model from HuggingFace's CDN. This happens once.
- Cached in your browser's persistent storage (Origin Private File System).
- Subsequent visits load in ~3 seconds with no network.
- Frames are processed on-device. Only the model's text answer ever leaves your browser, and only when you press "Find compatible skills".
- Use a normal tab. Private/Incognito browsing wipes OPFS storage on reload, so the model would re-download every time.
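The download-once, then load-from-cache behaviour described above could be sketched roughly as follows. This is an illustrative sketch, not this page's actual code: `loadModel`, `readCached`, the `*Like` interfaces, and the file name passed in are all hypothetical names. The interfaces mirror only the small part of the Origin Private File System API the sketch relies on (`getFileHandle`, `getFile`, `createWritable`).

```typescript
// Structural types for the few OPFS methods used here, so the sketch
// type-checks without DOM typings and can be exercised with an in-memory fake.
interface WritableLike {
  write(data: ArrayBuffer): Promise<void>;
  close(): Promise<void>;
}
interface FileHandleLike {
  getFile(): Promise<{ arrayBuffer(): Promise<ArrayBuffer> }>;
  createWritable(): Promise<WritableLike>;
}
interface DirectoryLike {
  getFileHandle(name: string, options?: { create?: boolean }): Promise<FileHandleLike>;
}

type ModelLoad = { bytes: ArrayBuffer; source: "cache" | "network" };

// Returns the cached bytes, or null if the file is missing, empty, or unreadable.
async function readCached(dir: DirectoryLike, name: string): Promise<ArrayBuffer | null> {
  try {
    const handle = await dir.getFileHandle(name); // rejects when the file does not exist
    const file = await handle.getFile();
    const bytes = await file.arrayBuffer();
    return bytes.byteLength > 0 ? bytes : null;
  } catch (err) {
    return null;
  }
}

// Cache-first load: read from storage if present, otherwise download once and store.
async function loadModel(
  dir: DirectoryLike,
  name: string,
  download: () => Promise<ArrayBuffer>,
): Promise<ModelLoad> {
  const cached = await readCached(dir, name);
  if (cached !== null) {
    return { bytes: cached, source: "cache" };
  }
  const bytes = await download();
  const handle = await dir.getFileHandle(name, { create: true });
  const writable = await handle.createWritable();
  await writable.write(bytes);
  await writable.close(); // the written data is committed when the stream closes
  return { bytes, source: "network" };
}
```

In a browser, `dir` would typically come from `await navigator.storage.getDirectory()`, and `download` would wrap a `fetch` of the model file. In a private window the stored file may not survive, so the sketch would simply take the network path again.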
Checking your browser…
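What the browser check covers is not spelled out here. One plausible version, given the reliance on WebGPU and OPFS, is sketched below. `checkBrowser`, `NavigatorLike`, and `BrowserSupport` are hypothetical names. The only platform calls assumed are `navigator.gpu.requestAdapter()`, which resolves to `null` when no adapter is available, and the existence of `navigator.storage.getDirectory`.

```typescript
// Only the parts of `navigator` this sketch inspects.
interface NavigatorLike {
  gpu?: { requestAdapter(): Promise<object | null> };
  storage?: { getDirectory?: unknown };
}

interface BrowserSupport {
  webgpu: boolean; // true only if a GPU adapter was actually obtained
  opfs: boolean;   // true if storage.getDirectory is callable
}

async function checkBrowser(nav: NavigatorLike): Promise<BrowserSupport> {
  let webgpu = false;
  if (nav.gpu) {
    try {
      // The API object can exist even when no usable adapter does.
      const adapter = await nav.gpu.requestAdapter();
      webgpu = adapter !== null;
    } catch (err) {
      webgpu = false;
    }
  }
  const opfs = nav.storage !== undefined && typeof nav.storage.getDirectory === "function";
  return { webgpu, opfs };
}
```

In a page this would be called with the real `navigator` object. Without WebGPU the model cannot run at all, while without OPFS it could presumably still run but would re-download on each visit.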
Bandwidth tip: the download is large. On cellular it'll eat your data plan; prefer Wi-Fi.
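A page could decide when to emphasise this tip using the Network Information API (`navigator.connection`), where the browser exposes it. The sketch below is an assumption about how that might look, not a description of what this page does. `shouldWarnBeforeDownload` and `ConnectionLike` are hypothetical names, and many browsers do not expose `connection` or its `type` field at all.

```typescript
// The two Network Information fields this sketch reads.
interface ConnectionLike {
  type?: string;      // e.g. "cellular" or "wifi"; not reported by every browser
  saveData?: boolean; // true when the user has enabled a reduced-data mode
}

// Returns true when there is a positive signal that a large download may be costly.
function shouldWarnBeforeDownload(connection: ConnectionLike | undefined): boolean {
  if (connection === undefined) {
    return false; // API unavailable: no signal either way
  }
  return connection.type === "cellular" || connection.saveData === true;
}
```

Because a `false` result can simply mean the API is missing, showing the tip unconditionally, as this page does, remains the safe default.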
Loading model…
Connecting to HuggingFace CDN…
First load takes 1–3 minutes. Future visits skip this entirely.
frame frozen; release to resume