Here is an AI agent that uses a computer the way a human does. Open Computer Agent runs in your browser with no installation. You can ask it to use Google Maps to find a place, pull information from Wikipedia, or do anything in between. The tool is built on open-source components: smolagents, Qwen2-VL-72B, and E2B Desktop.
Test it here: https://t.co/RTBitVeuKD
Open Computer Agent runs on a cloud Linux VM with apps like Firefox.
Give it a task like “Find Hugging Face HQ on Google Maps,” and it clicks, types, and navigates like a person would.
But…it’s slow. It struggles with CAPTCHAs. And…
— Ihtesham Haider (@ihteshamit) May 7, 2025
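The tweet describes the agent clicking, typing, and navigating inside a cloud VM. Agents of this kind generally run an observe, decide, act loop: take a screenshot, let a vision-language model choose the next action, execute it, and repeat. Below is a minimal, self-contained Python sketch of that loop under stated assumptions. `Action`, `MockDesktop`, `run_agent`, and `scripted_policy` are hypothetical names, not the smolagents or E2B Desktop APIs, and a fixed script stands in for the Qwen2-VL-72B model.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Action:
    kind: str  # hypothetical action types: "click", "type", or "done"
    x: int = 0
    y: int = 0
    text: str = ""


@dataclass
class MockDesktop:
    """Hypothetical stand-in for a cloud VM desktop: records actions instead of executing them."""
    log: List[str] = field(default_factory=list)
    typed: str = ""

    def screenshot(self) -> str:
        # A real desktop would return image bytes; this mock returns a text summary.
        return f"screen after {len(self.log)} actions, typed so far: {self.typed!r}"

    def click(self, x: int, y: int) -> None:
        self.log.append(f"click({x},{y})")

    def type_text(self, text: str) -> None:
        self.typed += text
        self.log.append(f"type({text!r})")


# A policy maps (task, current observation, step index) to the next action.
Policy = Callable[[str, str, int], Action]


def run_agent(task: str, desktop: MockDesktop, policy: Policy, max_steps: int = 10) -> List[str]:
    """Observe -> decide -> act until the policy says "done" or the step budget runs out."""
    for step in range(max_steps):
        observation = desktop.screenshot()         # observe
        action = policy(task, observation, step)   # decide
        if action.kind == "done":
            break
        if action.kind == "click":                 # act
            desktop.click(action.x, action.y)
        elif action.kind == "type":
            desktop.type_text(action.text)
        else:
            raise ValueError(f"unknown action: {action.kind}")
    return desktop.log


def scripted_policy(task: str, observation: str, step: int) -> Action:
    # Hypothetical placeholder for a vision-language model: replays a fixed script.
    script = [Action("click", 400, 80), Action("type", text="Hugging Face HQ"), Action("done")]
    return script[step] if step < len(script) else Action("done")


log = run_agent("Find Hugging Face HQ on Google Maps", MockDesktop(), scripted_policy)
print(log)
```

The step budget (`max_steps`) matters in practice: a real model can loop or stall, for example on a CAPTCHA, so the loop needs a hard stop independent of the model's own "done" signal.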
The Qwen2-VL-72B model understands images of varying resolutions and aspect ratios, and it can understand videos longer than 20 minutes.
[HT]