Over the past couple of days there has been a lot of talk about DeepSeek, but plenty of others have been releasing exciting AI models too. The recently announced Qwen2.5-VL is a vision-language model with advanced visual understanding. It is also agentic, so it can interact with computers and phones.
As we welcome the Chinese New Year, we're thrilled to announce the launch of Qwen2.5-VL, our latest flagship vision-language model!
Qwen Chat: https://t.co/T0nMBnRVBB
Blog: https://t.co/FU7qEgE46j
Hugging Face: https://t.co/N9XSslZX8d
ModelScope:… pic.twitter.com/KgjC2lHcvR
- Qwen (@Alibaba_Qwen) January 27, 2025
The model can handle videos up to an hour long. It can also generate bounding boxes and JSON output for object detection, and more generally it offers structured data outputs. The video in the tweet above gives just a taste of what the model is capable of. The models are already available on Hugging Face.
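To make the structured output a bit more concrete, here is a minimal Python sketch of how a detection reply might be consumed. It assumes the reply contains a JSON array of objects with a `bbox_2d` field (`[x1, y1, x2, y2]` in pixels) and a `label` field. Those key names, the `parse_detections` helper, and the sample reply are illustrative assumptions rather than a confirmed output format, so check the model card before relying on them.

```python
import json
from dataclasses import dataclass


@dataclass
class Detection:
    label: str
    box: tuple[int, int, int, int]  # (x1, y1, x2, y2) in pixels


def parse_detections(reply: str) -> list[Detection]:
    """Extract detections from a reply that contains one JSON array.

    Assumes each item has "bbox_2d" and "label" keys (an assumption,
    not a confirmed schema).
    """
    # The model may wrap the JSON in extra text, so slice out the array.
    start, end = reply.find("["), reply.rfind("]")
    if start == -1 or end < start:
        raise ValueError("no JSON array found in reply")
    items = json.loads(reply[start : end + 1])

    detections = []
    for item in items:
        x1, y1, x2, y2 = (int(round(v)) for v in item["bbox_2d"])
        if x2 <= x1 or y2 <= y1:
            continue  # skip boxes with zero or negative area
        detections.append(Detection(item["label"], (x1, y1, x2, y2)))
    return detections


# Hypothetical reply text, made up for illustration (not real model output).
sample_reply = (
    'Here are the objects: [{"bbox_2d": [10, 20, 110, 220], "label": "cat"}, '
    '{"bbox_2d": [50, 50, 50, 90], "label": "dot"}]'
)

for det in parse_detections(sample_reply):
    print(det.label, det.box)
```

The second sample box has zero width, so the sketch drops it. Validating boxes like this before drawing or cropping is a sensible precaution with any model-generated coordinates.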