Qwen 3 VL is a multimodal AI model from Alibaba Cloud that applies advanced reasoning to text, images, and video. It comes in several sizes, and even the 4B version requires a decent GPU. The good news is that Qwen 3 VL can run on an iPhone 17 Pro with MLX, Apple's machine learning framework for Apple silicon. The model offers better visual understanding and OCR without compromising text performance.
The latest Qwen 3 VL by @Alibaba_Qwen running on iPhone 17 Pro with MLX
Qwen 3 VL brings upgraded visual understanding, recognition, and OCR capabilities without sacrificing text performance like previous models
The 4B model here is close to Qwen 2.5 VL 72B in many benchmarks pic.twitter.com/yRpZdT1z7b
— Adrien Grondin (@adrgrondin) October 18, 2025
As Adrien Grondin notes, the 4B model comes close to the much larger Qwen 2.5 VL 72B on many benchmarks.
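
To give a rough sense of why even a 4B model calls for capable hardware, here is a back-of-the-envelope sketch in Python that estimates the memory needed for the weights alone at different precisions. The parameter count is the nominal "4B" taken at face value, and the bit widths are illustrative assumptions; the source does not say which precision or quantization the iPhone demo uses. Activations, the KV cache, and runtime overhead are not counted, so real usage will be higher.

```python
def weight_memory_gb(num_params: float, bits_per_weight: int) -> float:
    """Approximate memory for model weights alone, in decimal gigabytes."""
    bytes_total = num_params * bits_per_weight / 8
    return bytes_total / 1e9


# Assumption: "4B" read as exactly 4 billion parameters; the real count may differ.
NOMINAL_PARAMS = 4e9

# Assumption: common precisions chosen for illustration, not taken from the source.
for bits in (16, 8, 4):
    size = weight_memory_gb(NOMINAL_PARAMS, bits)
    print(f"{bits}-bit weights: ~{size:.1f} GB")
```

The estimate scales linearly with bits per weight, which is why lower-precision weights are the usual route to fitting a model of this size into a phone's memory.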