OmniParser V2

Turn any LLM into a Computer Use Agent

OmniParser "tokenizes" UI screenshots, converting them from raw pixels into structured elements that LLMs can interpret. Given this set of parsed interactable elements, an LLM can perform retrieval-based next-action prediction.
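To make the idea of parsed, interactable elements concrete, here is a minimal sketch. It assumes a parser emits a list of elements, each with a type, a text description, a normalized bounding box, and an interactable flag. `UIElement`, `to_prompt`, `retrieve`, and `center` are hypothetical names and are not OmniParser's actual output format or API. The word-overlap ranking is a toy stand-in for the LLM that would normally choose the next action.

```python
from dataclasses import dataclass


@dataclass
class UIElement:
    """Hypothetical structured element, as a screen parser might emit it."""
    element_id: int
    kind: str                                # e.g. "button", "text", "icon"
    text: str                                # caption or OCR text for the element
    bbox: tuple[float, float, float, float]  # normalized (x1, y1, x2, y2)
    interactable: bool


def to_prompt(elements: list[UIElement]) -> str:
    """Serialize only the interactable elements into lines an LLM can read."""
    return "\n".join(
        f"[{e.element_id}] {e.kind}: {e.text}" for e in elements if e.interactable
    )


def retrieve(elements: list[UIElement], instruction: str, k: int = 1) -> list[UIElement]:
    """Rank interactable elements by word overlap with the instruction.

    Toy stand-in for LLM-based selection; sorted() is stable, so ties
    keep the parser's original order.
    """
    words = set(instruction.lower().split())
    candidates = [e for e in elements if e.interactable]
    ranked = sorted(
        candidates,
        key=lambda e: len(words & set(e.text.lower().split())),
        reverse=True,
    )
    return ranked[:k]


def center(e: UIElement) -> tuple[float, float]:
    """Click target: the center of the element's bounding box."""
    x1, y1, x2, y2 = e.bbox
    return ((x1 + x2) / 2, (y1 + y2) / 2)


# Usage with made-up parser output for a login screen.
elements = [
    UIElement(0, "text", "Welcome back", (0.10, 0.05, 0.50, 0.10), False),
    UIElement(1, "button", "Sign in", (0.40, 0.50, 0.60, 0.55), True),
    UIElement(2, "icon", "settings gear", (0.90, 0.02, 0.97, 0.07), True),
]

if __name__ == "__main__":
    print(to_prompt(elements))
    target = retrieve(elements, "open the settings menu")[0]
    print(target.element_id, center(target))
```

The non-interactable "Welcome back" label is filtered out of the prompt, so the model only chooses among elements it can act on. The chosen element's bounding box then gives a concrete click coordinate.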
Free

Chris Messina
Hunter

Microsoft Research has unveiled their own Computer Use model trained on a ton of labeled screenshots.


V2 achieves a 60% improvement in latency over V1 (average latency: 0.6 s/frame on an A100, 0.8 s/frame on a single RTX 4090).

André J

Really cool! Hopefully it will be ported to more languages soon!

Shivam Singh

Congrats on the launch and lots of wins to the team :)