OmniParser V2

Turn any LLM into a Computer Use Agent

5.0•1 review•

334 followers


OmniParser ‘tokenizes’ UI screenshots from pixel space into structured elements that LLMs can interpret. This lets an LLM perform retrieval-based next-action prediction over the set of parsed interactable elements.
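To make the idea concrete, here is a minimal sketch of how parsed elements could be handed to an LLM and how its choice could be mapped back to a screen position. It does not use OmniParser's actual API: `ParsedElement`, `build_prompt`, and `click_point` are hypothetical names, and the normalized bounding-box layout is an assumption made for illustration.

```python
from dataclasses import dataclass


@dataclass
class ParsedElement:
    """Hypothetical structured element produced by a screen parser."""
    element_id: int
    kind: str            # e.g. "icon" or "text" (assumed categories)
    label: str           # short description of the element
    bbox: tuple          # assumed normalized (x1, y1, x2, y2) in [0, 1]
    interactable: bool


def build_prompt(task, elements):
    """List the interactable elements so an LLM can pick one by ID."""
    lines = [f"Task: {task}", "Interactable elements:"]
    for el in elements:
        if el.interactable:
            lines.append(f"[{el.element_id}] {el.kind}: {el.label}")
    lines.append("Reply with the ID of the element to act on next.")
    return "\n".join(lines)


def click_point(elements, chosen_id, width, height):
    """Convert the chosen element's box center to pixel coordinates."""
    by_id = {el.element_id: el for el in elements}
    x1, y1, x2, y2 = by_id[chosen_id].bbox
    return (round((x1 + x2) / 2 * width), round((y1 + y2) / 2 * height))


elements = [
    ParsedElement(0, "icon", "Settings", (0.25, 0.5, 0.75, 1.0), True),
    ParsedElement(1, "text", "Welcome back", (0.0, 0.0, 0.5, 0.25), False),
]
prompt = build_prompt("Open the settings page", elements)
target = click_point(elements, 0, width=1000, height=500)
```

In this sketch the LLM only ever sees the short text list, not pixels; the ID it returns is resolved to a click position by the calling code.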

Free

Launch tags:

User Experience•Artificial Intelligence•GitHub

Launch Team

Chris Messina

Ambassador

Hunter

Microsoft Research has unveiled their own Computer Use model trained on a ton of labeled screenshots.

V2 achieves a 60% improvement in latency compared to V1 (average latency: 0.6 s/frame on an A100, 0.8 s/frame on a single 4090).

6mo ago

André J

Really cool! Hopefully it will be ported to more languages soon!

6mo ago

Shivam Singh

Shram

Congrats on the launch and lots of wins to the team :)

6mo ago
