Impressive benchmarks on zero-shot tasks! The vision encoder's performance suggests Meta has made significant architectural innovations in cross-modal representation learning. Particularly curious about the training methodology - is this leveraging a new paradigm beyond contrastive learning?
Replies
Hey guys, can you please upload the video for this launch again? (ATM, it doesn't show the thumbnail.)
Republishing should clear the bug.
P.S.: This is very interesting. It's similar to the video-understanding product I saw 2 days ago, hunted by @zaczuo – Twelvelabs – plus a kind of "video reading" I've seen in Notebooks.app by @dev_singh.
@busmark_w_nika I have just edited the video.
@saaswarrior I can see the final result, good job! :)
👋 Hey Hunters!
Introducing Meta Perception Encoder — Meta FAIR's powerful new family of vision-language models!
From zero-shot classification to multimodal reasoning, PE pushes the boundaries of what's possible in computer vision. With variants like PE-Core, PE-Lang, and PE-Spatial, it’s designed to tackle everything from image understanding to dense spatial tasks — all using a single contrastive objective.
What’s exciting?
✅ Intermediate embeddings for richer representations
✅ Advanced alignment techniques
✅ Strong zero-shot and retrieval performance
✅ Open-source and research-friendly!
Built for researchers, developers, and AI enthusiasts alike — let’s reimagine visual understanding together.
Would love your feedback! 💬👇
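For anyone curious how contrastive zero-shot classification works under the hood, here is a minimal sketch. This is not PE's actual API; it just illustrates the general recipe shared by contrastive vision-language models: embed the image and one text prompt per class, then pick the class whose text embedding is most similar. The toy random vectors stand in for real encoder outputs.

```python
import numpy as np

def zero_shot_classify(image_emb, text_embs, temperature=0.07):
    """Score an image embedding against class-prompt text embeddings
    by cosine similarity, then softmax into class probabilities."""
    img = image_emb / np.linalg.norm(image_emb)
    txt = text_embs / np.linalg.norm(text_embs, axis=1, keepdims=True)
    logits = img @ txt.T / temperature          # cosine similarities, scaled
    exp = np.exp(logits - logits.max())         # stable softmax
    return exp / exp.sum()                      # one probability per class

# Toy embeddings standing in for real encoder outputs.
rng = np.random.default_rng(0)
image_emb = rng.normal(size=128)
text_embs = rng.normal(size=(3, 128))
text_embs[1] += 0.5 * image_emb                 # make class 1 the best match
probs = zero_shot_classify(image_emb, text_embs)
print(probs.argmax())                           # index of the best-matching prompt
```

With a real model you would replace the random vectors with the outputs of the image and text encoders; because both are trained against the same contrastive objective, nearest-neighbor search in that shared space is all "zero-shot" classification requires.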
@saaswarrior Super impressive launch! Love the focus on visual understanding. How beginner-friendly is it for someone just getting into AI?
Telebugs
Congrats on the launch! Curious to see what models it surpasses