Ankit Sharma

Meta Perception Encoder - Vision encoder setting new standards in image & video tasks

It excels in zero-shot classification and retrieval, surpassing existing models.

Nika
Ambassador

Hey guys, can you please upload the video for this launch again? (At the moment, it doesn't show the thumbnail.)

Republishing should fix the bug.

P.S.: This is very interesting. It's similar to the video-understanding launch I saw two days ago, hunted by @zaczuo (Twelve Labs), plus the kind of "video reading" I've seen in Notebooks.app by @dev_singh.

Ankit Sharma

@busmark_w_nika I have just edited the video.

Nika
Ambassador

@saaswarrior I can see the final result, good job! :)

Zac Zuo

@busmark_w_nika For Twelve Labs, @jaelee_ should take all the credit! 🙌

Ankit Sharma

👋 Hey Hunters!

Introducing Meta Perception Encoder — Meta FAIR's powerful new family of vision-language models!

From zero-shot classification to multimodal reasoning, PE pushes the boundaries of what's possible in computer vision. With variants like PE-Core, PE-Lang, and PE-Spatial, it’s designed to tackle everything from image understanding to dense spatial tasks — all using a single contrastive objective.
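Since PE is trained with a single contrastive objective, zero-shot classification works the CLIP way: embed the image and a text prompt per class into a shared space, then pick the class whose prompt is most similar. Here's a minimal sketch of that mechanism using NumPy with random stand-in embeddings (the actual PE model, checkpoints, and preprocessing are not shown, and the function and prompt names are illustrative only):

```python
import numpy as np

def zero_shot_classify(image_emb, text_embs, labels, temperature=0.01):
    """CLIP-style zero-shot classification: choose the label whose text
    embedding has the highest cosine similarity to the image embedding."""
    # L2-normalize so dot products equal cosine similarities
    image_emb = image_emb / np.linalg.norm(image_emb)
    text_embs = text_embs / np.linalg.norm(text_embs, axis=1, keepdims=True)
    sims = text_embs @ image_emb              # one similarity per label
    probs = np.exp(sims / temperature)
    probs /= probs.sum()                      # softmax over labels
    best = int(np.argmax(probs))
    return labels[best], probs

# Stand-in embeddings; a real encoder such as PE-Core would produce these.
rng = np.random.default_rng(0)
dim = 512
labels = ["a photo of a cat", "a photo of a dog", "a photo of a car"]
text_embs = rng.normal(size=(len(labels), dim))
image_emb = text_embs[1] + 0.1 * rng.normal(size=dim)  # "dog-like" image

pred, probs = zero_shot_classify(image_emb, text_embs, labels)
print(pred)  # the "dog" prompt wins, since the image embedding sits near it
```

The same similarity lookup powers retrieval: rank a gallery of image embeddings against one text embedding instead of ranking prompts against one image.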

What’s exciting?

✅ Intermediate embeddings for richer representations

✅ Advanced alignment techniques

✅ Strong zero-shot and retrieval performance

✅ Open-source and research-friendly!

Built for researchers, developers, and AI enthusiasts alike — let’s reimagine visual understanding together.

Would love your feedback! 💬👇

Ambika Vaish

@saaswarrior Super impressive launch! Love the focus on visual understanding. How beginner-friendly is it for someone just getting into AI?

Kyrylo Silin

Congrats on the launch! Curious to see what models it surpasses.