Impressive benchmarks on zero-shot tasks! The vision encoder's performance suggests Meta has made significant architectural innovations in cross-modal representation learning. Particularly curious about the training methodology - is this leveraging a new paradigm beyond contrastive learning?
Replies
Hey guys, can you please upload the video for this launch again? (ATM, it doesn't show the thumbnail.)
Republishing should clear the bug.
P.S.: This is very interesting. It's similar to the video-understanding product I saw 2 days ago, hunted by @zaczuo – Twelvelabs – plus a kind of "video reading" I've seen in Notebooks.app by @dev_singh.
@busmark_w_nika I have just edited the video.
@saaswarrior I can see the final result, good job! :)
👋 Hey Hunters!
Introducing Meta Perception Encoder — Meta FAIR's powerful new family of vision-language models!
From zero-shot classification to multimodal reasoning, PE pushes the boundaries of what's possible in computer vision. With variants like PE-Core, PE-Lang, and PE-Spatial, it’s designed to tackle everything from image understanding to dense spatial tasks — all using a single contrastive objective.
What’s exciting?
✅ Intermediate embeddings for richer representations
✅ Advanced alignment techniques
✅ Strong zero-shot and retrieval performance
✅ Open-source and research-friendly!
Built for researchers, developers, and AI enthusiasts alike — let’s reimagine visual understanding together.
Would love your feedback! 💬👇
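For anyone curious how contrastive zero-shot classification works under the hood, here is a minimal sketch. This is not PE's actual API; it just illustrates the general recipe shared by contrastive vision-language models: embed the image and one text prompt per class, then pick the class whose text embedding is most similar. The toy random vectors stand in for real encoder outputs.

```python
import numpy as np

def zero_shot_classify(image_emb, text_embs, temperature=0.07):
    """Score an image embedding against class-prompt text embeddings
    by cosine similarity, then softmax into class probabilities."""
    img = image_emb / np.linalg.norm(image_emb)
    txt = text_embs / np.linalg.norm(text_embs, axis=1, keepdims=True)
    logits = img @ txt.T / temperature          # cosine similarities, scaled
    exp = np.exp(logits - logits.max())         # stable softmax
    return exp / exp.sum()                      # one probability per class

# Toy embeddings standing in for real encoder outputs.
rng = np.random.default_rng(0)
image_emb = rng.normal(size=128)
text_embs = rng.normal(size=(3, 128))
text_embs[1] += 0.5 * image_emb                 # make class 1 the best match
probs = zero_shot_classify(image_emb, text_embs)
print(probs.argmax())                           # index of the best-matching prompt
```

With a real model you would replace the random vectors with the outputs of the image and text encoders; because both are trained against the same contrastive objective, nearest-neighbor search in that shared space is all "zero-shot" classification requires.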
@saaswarrior Super impressive launch! Love the focus on visual understanding. How beginner-friendly is it for someone just getting into AI?
Telebugs
Congrats on the launch! Curious to see what models it surpasses