
Deep Lake AI Knowledge Agent conducts Deep Research on your data, no matter its modality, location, or size. Deep Lake supports multi-modal retrieval from the ground up. It uses vision language models for data ingestion and retrieval, so you can connect any data (PDFs, images, videos, structured data, etc.), stored anywhere, to AI. Over time, it learns from your queries, tailoring the results to your work! Deep Lake is used by Fortune 500 companies like Bayer, Matterport, and others.
Hi Product Hunt!
I'm Davit Buniatyan, CEO of Activeloop (YC S18). We're introducing Deep Lake AI Knowledge Agent, which conducts Deep Research on your data, no matter its modality, location, or size. Deep Lake supports multi-modal retrieval from the ground up. It uses vision language models for data ingestion and retrieval, so you can connect any data (PDFs, images, videos, structured data, etc.), stored anywhere, to AI.
Deep Lake can search your data from S3, Dropbox, and GCP. Over time, it learns from your queries, tailoring the results to your work!
Here are some example use cases:
1. Financial analysis
Connect earnings call transcripts, data from the Bloomberg terminal, and PowerPoint decks from earnings or annual reports to analyze thousands of companies!
2. Scientific research
Connect patient EHR data, medical research (both public and internal), and lab results to discover new drugs (this is what a few Fortune 500 and leading biotech companies are doing with us already).
3. Legal work: patent search, patent generation, and contract review
Cross-reference diagrams, document scans, invoices, etc., and ask questions over them! A few companies use us to search across millions of patents worldwide and generate novel, defensible patents! Examples here and here.
You may ask, what is our superpower?
There's a long answer and a short answer (I'll let Sasun, our Director of Engineering, handle the more technical part). But in short, I've spent a good chunk of my time at Princeton researching how to store complex data and connect it to AI.
TL;DR - we're multi-modal (i.e., we handle all kinds of data, not just text or vectors) and highly accurate!
This is hard to achieve, but the key is storing data in an AI-native format: representing unstructured data in a columnar layout. At Activeloop, we figured out how to connect any data from your storage and process it to extract as much information as possible without complicated OCR pipelines. Then we store it in that AI-native format and retrieve it from storage highly efficiently (more on that from Sasun), querying across multiple datasets at once.
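To make the columnar, AI-native layout concrete, here is a minimal sketch using the open-source deeplake Python package (v3-style API). The dataset path, tensor names, and sample values are illustrative assumptions, not our production schema:

```python
import deeplake
import numpy as np

# Create an in-memory dataset; in practice the path could be s3://... or gcs://...
ds = deeplake.empty("mem://demo")

# Each modality becomes a typed, compressed column ("tensor"), not an opaque blob.
ds.create_tensor("pages", htype="image", sample_compression="jpeg")
ds.create_tensor("text", htype="text")
ds.create_tensor("embedding", htype="embedding")

# Append one multi-modal sample (the file name and values are hypothetical).
ds.append({
    "pages": deeplake.read("report_page_1.jpg"),
    "text": "Q4 revenue grew 12% year over year.",
    "embedding": np.random.rand(768).astype("float32"),
})
```

The point of the layout is that each modality lives in its own typed column that can be scanned, filtered, and streamed independently, instead of being buried in an opaque file.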
As model reasoning improves (our Agent can work with any model, closed-source or otherwise), we've unlocked the missing piece: we provide best-in-class AI search, while smart models generate answers to complex questions from the evidence we find for them!
I'm very happy to share our hard work with you - please ask me anything about the product, our journey through YC to here, and more.
@david_buniatyan very exciting!
@david_buniatyan 🔥
@vahan25 indeed!
@david_buniatyan This is absolutely brilliant!
@david_buniatyan this is fantastic, really excited to see it's happening
Hey folks,
Mikayel here! I am the voice (and the face) in the launch video, and Activeloop's Head of Marketing and Growth.
I’d like to answer a few common questions that I’ve heard from talking to 200+ early adopters in person. It should give you context on how you can get the most out of Deep Lake AI Knowledge Agent.
How is it different from other search/RAG/Deep Research tools?
Short answer: our team knows their sh*t! We've spent the last 7 years building for this moment - rethinking how to organize unstructured data stored in different places and connect it to AI.
That's why Deep Lake, in comparison to others:
- Is truly multi-modal (i.e., it detects more information to feed into AI).
- Works on private data (unlike OpenAI, for instance, which currently doesn't).
- Works on data at any scale, and across clouds (or locally)!
- Has a bring-your-own-model feature (to be released soon) that lets users choose which reasoning LLM (open-source or closed-source) to use.
I've summarized more differences here.
Are you releasing an API for Deep Research?
Yes… As a matter of fact, whoever comments under this will get early access from yours truly.
Can I share links to my conversations?
Yes, you can -> e.g. https://chat.activeloop.ai/mikayel/conversations/67ba1696a0a3d652b5b5ebe8 (you need to be logged in to search).
____
I am thrilled to finally go live on Product Hunt after almost six years of building (well, this product capability took less time to build, but it has all culminated in this moment). Data infrastructure for AI is really freaking tough to build. Kudos to our insanely talented engineers for developing a rocket engine to rival giants like OpenAI (while the rocket ship is still flying into outer space towards singularity).
As one of the only non-technical folks on the team, I’d be happy to answer any questions below or in our Slack community (slack.activeloop.ai). Thanks for having us!
@mikayel_harut great post!
Hi Product Hunt!
I'm Sasun, Activeloop's (YC S18) Director of Engineering. I previously co-founded Pixomatic, one of the early successful photo-editing apps. Naturally, one of the things that excites me is how to visualize (and query) unstructured data, like images.
Except… back in the day, there was no SQL for images.
Then I met @david_buniatyan, who started Activeloop with that mission: store complex data - images, videos, text, etc. - in a more organized way, and make it easily connectible to AI (for training, and for asking questions!).
This comes with a number of exciting technical challenges.
1. Unstructured data is… well… unstructured. It's hard to search across such data (imagine asking for all the images that contain bicycles larger than 200x350 pixels, and two people in them).
Retrieval systems before Deep Lake weren't built for that.
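For a flavor of how such a filter can be expressed, here is a sketch using Deep Lake's Tensor Query Language (TQL) from Python. The dataset path and tensor name (labels) are assumptions for illustration; finer constraints like box sizes or object counts would depend on your own schema:

```python
import deeplake

# Load a hypothetical detection dataset (the path is illustrative).
ds = deeplake.load("hub://org/street-scenes")

# A TQL query over unstructured image data; `contains` matches class labels.
results = ds.query(
    "select * where contains(labels, 'bicycle') and contains(labels, 'person')"
)
print(len(results))  # number of matching images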
2. Vector Search is inaccurate.
Achieving accuracy in AI-generated insights is challenging, especially in sectors like legal and healthcare, where accuracy is paramount. The issue magnifies with scale - for instance, when searching through the world's entire scientific research corpus.
3. Limited Memory
Most data lives in data lakes on object storage (S3, GCP, and the like), yet bolting a vector index onto traditional database architectures does not provide the scalability AI workloads require. As your dataset grows, the memory and compute requirements of the index grow linearly; for datasets past 100M samples, the cost of keeping the index in memory becomes prohibitive.
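For a sense of scale, here is a back-of-the-envelope calculation (the embedding dimensionality and dtype are illustrative assumptions, not our measurements):

```python
# Rough memory footprint of an in-memory vector index (illustrative numbers).
num_vectors = 100_000_000   # 100M embeddings
dims = 1536                 # assumed embedding dimensionality
bytes_per_value = 4         # float32

raw_gib = num_vectors * dims * bytes_per_value / 1024**3
print(f"~{raw_gib:.0f} GiB for the raw vectors alone")  # ~572 GiB, before index overhead
```

Graph-based indexes like HNSW add further per-vector overhead on top of the raw vectors, which is why serving the index from object storage instead of RAM changes the cost profile.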
My team and I focused on building this as Deep Lake's ‘unfair advantage’, since we're geared towards analytical use cases where users need to ask questions across complex, large datasets. As a result, we're up to 10x more efficient than in-memory approaches.
4. AI Agents can fail… spectacularly
Not claiming we've totally solved this issue, but if there's even a 1% probability of failing or responding inaccurately at each step, a complex multi-step system suffers a ‘butterfly’ effect: with every additional step, the probability of failure compounds.
So increasing retrieval accuracy is important - in critical verticals (autonomous driving, life sciences, healthcare, finance) it can be a matter of life and death, or of incalculable losses.
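To put toy numbers on that compounding (an illustration, not a benchmark): if each step succeeds independently with probability p, an n-step chain succeeds with probability p**n.

```python
# Compounding error in a multi-step agent (toy model, independent steps).
p = 0.99  # per-step success probability, i.e., a 1% chance of failure per step
for n in (1, 10, 50, 100):
    print(f"{n:>3} steps -> {p**n:.1%} end-to-end success")
# 1 -> 99.0%, 10 -> 90.4%, 50 -> 60.5%, 100 -> 36.6%
```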
More on this in detail (with benchmarks here).
Feel free to ask me any technical questions on Deep Lake's capabilities; I'd be happy to answer.
Thanks for having us.
@khustup thanks for being a part of our journey and shipping an amazing product!
@mikayel_harut I'd say the most challenging part is indexing large-scale data on object storage while keeping the balance between latency and scale.