Turn messy files into actionable data. Upload PDFs, images, audio and websites. Define data points for AI-powered extraction. See results in exportable spreadsheets with linked, highlighted sources. Ask questions, plot charts and draft reports on top.
👋 Hey Product Hunt community!
We're excited to share something we've been working on at Sinaptik (YC W24). After creating pandas-ai (chat with your tables), we've had countless conversations with data analysts and business experts about their daily struggles.
These chats revealed some common frustrations:
1. Valuable insights buried in messy, hard-to-read files
2. The headache of managing documents with different permission settings
3. RAG chatbots that seemed promising but ended up being costly data dumps
4. And the ever-present challenge of bridging the gap between business experts and developers
These weren't just abstract problems - we saw how they affected real people trying to do their jobs efficiently. So, we rolled up our sleeves and got to work. We talked to analysts, data scientists, and business users to understand their needs.
The result is panda·etl, a tool we hope will make life easier for anyone dealing with document-heavy workflows. With panda·etl, you can:
1. Upload those tricky files (you know, the PDFs, images, and audio files that usually cause headaches)
2. Define exactly what data you need (whether it's ESG metrics, competitor data, market trends, risk engineering reports, or insurance claims)
3. Get spreadsheets where you can actually trace where each piece of data came from
4. Easily validate and export your data
5. And use our pandas-ai powered chat to explore your extractions, plot charts and add them to drafts
We've built panda·etl with flexibility in mind, offering solutions ranging from SaaS to on-premise:
1. For individuals, we offer a free personal plan with a monthly cap on documents processed and extractions. It's perfect for trying out the tool and for smaller projects.
2. For businesses, we have scalable plans that grow with your needs, file sizes, and document volumes.
3. For enterprises, we provide custom solutions, including on-premise deployments for those with specific security or compliance requirements.
We're still learning and improving, and that's why we're here. We'd love to hear your thoughts, experiences, or even your data horror stories. How do you deal with unstructured data in your work? What solutions have you tried?
Let's chat - we're genuinely curious to learn from this community! 🙌
@gdc been following you guys for a long time. As a friend, I'm so proud to see you evolving the product and testing things out.
As a fellow startupper, I think you're on track to solve huge problems, not only for enterprise companies but also for startups and scaleups that manage large amounts of data.
Great launch!
Congrats! Do you have an API? My use case: I want to build a QA system for my CSV tables with survey results (some cells may contain numbers, and some contain text), and it needs to work within my product
@s5f5f5f thanks a lot! Yes, we also offer an API that is a perfect match for that. Feel free to reach out, I'd love to learn more about your use case: gabriele@sinaptik.ai
Congrats on the launch, @gabriele_venturi ! I love that there's also an open-source version.
It will be useful across a variety of fields in different industries. Wishing your team great success! - I'll definitely give it a try. :)
Congrats to the panda.etl team! This tool sounds like a fantastic way to simplify turning unstructured data into actionable insights. Looking forward to seeing how it helps streamline data extraction!
This sounds really intriguing, @gdc! I'm curious about the specific types of data points that can be extracted. How customizable is the extraction process for different file types? Also, what kind of support do you offer for users who might be new to data extraction? Would love to know more!
@james_wilson_ it can extract structured data from any unstructured source (PDF, audio, etc.). Your input is a set of unstructured files and your output is an easy-to-use Excel spreadsheet. It's intuitive even for non-technical users, but we also offer an onboarding call!
@james_wilson_ you can try it already on GitHub. We developed it with flexibility, simplicity, and accuracy in mind. After you create a project, you can add a new extraction process, define field names, and get pre-filled descriptions and data types that you can fully modify. Looking forward to hearing your feedback!
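The workflow described above (define field names, get editable descriptions and data types, end up with consistent spreadsheet rows) can be sketched roughly like this. Note this is an illustrative sketch only: the `Field` class, the "insurance claims" schema, and `to_spreadsheet_row` are hypothetical names for the idea, not the actual panda-etl API.

```python
# Illustrative sketch only: field names, types, and the flow below are
# hypothetical, not the actual panda-etl API.
from dataclasses import dataclass


@dataclass
class Field:
    name: str
    description: str  # pre-filled by the tool, editable by the user
    dtype: str        # e.g. "string", "number", "date"


# A user-defined extraction process for, say, insurance claims:
claims_schema = [
    Field("claim_id", "Unique identifier of the claim", "string"),
    Field("claim_amount", "Total amount claimed", "number"),
    Field("incident_date", "Date the incident occurred", "date"),
]


def to_spreadsheet_row(extracted: dict, schema: list) -> list:
    """Order extracted values by the schema so every document lands
    in the same columns of the output spreadsheet."""
    return [extracted.get(f.name) for f in schema]


row = to_spreadsheet_row(
    {"claim_id": "C-1042", "claim_amount": 1250.0, "incident_date": "2024-03-01"},
    claims_schema,
)
print(row)  # ['C-1042', 1250.0, '2024-03-01']
```

The point of a fixed schema is that every processed document produces the same columns, which is what makes the exported spreadsheet easy to validate and trace.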
This sounds like a game-changer for data handling! 🦾 It's like you took the pain points of data analysts and turned them into a scalable solution. The ability to upload various file types and get actionable insights is something we seriously need. Can't wait to see how panda{·}etl evolves—very curious about the pricing models for businesses too! Keep those updates coming, @gdc! How's the initial traction looking?
@wenzhu_zhang1 thanks a lot for the feedback! Initial traction is going great so far! If you want to learn more about the business model, feel free to email me at gabriele@sinaptik.ai
Nice product @gabriele_venturi , can you share a bit more about how it works? Does it extract text from the PDF or does it do OCR?
If it extracts text strings, how do you deal with tables, which can turn out all messed up with newlines and weird formatting?
@francesco_manicardi great question. It does both: text extraction and OCR. We have built a custom parser that identifies the different components of each page (images, text, tables, charts, etc.), parses each individually with the most accurate technique, and formats the result to be easier for LLMs to understand.
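The per-component routing described in that reply can be sketched as a dispatch table: detect what kind of region each part of the page is, then hand it to a specialized handler. This is an illustrative sketch, assuming made-up handler names; the real parser's detection and handlers are internal to panda-etl.

```python
# Illustrative sketch only: handler names and the region format are
# hypothetical; the real component detection is internal to panda-etl.
def parse_table(region):
    return "markdown table"  # e.g. rebuild rows/columns into markdown


def parse_text(region):
    return "plain text"      # e.g. extracted text strings


def parse_image(region):
    return "OCR text"        # e.g. run OCR on the bitmap


HANDLERS = {"table": parse_table, "text": parse_text, "image": parse_image}


def parse_page(regions):
    """Parse each detected region with the technique best suited to
    its kind, then concatenate into one LLM-friendly document."""
    return "\n\n".join(HANDLERS[kind](region) for kind, region in regions)


doc = parse_page([("text", b"..."), ("table", b"..."), ("image", b"...")])
print(doc)  # plain text / markdown table / OCR text, blank-line separated
```

Routing each region separately is what avoids the "table flattened into newline soup" problem the question raises: the table handler can rebuild structure instead of treating it as running text.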
@daniel_bukac thanks a lot for the feedback! We are going to focus more and more on a no-code UX, adding more and more pipelines from the community. The goal is that everyone can run pipelines on panda·etl, no matter their technical expertise!
Hey Giuseppe,
How does panda·etl handle different languages or document formats that might have inconsistent layouts?
For the on-premise deployments, what kind of setup and maintenance is typically required?
Congrats on the launch!
@kyrylosilin we have built our own parser, which is able to split each page into one or more areas and apply the best technique to parse the data accordingly!
As for the on-prem, it depends a lot on the specific use case. It can be as easy as a Docker container for simple use cases, while more complex architectures (Terraform, Kubernetes, etc.) might be needed depending on the volumes!
If you have any questions about it, drop a message anytime!
Read in one of the comments that you offer an API… I run a copier dealership with hundreds of copier scanners that produce thousands of PDFs. Pretty sure our clients (banks, insurance companies, BPOs, car dealerships, hospitals, etc.) would find it useful. How can we get in touch?
I've tried a few RAG-enabled tools, but none of them seem to be effective. Will try this out - looks very promising. I like how it's open source, free (with a limit), and how you can create workflows to automate file processing. Would be cool to see how others are handling files, if they want to share the workflows they've built!
Congrats on the launch @gdc and team!
Congrats on the launch @gdc! I'm really looking forward to trying this.
Working with several clients in the past years, I can see how much value they could get from more open access to data! 👏
This is really interesting. I think the summary of the 4 main issues is spot on for data scientists/analysts, especially permission settings: it took us a week just to hand out the right credentials and permissions for different database access. I've checked out your pricing and I wasn't sure how far the 500 credits would get me. Is it measured in number of files, or total file size, and how much would you say that is? Again, congratulations Giuseppe!
@daniel_xpo the pricing is based on characters or pages, whichever is lower. The free plan includes at least 1000 pages per month. Thanks a lot for the great feedback. Looking forward to hearing more if you give it a try!
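The "characters or pages, whichever is lower" rule can be made concrete with a small calculation. This is a sketch under assumed conversion rates (`CHARS_PER_CREDIT` and `PAGES_PER_CREDIT` are hypothetical numbers, not published pricing); only the min-of-two-meters logic comes from the reply above.

```python
# Illustrative sketch: unit sizes below are assumptions, not real pricing.
CHARS_PER_CREDIT = 3000  # assumed characters per credit
PAGES_PER_CREDIT = 2     # assumed pages per credit


def credits_used(chars: int, pages: int) -> int:
    """Meter a job on characters or pages, whichever yields the
    lower charge (the rule stated in the reply)."""
    by_chars = -(-chars // CHARS_PER_CREDIT)  # ceiling division
    by_pages = -(-pages // PAGES_PER_CREDIT)
    return min(by_chars, by_pages)


# A 4-page, 9000-character document: 3 credits by characters,
# 2 credits by pages, so the cheaper meter (pages) applies.
print(credits_used(chars=9000, pages=4))  # 2
```

Under this rule, text-dense short documents get billed by page count while long sparse documents get billed by characters, whichever is cheaper for the user.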
Congrats on launching Panda ETL! 🎉The idea of building ETLs without coding sounds like it could save me hours of work each week. By the way, I'm curious about how it handles really large datasets. Does it have any limitations on data volume or processing speed?
@sawana_h I swear this is only the beginning, stay tuned, we are planning to disrupt the way people do ETLs! It handles large datasets very well, but at the moment it parallelizes up to 3 processes at a time. We're working hard to scale it soon though!
@gdc Congratulations on the launch! Your platform’s ability to turn various file types into actionable data and highlight sources is impressive. At InterWiz, we're revolutionizing the hiring process with on-demand AI-powered interviews and instant evaluations, ensuring top talent is swiftly identified and onboarded. How do you plan to enhance data point definition for more precise AI-powered extraction?
@gabriele_venturi Thanks for the conversation and for sharing your insights! In case you haven't followed InterWiz yet, hover over my "coming soon" badge and click "notify me". We would love your support and feedback when we launch.