Kadoa uses AI to explore, extract, and transform web data. Save hours of time setting up and creating web scrapers. Extract the data you need effortlessly with Kadoa.
Hi PH! 👋
We got frustrated with the time and effort required to code and maintain custom web scrapers, so we built an LLM-based solution that can extract data from any website in the format you want. AI should automate tedious and uncreative work, and web scraping definitely fits this description.
We're leveraging large language models to semantically understand websites and generate the DOM selectors for them. Using GPT for every data extraction, as most comparable tools do, would be way too expensive and very slow, but using LLMs to generate the scraper code and subsequently adapt it to website modifications is highly efficient.
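To make the cost argument concrete, here is a minimal sketch of the "generate selectors once, reuse them, regenerate only when the site changes" idea. It is an illustration under stated assumptions, not Kadoa's actual code: `StubLLM`, `Scraper`, and `apply_selectors` are hypothetical names, the model call is faked, and `xml.etree.ElementTree` stands in for a real HTML selector engine only to keep the example dependency-free (it requires well-formed markup).

```python
import xml.etree.ElementTree as ET


class StubLLM:
    """Hypothetical stand-in for the expensive model call; counts invocations."""

    def __init__(self):
        self.calls = 0

    def generate_selectors(self, page: str) -> dict:
        self.calls += 1
        root = ET.fromstring(page)
        # Pretend the model read the markup and chose a selector for "price".
        for path in (".//span[@class='price']", ".//div[@class='cost']"):
            if root.find(path) is not None:
                return {"price": path}
        return {}


def apply_selectors(page: str, selectors: dict) -> dict:
    """Cheap step: run cached selectors against a page, no model involved."""
    root = ET.fromstring(page)
    result = {}
    for field, path in selectors.items():
        node = root.find(path)
        result[field] = node.text if node is not None else None
    return result


class Scraper:
    def __init__(self, llm: StubLLM):
        self.llm = llm
        self.selectors = None  # filled in on first use, then reused

    def extract(self, page: str) -> dict:
        if self.selectors is not None:
            data = apply_selectors(page, self.selectors)
            if data and all(value is not None for value in data.values()):
                return data  # cheap path: no model call
        # First run, or the cached selectors stopped matching: regenerate once.
        self.selectors = self.llm.generate_selectors(page)
        return apply_selectors(page, self.selectors)


old_layout = "<html><body><span class='price'>10</span></body></html>"
new_layout = "<html><body><div class='cost'>12</div></body></html>"

scraper = Scraper(StubLLM())
for page in (old_layout, old_layout, new_layout, new_layout):
    print(scraper.extract(page), "model calls so far:", scraper.llm.calls)
```

In this sketch the model is consulted only on the first page and again when the layout changes; every other extraction is a plain selector lookup.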
Try it out for free on our playground https://kadoa.com/playground and let us know what you think! And please don't bankrupt us :)
Here are a few examples:
- Product Listings (Specialized Bikes) https://www.kadoa.com/playground...
- Financial Data (Yahoo Finance) https://www.kadoa.com/playground...
- Player Stats (LeagueOfGraphs) https://www.kadoa.com/playground...
🛠️ How it works 🛠️ (the playground uses a simplified version of this):
- Loading the website: automatically decide what kind of proxy and browser we need
- Analysing network calls: Try to find the desired data in the network calls
- Preprocessing the DOM: remove all unnecessary elements, compress it into a structure that GPT can understand
- Slicing: Slice the DOM into multiple chunks while still keeping the overall context
- Selector extraction: Use GPT (or Flan-T5) to find the desired information with the corresponding selectors
- Data extraction in the desired format
- Validation: Hallucination checks and verification that the data is actually on the website and in the right format
- Data transformation: Clean and map the data (e.g. if we need to aggregate data from multiple sources into the same format). LLMs are great at this task too
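As a rough illustration of three of the steps above (preprocessing, slicing, and validation), here is a small standard-library sketch. Everything in it is an assumption made for the example: the names `DomCompressor`, `compress`, `slice_dom`, and `validate`, the choice of which tags and attributes to keep, and the idea of preserving "overall context" by prefixing each chunk with a shared context string. The model call for selector extraction is left out entirely.

```python
import re
from html.parser import HTMLParser

DROP_TAGS = {"script", "style", "noscript"}  # assumed to be noise
KEEP_ATTRS = {"id", "class"}                 # assumed to be enough for selectors


class DomCompressor(HTMLParser):
    """Drops noisy tags and most attributes, and collapses whitespace."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []   # compact markup tokens, in document order
        self.text = []    # visible text only, used later for validation
        self.skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in DROP_TAGS:
            self.skip_depth += 1
            return
        if self.skip_depth:
            return
        kept = "".join(f' {k}="{v}"' for k, v in attrs if k in KEEP_ATTRS and v)
        self.parts.append(f"<{tag}{kept}>")

    def handle_endtag(self, tag):
        if tag in DROP_TAGS:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif not self.skip_depth:
            self.parts.append(f"</{tag}>")

    def handle_data(self, data):
        text = " ".join(data.split())
        if text and not self.skip_depth:
            self.parts.append(text)
            self.text.append(text)


def compress(html: str):
    parser = DomCompressor()
    parser.feed(html)
    parser.close()
    return parser.parts, " ".join(parser.text)


def slice_dom(parts, max_chars: int, context: str):
    """Greedy chunking; a single oversized token still gets its own chunk."""
    chunks, current, size = [], [], 0
    for part in parts:
        if current and size + len(part) > max_chars:
            chunks.append(context + "".join(current))
            current, size = [], 0
        current.append(part)
        size += len(part)
    if current:
        chunks.append(context + "".join(current))
    return chunks


def validate(record: dict, page_text: str, formats: dict):
    """Return a list of problems: values not on the page or in the wrong format."""
    errors = []
    for field, value in record.items():
        if value not in page_text:
            errors.append(f"{field}: value not found on page")
        elif not re.fullmatch(formats[field], value):
            errors.append(f"{field}: unexpected format")
    return errors


sample = (
    "<html><head><style>p{color:red}</style><script>var x = 1;</script></head>"
    "<body><div class='item' data-x='1'><span class='price'>  $10.00 </span></div>"
    "</body></html>"
)
parts, page_text = compress(sample)
for chunk in slice_dom(parts, max_chars=40, context="[page: demo] "):
    print(chunk)
print(validate({"price": "$10.00"}, page_text, {"price": r"\$\d+\.\d{2}"}))
```

The validation step here is deliberately simple: a value that the model returns but that never appears in the page text is treated as a possible hallucination, and a value that appears but does not match the expected pattern is flagged as a format problem.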
The vision is a fully autonomous, cost-efficient, and reliable web scraper :)
Really sad that we could not test it!
What's the idea of sharing the product if it's not ready for testing yet?
You could maybe add an alpha test with a code for Product Hunt users.
Kadoa is a game-changer for web scraping. I have been using it for a while and I love how simple and fast it is. All I have to do is enter the URL of the website I want to scrape and Kadoa does the rest. It uses AI to find the data I need and puts it in a spreadsheet for me. It can handle any website, even if it has dynamic content, pagination, or CAPTCHAs. Kadoa has made my web scraping tasks so much easier and cheaper. It is not perfect: sometimes it misses some data or gets confused by complex layouts, but it is still the best web scraping tool I have ever used. I would definitely recommend it to anyone who needs to scrape websites without coding.
This is really cool. I just ran a scraping job that I had been struggling with, and it went off without a hitch. I have yet to try it on other datasets, since this was only a trial, but it should simplify my workflow a lot. Thanks, and congrats on the launch!