Cradl AI

Build accurate document parsing models using deep learning.

Cradl AI enables developers to train deep learning models for document parsing. Integrate your model into your own apps using RESTful APIs. No ML experience required.
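In practice, integration is a single HTTP call per document. The sketch below is a hypothetical example of what that could look like; the base URL, endpoint path, field names, and auth scheme are illustrative assumptions, not the actual Cradl AI API reference.

# Hypothetical integration sketch (Python + requests). The base URL, endpoint
# path, field names, and auth header are assumptions, not the real API.
import requests

API_BASE = "https://api.example-cradl.ai"  # placeholder base URL
API_KEY = "YOUR_API_KEY"

def parse_document(path: str, model_id: str) -> dict:
    """Upload a PDF or image and return the structured JSON prediction."""
    with open(path, "rb") as f:
        response = requests.post(
            f"{API_BASE}/models/{model_id}/predictions",
            headers={"Authorization": f"Bearer {API_KEY}"},
            files={"document": f},
            timeout=30,
        )
    response.raise_for_status()
    # e.g. {"total_amount": "118.50", "due_date": "2023-05-01", ...}
    return response.json()

print(parse_document("invoice.pdf", model_id="my-invoice-model"))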

Ståle Zerener Haugnæss
Hi everyone 👋, I'm Ståle, co-founder and CEO at Cradl AI. I'm thrilled to share Cradl AI with you, a product we have been working on for quite some time now.

🚀 What is Cradl AI?
Cradl AI is a platform where any developer can train their own custom document parsing models powered by deep learning.

💡 What problem do we solve?
Extracting key information from unstructured documents is a common problem in many apps and workflows. Cradl AI lets you submit PDFs or images to an API and receive a structured JSON response within seconds. Examples: invoices, receipts, contract notes, income statements, tax forms, etc.

⏪ Our Story
We started out as a small team of data scientists with the idea that deep learning models would change how we process documents. Initially, we focused on enterprises and trained document parsers for financial institutions and government agencies. One of our first discoveries was that fine-tuning pre-trained models on customer data would often radically improve performance on the customers' documents. By doing this, we managed to outperform market leaders in benchmarks conducted by our customers, despite being a small and lightly funded team. That was a huge inspiration when we decided to build Cradl AI and enable developers across the world to do the same.

❇️ What makes Cradl AI unique?
In deep learning, access to good training data is key. Cradl AI enables you to fine-tune AI models on your own documents in a much more scalable way.

⚙️ Scalable training using historical data
Generate training data programmatically from documents plus metadata in your database, instead of manually annotating documents with bounding boxes (a rough sketch follows after this post). Example: a fintech processing receipts as part of its expense management software. The receipt images are stored on a file server, and a SQL database holds corresponding metadata such as amount, merchant, and date submitted by their users.

🔁 Scalable training using feedback loops
Build your first model with only 15 documents. Implement a feedback loop with a single API call, then re-train your model quickly and scalably using corrections from your end users.

🎯 Higher accuracy
… as a result of more and better training data. 🙂

💵 Free credits!
We are happy to offer the PH community $100 in credits if you sign up before May 1. Just drop us a line in the chat after you sign up and mention ProductHunt, and we'll apply the credits to your account. The credits apply once you exceed the free tier.
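To make the historical-data idea concrete, here is a rough sketch of the fintech/receipts example; the table name, columns, endpoint, and payload format are hypothetical assumptions, not the actual Cradl AI API.

# Hypothetical sketch: generating key-value training data from existing
# receipt images + SQL metadata, as in the fintech example above. Table,
# columns, endpoint, and payload format are assumptions, not the real API.
import json
import sqlite3
import requests

API_BASE = "https://api.example-cradl.ai"   # placeholder
API_KEY = "YOUR_API_KEY"
RECEIPTS_DIR = "/mnt/fileserver/receipts"   # where receipt images live

conn = sqlite3.connect("expenses.db")
rows = conn.execute(
    "SELECT receipt_id, amount, merchant, purchase_date FROM expenses"
)

for receipt_id, amount, merchant, purchase_date in rows:
    # The "annotation" is just the metadata users already submitted;
    # no bounding boxes are drawn on the image.
    ground_truth = {
        "total_amount": str(amount),
        "merchant": merchant,
        "date": purchase_date,
    }
    with open(f"{RECEIPTS_DIR}/{receipt_id}.jpg", "rb") as f:
        requests.post(
            f"{API_BASE}/datasets/receipts/documents",
            headers={"Authorization": f"Bearer {API_KEY}"},
            files={"document": f},
            data={"ground_truth": json.dumps(ground_truth)},
            timeout=30,
        ).raise_for_status()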
Abhinav Yadav
@staalezh First of all, congrats on a great product launch, and thanks for sharing the story behind it. I logged into the product and the experience is sleek. I like that your models show confidence intervals and add an auto tag to parsed text. One question: in order to use the product, do I have to train my own model with my data, or are there pre-trained models we can deploy directly? Thanks.
Ståle Zerener Haugnæss
@abhinav_wavel Thanks for the feedback! Great question: we currently have two pre-trained models, an invoice model and a receipt model. They can be cloned and used as-is without any training. However, fine-tuning, even on a small dataset of 15-20 documents, will often have a significant impact on the performance of your model. :-)
gerald ray
@abhinav_wavel @staalezh I am glad to see that datasets can be that small. I will try it out! Great work
Ståle Zerener Haugnæss
@abhinav_wavel @gerald_ray Thanks, @gerald_ray! Looking forward to it, and remember to give us a ping in the chat if you're interested in the PH credits!
Utkarsh Mittal
Hi Ståle, congratulations on launching this brilliant product. I have been in the IDP space for more than 5 years as a data scientist, and I just signed up on your platform using utkarsh@deeplogicai.tech. At DeepLogic AI, we have processed 1M+ documents of various types over the last few years using both third-party tools and internally developed solutions. I have a few questions and some feedback:

1. Where do I mention "ProductHunt" to get my $100 in credits?

2. What advantages does your solution have over Google's, Microsoft's, or other IDP solutions? Microsoft Form Recognizer also has pre-trained models for invoices and receipts, plus the ability to fine-tune those models on our own custom formats (using both template and neural modes). To my knowledge (I could be wrong here), it uses the LayoutLMv3 architecture trained on extensive, community-validated datasets such as CORD, FUNSD, etc., complemented by a high-quality research paper explaining the approach and presenting benchmarks. It would be really interesting to see your unique advantage over this.

3. The website mentions that you don't need annotations to train your own models. How is that possible? Past data also counts as annotations, doesn't it?

4. The first landing page after logging in has a pretty messed-up UI on mobile. I understand that a product like this can only properly be used on a desktop, but it would be a good idea to limit the functionality and provide a simple (maybe read-only) page on mobile.

Once again, congratulations on the launch and kudos to your team's hard work. All the best for your future; I am looking forward to hearing back from you!

Regards,
Utkarsh
Ståle Zerener Haugnæss
@utkarshmttl Thanks for your feedback, Utkarsh, and for signing up! Great questions, please see the answers below.

1. Credits: you can just drop us a line in the chat and mention PH, and we'll apply them to your account right away!

2 & 3. Advantages over other IDP solutions, and how training without annotations is possible: the short answer to both is that we designed our models so that they can be trained on key-value based annotations instead of bounding-box annotations or NER tags (a rough illustration of the two styles follows below). This means users can often generate large training datasets programmatically, for example by dumping extracted data from a SQL database. Here's a 5-minute read with more info: https://www.cradl.ai/guides/how-... From our experience, models pre-trained on large datasets (open or proprietary) are often a good starting point, but easily fine-tuning the model on large datasets, both to improve accuracy and to customize the fields that are extracted, was a problem we ran into when we started out and trained our first document parsing models. To make this possible, we had to design a new model architecture. The result is that we now have users who have fine-tuned their model on millions of their own documents this way, without annotating a single document manually. Let me know if anything is unclear, or if you have any follow-up questions!

4. Mobile UI: good point, thanks - we should definitely improve this!
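To make the difference concrete, here is a rough illustration of the two annotation styles; the field names and structure are for illustration only, not Cradl AI's actual schema.

# Bounding-box / NER-style annotation: someone must locate each field on the
# page, typically with a manual labeling tool.
bbox_annotation = {
    "page": 1,
    "labels": [
        {"field": "total_amount", "text": "118.50", "bbox": [412, 903, 488, 927]},
        {"field": "due_date", "text": "2023-05-01", "bbox": [401, 120, 502, 141]},
    ],
}

# Key-value annotation: only the expected output values are needed, so it can
# be dumped straight from an existing database or accounting system.
kvp_annotation = {
    "total_amount": "118.50",
    "due_date": "2023-05-01",
}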
Utkarsh Mittal
@staalezh Thanks for your reply!

1. By chat, do you mean the customer support chat on cradl.ai?

2 & 3. Spot on. The inability to customise fields, especially, was a blocker for us as well. That's what I thought, though: KVP annotations are still annotations. The problem I foresee (for my use cases at least) is that past data from Accounts is very limited (mostly just invoice ID, date, and total amount), but for new incoming documents we need all the information, including the line items. How do you solve this?
Ståle Zerener Haugnæss
@utkarshmttl 1. Yup, found your account - will apply the credits right away.

2 & 3. Absolutely - we should definitely work on clarifying that! When training data is limited, for example with missing fields, we usually rely on Cradl AI's ability to train on very small datasets. In this scenario you'll typically train quite frequently while your dataset is small and iterate on your model faster, but this of course requires you to annotate the new fields manually until you can train (we currently require a minimum of 15 examples). Also, with KVP annotations, feedback loops are easy to implement and can be a good way of getting help from end users (e.g. internal employees or end customers) to train your model scalably; a rough sketch of such a loop is below.
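Here is a rough sketch of such a feedback loop, with a hypothetical endpoint and field names (not the actual Cradl AI API).

# Hypothetical feedback-loop sketch: post user-corrected key-value pairs back
# as new ground truth so the next training run can use them. Endpoint and
# field names are assumptions, not the real API.
import json
import requests

API_BASE = "https://api.example-cradl.ai"  # placeholder
API_KEY = "YOUR_API_KEY"

def send_feedback(document_id: str, corrected_values: dict) -> None:
    """Register corrected values for a previously parsed document."""
    requests.post(
        f"{API_BASE}/documents/{document_id}/ground-truth",
        headers={
            "Authorization": f"Bearer {API_KEY}",
            "Content-Type": "application/json",
        },
        data=json.dumps(corrected_values),
        timeout=30,
    ).raise_for_status()

# e.g. a user fixed a mis-read amount in your review UI:
send_feedback("doc_123", {"total_amount": "118.50", "date": "2023-04-28"})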
Utkarsh Mittal
@staalezh I see. Thank you for clarifying my questions; it was great interacting with you. I definitely hope to explore some synergy together in the future. :) Got the credits, thank you!
Naim Naj
Congrats on the launch 🚀 Looks great. 👌
Ståle Zerener Haugnæss
@naim_naj Thank you, Naim!
Naim Naj
@staalezh You are welcome