Hello Product Hunt! I'm Vinayak, creator of Camelot.
There are many open-source (Tabula, pdf-table-extract) and closed-source (smallpdf, pdftables) tools to extract tables from PDFs. But they either give a nice output or fail miserably. There is no in-between. This is not helpful, since everything in the real world, including PDF table extraction, is fuzzy. As a result, people end up writing ad-hoc table extraction scripts for each type of PDF table. We, at SocialCops, created Camelot to give users complete control over table extraction. It is a Python library to extract tabular data from PDFs!
You can install it using conda or pip! Check out the installation instructions in the README: https://www.github.com/camelot-d...
Great documentation is available here: https://camelot-py.readthedocs.i...
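For a sense of what usage looks like, here is a minimal sketch. It assumes Camelot is installed and that each extracted table exposes a `parsing_report` dictionary with an `accuracy` key, as described in the documentation. The helper names `extract_tables` and `split_by_accuracy`, the file name `foo.pdf`, and the threshold of 90 are illustrative assumptions, not part of the library.

```python
def split_by_accuracy(reports, min_accuracy=90.0):
    """Split parsing reports into (accepted, needs_review) by accuracy score.

    `reports` is a list of dicts shaped like Camelot's parsing_report,
    e.g. {"accuracy": 99.02, "whitespace": 12.24, "order": 1, "page": 1}.
    The 90.0 default is an arbitrary illustrative threshold.
    """
    accepted = [r for r in reports if r.get("accuracy", 0.0) >= min_accuracy]
    needs_review = [r for r in reports if r.get("accuracy", 0.0) < min_accuracy]
    return accepted, needs_review


def extract_tables(pdf_path, pages="1"):
    """Return a list of (DataFrame, parsing_report) pairs for a PDF (hypothetical helper)."""
    # Imported lazily so split_by_accuracy works without Camelot installed.
    import camelot

    tables = camelot.read_pdf(pdf_path, pages=pages)
    return [(table.df, table.parsing_report) for table in tables]


if __name__ == "__main__":
    # "foo.pdf" is a placeholder path; replace it with a real file.
    results = extract_tables("foo.pdf")
    accepted, needs_review = split_by_accuracy([report for _, report in results])
    print(len(accepted), "tables accepted,", len(needs_review), "need review")
```

Checking the reported accuracy per table, rather than trusting every result, is one way to deal with the fuzziness mentioned above: tables below the threshold can be re-run with different settings or inspected by hand.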
We would be really grateful if you could give us any feedback that can help us improve it! You can follow the development on GitHub.
Hi,
I am going to be handling a project soon for which I have to extract tables from PDFs. Can someone tell me how accurate Camelot actually is (like 80 or 90%, based on your usage)? I used Tabula, and though it is great, it still has some flaws and is not 100% accurate.
It is an important project and I can't test its accuracy myself right now.
Any help would be appreciated.