Hello Product Hunt! I'm Vinayak, creator of Excalibur, a web interface for extracting tabular data from PDFs!
Both open-source (Tabula, pdf-table-extract) and closed-source (Smallpdf, Docparser) tools are widely used to extract data tables from PDFs. They either give a nice output or fail miserably, with nothing in between. That is not helpful, since everything in the real world, including PDF table extraction, is fuzzy. As a result, people end up writing ad-hoc table extraction scripts for each type of PDF table.
Under the hood, Excalibur uses Camelot (https://camelot-py.readthedocs.i...), a Python library I created to give users complete control over table extraction. If you can’t get the desired output with the default settings, you can tweak them and get the job done!
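To give a feel for what "tweaking the settings" can look like when using Camelot directly from Python, here is a minimal sketch. It tries Camelot's default "lattice" flavor first and falls back to "stream" if the reported accuracy is low. The helper name extract_tables, the min_accuracy threshold, and the read_pdf injection parameter are my own assumptions for illustration; only camelot.read_pdf, its pages and flavor arguments, and the parsing_report attribute come from Camelot itself.

```python
def extract_tables(pdf_path, pages="1", min_accuracy=90.0, read_pdf=None):
    """Return (flavor, accuracy, tables) for the best attempt, or None.

    Hypothetical helper: tries the 'lattice' flavor first, then 'stream',
    and stops early once every table reaches `min_accuracy`.
    """
    if read_pdf is None:
        # Imported lazily so the helper can be exercised without Camelot installed.
        import camelot
        read_pdf = camelot.read_pdf

    best = None
    for flavor in ("lattice", "stream"):
        tables = read_pdf(pdf_path, pages=pages, flavor=flavor)
        if len(tables) == 0:
            continue  # nothing detected with this flavor, try the next one

        # Each Camelot table carries a parsing report with an accuracy score.
        accuracy = min(t.parsing_report["accuracy"] for t in tables)
        if best is None or accuracy > best[1]:
            best = (flavor, accuracy, tables)
        if accuracy >= min_accuracy:
            break  # good enough, no need to try other settings

    return best
```

With Camelot installed, calling extract_tables("report.pdf") (the file name is a placeholder) returns the chosen flavor, its accuracy, and the tables; each table exposes a pandas DataFrame as .df. Excalibur lets you make the same kind of adjustments from the browser instead of in code.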
You can install Excalibur using "pip install excalibur-py" or just download and run the Windows/Linux executable from the releases page here: https://github.com/camelot-dev/e...
Great documentation is available here: https://excalibur-py.readthedocs...
I would be really grateful for your feedback that can help me improve it! You can follow the development on GitHub here: https://github.com/camelot-dev/e...
June