@rodaescobar Haha yes, lots of magic happening there! Thanks for that question :D
Basically, the font file is sent to our servers, where the usual training steps happen automatically.
The uploaded font is used to "draw" images. Then a "box" is wrapped around each character, defining the position of the symbol and its character representation. This image + box combination is used to train the Tesseract model.
The outcome is the trained font file, which lets Tesseract detect and classify the text with the chosen font on images! Hope this explains the magic :)
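To make the image + box pairing a bit more concrete, here's a minimal sketch of what a Tesseract box file looks like. Each line pairs one glyph on the rendered image with its bounding box, in the format `<char> <left> <bottom> <right> <top> <page>`, with coordinates measured from the bottom-left corner of the image. The word "OCR" and all the coordinate values below are hypothetical, just for illustration:

```python
# Build Tesseract box-file lines for glyphs rendered from the uploaded font.
# Box-file format: <char> <left> <bottom> <right> <top> <page>

def make_box_line(char, left, bottom, right, top, page=0):
    """Format one glyph bounding box as a Tesseract box-file line."""
    return f"{char} {left} {bottom} {right} {top} {page}"

# Hypothetical glyph positions for the word "OCR" on a rendered training image:
glyphs = [("O", 10, 12, 34, 48), ("C", 38, 12, 60, 48), ("R", 64, 12, 88, 48)]
box_file = "\n".join(make_box_line(c, l, b, r, t) for c, l, b, r, t in glyphs)
print(box_file)
```

Tesseract reads this file alongside the matching `.tif` image, so it knows which pixels belong to which character during training.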
@_bernhard @__tosh Hi Bernhard and thanks for the great question! It kinda brings me to why we built the whole tool!
Training fonts on your own with Tesseract is quite a hassle. You'd need to download the whole Tesseract training toolchain with all its dependencies and compile it, which takes a few hours. And a few hours for just one trained font file doesn't really pay off when you look at OCR implementations.
Here is a link to a description of what training fonts for Tesseract looks like when done manually: https://github.com/tesseract-ocr... :)
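For a rough idea of the manual hassle, here's a hedged sketch of the legacy (Tesseract 3.x) command-line training pipeline after you've compiled the toolchain. The names `eng.myfont.exp0`, `MyFont`, `training_text.txt`, and `font_properties` are all placeholders, not values from our tool:

```shell
# Render training images and their box files from the font (placeholder names):
text2image --text=training_text.txt --outputbase=eng.myfont.exp0 \
           --font='MyFont' --fonts_dir=./fonts

# Extract feature files (.tr) from the image + box pair:
tesseract eng.myfont.exp0.tif eng.myfont.exp0 box.train

# Build the character set, then train shape and character classifiers:
unicharset_extractor eng.myfont.exp0.box
mftraining -F font_properties -U unicharset -O eng.unicharset eng.myfont.exp0.tr
cntraining eng.myfont.exp0.tr

# After renaming the outputs to the eng.* prefix, bundle everything into
# a single eng.traineddata file:
combine_tessdata eng.
```

Every one of these steps has its own pitfalls and inputs to prepare, which is exactly the part our tool automates for you.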
Some tools already exist, but none of them really works the way ours does: 1. Upload your font file, 2. Get back the trained font file. Our devs actually use the tool for our internal projects all the time as well! Hope that helped!