mintplant on May 29, 2017 | on: Extracting Chinese Hard Subs from a Video, Part 1
In my experience, Tesseract improves massively if you can identify the font the text is written in and prepare custom-trained data for it to use. See
https://github.com/tesseract-ocr/tesseract/wiki/TrainingTess...
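As a rough sketch of what using such custom-trained data might look like, the snippet below builds a Tesseract command line that points at a separate tessdata directory. The language name `subfont`, the `tessdata` directory, and both helper functions are hypothetical names chosen for illustration. The `--psm` spelling assumes Tesseract 4 or later; 3.x releases used `-psm` instead.

```python
import shutil
import subprocess


def build_tesseract_command(image_path, lang, tessdata_dir, psm=7):
    """Return the argument list for one Tesseract CLI run.

    `lang` is the base name of a .traineddata file inside `tessdata_dir`
    (hypothetical here: a model trained on the subtitle font).
    """
    return [
        "tesseract",
        str(image_path),
        "stdout",                          # print recognized text to stdout
        "--tessdata-dir", str(tessdata_dir),
        "-l", lang,
        "--psm", str(psm),                 # 7 = treat the image as a single text line
    ]


def ocr_subtitle(image_path, lang="subfont", tessdata_dir="tessdata"):
    """Run Tesseract on one cropped subtitle image and return the text."""
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract executable not found on PATH")
    cmd = build_tesseract_command(image_path, lang, tessdata_dir)
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout.strip()
```

Page segmentation mode 7 is only a guess that each input image holds one subtitle line; other modes may suit multi-line subtitles better.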
contingencies on May 30, 2017
All of the failures were directly related to improperly isolated input. In addition, a huge percentage of Chinese text is written in very few fonts.