
It's surprising that no one has applied ML techniques to historical documents like these. Scanning documents to uncover text that was overwritten would be a very useful technique for historical research. Written text has distinctive statistical properties, and given a large enough corpus of scans it should be possible to train a classifier to pick up the statistical signature of writing that is no longer clearly visible to the eye.

Presumably we are on the verge of AGI, so this seems like a very easy application.
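
A minimal sketch of what that classifier could look like, assuming you have a grayscale scan (scan.png) and a hand-labelled mask (mask.png) marking pixels of the faint or overwritten text; both filenames and the patch size are placeholders. It just classifies each pixel from the raw intensities of its local window, so it's an illustration of the idea rather than anything a real palimpsest project would use:

    import numpy as np
    from PIL import Image
    from sklearn.linear_model import LogisticRegression

    PATCH = 5          # local window used as the feature vector
    R = PATCH // 2

    def patch_features(img):
        # Slide a PATCH x PATCH window over the image and flatten each
        # window into a vector of raw intensities.
        h, w = img.shape
        feats = []
        for y in range(R, h - R):
            for x in range(R, w - R):
                feats.append(img[y - R:y + R + 1, x - R:x + R + 1].ravel())
        return np.array(feats)

    scan = np.asarray(Image.open("scan.png").convert("L"), dtype=np.float32) / 255.0
    mask = np.asarray(Image.open("mask.png").convert("L")) > 127  # True = hidden text

    X = patch_features(scan)
    y = mask[R:-R, R:-R].ravel()

    clf = LogisticRegression(max_iter=1000).fit(X, y)

    # Per-pixel probability that faint text is present, saved as an image.
    prob = clf.predict_proba(X)[:, 1].reshape(scan.shape[0] - 2 * R, scan.shape[1] - 2 * R)
    Image.fromarray((prob * 255).astype(np.uint8)).save("recovered.png")

In practice the projects that recover overwritten text tend to start from multispectral imaging and then apply learned models on top, but the underlying intuition is the same.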




What makes you think no one is?

https://en.wikipedia.org/wiki/In_Codice_Ratio


Mostly because it's an academic exercise and lacks obvious profit incentives, but it's good to know there is such a project. Seems like my idea was right on the mark.

Thanks for the reference. It says the project was started in 2017; have there been any interesting discoveries since then?


There are tons of teams working on historical document analysis; just look at ICDAR, for example.

Working with historical documents is just very hard due to poor data quality and availability.



