
> The autosegmentation jumps frequently between adjacent sheets, so is not yet precise enough to reveal contiguous texts, but it coarsely follows the entire scroll.

Maybe a stupid idea, but has anyone tried to make a new scroll with known content and markers at known coordinates, and then cook it to bring it to a state close to that of the scrolls we're trying to unroll? One could then scan it and use the result to fine-tune the software.

There are probably simple insights that are extremely difficult to discover when looking at an entirely new problem, that would become more obvious when one already knows the original inside out.




I know they have instructions on where to buy papyrus and how to cook it to resemble the condition of the original scrolls, but from what I understand, nobody has done what you suggest. It sounds like a good idea to me too, but here are a few guesses as to why they haven't done it yet:

1) Scanning a scroll costs around $40k, between the trip to London, renting the equipment, paying the staff etc.

2) I'm not sure that just cooking the scroll is enough to reproduce the exact conditions of the originals, which were also buried underground for thousands of years. Time, soil pressure and so on could have a big impact on the final composition of the sheets.

3) To actually reproduce a realistic sample, you need a professional papyrologist. It's not enough to copy an Ancient Greek text from an online database; you need to know all the conventions of the handwriting of the time (they didn't use spaces, they didn't use the diacritics and accent marks we use in modern editions, letters were often written in idiosyncratic ways depending on the period, etc.). Considering how few papyrologists there are, how busy they are, and how long it would take one of them to create a decent replica, I think this is maybe the biggest obstacle.


1/ and 2/ are of course good objections; I wasn't aware of the cost of a scan (but this kind of experiment could be done by the organizers, saving on trip costs).

But I don't think point 3 is strictly necessary. The main goal would be to improve the software unrolling, using information about the structure of the roll. So it may be enough to simply put printer's marks at regular intervals, each with a known reference position.
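To make the reference-mark idea concrete, here is a minimal sketch of how such marks could be used to check a segmentation. It assumes (hypothetically) that each mark encodes its position in centimeters along the unrolled sheet and that marks are printed at a fixed spacing; the function name, the spacing, and the tolerance are all made up for illustration, not anything the organizers use.

```python
from typing import List

def find_sheet_jumps(read_positions_cm: List[float],
                     spacing_cm: float,
                     tol_cm: float) -> List[int]:
    """Return the indices where a traced path probably jumped to another sheet.

    read_positions_cm: positions decoded from the marks, in the order the
    segmentation path crosses them (hypothetical input).
    A step that differs from the printed spacing by more than tol_cm is
    flagged as a jump.
    """
    jumps = []
    for i in range(1, len(read_positions_cm)):
        step = read_positions_cm[i] - read_positions_cm[i - 1]
        if abs(step - spacing_cm) > tol_cm:
            jumps.append(i)
    return jumps

# Marks printed every 2 cm; the path strays onto a neighboring wrap and back.
traced = [0.0, 2.0, 4.0, 19.0, 21.0, 8.0]
jumps = find_sheet_jumps(traced, spacing_cm=2.0, tol_cm=0.5)
print(jumps)
```

With ground truth like this, every jump the autosegmentation makes on the replica could be counted automatically instead of being spotted by eye.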


I see what you mean; I had assumed you meant an exact replica. If you "just" want to write reference marks to help the segmentation models, then you don't need a papyrologist. It's still something that only the organizers could do, since I don't see a volunteer team being able to afford the expense. I don't know whether the reason they haven't done it so far is simply that they're a small outfit that has to juggle different priorities, or that they have judged it not worth it technically!


I'd say this is a case of the perfect being the enemy of the good.

I'm sure a whole scroll is expensive to create, cook and scan but sections of a scroll could be done for a fraction.

Also the realism of the papyrus is less crucial than the initial training of uncooked -> cooked -> recovered.

So, OP's suggestion sounds like a great first step to get more insights on what's possible and what's not relatively quickly.


> The autosegmentation jumps frequently between adjacent sheets

The remnants of the fibers at the cut are exactly like a barcode. They would need a database of every edge and then something to match the barcodes. Easier said than done, of course. Another possibility would be to use fiber angles.


Thinking about it, reading the whole edge is not necessary. A partial read of the top centimeter could make the process much faster by discarding the obviously different parts. If this can be converted to a barcode, printed somehow, and then read with a barcode scanner to produce a translation, we would have a way to prefilter most of the pieces that are too different to be neighbors.
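A rough sketch of that prefilter, to show the idea rather than a real pipeline: it assumes the fiber pattern along an edge has already been sampled into a 1-D intensity profile of fixed length, binarizes it into a barcode-like signature, and keeps only the candidates within a small Hamming distance. The function names, the threshold, and the sample data are all hypothetical.

```python
from typing import Dict, List, Sequence

def fiber_signature(profile: Sequence[float], threshold: float) -> List[int]:
    """Binarize an intensity profile sampled along an edge: 1 = fiber, 0 = gap."""
    return [1 if value >= threshold else 0 for value in profile]

def hamming(a: Sequence[int], b: Sequence[int]) -> int:
    """Count the positions where two equal-length signatures differ."""
    if len(a) != len(b):
        raise ValueError("signatures must have the same length")
    return sum(x != y for x, y in zip(a, b))

def prefilter(query: Sequence[int],
              candidates: Dict[str, Sequence[int]],
              max_mismatch: int) -> List[str]:
    """Keep the names of candidates that differ from the query in at most
    max_mismatch positions; everything else is too different to be a neighbor."""
    return [name for name, sig in candidates.items()
            if hamming(query, sig) <= max_mismatch]

# Made-up data: one edge to match and three candidate edges.
query = fiber_signature([0.9, 0.1, 0.8, 0.7, 0.2, 0.1, 0.9, 0.3], threshold=0.5)
candidates = {
    "A": [1, 0, 1, 1, 0, 0, 1, 1],
    "B": [0, 1, 0, 0, 1, 1, 0, 1],
    "C": [1, 0, 1, 0, 0, 0, 1, 0],
}
kept = prefilter(query, candidates, max_mismatch=2)
print(kept)
```

Real edges would not line up sample for sample, so a practical version would also need to tolerate shifts and stretching, but a cheap check like this is what would discard the obvious non-matches first.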


They already have human segmenters segmenting existing scrolls, and their output is presumably used to train the program in much the same way.



