You're absolutely right about datasets being a constraint for any new entrant. The USDA SR28 is free and open but limited in scope. OpenFoodFacts has a great dataset overall but ~~you can't download it (other than rate-limited scraping),~~ the license is ~~strict~~ share-alike; and there isn't an OFF personal consumption tracker.
No need to scrape Open Food Facts. They kindly offer a download of the whole database as a CSV, RDF, or MongoDB dump: https://world.openfoodfacts.org/data
It is 100% crowdsourced open data under the ODbL licence (same as OpenStreetMap).
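For anyone who wants to try the CSV dump rather than scraping, here is a minimal sketch of streaming it row by row in Python. It assumes the file has already been downloaded from the data page above, that it is tab-separated with a header row, and that it has a `product_name` column; the local file name, the delimiter, and the column name are all assumptions to check against the actual dump, and `iter_products` and `find_by_name` are just illustrative helpers.

```python
import csv
import io
from typing import Iterator, List, TextIO


def iter_products(dump: TextIO, delimiter: str = "\t") -> Iterator[dict]:
    """Yield one dict per data row, keyed by the header row.

    Rows are streamed, so the whole dump never has to fit in memory.
    The tab delimiter is an assumption about the dump format.
    """
    # QUOTE_NONE treats quote characters as ordinary text, which suits
    # crowdsourced free-text fields (assumption: fields are not quoted).
    reader = csv.DictReader(dump, delimiter=delimiter, quoting=csv.QUOTE_NONE)
    for row in reader:
        yield row


def find_by_name(dump: TextIO, needle: str,
                 name_field: str = "product_name") -> List[dict]:
    """Return rows whose name contains `needle`, case-insensitively.

    `name_field` is a hypothetical column name; adjust it to the real header.
    """
    needle = needle.lower()
    return [
        row for row in iter_products(dump)
        # Short rows give None for missing columns, hence the `or ""`.
        if needle in (row.get(name_field) or "").lower()
    ]


# Tiny in-memory stand-in for the real dump, so the sketch runs as-is.
sample = "code\tproduct_name\n123\tOat Milk\n456\tDark Chocolate\n"
matches = find_by_name(io.StringIO(sample), "oat")
print(matches)

# With a real download (hypothetical local path):
# with open("products.csv", encoding="utf-8", newline="") as f:
#     matches = find_by_name(f, "oat")
```

Streaming with a generator keeps memory use flat however large the dump is, which matters more here than speed.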
Thanks for that correction. I recall there being a clear reason why I couldn't use their data in my app. But maybe I had it wrong. I remember reading that if my app collected new data about foods and I was using the OFF db, I had to commit to making all my data free and open. I was worried about the possible case that personal food consumption data would be vulnerable to that share-alike constraint.
No, no worries about personal consumption. What the ODbL requires you to do is to add missing products, not to add data outside the scope of the original database.
(I'm an Open Food Facts admin)
Also please don't scrape us, since we release nightly dumps of the DB :)
Thanks for clarifying that. And it's great that those DB downloads are available. I didn't like the idea of scraping the data in the first place, so I never went that route.
Feel free to ping me at pierre openfoodfacts.org
We have an online discussion chat, if you want to integrate OFF at some point and have questions about the ODbL.