Hacker News new | past | comments | ask | show | jobs | submit login

Thanks for the call-outs, it looks like our terms of service need to be made more clear (and I’m flagging that internally); they’re supposed to be talking about “The Website” vs “the content on the website” and the governing license for public datasets should be what users chose when they shared them but I agree it’s not well-worded.

~~Could you file an issue on the repo for the download issue you encountered? We just pushed an update to our python package earlier today that might be related.~~

~~Update: I went ahead and filed this issue[0] so the relevant engineer will see it when they wake up.~~

Update 2: I pushed a fix to our backend API & verified it's working by running that download script in a Colab notebook.

> What are the exact attribution instructions?

Creative Commons has some guidance[1] on how they recommend citing attribution.

[0] https://github.com/roboflow/roboflow-100-benchmark/issues/31

[1] https://wiki.creativecommons.org/wiki/Recommended_practices_...




Regarding the license, I want to know specifically about the part "If You Share the Licensed Material [...], You must: retain [...] identification of the creator(s) of the Licensed Material and any others designated to receive attribution, in any reasonable manner requested by the Licensor (including by pseudonym if designated);"

(abridged "Section 3 – License Conditions" https://creativecommons.org/licenses/by/4.0/legalcode )

I could of course click through 100 individual pages and search whether one of the authors left different attribution instructions, but I would rather not.

The download script generates some "README.dataset.txt" file which states "Provided by Roboflow License: CC BY 4.0", even if the dataset has been created by a different entity, so those files are misleading at best.


Thanks, do you have a suggestion for how we should handle this better?

We’d like to make this as easy as possible for folks to use while giving proper credit to the users who did the hard work labeling and sharing these datasets.


I am not a lawyer, so this is not legal advice.

I think the correct way would be to include a LICENSE.txt file in each subdirectory of the individual datasets with '"<dataset title>" <link to dataset> by <author> <link to author> licensed under <license> <link to license text>' and any additional attribution instructions mentioned by the author(s), e.g. BibTeX citation instructions or additional links to institutions, web profiles, etc. The authors should be referred to by their preferred name instead of their full name if they wish so. This assumes that the entire subdirectories belong to the same author. If individual images are by different authors, things might be more complicated.

https://wiki.creativecommons.org/wiki/best_practices_for_att...

All these little details might sound petty, but there have been many lawsuits in Germany where photographers uploaded images under some creative commons attribution license and then later sued people for using their images if they forgot to include the title in their attribution. Thankfully, those lawsuits have mostly died down, but it is probably better to be safe than sorry.


Appreciate the insight! We'll discuss the best way to improve this next week when folks are back in office after the holiday weekend.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: