
What a short-sighted view.

Having open access to the training data is how you prevent poisoning or biasing of the dataset. When people can see the data and complain about bad entries, the quality of the dataset improves. That's in addition to the benefit of creators being labeled in the dataset.

Hiding the data from public view seems only to help nefarious actors.




Pretty sure we're saying the same thing


> And this unfortunately hampers research and understanding of models because companies are reluctant to talk about training lest the trolls start jumping on

Respecting artists and being open about training data should go hand in hand. That companies feel the need to hide the training data from public scrutiny should immediately raise suspicion.

It seems like you are saying no one cares about copyright; that is not the case. I disagree with (most current forms of) copyright, but I do respect artists and their need to feed themselves. Proper attribution, labeling, and scrutiny of the dataset are imperative.

> This is only a made-up issue for a few that are looking for something to criticize. Almost nobody cares, in the sense that appears to be meant here about "ownership" of the training data

So my point is that it's not just 'trolls' who want the data to be open and labeled. If the companies are hurting artists' economic output, that should be examined and fixed: the harm stopped and the companies' attention redirected.

'Trolling' (bothering) a company into being 'good' (not acting against human interests) isn't a bad thing.



