Hey miss_classified I was pretty disturbed by most of these images. That image was no exception. Honestly, until you pointed it out I had no clue. As for whether it's safe for work or not, that's debatable. Would I want my co-workers seeing this on my screen? Probably not. As I mentioned right at the end of the blog, I'm more convinced after going through this exercise that it's difficult, if not impossible, for us to agree on what is and isn't safe for work.
No one would get fired for that image. I could imagine it being sold as a Halloween product on eBay or Amazon, and it should appear in normal SafeSearch results, since it is certainly a toy. It's a scary monster toy, but a toy nonetheless.
Is it safe for every context? Maybe not. There are definitely more scenarios than just being "Safe For Work" when it comes to rating content. Work is ostensibly filled with adults.
Would this image be safe for all age groups? Maybe not. Is it possible you might encounter a product like this in a department store or seasonal store? Yes. Depending on labeling, you might see it in either Spencer Gifts or Target. So, this is where parental controls come in, and MPAA or ESRB ratings are probably a better guide to moderation.
Is it rated "G" or "E for Everyone"? I'd still argue yes, and that red plastic isn't worthy of restriction unless context is explicitly demonstrating that it's an animal eating human remains somehow. If it were labeled as "Rat Eating Zombie Guts" it might be labeled "PG" or "T for Teens" but completely unlabeled, and stripped of context, or other conceptual cues lent by well-known notoriety/infamy, it's clearly just painted rubber, and a curiosity, not a psychologically damaging image.
I'm aware of the issue and trying to fix it, but Medium won't let me make any edits. Since I originally published the blog I have been in constant touch with their support, but they all seem to be away for Thanksgiving. I just keep getting the error "Oops! Something happened to Medium. We’ll fix it as soon as we can."
stevenicr you're right in that regard. Yahoo is the only one who has provided an on-premises solution, which is really sad. Even they haven't released the dataset of their images, just the model. If more of these companies released open-source models on a frequent basis we would all have less objectionable content on the internet.
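For anyone who wants to try the on-premises route, here's a minimal sketch of scoring an image with Yahoo's released open_nsfw Caffe weights through OpenCV's dnn module. The file paths, preprocessing constants and output order are assumptions based on my reading of the public repo, so verify them against your own checkout.

```python
# Minimal sketch: score a single image with Yahoo's open_nsfw Caffe model
# via OpenCV's dnn module. File paths and preprocessing values are
# assumptions based on the public repo; verify against your checkout.
import cv2

# Paths assumed from the yahoo/open_nsfw repo layout (verify locally).
PROTOTXT = "open_nsfw/nsfw_model/deploy.prototxt"
WEIGHTS = "open_nsfw/nsfw_model/resnet_50_1by2_nsfw.caffemodel"

net = cv2.dnn.readNetFromCaffe(PROTOTXT, WEIGHTS)

def nsfw_score(image_path):
    """Return the model's NSFW probability (0.0 = safe, 1.0 = not safe)."""
    img = cv2.imread(image_path)
    if img is None:
        raise ValueError("could not read " + image_path)
    # 224x224 input with BGR mean subtraction, which is what I believe the model expects.
    blob = cv2.dnn.blobFromImage(img, scalefactor=1.0, size=(224, 224),
                                 mean=(104, 117, 123), swapRB=False)
    net.setInput(blob)
    sfw_prob, nsfw_prob = net.forward()[0]  # assumed output order: [SFW, NSFW]
    return float(nsfw_prob)

print(nsfw_score("test.jpg"))
```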
I guess the reason companies haven't done that, and the reason Yahoo's model isn't all that good, is that it's difficult to constantly keep updating a model that's already been "released".
I look forward to learning more about updating a model that's already been "released". If it could be as simple as WordPress updates (along with multiple, similar ways to be notified), that would be optimal imho.
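To make the WordPress-updates idea concrete, here's a rough sketch of the kind of update check I'm picturing; the manifest URL, JSON layout and local paths are all made up for illustration.

```python
# Rough sketch of a WordPress-style update check for a released model.
# The manifest URL, JSON layout and local paths are hypothetical.
import json
import os
import urllib.request

MANIFEST_URL = "https://example.com/nsfw-model/latest.json"  # hypothetical
LOCAL_DIR = "models/nsfw"
VERSION_FILE = os.path.join(LOCAL_DIR, "VERSION")

def local_version():
    try:
        with open(VERSION_FILE) as f:
            return f.read().strip()
    except FileNotFoundError:
        return None

def check_for_update():
    """Compare the local model version with the published one and
    download new weights if they differ."""
    with urllib.request.urlopen(MANIFEST_URL) as resp:
        # assumed layout, e.g. {"version": "2018.11.2", "weights_url": "..."}
        manifest = json.load(resp)
    if manifest["version"] == local_version():
        return False  # already up to date
    os.makedirs(LOCAL_DIR, exist_ok=True)
    urllib.request.urlretrieve(manifest["weights_url"],
                               os.path.join(LOCAL_DIR, "model.caffemodel"))
    with open(VERSION_FILE, "w") as f:
        f.write(manifest["version"])
    return True  # caller can reload the model or notify admins

if __name__ == "__main__":
    print("updated" if check_for_update() else "up to date")
```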
I believe many of these companies are keeping the models to themselves in an effort to stave off competition and become monopolies that can hope to get all the government cheese.
Unfortunately, keeping the data to themselves in this regard is a disservice to the worldwide community in many unforeseen ways. They could certainly keep contracts with big players just by providing the cloud computing power and ease of setup, while offering updated models for others to use.
Maybe offer 99% accuracy to small site operators and 99.5% to contract API accounts or something.
It's about more than just objectionable content on the net; it's other things too. Many apps and online connected services could benefit from running uploaded pics through a system that checks not only for nudity but also for the age of the people in them.
I could see adding some server space and CPU power to check transferred images for important issues like these. However, I can't see sending all that data to third parties, sacrificing users' privacy while at the same time helping to make their proprietary model better without a stake of ownership in it.
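For what it's worth, the on-prem version of this doesn't have to be elaborate. Here is a rough sketch of the kind of upload hook I'm imagining, where the nudity and age checks are placeholders for whatever local models you'd actually run:

```python
# Sketch of an upload hook that runs moderation checks locally instead of
# shipping user images to a third-party API. The two check functions are
# placeholders for whatever on-prem models you actually deploy.
from dataclasses import dataclass

@dataclass
class ModerationResult:
    nsfw_score: float       # 0.0 (safe) .. 1.0 (not safe)
    estimated_min_age: int  # rough lower bound on depicted person's age
    allowed: bool

def nudity_check(image_bytes: bytes) -> float:
    raise NotImplementedError("plug in your local NSFW model here")

def age_check(image_bytes: bytes) -> int:
    raise NotImplementedError("plug in your local age-estimation model here")

def moderate_upload(image_bytes: bytes,
                    nsfw_threshold: float = 0.8,
                    min_age: int = 18) -> ModerationResult:
    """Run all checks on your own hardware; nothing leaves the server."""
    nsfw = nudity_check(image_bytes)
    age = age_check(image_bytes)
    ok = nsfw < nsfw_threshold and age >= min_age
    return ModerationResult(nsfw_score=nsfw, estimated_min_age=age, allowed=ok)
```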
Seriously, this reminds me of Facebook's "upload your nude pics to our system so we can notify you if someone tries to revenge-porn share your nudes with third parties" thing, except in that case people are doing something consciously, knowing they are sharing with Facebook, its employees and its computer systems...
With these API calls, I would guess that most people whose images are run through them have no idea their pics are being shared with third parties, who sometimes also put the images in front of human moderators for further scrutiny.
I guess this kind of IP thing is happening with voice recognition stuff as well in many areas. At least there is more open source in that domain I believe.
1) It's very difficult to find NSFW images, especially a particular kind like gore or suggestive nudity, unless you Google things (which indicates a bigger problem). Maybe the solution is to use Bing (which might cause the same issue in comparison) or DuckDuckGo. But honestly, if DuckDuckGo indexed a page, I'm pretty sure Google did as well. You would probably need something from a non-indexed website, which makes the job significantly harder.
2) Even though Google has all the images, it still doesn't have the best performing NSFW detector; Nanonets does.
If you're struggling with (1), Reddit has all kinds of bizarre subreddits which cater to all kinds of images. It's also conveniently (is that fortunate?) well categorised. There are definitely subs for gory photos, non-nudity NSFW images and so on. Reddit is also a great resource for categorised SFW images, since there are so many subreddits with active and strict moderation.
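If it helps with (1), this is roughly how you could pull image URLs out of a subreddit's public JSON listing. The subreddit name, user agent and limit are placeholders, and check Reddit's API terms before pulling at any real volume.

```python
# Sketch: collect image URLs from a subreddit's public JSON listing.
# Subreddit name, user agent and limit are placeholders; check Reddit's
# API rules before scraping at any real volume.
import json
import urllib.request

def subreddit_image_urls(subreddit, limit=100):
    url = "https://www.reddit.com/r/{}/top.json?limit={}&t=all".format(subreddit, limit)
    req = urllib.request.Request(url, headers={"User-Agent": "dataset-builder/0.1"})
    with urllib.request.urlopen(req) as resp:
        listing = json.load(resp)
    urls = []
    for post in listing["data"]["children"]:
        link = post["data"].get("url", "")
        # keep only direct image links
        if link.lower().endswith((".jpg", ".jpeg", ".png")):
            urls.append(link)
    return urls

# e.g. a strictly moderated SFW subreddit as one labelled bucket
print(len(subreddit_image_urls("EarthPorn")))
```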
So one thing I did for a while was to take content from porn sites, extract key frames and any available images, and look for SFW images. Hard problems include detecting NSFW stuff that includes no nudity / genitalia (bodily fluids and solids) and correcting for skin tone (most models, being trained primarily on Caucasian and Asian performer data, had trouble with darker skin tones). Some previous research showed that the trained CNNs were looking very hard for lipstick, so adding in samples from performers with less contrast on the lips was also important for training purposes. I didn't notice anything terribly different when training with transgender performers (hotdog / not hotdog is very easy from an object detection basis) but I had to be sure there wasn't confusion that a human could have that would introduce bias into the model. Another big plus with porn sites is that your data is already tagged by its users and the tags are checked aggressively for accuracy.
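For the key-frame extraction step, here's a minimal sketch of the sampling approach I mean: just grabbing one frame every few seconds with OpenCV rather than doing true shot-boundary detection. Paths and the interval are illustrative.

```python
# Sketch: sample "key" frames from a video by grabbing one frame every
# N seconds with OpenCV. Paths and the interval are illustrative; real
# shot-boundary detection would be more involved.
import cv2
import os

def extract_frames(video_path, out_dir, every_seconds=5):
    os.makedirs(out_dir, exist_ok=True)
    cap = cv2.VideoCapture(video_path)
    fps = cap.get(cv2.CAP_PROP_FPS) or 30.0  # fall back if FPS is unavailable
    step = int(fps * every_seconds)
    index, saved = 0, 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if index % step == 0:
            cv2.imwrite(os.path.join(out_dir, "frame_%06d.jpg" % index), frame)
            saved += 1
        index += 1
    cap.release()
    return saved

print(extract_frames("clip.mp4", "frames/"))
```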
My point is really that image searches can only get you so far, and biases abound in casual NSFW searches, to the extent that you may need to curate your own data sets that look like they could be on a random porn site in ANY section. Finding an appropriate training set almost reminds me of the jury selection process.
I wrote a comparison of all the popular content moderation APIs on the Internet. My basic understanding from doing this is that it is far from a solved problem and there is no one-size-fits-all solution.
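If anyone wants to reproduce that kind of comparison, the harness itself doesn't need to be fancy. Here is a sketch where each provider is wrapped in a function returning an NSFW probability; the wrappers are left as stubs because every vendor's request format differs.

```python
# Sketch of a harness for comparing moderation APIs on a labelled test set.
# Each provider wrapper should return an NSFW probability in [0, 1];
# the wrappers are stubs here because every vendor's request format differs.
def provider_a(image_path): raise NotImplementedError
def provider_b(image_path): raise NotImplementedError

PROVIDERS = {"provider_a": provider_a, "provider_b": provider_b}

def evaluate(provider, labelled_images, threshold=0.5):
    """labelled_images: list of (path, is_nsfw) pairs."""
    tp = fp = tn = fn = 0
    for path, is_nsfw in labelled_images:
        flagged = provider(path) >= threshold
        if flagged and is_nsfw:
            tp += 1
        elif flagged and not is_nsfw:
            fp += 1
        elif not flagged and is_nsfw:
            fn += 1
        else:
            tn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return {"precision": precision, "recall": recall}

# for name, fn in PROVIDERS.items():
#     print(name, evaluate(fn, test_set))
```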