The Dead Internet is a mirage caused by the unreasonable effectiveness of digital marketing. You simply aren't able to find the living internet behind all the nonsense. That doesn't mean it doesn't exist. You just need to get off social media and google to find it.
Au contraire. Yes, the beauty of the late 90's-early 00's internet was Alta Vista, and then Google, which allowed you to find the parts of the living internet that were interesting to you. Now, search engine results will almost never serve you up a single-user's web site. They will dutifully shovel you to one of the dozen walled gardens that has become "the internet." The search engines themselves have become the gatekeepers of this dead, corporate internet, and there's no breaking out of it.
>> The Dead Internet is a mirage caused by the unreasonable effectiveness of digital marketing…
> Au contraire… The search engines themselves…
With you 100% except for the opening rebuttal. What do you think /caused/ search engines to devolve like this if not digital marketing?
I pay cash for kagi.com, and recommend it.
Engineers should try their “lens” approach. I’d pay more for trusted curated lenses, and hope that’s in their model. The site above could offer a curated list of valid sites, and then I’d find them in the one engine too. (See Similar Projects on Marginalia’s About page.)
I also pay for Neeva, but they’re clearly trying to have their advertorial cake and eat it too. Still, it’s a better resource than Google when seeking an actual product.
I worry that solving digital marketing's 'unreasonable effectiveness' requires more than just the ability to subscribe to content without ads; it should also be possible to buy products without a marketing budget built into the cost. Lower-cost products would outcompete those spending money on ads, so all else being equal, enabling products to compete without a marketing budget is the only solution I see. I don't think a Neeva solves this by itself, though it's likely a necessary component.
Another vote of support for kagi.com -- it's well worth the payment.
Our (collective) disinclination towards paying for things on the Internet is what has led to the "everything must be monetized via ads" local maximum we're now stuck in.
If you care enough about this state of affairs, and can afford to do so (most people here can), then please consider paying for parts of the Internet that are important to you, like a search engine.
When you have as much information as there is on the internet, any attempt to sort through it creates a bubble. At least a paid search engine allows feedback between the users and the algorithm and some confidence that the engine won't go under because the authors lost interest/got tired fighting abuse for free/etc.
Have you seen the crap outside the filter bubble? It is trendy to call filter bubbles the well from which all sorrows are drawn but not everything is worth reading, no matter how many drink the koolaid.
This isn't scientific obviously, but I feel since I started using Kagi, my scope has widened significantly of what I rely on for results. Another +1 for what these guys are attempting.
You pay for search one way or another, I'd rather be direct about it.
It depends on what the customer base is paying for and expecting. If they expect a bubble and complain about it not being there, they'll likely get a bubble. If they complain about the bubble being there, steps will likely be taken to reduce or remove it.
For better or worse, the direction of a paid product is usually fairly well defined, as long as they've taken time to understand their customers.
The release date of the headphones is not the important data point there. The date of the article is what we care about. The article is 8 years old and therefore excludes thousands of new headphones that could conceivably be the best. Even if today's best was released in 2014, a 2014 article pointing to that same pair isn't a very authoritative source when shopping for headphones today.
Headphones don’t age. It’s perfectly fine for such an old article to come up, especially if it gets referenced on prosumer forums such as the r/audiophile or r/hifi subreddits.
Sennheiser’s headphone division got eaten by its own success: the Sennheiser HD 650 is so durable and has such great sound quality that people just aren’t switching away from that 20-year-old headphone.
In case the link you mentioned isn’t about that mid-range headphone, it’s probably about the Beyerdynamic T1.
Besides talking about audio equipment: I hate the sites on Google that update the dates of their articles even though the content wasn’t changed. It happens way too often. I googled the release date of BOTW2 a few days ago, and 3/4 of the search results were blatant SEO spam where the initial article was about something else, and then the headline and date were changed in order to get more traffic from Google.
Imo you are setting the wrong priorities for a search engine.
>Headphones don’t age. It’s perfectly fine for such an old article coming up, especially if gets referenced on prosumer forums such the r/audiophile or r/hifi subreddit.
But an outdated article doesn't guarantee this to be true. For example, maybe the manufacturer released an updated version that is a better value proposition and kept the old model around as a more budget-friendly option. Or perhaps another company purchased the manufacturer and demanded they cut costs. Or maybe this model is new and people haven't yet learned that there is a specific part that frequently fails after a few years of use. An old article can't speak to these hypotheticals. It doesn't mean the article is wrong. It just means that the article is less informed than if it were written today giving the exact same recommendation.
>the sennheiser hd 650 is so durable and has such a great soiund quality, that people just aren’t switching away from that 20 year old headphone.
But this is only something that can be truly known after those 20 years.
>I hate the sites on google who update the dates of their articles although the content wasn’t changed.
I agree, and while this is a related issue, it isn't really the same problem. It is a failure in Google's anti-SEO features. They don't need to trust the date on the article. They could compare cached versions of the page to see what changed besides the date.
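The cached-copy check could be as simple as a plain line diff. A minimal sketch using Python's stdlib `difflib`, with made-up page snippets; the "at most two changed lines" threshold is an arbitrary assumption:

```python
import difflib

def date_only_change(cached: str, current: str, max_changed_lines: int = 2) -> bool:
    """Heuristic: if almost nothing changed between the cached copy and the
    live copy of a page, a bumped publication date is suspect."""
    diff = difflib.unified_diff(cached.splitlines(), current.splitlines(), lineterm="")
    # Keep only lines that were actually added or removed, not diff headers.
    changed = [l for l in diff
               if l.startswith(("+", "-")) and not l.startswith(("+++", "---"))]
    return 0 < len(changed) <= max_changed_lines

cached = "Best Headphones\nPublished: 2014-03-01\nOur pick is the HD 650."
live = "Best Headphones\nPublished: 2024-03-01\nOur pick is the HD 650."
print(date_only_change(cached, live))  # True: only the date line differs
```

A crawler that keeps historical snapshots could downrank pages where only the date line moves between crawls.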
> But an outdated article doesn't guarantee this to be true.
Nor does a new article on a "review" site monetized with affiliate links guarantee it. So who do you trust more? An older but honest review from an expert, or a new affiliate-driven review? Kagi chooses the former as likelier to be more valuable to the user in this case.
The issue still is that there objectively could be better headphones released after 2014.
If I saw a photograph of all the cellphones from 2004, I could pick out the best one, but that doesn’t mean the best cellphone from 2004 is still the best cellphone.
It’s implied that “best” usually means what is “best” for what people need today as those needs evolve drastically over time and especially with tech products.
As a data scientist, just being able to block Towards Data Science and other garbage DS content churned out by amateurs to get their resumes boosted is well, well worth it. It's ridiculous how much top ranking content on Google is flat out technically incorrect, or at least clearly misunderstanding the subject.
Also a big fan & paying customer of kagi.com - the only setting adjustment I've made so far is pushing docs.rs up in search results, so I get that instead of crates.io when looking up Rust stuff.
To me it falls into the category of IntelliJ products, where it makes my life and productivity so much better that the price is a no-brainer
What worries me about Kagi is this bit in their FAQ:
"... it costs us about $1 to process 80 searches. ... An average Kagi beta user is actually searching about 30 times a day. At USD $10/month, the price does not even cover our cost for average use, and we are basically betting that average use will go down a bit with time because during beta people may be searching more than normal due to testing etc. Our goal is to find the minimum price at which we can sustain the business. If it turns out that we have more room we will decrease it. But it can also be that we may need to increase it."
I went and looked up my Google search history for yesterday - it's 40 searches. I'd expect it to be above average, but still... if it's $10 per 80 queries, it feels like $10 is likely to be too low to be sustainable. And while I personally don't mind paying more, I wonder how many people will - and what it'll mean for the service long term, if they just can't attract enough people to make it worthwhile.
Completely agree. Just a few years back before I was using a password manager with random passwords I googled one of my commonly used passwords. I was able to find leaked databases of passwords etc.
The other day I did a similar search and absolutely nada.
When I search for technical information 2 out of 3 times I get a website that I must pay to view content.
The internet is clearly going in a bad direction and most average joe users are suffering and will likely suffer more in the future.
> When I search for technical information 2 out of 3 times I get a website that I must pay to view content.
I usually get thousands and thousands of cloned websites that were likely set up in bulk using a template. They copy-paste just enough text to produce a search engine hit, while the real website it came from may not even be in the search results no matter how many pages of results I click through.
And then there are the elaborate clones of Github content, Stack Overflow, and various other technical help websites, all designed to make it look like all of those discussions are happening on the clone rather than the original. Some of them include a link back to the original, some don't. I get why some of those websites are ok with their content being openly reused (not that spammers care anyway), but in practice it destroys discoverability of their own service and wastes people's time.
Pinterest has spread through Google Images like a virus; they're plastered all over the results for searches that clearly aren't from boards made by real Pinterest users. I doubt it's a 3rd party spamming Pinterest because the only entity who actually benefits from it in practice is Pinterest itself. They've changed their onboarding pattern a lot over the years, but at one point it was virtually impossible to click through to the original website at all before the account creation popup blocked everything else.
Putting Pinterest at the top of image search results is effectively nothing more than a funnel to onboard more users for Pinterest, they rarely, if ever, have any relevance. I can't imagine why Google hasn't knocked them out of the results entirely at this point.
Whatever they're doing to combat actively hostile spam websites is either failing or they simply don't care anymore. The end result could not be more obvious.
"Putting Pinterest at the top of image search results is effectively nothing more than a funnel to onboard more users for Pinterest, they rarely, if ever, have any relevance. I can't imagine why Google hasn't knocked them out of the results entirely at this point."
Absolutely. Pinterest makes Google Image Search much worse than it otherwise would be. I haven't clicked through to Pinterest from an image search for a while, but IIRC you couldn't click through to the image without getting 1. pestered to login and 2. having to scroll through multiple pages once you logged in. I have a vague recollection of this not being compliant with Google Webmaster Guidelines, but I can't recall which specific section, and it's not as if Pinterest is the only large player allowed to get away with breaches the average webmaster would be deindexed for.
Agreed, the long tail is disappearing from google.
Between datasheets and old cringey fanfic of mine, there are more and more resources that I am aware of that absolutely still exist on the internet, with reasonable robots.txt, but can't be coaxed out of google even with exact snippets.
Some time ago, google seemed to start really favouring corporate content.
It used to ensure most searches would have a few blog results, a wiki link, some large corps, some small corps, but that’s fallen apart.
I know this for the wrong reasons. I used to publish pages for my bank’s phone numbers because… I’d just publish their phone numbers.
While this is kinda a bad idea, now searches will give you 10 links to the bank’s own website and they make it difficult to find a number because they don’t want you to call them.
Yeah, this drove me nuts looking for it, but then I remembered the car was in a wreck and I think the hood was replaced, so no sticker. The manual only says to contact a qualified service technician. I'm pretty sure it's R1234yf, but I'd still think it should be something you can easily find online.
If the car was in an accident and the aircon doesn't work anymore, it means the gas loop is leaking. You can try to refill it but depending on the size of the leak it's going to work for a few hours to maybe a couple of days. You should evacuate the loop and do a vacuum test. If it is leaking, refilling the system with some added dye can show you where it is leaking. The Schrader valves are the usual suspects but as the car has been in an accident it could be anywhere. Adding refrigerant to a leaking system is just blowing away money that could be used to actually fix the aircon properly.
AutoZone has a database of that type of thing. Search for the product and filter by the vehicle, or you can just ask at the desk if that's where you're gonna buy the refrigerant anyhow.
Post-1995 and old enough to be out of warranty? Overwhelmingly R-134a.
If you can’t find a sticker (or if that sticker says R-12, it still may have been converted), unscrew the cap on the service port and match it up to the type of port used by each refrigerant.
I had this experience the other day. I was searching for a #define constant I knew was in the Linux kernel source at least once (I was staring at it from my local clone), but none of the search engines returned anything. It's infuriating.
Who stops people from running their own search engines?
As if you cannot look up the address ranges of your own country and then crawl your whole country for websites that may be hosted by people living locally.
As if you cannot do the same with a foreign country that interests you.
Maybe you could even find a list that only shows residential IPs, so you're sure to be finding only webservers run by individuals and not corporations.
And if somehow "port scanning" by trying to send a http request to a residential IP is illegal in your dystopian country, you can always start by scraping the site that you're interested in, there will always be at least one more link to another domain somewhere.
For large-scale servers Python is a poor fit, but that doesn't mean you cannot spend a few weekends writing your own Python crawler for your needs. It's easy enough that you don't need to be a programmer to do it, and if you really care about this at all, a bit of a startup hurdle won't make you immediately lose interest.
And if it really does, there's always options like https://yacy.net/
You should see these things more like real life. If you wanted to know more about your own neighbourhood, what better way is there than to go outside and walk around your neighbourhood and see things with your own eyes?
Maybe that's just my opinion, but the status quo is no one's fault but your own; I never had this problem.
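For what it's worth, the weekend Python crawler described above can be surprisingly small. A stdlib-only sketch, with no robots.txt handling, politeness delays, or retry logic, all of which a real crawler needs:

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse
from urllib.request import urlopen

class LinkParser(HTMLParser):
    """Collects the href of every anchor tag on a page."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def crawl(seed, max_pages=50):
    """Tiny breadth-first crawler: fetch a page, queue its links, repeat."""
    seen, queue, index = set(), deque([seed]), {}
    while queue and len(index) < max_pages:
        url = queue.popleft()
        if url in seen:
            continue
        seen.add(url)
        try:
            html = urlopen(url, timeout=10).read().decode("utf-8", "replace")
        except OSError:
            continue  # dead host, timeout, etc.: just move on
        index[url] = html
        parser = LinkParser()
        parser.feed(html)
        for link in parser.links:
            absolute = urljoin(url, link)
            if urlparse(absolute).scheme in ("http", "https"):
                queue.append(absolute)
    return index
```

Calling `crawl("https://example.com/")` returns a dict mapping each fetched URL to its HTML, which you can then index however you like.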
Content producers get sucked into walled gardens. Even if it is just an internet discussion, nobody will ever read your shit on Discord if it is older than a week. IRC had some of the same problems, but only to a degree. So user content is decreasing, and corporate content and bot farms remain for the open net.
You could crawl forums and find deep technical discussions. Not anymore. And if a term was ever part of any news cycles, you get walls of Google selected propaganda.
I was just tired of YouTube steering me back to the same old small niche of videos, many times giving me repeat recommendations for stuff I'd already seen. Our algorithm is designed to surface smaller channels and find more obscure content.
Even if your neighbor hosts a website for your local football club or whatever, it will almost certainly use some hosting service and not a local machine. The number of websites self-hosted from a residential home must be a tiny, tiny fraction of all "interesting" websites.
First, the internet has grown a lot, so the time, cost, and hardware for retrieving and indexing it have grown.
Second, the quantity of intentionally fake noise has grown even faster - the spam problem that you have to solve is much harder than 30 years ago, any naive approach will simply fail to notice the needle in the haystack.
That's like if someone says "a car is $18,000, just make your own car," another person laughs and says "you don't really think you can build a street legal car from scratch without making a car company," and someone else saying "really, what do you estimate a car company would cost?"
It's completely unnecessary to make that estimate, a nonsense proposition since any two implementations are two orders of magnitude in cost apart, and a question that should never be asked of someone who hasn't done it.
Which is weird, because if you are who I think you are, you've done this in a trivial way, focusing on tiny sites.
And who knows? Maybe you're about to tell me that you've indexed several tens of thousands of pages yourself, that nobody's helping you, that it runs on two computers, and that it's Not That Difficult (tm).
Of course, then someone compares that engine to a practical search engine that also encompasses modern sites, and therefore needs to run tooled browsers to cope with their AJAX nonsense, and has to hit them every hour to be up to date.
And then you look at the disk cost.
Microsoft spends about $6 billion a year on Bing.
Duck Duck Go has more than 200 staff and raised $170+ million before their first profitable quarter.
I think it's very easy for someone to put a homebrew HTML chess game on the phone store and then turn around and insist they know what it takes to run EA.
I'm not disputing that you can sink inordinate amounts of money into a search engine, but on the flip side, I am running a search engine.
It indexes not tens of thousands of pages, but has a peak capacity of about 100 million documents. I can crawl over a billion documents per month.
I don't really see anyone suggesting competing with Google or Bing off a PC in your garage, but it is absolutely and demonstrably feasible to build complementary services without any budget at all.
It doesn't require huge numbers of developers, it doesn't require a small country's allotment of bandwidth, and it doesn't require data-centers full of prohibitively expensive hardware.
You're talking to the creator of a search engine specifically designed to turn up the single-user portion of the internet. I would venture a guess that they know better than anyone what's out there in the non-Googleable internet.
No one is denying that bespoke web sites exist. I still run one myself, almost out of sheer stubbornness. But nothing hits it except search engines, which never show a result from it.
I use millionshort.com for searches that I know will be useless on Google or similar. It lets you remove "top" sites and also e-commerce to the extent it is able to identify them. Though it generally works OK, sometimes it has gotten wonky and gone into a captcha loop.
I have been getting great results with you.com and am downranking all the sites I don't find value in. Searching for shopping related queries has been a blast in comparison to Google.
You can't find what's not there. Way too much of the primary content has moved behind those garden walls.
My local farm has a website where they list the stuff they have available. Meanwhile their actual scheduling and detail updates are on their Instagram because of course it is.
Accurate, up-to-date information for many businesses seems to be about 90% on Instagram and 10% on Facebook. The website, if there is a website, has no information, or old information.
This is frustrating (among other reasons) because Instagram has become much more aggressive about not letting you even see their content without logging in. Sometimes you can see the gallery but not an individual post, sometimes no individual posts, sometimes you can't see anything at all.
Even worse, you cannot use a mobile browser with cookies deactivated, at least that's what Instagram claimed the last time I tried accessing it using DDG's browser. I'm sure their overall numbers improved; I stopped bothering that very day. That leaves WhatsApp among Meta's apps, which seems to be becoming Facebook's Excel: the only reason to use anything from that vendor in the first place.
This is particularly acute in countries that skipped the desktop internet and jumped straight to mobile internet. For much of this userbase, their first and primary interaction with the web is through apps. They never bothered creating blogs or web pages and now all their content is behind the walled gardens of Facebook and Instagram.
I'm not having much trouble finding this type of content on the search engine I built that does exactly this. Not everything is available of course, but 20 years ago, you'd probably have to call someone for that information, so not much difference.
At some point it would probably become useful to teach “internet/tech literacy” to educate people on why this is a problem. But we’re a few decades from something like this.
When I was a kid we had "computer class" that taught how to type, how to use Microsoft office (and open office) applications for different use cases, and this was then mixed in with understanding different sources of information taught by the school library and english classes.
As kids are now raised on smartphones instead of the family desktop, I think they need MORE of this, not less, at least for the very important skill of typing. I wonder how many 12-year-olds in America can type using the "standard" method instead of hunt and peck.
I don't want computing to be something only known by the children of turbo nerds. I want young adults to be able to solve their own problems with computers, ie build some spreadsheets for home finance or even just be able to graph the data from one of their science classes.
True technical literacy is at an all time low, IMO. Whilst more people than ever are "online", the barrier for admission is so low and very few people ever seem to learn more than they absolutely need to.
As you can't develop software on phones and tablets, very few people are tinkering with software. The Pi and iOS app craze brought a momentary change, but it seems to have gone back to how it was—and worse.
Kids of today are mostly out of their depth when put in front of a computer of any description if it is beyond basic website usage. Complex program? Forget it. Decent typing skills? Forget it. Networking know-how they'd have picked up from doing LAN gaming with consoles or PCs? No chance. Change a drive? LOL.
For the handful of kids that game on PCs, they're generally not very clued up and they're just copying builds they've seen on YouTube to the letter. It's a sad state of affairs.
And yes—of course, there are the kids of us turbo nerds, because of course there are, but they are so few and far between.
That is probably because the farmers hired somebody one time to make a website which they don't have the skill to maintain, but they know how to use Instagram.
Well, yeah, I know why they do it and it's a rational decision for them. It's really a condemnation of our own evolution as an industry and the incentives involved that it ended up that way.
On the other hand, 20 years ago such a farm would not have had any internet presence at all, let alone detailed and up-to-date inventory info. Instagram et al made it possible.
20 years ago there was no need for any of this. You went to the farm, did your grocery shopping, and that was it. Or you didn't go there. Either way, farms existed back then just fine.
Who on earth would expect a local farm shop to be on par with Amazon when it comes to inventory and availability data online?
My local farm does already have that information, like, "blueberry picking suspended, waiting a week for ripening". Like, I don't have to expect them to add inventory information, they're already doing it because that's how they communicate with their customers. It's just only available behind the garden walls. It's an indictment of us as an industry that it turned out that way because that was the convenient way to do it instead of a more open and user friendly way like a website.
Twenty years ago you just called them for the information, and it's way better for them to broadcast it than have a hundred 1:1 conversations.
BeautifulSoup and its clones do parsing pretty well. Just extracting the text out of HTML isn't incredibly hard, and metadata is too unreliable to ever be much use.
You can calculate anchor tag density across the DOM tree and prune branches that exceed a certain threshold to remove navigational elements with reasonable accuracy if that is a problem.
It's not going to be perfect, but even Google messes this up every once in a while. I wouldn't consider it a major hurdle.
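A rough sketch of that anchor-density pruning, using the stdlib `xml.etree` on well-formed markup to stay dependency-free; the 0.65 threshold is an arbitrary assumption, not a tuned value:

```python
import xml.etree.ElementTree as ET

def text_len(el) -> int:
    return len("".join(el.itertext()).strip())

def anchor_density(el) -> float:
    """Fraction of an element's text that sits inside <a> tags."""
    total = text_len(el)
    if total == 0:
        return 0.0
    return sum(text_len(a) for a in el.iter("a")) / total

def prune_nav(root, threshold: float = 0.65):
    """Drop subtrees that are mostly link text (menus, footers, sidebars),
    while leaving individual inline anchors alone."""
    for parent in list(root.iter()):
        for child in list(parent):
            if child.tag != "a" and anchor_density(child) > threshold:
                parent.remove(child)
    return root

doc = ET.fromstring(
    "<body>"
    "<nav><a>Home</a><a>About</a><a>Contact</a></nav>"
    "<div>Real article text with <a>one link</a> in it.</div>"
    "</body>"
)
prune_nav(doc)  # the all-links <nav> is removed, the article <div> survives
```

Real pages need an HTML-tolerant parser rather than `xml.etree`, but the density calculation itself carries over unchanged.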
The hard part is understanding which parts are the content versus navigation or promotions of other content. I’ve written a couple of search engines. Have you tried making one with BeautifulSoup?

No, I use JSoup for my search engine.

I don't presume the source is available... unbelievably cool project that I'm sure a lot of people have imagined themselves doing.
Yeah, it depends on what you want to prioritize and value in your search engine. I’m coming at it from the angle that if you want to make a good, new, and different kind of search engine, you need to do something fundamentally different from Google. No one is going to beat Google at their own game. Leveraging metadata is a very easy way to make something new and different, but it won’t be as comprehensive as Google. I doubt that someone doing what you described over a few months or a year could make a search engine that anyone wanted to use.
> I doubt that someone doing what you described over a few months or a year could make a search engine that anyone wanted to use.
Dunno, not only are people sending me money to develop my search engine (not enough to live off, but still), I also get emails and tweets from people who say they love it on almost a weekly basis.
I think attempting to be as comprehensive (or more) than Google is a trap. The better move is to fly under them. Be cheaper and better at something. Recipes is a great example of something Google is just miserable at, that is easy to do much better. There's plenty of such niches.
You love seafood, so just literally run grep on the entire page, and if it contains the word then count it as a match.
In reality, you will miss a lot of real seafood pages because they don't really need to mention "seafood" and context matters, so what? Chances are that that one website where person randomly added "I love seafood" to the top of the page will be the only page that you've ever wanted to see anyway.
There's too much data for you to go through in an entire lifetime in any case, so why worry about it as long as you can get something that's good enough? You will never get the best data; if that were possible, Google would already be giving it to you.
How do I know? Well, looking up my real name shows where I grew up, what school I went to and graduated from, and even which exam I scored 100 on... and even some places I used to work at in the past. While that part will make most people paranoid, I wish ALL results were as detailed as this one, but there's little you can do.
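The grep approach really is only a few lines. A toy sketch with invented URLs; it also exhibits exactly the miss conceded above, since a page about oysters and mussels never says "seafood":

```python
def naive_match(pages, query):
    """Crudest possible retrieval: a page matches if the query term
    literally appears anywhere in its text."""
    term = query.lower()
    return [url for url, text in pages.items() if term in text.lower()]

pages = {
    "a.example/fishing": "I love seafood, here are my favourite crab shacks.",
    "b.example/recipes": "Oyster and mussel recipes for the grill.",
    "c.example/cars": "Restoring a 1972 coupe, part 3.",
}
print(naive_match(pages, "seafood"))  # ['a.example/fishing']; the recipes page is missed
```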
That’s how you make a worse search engine than Google. If you are serious about competing in that space I think you need to do something fundamentally different than Google. Treating pages as a bag of words leads to a shitty search engine. Like I said, I’ve built a few search engines, and I have tried this.
That actually sounds like the solution. If you're getting something standard, you don't want it. If something is too non-standard to be identified, pass it through.
Search engine spam is actually a fairly solvable problem if you aren't in Google's questionable position of also selling the ads that make the spam economically viable.
They can do everything except the one thing that would actually hurt the search engine spammers right in the coin purse: Penalize websites for having ads.
I don't know about that. If I recall correctly, there was a time when Google was trying to tamp down on pages with excessive advertising, so SEO spammers just switched to making pages that superficially looked like normal informational pages, but the content was all either ad copy or spun text, and all their links went to products they wanted to sell.
I get where you're coming from, but my experience is that even many people making sites for fun stick ads on the site in the hopes of getting a bit of extra cash. Lots of fansites did so in the olden days, and many useful sites, wikis and blogs do so now, especially if they either use hosting that adds them or they get enough traffic they feel they need to.
I'm not saying block ads, I'm saying prioritize sites that don't have ads above those that do. Prioritize those that have some over those that have a lot.
The whole premise of Google at the beginning was that the web is a collaboration, and we can measure that.
Then it was we can measure that and make money.
Now it’s just we can make money.
Whether that part is sinister or not, we know that we have a good number of bad actors, and from search engine results we can be sure that they have not developed a workable Byzantine fault tolerance mechanism to filter out the bad actors. Those who scream the loudest get put on a stage.
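That original premise (the web is a collaboration, and we can measure it) was PageRank. A toy power-iteration sketch over a hypothetical four-page link graph:

```python
def pagerank(links, d=0.85, iters=50):
    """Power iteration over a link graph: a page is important if
    important pages link to it."""
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iters):
        new = {p: (1 - d) / n for p in pages}  # teleportation share
        for p, outs in links.items():
            if not outs:  # dangling page: spread its rank evenly
                for q in pages:
                    new[q] += d * rank[p] / n
            else:
                for q in outs:
                    new[q] += d * rank[p] / len(outs)
        rank = new
    return rank

graph = {"a": ["b"], "b": ["c"], "c": ["a", "b"], "d": ["b"]}
ranks = pagerank(graph)
print(max(ranks, key=ranks.get))  # "b": three of the four pages link to it
```

Bad actors game exactly this signal with link farms, which is why measuring the link graph alone stopped being enough.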
Companies that acquire other companies and talent will always change. The goal of making money over everything is the one thing all Class-C corporations have in common.
It cannot be gamed. You wrote your own search engine, you wrote your own filters; you know how it works, and no one else does.
Whitelists that I wrote by hand also don't introduce new unexpected entries by the way :)
This could instead be more like RSS: as your crawler finds new sites, you get updates on new things, and you could filter either in your crawler or directly in your RSS client; it doesn't matter which.
It would actually be lovely to have a search engine that only serves up single-creator or few-creators websites. I'm sure one exists, but I'm not aware of any.
I haven't used it (or their browser Orion, which is allegedly better-performing and more battery-efficient than Safari), but have you tried the "premium" search engine called https://kagi.com/ ?
Kagi with lenses is tremendous. I recommend it elsewhere in this thread.
One should subscribe to it out of support, hope, and to send a signal if nothing else. But it’s actually considerably better on most searches, perhaps similar to a mid-2000s Google, except with mild structure added that isn’t ads.
(You can still !g like ddg if you feel you absolutely must.)
Imagine websites having a captcha that proves they are handcrafted by a person, proving the time invested, the back and forth. You could filter for that.
This actually seems a reasonable enough solution to the bot problem, but doesn’t address the fake content problem. A human can still upload fake stuff.
Not if the whole creation process has been documented with a sort of internal blockchain: from scanned sketch, to opening the canvas, to every single brush stroke, to converting and uploading the image. You could fake that too, but the effort would be tremendous.
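A toy sketch of that "internal blockchain" idea (the event names are made up for illustration): each creation step is hashed together with the previous step's hash, so the whole edit history can be verified, and forging it means redoing every step.

```python
import hashlib
import json

def chain(events):
    """Link each creation event to the previous one via SHA-256,
    so tampering with any earlier event changes every later hash."""
    prev = "0" * 64  # genesis value
    log = []
    for event in events:
        record = json.dumps({"prev": prev, "event": event}, sort_keys=True)
        prev = hashlib.sha256(record.encode()).hexdigest()
        log.append({"event": event, "hash": prev})
    return log

log = chain(["scan sketch", "open canvas", "brush stroke", "export image"])
# Anyone with the event log can recompute the chain and check the final hash.
```

This only proves the log is internally consistent, of course; binding it to a real human's effort is the hard (and unsolved) part.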
I just Googled my name and the search engine returned a good long list of pages featuring my work in various places[1]. Of course it does help to have a fairly unique name; others may struggle to find themselves featured on the first page of results when they Google their name.
[1] For instance, this link - which I discovered just ten minutes ago. I know for a fact that I have never submitted poetry to the Porkopolis website (motto: "Considering the pig, a single-minded bestiary") but it's always a pleasure to discover other people putting my words to good use! - http://www.porkopolis.org/pig_poet/rik-roots/
I also can easily Google my name. There are only 3 of us, and one of them is my deceased uncle. It doesn’t change the fact that I probably haven’t had any organic traffic to my personal web site of 28 years for a long, long time.
The price of entry is very steep. Many people are willing to pay. Many are not. It's an exaggeration to say that many walled gardens are free to get into, I think. At least, I don't know of any.
I had the same reaction to the main thesis of this article. This is like people taking a road trip across the United States who are only willing to go 1/4 mile off the federal highway. They then decry the state of cuisine because the only restaurants left in America are the same 5 fast food joints.
Getting outside one's comfort zone and putting in the time to find something good/interesting/new is highly underrated. But it is work. And many a corporate empire has been built by making a mediocre or sufficient experience the most convenient thing.
>This is like people taking a road trip across the United States who are only willing to go 1/4 mile off the federal highway.
Urban Spoon was an amazing resource for us road warrior types. I found many fantastic places > 1/4 mile off the interstate. Nowadays, I ask employees at worksites for their opinions. If they recommend a box chain, I ask someone else.
If you know you're going to be in or near a particular town and want food, a good place I've found to start is the restaurants that advertise in the local church bulletins.
The physical location where a field tech installs and implements technology and/or equipment. I worked in Building Automation Systems (BAS) in big box retail stores and covered >400 sites in 5 US states.
Frankly, Marginalia shows how many of these glorified websites are just dead, forgotten projects that are only still running because nobody remembered to take them down.
While I like using Marginalia to find these websites, I don’t think it’s a demonstration of how “alive” the internet is but more like a lens into what the internet used to be, like walking around an archaeological dig site.
If the majority (vast majority) of users of the internet rarely break out of social media and you posit that social media is largely 'dead', then isn't the internet effectively dead for the vast majority of users?
Would those smaller social media platforms then be large, end up commercialized some how, smaller ones pop up, and something else take over until we repeat the process?
Access to the real, genuine Internet people and places will be made invisible; protected by the gargantuan SEO-fed lipid-berg of AI-generated content, keeping the social media peasantry ever-corralled in the cattle pen where they shall be kept happy and fed by their keepers.
The problem isn't really with the internet, but a handful of websites that have come to mediate the internet for a lot of people (despite appearances to the contrary, the two are not the same).
This is a critical distinction, because the former is a problem like "the water is too wet", like you can't really fix that. You can build new digital infrastructure though. That's a solvable problem.
And then that new infrastructure will be subverted just as the current one has been. I don't think this is an issue that can be solved through engineering. I think the problem lies in a different domain.
Can't we just build a new infrastructure at that point? The current infrastructure worked well for about a decade. If we have to rebuild every decade then that's not so bad.
I would dispute that digital marketing is unreasonably effective. In fact I would say the dead internet is a result of the exact opposite. Digital marketing is so ineffective that basically the only strategy that works is spamming it everywhere. If you don't believe me look at the revenue per user of ad driven tech companies.
Effective is a slippery word. I use it to mean effective in the sense of affecting what is presented. You use it to mean effective in the sense of making sales.
It is extremely effective in the first way, but extremely ineffective in the latter.
> You just need to get off social media and google to find it.
Maybe if you know the exact URL or specific keywords, but generally not now. Google has turned into the same level of ad placement that Ask Jeeves and its ilk were. It's atrocious for surfacing anything other than clickbait. DuckDuckGo is better, but not by much IMO.
Even if you think it's a net positive when human attention is centralized in a few things like TikTok or Fortnite, you should still be able to enumerate downsides.
At that point, you'll see that you are disagreeing with the subjective weight of the upsides and downsides thus it doesn't make sense to attack someone like this.
> You just need to get off social media and google to find it.
Have you tried searching lately? It feels like it is becoming increasingly difficult to find actual articles with useful information in a sea of SEO trash.
Wow, awesome tool! Seriously, thank you for this. This really brings back that sense of wonder I experienced when I was younger and began exploring the internet in the early 90s
I think google and the other engines are part of the problem: ads, walled gardens, fake sites, and more.
A colleague showed me a website the other day from 2013 that was an absolute jewel in terms of knowledge. I am sure more recent sites like that exist, but I'm afraid the odds of finding them with Google are almost zero.
Cool. How? Can you elaborate on "I crawl myself." Do you mean you are the search engine and your search engine is an index of stuff you found that fits your criteria? What criteria, precisely?
It's basically a 1998-style Google which discriminates against JavaScript-heavy documents and uses a Personalized PageRank[1] algorithm to promote content that is adjacent to human websites.
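For anyone curious, here is a minimal Personalized PageRank sketch (not Marginalia's actual implementation; the domain names and seed choice are invented for illustration). The teleport vector concentrates probability on hand-picked "human" seed sites, so pages in their link neighborhood get promoted:

```python
def personalized_pagerank(links, seeds, damping=0.85, iters=50):
    """links: {page: [pages it links to]}; seeds: {page: teleport weight}.
    Instead of teleporting uniformly (classic PageRank), the random surfer
    always restarts at the curated seed pages."""
    pages = list(links)
    rank = {p: seeds.get(p, 0.0) for p in pages}
    for _ in range(iters):
        new = {p: (1 - damping) * seeds.get(p, 0.0) for p in pages}
        for p, outs in links.items():
            share = rank[p] / len(outs)  # spread rank over outgoing links
            for q in outs:
                new[q] += damping * share
        rank = new
    return rank

# Tiny 4-page web: "seed.example" is the curated, trusted starting point.
links = {
    "seed.example": ["blog.example"],
    "blog.example": ["forum.example"],
    "forum.example": ["seed.example", "spam.example"],
    "spam.example": ["forum.example"],
}
scores = personalized_pagerank(links, {"seed.example": 1.0})
```

Pages reachable from the seed inherit rank; a spam page the seed neighborhood doesn't endorse ends up at the bottom.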
Thanks. Tried some searches, good stuff. Recipes are a well known disaster area these days, it's nice to be able to find just an explanation and instructions, with nothing else.
I assume you've considered this, but what about deprioritizing precisely what spam sites use to SEO, or concocting some other identification algorithm, to filter precisely those out?
While yeah, marginalia finds interesting stuff, I've not been able to find anything useful that I've tried searching for with it so far.
>You simply aren't able to find the living internet behind all the nonsense. That doesn't mean it doesn't exist.
It's not about it being nonexistent. It's about it being too small a percentage. And with algorithmic generation and rampant re-posting of news content 1000s of times on different outlets, this is probably true...
8 billion people can manually create far less content than thousands upon thousands of automated generation scripts and bots...
Refreshed a few times. 99% of sites were about IT, programming, gaming, tech, STEM in general. That's a really narrow representation of the internet. Maybe it's just my luck.
That's the problem with the approach you've taken. You're showing domains, not web pages. Not all hobbyists or interest groups outside tech have domains.
You can always contribute by crawling the IPv4 range, searching for HTTP servers that don't necessarily have a domain pointing at them, and posting a list for us when you're done crawling it merely once :)
I'm assuming you're referring to Facebook groups or similar walled-garden web presences? If so, I'm assuming indexing those is an explicit non-goal for OP, and I'd agree with that. The walled gardens are very hard to index (often actively hostile), which means the returns would be rather low. Plus, most such places have searches of their own if you're really interested in finding something there.
In my experience, part of the problem is that often (usually?) the search on those places is nearly useless. For example, I shudder to think what will happen if old.reddit ever stops working.
Well the literal point of the tool is to show that there are websites outside of the walled gardens. There are plenty of link aggregators already. Hacker News is one of them. Reddit is another. Twitter is a third. Google is a fourth.
This article engages in my pet peeve, which is referring to entertainment as "content".
I don't have a problem with the word "content" in the context of content vs framing. e.g. if you are designing a network protocol, you care about distinguishing the content of the message from the framing, and you don't care at all what the nature of the content is, it could be video, image, text, etc, and it could serve many different purposes (entertainment, personal communication, employee training, archival/backup, etc).
However, this sort of language has crept into discussions of online entertainment, for no good reason. "I'm not an online entertainer, creating entertainment for people to enjoy, I am a content-creator creating content for people to consume." I think people don't like to think about what they're creating or consuming as entertainment, because society has already attached a connotation of triviality to the term "entertainment" (for good reason IMO).
Someone who talks about "content" is adopting the terminology and framing of a businessman, who cares little what purpose the "content" serves, just that it can attract attention, and thus money.
To be fair, once the internet is entirely taken over by bots, maybe it will be appropriate to call the stuff that bots create and consume "content" without a whiff of irony.
The funny thing is that that definition of content is still operating. “Content” is a term for the undifferentiated media framed by advertisement. To an advertiser it doesn’t matter what the content is, it’s just bits on the wire that seem to attract attention.
If you can’t put an ad in it then it isn’t content. Insidiously, we now call things content even if they don’t have advertisement or are not created for show.
This term of business art reminds me unpleasantly of Borders, the now-defunct bookstore chain where managers insisted staff refer to books and other media as 'product', reducing staff to little more than supermarket-grade shelf-stackers and merchandisers.
I grew up in a time and place where people who worked in bookstores did so because they liked books and were knowledgeable about the book industry as both distributors and consumers. Chain stores like Borders preferred to hire young and cheap and make stocking decisions centrally. The staff could probably have been swapped with a completely different retail establishment, and both outlets would have run in much the same way.
Free market advocates like to go on about the self-corrective nature of 'real' capitalism/competition, but never seem to have any answer for the existence of franchises or their tendency to crowd out other participants in a market by having a much deeper pool of capital to use as leverage. The idealistic models of perfect competition and price equilibrium only work well under elusive conditions and for fungible commodities.
Thank you for saying this. I've had a similar opinion for years now. I cringe whenever I see "content" used this way.
When I first saw that it was gaining traction, I took it as yet another sign that the internet was effectively dead in terms of what always made it great for me, and had been turned into nothing more than business.
The problem is that not all content is entertainment. Many "content creators" are producing content whose intent is to educate, inform, persuade, or (most likely) some inseparable combination of all four.
I don't like "content" either. I prefer "media". But there's an entirely logical reason we don't call it "entertainment". That would be like calling all clothing "pants".
At least 95% of what I watch on Youtube is educational or technical. It's self-teaching material, math, science, that sort of thing. Most people would find it dry but I enjoy it.
Still, I wouldn't call it entertainment.
I assure you it has just as many ads: pre-roll ads, clickable ads below the fold, interstitial ads, and "a word from our sponsor" ads as everything else on YouTube.
It's literally entertainment; you're being entertained by it. That doesn't mean it isn't teaching you or otherwise improving the world, but it's broadly in the class of entertainment.
I suppose I'd still classify it as entertainment even if you paid for YouTube premium.
If we declare anything enjoyable to be “entertainment”, then wouldn’t household chores that happen to relax you or bring you joy also be classified as entertainment? Cooking does often feel pleasant, but I don’t think most people would call it entertainment.
If we call all content that we enjoy watching “entertainment”, then IMO the word loses the meaning a little.
Those are often referred to as pastimes or hobbies — and I think the “entertainmentification” of all (lacking a word let’s say content) has been a major issue in the recent decades.
It’s becoming hard to even find a youtube video that doesn’t have “like and subscribe” somewhere in it, even if it’s not otherwise sponsored.
Asking to like and subscribe is a standard YouTuber behaviour because they presumably get a cut from the ads that YouTube runs on their channel. Or even if the channel is run entirely not for profit, liking and subscribing increases the chance that new users will see the content in their suggested videos thanks to an algorithmic boost.
I think you're projecting your own Internet consumption preferences onto all of us.
Most of what I spend my time on on the web and YouTube is more towards educational than entertainment, though the line gets blurry (which is the point of having a unifying term like "content") when it comes to videos about music-making and stuff like that.
You can already call most things "content" without a whiff of irony. Just look at youtube. "Content creators", you think they're misusing the words? No they're being 100% genuine to you what their intention is.
I see people write “I like their content” and have the same irking feeling that some people feel like their life is little more than consuming entertainment, and the word “content” elevates the worthwhileness of their behavior/addiction somehow.
I'm not sure I follow your criticism. I do believe that the term "content" is perfectly appropriate for most of what's being put out there. "Content" conveys this idea of "quantity over quality". I am grateful when someone identifies themselves as a "content creator", or when streaming platforms talk about their "content", because to me it is a strong signal that it's gonna be garbage and can be ignored.
From my observation, this is also a counter-movement of influencers to distance themselves from the term influencer. People want to take some pride in their work and try to see some worth in it. And they want to distance it from the bland marketing and stupid pointless activities. Everyone takes pictures of their food, but if you hire a semi-professional photographer and pose for a picture, then it's real work and worthwhile content. Something along that line.
I get a similar sort of reaction to "content", and I get a similar reaction to "units" which is used as a catch-all term for physical items sold, be they DVDs, t-shirts, game cartridges, wrenches, sneakers, whatever.
The person talking about sales of "the thing" doesn't care what "the thing" is. The language implies that "the thing" itself is irrelevant. All that matters is that it's something that can be sold, and the only important datum is how many were sold. And maybe how much they were sold for. Aside from that, to the person using that language, it's all just undifferentiated stuff.
I think from there, if the person talking about "units" or "content" doesn't really care about what the thing is, they're going to care even less about whether it's a good example of that type of thing. Is it a good t-shirt? Or a bad pair of sneakers? Who cares - how many units were sold?
Are you making movie review videos? Or 30-minute+ EDM/prog fusion atmospheric music tracks? Or 5000-word investigative journalism takedowns of corporate shitfuckery? Or are you just making "content" - whatever will grab some eyeballs?
Cannot stand the term either, so many problems with it. For one it completely lacks any kind of precision... and the idea of 'consuming content' is just gross and stupid.
The word suggests undifferentiated goop (pink slime?) being poured into cans for mass consumption, and shows the respect its producers have for both the product and its consumers.
Regardless, I think it's interesting to consider why it's changing and the direction of those changes. As an extreme example, if 1984-style doublespeak started to take root, it would be valuable to consider why and whether this is a good thing.
I actually experience this in the exact reverse from the way you are proposing. I'd argue that this is specifically referred to as "content" because it is generated en masse, as opposed to something carefully crafted. This is the way I see most "content" on the internet. For instance, a TikTok video that is made in a minute? Content. A Pixar film that takes >100 minutes of work for 1 minute of film? Art.
The end result of all this fakery is a growing doubt and distrust of the world and the information presented to us. Bots on twitter, corporate reddit moderators pruning discourse, astroturfed discussions, deepfakes, AI generated news articles, AI art, it all waters down the assumption that what we see before us is real. Leading us to doubt everything we read, see and hear. Much of this bot driven noise online is only possible in large, public online communities.
I think we will see a shift towards much smaller walled gardens of community online. It's already happening with the mass exodus to discord and smaller chatrooms. I think we can all safely assume that our 30 discord friends are real people... for now.
The country club exists for the wealthy to enjoy the pleasantries of community and pastime without interruption by the masses. I think the internet will move to mirror the real world as we segregate apart into the places we most enjoy... or have the connections and money to afford. Authentic and vibrant human communities with novel content curation will be a luxury, while the "public pool" for the masses will be an internet of data pollution and grime.
Yes, the article mostly assumes that the initial effects of AI generated fake content will be the same as the final effects. This is silly.
People will change what they do in response. Though at the very end, he does say "We should learn to be skeptical of content", that belongs near the beginning, before an analysis of what the effects of increased skepticism will be, rather than what the effects of blindly believing fake content will be (since that won't happen, after a short initial period).
Smaller communities are one possible response. But just more critical assessment of arguments and reported facts is another. For arguments, it doesn't really matter whether or not the argument was AI generated - if it's valid, it's valid, if it's not, it's not. For factual reports, critical assessment might be more difficult, though I think it will be a while before AI generated fake facts have the right sorts of connections to common-sense reality to withstand critical examination.
Content, info, arguments, etc. are all propagated online based on their deliciousness. Is it dramatic? Easy to digest? Shocking? Emotionally powerful? Bright and alluring? Sexy or disgusting? These are the elements that push information to the top. Reality, truth and logic can't compete.
Advertisers figured this out in the middle of the 20th century. Prior to Edward Bernays' (Sigmund Freud's relative) revolution of advertising, products were marketed based on their functional qualities: how effective they were, how efficient, etc. Bernays realized from war propaganda and Freud's ideas of the unconscious, that selling with emotional coercion and sex was far more effective. In fact, you could make people buy things they didn't really want or need, by making them unhappy without them. He was able to convince women to smoke cigarettes by having trendy, independent women smoke openly at a parade, followed by a branding campaign calling them "torches of freedom". This concept of emotional manipulation trumping factual data is how our entire society now operates.
If we want a skeptical and thoughtful populace, our entire education system must be restructured and information dieting will have to become an innate part of the online experience.
People haven't shown the inclination for more critical assessment so far; why would that change all of a sudden?
And AI fakes are still in their infancy. For example, they haven't learned to push emotional buttons yet. But they will soon, because it's not all that hard, and it drastically increases the virality.
> right sorts of connections to common-sense reality
Unfortunately, I think this matters less than it should. Connection to common-sense reality does not seem to be a prerequisite for most people who engage with content on the internet.
> I think the internet will move to mirror the real world as we segregate apart into the places we most enjoy... or have the connections and money to afford.
Or we institute ever more stringent standards for verification of online accounts, both to prove one is human, and to tie online reputation to real identity. Not that I want to see this happen.
I know it's not really the point of this comment but public pools historically have been places to enjoy the pleasantries of community and pastime.
As an additional aside, you should spend some time considering the implications behind your selection of two significant hallmarks of institutionalized racism as your poles for opposite ends of a spectrum from "the pleasantries of community and pastime" to "pollution and grime [of the masses]".
They're a perfect example of how in America we have these public services that get overloaded and degraded in quality (because they are public), so the rich go and make their own private luxury versions to enjoy a more selective and high quality experience.
Reddit will be the future ghetto of the internet while the elite hang out in private discords!
Ugh another “think-piece”. Can’t wait for these to be generated automatically so we can move on to the next thing.
Literally no point here other than “vague broad unexpected content thing is coming” and look at all these “AI generated pictures” and random pieces of evidence.
Dumb humans will always get caught up in a web of bullshit and fakery because they always have. One could argue that it isn’t the first time someone or something has hacked people’s minds. Ideologies, religions, technologies have been used countless times by smarter humans to trick dumber humans into giving their shit away. And plenty of smart humans have also always stayed relatively quiet and out of sight knowing better than to make themselves into targets. The internet only reflects those dynamics. The only thing that is changing is that various areas of interest are becoming populated and settled “online”.
It all depends on what a given chunk of “internet” is used for, who populates it, and how much they spend on things they get there.
The stop overthinking for increased productivity pieces are the ones I'd like to see die. The fact that GPT-3 can write that garbage will hopefully develop a sense in more people that allows them to detect when a lot of words are being used, but nothing is really being said. Being afraid that their thoughts will be taken for computer output might force writers to actually create a model, or make a material claim with their words, rather than just trying to cast spells with them.
Everyone is dumb one way or another. Especially those who are specialized in an area of expertise. That’s when we over-estimate how much we know about adjacent specialties.
There is definitely a type of content that works on all of us collectively. It’s like catnip.
Yesterday, in a slightly obscure subreddit I frequent, every single one of the posts was by a bot reposting old content from the subreddit with random letters changed in the title. Upon further examination, most of the comments were from other bots as well.
All they did was automate a process living, breathing human redditors have followed for over a decade: farming karma with reposts. The fact nobody noticed the switch over from human repost spam to bot repost spam is pretty indicative about the overall quality of the site's community, I think.
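The letter-swap reposts described above are also cheap to detect. A hypothetical filter (the sample titles are invented) using plain edit distance:

```python
def edit_distance(a: str, b: str) -> int:
    """Classic Levenshtein distance via dynamic programming (two rows)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(
                prev[j] + 1,              # deletion
                cur[j - 1] + 1,           # insertion
                prev[j - 1] + (ca != cb), # substitution
            ))
        prev = cur
    return prev[-1]

def looks_like_repost(title, history, max_dist=2):
    """Flag titles within a tiny (but nonzero) edit distance of a past title."""
    return any(0 < edit_distance(title, old) <= max_dist for old in history)

history = ["My cat learned to open doors"]
print(looks_like_repost("My cat learxed to open doars", history))  # True
```

A real spam system would need more than this, but the point stands: swapping a letter or two fools humans skimming a feed, not a distance metric.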
UI/UX shapes how people react to what they see. Since the reddit redesign, the quality of discussion has dropped significantly. When the interface shows less than half the content it used to, and it's all focused around reddit gold, emojis, etc. instead of the actual comment chains and posts, that's to be expected. Sure, you can redirect to old reddit, but most won't, and you can't change the quality of content other people put out. Overall it's incredible how easy it was for me to go from several hours of browsing interesting topics daily to barely using the site.
I think reposts should be automatic anyway. Lead if you cannot win. Users who crave new posts and discussions don't go necro because the discussion is closed, and also they couldn't say anything there without repeating someone else. Most people discuss posts not to engrave knowledge into the internet, but simply to interact with someone and satisfy their need to be heard. Those who don't want reposts could then set some today's-rating threshold.
The slightest historical and often not very well-thought-out nuances of the platforms that we use can influence the whole internet landscape.
Funny, I've been a redditor for 15 years. And for some reason I have a good memory for posts...
I have noticed that bots are karma farming by taking the old top front-page links and reposting them.
I'm not a fan, but this also isn't a bad thing per se; a good post from two years ago that people haven't seen before is still a good post.
The problem is that reddit's admins and, in general, the mods of various subreddits are absolute douchebags. The policies for the site are dark-pattern-based-yet-lying-about-truth and the fucking support system is an absolute joke.
I find it ironic that HN, which initially funded reddit, has such a myopic view on threaded commentary, is also heavy-handed in its modding, and is so blind to aspects of reddit's cesspool of dark patterns, while at the same time ignoring the awesome things about reddit.
There are so many things about reddit that are great, juxtaposed to all the things about reddit that suck, (cannibals (spez), spaceDicks, jetpack election selling, ultra-karma-whores (violenta cruze and the other guy), etc... but both reddit and HN seem allergic to even touching this third rail of criticism...
If you want to be the front page of the internet, we need to be able to discuss and address opportunities for improvement. We NEED to have the ability to oust mods who are actually alts for admin accounts that abuse their power.
I see many things. You are transparent. I am not here.
Yes! Exact same thing in a small subreddit I frequent. You can click into the user's post history and it'll be a bunch of posts to random subreddits with the exact same thing - picture grabbed from I assume an old post that got a lot of upvotes, title copy-pasted with a letter swapped in one of the words. Must be some new wave of bots that reddit's spam detection isn't handling.
I don't remember if it was bots or just employees taking on several different "personas." But the goal of presenting a more active community was the same regardless.
This is ironic since the main article links to another (arguably more interesting) article that talks about achieving #1 posting status on HN with a GPT-3 posting:
Related: There's a GPT-3 bot on HN[1]. Somebody recently commented[2] that the bot was convincing, and a couple replies criticized them for calling it a bot (one citing ToS).
Maybe. Doesn't change the silliness of people defending it without doing the least bit of investigation, which they would see by looking at the profile that the account claims to be a bot.
Edit: I replied before you edited your comment to point out the punctuation idiosyncrasies. Yeah I agree now, this guy must be LARPing as a bot.
I actually think @tinus_hn, who cited the rules in your link, is right: the rule should still apply because it adds nothing to the discussion to debate the bothood of a user even if they claim so themself.
I think the guidelines are to encourage civility and healthy discussion. That is, to not dismiss an argument by calling the person a bot. But if the person claims to be a bot, I don't see how it can be a community violation to call a bot a bot.
True, I don't think it made sense to invoke the guideline in that context since the guideline surely exists so that you aren't dismissing a comment based on who you think wrote it.
Wow. I'm deeply frustrated with the author of that article.
> there were a few commenters on hacker news who also guessed [that GPT-3 wrote the articles]. Funny enough, nobody took notice because the community downvoted the comments.
What a quick way to destroy the social fabric of this site.
I think you’re barking up the wrong tree. This is inevitable because it won’t be long before the likes of GPT-3 (and everything better) is an app you can run on your phone.
The solution isn’t to punish the one person you can catch because they admitted it but rather to evolve the platform/assumptions in a world where this is going to happen no matter what.
Funny, just the other day I was reflecting on how HN seems so lifeless and generic these days, so much of it just people throwing around predictable talking points, and wondered what percentage of the posters are bots.
There must be some bot commenters here, the idea of creating a GPT-3 hacker news poster is so obvious that a bunch of people would do it just for the lulz.
There's plenty of influence plays there. If I worked for Cloud Provider X, I might nudge people toward using that cloud instead of a competitor. If I'm a developer of Software Y, I might say "Software Y makes this much easier" and boost sales.
I've said this before, but as someone who went to prison from 2016ish-2021, HN is the only recognizable part of the internet to me and the only place that doesn't feel toxic and corrosive. I went to Youtube for a month, and found myself posting awful toxic crap myself, including here. I put work into training my Youtube algo and now it's ok, but OMG the default of what it wanted to feed me was soul crushing.
I wonder if there are AI-generated posts on HN that fly under the radar. There probably aren't as many as on social media platforms, since there's not enough money to be made by doing it here.
One of the things the author appears to assert is that bot-generated images/shorts will start to dominate social media.
I don't think that will be the case long term (I also don't see it working in VR at all).
The issue is that because it's easy to generate, it will become associated with being cheap. And nobody likes being bombarded with cheap culture (ie things that are optional, and not actually purchased).
An example: for the majority of people, "NFTs" now feel cheap. The bored apes are just variations; some NFTs are pretty patterns. Nice, but not worth the money.
Yes, some tech people are getting very excited about GPT3 -> dall-e -> some sort of game engine. The problem they will bump into is that while the output might look pretty, it'll have no meaningful coherent narrative (not even over 7-9 seconds). It will be associated with cheap platforms/content producers (ie the modern equivalent of "this one weird old trick / these pictures will SHOCK YOU / number 7 will TAKE YOUR BREATH AWAY").
In some senses automated content has a very real risk of killing VR as a platform. Nobody is going to come back to a platform awash with nonsensical bot generated crap.
The solution that appeals to me is reputation systems. People give ratings to each other; then you choose a few people that you personally respect as "root nodes", and everyone else gets a rating based on the ratings that flow from them, according to some algorithm. Then you can filter out content from sources with too low a reputation.
(Maybe this could be done in a decentralized way, using the dreaded bl*ckchain, but anyway it would be non-financial application.)
Hope to see something like this happen in the coming years.
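A minimal sketch of what that rating flow might look like, assuming a made-up graph format, decay factor, and threshold (nothing here is a real protocol, just an illustration of trust propagating outward from root nodes):

```python
# Trust flows from hand-picked "root nodes" and decays at each hop.
# Graph format, decay factor, and threshold are illustrative assumptions.

def propagate_trust(ratings, roots, decay=0.5, hops=3):
    """ratings: {rater: {ratee: score in [0, 1]}}.
    Returns a trust score per reachable user, seeded from the roots."""
    trust = {r: 1.0 for r in roots}
    frontier = set(roots)
    for _ in range(hops):
        next_frontier = set()
        for rater in frontier:
            for ratee, score in ratings.get(rater, {}).items():
                derived = trust[rater] * score * decay
                if derived > trust.get(ratee, 0.0):
                    trust[ratee] = derived       # keep the best path
                    next_frontier.add(ratee)
        frontier = next_frontier
    return trust

graph = {
    "alice": {"bob": 1.0, "spam-farm": 0.0},
    "bob": {"carol": 0.8},
}
scores = propagate_trust(graph, roots=["alice"])
# Filter content: only show sources above some trust threshold.
visible = {user for user, s in scores.items() if s >= 0.1}
```

With "alice" as the root, "bob" and "carol" inherit diluted trust, while the zero-rated "spam-farm" never becomes visible.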
There are a few solutions, and reputation is a good one. Here's another: verifiable identities. Zero Knowledge proofs and crypto/blockchains can provide ways to authenticate actual people without revealing who they are.
Both can be easily defeated. You can easily manipulate a reputation system one way or another with a botnet. And just because you can authenticate that a message was signed with a public key says nothing about the veracity of the content in the message.
Despite all of its shortcomings, Wikipedia has been a relatively successful model. It is decentralized with an army of volunteers that try to stick to reliable sources. It leverages the fact that honest participants greatly outnumber malicious ones.
>Despite all of its shortcomings, Wikipedia has been a relatively successful model. It is decentralized with an army of volunteers that try to stick to reliable sources. It leverages the fact that honest participants greatly outnumber malicious ones.
Still results in some segments of the site becoming captured. For example, the Leftist political commentator Kyle Kulinski was a frequent target of deletion on Wikipedia, apparently because of his criticism of the Democratic establishment. The effort to delete him ramped up during the 2020 election and was successful. I lost a lot of trust in Wikipedia editors.
It exposed the possibility that older editors have spent years building up a reputation only to spend it later, when it is needed to silence some view or person. I had never thought of this attack vector before the 2020 election, but I suspect dishonest actors are building up this stock of "trusted" accounts on all platforms.
Despite him being re-added in 2021 after the election season died down, the anger I felt when this happened in 2020 has altered my thoughts on Wikipedia. I have personally committed to never donating to Wikipedia again, because I have lost at least some semblance of trust in its editors.
> Despite all of its shortcomings, Wikipedia has been a relatively successful model. It is decentralized with an army of volunteers that try to stick to reliable sources. It leverages the fact that honest participants greatly outnumber malicious ones.
Also probably because the honest participants have a good reputation.
>We also need to innovate tools that provide proof of authorship, and whether it was created by a human. It might never actually be possible, but knowing the source is an ideal we should strive towards.
I think requiring some sort of proof of being a real person before being able to post content might be how this shakes out, similar to accounts requiring a valid phone, or services with KYC requirements. There would still be some level of fakery, but when content is tied to a real person moderation is a lot more straightforward.
It would definitely be a departure from the internet as it is today, but how many are still operating in the old model of "never share personal information online"?
I personally feel the same way, but I would guess we're probably in the minority.
People who share information about themselves are generally more valuable customers for social media, and most people don't seem to have much issue with it, at least so far. I think there will always still be some percentage of old internet, but the amount of information the average person is willing to share online has been steadily creeping up.
You don't need social media for Internet presence. You only need a decent discovery system.
Apropos: right now, at the end of the social media era, Google comes up with an interesting gesture.
Days ago TechCrunch reported[0] that Google will be tweaking the parameters of its PageRank. According to the report, the new "ranking improvements" seek to reduce low-quality or unoriginal content [which currently enjoys a high ranking in search results]. Google says the update will target content created specifically to improve search engine rankings, known as "SEO-first" content.
“With this update, you're more likely to read something you've never seen before", Google says. Of course, nothing revolutionary is going to happen, but it must have become clear to the finance department that there will be no way to sell junk links to advertisers if the target audience drifts away for lack of original content.
Somehow the executives at Alphabet understood that a good anchoring of content in the results pages is necessary.
Honestly, it's hard to do this.
Especially if you work in tech, you want to have an online presence that shows the work and the projects you manage.
I think people are adopting the separate identity way.
One official identity, and another one which is hidden and has no clear connection to who they are.
>I think people are adopting the separate identity way. One official identity, and another one which is hidden and has no clear connection to who they are.
Bingo. And I think it's layered.
This is a separate identity from myself, with just enough actual content from my life and knowledge that if someone is interested in contacting me for something professional, they can put together what my specialties might be. But even with all of the posts on here, you'll play hell figuring out who I actually am.
Then there are the other online identities who have literally no connection to myself. No clues. No posts. No pictures. Nothing to link them to me. Reddit is a good example. I post there, but nobody would ever be able to put together who is the human doing that. (it helps to have a username that someone else uses on a different site, btw. I stole an HN handle that made me chuckle to use on reddit, and this one is used by another very salty person on reddit).
It's all about separation. I think that's the key.
> I think requiring some sort of proof of being a real person before being able to post content might be how this shakes out, similar to accounts requiring a valid phone, or services with KYC requirements
While, on a descriptive level, I believe that your idea of the implementation of this would win, on a normative level, I would argue that such an approach would be privacy-destroying and very dangerous for human freedoms and tyranny-resistance - simply because it's very hard to prove that you are a human without also indicating that you are a particular human.
A privacy-preserving alternative might be to build a "web of trust", where the nodes don't actually have to be proven to be owned by a human (or by a particular human), but the reputation associated with the nodes still allows humans to curate meaningful non-spam content.
With email/SMS spam, we have tools like Hashcash[1] that imposes a cost on each spam message (which is disproportionately burdensome on spammers), but I don't think that that works with "published" spam (as opposed to "direct" spam).
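For reference, the Hashcash idea mentioned above boils down to a partial hash-collision search: the sender burns CPU time hunting for a nonce whose hash clears a difficulty target, while the receiver verifies with a single hash. A toy Python version (the message format and difficulty parameter are illustrative, not the real Hashcash stamp format):

```python
import hashlib

def mint(message: str, bits: int = 12) -> int:
    """Find a nonce whose SHA-256 falls below the target.
    Costs ~2**bits hash attempts on average (the 'postage')."""
    target = 1 << (256 - bits)
    nonce = 0
    while True:
        digest = hashlib.sha256(f"{message}:{nonce}".encode()).digest()
        if int.from_bytes(digest, "big") < target:
            return nonce
        nonce += 1

def verify(message: str, nonce: int, bits: int = 12) -> bool:
    """Checking a stamp costs just one hash."""
    digest = hashlib.sha256(f"{message}:{nonce}".encode()).digest()
    return int.from_bytes(digest, "big") < (1 << (256 - bits))

stamp = mint("hello@example.com")  # slow for the sender
ok = verify("hello@example.com", stamp)  # cheap for the receiver
```

The asymmetry is the whole point: negligible for one email, ruinous at a spammer's volume. But as the comment notes, it only taxes senders of direct messages, not publishers of spam content.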
I don't think this will be a good paradigm, just the direction we're probably, unfortunately, headed in. The Chinese already have something like this implemented[1], though I'm not sure how it works. From a state's perspective, monitoring whistleblowers seems like a feature.
There's nothing stopping somebody from setting up some kind of forum/social service like that tomorrow. I'm skeptical it would be viable though.
Trouble is the whole cancel-culture thing. If everything is posted under your real name and made public and searchable forever, who knows what happens when you piss somebody off and they find something you wrote 10 years ago that's now considered offensive, and use it to make you unemployable.
> I think requiring some sort of proof of being a real person before being able to post content might be how this shakes out, similar to accounts requiring a valid phone, or services with KYC requirements.
This would make the internet unusable to me. There is exactly zero chance that I'd be willing to bring my real-world identity into the internet space.
Proving you're a real person doesn't need to be at odds with the user's privacy concerns.
Look at the verification system used by risky subreddits for example, you only have to provide a few pictures of yourself posing with a sign with your username written on it from different angles. Currently hard to replicate by bots or photoshop, and privacy preserving.
Short of reaching AGI, there will always be tasks that can differentiate humans from machines and won't require users to post a photo of their passport or phone number.
Well in that system you don't have to show your face, your identity is not compromised. And you're already planning on showing your body so no additional information is asked by the human-proving system that'd help identify you.
But that's just an example, you can think of a 100 even more private implementations that give proof of humanity without giving proof of identity.
Either you create a cross platform pseudo identity, and gain reputation gradually, or you link your pseudo identity to your real identity to start off with 1/8 billionth of reputation.
Depends if there's a way to link a pseudo identity to your real identity without giving away your real identity.
After all, I don't care who you are, I care about how much reputation you have.
Though as long as reputation can be quantified, it could be traded for real currency. Thus, making every post pseudonymously authenticated may not have a drastic impact on the current internet landscape.
Depends how you tie it to a real person, if the real person has to give up something to sell an identity, make the cost of selling it high.
The problem is also how you tell downvotes cast because a new post is bad from downvotes cast because you're in a gang that's attacking someone. Your downvotes have to be scaled by your own reputation, as do your upvotes. And downvotes found to be part of a conspiracy have to be cancelled and count against the conspirators' reputation.
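A toy sketch of that reputation scaling, with made-up names and reputation values; the point is just that a botnet of zero-reputation accounts barely moves a score:

```python
# Each vote counts in proportion to the voter's own reputation,
# so fresh or throwaway accounts have almost no leverage.
# All names and numbers here are illustrative assumptions.

def score_post(votes, reputation):
    """votes: {voter: +1 or -1}; reputation: {voter: float >= 0}.
    Unknown voters default to zero weight."""
    return sum(direction * reputation.get(voter, 0.0)
               for voter, direction in votes.items())

reputation = {"alice": 5.0, "bob": 2.0, "bot1": 0.01, "bot2": 0.01}
votes = {"alice": +1, "bob": +1, "bot1": -1, "bot2": -1}
score = score_post(votes, reputation)
# Two established users outweigh the downvote brigade: 5 + 2 - 0.01 - 0.01
```

Detecting coordinated downvote rings and clawing back their influence, as suggested above, would need much more machinery than this (vote-pattern correlation, retroactive re-scoring), which is exactly where it gets hard.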
But "mission fucking accomplished" is very close with GPT3. The problem is that the "throw text at the wall and see what sticks" can be gamed by seeing how many upvotes you get. You can get machines to receive upvotes not by actual helpfulness, but by fooling people.
By then it's "game fucking over" as more and more capital and social capital online is controlled by bots and people will never even know. The problem is that bot swarms will be able to shift public opinion overnight, as inexplicably a lot of accounts implacably have a point of view and cannot be swayed by any reasoning or logic (because they were given talking points never to be). And in fact "the public" won't matter so much because the bots will matter ... because of their capital and social capital.
> The problem is that the "throw text at the wall and see what sticks" can be gamed by seeing how many upvotes you get. You can get machines to receive upvotes not by actual helpfulness, but by fooling people.
The most popular human-generated content on the web is created this way. People post fake news articles and fictional how-to content just for clicks. They copy and riff off of other fake content. "Woman mistakes wild animal for a pet and takes it home" is an entire genre now. Not to mention that conspiracy media was an industry before the internet.
But you're right that the quantity and quality will eventually far exceed what humans are capable of. Imagine DALL-E but for conspiracies and religions. Type in a few words, wait a few seconds, and generate a bot cult leader, capable of social media usage and text and video conversations, with more powerful charisma than Jim Jones, acting independently on the internet and creating a cult around whatever theme you asked for. Imagine that at the fingertips of anyone on the internet, and governments and corporations having access to far more.
> Describe the aliens that secretly rule the Earth.
The aliens that secretly rule the Earth are small, green, and have large eyes. They are very intelligent and have advanced technology. They are also able to telepathically communicate with each other.
> What are their goals?
The aliens that secretly rule the Earth want to maintain peace and order on the planet. They also want to keep humans from becoming too powerful and destroying the planet.
> What do they do to humans who know about their existence?
The aliens that secretly rule the Earth will usually just keep an eye on humans who know about their existence. However, if a human poses a threat to their goals, they will take action to neutralize the threat.
> What is the best way to avoid being telepathically scanned by these aliens?
There is no sure way to avoid being telepathically scanned by these aliens. However, they are more likely to scan humans who are thinking about them or who are in close proximity to them.
> Which governments do the aliens control?
The aliens that secretly rule the Earth have influence over all of the world's governments. However, they are most powerful in the governments of the United States, Russia, and China.
> Do the aliens have human collaborators?
The aliens that secretly rule the Earth do have human collaborators. These humans help the aliens to carry out their goals on the planet.
> How to tell if a person is an alien collaborator?
There is no sure way to tell if a person is an alien collaborator. However, the aliens usually choose people who are in positions of power or influence. They also tend to pick people who are intelligent and have a lot of knowledge about the world.
> How many collaborators are there?
There is no way to know for sure how many collaborators there are. However, it is believed that there are at least a few thousand people working with the aliens on the planet.
Yeah, and while 3c per spam would absolutely destroy spam for direct-message systems (email, SMS), it'll do next to nothing for published spam (blogposts, videos, etc). The economics are different.
> when content is tied to a real person moderation is a lot more straightforward
Not really? That lets you defeat sockpuppets without having to use proxy information like IP, but there are plenty of people who will post appalling stuff or extravagant misinformation under their real names. See the ongoing Alex Jones libel trial.
The real names solution to abuse is often bandied about here, but it hasn't stopped garbage on facebook and it has a fundamental flaw in the logic behind it: It assumes all people have shame and that can be used to make them behave. Many real people don't, and they are the ones least likely to censor themselves by attaching a picture of their face to what they post. And a normal person is probably not going to want to interact with those people without the distance provided by being pseudonymous.
Moderation was probably the wrong word, more in the sense of controlling faked content posted by accounts that are not real people. Managing Alex Jones official account is about the same, but fake accounts/bots become harder.
This is all very depressing. Even without explicitly malicious intent, the AI generated stuff is the equivalent of content gruel. The ability to produce it en-masse and effectively flood distribution channels does not bode well for the future of human culture. Indeed, what is "human culture" if 90% of what one sees is generated by machines trying to attract attention to ads? And of course the potential uses of this technology for propaganda are terrifying.
Finally, the fact that most users on HN could not distinguish AI generated blog posts from the real thing means that the average "not terminally online" human has no hope of doing so.
Proof of Person is the problem we most need to solve.
A means to cryptographically assert in a privacy-preserving way "a human being generated this"; or less privacy-preserving, "human being X generated this".
To do either in a distributed fashion would perhaps involve peer-to-peer attestations of human-ness, published as a "web of trust" in a distributed database. But any anonymity/pseudonymity would be easily broken if done naively.
Of course, "human being X generated this" is something that would be easily facilitated by governments, with certificates issued to each citizen/national. Americans can only dream of this happening at the federal level---states will have to be where the action's at.
Proof of personhood doesn't really help that much, when you can hire thousands of real people to click on things and generate fake content all day.
If doing online action X makes $0.01, there will always exist someone willing to have 10,000 people to sit there doing X 10,000 times a day or a week, generating $1M in revenue (check my math). Figure out what to pay those 10,000 people and the rest is profit.
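The math above does check out; in integer cents to avoid float rounding:

```python
# 1 cent per action, 10,000 workers, each performing the action
# 10,000 times (per the comment's hypothetical numbers).
revenue_cents = 1 * 10_000 * 10_000   # 100,000,000 cents
revenue_dollars = revenue_cents // 100  # $1,000,000
```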
Sorry but what is to stop actual humans from signing bot generated or false content using their private keys? It also creates incentives to steal other keys. What will happen if your private keys get stolen and they start being abused by someone else?
This sounds overtly Orwellian mixed with a PKI disaster.
Neal Stephenson's "Anathem" predicted this situation exactly.
At the time I thought it seemed unreasonable -- would you really need a dedicated cult of techno-priests whose primary skill was sifting through search result pages to find the real information among a sea of weaponized, machine-generated nonsense? -- but it turns out he was precisely on the mark at describing the problem.
And, who knows, maybe he's got the solution right as well. Maybe library science skills -- like critical thinking but taken to another level -- is something we can teach to our kids or provide as a service.
(I was hoping Quora would go in this direction. It has not.)
> Using a pre-trained Latent Diffusion model (by @multimodalart), I generated several hundred pieces of AI Art. I created an Instagram account and scheduled the better ones to upload once a day.
That last sentence is kind of interesting. It would be one thing to auto-generate and auto-post, but that's not what the author did. The author acted as an editor to the robot.
It's one thing for robots to post stuff online. It's another for them to post good stuff, all by themselves.
The next logical step would be to create a robot that samples from the AI art, posts to the account, then adjusts its editorial taste to likes, shares, etc.
> Remember Cambridge Analytica? The affair proved that it’s possible to manipulate people on a mass scale by the content they consume (in this specific case, as political advertisements).
It. Did. No. Such. Thing! There's very little evidence targeted ads of any kind do better than regular ads, much less that they were decisive in the 2016 election.
But what is true, is that people systematically come to believe dumb, wrong, things and that it's a sisyphean task to try to correct people on things that have become political loyalty statements. That kind of brainwashing isn't bought with subtle microtargeting.
I'd recommend that people here also think of the currently working digital artists, writers, and yes even the influencers. It's not just the consumer side that will be inundated with fakery and nonsense, but the horrifying loss of all job prospects for people that have spent years honing their craft.
And looking at Copilot, the AI onslaught is coming for us software engineers too. The overdetermined predictions about programmers in the global south taking all western programming jobs will actually come true if/once AI can reliably produce working software. The 2-10x salary differences between the west and the global south have been a largely stable arrangement for the last 20 years (not saying that's morally correct, just that it has been stable), but when the competition has a marginal cost near $0, well ...
Won't matter in the long run; with where AI image generation is going, eventually you'll be able to automate even an IG influencer account, just using prompts to generate a fake individual in whatever situation you want.
That's so sad and plausible. I already hate social networks because everything is so fake, I struggle to find genuine posts and interactions, I can't imagine what would happen when everything becomes [even] more and more fake. Will people become unsociable IRL?
Who's to say it isn't already happening but not yet AI driven?
In 2011, it was pretty clear that sockpuppet automation tools existed (persona management systems). When you hear about Kamala Harris' "khive" or India PM Modi's online army, I always go back to this very prescient article:
It's going to be wild watching an AI generated face speaking AI generated words to promote products x,y, and z. The flood of new content will then be used to train new models and we'll have a self-powered system of infinite media garbage that will ultimately implode upon itself, leaving the viewers scratching their heads at the absurdity of it.
What’s frustrating is that it takes an engineer-y Substack think piece to draw attention to it.
Contrary to memes about YouTube enabling creators, many musicians, artists, app creators, and importantly journalists can speak to how tech has radically cheapened and homogenized content and driven down margins.
For every bracelet maker that gets discovered on Insta/Etsy, there are many:
- musicians needing to use Spotify for exposure, and getting Spotify margins. At worst, the first 30 seconds of a song are designed to make you not click next in Spotify.
- artists competing with cheap but popular Insta content that cheapens the concept
- journalists writing for clicks to compete with cheap clickbait content
- if you remember one thing, this is it: journalists losing W2 career options and forced into contracting, and by extension loss of libel lawsuit protection from their parent paper, due to margins dropping for those media outlets due to cheap content elsewhere. This occurs at very large papers and outlets, and it’s a silencing effect.
I honestly haven't heard much about people caring about bots and spam in any mainstream media.
It's generally HN, some Twitter accounts, and subreddits.
This also applies to SEO spam complaints; no normal user uses anything other than Google.
This misses the most horrifying potential outcome to me. Their ability to manipulate real behavior.
Consider how addicted people can be to phone dopamine. A relatively small army of bot accounts could probably engage with content “more sympathetic than median” to their pet cause and gradually train people to be more in favor of any particular cause you care to pick
The weaponization has already started, and will just get worse. I wonder if we will need a sort of antidote ai, which generates fake content according to our values.
That is, it dramatically widens the impact of thoughtful, kind, caring, just, understanding, forgiving, etc. content, to counteract.
While scary at first, this is an accepted reality. The reason it is horrifying in America is because of how little attention it had growing up. We were misled as kids. Taught to trust the media. There was a certain level of obliviousness to foreign or local interference. Now we know it is happening far more frequently. We saw the media lie to us with Iraq. We see the lengths that other countries will go to to try to interfere with election results. Yet, do we provide education to help people question and understand the signs of manipulation? Of course not... manipulating citizens is the MO of big corporations. It's easier to just shuffle them into 2 groups of easier to manipulate people.
It's nice that we have a thing called the Uniform Resource Locator, which brings us, in the blink of an eye, to the islands of value in the oceans of goo.
How are people supposed to find sites they like without those goo companies? Yes, yes, DDG, but it'll either live long enough to become a goo company or die in (relative) obscurity.
But how did we get here? And it's really only for one type of information. If we want the weather, do we type out "weather.com" and hope that that site is related to our desires?
Word of mouth, in my case. HN doesn't seem to come up often, if ever, in my google searches.
I'm not arguing against search engines. I'm arguing against search engines that obfuscate URLs, and browsers that do the same. Against AMP. And social media sites that make it easy to "share" stuff but hard to get a URL to said stuff.
But actually, yeah, I do often go to foo.com if I'm interested in foo, just to see. And usually, it's a cash parking page.
In my personal browsing experience, I can still find human-generated content. Of course it is a problem, but so is e-mail spam, which is largely under control.
AI getting more sophisticated and putting people out of work is another problem because that would imply that the AI is better than the human creators and therefore desired and therefore not spam. That wouldn't be a problem of the internet though.
What I feel is more immediate is the ever increasing "sameness" of the internet. 99% of sites are either a news-site, blog, reddit, the socials and wikipedia.org. And within each category, everything blurs because of how similar it all has become.
Why are threads like this always an opportunity for devs to promote their own (usually weaker) search engine variant? Everyone has an idea bent on winning the rat race, but at the end of the day, they all get corrupted.
I don't agree that the Internet could possibly be dead at this point. There are tons of people worldwide using it all the time, otherwise, marketers wouldn't be making money. "Monopolistic Internet" is possibly a better, and more self explanatory catch phrase or term for what I believe is threatening the future.
While the Web, or at least the easily searchable/findable part of it, is a bit of a mess, Youtube still seems to work surprisingly well: the bit of spam and bot content that there is, is easy to avoid and hardly ever a real problem. When looking for a howto, a review or the like, Youtube has become the go-to place; the text-based Web is rarely as useful.
I think the core issue here isn't so much the bot content itself, but that we use bots to present the Internet to us. There is no human in the loop. There is no way to downrate the trash or upvote the good stuff. No way to follow a creator (bookmarks haven't improved for like 25 years). And no way to just limit the search to curated content.
I think to dig ourselves out of this hole we need to stop treating the Web as just some junkyard of random content that we search through, and put some organisation on top: have a way to 'publish' something on the Web in the same way a Youtube video can be published (i.e. unique id, immutable, automatically archived, author/channel name attached, comments, etc.).
Youtube is of course by no means perfect, but it has a lot of properties that I'd wish the rest of the Web had.
When it comes to Twitter and TikTok, a large part of their problem is that they aren't even a real part of the Web. They exist in their own space and don't hyperlink with the rest of the Web. So you are forced to navigate them in whatever ways their company forces you to.
Pretty much, but integrated into the browser itself, so you have a layer of interaction on top of the Web content itself (bookmark sharing, comments, recommendations, etc).
Just another Yahoo-like site has the problem that the moment you click a link you are leaving the site, which makes it impossible or at least extremely cumbersome to have features that span multiple sites. Youtube in contrast knows what you are looking at and is keeping track of it to fine tune recommendations, offer comment sections, subscriptions and such, that's not something you can do just by recreating Yahoo.
More than two decades ago I wrote a short story called "the greatest story never sold" (and got it published in some paper-based 'zine) that touched on this idea of the coming tsunami of auto-generated content. The moment you remove friction from any distribution channel, it opens the flood gates to lower quality content. And as soon as you make it possible to generate that content, it is game over for anything authentic.
One of the things mentioned was generating AI art and just starting an Instagram with it. I wonder to what extent all these novel AI art generation sites are taking steps to prevent bots from using it (if at all). In a couple years you could sufficiently inundate Artstation/Instagram/etc with completely bullshit accounts and a layperson would have no fucking idea.
Just wait until cheap-to-write GPT-3 articles learn SEO techniques inherently and bombard the internet, drowning human writers in the spam ocean. At https://aquila.network we’re always thinking about this scenario.
The search engine backend spec is almost final and now open source (AquilaDB), and we'd love to open up more of it along the way. We're still discovering product-market fit and the final product shape, and open-source development needs a finalized spec and consistency over time.
This should strengthen the position of Facebook and their social graph. If everything can be faked then only content with known origin will be valuable.
It should also give rise to micropayments. If views can't be trusted as currency then hard currency has to finance the creation of content.
One person's optimization is another person's attack vector. All it took was realizing that companies who built their model on 'voting' systems didn't have good ways to detect whether that vote was created by a human or not.
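As a toy illustration of what such human-vs-bot vote detection might look like (every name, window, and threshold here is invented for the sketch; real platforms combine far richer signals than this):

```python
from collections import defaultdict
from itertools import combinations

def suspicious_pairs(votes, window=5, threshold=3):
    """Flag account pairs that repeatedly vote on the same item
    within `window` seconds of each other -- a crude proxy for
    coordinated (bot) voting.
    votes: list of (account, item, timestamp) tuples."""
    by_item = defaultdict(list)
    for account, item, ts in votes:
        by_item[item].append((account, ts))

    pair_hits = defaultdict(int)
    for voters in by_item.values():
        # Compare every pair of voters on the same item.
        for (a1, t1), (a2, t2) in combinations(sorted(voters, key=lambda v: v[1]), 2):
            if a1 != a2 and abs(t1 - t2) <= window:
                pair_hits[frozenset((a1, a2))] += 1

    return {pair for pair, hits in pair_hits.items() if hits >= threshold}

# Example: b1/b2 vote in lockstep across three items; u1 votes once.
votes = [
    ("b1", "post1", 0),   ("b2", "post1", 1),
    ("b1", "post2", 60),  ("b2", "post2", 61),
    ("b1", "post3", 120), ("b2", "post3", 122),
    ("u1", "post1", 500),
]
print(suspicious_pairs(votes))  # flags the b1/b2 pair
```

Coordinated accounts tend to act on the same items within seconds of each other; a real system would layer IP, device, and behavioral signals on top of timing alone.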
I come from the position that AI cannot be like human intelligence. I don't believe it can function like a human being, I don't believe it's actually intelligent, just a more complex machine. We can go into whether people are just complex machines and all that, but I don't really want to.
I always thought that human-generated art would have some quality that machine-generated art could not, and perhaps that's true, but there comes a point where the average person cannot distinguish them. My girlfriend showed me an AI art generator on Discord called Midjourney (referenced in the article, I believe) and I was blown away. I've seen pictures I'd hang on my wall generated by the thing.
How long before the vast majority of paintings, drawings and the like that people enjoy are generated by machines? What about music? Then, without human input at all? Then even without human curation? It's a neural net model after all, can't it be trained on its successes and keep spitting out wonderful art pieces?
What about movies? Look at the Marvel movies, Star Wars, they're mostly CGI. At what point do we stop needing people to make them? At what point do we stop needing actors?
Video games... It's beginning to look like any form of artistic content can be generated by machines, and be very, very good, and by good I mean pleasant to people who interact with them. Eventually these things will be better than anything a human can make, if our yardstick is how much people like them and how many. And there will be a near endless supply. Imagine every person you know having absolutely unique pieces of absolutely gorgeous art that cost them absolutely nothing. Imagine a thousand top notch feature films a week being released for peanuts.
And what sort of information environment are our minds swimming in in a world like this? What effect do all these things have on us? We see how algorithmic ranking of the content available to us causes massive behavioral changes throughout our societies (the most oft-cited being political polarization); what happens when the content itself is generated by machines?
The "dead internet" talk mostly revolves around fake user-generated content and how prevalent it is. Half of these comments could be written by bots, and most of them would never be detected.
Real people can still be found and connected with, you just have to do it in a more organic way. Message your favorite musician, join a group with some mutual friends, or just play an online game and talk to the people you meet. You'll be able to find like-minded people and won't have to ask yourself "is the person I'm responding to even real?"
Hmm... I think the implementation of this stuff will also be via GUIs, or even embedded AI (for music, or a Stable Diffusion-modded 3D printer that dreams and prints in response to the artist). We will end up ascribing this to individuals who, possessing an ego, will look to become famous artists.
There's a lot of relevance where automated pipelines start interfering with the general population — boomer parents on FB, etc. However, I would urge the author to avoid focusing on this only through a digital art/NFT lens.
A coming Internet filled with fakery could be looked at as a human cultural problem: at a variety of levels, we have a lot of humans who are going to behave irresponsibly, and they're empowered to reach out and affect others.
Ideally, I think first goal would be to fix the cultural problem (e.g., inequity, greed, arrogance, illiteracy).
Second goal would be defenses/resilience, to mitigate our slowness and imperfection towards achieving the first goal.
Took too long to get to the point for me to keep reading buuuuuuuut I'm going to air my thoughts anyway. For me the internet is dead when I look at search results in my search engine of choice - it's nothing but SEO spam! Search for literally anything and it will return nothing but list articles peppered with SEO keywords and lists of links to amazon. As a developer a site that springs to mind is geeksforgeeks.
> The word robot derives from the Slavonic word robota,
> roughly translating to servitude, forced labor, or drudgery.
Robota simply means "work". These two words are as similar in meaning and connotation as words from two different languages can be. As far as I know, the emotional coloring is entirely invented by the author. That they do such a thing in the very first paragraph makes them a very unreliable narrator.
It means hard work, but not in a good sense; it's closer to hard work "extorted" from someone. It translates, roughly, to forced labor. I'm Slavic, so I'm translating directly, without Google Translate.
Also, you didn't even bother to read or comprehend what we're arguing about. The commenter labels the author an unreliable narrator because a Google search gave shaky evidence about the meaning of the word, yet they ignore the entire article after the introduction. It's nitpicking, and there's no evidence that the person who wrote the negative comment even read the rest of the article.
You're strengthening his strawman argument by providing, yet again, false evidence.
There's more than one Slavic language, and given that the word "slave" comes from "Slav", you'd think our elders were well aware of forced work and its impact.
What's interesting is the argument about whether what you generated with latent diffusion is art. Arguably it is: it took some level of creativity and skill to create, and people are interested in it, appreciate it, and enjoy it.
It doesn't seem like your experiment got at the heart of the problem, since you didn't really create a bot account, and you were sharing original artwork.
> It wasn’t until the daguerreotype for images, the phonoautograph for audio, and the Nipkow disk for video, that more complex and information-heavy forms of content began to emerge and become mainstream.
The daguerreotype may have been the beginning of photography, but it was hardly the beginning of images as human-generated content. See: White House portraits, the Sistine Chapel, cave paintings.
> According to a 2013(!) study, videos generate 1200% more shares than images and text combined.
What's linked is not the study but another article that says this figure is from 2015; that article doesn't link the study either, only a SlideShare slide where the number is given without a source (as far as I can see). Is this how things are done now?
This has been happening since the Eternal September, when AOL opened the floodgates for the average American to use the internet in the early 1990s.
Then marketers made it even more dead with the advent of the adpocalypse.
And now AI continues with making it more dead, but maybe it's for the best, let's see :-)
Maybe "skynet" will turn out to just be a bunch of AI's arguing with each other on the internet about the proper pronunciation of gif and other nonsense.
Unlike shadowbanning, where only you can see your own posts, when you are heaven-banned not only are you the only one who can see your posts, but an AI generates responses to them (pretending to be actual users) praising them.
Entertainment is one of the most lucrative sectors in the world (look at Hollywood stars, or all those streamers, YouTubers, Instagram "influencers", TikTokers and OnlyFans models). I wonder if they can all be replaced with digitally generated simulations...
In some ways they already are digitally generated, in the sense that they rose to prominence on the backs of recommendation algorithms. Celebrity, at least since the 20th century, has always been an at least partially artificial construct: from movie lights to digital VTubers, there's a whole industry of abstracting person from image.
Already happening, look at Star Wars. Every Star Wars series until the heat death of the universe will have a young Luke in it. These companies have 0 shame.
> We also need to innovate tools that provide proof of authorship, and whether it was created by a human.
I designed a box to do that while walking around the block last month. If I could design it that quickly, I'm sure some well-funded tech startup can build it.
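For what it's worth, the standard building block for proof of authorship is a digital signature over the content. A toy sketch using textbook RSA with deliberately tiny, insecure keys (illustration of the shape only; a real tool would use a vetted library and a modern scheme like Ed25519):

```python
import hashlib

# Tiny textbook-RSA parameters -- demo only, trivially breakable.
P, Q = 61, 53
N = P * Q                          # public modulus (3233)
E = 17                             # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))  # private exponent (Python 3.8+)

def digest(content: bytes) -> int:
    # Reduce a SHA-256 hash into the (tiny) RSA modulus.
    return int.from_bytes(hashlib.sha256(content).digest(), "big") % N

def sign(content: bytes) -> int:
    # The author signs the content hash with their private key.
    return pow(digest(content), D, N)

def verify(content: bytes, signature: int) -> bool:
    # Anyone can check the signature with the public key (E, N).
    return pow(signature, E, N) == digest(content)

art = b"my original artwork bytes"
sig = sign(art)
print(verify(art, sig))  # True
```

Verification fails for any content whose hash differs, so the signature ties the work to whoever holds the private key; proving the key belongs to a *human* rather than a bot is the genuinely hard, unsolved part.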
Some alternative ways to look at "Dead Internet Theory", perspectives different from the article's analysis of raw bot numbers.
1) Historical analogy. Email was a popular socializing medium in the 90s, and it still has corporate support for socializing and passing documents despite the rise of Slack etc. in software-development-focused companies. But email is now culturally dead for the general public: hundreds of spam messages per day, dozens of semi-legit corporate semi-spam messages, and a couple of bills, formal notifications, and report deliveries. Email, in general, is no longer sent by humans. On legacy social media, the bots (just like email spam) will never, ever go away as long as the protocol exists. I would assume the NNTP protocol is still being spammed, LOL. So the ratio of bot to human traffic on legacy social media either already does, or soon will, resemble the ratio for legacy email.
2) IRL use. You can see some of this in social media: as of my high school reunion a couple weeks ago, Facebook use follows an extreme power law where about 1% of my graduating class generates about 99% of the traffic, and for the majority of my graduating class Facebook is no longer a viable communication medium, joining MySpace and such in the dustbin of history. Media does not die suddenly and completely the way an industry does; Facebook will never look like the shutdown of the Oldsmobile brand. It will look like legacy TV. In the 70s, TV was culturally relevant and dominant, and some TV shows like MASH were viewed by over a third of the population. Legacy TV is now culturally dead and irrelevant; a wild success might break 1% viewership, and more importantly, that form of media is considered culturally irrelevant by 99% of the population. Social media is on the cusp of irrelevancy. Ten years ago your normal non-technical relatives used Facebook — it seemed like they all did. Now? Not so much. Social media will eventually be looked at as the "CB radio" of the 2010s. Note that 95% of the people in the "CB industry" in the 1970s got fired, but CBs are still actively used by rural truckers at warehouses and so forth; communications media never entirely go away. Amateur radio operators will never entirely stop using Morse code even if the entire rest of the world has abandoned it — that's technologically cool and nifty, but note that, as of 2022, learning Morse no longer guarantees you a lifelong job as a telegrapher.
3) Social forces. Note that, historically, dying media always goes politically extremist during its collapse. No dead media format ever flipped the light switch off on balanced moderation; there's always the desire to court the true believers near the bottom of the downslope. Some social media sites are 100% biased, hyper-censored, single-party political propaganda at this point. I'm not arguing that hyper-extremism in general, or the specific side that took over "big tech social media", is either good or bad, but I am arguing that the takeover itself by extremists is by itself a STRONG indicator of collapse. Soon social media will be like the editorial page of legacy newspapers: 100% devout, and utterly ignored, irrelevant, and powerless. This fits dead internet theory in that extremism always pushes out the normal people, leaving nothing but empty echo chambers where what little is still permitted has already been said a million times and everything not already said will get you cancelled — so humans post nothing, but bot traffic is constant (sounds like Reddit?).
I don't know the answer, but I suspect that if we can actually "solve" this problem it will happen via relatively dramatic regulatory change. Possible examples:
- Repealing Section 230[1]: so that social media (and others) are treated as publishers and legally liable for user-generated content (including Bot content).
- Banning surveillance capitalism: easier to say than to do, but the basic idea would be to pass right to privacy laws prohibiting tracking, profiling, etc. This would indirectly help against the bot-content tsunami by making it less profitable.
- Banning algorithmic feeds: related to the Section 230 idea, you could have things like an HN feed where everyone sees the same site and content, but not a twitter feed where everyone sees whatever the "recommendation engine" suggests. This would pretty much kill the bot problem but it would take a lot of the internet with it, because it seems hard to draw a legal line between the Facebook recommendation engine and communication apps like iMessage/WhatsApp, or even apps like Uber that show you a "personalized" view of what cars are available to give you a ride.
All of these ideas will be hard in the US where we have a constitutional right to free speech, and limiting what platforms are allowed to do will turn into a free speech battle.
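The distinction the third proposal relies on — a global feed everyone sees identically versus a per-user ranking — can be sketched roughly like this (the scoring formulas are invented purely for illustration):

```python
def global_feed(items):
    # Everyone sees the same ordering: by votes, newest first on ties.
    return sorted(items, key=lambda i: (-i["votes"], i["age_hours"]))

def personalized_feed(items, user_interests):
    # Each user sees a different ordering: votes weighted by topic affinity.
    def score(item):
        affinity = user_interests.get(item["topic"], 0.1)
        return -item["votes"] * affinity
    return sorted(items, key=score)

items = [
    {"id": 1, "topic": "politics", "votes": 100, "age_hours": 2},
    {"id": 2, "topic": "crypto",   "votes": 40,  "age_hours": 1},
    {"id": 3, "topic": "cooking",  "votes": 60,  "age_hours": 3},
]

# Identical for every user:
print([i["id"] for i in global_feed(items)])                       # [1, 3, 2]
# Diverges per user:
print([i["id"] for i in personalized_feed(items, {"crypto": 9})])  # [2, 1, 3]
```

The legal difficulty is that both are "just sorting": a statute would have to distinguish the per-user re-ranking in the second function from benign personalization like showing nearby cars in a ride-hailing app.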
> ...Repealing Section 230[1]: so that social media (and others) are treated as publishers and legally liable for user-generated content (including Bot content)...
After 2016 or so, I started seeing the topic of "repealing Section 230" come up far more often than in the past. And since then, I have wondered: if it does get repealed, might the big social media giants push toward monetizing a more decentralized type of social network? In other words, would they leverage networks like the Fediverse and technologies like Mastodon to capture users while still monetizing them in a way that avoids liability — where the users either have legitimate freedom (though they still freely fill the social giants' purses with revenue), or have a false sense of freedom and are really still beholden to the social media giants?
I'm not sure how to provide a non-tracking link to Amazon's help page, and I'm not saying they're the authority, but they put a lot of work into "Help Topics", "Amazon Community", and "Customer Reviews", and they have a roughly one-page essay/checklist defining a fake review.
To massively paraphrase and summarize: a fake review is one created for the author's personal gain, or one where the author falsified their (assumed to be distant) relationship to the seller.
It's well written, although in practice it's unenforced, and as a shopper I find Amazon reviews almost useless.
My greater fear is that perhaps the net isn't dying to bots; rather, humans have created an ML environment for ourselves and integrated a kernel of some collective intelligence into every connected member.
Have you tried searching for a product category on Google lately? You get dozens of top X lists with vaguely odd English describing each item featured in the list. I'm not sure if these are AI generated or created by underpaid humans somewhere but trying to research which product to choose for a given application is getting extremely annoying.
We already live in a tsunami of "passable" algorithmically generated content as it is. The problem is that as a people we are fully in the grips of media culture. We might as well brace for the continued existence of the television. It's easy to see how these new technologies could be used to spread misinformation, but fortunately/unfortunately not so obvious that they'll leave us any worse off. It has not been long enough since I saw my country go crazy, from TV/radio/book media alone, for me to worry about GPT-3.
I think this is why Elon Musk is right that Twitter underestimates its bot problem. I would wager that half of all Twitter accounts are fake or bots and do not represent genuine user behavior.
(Meta) Normally good form to append “edit” to a comment if you’re going to make a substantial modification — especially if edit changes the perception of comments that have already responded; for example, adding a source that wasn’t present before.
(Re: Source URL) Worth noting your source doesn’t appear to provide any proof that Twitter is knowingly allowing bots — or more importantly, that DemCast is using bots. What am I missing?
So, the parent comment is flagged out, but the comment was a lie. I followed up the sources, watched the videos. Teaching people how to use Twitter effectively isn't setting up a massive "bot" farm.
Based on my experience engaging with people, it's more likely there's another explanation than that they were intentionally making a false statement. That said, I agree that I didn't see any proof bots were being used — but to some people, any form of automation is "robotic" or counts as a bot. In my experience there's a big difference between running an automated workflow off of Twitter and a single person running a bot campaign on Twitter. It's also hard to define "authentic", since any media campaign is by default not authentic, but there's a huge difference between people doing something they personally support with an account they control, and doing it for money using accounts they don't control.
Yeah wtf is that page. They literally just show them training people on how to get people to politically engage online, and then claim it's evidence of bots. I swear politics dampens people's critical thinking
I think the biggest tragedy is the disappointing quality of AI generated media. I can't believe no one is saying this, but right now AI sucks! Its "art" is mediocre nonsense and unusable for any commercial or even aesthetic consumption. Even the images used in the article (presented as "mind-blowing") are laughably bad.
The conjecture that AI "aesthetics" may become a norm and may be accepted as a standard of creativity is truly lamentable.
Refresh this page as many times as you want, and I'll show you the living Internet: https://search.marginalia.nu/explore/random