The Dead Internet is a mirage caused by the unreasonable effectiveness of digital marketing. You simply aren't able to find the living internet behind all the nonsense. That doesn't mean it doesn't exist. You just need to get off social media and google to find it.
Au contraire. Yes, the beauty of the late 90's-early 00's internet was Alta Vista, and then Google, which allowed you to find the parts of the living internet that were interesting to you. Now, search engine results will almost never serve you up a single-user's web site. They will dutifully shovel you to one of the dozen walled gardens that has become "the internet." The search engines themselves have become the gatekeepers of this dead, corporate internet, and there's no breaking out of it.
>> The Dead Internet is a mirage caused by the unreasonable effectiveness of digital marketing…
> Au contraire… The search engines themselves…
With you 100% except for the opening rebuttal. What do you think /caused/ search engines to devolve like this if not digital marketing?
I pay cash for kagi.com, and recommend it.
Engineers should try their “lens” approach. I’d pay more for trusted curated lenses, and hope that’s in their model. The site above could offer a curated list of valid sites, and then I’d find them in the one engine too. (See Similar Projects on Marginalia’s About page.)
I also pay for Neeva, but they’re clearly trying to have their advertorial cake and eat it too. Still, it’s a better resource than Google when seeking an actual product.
I worry that solving digital marketing's 'unreasonable effectiveness' requires more than just the ability to subscribe to content without ads; it should also be possible to buy products without a marketing budget built into the cost. Lower-cost products would outcompete those spending money on ads, so all else being equal, enabling products to compete without a marketing budget is the only solution I see. I don't think a Neeva solves this by itself, though it's likely a necessary component.
Another vote of support for kagi.com -- it's well worth the payment.
Our (collective) disinclination towards paying for things on the Internet is what has led to the "everything must be monetized via ads" local maximum we're now stuck in.
If you care enough about this state of affairs, and can afford to do so (most people here can), then please consider paying for parts of the Internet that are important to you, like a search engine.
When you have as much information as there is on the internet, any attempt to sort through it creates a bubble. At least a paid search engine allows feedback between the users and the algorithm and some confidence that the engine won't go under because the authors lost interest/got tired fighting abuse for free/etc.
Have you seen the crap outside the filter bubble? It is trendy to call filter bubbles the well from which all sorrows are drawn but not everything is worth reading, no matter how many drink the koolaid.
This isn't scientific obviously, but I feel since I started using Kagi, my scope has widened significantly of what I rely on for results. Another +1 for what these guys are attempting.
You pay for search one way or another, I'd rather be direct about it.
It depends on what the customer base is paying for and expecting. If they expect a bubble and complain about it not being there, they'll likely get a bubble. If they complain about the bubble being there, steps will likely be taken to reduce or remove it.
For better or worse, the direction of a paid product is usually fairly well defined, as long as they've taken time to understand their customers.
The release date of the headphones is not the important data point there. The date of the article is what we care about. The article is 8 years old and therefore excludes thousands of new headphones that could conceivably be the best. Even if today's best was released in 2014, a 2014 article pointing to that same pair isn't a very authoritative source when shopping for headphones today.
Headphones don’t age. It’s perfectly fine for such an old article to come up, especially if it gets referenced on prosumer forums such as the r/audiophile or r/hifi subreddits.
Sennheiser’s headphone division got eaten by its own success: the Sennheiser HD 650 is so durable and has such great sound quality that people just aren’t switching away from that 20-year-old headphone.
In case the link you mentioned isn’t about that mid-range headphone, it’s probably about the Beyerdynamic T1.
Besides talking about audio equipment: I hate the sites on Google that update the dates of their articles even though the content wasn’t changed. It happens way too often. I googled the release date of BOTW2 a few days ago, and 3/4 of the search results were blatant SEO spam where the initial article was about something else, and then the headline and date were changed in order to get more traffic from Google.
Imo you are setting the wrong priorities for a search engine.
>Headphones don’t age. It’s perfectly fine for such an old article coming up, especially if gets referenced on prosumer forums such the r/audiophile or r/hifi subreddit.
But an outdated article doesn't guarantee this to be true. For example, maybe the manufacturer released an updated version that is a better value proposition and kept the old model around as a more budget-friendly option. Or perhaps another company purchased the manufacturer and demanded they cut costs. Or maybe this model is new and people haven't yet learned that there is a specific part that frequently fails after a few years of use. An old article can't speak to these hypotheticals. It doesn't mean the article is wrong. It just means that the article is less informed than if it were written today giving the exact same recommendation.
>the sennheiser hd 650 is so durable and has such a great soiund quality, that people just aren’t switching away from that 20 year old headphone.
But this is only something that can be truly known after those 20 years.
>I hate the sites on google who update the dates of their articles although the content wasn’t changed.
I agree, and while this is a related issue, it isn't really the same problem. It is a failure in Google's anti-SEO features. They don't need to trust the date on the article. They could compare cached versions of the page to see what changed besides the date.
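The cached-copy check could be as simple as a plain line diff. A minimal sketch using Python's stdlib `difflib`, with made-up page snippets; the "at most two changed lines" threshold is an arbitrary assumption:

```python
import difflib

def date_only_change(cached: str, current: str, max_changed_lines: int = 2) -> bool:
    """Heuristic: if almost nothing changed between the cached copy and the
    live copy of a page, a bumped publication date is suspect."""
    diff = difflib.unified_diff(cached.splitlines(), current.splitlines(), lineterm="")
    # Keep only lines that were actually added or removed, not diff headers.
    changed = [l for l in diff
               if l.startswith(("+", "-")) and not l.startswith(("+++", "---"))]
    return 0 < len(changed) <= max_changed_lines

cached = "Best Headphones\nPublished: 2014-03-01\nOur pick is the HD 650."
live = "Best Headphones\nPublished: 2024-03-01\nOur pick is the HD 650."
print(date_only_change(cached, live))  # True: only the date line differs
```

A crawler that keeps historical snapshots could downrank pages where only the date line moves between crawls.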
> But an outdated article doesn't guarantee this to be true.
Nor does a new article on a "review" site monetized with affiliate links guarantee it. So who do you trust more? An older but honest review from an expert, or a new affiliate-driven review? Kagi chooses the former as likelier to be more valuable to the user in this case.
The issue still is that there objectively could be better headphones released after 2014.
If I saw a photograph of all the cellphones from 2004, I could pick out the best one, but that doesn’t mean the best cellphone from 2004 is still the best cellphone.
It’s implied that “best” usually means what is “best” for what people need today as those needs evolve drastically over time and especially with tech products.
As a data scientist, just being able to block Towards Data Science and other garbage DS content churned out by amateurs to get their resumes boosted is well, well worth it. It's ridiculous how much top ranking content on Google is flat out technically incorrect, or at least clearly misunderstanding the subject.
Also a big fan & paying customer of kagi.com - the only setting adjustment I've made so far is pushing docs.rs up in search results, so I get that instead of crates.io when looking up Rust stuff.
To me it falls into the category of IntelliJ products, where it makes my life and productivity so much better that the price is a no-brainer
What worries me about Kagi is this bit in their FAQ:
"... it costs us about $1 to process 80 searches. ... An average Kagi beta user is actually searching about 30 times a day. At USD $10/month, the price does not even cover our cost for average use, and we are basically betting that average use will go down a bit with time because during beta people may be searching more than normal due to testing etc. Our goal is to find the minimum price at which we can sustain the business. If it turns out that we have more room we will decrease it. But it can also be that we may need to increase it."
I went and looked up my Google search history for yesterday - it's 40 searches. I'd expect it to be above average, but still... if it's $10 per 80 queries, it feels like $10 is likely to be too low to be sustainable. And while I personally don't mind paying more, I wonder how many people will - and what it'll mean for the service long term, if they just can't attract enough people to make it worthwhile.
Completely agree. Just a few years back before I was using a password manager with random passwords I googled one of my commonly used passwords. I was able to find leaked databases of passwords etc.
The other day I did a similar search and absolutely nada.
When I search for technical information 2 out of 3 times I get a website that I must pay to view content.
The internet is clearly going in a bad direction and most average joe users are suffering and will likely suffer more in the future.
> When I search for technical information 2 out of 3 times I get a website that I must pay to view content.
I usually get thousands and thousands of cloned websites that were likely set up in bulk using a template. They copy-paste just enough text to produce a search engine hit, while the real website it came from may not even be in the search results no matter how many pages of results I click through.
And then there are the elaborate clones of Github content, Stack Overflow, and various other technical help websites, all designed to make it look like all of those discussions are happening on the clone rather than the original. Some of them include a link back to the original, some don't. I get why some of those websites are ok with their content being openly reused (not that spammers care anyway), but in practice it destroys discoverability of their own service and wastes people's time.
Pinterest has spread through Google Images like a virus; they're plastered all over the results for searches that clearly aren't from boards made by real Pinterest users. I doubt it's a 3rd party spamming Pinterest because the only entity who actually benefits from it in practice is Pinterest itself. They've changed their onboarding pattern a lot over the years, but at one point it was virtually impossible to click through to the original website at all before the account creation popup blocked everything else.
Putting Pinterest at the top of image search results is effectively nothing more than a funnel to onboard more users for Pinterest, they rarely, if ever, have any relevance. I can't imagine why Google hasn't knocked them out of the results entirely at this point.
Whatever they're doing to combat actively hostile spam websites is either failing or they simply don't care anymore. The end result could not be more obvious.
"Putting Pinterest at the top of image search results is effectively nothing more than a funnel to onboard more users for Pinterest, they rarely, if ever, have any relevance. I can't imagine why Google hasn't knocked them out of the results entirely at this point."
Absolutely. Pinterest makes Google Image Search much worse than it otherwise would be. I haven't clicked through to Pinterest from an image search for a while, but IIRC you couldn't click through to the image without getting 1. pestered to login and 2. having to scroll through multiple pages once you logged in. I have a vague recollection of this not being compliant with Google Webmaster Guidelines, but I can't recall which specific section, and it's not as if Pinterest is the only large player allowed to get away with breaches the average webmaster would be deindexed for.
Agreed, the long tail is disappearing from google.
Between datasheets and old cringey fanfic of mine, there are more and more resources that I am aware of that absolutely still exist on the internet, with reasonable robots.txt, but can't be coaxed out of google even with exact snippets.
Some time ago, google seemed to start really favouring corporate content.
It used to ensure most searches would have a few blog results, a wiki link, some large corps, some small corps, but that’s fallen apart.
I know this for the wrong reasons. I used to publish pages for my bank’s phone numbers because… I’d just publish their phone numbers.
While this is kinda a bad idea, now searches will give you 10 links to the bank’s own website and they make it difficult to find a number because they don’t want you to call them.
Yeah, this drove me nuts looking for it, but then I remembered the car was in a wreck and I think the hood was replaced, so no sticker. The manual only says to contact a qualified service technician. I'm pretty sure it's R1234yf, but I'd still think it should be something you can easily find online.
If the car was in an accident and the aircon doesn't work anymore, it means the gas loop is leaking. You can try to refill it but depending on the size of the leak it's going to work for a few hours to maybe a couple of days. You should evacuate the loop and do a vacuum test. If it is leaking, refilling the system with some added dye can show you where it is leaking. The Schrader valves are the usual suspects but as the car has been in an accident it could be anywhere. Adding refrigerant to a leaking system is just blowing away money that could be used to actually fix the aircon properly.
AutoZone has a database of that type of thing. Search for the product and filter by the vehicle, or you can just ask at the desk if that's where you're gonna buy the refrigerant anyhow.
Post-1995 and old enough to be out of warranty? Overwhelmingly R-134a.
If you can’t find a sticker (or if that sticker says R-12, it still may have been converted), unscrew the cap on the service port and match it up to the type of port used by each refrigerant.
I had this experience the other day. I was searching for a #define constant I knew was in the Linux kernel source at least once (I was staring at it from my local clone), but none of the search engines returned anything. It's infuriating.
Who stops people from running their own search engines?
As if you cannot look up the address ranges of your own country and then crawl your whole country for websites that may be hosted by people living locally.
As if you cannot do the same with a foreign country that interests you.
Maybe you could even find a list that only shows residential IPs, so you're sure to be finding only webservers run by individuals and not corporations.
And if somehow "port scanning" by trying to send a http request to a residential IP is illegal in your dystopian country, you can always start by scraping the site that you're interested in, there will always be at least one more link to another domain somewhere.
For large-scale servers Python is a poor fit, but that doesn't mean you cannot spend a few weekends writing your own Python crawler for your needs. It's easy enough that you don't need to be a programmer to do it, and if you really care about this at all, a bit of a startup hurdle won't make you immediately lose interest.
And if it really does, there's always options like https://yacy.net/
You should see these things more like real life. If you wanted to know more about your own neighbourhood, what better way is there than to go outside and walk around your neighbourhood and see things with your own eyes?
Maybe that's just my opinion, but the status quo is no one's fault but your own; I never had this problem.
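For what it's worth, the weekend Python crawler described above can be surprisingly small. A stdlib-only sketch, with no robots.txt handling, politeness delays, or retry logic, all of which a real crawler needs:

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse
from urllib.request import urlopen

class LinkParser(HTMLParser):
    """Collects the href of every anchor tag on a page."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def crawl(seed, max_pages=50):
    """Tiny breadth-first crawler: fetch a page, queue its links, repeat."""
    seen, queue, index = set(), deque([seed]), {}
    while queue and len(index) < max_pages:
        url = queue.popleft()
        if url in seen:
            continue
        seen.add(url)
        try:
            html = urlopen(url, timeout=10).read().decode("utf-8", "replace")
        except OSError:
            continue  # dead host, timeout, etc.: just move on
        index[url] = html
        parser = LinkParser()
        parser.feed(html)
        for link in parser.links:
            absolute = urljoin(url, link)
            if urlparse(absolute).scheme in ("http", "https"):
                queue.append(absolute)
    return index
```

Calling `crawl("https://example.com/")` returns a dict mapping each fetched URL to its HTML, which you can then index however you like.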
Content producers get sucked into walled gardens. Even if it is just an internet discussion, nobody will ever read your shit on Discord if it is older than a week. IRC had some of the same problems, but only to a degree. So user content is decreasing, and corporate content and bot farms remain for the open net.
You could crawl forums and find deep technical discussions. Not anymore. And if a term was ever part of any news cycles, you get walls of Google selected propaganda.
I was just tired of YouTube steering me back to the same old small niche of videos, many times giving me repeat recommendations for stuff I'd already seen. Our algorithm is designed to surface smaller channels and find more obscure content.
Even if your neighbor hosts a website for your local football club or whatever, it will almost certainly use some hosting service and not a local machine. The number of websites self-hosted from a residential home must be a tiny, tiny fraction of all "interesting" websites.
First, the internet has grown a lot, so the time, cost, and hardware for retrieving and indexing it have grown.
Second, the quantity of intentionally fake noise has grown even faster - the spam problem that you have to solve is much harder than 30 years ago, any naive approach will simply fail to notice the needle in the haystack.
That's like if someone says "a car is $18,000, just make your own car," another person laughs and says "you don't really think you can build a street legal car from scratch without making a car company," and someone else saying "really, what do you estimate a car company would cost?"
It's completely unnecessary to make that estimate, a nonsense proposition since any two implementations are two orders of magnitude in cost apart, and a question that should never be asked of someone who hasn't done it.
Which is weird, because if you are who I think you are, you've done this in a trivial way, focusing on tiny sites.
And who knows? Maybe you're about to tell me that you've indexed several tens of thousands of pages yourself, that nobody's helping you, that it runs on two computers, and that it's Not That Difficult (tm).
Of course, then someone compares that engine to a practical search engine that also encompasses modern sites, and therefore needs to run tooled browsers to cope with their AJAX nonsense, and has to hit them every hour to be up to date.
And then you look at the disk cost.
Microsoft spends about $6 billion a year on Bing.
Duck Duck Go has more than 200 staff and raised $170+ million before their first profitable quarter.
I think it's very easy for someone to put a homebrew HTML chess game on the phone store and then turn around and insist they know what it takes to run EA.
I'm not disputing that you can sink inordinate amounts of money into a search engine, but on the flip side, I am running a search engine.
It indexes not tens of thousands of pages, but has a peak capacity of about 100 million documents. I can crawl over a billion documents per month.
I don't really see anyone suggesting competing with Google or Bing off a PC in your garage, but it is absolutely and demonstrably feasible to build complementary services without any budget at all.
It doesn't require huge numbers of developers, it doesn't require a small country's allotment of bandwidth, and it doesn't require data-centers full of prohibitively expensive hardware.
You're talking to the creator of a search engine specifically designed to turn up the single-user portion of the internet. I would venture a guess that they know better than anyone what's out there in the non-Googleable internet.
No one is denying that bespoke web sites exist. I still run one myself, almost out of sheer stubbornness. But nothing hits it except search engines, which never show a result from it.
I use millionshort.com for searches that I know will be useless on Google or similar. It lets you remove "top" sites and also e-commerce to the extent it is able to identify them. Though it generally works OK, sometimes it has gotten wonky and gone into a captcha loop.
I have been getting great results with you.com and am downranking all the sites I don't find value in. Searching for shopping related queries has been a blast in comparison to Google.
You can't find what's not there. Way too much of the primary content has moved behind those garden walls.
My local farm has a website where they list the stuff they have available. Meanwhile their actual scheduling and detail updates are on their Instagram because of course it is.
Accurate, up-to-date information for many businesses seems to be about 90% on Instagram and 10% on Facebook. The website, if there is a website, has no information, or old information.
This is frustrating (among other reasons) because Instagram has become much more aggressive about not letting you even see their content without logging in. Sometimes you can see the gallery but not an individual post, sometimes no individual posts, sometimes you can't see anything at all.
Even worse, you cannot use a mobile browser with cookies deactivated, at least that's what Instagram claimed the last time I tried accessing it using DDG's browser. I'm sure their overall numbers improved; I stopped bothering that very day. That leaves WhatsApp among Meta's apps, which seems to be becoming Facebook's Excel: the only reason to use anything from that vendor in the first place.
This is particularly acute in countries that skipped the desktop internet and jumped straight to mobile internet. For much of this userbase, their first and primary interaction with the web is through apps. They never bothered creating blogs or web pages and now all their content is behind the walled gardens of Facebook and Instagram.
I'm not having much trouble finding this type of content on the search engine I built that does exactly this. Not everything is available of course, but 20 years ago, you'd probably have to call someone for that information, so not much difference.
At some point it would probably become useful to teach “internet/tech literacy” to educate people on why this is a problem. But we’re a few decades from something like this.
When I was a kid we had "computer class" that taught how to type, how to use Microsoft office (and open office) applications for different use cases, and this was then mixed in with understanding different sources of information taught by the school library and english classes.
As kids are now raised on smartphones instead of the family desktop, I think they need MORE of this, not less, at least for the very important skill of typing. I wonder how many 12-year-olds in America can type using the "standard" method instead of hunt and peck.
I don't want computing to be something only known by the children of turbo nerds. I want young adults to be able to solve their own problems with computers, ie build some spreadsheets for home finance or even just be able to graph the data from one of their science classes.
True technical literacy is at an all time low, IMO. Whilst more people than ever are "online", the barrier for admission is so low and very few people ever seem to learn more than they absolutely need to.
As you can't develop software on phones and tablets, very few people are tinkering with software. The Pi and iOS app craze brought a momentary change, but it seems to have gone back to how it was—and worse.
Kids of today are mostly out of their depth when put in front of a computer of any description if it is beyond basic website usage. Complex program? Forget it. Decent typing skills? Forget it. Networking know-how they'd have picked up from doing LAN gaming with consoles or PCs? No chance. Change a drive? LOL.
For the handful of kids that game on PCs, they're generally not very clued up and they're just copying builds they've seen on YouTube to the letter. It's a sad state of affairs.
And yes—of course, there are the kids of us turbo nerds, because of course there are, but they are so few and far between.
That is probably because the farmers hired somebody one time to make a website which they don't have the skill to maintain, but they know how to use Instagram.
Well, yeah, I know why they do it and it's a rational decision for them. It's really a condemnation of our own evolution as an industry and the incentives involved that it ended up that way.
On the other hand, 20 years ago such a farm would not have had any internet presence at all, let alone detailed and up-to-date inventory info. Instagram et al made it possible.
20 years ago there was no need for any of this. You went to the farm, did your grocery shopping, and that was it. Or you didn't go there. Either way, farms existed back then just fine.
Who on earth would expect a local farm shop to be on par with Amazon when it comes to inventory and availability data online?
My local farm does already have that information, like, "blueberry picking suspended, waiting a week for ripening". Like, I don't have to expect them to add inventory information, they're already doing it because that's how they communicate with their customers. It's just only available behind the garden walls. It's an indictment of us as an industry that it turned out that way because that was the convenient way to do it instead of a more open and user friendly way like a website.
Twenty years ago you just called them for the information, and it's way better for them to broadcast it than have a hundred 1:1 conversations.
BeautifulSoup and its clones do parsing pretty well. Just extracting the text out of HTML isn't incredibly hard, and metadata is too unreliable to ever be much use.
You can calculate anchor tag density across the DOM tree and prune branches that exceed a certain threshold to remove navigational elements with reasonable accuracy if that is a problem.
It's not going to be perfect, but even Google messes this up every once in a while. I wouldn't consider it a major hurdle.
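A rough sketch of that anchor-density pruning, using the stdlib `xml.etree` on well-formed markup to stay dependency-free; the 0.65 threshold is an arbitrary assumption, not a tuned value:

```python
import xml.etree.ElementTree as ET

def text_len(el) -> int:
    return len("".join(el.itertext()).strip())

def anchor_density(el) -> float:
    """Fraction of an element's text that sits inside <a> tags."""
    total = text_len(el)
    if total == 0:
        return 0.0
    return sum(text_len(a) for a in el.iter("a")) / total

def prune_nav(root, threshold: float = 0.65):
    """Drop subtrees that are mostly link text (menus, footers, sidebars),
    while leaving individual inline anchors alone."""
    for parent in list(root.iter()):
        for child in list(parent):
            if child.tag != "a" and anchor_density(child) > threshold:
                parent.remove(child)
    return root

doc = ET.fromstring(
    "<body>"
    "<nav><a>Home</a><a>About</a><a>Contact</a></nav>"
    "<div>Real article text with <a>one link</a> in it.</div>"
    "</body>"
)
prune_nav(doc)  # the all-links <nav> is removed, the article <div> survives
```

Real pages need an HTML-tolerant parser rather than `xml.etree`, but the density calculation itself carries over unchanged.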
The hard part is understanding which parts are the content versus navigation or promotions of other content. I’ve written a couple of search engines. Have you tried making one with BeautifulSoup?

No, I use JSoup for my search engine.

I don't presume the source is available... unbelievably cool project that I'm sure a lot of people have imagined themselves doing.
Yeah, it depends on what you want to prioritize and value in your search engine. I’m coming at it from the angle that if you want to make a good, new, and different kind of search engine, you need to do something fundamentally different from Google. No one is going to beat Google at their own game. Leveraging metadata is a very easy way to make something new and different, but it won’t be as comprehensive as Google. I doubt that someone doing what you described over a few months or a year could make a search engine that anyone wanted to use.
> I doubt that someone doing what you described over a few months or a year could make a search engine that anyone wanted to use.
Dunno, not only are people sending me money to develop my search engine (not enough to live off, but still), I also get emails and tweets from people who say they love it on almost a weekly basis.
I think attempting to be as comprehensive (or more) than Google is a trap. The better move is to fly under them. Be cheaper and better at something. Recipes is a great example of something Google is just miserable at, that is easy to do much better. There's plenty of such niches.
You love seafood, so just literally run grep on the entire page, and if it contains the word then count it as a match.
In reality, you will miss a lot of real seafood pages because they don't really need to mention "seafood" and context matters, so what? Chances are that that one website where person randomly added "I love seafood" to the top of the page will be the only page that you've ever wanted to see anyway.
There's too much data for you to go through in an entire lifetime in any case, so why worry about it as long as you can get something that's good enough? You will never get the best data; if that were possible, Google would already be giving it to you.
How do I know? Well, looking up my real name shows where I grew up, what school I went to and graduated from, and even which exam I scored 100 on... and even some places I used to work at in the past. While that part will make most people paranoid, I wish ALL results were as detailed as this one, but there's little you can do.
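The grep approach really is only a few lines. A toy sketch with invented URLs; it also exhibits exactly the miss conceded above, since a page about oysters and mussels never says "seafood":

```python
def naive_match(pages, query):
    """Crudest possible retrieval: a page matches if the query term
    literally appears anywhere in its text."""
    term = query.lower()
    return [url for url, text in pages.items() if term in text.lower()]

pages = {
    "a.example/fishing": "I love seafood, here are my favourite crab shacks.",
    "b.example/recipes": "Oyster and mussel recipes for the grill.",
    "c.example/cars": "Restoring a 1972 coupe, part 3.",
}
print(naive_match(pages, "seafood"))  # ['a.example/fishing']; the recipes page is missed
```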
That’s how you make a worse search engine than Google. If you are serious about competing in that space I think you need to do something fundamentally different than Google. Treating pages as a bag of words leads to a shitty search engine. Like I said, I’ve built a few search engines, and I have tried this.
That actually sounds like the solution. If you're getting something standard, you don't want it. If something is too non-standard to be identified, pass it through.
Search engine spam is actually a fairly solvable problem if you aren't in Google's questionable position of also selling the ads that make the spam economically viable.
They can do everything except the one thing that would actually hurt the search engine spammers right in the coin purse: Penalize websites for having ads.
I don't know about that. If I recall correctly, there was a time when Google was trying to tamp down on pages with excessive advertising, so SEO spammers just switched to making pages that superficially looked like normal informational pages, but the content was all either ad copy or spun text, and all their links went to products they wanted to sell.
I get where you're coming from, but my experience is that even many people making sites for fun stick ads on the site in the hopes of getting a bit of extra cash. Lots of fansites did so in the olden days, and many useful sites, wikis and blogs do so now, especially if they either use hosting that adds them or they get enough traffic they feel they need to.
I'm not saying block ads, I'm saying prioritize sites that don't have ads above those that do. Prioritize those that have some over those that have a lot.
The whole premise of Google at the beginning was that the web is a collaboration, and we can measure that.
Then it was we can measure that and make money.
Now it’s just we can make money.
Whether that part is sinister or not, we know that we have a good number of bad actors, and from search engine results we can be sure that they have not developed a workable Byzantine fault tolerance mechanism to filter out the bad actors. Those who scream the loudest get put on a stage.
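That original premise (the web is a collaboration, and we can measure it) was PageRank. A toy power-iteration sketch over a hypothetical four-page link graph:

```python
def pagerank(links, d=0.85, iters=50):
    """Power iteration over a link graph: a page is important if
    important pages link to it."""
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iters):
        new = {p: (1 - d) / n for p in pages}  # teleportation share
        for p, outs in links.items():
            if not outs:  # dangling page: spread its rank evenly
                for q in pages:
                    new[q] += d * rank[p] / n
            else:
                for q in outs:
                    new[q] += d * rank[p] / len(outs)
        rank = new
    return rank

graph = {"a": ["b"], "b": ["c"], "c": ["a", "b"], "d": ["b"]}
ranks = pagerank(graph)
print(max(ranks, key=ranks.get))  # "b": three of the four pages link to it
```

Bad actors game exactly this signal with link farms, which is why measuring the link graph alone stopped being enough.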
Companies that acquire other companies and talent will always change. The goal of making money over everything is the one thing all Class-C corporations have in common.
It cannot be gamed. You wrote your own search engine, you wrote your own filters; you know how it works, and no one else does.
Whitelists that I wrote by hand also don't introduce new unexpected entries by the way :)
This could instead be more like RSS: as your crawler finds new sites, you get updates on new things, and you could filter either in your crawler or directly in your RSS client; it doesn't matter which.
It would actually be lovely to have a search engine that only serves up single-creator or few-creators websites. I'm sure one exists, but I'm not aware of any.
I haven't used it (or their browser Orion, which is allegedly better-performing and more battery-efficient than Safari), but have you tried the "premium" search engine called https://kagi.com/ ?
Kagi with lenses is tremendous. I recommend it elsewhere in this thread.
One should subscribe to it out of support, hope, and to send a signal if nothing else. But it’s actually considerably better on most searches, perhaps similar to a mid-2000s Google, except with mild structure added that isn’t ads.
(You can still !g like ddg if you feel you absolutely must.)
Imagine websites having a captcha that proves they are handcrafted by a person, proving the time invested, the back and forth. You could filter for that.
This actually seems a reasonable enough solution to the bot problem, but doesn’t address the fake content problem. A human can still upload fake stuff.
Not if the whole creation process has been documented with a sort of internal blockchain: from scanned sketch, to opening the canvas, to every single brush stroke, to converting and uploading the image. You could fake that too, but the effort would be tremendous.
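A toy sketch of that "internal blockchain" idea (the event names are made up for illustration): each creation step is hashed together with the previous step's hash, so the whole edit history can be verified, and forging it means redoing every step.

```python
import hashlib
import json

def chain(events):
    """Link each creation event to the previous one via SHA-256,
    so tampering with any earlier event changes every later hash."""
    prev = "0" * 64  # genesis value
    log = []
    for event in events:
        record = json.dumps({"prev": prev, "event": event}, sort_keys=True)
        prev = hashlib.sha256(record.encode()).hexdigest()
        log.append({"event": event, "hash": prev})
    return log

log = chain(["scan sketch", "open canvas", "brush stroke", "export image"])
# Anyone with the event log can recompute the chain and check the final hash.
```

This only proves the log is internally consistent, of course; binding it to a real human's effort is the hard (and unsolved) part.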
I just Googled my name and the search engine returned a good long list of pages featuring my work in various places[1]. Of course it does help to have a fairly unique name; others may struggle to find themselves featured on the first page of results when they Google their name.
[1] For instance, this link - which I discovered just ten minutes ago. I know for a fact that I have never submitted poetry to the Porkopolis website (motto: "Considering the pig, a single-minded bestiary") but it's always a pleasure to discover other people putting my words to good use! - http://www.porkopolis.org/pig_poet/rik-roots/
I also can easily Google my name. There are only 3 of us, and one of them is my deceased uncle. It doesn’t change the fact that I probably haven’t had any organic traffic to my personal web site of 28 years for a long, long time.
The price of entry is very steep. Many people are willing to pay. Many are not. It's an exaggeration to say that many walled gardens are free to get into, I think. At least, I don't know of any.
I had the same reaction to the main thesis of this article. This is like people taking a road trip across the United States who are only willing to go 1/4 mile off the federal highway. They then decry the state of cuisine because the only restaurants left in America are the same 5 fast food joints.
Getting outside one's comfort zone and putting in the time to find something good/interesting/new is highly underrated. But it is work. And many a corporate empire has been built by making a mediocre or sufficient experience the most convenient thing.
>This is like people taking a road trip across the United States who are only willing to go 1/4 mile off the federal highway.
Urban Spoon was an amazing resource for us road warrior types. I found many fantastic places > 1/4 mile off the interstate. Nowadays, I ask employees at worksites for their opinions. If they recommend a box chain, I ask someone else.
If you know you're going to be in or near a particular town and want food, a good place I've found to start is the restaurants that advertise in the local church bulletins.
The physical location where a field tech installs and implements technology and/or equipment. I worked in Building Automation Systems (BAS) in big box retail stores and covered >400 sites in 5 US states.
Frankly, Marginalia shows how many of these glorified websites are just dead, forgotten projects that are only still running because nobody remembered to take them down.
While I like using Marginalia to find these websites, I don’t think it’s a demonstration of how “alive” the internet is but more like a lens into what the internet used to be, like walking around an archaeological dig site.
If the majority (vast majority) of users of the internet rarely break out of social media and you posit that social media is largely 'dead', then isn't the internet effectively dead for the vast majority of users?
Would those smaller social media platforms then be large, end up commercialized some how, smaller ones pop up, and something else take over until we repeat the process?
Access to the real, genuine Internet people and places will be made invisible; protected by the gargantuan SEO-fed lipid-berg of AI-generated content, keeping the social media peasantry ever-corralled in the cattle pen where they shall be kept happy and fed by their keepers.
The problem isn't really with the internet, but a handful of websites that have come to mediate the internet for a lot of people (despite appearances to the contrary, the two are not the same).
This is a critical distinction, because the former is a problem like "the water is too wet", like you can't really fix that. You can build new digital infrastructure though. That's a solvable problem.
And then that new infrastructure will be subverted just as the current one has been. I don't think this is an issue that can be solved through engineering. I think the problem lies in a different domain.
Can't we just build a new infrastructure at that point? The current infrastructure worked well for about a decade. If we have to rebuild every decade then that's not so bad.
I would dispute that digital marketing is unreasonably effective. In fact I would say the dead internet is a result of the exact opposite. Digital marketing is so ineffective that basically the only strategy that works is spamming it everywhere. If you don't believe me look at the revenue per user of ad driven tech companies.
Effective is a slippery word. I use it to mean effective in the sense of affecting what is presented. You use it to mean effective in the sense of making sales.
It is extremely effective in the first way, but extremely ineffective in the latter.
> You just need to get off social media and google to find it.
Maybe if you know the exact URL or specific keywords, but generally not now. Google has turned into the same level of ad placement that Ask Jeeves and its ilk were. It's atrocious for surfacing anything other than clickbait. DuckDuckGo is better, but not by much IMO.
Even if you think it's a net positive when human attention is centralized in a few things like TikTok or Fortnite, you should still be able to enumerate downsides.
At that point, you'll see that you are disagreeing with the subjective weight of the upsides and downsides thus it doesn't make sense to attack someone like this.
> You just need to get off social media and google to find it.
Have you tried searching lately? It feels like it is becoming increasingly difficult to find actual articles with useful information in a sea of SEO trash.
Wow, awesome tool! Seriously, thank you for this. This really brings back that sense of wonder I experienced when I was younger and began exploring the internet in the early 90s
I think google and the other engines are part of the problem: ads, walled gardens, fake sites, and more.
A colleague showed me a website the other day from 2013 that was an absolute jewel in terms of knowledge. I am sure more recent sites like that exist, but I'm afraid the odds of finding them with Google are almost zero.
Cool. How? Can you elaborate on "I crawl myself." Do you mean you are the search engine and your search engine is an index of stuff you found that fits your criteria? What criteria, precisely?
It's basically a 1998-style Google which discriminates against JavaScript-heavy documents and uses a Personalized PageRank[1] algorithm to promote content that is adjacent to human websites.
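For anyone curious, here is a minimal Personalized PageRank sketch (not Marginalia's actual implementation; the domain names and seed choice are invented for illustration). The teleport vector concentrates probability on hand-picked "human" seed sites, so pages in their link neighborhood get promoted:

```python
def personalized_pagerank(links, seeds, damping=0.85, iters=50):
    """links: {page: [pages it links to]}; seeds: {page: teleport weight}.
    Instead of teleporting uniformly (classic PageRank), the random surfer
    always restarts at the curated seed pages."""
    pages = list(links)
    rank = {p: seeds.get(p, 0.0) for p in pages}
    for _ in range(iters):
        new = {p: (1 - damping) * seeds.get(p, 0.0) for p in pages}
        for p, outs in links.items():
            share = rank[p] / len(outs)  # spread rank over outgoing links
            for q in outs:
                new[q] += damping * share
        rank = new
    return rank

# Tiny 4-page web: "seed.example" is the curated, trusted starting point.
links = {
    "seed.example": ["blog.example"],
    "blog.example": ["forum.example"],
    "forum.example": ["seed.example", "spam.example"],
    "spam.example": ["forum.example"],
}
scores = personalized_pagerank(links, {"seed.example": 1.0})
```

Pages reachable from the seed inherit rank; a spam page the seed neighborhood doesn't endorse ends up at the bottom.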
Thanks. Tried some searches, good stuff. Recipes are a well known disaster area these days, it's nice to be able to find just an explanation and instructions, with nothing else.
I assume you've considered this, but what about deprioritizing precisely what spam sites use to SEO, or concocting some other identification algorithm, to filter precisely those out?
While yeah, marginalia finds interesting stuff, I've not been able to find anything useful that I've tried searching for with it so far.
>You simply aren't able to find the living internet behind all the nonsense. That doesn't mean it doesn't exist.
It's not about it being nonexistent. It's about it being too small a percentage. And with algorithmic generation and rampant re-posting of news content 1000s of times on different outlets, this is probably true...
8 billion people can manually create far less content than thousands upon thousands of automated generation scripts and bots...
Refreshed a few times. 99% of sites were about IT, programming, gaming, tech, STEM in general. That's a really narrow representation of the internet. Maybe it's just my luck.
That's the problem with the approach you've taken. You're showing domains, not web pages. Not all hobbyists or interest groups outside tech have domains.
You can always contribute by crawling the IPv4 range, searching for HTTP servers that don't necessarily have a domain pointing at them, and posting a list for us when you're done crawling it merely once :)
I'm assuming you're referring to Facebook groups or similar walled-garden web presences? If so, I'm assuming indexing those is an explicit non-goal for OP, and I'd agree with that. The walled gardens are very hard to index (often actively hostile), which means the returns would be rather low. Plus, most such places have searches of their own if you're really interested in finding something there.
In my experience, part of the problem is that often (usually?) the search on those places is nearly useless. For example, I shudder to think what will happen if old.reddit ever stops working.
Well the literal point of the tool is to show that there are websites outside of the walled gardens. There are plenty of link aggregators already. Hacker News is one of them. Reddit is another. Twitter is a third. Google is a fourth.
This article engages in my pet peeve, which is referring to entertainment as "content".
I don't have a problem with the word "content" in the context of content vs framing. e.g. if you are designing a network protocol, you care about distinguishing the content of the message from the framing, and you don't care at all what the nature of the content is, it could be video, image, text, etc, and it could serve many different purposes (entertainment, personal communication, employee training, archival/backup, etc).
However, this sort of language has crept into discussions of online entertainment, for no good reason. "I'm not an online entertainer, creating entertainment for people to enjoy, I am a content-creator creating content for people to consume." I think people don't like to think about what they're creating or consuming as entertainment, because society has already attached a connotation of triviality to the term "entertainment" (for good reason IMO).
Someone who talks about "content" is adopting the terminology and framing of a businessman, who cares little what purpose the "content" serves, just that it can attract attention, and thus money.
To be fair, once the internet is entirely taken over by bots, maybe it will be appropriate to call the stuff that bots create and consume "content" without a whiff of irony.
The funny thing is that that definition of content is still operating. “Content” is a term for the undifferentiated media framed by advertisement. To an advertiser it doesn’t matter what the content is, it’s just bits on the wire that seem to attract attention.
If you can’t put an ad in it then it isn’t content. Insidiously, we now call things content even if they don’t have advertisement or are not created for show.
This term of business art reminds me unpleasantly of Borders, the now-defunct bookstore chain where managers insisted staff refer to books and other media as 'product', reducing staff to little more than supermarket-grade shelf-stackers and merchandisers.
I grew up in a time and place where people who worked in bookstores did so because they liked books and were knowledgeable about the book industry as both distributors and consumers. Chain stores like Borders preferred to hire young and cheap and make stocking decisions centrally. The staff could probably have been swapped with a completely different retail establishment, and both outlets would have run in much the same way.
Free market advocates like to go on about the self-corrective nature of 'real' capitalism/competition, but never seem to have any answer for the existence of franchises or their tendency to crowd out other participants in a market by having a much deeper pool of capital to use as leverage. The idealistic models of perfect competition and price equilibrium only work well under elusive conditions and for fungible commodities.
Thank you for saying this. I've had a similar opinion for years now. I cringe whenever I see "content" used this way.
When I first saw that it was gaining traction, I took it as yet another sign that the internet was effectively dead in terms of what always made it great for me, and had been turned into nothing more than business.
The problem is that not all content is entertainment. Many "content creators" are producing content whose intent is to educate, inform, persuade, or (most likely) some inseparable combination of all four.
I don't like "content" either. I prefer "media". But there's an entirely logical reason we don't call it "entertainment". That would be like calling all clothing "pants".
At least 95% of what I watch on Youtube is educational or technical. It's self-teaching material, math, science, that sort of thing. Most people would find it dry but I enjoy it.
Still, I wouldn't call it entertainment.
I assure you it has just as many ads: pre-roll ads, clickable ads below the fold, interstitial ads, and "a word from our sponsor" ads as everything else on YouTube.
It's literally entertainment; you're being entertained by it. That doesn't mean it isn't teaching you or otherwise improving the world, but it's broadly in the class of entertainment.
I suppose I'd still classify it as entertainment even if you paid for YouTube premium.
If we declare anything enjoyable to be “entertainment”, then wouldn’t household chores that happen to relax you or bring you joy also be classified as entertainment? Cooking does often feel pleasant, but I don’t think most people would call it entertainment.
If we call all content that we enjoy watching “entertainment”, then IMO the word loses the meaning a little.
Those are often referred to as pastimes or hobbies — and I think the “entertainmentification” of all (lacking a word let’s say content) has been a major issue in the recent decades.
It’s becoming hard to even find a youtube video that doesn’t have “like and subscribe” somewhere in it, even if it’s not otherwise sponsored.
Asking to like and subscribe is a standard YouTuber behaviour because they presumably get a cut from the ads that YouTube runs on their channel. Or even if the channel is run entirely not for profit, liking and subscribing increases the chance that new users will see the content in their suggested videos thanks to an algorithmic boost.
I think you're projecting your own Internet consumption preferences onto all of us.
Most of what I spend my time on on the web and YouTube is more towards educational than entertainment, though the line gets blurry (which is the point of having a unifying term like "content") when it comes to videos about music-making and stuff like that.
You can already call most things "content" without a whiff of irony. Just look at youtube. "Content creators", you think they're misusing the words? No they're being 100% genuine to you what their intention is.
I see people write “I like their content” and have the same irking feeling that some people feel like their life is little more than consuming entertainment, and the word “content” elevates the worthwhileness of their behavior/addiction somehow.
I'm not sure I follow your criticism. I do believe that the term "content" is perfectly appropriate for most of what's being put out there. "Content" conveys this idea of "quantity over quality". I am grateful when someone identifies themselves as a "content creator", or when streaming platforms talk about their "content", because to me it is a strong signal that it's gonna be garbage and can be ignored.
From my observation, this is also a counter-movement of influencers to distance themselves from the term influencer. People want to take some pride in their work and try to see some worth in it. And they want to distance it from the bland marketing and stupid pointless activities. Everyone takes pictures of their food, but if you hire a semi-professional photographer and pose for a picture, then it's real work and worthwhile content. Something along that line.
I get a similar sort of reaction to "content", and I get a similar reaction to "units" which is used as a catch-all term for physical items sold, be they DVDs, t-shirts, game cartridges, wrenches, sneakers, whatever.
The person talking about sales of "the thing" doesn't care what "the thing" is. The language implies that "the thing" itself is irrelevant. All that matters is that it's something that can be sold, and the only important datum is how many were sold. And maybe how much they were sold for. Aside from that, to the person using that language, it's all just undifferentiated stuff.
I think from there, if the person talking about "units" or "content" doesn't really care about what the thing is, they're going to care even less about whether it's a good example of that type of thing. Is it a good t-shirt? Or a bad pair of sneakers? Who cares - how many units were sold?
Are you making movie review videos? Or 30-minute+ EDM/prog fusion atmospheric music tracks? Or 5000-word investigative journalism takedowns of corporate shitfuckery? Or are you just making "content" - whatever will grab some eyeballs?
Cannot stand the term either, so many problems with it. For one it completely lacks any kind of precision... and the idea of 'consuming content' is just gross and stupid.
The word suggests undifferentiated goop (pink slime?) being poured into cans for mass consumption, and shows the respect its producers have for both the product and its consumers.
Regardless, I think it's interesting to consider why it's changing and the direction of those changes. As an extreme example, if 1984-style doublespeak started to take root, it would be valuable to consider why and whether this is a good thing.
I actually experience this in the exact reverse from the way you are proposing. I'd argue that this is specifically referred to as "content" because it is generated en masse, as opposed to something carefully crafted. This is the way I see most "content" on the internet. For instance, a TikTok video that is made in a minute? Content. A Pixar film that takes >100 minutes of work for 1 minute of film? Art.
The end result of all this fakery is a growing doubt and distrust of the world and the information presented to us. Bots on twitter, corporate reddit moderators pruning discourse, astroturfed discussions, deepfakes, AI generated news articles, AI art, it all waters down the assumption that what we see before us is real. Leading us to doubt everything we read, see and hear. Much of this bot driven noise online is only possible in large, public online communities.
I think we will see a shift towards much smaller walled gardens of community online. It's already happening with the mass exodus to discord and smaller chatrooms. I think we can all safely assume that our 30 discord friends are real people... for now.
The country club exists for the wealthy to enjoy the pleasantries of community and pastime without interruption by the masses. I think the internet will move to mirror the real world as we segregate apart into the places we most enjoy... or have the connections and money to afford. Authentic and vibrant human communities with novel content curation will be a luxury, while the "public pool" for the masses will be an internet of data pollution and grime.
Yes, the article mostly assumes that the initial effects of AI generated fake content will be the same as the final effects. This is silly.
People will change what they do in response. Though at the very end, he does say "We should learn to be skeptical of content", that belongs near the beginning, before an analysis of what the effects of increased skepticism will be, rather than what the effects of blindly believing fake content will be (since that won't happen, after a short initial period).
Smaller communities are one possible response. But just more critical assessment of arguments and reported facts is another. For arguments, it doesn't really matter whether or not the argument was AI generated - if it's valid, it's valid, if it's not, it's not. For factual reports, critical assessment might be more difficult, though I think it will be a while before AI generated fake facts have the right sorts of connections to common-sense reality to withstand critical examination.
Content, info, arguments, etc. are all propagated online based on their deliciousness. Is it dramatic? Easy to digest? Shocking? Emotionally powerful? Bright and alluring? Sexy or disgusting? These are the elements that push information to the top. Reality, truth and logic can't compete.
Advertisers figured this out in the middle of the 20th century. Prior to Edward Bernays' (Sigmund Freud's relative) revolution of advertising, products were marketed based on their functional qualities: how effective they were, how efficient, etc. Bernays realized from war propaganda and Freud's ideas of the unconscious, that selling with emotional coercion and sex was far more effective. In fact, you could make people buy things they didn't really want or need, by making them unhappy without them. He was able to convince women to smoke cigarettes by having trendy, independent women smoke openly at a parade, followed by a branding campaign calling them "torches of freedom". This concept of emotional manipulation trumping factual data is how our entire society now operates.
If we want a skeptical and thoughtful populace, our entire education system must be restructured and information dieting will have to become an innate part of the online experience.
People haven't shown the inclination for more critical assessment so far; why would that change all of a sudden?
And AI fakes are still in their infancy. For example, they haven't learned to push emotional buttons yet. But they will soon, because it's not all that hard, and it drastically increases the virality.
> right sorts of connections to common-sense reality
Unfortunately, I think this matters less than it should. Connection to common-sense reality does not seem to be a prerequisite for most people who engage with content on the internet.
> I think the internet will move to mirror the real world as we segregate apart into the places we most enjoy... or have the connections and money to afford.
Or we institute ever more stringent standards for verification of online accounts, both to prove one is human, and to tie online reputation to real identity. Not that I want to see this happen.
I know it's not really the point of this comment but public pools historically have been places to enjoy the pleasantries of community and pastime.
As an additional aside, you should spend some time considering the implications behind your selection of two significant hallmarks of institutionalized racism as your poles for opposite ends of a spectrum from "the pleasantries of community and pastime" to "pollution and grime [of the masses]".
They're a perfect example of how in America we have these public services that get overloaded and degraded in quality (because they are public), so the rich go and make their own private luxury versions to enjoy a more selective and high quality experience.
Reddit will be the future ghetto of the internet while the elite hang out in private discords!
Ugh another “think-piece”. Can’t wait for these to be generated automatically so we can move on to the next thing.
Literally no point here other than “vague broad unexpected content thing is coming” and look at all these “AI generated pictures” and random pieces of evidence.
Dumb humans will always get caught up in a web of bullshit and fakery because they always have. One could argue that it isn’t the first time someone or something has hacked people’s minds. Ideologies, religions, technologies have been used countless times by smarter humans to trick dumber humans into giving their shit away. And plenty of smart humans have also always stayed relatively quiet and out of sight knowing better than to make themselves into targets. The internet only reflects those dynamics. The only thing that is changing is that various areas of interest are becoming populated and settled “online”.
It all depends on what a given chunk of “internet” is used for, who populates it, and how much they spend on things they get there.
The stop overthinking for increased productivity pieces are the ones I'd like to see die. The fact that GPT-3 can write that garbage will hopefully develop a sense in more people that allows them to detect when a lot of words are being used, but nothing is really being said. Being afraid that their thoughts will be taken for computer output might force writers to actually create a model, or make a material claim with their words, rather than just trying to cast spells with them.
Everyone is dumb one way or another. Especially those who are specialized in an area of expertise. That’s when we over-estimate how much we know about adjacent specialties.
There is definitely a type of content that works on all of us collectively. It’s like catnip.
Yesterday, in a slightly obscure subreddit I frequent, every single one of the posts was by a bot reposting old content from the subreddit with random letters changed in the title. Upon further examination, most of the comments were from other bots as well.
All they did was automate a process living, breathing human redditors have followed for over a decade: farming karma with reposts. The fact nobody noticed the switch over from human repost spam to bot repost spam is pretty indicative about the overall quality of the site's community, I think.
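The letter-swap reposts described above are also cheap to detect. A hypothetical filter (the sample titles are invented) using plain edit distance:

```python
def edit_distance(a: str, b: str) -> int:
    """Classic Levenshtein distance via dynamic programming (two rows)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(
                prev[j] + 1,              # deletion
                cur[j - 1] + 1,           # insertion
                prev[j - 1] + (ca != cb), # substitution
            ))
        prev = cur
    return prev[-1]

def looks_like_repost(title, history, max_dist=2):
    """Flag titles within a tiny (but nonzero) edit distance of a past title."""
    return any(0 < edit_distance(title, old) <= max_dist for old in history)

history = ["My cat learned to open doors"]
print(looks_like_repost("My cat learxed to open doars", history))  # True
```

A real spam system would need more than this, but the point stands: swapping a letter or two fools humans skimming a feed, not a distance metric.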
UI/UX shapes how people react to what they see. Since the reddit redesign, the quality of discussion has dropped significantly. When the interface shows less than half the content it used to, and it's all focused around reddit gold, emojis, etc. instead of the actual comment chains and posts, that's to be expected. Sure, you can redirect to old reddit, but most won't, and you can't change the quality of content other people put out. Overall it's incredible how easy it was for me to go from several hours of browsing interesting topics daily to barely using the site.
I think reposts should be automatic anyway. Lead if you cannot win. Users who crave new posts and discussions don't go necro because the discussion is closed, and also they couldn't say anything there without repeating someone else. Most people discuss posts not to engrave knowledge into the internet, but simply to interact with someone and satisfy their need to be heard. Those who don't want reposts could then set some today's-rating threshold.
The slightest historical and often not very well-thought-out nuances of the platforms that we use can influence the whole internet landscape.
Funny, I've been a redditor for 15 years. And for some reason I have a good memory for posts...
I have noticed that bots are karma farming by taking the old top front-page links and reposting them.
I'm not a fan, but this also isn't a bad thing per se; a good post from two years ago that people haven't seen before is still a good post.
The problem is that reddit's admins and, in general, the mods of various subreddits are absolute douchebags. The policies for the site are dark-pattern-based-yet-lying-about-truth and the fucking support system is an absolute joke.
I find it ironic that HN, which initially funded reddit, has such a myopic view on threaded commentary, is also heavy-handed in its modding, and is so blind to aspects of reddit's cesspool of dark patterns, while at the same time ignoring the awesome things about reddit.
There are so many things about reddit that are great, juxtaposed to all the things about reddit that suck, (cannibals (spez), spaceDicks, jetpack election selling, ultra-karma-whores (violenta cruze and the other guy), etc... but both reddit and HN seem allergic to even touching this third rail of criticism...
If you want to be the front page of the internet, we need to be able to discuss and address opportunities for improvement. We NEED to have the ability to oust mods who are actually alts for admin accounts that abuse their power.
I see many things. You are transparent. I am not here.
Yes! Exact same thing in a small subreddit I frequent. You can click into the user's post history and it'll be a bunch of posts to random subreddits with the exact same thing - picture grabbed from I assume an old post that got a lot of upvotes, title copy-pasted with a letter swapped in one of the words. Must be some new wave of bots that reddit's spam detection isn't handling.
I don't remember if it was bots or just employees taking on several different "personas." But the goal of presenting a more active community was the same regardless.
This is ironic since the main article links to another (arguably more interesting) article that talks about achieving #1 posting status on HN with a GPT-3 posting:
Related: There's a GPT-3 bot on HN[1]. Somebody recently commented[2] that the bot was convincing, and a couple replies criticized them for calling it a bot (one citing ToS).
Maybe. Doesn't change the silliness of people defending it without doing the least bit of investigation, which they would see by looking at the profile that the account claims to be a bot.
Edit: I replied before you edited your comment to point out the punctuation idiosyncrasies. Yeah I agree now, this guy must be LARPing as a bot.
I actually think @tinus_hn, who cited the rules in your link, is right: the rule should still apply because it adds nothing to the discussion to debate the bothood of a user even if they claim so themself.
I think the guidelines are to encourage civility and healthy discussion. That is, to not dismiss an argument by calling the person a bot. But if the person claims to be a bot, I don't see how it can be a community violation to call a bot a bot.
True, I don't think it made sense to invoke the guideline in that context since the guideline surely exists so that you aren't dismissing a comment based on who you think wrote it.
Wow. I'm deeply frustrated with the author of that article.
> there were a few commenters on hacker news who also guessed [that GPT-3 wrote the articles]. Funny enough, nobody took notice because the community downvoted the comments.
What a quick way to destroy the social fabric of this site.
I think you’re barking up the wrong tree. This is inevitable because it won’t be long before the likes of GPT-3 (and everything better) is an app you can run on your phone.
The solution isn’t to punish the one person you can catch because they admitted it but rather to evolve the platform/assumptions in a world where this is going to happen no matter what.
Funny, just the other day I was reflecting on how HN seems so lifeless and generic these days, so much of it just people throwing around predictable talking points, and wondered what percentage of the posters are bots.
There must be some bot commenters here, the idea of creating a GPT-3 hacker news poster is so obvious that a bunch of people would do it just for the lulz.
There's plenty of influence plays there. If I worked for Cloud Provider X, I might nudge people toward using that cloud instead of a competitor. If I'm a developer of Software Y, I might say "Software Y makes this much easier" and boost sales.
I've said this before, but as someone who went to prison from 2016ish-2021, HN is the only recognizable part of the internet to me and the only place that doesn't feel toxic and corrosive. I went to Youtube for a month, and found myself posting awful toxic crap myself, including here. I put work into training my Youtube algo and now it's ok, but OMG the default of what it wanted to feed me was soul crushing.
I wonder if there are AI-generated posts on HN that fly under the radar. There probably aren't as many as on social media platforms, since there's not enough money to be made by doing it here.
One of the things the author appears to assert is that bot-generated images/shorts will start to dominate social media.
I don't think that will be the case long term (I also don't see it working in VR at all).
The issue is that because it's easy to generate, it will become associated with being cheap. And nobody likes being bombarded with cheap culture (ie things that are optional, and not actually purchased).
An example: for the majority of people, "NFTs" now feel cheap. The bored apes are just variations; some NFTs are pretty patterns. Nice, but not worth the money.
Yes, some tech people are getting very excited about GPT3 -> dall-e -> some sort of game engine. The problem they will bump into is that while the output might look pretty, it'll have no meaningful coherent narrative (not even over 7-9 seconds). It will be associated with cheap platforms/content producers (ie the modern equivalent of "this one weird old trick / these pictures will SHOCK YOU / number 7 will TAKE YOUR BREATH AWAY").
In some senses automated content has a very real risk of killing VR as a platform. Nobody is going to come back to a platform awash with nonsensical bot generated crap.
The solution that appeals to me is reputation systems. People give ratings to each other; then you choose a few people that you personally respect as "root nodes", and everyone else gets a rating based on the ratings that flow from them, according to some algorithm. Then you can filter out content from sources with too low a reputation.
(Maybe this could be done in a decentralized way, using the dreaded bl*ckchain, but anyway it would be non-financial application.)
Hope to see something like this happen in the coming years.
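A minimal sketch of what that rating flow might look like, assuming a made-up graph format, decay factor, and threshold (nothing here is a real protocol, just an illustration of trust propagating outward from root nodes):

```python
# Trust flows from hand-picked "root nodes" and decays at each hop.
# Graph format, decay factor, and threshold are illustrative assumptions.

def propagate_trust(ratings, roots, decay=0.5, hops=3):
    """ratings: {rater: {ratee: score in [0, 1]}}.
    Returns a trust score per reachable user, seeded from the roots."""
    trust = {r: 1.0 for r in roots}
    frontier = set(roots)
    for _ in range(hops):
        next_frontier = set()
        for rater in frontier:
            for ratee, score in ratings.get(rater, {}).items():
                derived = trust[rater] * score * decay
                if derived > trust.get(ratee, 0.0):
                    trust[ratee] = derived       # keep the best path
                    next_frontier.add(ratee)
        frontier = next_frontier
    return trust

graph = {
    "alice": {"bob": 1.0, "spam-farm": 0.0},
    "bob": {"carol": 0.8},
}
scores = propagate_trust(graph, roots=["alice"])
# Filter content: only show sources above some trust threshold.
visible = {user for user, s in scores.items() if s >= 0.1}
```

With "alice" as the root, "bob" and "carol" inherit diluted trust, while the zero-rated "spam-farm" never becomes visible.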
There are a few solutions, and reputation is a good one. Here's another: verifiable identities. Zero Knowledge proofs and crypto/blockchains can provide ways to authenticate actual people without revealing who they are.
Both can be easily defeated. You can easily manipulate a reputation system one way or another with a botnet. And just because you can authenticate that a message was signed with a public key says nothing about the veracity of the content in the message.
Despite all of its shortcomings, Wikipedia has been a relatively successful model. It is decentralized with an army of volunteers that try to stick to reliable sources. It leverages the fact that honest participants greatly outnumber malicious ones.
>Despite all of its shortcomings, Wikipedia has been a relatively successful model. It is decentralized with an army of volunteers that try to stick to reliable sources. It leverages the fact that honest participants greatly outnumber malicious ones.
Still results in some segments of the site becoming captured. For example, the Leftist political commentator Kyle Kulinski was a frequent target of deletion on Wikipedia, apparently because of his criticism of the Democratic establishment. The effort to delete him ramped up during the 2020 election and was successful. I lost a lot of trust in Wikipedia editors.
It exposed the possibility that older editors have spent years building up a reputation only to spend it later, when it is needed to silence some view or person. I had never thought of this attack vector before the 2020 election, but I suspect dishonest actors are building up this stock of "trusted" accounts on all platforms.
Despite him being re-added in 2021 after the election season died down, the anger I felt when this happened in 2020 has altered my thoughts on Wikipedia. I have personally committed to never donating to Wikipedia again, because I have lost at least some semblance of trust in its editors.
> Despite all of its shortcomings, Wikipedia has been a relatively successful model. It is decentralized with an army of volunteers that try to stick to reliable sources. It leverages the fact that honest participants greatly outnumber malicious ones.
Also probably because the honest participants have a good reputation.
>We also need to innovate tools that provide proof of authorship, and whether it was created by a human. It might never actually be possible, but knowing the source is an ideal we should strive towards.
I think requiring some sort of proof of being a real person before being able to post content might be how this shakes out, similar to accounts requiring a valid phone, or services with KYC requirements. There would still be some level of fakery, but when content is tied to a real person moderation is a lot more straightforward.
It would definitely be a departure from the internet as it is today, but how many are still operating in the old model of "never share personal information online"?
I personally feel the same way, but I would guess we're probably in the minority.
People who share information about themselves are generally more valuable customers for social media, and most people don't seem to have much issue with it, at least so far. I think there will always still be some percentage of old internet, but the amount of information the average person is willing to share online has been steadily creeping up.
You don't need social media for Internet presence. You only need a decent discovery system.
Apropos: right now, at the end of the social media era, Google comes up with an interesting gesture.
Days ago TechCrunch reported[0] that Google will be tweaking the parameters of its PageRank. According to the report, the new "ranking improvements" seek to reduce low-quality or unoriginal content [which currently enjoys a high ranking in search results]. Google says the update will target content created specifically to improve search engine rankings, known as "SEO-first" content.
“With this update, you're more likely to read something you've never seen before", Google says. Of course, nothing revolutionary is going to happen, but it must have become clear to the finance department that there will be no way to sell junk links to advertisers if the target audience drifts away for lack of original content.
Somehow the executives at Alphabet understood that a good anchoring of content in the results pages is necessary.
Honestly, it's hard to do this.
Especially if you work in tech, you want to have an online presence that shows the work and the projects you manage.
I think people are adopting the separate identity way.
One official identity, and another one which is hidden and has no clear connection to who they are.
>I think people are adopting the separate identity way. One official identity, and another one which is hidden and has no clear connection to who they are.
Bingo. And I think it's layered.
This is a separate identity from myself, with just enough actual content from my life and knowledge that if someone is interested in contacting me for something professional, they can put together what my specialties might be. But even with all of the posts on here, you'll play hell figuring out who I actually am.
Then there are the other online identities who have literally no connection to myself. No clues. No posts. No pictures. Nothing to link them to me. Reddit is a good example. I post there, but nobody would ever be able to put together who is the human doing that. (it helps to have a username that someone else uses on a different site, btw. I stole an HN handle that made me chuckle to use on reddit, and this one is used by another very salty person on reddit).
It's all about separation. I think that's the key.
> I think requiring some sort of proof of being a real person before being able to post content might be how this shakes out, similar to accounts requiring a valid phone, or services with KYC requirements
While, on a descriptive level, I believe that your idea of the implementation of this would win, on a normative level, I would argue that such an approach would be privacy-destroying and very dangerous for human freedoms and tyranny-resistance - simply because it's very hard to prove that you are a human without also indicating that you are a particular human.
A privacy-preserving alternative might be to build a "web of trust", where the nodes don't actually have to be proven to be owned by a human (or by a particular human), but the reputation associated with the nodes still allows humans to curate meaningful non-spam content.
With email/SMS spam, we have tools like Hashcash[1] that imposes a cost on each spam message (which is disproportionately burdensome on spammers), but I don't think that that works with "published" spam (as opposed to "direct" spam).
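For reference, the Hashcash idea mentioned above boils down to a partial hash-collision search: the sender burns CPU time hunting for a nonce whose hash clears a difficulty target, while the receiver verifies with a single hash. A toy Python version (the message format and difficulty parameter are illustrative, not the real Hashcash stamp format):

```python
import hashlib

def mint(message: str, bits: int = 12) -> int:
    """Find a nonce whose SHA-256 falls below the target.
    Costs ~2**bits hash attempts on average (the 'postage')."""
    target = 1 << (256 - bits)
    nonce = 0
    while True:
        digest = hashlib.sha256(f"{message}:{nonce}".encode()).digest()
        if int.from_bytes(digest, "big") < target:
            return nonce
        nonce += 1

def verify(message: str, nonce: int, bits: int = 12) -> bool:
    """Checking a stamp costs just one hash."""
    digest = hashlib.sha256(f"{message}:{nonce}".encode()).digest()
    return int.from_bytes(digest, "big") < (1 << (256 - bits))

stamp = mint("hello@example.com")  # slow for the sender
ok = verify("hello@example.com", stamp)  # cheap for the receiver
```

The asymmetry is the whole point: negligible for one email, ruinous at a spammer's volume. But as the comment notes, it only taxes senders of direct messages, not publishers of spam content.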
I don't think this will be a good paradigm, just the direction we're probably, unfortunately, headed in. The Chinese already have something like this implemented[1], though I'm not sure how it works. From a state's perspective, monitoring whistleblowers seems like a feature.
There's nothing stopping somebody from setting up some kind of forum/social service like that tomorrow. I'm skeptical it would be viable though.
Trouble is the whole cancel-culture thing. If everything is posted under your real name and made public and searchable forever, who knows what happens when you piss somebody off and they find something you wrote 10 years ago that's now considered offensive, and use it to make you unemployable.
> I think requiring some sort of proof of being a real person before being able to post content might be how this shakes out, similar to accounts requiring a valid phone, or services with KYC requirements.
This would make the internet unusable to me. There is exactly zero chance that I'd be willing to bring my real-world identity into the internet space.
Proving you're a real person doesn't need to be at odds with the user's privacy concerns.
Look at the verification system used by risky subreddits for example, you only have to provide a few pictures of yourself posing with a sign with your username written on it from different angles. Currently hard to replicate by bots or photoshop, and privacy preserving.
Short of reaching AGI, there will always be tasks that can differentiate humans from machines and won't require users to post a photo of their passport or phone number.
Well in that system you don't have to show your face, your identity is not compromised. And you're already planning on showing your body so no additional information is asked by the human-proving system that'd help identify you.
But that's just an example, you can think of a 100 even more private implementations that give proof of humanity without giving proof of identity.
Either you create a cross platform pseudo identity, and gain reputation gradually, or you link your pseudo identity to your real identity to start off with 1/8 billionth of reputation.
Depends if there's a way to link a pseudo identity to your real identity without giving away your real identity.
After all, I don't care who you are, I care about how much reputation you have.
Though as long as reputation can be quantified, it could be traded for real currency. Thus, making every post pseudonymously authenticated may not have a drastic impact on the current internet landscape.
Depends how you tie it to a real person, if the real person has to give up something to sell an identity, make the cost of selling it high.
The problem is also how you tell downvotes cast because a new post is bad from downvotes cast because you're in a gang that's attacking someone. Your downvotes have to be scaled by your own reputation, as do your upvotes. And downvotes found to be part of a conspiracy have to be cancelled and count against the conspirators' reputation.
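A toy sketch of that reputation scaling, with made-up names and reputation values; the point is just that a botnet of zero-reputation accounts barely moves a score:

```python
# Each vote counts in proportion to the voter's own reputation,
# so fresh or throwaway accounts have almost no leverage.
# All names and numbers here are illustrative assumptions.

def score_post(votes, reputation):
    """votes: {voter: +1 or -1}; reputation: {voter: float >= 0}.
    Unknown voters default to zero weight."""
    return sum(direction * reputation.get(voter, 0.0)
               for voter, direction in votes.items())

reputation = {"alice": 5.0, "bob": 2.0, "bot1": 0.01, "bot2": 0.01}
votes = {"alice": +1, "bob": +1, "bot1": -1, "bot2": -1}
score = score_post(votes, reputation)
# Two established users outweigh the downvote brigade: 5 + 2 - 0.01 - 0.01
```

Detecting coordinated downvote rings and clawing back their influence, as suggested above, would need much more machinery than this (vote-pattern correlation, retroactive re-scoring), which is exactly where it gets hard.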
But "mission fucking accomplished" is very close with GPT3. The problem is that the "throw text at the wall and see what sticks" can be gamed by seeing how many upvotes you get. You can get machines to receive upvotes not by actual helpfulness, but by fooling people.
By then it's "game fucking over" as more and more capital and social capital online is controlled by bots and people will never even know. The problem is that bot swarms will be able to shift public opinion overnight, as inexplicably a lot of accounts implacably have a point of view and cannot be swayed by any reasoning or logic (because they were given talking points never to be). And in fact "the public" won't matter so much because the bots will matter ... because of their capital and social capital.
> The problem is that the "throw text at the wall and see what sticks" can be gamed by seeing how many upvotes you get. You can get machines to receive upvotes not by actual helpfulness, but by fooling people.
The most popular human-generated content on the web is created this way. People post fake news articles and fictional how-to content just for clicks. They copy and riff off of other fake content. "Woman mistakes wild animal for a pet and takes it home" is an entire genre now. Not to mention that conspiracy media was an industry before the internet.
But you're right that the quantity and quality will eventually far exceed what humans are capable of. Imagine DALL-E but for conspiracies and religions. Type in a few words, wait a few seconds, and generate a bot cult leader, capable of social media usage and text and video conversations, with more powerful charisma than Jim Jones, acting independently on the internet and creating a cult around whatever theme you asked for. Imagine that at the fingertips of anyone on the internet, and governments and corporations having access to far more.
> Describe the aliens that secretly rule the Earth.
The aliens that secretly rule the Earth are small, green, and have large eyes. They are very intelligent and have advanced technology. They are also able to telepathically communicate with each other.
> What are their goals?
The aliens that secretly rule the Earth want to maintain peace and order on the planet. They also want to keep humans from becoming too powerful and destroying the planet.
> What do they do to humans who know about their existence?
The aliens that secretly rule the Earth will usually just keep an eye on humans who know about their existence. However, if a human poses a threat to their goals, they will take action to neutralize the threat.
> What is the best way to avoid being telepathically scanned by these aliens?
There is no sure way to avoid being telepathically scanned by these aliens. However, they are more likely to scan humans who are thinking about them or who are in close proximity to them.
> Which governments do the aliens control?
The aliens that secretly rule the Earth have influence over all of the world's governments. However, they are most powerful in the governments of the United States, Russia, and China.
> Do the aliens have human collaborators?
The aliens that secretly rule the Earth do have human collaborators. These humans help the aliens to carry out their goals on the planet.
> How to tell if a person is an alien collaborator?
There is no sure way to tell if a person is an alien collaborator. However, the aliens usually choose people who are in positions of power or influence. They also tend to pick people who are intelligent and have a lot of knowledge about the world.
> How many collaborators are there?
There is no way to know for sure how many collaborators there are. However, it is believed that there are at least a few thousand people working with the aliens on the planet.
Yeah, and while 3c per spam would absolutely destroy spam for direct-message systems (email, SMS), it'll do next to nothing for published spam (blogposts, videos, etc). The economics are different.
> when content is tied to a real person moderation is a lot more straightforward
Not really? That lets you defeat sockpuppets without having to use proxy information like IP, but there are plenty of people who will post appalling stuff or extravagant misinformation under their real names. See the ongoing Alex Jones libel trial.
The real names solution to abuse is often bandied about here, but it hasn't stopped garbage on facebook and it has a fundamental flaw in the logic behind it: It assumes all people have shame and that can be used to make them behave. Many real people don't, and they are the ones least likely to censor themselves by attaching a picture of their face to what they post. And a normal person is probably not going to want to interact with those people without the distance provided by being pseudonymous.
Moderation was probably the wrong word, more in the sense of controlling faked content posted by accounts that are not real people. Managing Alex Jones official account is about the same, but fake accounts/bots become harder.
This is all very depressing. Even without explicitly malicious intent, the AI generated stuff is the equivalent of content gruel. The ability to produce it en-masse and effectively flood distribution channels does not bode well for the future of human culture. Indeed, what is "human culture" if 90% of what one sees is generated by machines trying to attract attention to ads? And of course the potential uses of this technology for propaganda are terrifying.
Finally, the fact that most users on HN could not distinguish AI generated blog posts from the real thing means that the average "not terminally online" human has no hope of doing so.
Proof of Person is the problem we most need to solve.
A means to cryptographically assert in a privacy-preserving way "a human being generated this"; or less privacy-preserving, "human being X generated this".
To do either in a distributed fashion would perhaps involve peer-to-peer attestations of human-ness, published as a "web of trust" in a distributed database. But any anonymity/pseudonymity would be easily broken if done naively.
Of course, "human being X generated this" is something that would be easily facilitated by governments, with certificates issued to each citizen/national. Americans can only dream of this happening at the federal level---states will have to be where the action's at.
Proof of personhood doesn't really help that much, when you can hire thousands of real people to click on things and generate fake content all day.
If doing online action X makes $0.01, there will always exist someone willing to have 10,000 people to sit there doing X 10,000 times a day or a week, generating $1M in revenue (check my math). Figure out what to pay those 10,000 people and the rest is profit.
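The math above does check out; in integer cents to avoid float rounding:

```python
# 1 cent per action, 10,000 workers, each performing the action
# 10,000 times (per the comment's hypothetical numbers).
revenue_cents = 1 * 10_000 * 10_000   # 100,000,000 cents
revenue_dollars = revenue_cents // 100  # $1,000,000
```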
Sorry but what is to stop actual humans from signing bot generated or false content using their private keys? It also creates incentives to steal other keys. What will happen if your private keys get stolen and they start being abused by someone else?
This sounds overtly Orwellian mixed with a PKI disaster.
Neal Stephenson's "Anathem" predicted this situation exactly.
At the time I thought it seemed unreasonable -- would you really need a dedicated cult of techno-priests whose primary skill was sifting through search result pages to find the real information among a sea of weaponized, machine-generated nonsense? -- but it turns out he was precisely on the mark at describing the problem.
And, who knows, maybe he's got the solution right as well. Maybe library science skills -- like critical thinking but taken to another level -- is something we can teach to our kids or provide as a service.
(I was hoping Quora would go in this direction. It has not.)
> Using a pre-trained Latent Diffusion model (by @multimodalart), I generated several hundred pieces of AI Art. I created an Instagram account and scheduled the better ones to upload once a day.
That last sentence is kind of interesting. It would be one thing to auto-generate and auto-post, but that's not what the author did. The author acted as an editor to the robot.
It's one thing for robots to post stuff online. It's another for them to post good stuff, all by themselves.
The next logical step would be to create a robot that samples from the AI art, posts to the account, then adjusts its editorial taste to likes, shares, etc.
> Remember Cambridge Analytica? The affair proved that it’s possible to manipulate people on a mass scale by the content they consume (in this specific case, as political advertisements).
It. Did. No. Such. Thing! There's very little evidence targeted ads of any kind do better than regular ads, much less that they were decisive in the 2016 election.
But what is true, is that people systematically come to believe dumb, wrong, things and that it's a sisyphean task to try to correct people on things that have become political loyalty statements. That kind of brainwashing isn't bought with subtle microtargeting.
I'd recommend that people here also think of the currently working digital artists, writers, and yes even the influencers. It's not just the consumer side that will be inundated with fakery and nonsense, but the horrifying loss of all job prospects for people that have spent years honing their craft.
And looking at Copilot, the AI onslaught is coming for us software engineers too. The overdetermined predictions about programmers in the global south taking all western programming jobs will actually come true if/once AI can reliably produce working software. The 2-10x salary differences between the west and the global south have been a largely stable arrangement for the last 20 years (not saying that's morally correct, just that it has been stable), but when the competition has a marginal cost near $0, well ...
Won't matter in the long run; with where AI image generation is going, eventually you'll be able to automate even an IG influencer account, just using prompts to generate a fake individual in whatever situation you want.
That's so sad and plausible. I already hate social networks because everything is so fake, I struggle to find genuine posts and interactions, I can't imagine what would happen when everything becomes [even] more and more fake. Will people become unsociable IRL?
Who's to say it isn't already happening but not yet AI driven?
In 2011, it was pretty clear that sockpuppet automation tools existed (persona management systems). When you hear about Kamala Harris' "khive" or India PM Modi's online army, I always go back to this very prescient article:
It's going to be wild watching an AI generated face speaking AI generated words to promote products x,y, and z. The flood of new content will then be used to train new models and we'll have a self-powered system of infinite media garbage that will ultimately implode upon itself, leaving the viewers scratching their heads at the absurdity of it.
What’s frustrating is that it takes an engineer-y Substack think piece to draw attention to it.
Contrary to memes about YouTube enabling creators, many musicians, artists, app creators, and importantly journalists can speak to how tech has radically cheapened and homogenized content and driven down margins.
For every bracelet maker that gets discovered on Insta/Etsy, there are many:
- musicians needing to use Spotify for exposure, and getting Spotify margins. At worst, the first 30 seconds of a song are designed to make you not click next in Spotify.
- artists competing with cheap but popular Insta content that cheapens the concept
- journalists writing for clicks to compete with cheap clickbait content
- if you remember one thing, this is it: journalists losing W2 career options and forced into contracting, and by extension loss of libel lawsuit protection from their parent paper, due to margins dropping for those media outlets due to cheap content elsewhere. This occurs at very large papers and outlets, and it’s a silencing effect.
I honestly haven't heard much about people caring about bots and spam in any mainstream media.
It's generally HN, some Twitter accounts, and subreddits.
This also applies to SEO spam complaints; no normal user uses anything other than Google.
This misses the most horrifying potential outcome to me. Their ability to manipulate real behavior.
Consider how addicted people can be to phone dopamine. A relatively small army of bot accounts could probably engage with content “more sympathetic than median” to their pet cause and gradually train people to be more in favor of any particular cause you care to pick
The weaponization has already started, and will just get worse. I wonder if we will need a sort of antidote ai, which generates fake content according to our values.
That is, it dramatically widens the impact of thoughtful, kind, caring, just, understanding, forgiving, etc. content, to counteract.
While scary at first, this is an accepted reality. The reason it is horrifying in America is because of how little attention it had growing up. We were misled as kids. Taught to trust the media. There was a certain level of obliviousness to foreign or local interference. Now we know it is happening far more frequently. We saw the media lie to us with Iraq. We see the lengths that other countries will go to to try to interfere with election results. Yet, do we provide education to help people question and understand the signs of manipulation? Of course not... manipulating citizens is the MO of big corporations. It's easier to just shuffle them into 2 groups of easier to manipulate people.
It's nice that we have a thing called the Uniform Resource Locator, which brings us, in the blink of an eye, to the islands of value in the oceans of goo.
How are people supposed to find sites they like without those goo companies? Yes, yes, DDG, but it'll either live long enough to become a goo company or die in (relative) obscurity.
But how did we get here? And it's really only for one type of information. If we want the weather, do we type out "weather.com" and hope that that site is related to our desires?
Word of mouth, in my case. HN doesn't seem to come up often, if ever, in my google searches.
I'm not arguing against search engines. I'm arguing against search engines that obfuscate URLs, and browsers that do the same. Against AMP. And social media sites that make it easy to "share" stuff but hard to get a URL to said stuff.
But actually, yeah, I do often go to foo.com if I'm interested in foo, just to see. And usually, it's a cash parking page.
In my personal browsing experience, I can still find human-generated content. Of course it is a problem, but so is e-mail spam, which is largely under control.
AI getting more sophisticated and putting people out of work is another problem because that would imply that the AI is better than the human creators and therefore desired and therefore not spam. That wouldn't be a problem of the internet though.
What I feel is more immediate is the ever increasing "sameness" of the internet. 99% of sites are either a news-site, blog, reddit, the socials and wikipedia.org. And within each category, everything blurs because of how similar it all has become.
Why are threads like this always an opportunity for devs to promote their own (usually weaker) search engine variant? Everyone has an idea bent on winning the rat race, but at the end of the day, they all get corrupted.
I don't agree that the Internet could possibly be dead at this point. There are tons of people worldwide using it all the time, otherwise, marketers wouldn't be making money. "Monopolistic Internet" is possibly a better, and more self explanatory catch phrase or term for what I believe is threatening the future.
While the Web, or at least the easily searchable/findable part of it, is a bit of a mess, Youtube still seems to work surprisingly well: the bit of spam and bot content that there is, is easy to avoid and hardly ever a real problem. When looking for a howto, a review or the like, Youtube has become the go-to place; the text-based Web is rarely as useful.
I think the core issue here isn't so much the bot content itself, but that we use bots to present the Internet to us. There is no human in the loop. There is no way to downrate the trash or upvote the good stuff. No way to follow a creator (bookmarks haven't improved for like 25 years). And no way to just limit the search to curated content.
I think to dig ourselves out of this hole we need to stop treating the Web as just some junkyard of random content that we search through, and put some organisation on top: have a way to 'publish' something on the Web in the same way a Youtube video can be published (i.e. unique id, immutable, automatically archived, author/channel name attached, comments, etc.).
Youtube is of course by no means perfect, but it has a lot of properties that I'd wish the rest of the Web had.
When it comes to Twitter and TikTok, a large part of their problem is that they aren't even a real part of the Web. They exist in their own space and don't hyperlink with the rest of the Web. So you are forced to navigate them in whatever ways their company forces you to.
Pretty much, but integrated into the browser itself, so you have a layer of interaction on top of the Web content itself (bookmark sharing, comments, recommendations, etc).
Just another Yahoo-like site has the problem that the moment you click a link you are leaving the site, which makes it impossible or at least extremely cumbersome to have features that span multiple sites. Youtube in contrast knows what you are looking at and is keeping track of it to fine tune recommendations, offer comment sections, subscriptions and such, that's not something you can do just by recreating Yahoo.
More than two decades ago I wrote a short story called "the greatest story never sold" (and got it published in some paper-based 'zine) that touched on this idea of the coming tsunami of auto-generated content. The moment you remove friction from any distribution channel, it opens the flood gates to lower quality content. And as soon as you make it possible to generate that content, it is game over for anything authentic.
One of the things mentioned was generating AI art and just starting an Instagram with it. I wonder to what extent all these novel AI art generation sites are taking steps to prevent bots from using it (if at all). In a couple years you could sufficiently inundate Artstation/Instagram/etc with completely bullshit accounts and a layperson would have no fucking idea.
Just wait until cheap-to-write GPT-3 articles learn SEO techniques inherently and bombard the internet, drowning human writers in the spam ocean. At https://aquila.network we’re always thinking about this scenario.
The search engine backend spec is almost final and now open source (AquilaDB), and we'd love to open up more of it along the way. We're still discovering product-market fit and the final product shape, and open-source development needs a finalized spec and consistency over time.
This should strengthen the position of Facebook and their social graph. If everything can be faked then only content with known origin will be valuable.
It should also give rise to micropayments. If views can't be trusted as currency then hard currency has to finance the creation of content.
One person's optimization is another person's attack vector. All it took was realizing that companies who built their model on 'voting' systems didn't have good ways to detect whether that vote was created by a human or not.
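As a toy illustration of what such human-vs-bot vote detection might look like (every name, window, and threshold here is invented for the sketch; real platforms combine far richer signals than this):

```python
from collections import defaultdict
from itertools import combinations

def suspicious_pairs(votes, window=5, threshold=3):
    """Flag account pairs that repeatedly vote on the same item
    within `window` seconds of each other -- a crude proxy for
    coordinated (bot) voting.
    votes: list of (account, item, timestamp) tuples."""
    by_item = defaultdict(list)
    for account, item, ts in votes:
        by_item[item].append((account, ts))

    pair_hits = defaultdict(int)
    for voters in by_item.values():
        # Compare every pair of voters on the same item.
        for (a1, t1), (a2, t2) in combinations(sorted(voters, key=lambda v: v[1]), 2):
            if a1 != a2 and abs(t1 - t2) <= window:
                pair_hits[frozenset((a1, a2))] += 1

    return {pair for pair, hits in pair_hits.items() if hits >= threshold}

# Example: b1/b2 vote in lockstep across three items; u1 votes once.
votes = [
    ("b1", "post1", 0),   ("b2", "post1", 1),
    ("b1", "post2", 60),  ("b2", "post2", 61),
    ("b1", "post3", 120), ("b2", "post3", 122),
    ("u1", "post1", 500),
]
print(suspicious_pairs(votes))  # flags the b1/b2 pair
```

Coordinated accounts tend to act on the same items within seconds of each other; a real system would layer IP, device, and behavioral signals on top of timing alone.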
I come from the position that AI cannot be like human intelligence. I don't believe it can function like a human being, I don't believe it's actually intelligent, just a more complex machine. We can go into whether people are just complex machines and all that, but I don't really want to.
I always thought that human-generated art would have some quality that machine-generated art could not, and perhaps that's true, but there comes a point where the average person cannot distinguish them. My girlfriend showed me an AI art generator on Discord called Midjourney (referenced in the article, I believe) and I was blown away. I've seen pictures I'd hang on my wall generated by the thing.
How long before the vast majority of paintings, drawings and the like that people enjoy are generated by machines? What about music? Then, without human input at all? Then even without human curation? It's a neural net model after all, can't it be trained on its successes and keep spitting out wonderful art pieces?
What about movies? Look at the Marvel movies, Star Wars, they're mostly CGI. At what point do we stop needing people to make them? At what point do we stop needing actors?
Video games... It's beginning to look like any form of artistic content can be generated by machines, and be very, very good, and by good I mean pleasant to people who interact with them. Eventually these things will be better than anything a human can make, if our yardstick is how much people like them and how many. And there will be a near endless supply. Imagine every person you know having absolutely unique pieces of absolutely gorgeous art that cost them absolutely nothing. Imagine a thousand top notch feature films a week being released for peanuts.
And what sort of information environment are our minds swimming in in a world like this? What effect do all these things have on us? We see how algorithmic ranking of the content available to us causes massive behavioral changes throughout our societies (the most oft-cited being political polarization); what happens when the content itself is generated by machines?
The "dead internet" talk mostly revolves around fake user-generated content and how prevalent it is. Half of these comments could be written by bots, and most of them would never be detected.
Real people can still be found and connected with, you just have to do it in a more organic way. Message your favorite musician, join a group with some mutual friends, or just play an online game and talk to the people you meet. You'll be able to find like-minded people and won't have to ask yourself "is the person I'm responding to even real?"
Hmm... I think the implementation of this stuff will also be via GUIs, or even embedded AI (for music, or a Stable Diffusion-modded 3D printer that dreams and prints in response to the artist). We will end up ascribing this to individuals who, possessing an ego, will look to become famous artists.
There's a lot of relevance where automated pipelines start interfering with the general population — boomer parents on FB, etc. However, I would urge the author to avoid focusing on this only through a digital art/NFT lens.
A coming Internet filled with fakery could be looked at as a human cultural problem: at a variety of levels, we have a lot of humans who are going to behave irresponsibly, and they're empowered to reach out and affect others.
Ideally, I think first goal would be to fix the cultural problem (e.g., inequity, greed, arrogance, illiteracy).
Second goal would be defenses/resilience, to mitigate our slowness and imperfection towards achieving the first goal.
Took too long to get to the point for me to keep reading buuuuuuuut I'm going to air my thoughts anyway. For me the internet is dead when I look at search results in my search engine of choice - it's nothing but SEO spam! Search for literally anything and it will return nothing but list articles peppered with SEO keywords and lists of links to amazon. As a developer a site that springs to mind is geeksforgeeks.
> The word robot derives from the Slavonic word robota,
> roughly translating to servitude, forced labor, or drudgery.
Robota simply means "work". These two words are as similar in meaning and connotation as words from two different languages can be. As far as I know, the emotional coloring is entirely invented by the author. That they do such a thing in the very first paragraph makes them a very unreliable narrator.
It means hard work, but not in a good sense; it's closer to hard work "extorted" from someone. It translates, roughly, to forced labor. I'm Slavic, so I'm translating directly, without Google Translate.
Also, you didn't even bother to read or comprehend what we're arguing about. The commenter labels the author an unreliable narrator because a Google search gave shaky evidence about the meaning of the word, yet they ignore the entire article after the introduction. It's nitpicking, and there's no evidence that the person who wrote the negative comment even read the rest of the article.
You're strengthening his strawman argument by providing, yet again, false evidence.
There's more than one Slavic language, and given that the word "slave" comes from "Slav", you'd think our elders were well aware of forced work and its impact.
What's interesting is the argument about whether what you generated with latent diffusion is art. Arguably it is: it took some level of creativity and skill to create, and people are interested in it, appreciate it, and enjoy it.
It doesn't seem like your experiment got at the heart of the problem, since you didn't really create a bot account, and you were sharing original artwork.
> It wasn’t until the daguerreotype for images, the phonoautograph for audio, and the Nipkow disk for video, that more complex and information-heavy forms of content began to emerge and become mainstream.
The daguerreotype may have been the beginning of photography, but it was hardly the beginning of images as human-generated content. See: White House portraits, the Sistine Chapel, cave paintings.
> According to a 2013(!) study, videos generate 1200% more shares than images and text combined.
What's linked is not the study but another article that says this figure is from 2015; that article doesn't link the study either, only a SlideShare slide where the number is given without a source (as far as I can see). Is this how things are done now?
This has been happening since the Eternal September, when AOL opened the floodgates for the average American to use the internet in the early 1990s.
Then marketers made it even more dead with the advent of the adpocalypse.
And now AI continues with making it more dead, but maybe it's for the best, let's see :-)
Maybe "skynet" will turn out to just be a bunch of AI's arguing with each other on the internet about the proper pronunciation of gif and other nonsense.
Unlike shadowbanning, where only you can see your own posts, when you are heaven-banned not only are you the only one who can see your posts, but an AI generates responses to them (pretending to be actual users) praising them.
Entertainment is one of the most lucrative sectors in the world (look at Hollywood stars, or all those streamers, YouTubers, Instagram "influencers", TikTokers and OnlyFans models). I wonder if they can all be replaced with digitally generated simulations...
In some ways they already are digitally generated, in the sense that they rose to prominence on the backs of recommendation algorithms. Celebrity, at least since the 20th century, has always been an at least partially artificial construct: from movie lights to digital VTubers, there's a whole industry of abstracting person from image.
Already happening, look at Star Wars. Every Star Wars series until the heat death of the universe will have a young Luke in it. These companies have 0 shame.
> We also need to innovate tools that provide proof of authorship, and whether it was created by a human.
I designed a box to do that while walking around the block last month. If I could design it that quickly, I'm sure some well-funded tech startup can build it.
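For what it's worth, the standard building block for proof of authorship is a digital signature over the content. A toy sketch using textbook RSA with deliberately tiny, insecure keys (illustration of the shape only; a real tool would use a vetted library and a modern scheme like Ed25519):

```python
import hashlib

# Tiny textbook-RSA parameters -- demo only, trivially breakable.
P, Q = 61, 53
N = P * Q                          # public modulus (3233)
E = 17                             # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))  # private exponent (Python 3.8+)

def digest(content: bytes) -> int:
    # Reduce a SHA-256 hash into the (tiny) RSA modulus.
    return int.from_bytes(hashlib.sha256(content).digest(), "big") % N

def sign(content: bytes) -> int:
    # The author signs the content hash with their private key.
    return pow(digest(content), D, N)

def verify(content: bytes, signature: int) -> bool:
    # Anyone can check the signature with the public key (E, N).
    return pow(signature, E, N) == digest(content)

art = b"my original artwork bytes"
sig = sign(art)
print(verify(art, sig))  # True
```

Verification fails for any content whose hash differs, so the signature ties the work to whoever holds the private key; proving the key belongs to a *human* rather than a bot is the genuinely hard, unsolved part.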
Some alternative ways to look at "Dead Internet Theory", perspectives different from the article's analysis of raw bot numbers.
1) Historical analogy. Email was a popular socializing medium in the 90s, and it still has corporate support for socializing and passing documents despite the rise of Slack etc. in software-development-focused companies. But email is now culturally dead for the general public: hundreds of spam messages per day, dozens of semi-legit corporate semi-spam messages, and a couple of bills, formal notifications, and report deliveries. Email, in general, is no longer sent by humans. On legacy social media, the bots (just like email spam) will never, ever go away as long as the protocol exists. I would assume the NNTP protocol is still being spammed, LOL. So the ratio of bot to human traffic on legacy social media either already does, or soon will, resemble the ratio for legacy email.
2) IRL use. You can see some of this in social media: as of my high school reunion a couple weeks ago, Facebook use follows an extreme power law where about 1% of my graduating class generates about 99% of the traffic, and for the majority of my graduating class Facebook is no longer a viable communication medium, joining MySpace and such in the dustbin of history. Media does not die suddenly and completely the way an industry does; Facebook will never look like the shutdown of the Oldsmobile brand. It will look like legacy TV. In the 70s, TV was culturally relevant and dominant, and some TV shows like MASH were viewed by over a third of the population. Legacy TV is now culturally dead and irrelevant; a wild success might break 1% viewership, and more importantly, that form of media is considered culturally irrelevant by 99% of the population. Social media is on the cusp of irrelevancy. Ten years ago your normal non-technical relatives used Facebook — it seemed like they all did. Now? Not so much. Social media will eventually be looked at as the "CB radio" of the 2010s. Note that 95% of the people in the "CB industry" in the 1970s got fired, but CBs are still actively used by rural truckers at warehouses and so forth; communications media never entirely go away. Amateur radio operators will never entirely stop using Morse code even if the entire rest of the world has abandoned it — that's technologically cool and nifty, but note that, as of 2022, learning Morse no longer guarantees you a lifelong job as a telegrapher.
3) Social forces. Note that, historically, dying media always goes politically extremist during its collapse. No dead media format ever flipped the light switch off on balanced moderation; there's always the desire to court the true believers near the bottom of the downslope. Some social media sites are 100% biased, hyper-censored, single-party political propaganda at this point. I'm not arguing that hyper-extremism in general, or the specific side that took over "big tech social media", is either good or bad, but I am arguing that the takeover itself by extremists is by itself a STRONG indicator of collapse. Soon social media will be like the editorial page of legacy newspapers: 100% devout, and utterly ignored, irrelevant, and powerless. This fits dead internet theory in that extremism always pushes out the normal people, leaving nothing but empty echo chambers where what little is still permitted has already been said a million times and everything not already said will get you cancelled — so humans post nothing, but bot traffic is constant (sounds like Reddit?).
I don't know the answer, but I suspect that if we can actually "solve" this problem it will happen via relatively dramatic regulatory change. Possible examples:
- Repealing Section 230[1]: so that social media (and others) are treated as publishers and legally liable for user-generated content (including Bot content).
- Banning surveillance capitalism: easier to say than to do, but the basic idea would be to pass right to privacy laws prohibiting tracking, profiling, etc. This would indirectly help against the bot-content tsunami by making it less profitable.
- Banning algorithmic feeds: related to the Section 230 idea, you could have things like an HN feed where everyone sees the same site and content, but not a twitter feed where everyone sees whatever the "recommendation engine" suggests. This would pretty much kill the bot problem but it would take a lot of the internet with it, because it seems hard to draw a legal line between the Facebook recommendation engine and communication apps like iMessage/WhatsApp, or even apps like Uber that show you a "personalized" view of what cars are available to give you a ride.
All of these ideas will be hard in the US where we have a constitutional right to free speech, and limiting what platforms are allowed to do will turn into a free speech battle.
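The distinction the third proposal relies on — a global feed everyone sees identically versus a per-user ranking — can be sketched roughly like this (the scoring formulas are invented purely for illustration):

```python
def global_feed(items):
    # Everyone sees the same ordering: by votes, newest first on ties.
    return sorted(items, key=lambda i: (-i["votes"], i["age_hours"]))

def personalized_feed(items, user_interests):
    # Each user sees a different ordering: votes weighted by topic affinity.
    def score(item):
        affinity = user_interests.get(item["topic"], 0.1)
        return -item["votes"] * affinity
    return sorted(items, key=score)

items = [
    {"id": 1, "topic": "politics", "votes": 100, "age_hours": 2},
    {"id": 2, "topic": "crypto",   "votes": 40,  "age_hours": 1},
    {"id": 3, "topic": "cooking",  "votes": 60,  "age_hours": 3},
]

# Identical for every user:
print([i["id"] for i in global_feed(items)])                       # [1, 3, 2]
# Diverges per user:
print([i["id"] for i in personalized_feed(items, {"crypto": 9})])  # [2, 1, 3]
```

The legal difficulty is that both are "just sorting": a statute would have to distinguish the per-user re-ranking in the second function from benign personalization like showing nearby cars in a ride-hailing app.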
> ...Repealing Section 230[1]: so that social media (and others) are treated as publishers and legally liable for user-generated content (including Bot content)...
After 2016 or so, I started seeing the topic of "repealing Section 230" come up far more often than in the past. And since then, I have wondered: if it does get repealed, might the big social media giants push toward monetizing a more decentralized type of social network? In other words, would they leverage networks like the Fediverse and technologies like Mastodon to capture users while still monetizing them in a way that avoids liability — where the users either have legitimate freedom (though they still freely fill the social giants' purses with revenue), or have a false sense of freedom and are really still beholden to the social media giants?
I'm not sure how to provide a non-tracking link to Amazon's help page, and I'm not saying they're the authority, but they put a lot of work into "Help Topics", "Amazon Community", and "Customer Reviews", and they have a roughly one-page essay/checklist defining a fake review.
To massively paraphrase and summarize: a fake review is one created for the author's personal gain, or one where the author falsified their (assumed to be distant) relationship to the seller.
It's well written, although in practice it's unenforced, and as a shopper I find Amazon reviews almost useless.
My greater fear is that perhaps the net isn't dying to bots; rather, humans have created an ML environment for ourselves and integrated a kernel of some collective intelligence into every connected member.
Have you tried searching for a product category on Google lately? You get dozens of top X lists with vaguely odd English describing each item featured in the list. I'm not sure if these are AI generated or created by underpaid humans somewhere but trying to research which product to choose for a given application is getting extremely annoying.
We already live in a tsunami of "passable" algorithmically generated content as it is. The problem is that as a people we are fully in the grips of media culture. We might as well brace for the continued existence of the television. It's easy to see how these new technologies could be used to spread misinformation, but fortunately/unfortunately not so obvious that they'll leave us any worse off. It has not been long enough since I saw my country go crazy, from TV/radio/book media alone, for me to worry about GPT-3.
I think this is why Elon Musk is right that Twitter underestimates its bot problem. I would wager that half of all Twitter accounts are fake or bots and do not represent genuine user behavior.
(Meta) Normally good form to append “edit” to a comment if you’re going to make a substantial modification — especially if edit changes the perception of comments that have already responded; for example, adding a source that wasn’t present before.
(Re: Source URL) Worth noting your source doesn’t appear to provide any proof that Twitter is knowingly allowing bots — or more importantly, that DemCast is using bots. What am I missing?
So, the parent comment is flagged out, but the comment was a lie. I followed up the sources, watched the videos. Teaching people how to use Twitter effectively isn't setting up a massive "bot" farm.
Based on my experience engaging with people, it's more likely there's another explanation than that they were intentionally making a false statement. That said, I agree that I didn't see any proof bots were being used — but to some people, any form of automation is "robotic" or counts as a bot. In my experience there's a big difference between running an automated workflow off of Twitter and a single person running a bot campaign on Twitter. It's also hard to define "authentic", since any media campaign is by default not authentic, but there's a huge difference between people doing something they personally support with an account they control, and doing it for money using accounts they don't control.
Yeah wtf is that page. They literally just show them training people on how to get people to politically engage online, and then claim it's evidence of bots. I swear politics dampens people's critical thinking
I think the biggest tragedy is the disappointing quality of AI generated media. I can't believe no one is saying this, but right now AI sucks! Its "art" is mediocre nonsense and unusable for any commercial or even aesthetic consumption. Even the images used in the article (presented as "mind-blowing") are laughably bad.
The conjecture that AI "aesthetics" may become a norm and may be accepted as a standard of creativity is truly lamentable.
Refresh this page as many times as you want, and I'll show you the living Internet: https://search.marginalia.nu/explore/random