Headlines are not one monolithic thing, any more than publishers are.
There are giant worldwide news organizations, there are personal blog publishers, and everything in between.
They all have different constraints and different motivations for their headlines.
Even if you look at just one type of publisher -- a traditional print newspaper with an online presence -- you are likely to find that they write multiple headlines for each story, depending on where that headline will be seen.
For instance, they might write one headline for the print version of the story that's constrained by page layout requirements.
Then they might write another headline that gets displayed on their homepage. It has to be short, punchy, and eye-catching.
Then they might write another longer, more complete, SEO-friendly headline that gets displayed when you actually click on the link to the story.
There might also be an alternate headline that gets displayed as the HTML/meta title of the page, which can be useful for when the link is shared via social media.
And all of those have their specific purposes and limitations.
Another wrinkle: In the case of news organizations, it's highly likely that the person who writes the headline is not the person who writes the article, whereas typically with personal blogs the same person writes both the story text and the title.
So if inaccurate or sensationalistic headlines are one problem, then another problem is treating headlines as if they are all the same. They're not.
Incentives matter, and the underlying problem that gives us clickbait headlines (and fake news, and the rest of our current journalistic ills) is that sites are primarily incentivized to maximize the number of people who show up on the site (so that they load ads and/or can be surveilled) -- hence the chase for "clicks at any cost".
There are two ways to fix this:
1. Somehow change adtech's incentives from "I get paid in proportion to how many clicks I get" to "I get paid in proportion to some other combination of metrics that's a better proxy for quality content (and quality engagement) than 'someone clicked on this and my ads loaded.'"
2. Break the link between advertising and some worthwhile subset of "content that we want to exist in the world" by coming up with a scheme to entice readers to fund the content directly.
I think there's a whole universe of untried startup ideas in both areas. I've been noodling around with an idea for #2 as a side project, and maybe at some point this spring I'll attempt a "Show HN".
Anyway, my ultimate point is that attacking this problem at the level of the content itself -- verifying headlines or rating the "fakeness" of news, etc. -- is the wrong approach, and I think most of the smart people who toss these ideas out know on some level that the proposed cure may end up being worse than the disease.
The fundamental problem is the busted incentive structure. You have to find ways to incentivize the creation of quality content by either rethinking the relationship between advertising and users, or finding a way to get users to pay. There is no third option that's market-based and sustainable.
Several years ago I stopped using ad networks entirely and started selling and hosting my own ads.
This decision has helped keep the ad quality high, keep more revenue in my pocket, and allows me to have a relationship with the advertiser that isn't solely based on clicks and data. Edit: And this is important because I can have a conversation with them to make sure the traffic they receive is the type of traffic they want and not just immediate bounces.
Regarding headlines, my main content consists of office design image tours and my headlines are always basically "Company Name's Offices - Location". I used to try to post catchy headlines, but after doing it for several years it lost its appeal and ultimately didn't match the type of website I wanted to create.
To point one, a former co-worker of mine cooked up a great algorithm that measured how much time a user spent on your site and how far down they scrolled and then presented that as aggregated data on the different levels of engagement users displayed.
It was brilliant work, and exposed deep flaws in many customers' traffic data. But no one cared all that much and it became a buried tool in a much bigger startup idea.
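For anyone curious what that kind of measurement might look like, here is a minimal sketch. It is not the co-worker's actual algorithm, which isn't described in any detail here. The `Session` fields, the three level names, and every threshold are assumptions picked for illustration.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Session:
    seconds_on_page: float
    max_scroll_fraction: float  # 0.0 = never scrolled, 1.0 = reached the bottom


def engagement_level(session, bounce_seconds=5.0, read_seconds=30.0, read_scroll=0.5):
    """Classify one session. All thresholds are hypothetical defaults."""
    if session.seconds_on_page < bounce_seconds:
        return "bounced"
    if (session.seconds_on_page >= read_seconds
            and session.max_scroll_fraction >= read_scroll):
        return "engaged"
    return "skimmed"


def summarize(sessions):
    """Return the share of sessions that fall into each engagement level."""
    if not sessions:
        return {}
    counts = Counter(engagement_level(s) for s in sessions)
    return {level: counts[level] / len(sessions)
            for level in ("bounced", "skimmed", "engaged")}


sessions = [Session(2, 0.1), Session(45, 0.9), Session(12, 0.3), Session(1, 0.0)]
print(summarize(sessions))
```

Even something this crude makes the share of immediate bounces visible, which is presumably the uncomfortable part.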
"[it] exposed deep flaws in many customers' traffic data" -- this probably is why nobody cared that much and it got buried. See my longish reply below. The incentives are such that the only metrics publishers and ad agencies have any love for are the ones that make traffic look more valuable.
I'm sure a truly great way of measuring a user session's real value to the advertiser has been invented and discarded hundreds of thousands of times over the past two decades -- invented because it seems needed, and discarded because it actually worked and holy crap most traffic is garbage.
>a truly great way of measuring a user session's real value to the advertiser has been invented and discarded hundreds of thousands of times over the past two decades --
Most advertisers do track and measure actual revenue to them. I think your mistake is assuming junk traffic and bounced eyeballs don't convert or contribute to sales. They do. If you have a decent product with decent margins, each sale can pay for thousands of useless eyeballs.
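The arithmetic behind that last sentence can be sketched as follows. The dollar figures are made-up illustrations, not real campaign data, and the function names are hypothetical.

```python
def visits_covered_by_one_sale(margin_per_sale, cost_per_visit):
    """How many paid visits a single sale's margin can fund."""
    if cost_per_visit <= 0:
        raise ValueError("cost_per_visit must be positive")
    return margin_per_sale / cost_per_visit


def break_even_conversion_rate(margin_per_sale, cost_per_visit):
    """Fraction of visits that must convert for the campaign to break even."""
    if margin_per_sale <= 0:
        raise ValueError("margin_per_sale must be positive")
    return cost_per_visit / margin_per_sale


# Hypothetical numbers: $50 margin per sale, one cent per junk visit.
print(visits_covered_by_one_sale(50.0, 0.01))
print(break_even_conversion_rate(50.0, 0.01))
```

Under those assumed numbers one sale covers about 5,000 visits, so the campaign breaks even if roughly one visit in 5,000 converts.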
The "how many clicks" metric is a white lie anyway. Just because someone clicks through to a page does not mean they read it. Ad impressions should really take into account how long a reader spends on a page. I would bet click bait has a far higher bounce rate than accurate headlines.
"I would bet click bait has a far higher bounce rate than accurate headlines" -- It does, but it can still be monetized with junk ads that do pay a tiny bit. So you go all-out for volume, and you end up with the situation we're in now.
If I'm reading your number one correctly it sounds like a paradox. There has to be a measurement of traffic - that genie escaped the bottle over 20 years ago (unlike radio/TV).
That's why I said "some combination of metrics" as an alternative to "clicks". In other words, yes, traffic will always matter, but there has to be better discrimination among kinds of traffic.
Traffic that's worthless to advertisers -- i.e. I clicked to your page and bounced immediately without actually engaging or really "seeing" any of the ads that loaded -- should be priced at $0.
Traffic that's worthwhile to advertisers -- i.e. I clicked to your page and stayed there for a bit, engrossed in the content and noticing the ads and getting a positive feeling from both -- should be priced somewhere north of $0.
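As a rough sketch of what pricing by kind of traffic could mean in practice: the function below charges nothing for a bounce and scales the price with time on page up to a cap. The base rate, the thresholds, and the linear weighting are all assumptions for illustration, not how any real ad platform bills.

```python
def session_price(seconds_on_page, ad_was_viewable,
                  base_cpm=2.0, bounce_seconds=5.0, full_value_seconds=60.0):
    """Price one ad impression in dollars.

    base_cpm is a hypothetical price per 1,000 fully engaged impressions.
    Bounces and impressions where the ad was never on screen cost $0.
    """
    if not ad_was_viewable or seconds_on_page < bounce_seconds:
        return 0.0
    # Linear weight in time on page, capped at 1.0 (an arbitrary choice).
    weight = min(seconds_on_page, full_value_seconds) / full_value_seconds
    return base_cpm / 1000.0 * weight


print(session_price(2, True))    # immediate bounce
print(session_price(30, True))   # partial engagement
print(session_price(120, True))  # capped at full value
```

The hard part is not the formula but getting anyone with a stake in the numbers to agree to it.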
I've been in this business on the content and publishing side for almost 20 years now, and I've seen very many attempts to distinguish worthless traffic from worthwhile traffic. As with the fake news/headlines issue, the above problem persists and is not amenable to a quick technical fix because it's ultimately a problem of incentives: publishers and ad agencies are strongly incentivized to maximize the percentage of a publisher's traffic that (by hook or by crook) can be classified as worthwhile to the advertiser, whether it really is or not.
The ad agency supposedly represents the advertiser in the negotiation with the publisher over campaign pricing and value, but it is nonetheless incentivized to collude with the publisher because the agency gets paid based on the size of the ad spend. So you can see how the advertiser is left with no one looking out for his/her interests and is at the mercy of an industry that pitches a "metric of the month" as a proxy for value.
You can also see why performance advertising platforms that offer advertisers tons of real-world data and transparency, e.g. FB and Google, are eating up the entire ad industry. It's just a better deal for advertisers than the publisher + agency model. At least, we thought it was, but now that FB has admitted to "accidentally" screwing advertisers with bad metrics, who knows...
Anyway, yeah, traffic will always matter. My only point is that a world where revenue scales with any and all forms of traffic, no matter how worthless to the advertiser (and the user, in many cases), is a world where people will pursue traffic by any means.
Edit: Ultimately, though, if you're optimizing for "user got some sort of potentially actionable informational value from this" instead of "user got momentary emotional charge (i.e. the classic 'surprise and delight') and the advertiser got positive/useful vibes in the process", the answer has always been user-funded content. This was true in the era of print/TV/radio, and it's true today. Every ad-supported model will always optimize for emotional manipulation over any other form of utility to the consumer. For some types of content (e.g. fiction and pure entertainment) this works out great, and for others (e.g. news, reviews, investigative reporting) it will always eventually lead to disaster.
Validating headlines may not be as good a model as it seems.
First, what problem are you trying to solve? In this case, it's "How can I find good articles even with bad headlines?" So while the approach addresses headlines, the interest is in the content. So I'm not sure the proposed solution solves the perceived problem.
Second, what are the current solutions/workarounds to the problem? In my case, at least, the solution is blanket rejection of certain sites. I assume certain sites are so full of clickbait nonsense and/or partisan propaganda that I won't read them at all. This probably works better than some software that will consistently rate The Economist as good and anything from Infowars as nonsense (or worse, think the nonsense headline and the nonsense content are simpatico, so it's fine).
Third, what is the root of the problem? And the root is largely that people like their nonsense. People consistently read bad headlines and bad stories, often preferring them over respectable mainstream news.
And finally, how do you implement this? You clearly don't want something that can be gamed by crowdsourced campaigns, or it will be gamed. So you're either somehow relying on deep learning automation, or you're relying on human editorial effort. The former is unreliable, the latter is expensive, and itself prone to both bias and rejection (consider how many people consider Snopes to be untrustworthy).
I dunno. Maybe there's a great business or social idea here. But it's going to take some deeper thinking.
This is a very good idea: "someone, or some company, or some open source community ought to build software that parses headlines and the stories that follow and rate them for how well the headline represents the article."
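To make the idea concrete, here is about the crudest possible version: score a headline by the share of its content words that also show up in the article body. This is only a lexical-overlap sketch with a hypothetical stopword list, not a real solution. It cannot tell an accurate headline from a sensational one that happens to reuse the article's vocabulary, and it is trivial to game.

```python
import re

# Small, hypothetical stopword list; a real tool would need a better one.
STOPWORDS = frozenset({
    "a", "an", "the", "of", "to", "in", "on", "for", "and", "or", "is",
    "are", "was", "were", "with", "by", "at", "from", "that", "this", "it", "as",
})


def content_words(text):
    """Lowercased set of words in text, minus stopwords."""
    return {w for w in re.findall(r"[a-z0-9']+", text.lower())
            if w not in STOPWORDS}


def headline_coverage(headline, article):
    """Fraction of the headline's content words that appear in the article."""
    head = content_words(headline)
    if not head:
        return 0.0
    return len(head & content_words(article)) / len(head)


article = "The startup announced that it raises funding of 10 million dollars."
print(headline_coverage("Startup raises $10 million in funding", article))
print(headline_coverage("Aliens destroy startup", article))
```

Anything beyond this (paraphrase, tone, exaggeration) needs real language understanding, which is where it gets hard.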
I think the cure would be worse than the disease in this case. Fake news is a fake problem, or at least not a new one. We've had the National Enquirer and similar publications in every supermarket checkout for decades.
What I would really like to see is a way that the original source of a story is promoted or easier to find. Too often I see a headline online and click it only to see that the entire story is "according to site x..." Then I go to site X and see that its story is "according to site y..." and so on.
While I know that some subsequent stories can do original reporting, too often sites with better SEO just republish stories without adding much and, whether intentionally or not, often distort some part of the actual story.
I also have seen hundreds of stories written about me,
USV, and our portfolio companies that have sensational
and often inaccurate headlines followed by stories that
are essentially correct and well reported. It drives me
nuts but I don’t often do much about it.
Subjectively, this is not what I see. Instead I find that junky headlines go with junky articles. That would still be an interesting thing to try to objectively quantify, but different from what the author has observed.
Are there any free APIs/RSS feeds for breaking news stories?
Reuters and other publishers have RSS feeds; however, they are split across many categories and also have strict ToU. I have been trying to find news & event feeds that are free to consume; ideally with a headline and article, but simply a blast like "3 trapped in hiking incident in montana cavern" would be useful.
It would be nice if URLs were a living thing; that upon loading a page, the browser had to pull live metadata for that URL indicating the title and end URL. No more broken links, and the original content provider could retain control of the headline.