The investment fidelity of this information is likely pretty high - not necessarily with this analysis ... but investment picks from topics popular on hn (ex: tesla, bitcoin, apple, amazon[ec2]) were ahead of the market.
Products, services, or companies repeatedly lauded in the comment section, in my experience, are remarkably indicative of future broader trends.
For instance, this user complained in 2010 that bitcoin discussions were overrunning hn like some irritating internet meme: https://news.ycombinator.com/item?id=1998630 ... at the time of posting, bitcoins were selling for $0.06 each. Would it have been a smart idea to buy 10,000 after reading that? Probably.
I can imagine an arb-style subscription to the right sql queries could be packaged and resold for extremely good profit to the right people.
> Would it have been a smart idea to buy 10,000 after reading that? Probably.
The same signal would have also fired, much more strongly, from August 2013 through December 2013. The LPs of the VC firms who share your view of its predictive power are presently not very happy.
That's a super interesting thought. You should consider that the sum total of popularity of topics on HN up till today can't be used in hindsight as a predictor. It would be interesting to see if we merely looked for past spikes in keywords and used that to govern investment decisions. Even then, I fear that for every "bitcoin" and "apple", there may be other technologies and companies (especially smaller startups) that didn't work out so well, although I hypothesize a net positive.
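A rough sketch of what "looking for past spikes" without hindsight could mean: flag a month only when its mention count beats a multiple of the trailing average, so each decision uses data available at that time. The function name, the window, the threshold, and the counts are all made up for illustration; this is not taken from the article's analysis.

```python
from statistics import mean


def find_spikes(counts, window=3, factor=2.0):
    """Return indices whose count exceeds `factor` times the mean of the
    preceding `window` counts. Only earlier data is consulted, so there is
    no look-ahead."""
    spikes = []
    for i in range(window, len(counts)):
        baseline = mean(counts[i - window:i])
        if baseline > 0 and counts[i] > factor * baseline:
            spikes.append(i)
    return spikes


# Hypothetical monthly mention counts for a single keyword (invented numbers).
monthly_mentions = [4, 5, 3, 4, 12, 6, 5, 30, 20]
print(find_spikes(monthly_mentions))
```

To test the idea properly you would run this over every keyword, including the ones that went nowhere, and compare returns after each flagged month.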
Despite it being public data, because the information circulated on HN is at the core of technology, it could prove valuable to investors with limited knowledge of it (and might well be worth packaging and selling, haha).
I'd like to postulate that the average disposable income of an active hn user is probably one of the highest among forums of the same class as hn (metafilter, reddit, digg, etc). (There have been self-reported polls, e.g. https://news.ycombinator.com/item?id=6464725 - 44% of respondents are in the top 10% income, ~25% are in the top 5% and ~4% are in the top 1%)
I'd also like to postulate that if you were to segment the market into "early adopters", hn would have a larger share of this segment than other forums in the same class with an equivalent or greater volume of traffic.
If this postulation is correct, then effectively hn is "trendsetters with money" ... a good group to listen to.
I don't have data to back these claims up, but intuitively I feel they are pretty safe.
This of course doesn't give any indication of market velocity. I've done a number of investments based on HN at the wrong velocity - I presumed the stock had been undervalued because of hn content, when in fact, the market had YET to undervalue it. I forecasted a distant chance of success given an undervalued stock (in this case blackberry) - knowing that they were going to do an android with a physical keyboard <eventually>, and I invested upon this speculation --- well before the market doubted the future of the company.
As a result, I bought it way early and it fell precipitously and is only rebounding slightly now. So no, this isn't a magic sauce to time the events or how they will affect the market price, just perhaps one to forecast their eventuality.
The Pokemon story you show in an example (and you wrote and submitted) looked interesting so I looked it up. I recall now that I had started reading it but never got through more than the beginning, because I got totally sidetracked by Twitch plays Pokemon which you linked to in the very beginning of your article. I guess I get to revisit and read that, so thanks. ;)
Granted, the primary reason I had written that Pokemon article was because the response of tech media outlets was essentially "lol weirdos" when the mechanics are pretty interesting.
I've always been interested in seeing a statistic that shows how often the top comment is a negative comment that attempts to controvert the original story.
Didn't someone try to run stats on this recently? Maybe it was the post that announced the same HN dataset was on BigQuery? I recall that some people weren't convinced by the accuracy of the sentiment analysis of the top comments though. I'll see if I can find it.
Edit: I think this[1] is it: "Hacker News as a case study to test the wisdom of the crowd theory". Not quite what you were asking, but you might find it interesting.
My old Posterous blog is one of the top domains ranked by average upvotes. That says something about the time when I was a better and/or more prolific essayist... And something about walled gardens.
I still haven't found anything that made it so easy for a regular person to become a better essayist, in volume or quality, than Posterous and its auto email feature. Boy, how I miss it.
I'm a lazy programmer who would love to blog again, but needs something as easy as Posterous. Any suggestions, anyone?
FWIW raganwald, you're one of those who people should continue to pay attention to, even 140 characters at a time. I know I do. Please keep 'em coming!
I wrote that feature as part of an unrelated product in Rails over the course of a few days with the help of Mailgun's email-to-POST-request feature.
It's free for up to 200 emails per day. The most difficult parts were formatting issues between different HTML e-mail programs that you can probably ignore for your use case.
Edit: Oops, you said regular person. Didn't read carefully enough.
> As of 13th October, 2015, out of nearly 2 million Hacker News (1,959,809) submissions, merely 217 have managed to rake up over 1000 upvotes. That's about one out of every 2000 posts.
On the graph of total posts over the days of the week, do you know what time and timezone the peaks are in? It seems very regular, as if only one or a few timezones were concerned. Do we have so little posting power in Europe ... ?
I should've mentioned that all the times are in UTC. I'll work on normalizing them to PST - it's pretty confusing right now. Thanks for letting me know!
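For what it's worth, a minimal sketch of that normalization, assuming the timestamps are timezone-aware UTC datetimes and that a fixed UTC-8 offset is good enough. The `PST` constant and `hour_histogram_pst` are hypothetical names; a daylight-saving-aware version would need a real zone such as America/Los_Angeles instead of a fixed offset.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone

# Fixed UTC-8 offset; this deliberately ignores daylight saving time.
PST = timezone(timedelta(hours=-8), "PST")


def hour_histogram_pst(utc_timestamps):
    """Count posts per hour of day after shifting UTC timestamps to PST."""
    return Counter(ts.astimezone(PST).hour for ts in utc_timestamps)


# Made-up sample timestamps, only to show the shift.
sample = [
    datetime(2015, 10, 13, 17, 0, tzinfo=timezone.utc),
    datetime(2015, 10, 13, 17, 30, tzinfo=timezone.utc),
    datetime(2015, 10, 14, 2, 15, tzinfo=timezone.utc),
]
hist = hour_histogram_pst(sample)
print(sorted(hist.items()))
```

Posts made at 17:00 UTC land in the 09:00 bucket, and the 02:15 UTC post moves back to 18:00 on the previous day.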
In a way reddit is the ultimate model. When the main room gets too big you can go and make a new room (subreddits), but still in the same house where everyone else is. Brilliant model in my opinion and one that I believe will be followed by successful future discussion board systems in years to come.
An in-between option is the http://lobste.rs model of having tags on stories, and letting users filter on tags. Allows me to ignore a few topics I just don't care about, without really splitting the community (some of the community opts-out of a few topics here and there, but it's by and large one community).
I've been meaning to do a content analysis for most popular animal among HN users, based on subject in headlines. My guess is something along this order:
Interesting to see who some top usernames are. Also interesting how little I care who anyone who posts here actually is in real life. All about that post quality, gents.
By "contributors", the linked post means article submissions rather than comments, and grellas doesn't submit a lot of articles.
I wrote an overview of the 20 users with most total karma points (submissions+comments) about two years ago, which he is on when you count that way. Maybe still interesting: http://www.kmjn.org/notes/hacker_news_posters.html
> With a runaway total of over 7000 posts on Hacker News, Clement Wan averages 2.24 posts a day since Hacker News took off (It's been 3,158 days since Feb 19, 2007). Two very mysterious users appear on this list.
Is this submissions and comments, or just subs, or just comments?
SELECT
  author,
  COUNT(1) AS c
FROM
  [fh-bigquery:hackernews.stories]
WHERE
  author IS NOT NULL
GROUP BY
  1
ORDER BY
  2 DESC
LIMIT
  1000
and armed with the knowledge that HN has been in existence for 3158 days, there are 11 people who post strictly more than once a day. They are:
1 cwan 7077
2 shawndumas 6602
3 evo_9 5659
4 nickb 4322
5 iProject 4266
6 bootload 4212
7 edw519 3844
8 ColinWright 3766
9 nreece 3724
10 tokenadult 3659
11 Garbage 3538
Just under 1 a day: robg 3121
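The per-day figures behind that list can be checked with a few lines. This is only a sketch that re-does the arithmetic on the counts quoted above, using the 3158-day figure; the variable names are mine.

```python
# Submission counts copied from the list above.
submissions = {
    "cwan": 7077, "shawndumas": 6602, "evo_9": 5659, "nickb": 4322,
    "iProject": 4266, "bootload": 4212, "edw519": 3844,
    "ColinWright": 3766, "nreece": 3724, "tokenadult": 3659,
    "Garbage": 3538, "robg": 3121,
}
DAYS = 3158  # days since Feb 19, 2007, per the article

# Average submissions per day for each user.
per_day = {user: count / DAYS for user, count in submissions.items()}

# Users averaging strictly more than one submission a day.
more_than_daily = [user for user, rate in per_day.items() if rate > 1]

for user, rate in per_day.items():
    print(f"{user:12s} {rate:.2f}")
```

cwan comes out at about 2.24 a day, matching the article, and robg at just under 1.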
The one time pg got super mad at me was when I triggered the second Erlang stampede. It was the evening of Demo Day by the time he saw the front page full of nothing but Erlang stories and he had to go through them on his phone and kill them all manually. He then searched to figure out who had started it and... mea culpa.
I have to disagree with the most upvoted contributors in the article. The #1 on here has over 200,000 karma points. https://news.ycombinator.com/leaders
Since the dataset is derived from the official HN API, there is no tabulation for Comment Karma, which will result in misleading rankings if attempting to reverse-engineer overall karma.
I don't know who he is, but he's not Paul Graham. The story behind that is that pg emailed me on April Fool's asking me to help him with a hoax to make it look like he was really nickb, who was the most prolific contributor on the site at the time. pg just manually changed the account name on a reply to make it look like he was accidentally replying under the wrong username, and my job was to submit a story looking like I had discovered this.
(This was also already publicly discussed somewhere on HN previously, albeit several years ago.)
It's now part of "ancHNt history" (nickb hasn't posted for 6+ years). I did recall some discussion about nickb = pg, but don't think I had seen the 'smoking gun'. Noting how Reddit was started, I'm neither surprised nor concerned if there was such an account in the early days (either by pg or by the yc partners).
[Edit] Dammit. Now I've read Alex3917's response, I wish I'd done the accurate calculations on that "Smoking Gun" link to note it was posted on 1 April, 2008.
9 years ago, separation of presentation and content was already considered a good practice. Yet here we are with application frameworks and component-based designs that throw it all out the window...
To expect any consistent design principles on a development medium as ad-hoc and devoid of principles as the web, is wishful thinking.
HN isn't about using good practices. It's about getting to the heart of the matter. Content is content, who cares how it's displayed, for better or worse. But people keep showing up. So it must be working just fine. If it ain't broke, don't fix it.