Analysis of Stack Overflow's Survey of 10K developers (statwing.com)
140 points by glaugh on April 3, 2013 | 61 comments



This is some really interesting data. I could spend hours on this site given some interesting things to look at. I really wish SO had included salary in their survey.

Some feedback since I see you are the cofounder of statwing:

1. I'd really love to be able to share one specific stat/relation with a friend. I found the relationship between "Career / Job Satisfaction and Number of Employees at Company" to be very interesting, and wanted to share just that one with a friend. Either include a share button on each relation, or update the url to reflect exactly what I'm looking at so that if I send it to someone it'll take them to the same page.

2. Your front page looks like it was written for data producers. Write it instead for consumers. I would be much more likely to come to your site again and again if it showed something similar to Quora: a list of most popular (by views, upvotes, recency, or ideally a combination of all three) stats. Stuff that a lot of people have found interesting and I will probably find interesting so I can go look for myself.

Edit: I just realized statwing is pretty much only for privately analyzing your data. That's too bad. I could see it being a really entertaining site for public data.


Hey, thanks for the feedback.

#1 resonates a lot. If you upload your own data, then we make it really easy to share a few specific analyses (sort of like how I shared a few analyses via this link). But we don't currently have that ability to let folks share analyses of a dataset that they didn't upload themselves. Sounds like we should, we're wasting a good opportunity for you to tell your friend about us.

#2 is also interesting. Obviously from a marketing perspective it's useful for us to get people sharing public datasets. Perhaps there's a way to walk the balance between analysis of private datasets and sharing of public ones.

Thanks again.


Really fascinating website. I agree with arassmussen's feedback. I'd also add that while spending the past five minutes playing around with the website, I'm consistently hitting Describe when I mean Relate. I wonder whether, if you looked at your logs, you'd find a high occurrence of Describe requests followed by Relate requests within 1-2 seconds.


Is that because of misclicking, the order of the buttons, or that it just feels like "Describe" should be the way to get the result you're looking for? (Or something else?).

Thanks for the feedback.


I think it's the order. The primary action I want is the one that connects two columns, but it's second in the order. I click on the first button without thinking what word is on it.

I'm not saying you should switch the order, though. In some way, it makes sense that you should first Describe a column before Relating it to something else. But I think some usability testing would be useful.

I also didn't immediately understand that the different actions would 'stack up' on the right-hand pane, showing everything you've looked at so far. That's really useful, and the pop-up tips were good too, although I had to restrain my instinct to automatically close/turn them off.

It would be nice if, in the Describe summaries, you could sort/order the data. E.g., I'd like to sort the Job Title by Count, instead of alphabetic order of Job Title.

After clearing the data, I wanted to relate three different things together. It wasn't clear how to do that; the Relate button was disabled. I tried Describe, then figured out that Relate To Each (which is clearly labeled!) was the right option. I think I would have preferred not splitting the Relate button into two, Relate and Relate To Each.

It's a neat site, and I hope you continue providing demos.


Awesome. Really appreciate that feedback. Thanks a bunch.

Edit: Also, we do actually allow you to sort the descriptive by Count, alphabetical, or manually chosen. There should be a "Sort" button to the upper right of the table.


The bar chart is sortable, but the table isn't, as far as I can tell.

Also the default sorting for compensation isn't that good. It should be in numerical order, but it's in some sort of alphabetical order.


What kind of confidence interval indicates it's statistically significant? In the sciences, in my experience, a lot of times you see 95% as the standard of confirming correlation, or is it something else?

This is awesome.


Yeah, we use 95% for all our stat testing. If you click the "Advanced" tab of any analysis you can see the actual p-value, too.


They did. See Compensation (with Bonus) right under Career / Job Satisfaction?


From playing around with the data, Stack Overflow reputation was not strongly correlated with compensation. Instead, experience (and naturally age) exhibited a stronger relationship with getting paid.


It's pretty obvious, but country is strongly correlated with compensation.

If we want to get paid, we have to move to (or get a job in) the U.S. or Australia (didn't know Australia was that well-paying).


If you adjust for COL, it's not close. The U.S. dominates. AUS has very high minimum wages which drives up costs. Look at Steam games in AUD vs. USD.


AUS is interesting. Everyone lives in VERY large cities, they're very far from many places where vegetables are grown, they have had a HUGE mining boom (which means more money chasing fewer goods, and the cities are incredibly far from one another)

Add onto that the Parallel Import Tax (which prevents people from just importing it themselves), and you have an INCREDIBLY captive market, getting gouged on all corners. Sure, the high minwages drive costs up, but the high minwages are also needed to cover the high cost of living there.

They're quite recently dropping lots of the items off the PIT list, and prices are going down. Imagine that, drop protectionist tariffs, and prices decrease.


You might want to consider some other comparator. I like the cost of a combo meal at McDonald's (The Economist uses the price of a Big Mac). In my experience, consumer electronics are cheaper in the US but good produce is cheaper in Australia.


When I was in Geelong bananas were $12/kg, and in Atlanta they were around $0.50/lb. Other produce seemed quite expensive, also.


Was this sometime after the Queensland flooding? Bananas from a major supermarket are now between $3 and $6 per kg depending on the kind, but it did spike for a period due to the flooding damaging crops I believe.


Yes, it was around that time.


Yeah that destroyed a very large portion of the banana farms in .au and the banana benders up in qld have lobbied well enough to restrict imports


It has almost nothing to do with the domestic IT product market and everything to do with demand. Australia has a very high demand for talented programmers at all levels. It really surprised me when I came back just how many companies are using node.js or Ruby on Rails. The Steam game prices, etc., have more to do with regional pricing. So many products are more expensive in Australia for almost no reason other than that people will still pay for them.


I did not find there was high demand or high salaries for a programmer knowing Node & RoR in Australia 2 years ago. There were programming jobs, but they seemed to be mostly IT department style jobs. The job boards were dominated by recruiters and staffing firms so it was hard to figure out what the positions actually entailed.

It seemed like lower level tradesmen such as tilers made as much as programmers, and higher level tradesmen, like master electricians, made significantly more. All forms of tradesman seemed to be more in demand than programmers.

Perhaps things have changed significantly in the last 2 years?


I agree with this, although I live in one of the smaller Australian cities and cannot move for various reasons. I am desperately trying to find a reasonable programming job, but having quite a hard time. Pretty much all the jobs around here are IT department style, with long-winded garble on job boards and silly requirements. Just from memory, I've seen things like "must be familiar with at least one of C, C++, Java, XML" (apparently XML is a general-purpose language) or "Knowledge of C++ or Delphi required" (what these two have in common I'll never know). I'm sure such ads exist in other places as well, but there seems to be an unnervingly high proportion around here.

I'm not sure what it's like in Sydney or Melbourne, however. Perhaps it's better there...


I don't think you mean to suggest that working families should live in poverty (America, heck yeah!) so the leisure class can spend less on video games.


Video games is a metric that is easy to verify and familiar to this audience. Sub in a loaf of bread if it is more palatable to your sensibilities.


I disagree with the suggestion that the American minimum wage is in a better place because it is lower. Keeping working people poor so wealthier people can afford more things (whether they are things the likes of me understand like bread (...) or things "this audience" understands, like video games) is gross.

I suspect that you and the original commenter think extremely impoverished American children are an acceptable price to pay for cheap video games. Like I said: America. Heck yeah!


Properly interpreting this data is hard. For instance, the relation between 'Desktop OS' and 'Compensation' suggests Windows 8 users earn significantly more, on average, than Linux users. However, if you dig a bit further, you find that Linux use is much higher in countries where 'Compensation' is much lower across the board. So the relationship between 'Desktop OS' and 'Compensation' is a proxy for the relationship between 'Country' and 'Compensation', which makes it much less surprising.


Agreed. Right now the best way to handle that is to filter your analysis in a way that accounts for confounding variables. So in the example you gave, you'd run the Desktop OS vs. Compensation analysis then filter for Country == USA only.

That's a pretty rough way to deal with that issue. Ideally you'd run a regression that more subtly takes country into account, without losing data. Unfortunately that's not possible in Statwing (currently...).
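
For anyone who wants to see the confounding effect described above in miniature, here is a rough sketch. The rows are invented for illustration (they are not from the survey), and `mean_comp_by_os` is a hypothetical helper, not anything Statwing exposes. It compares the overall OS averages with the same averages after filtering to one country.

```python
# Hypothetical respondents: (country, desktop_os, compensation in USD).
# These numbers are made up for illustration; they are NOT survey data.
rows = [
    ("USA", "Windows 8", 100_000), ("USA", "Windows 8", 90_000),
    ("USA", "Windows 8", 95_000), ("USA", "Linux", 98_000),
    ("India", "Windows 8", 20_000), ("India", "Linux", 22_000),
    ("India", "Linux", 18_000), ("India", "Linux", 21_000),
]


def mean_comp_by_os(rows, country=None):
    """Mean compensation per OS, optionally restricted to one country."""
    by_os = {}
    for row_country, os_name, comp in rows:
        if country is not None and row_country != country:
            continue  # the "filter for Country == USA" step
        by_os.setdefault(os_name, []).append(comp)
    return {os_name: sum(v) / len(v) for os_name, v in by_os.items()}


overall = mean_comp_by_os(rows)                  # pooled across countries
usa_only = mean_comp_by_os(rows, country="USA")  # confounder held fixed

print("overall: ", overall)
print("USA only:", usa_only)
```

In this toy data the pooled averages make Windows 8 look far better paid, while within the USA the gap disappears (and slightly reverses), because the OS mix differs by country. The cost of filtering is also visible: the USA-only Linux average rests on a single row.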


"There is a very weak but statistically significant relationship between Owns: iPhone and Using: Node.js"

Hipster Hackers.


But it's actually the devs that don't own iphones that are more likely to have used node.

Since I'm here, I'll speak up as someone who has an iphone and uses node.


Actually, the

    'percentage of iPhone owners using node' out of 'total iPhone owners'
is greater than

    'percentage of non-iPhone owners using node' out of  'total non-iPhone owners'
by around 1%

But since there are more than 3 times as many non-iPhone owners as iPhone owners, a Node user is more likely to not own an iPhone.
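
A quick sketch of that arithmetic, using invented counts (the real survey counts are not quoted in this thread, so every number below is an assumption chosen only to match the rough shape: about a 1% gap and more than 3x as many non-owners):

```python
# Hypothetical counts, NOT the survey's actual numbers.
iphone_owners, non_owners = 2_000, 7_000  # 3.5x as many non-owners
node_iphone, node_non = 180, 560          # Node users in each group

# Share of each group that uses Node.
rate_iphone = node_iphone / iphone_owners  # 9% of iPhone owners
rate_non = node_non / non_owners           # 8% of non-owners

# Share of Node users who own an iPhone.
share_iphone_among_node = node_iphone / (node_iphone + node_non)

print(f"Node rate among iPhone owners: {rate_iphone:.1%}")
print(f"Node rate among non-owners:    {rate_non:.1%}")
print(f"iPhone owners among Node users: {share_iphone_among_node:.1%}")
```

Both statements hold at once: iPhone owners use Node at a slightly higher rate, yet most Node users do not own an iPhone, simply because the non-owner group is so much larger.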


Considering I just picked up my first iPhone and started playing with node.js this week, does that make me weird?


Statistically, no.


Ok here's an analysis on job satisfaction that I found interesting.

1. very weak relationship to high compensation. Not completely surprising and in-line with much of the research.

2. Here's a hard one. No significance of consumer device (tablet, gaming console, etc) EXCEPT for Apple devices (albeit a weak one). All the fanboys are probably nodding their heads but I challenge you to explain it rationally.

3. The importance of work/life balance seems to show in the data with weak relationships to job satisfaction, but it's mostly around hours spent working and commuting. Why is it only a weak relationship?

4. Surprisingly, life at work seems to be less related to job satisfaction, with only very weak relationships to things like quality of office space, bureaucracy, quality of workstation, # of meetings, opportunities to work on new tech, and opportunities for growth. I wonder if perhaps the data is skewed by the averages. I wonder what the data would look like if you separated tier 1 developers from everyone else.


Windows 8 devs were as happy as Mac devs, which was interesting. No-one else was close. Also note that Mac devs tend to work in smaller shops, which is correlated to happiness.

Game devs were the happiest devs. So - work for a small mac game dev.


I wish there was a way to filter by full-time vs part-time/intern. Filtering by country USA, a plurality of developers under 25 are making under $40,000. I am assuming that a good portion of them are interns.


People in the advertising industry who think their job is very important:

53.1%


That can't be serious.


More experienced developers spend more time refactoring code? Does that mean that they're working on crappier code, established code bases, or higher end applications where tech debt really matters?

Conversely, are the younger guys spending less time because they don't know any better, working on small projects where tech debt isn't a priority, or working on completely new code?

My guess is it's a reflection of the type of work being done.


The more less-experienced developers you work with, the more time you spend refactoring the code they write.


What does "non-negotiable" mean? It's positioned between "don't care" and "not very important".


Edit: This issue has been fixed.

Original comment: Sorry, that's a presentation issue on our end. Non-negotiable is the highest value, and if you Relate one of the variables with that as a response option to any other variable, you'd see everything in the correct order.

We currently have a small issue where that order isn't being correctly displayed by default if you do a Describe on one of those variables. We're working on fixing that. In the meantime you can manually Sort --> Manual and that chart will display appropriately.

Thanks for the feedback, much appreciated.


I think I asked this last time I saw a Statwing analysis, but are you guys accounting for multiple comparisons anywhere? The actual p-values are quite robust here, but I can generate some pretty spurious analysis by just running every comparison at once.


We still don't account for them (except in the context of ANOVA post hocs).

The thinking is that (1) we're similar to other stats tools in that it's incumbent on the user to account for that, (2) a 'practical' version of accounting for multiple comparisons is to just be aware of them (as per your "p-values are quite robust here" comment), and (3) eventually this will be a really cool opportunity for us to stand out, and we do plan on eventually accounting for them--we just haven't really been able to prioritize it at this point.

Thanks for the feedback, we're very happy to have that comment brought up quite a bit, it's definitely really important, especially given the goal of democratizing data analysis.


I think I just always bring it up since your tool moves directly to data mining operations and I know my presentation of so many results will be highly likely to have false positives.


Yeah.. I thought the same. People reporting stats to the public at large lack incentives to adjust for multiple testing: most of their audience do not know enough to care; and to them, the value of the product improves with the number of 'insights' found. It is perverse but real.
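
For readers who haven't run into the multiple-comparisons issue discussed in this subthread, here is a minimal sketch. It uses the Bonferroni correction, which is just one simple option and not something Statwing is stated to use; the p-values are invented, and the error-rate formula assumes the tests are independent.

```python
def familywise_error_rate(alpha, m):
    """Chance of at least one false positive across m independent tests
    when every null hypothesis is actually true."""
    return 1 - (1 - alpha) ** m


def bonferroni_significant(p_values, alpha=0.05):
    """Flag p-values that survive a Bonferroni correction (alpha / m)."""
    cutoff = alpha / len(p_values)
    return [p <= cutoff for p in p_values]


# With 20 unrelated comparisons at the usual 95% level, a spurious
# "significant" result somewhere is more likely than not.
fwer_20 = familywise_error_rate(0.05, 20)
print(f"P(at least one false positive in 20 tests) = {fwer_20:.2f}")

# Hypothetical p-values from five comparisons run at once.
p_values = [0.0001, 0.003, 0.02, 0.04, 0.30]
uncorrected = [p <= 0.05 for p in p_values]
corrected = bonferroni_significant(p_values)
print("uncorrected:", uncorrected)
print("Bonferroni: ", corrected)
```

Bonferroni is conservative; it trades false positives for missed real effects. That is part of why "just be aware of it and look at how small the p-values are" is a common practical stance.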


I would expect age to be somewhat skewed toward 20-somethings, but not so strongly. Makes me wonder if this is an accurate picture of developers in the US.


Yeah, in an ideal world we'd have found the real-world stats for that kind of information and weighted the responses appropriately. Though of course we'd still have selection bias questions of who chooses to answer the survey or not.
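
A rough sketch of what that kind of reweighting could look like, assuming you somehow had real-world age shares. The group names, counts, means, and population shares below are all invented for illustration, and `poststratified_mean` is a hypothetical helper.

```python
def unweighted_mean(sample):
    """sample maps group -> (respondent_count, group_mean)."""
    n_total = sum(n for n, _ in sample.values())
    return sum(n * mean for n, mean in sample.values()) / n_total


def poststratified_mean(sample, population_share):
    """Reweight each group so its influence matches its population share.

    population_share maps group -> share of the real-world population.
    """
    n_total = sum(n for n, _ in sample.values())
    weighted_sum = 0.0
    weight_total = 0.0
    for group, (n, group_mean) in sample.items():
        # Weight = population share / sample share for this group.
        w = population_share[group] / (n / n_total)
        weighted_sum += w * n * group_mean
        weight_total += w * n
    return weighted_sum / weight_total


# Invented example: the survey over-represents younger respondents.
sample = {"under_30": (70, 4.0), "30_plus": (30, 3.0)}
population_share = {"under_30": 0.5, "30_plus": 0.5}

raw = unweighted_mean(sample)
adjusted = poststratified_mean(sample, population_share)
print(f"unweighted: {raw:.2f}, reweighted: {adjusted:.2f}")
```

Reweighting only corrects for the variables you weight on. It does nothing about who chooses to answer within each group, which is the selection-bias caveat above.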


Those with more experience (and/or older) spend less time on StackOverflow, and/or less time answering surveys.


This probably has more to do with the environment that these developers learned to code in. Stack Overflow was a huge part of my journey as a programmer, but their journey probably included more reference books and colleagues.


My programming journey started more than a decade before Stack Overflow came into existence, but Stack Overflow-style forums were always a very big part of it, so the concept is certainly not new to me.

I've found some utility in the site when I want to see a quick API reference example from a popular library, but find most of the content to be not particularly interesting to me at this stage. I have found that there is minimal content in the subjects I am interested in today, with more dead ends than answers, leaving very little draw to the site for me.

Based on my own history, I feel like if I had started programming when Stack Overflow was an available resource I would still be at the stage where the kind of content on the site would be incredibly fascinating. I used to eat that stuff up like it was going out of style. But at some point you start to see the same patterns and it all begins to feel repetitive, even if the names have changed.

Perhaps people with longer programming histories simply outgrow Stack Overflow? I still turn to it once in a while, but don't see it as the incredible resource many others seem to.


That would assume that developers who stop learning remain employed as developers.


This assumes that Stack Overflow is the only way to continue learning as a developer.


This just in-- there is a strong and significant relationship between age and years of experience. Who knew?!

Also, job satisfaction and new feature dev!


5k+ using jQuery but only 3k+ using JavaScript?


Maybe about 2.000 people don't know jQuery IS JavaScript...


I'm sure there are more than two ;)


Not every country uses the comma as the thousands separator.


The decimal is used as a comma in a few places.



How many are using javascript and not jQuery then (i.e. more than -2000?)


This is disappointing for me, C# developers tend to make less. Then again, young people tend to make less (definitely causation) and C# developers tend to be young, so maybe there's no causation there. Or maybe there is and that's why older people get out of C# development!



