While I dig the idea, it's important to note a few issues with the dataset. Take the presented data with a huge grain of salt.
First, many repositories are not a single language. For example, this PHP framework is reported as a CSS project [0]. While it has more lines of CSS than PHP, it only has a single CSS file [1].
Second, GitHub has a problem with correctly identifying programming languages. For example, PrimeCoin [2] is identified as one of the most popular TypeScript repositories, but it has 0 lines of TypeScript code. Instead, it has... large localization files with the extension *.ts [3]. BitCoin used to have the same problem, but it looks like GitHub hack-fixed it for that particular repository, as less popular forks of BitCoin still have this issue.
It took me a few minutes to find these examples, just by examining trending repositories [4]. I'm sure there are many more. So do not be rash in drawing conclusions from this data! :)
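To make the *.ts mix-up concrete, here is a minimal sketch of a content check that would tell the two file types apart. It relies on Qt Linguist translation files being XML with a TS root element; the function names and the 500-character window are my own assumptions for illustration, not how GitHub actually classifies files.

```python
import re

# Qt Linguist translation files are XML documents with a <TS> root element,
# usually preceded by a <!DOCTYPE TS> declaration.
# Assumption: looking at the first 500 characters is enough to spot them.
_QT_TS_MARKER = re.compile(r"<!DOCTYPE\s+TS\b|<TS[\s>]")


def looks_like_qt_translation(text: str) -> bool:
    """Return True if the content of a *.ts file looks like Qt Linguist XML."""
    head = text.lstrip()[:500]
    return head.startswith("<") and bool(_QT_TS_MARKER.search(head))


def guess_ts_language(text: str) -> str:
    """Hypothetical classifier for files that carry a .ts extension."""
    if looks_like_qt_translation(text):
        return "Qt translation (XML)"
    return "TypeScript"


# Two tiny made-up samples, one of each kind.
qt_sample = (
    '<?xml version="1.0" encoding="utf-8"?>\n'
    "<!DOCTYPE TS>\n"
    '<TS version="2.0" language="de">\n'
    "</TS>\n"
)
ts_sample = "function greet(name: string): string {\n  return `Hello, ${name}`;\n}\n"

print(guess_ts_language(qt_sample))
print(guess_ts_language(ts_sample))
```

A check like this is only a heuristic: a TypeScript file that happens to open with a generic parameter named TS would fool it.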
I have discovered this as well. Although 100% of my own GitHub projects[1] are JavaScript, more than half appear as CSS due to the included documentation.
Ideally, the project manager should be able to define the language composition in their own projects. Something GitHub should consider IMO.
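A rough sketch of why this happens, assuming the repository label is simply the language with the most bytes (my reading of the behaviour described here, not something verified against GitHub). The file list, sizes and the ignored_prefixes parameter are all made up for illustration; the latter stands in for the kind of per-project override suggested above.

```python
from collections import defaultdict
from typing import Iterable, Tuple


def primary_language(
    files: Iterable[Tuple[str, str, int]],
    ignored_prefixes: Iterable[str] = (),
) -> str:
    """Pick the language with the largest total size in bytes.

    files holds (path, language, size_in_bytes) tuples. Paths that start
    with any of ignored_prefixes are skipped (hypothetical override).
    """
    prefixes = tuple(ignored_prefixes)
    totals = defaultdict(int)
    for path, language, size in files:
        if path.startswith(prefixes):
            continue
        totals[language] += size
    return max(totals, key=totals.get)


# Made-up repository: a small PHP app plus one large bundled stylesheet.
files = [
    ("app/routes.php", "PHP", 3_000),
    ("app/models/User.php", "PHP", 2_000),
    ("public/css/bootstrap.css", "CSS", 120_000),
]

print(primary_language(files))                   # one big CSS file wins
print(primary_language(files, ["public/css/"]))  # PHP once the bundle is ignored
```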
If I noticed a repository was classified incorrectly I'd open an issue asking the maintainer to rectify it, if it was possible. They would likely do this simply because someone asked. If not that, then simply for it to appear correctly in the searches.
Why open source at all, what do they get out of it?
Lots of data right there, and nicely visualised at that, only what it actually means is unfathomable without knowing any broader context.
For instance: C++ has the greatest number of opened issues per repository, then comes Rust, then Scala. All right.
Does it indicate that they're more tricky than others and hence more bug reports?
Or perhaps that projects written in these languages are under more intense scrutiny?
Or that people watching these repositories are just more eager to step up and file an issue instead of sulking in silence (a trait of programming culture surrounding these languages)?
And so on, and so on.
Or it could be one thing in the case of C++ and another in the case of Rust, since they differ in so many respects.
Wide field for wildest speculations, but no meaningful correlations identified.
Yes, context is key! In addition to what steveklabnik said about Rust, the other interesting bit of context is that the deluge of breaking changes[1] has resulted in a very interesting phenomenon: everyone chips in and submits PRs to library dependencies that haven't been updated by their authors yet. The pace is so fast that most of the time, in my case, it's just a matter of "I haven't gotten home from work yet and there are already 3 PRs against my repos with fixes for the latest Rust nightly." As a result, many of my Rust projects have an oddly disproportionate number of contributors/pull requests relative to other projects I have that are more popular.
[1] - Breaking changes have simmered down a lot in the last few weeks. We still have one more semi-big one in front of us, but hopefully smooth sailing from there...
Exactly. And I feel this graph gives an idea of what the tendencies for a language are, and then you can research the context. You can glean a lot of fun things from this graph.
Rust is one of the heavier users of Issues. I've started a pretty huge recategorization and triage effort, and when I started, we had around 2200 open. Earlier today I got us to 1899. Part of this is that we tracked things like 'it would be nice to have a library to do things like X,' and in preparation for 1.0, we're moving those over to our RFC repo instead, so that the main project has only bugs and internal enhancements.
... I guess I should also mention that not all of them are these kinds of issues, there's also lots of stuff, like in any tracker, that got fixed somewhere, but the issue was never closed. Or where a library that used to exist no longer does. Etc. Triage is drudge work, but really valuable for any big project.
For practical or business purposes, this is a nice bit of incomplete information to help make a decision. I want to take a serious, time-invested dive into a new statically compiled language, but which one should I pick? An old die-hard or the new-hotness? I could make a guess from reading the docs and such, but I'd also want to know community activity and support. This is a handy chart for getting a sense of that.
Or, I'm a business owner who just hired my first engineer. He's saying the backend should definitely be built in Groovy, or maybe he'd be willing to do Scala or something else, but definitely Groovy, yeah, Groovy man. I might be able to get a better idea of which would be beneficial for my long-term business prospects (hiring more engineers, etc) by checking out a chart like this as I might not have time to do real in-depth research.
As a scientist you require complete, sound and accurate statistical data. As a business practice (this site is about start-ups, no?) you need to be comfortable making serious and important decisions based off of incomplete and possibly inaccurate information because making fast decisions is often paramount. You can and will always make other fast decisions later and decide whether it's worth the effort to course-correct if you need to due to new and more accurate information.
This is maybe too deep an analysis of a fun little infographic, but as a former professional poker player who made a living off of incomplete information you got my cockles up.
> "For practical or business purposes, this is a nice bit of incomplete information to help make a decision. I want to take a serious, time-invested dive into a new statically compiled language, but which one should I pick? An old die-hard or the new-hotness? I could make a guess from reading the docs and such, but I'd also want to know community activity and support. This is a handy chart for getting a sense of that."
No, not really, because you've no idea what assumptions are baked into the data. For some decisions, you can make fast, gut-based ones. For others, you need to take a much more considered and scientific approach. The difference can be defined by the ability to course-correct after-the-fact (the harder to course-correct, the more stringent the decision-making process). There's an entire academic (and military) discipline around decision-making processes and with good reason. People want to make good decisions as well as quick ones.
Anyone making business critical decisions based on this chart, without doing the extra work to understand the data, is basically lying to themselves. That's why vanity metrics and data-porn should be handled with extreme caution.
I'd also want to know community activity and support. This is a handy chart for getting a sense of that.
The entire chart? Wouldn't the first column be sufficient? Number of repositories gives you some idea about language popularity.
Well, kind of: there's bias of hype here. Obviously choices behind open-sourced projects on GitHub aren't representative for the industry. It's the software's world avant-garde, if anything.
And even so, that's just one parameter out of five, and it can very well be considered in isolation from all the rest.
I wouldn't make business decisions based, for instance, on the average number of open issues. Because it's an outcome of many different variables. So how would you know what it means? Is high good? Is high bad?
Interrelations between data - shown by this clever chart - are even more mysterious.
TeX has a very high number of pushes per repository (second best), while there are fairly few repositories, and they are rarely forked.
At the same time R has a low number of pushes (second lowest of all), whereas it wins in the "new forks per repository" category (#1).
> For practical or business purposes, this is a nice bit of incomplete information to help make a decision
Titbits of incomplete information are often placed as a result of publicity campaigns. In the specific case of the github source info for this graphic, the languages near the bottom of the list can easily have that information manipulated by their backers scamming the stats. All you need is one change to be pushed during the measured period for a project to be registered as active, a data point which I know is being scammed for at least one language near the bottom of the list.
> For practical or business purposes, this is a nice bit of incomplete information to help make a decision.
Is it? Does the fact that people open lots of issues in C++, Rust and Scala make you more or less inclined to pick one of those for your new project? Why?
I'm all for making the best use we can out of incomplete or noisy data, but that stat doesn't tell you anything, it's just a number.
Indeed. This is a nice example of data porn, in that we can pore over graphs and trends and make lots of speculations (which will conveniently confirm our pre-existing biases). It's visualisation, not analysis so we should take it with a large pinch of salt.
Having said that, I do think the visualisation is beautiful and there are definitely useful things that could be drawn from the data if someone were willing to do the extra work. However, I'm not sure I have much faith in the data quality e.g. some of my repos are considered 'CSS' just because I've added some boilerplate from elsewhere.
What's interesting to me is that there seems to be an inverse correlation between pushes per repository and the number of forks. Are forks counted as individual repositories? That would be a boring explanation.
Am I the only one that doesn't like the visualization? It seems like it would be fundamentally better if each bar was simply labeled instead of connected via line. Mouseover could highlight the same language in the other categories to get the cross-category information.
The question "What is ranked above Ruby for New Watchers Per Repository?" seems to be a question this dataset should be answering, but it is enormously difficult to parse here.
Thanks for giving me a term to google. Also your point about sorting is vital, and would make this type of chart very useful for understanding relationships.
I agree. I don't think interaction should be necessary in a good visualization for most questions. It should add usability on top of an already usable graphic. This graphic would not be usable if published in a paper.
- Dart (I guess the lack of native browser support is the killer here)
- Typescript (I'm surprised this didn't take off)
- Puppet (Interesting.)
- ActionScript (obvious now that Flash is dead)
- Scheme
- Common Lisp
- D
- Fortran
- Logos (huh?)
(I know near flat is subjective, but still these are the languages that are not seeing much growth in 2014, and what likely isn't growing strong in 2014, is likely to continue that trend in 2015.)
Going to state the obvious and say JavaScript. For all of the obvious reasons but also because ES6 is going to make it more palatable for those who formerly found it distasteful.
The decrease of TypeScript is probably thanks to GitHub recognizing fewer and fewer C++ projects as TypeScript (Qt localization files have a .ts extension, which GitHub counts as TypeScript files, although in fewer and fewer cases).
So pretty much almost every well-designed language that I like or was curious about is losing popularity in favor of Java and JS. Sigh. I remember when Ruby was #2 on GitHub. Those were the days.
Puppet's drop may be due to the growing number of options. It used to be just Chef; now there are Ansible, Salt and even Docker.
And totally agree that Ruby is surprising. I'm a Pythonista myself, but always thought Ruby was fairly comparable if having a different approach. I don't have enough experience with it though to understand the possible reasons for the drop.
GitHub was adopted early by the Ruby community, and it had a disproportionate number of Ruby projects when compared to other version control hosts (Sourceforge, etc.). A lot of projects are moving to GitHub from other hosts now, and Ruby pretty much had nowhere to go but down as a percentage of the total.
Also, the tendency for many small Rubygems (and Bundler's support for installing gems from git) meant you had many more repos than you would for languages like Java, where it's pretty common to build multiple jars out of a single repo. The npm community seems to be if anything even more prolific in producing large quantities of very narrowly tailored libraries.
I think this is a case where the pie has just gotten bigger, rather than anyone's piece getting smaller.
Languages have momentum (growth, static, decay), and to change the momentum, something big has to happen, and usually big things do not happen. Past performance can generally be used as a predictor of future performance.
Ruby has been continually decreasing on Github for a little while now. It makes more sense when you realize that Github was dominated by Ruby when it started out and other languages are still coming on to the platform in proportion. Try switching the view between percentage and total number and it becomes more clear.
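A quick worked example of the percentage-versus-total point. The counts below are invented purely for illustration (they are not GitHut or GitHub figures): a language can triple its repository count and still lose most of its share if the platform as a whole grows tenfold.

```python
def share_percent(language_repos: int, total_repos: int) -> float:
    """Share of all repositories held by one language, in percent."""
    return 100.0 * language_repos / total_repos


# Hypothetical snapshots: (label, Ruby repositories, all repositories).
snapshots = [
    ("early", 40_000, 100_000),
    ("later", 120_000, 1_000_000),
]

for label, ruby, total in snapshots:
    print(f"{label}: {ruby} Ruby repos, {share_percent(ruby, total):.1f}% of the total")
```

With these made-up numbers the absolute count goes up threefold while the share falls from 40% to 12%, which is the "bigger pie" effect described above.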
On HN trends, up to 2012 Ruby was about the same as Python or sometimes higher; now Ruby is way below. The general upward trendline is the result of the growing popularity of HN's Who is hiring thread and the increasing number of job posts in absolute numbers. Here it's more evident: Ruby was the most popular language 3 years ago.
http://www.ryan-williams.net/hacker-news-hiring-trends/2015/...
OCaml desperately needs some wind in its sails. It fares worse than PowerShell in terms of # of repos, and that says it all really. Compare that to Haskell and Clojure, which are soaring, to put it mildly.
I think OCaml is doing pretty well for itself. For example see the graph of package-growth at [1] and the recent news from people at Facebook about Hack and Flow (both in OCaml). Not all repos are on GitHub and it's not really fair to expect them to be — especially for the sake of vanity metrics, such as this visualisation.
In addition, some of the repos that have OCaml code may not be recorded as such. Repos where the 'brains' is in an expressive language might be overshadowed by boilerplate from elsewhere.
Thanks, I was starting to get a bit depressed seeing the stats on the website. In fact, I was shocked to find OCaml so far off from Clojure.
I think the adoption problem for OCaml is compounded because it suffers from lack of stackoverflow hits for any given errors that you might encounter or any given queries you might have. Searching for something as mundane as "how to read large files in OCaml" leads to just a single hit (Streams at OCaml.org) [0].
Also, OCaml needs a "recipe/patterns" book -- on how to get some of the things done the right way in OCaml.
> Compared to Haskell and Clojure, which are soaring to put it mildly.
Soaring comparatively.
I am a big fan of OCaml, but I think one thing this infographic is heavily biased by is ease of adoption for programmers of all levels. Javascript, Python and others in the top all have that. OCaml and the like are all a bit steep on that front.
However if you find OCaml in the tiny graph on the bottom you'll see that it's steadily increasing at least. Up about 50% in active repos from 2013 to 2014.
Comparatively, yes. Clojure and Haskell have adoption problems too. But I think the community is growing stronger with each passing day. That simply isn't happening with OCaml.
Err... based on what exactly? I've been working in the OCaml community for several years now and it's going from strength to strength. Do we have different definitions of 'strength' or something?
Based on blog posts that show up, based on Google results that show when you try to find tutorials on how to get certain things done (in a production environment, that is).
Also, another thing that's peculiar with OCaml is that a lot of libraries are LGPL 3+, which makes it that much harder for corporates to adopt. And sometimes, alternatives to certain libraries are either hard to find, or are not actively maintained. It could also be that I have been looking in all the wrong places.
R's bump in Q1Y14 is probably when CRAN, R's largest "official" repo archive pushed all of its packages to Github. Pretty neat to see the volume right there.
I think there's steady growth in active R repos that's masked by the huge impact of Hopkins' Coursera course on reproducible research. Tens of thousands of people fork a repo for a homework (https://github.com/search?q=forks%3A%3E30000&type=Repositori...), and then never touch it again.
Part of that is the fact that many R packages are actually predominantly written in C++, C, Java, etc..., so they show up in Github searches as being written in a language other than R...
Agreed. I like the sort of informal feel to it, calling back to a sketch.
I'm a big fan of the parallel lines chart, and this one is well executed. The data labels are unobtrusive, appearing on hover to let you dive in. The data set is coordinated with navigation on the timeline above using the principle of object constancy [0]. I really like that you can click a language to pin it; you can focus on a few languages and watch their evolution over time by scrubbing the line chart. (I don't like that if a pinned language falls off the chart at one point in time it isn't restored when you go back to a time that it's on the chart.)
I like the idea with the small multiples below, but I wish there was less wasted space; it's hard to see very many at one time. There's not really a need for full-blown small multiples here - vertically-aligned sparklines would be more effective. If they were in the same table as the parallel lines it would allow a deeper exploration of the data.
I've been trying to learn Rust myself by spending ~30 mins every day on it for the past 2 months. It's strange how simple it is to make something, but it's hard if you have no experience in functional programming.
I can relate to your pain. I came back to hardcore Scala, 2 years after a very quick introduction to functional programming. But, in the end, every second of head banging was absolutely worth it. Once you step into The Immutable Functional World, it will change the way you design software and there is no coming back :)
Or they use them for getting work done and not for open source projects on GitHub. Clojure seems to be one of those languages where you can be just as productive creating the software yourself from whole-cloth as trying to grok someone else's DSL.
If that's true then large bodies of application specific code would exist off the GitHut radar, I suspect.
I think that these statistics are both a bit under-rated and a bit misleading.
-under-rated:
CSS: has 80% more pushes than C++. WOW :O
JavaScript: remains super for small projects, but man, it sure brings a tear to your eye when you see 10.69 pushes per repo. I think I may have misunderstood JS a lot.
Safe languages: are probably not as safe as we think.
-misleading: this isn't talking in any way about the industry itself, but about the LOVE given to each programming language, for the following reasons:
a) Developers in general contribute to open source programming projects with the same attitude the GCC devs had when saying "compiling GCC as C++, we are writing code; if you want it as C, do so yourself", as I understood it.
b) Interest, Time and Location on GitHub diverge from reality:
Interest: developers are interested in doing new things when it comes to open source, so this may affect the numbers a lot.
Time: time changes everything.
Location: I think GitHub is the number 1 place for front-end programmers. Everyone likes it, but for JavaScript I think GitHub is the superman.
I think module size is a big factor. I would predict that C++, Java and C# tend to be larger, more monolithic projects, whereas JavaScript and Ruby have more broken up module ecosystems. JavaScript especially, with its "modularity shaming"...
This is a really good example of a useful slopegraph. I find so few of these in the wild that I often fail to articulate the value of the approach well enough for a client to buy into the idea before I build it.
It would be incredibly enlightening to see what languages people are moving to/from (like this, but for programming languages - https://www.facebook.com/notes/facebook-data-science/coordin...). I'd like to know what people are switching to from Ruby.
The top five languages were all created initially between 1991 and 1996. Is that by coincidence? Probably languages have a lifecycle and age matters a lot. The current top crop are about 20y.o. - just becoming adults. Would that mean that Swift and Rust will get in top 5 after year 2030?
I don't think that open issues/LoC is very relevant, since some languages, despite doing a lot in few lines of code (or even characters), still require an amount of mental effort similar to more verbose languages.
I very much like how GitHut used issues/commits. In my interpretation:
(1) If your project has a lot of commits and few issues it has a very high quality.
(2) If your project has few commits and many issues it has either very low quality or is not being developed.
(3) Having a lot of commits and a lot of issues and vice versa is kinda expected, since new features (commits) often introduce new bugs and small projects often have few of both.
When you cross that with popularity (new forks, new watchers) over the years you can narrow (2) with some confidence.
Using that approach is trickier when it comes to comparing languages, but the data GitHut gives seems to be in line with common knowledge, at least when it comes down to open source software and when you compare the most popular languages.
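The three cases above can be written down as a tiny classifier. This is only a sketch of the heuristic as stated: the cut-off values for "a lot" of commits or issues are arbitrary numbers I picked for illustration, and the code encodes the interpretation without validating it.

```python
def interpret(
    commits: int,
    issues: int,
    commit_threshold: int = 100,  # hypothetical cut-off for "a lot of commits"
    issue_threshold: int = 50,    # hypothetical cut-off for "many issues"
) -> str:
    """Map a project's commit and issue counts to cases (1) to (3)."""
    many_commits = commits >= commit_threshold
    many_issues = issues >= issue_threshold
    if many_commits and not many_issues:
        return "high quality"  # case (1)
    if many_issues and not many_commits:
        return "low quality or not being developed"  # case (2)
    return "expected"  # case (3): both high or both low


print(interpret(commits=500, issues=10))
print(interpret(commits=5, issues=200))
print(interpret(commits=500, issues=200))
```

Narrowing case (2) with popularity data, as suggested, would need a further input such as new forks or watchers over time.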
Not so sure about that. Lots of commits and few issues could mean that it's cool or interesting somehow, but isn't actually being used for much. Vice versa for few commits and many issues.
Hard to say much for sure without breaking down the details, who's discovering the issues, how many are real, how many are serious/blockers versus minor annoyances with workarounds or feature requests, are the commits new features, bug fixes, refactoring, etc.
The point I am (or should be) trying to make is that the graphs tell us a lot more about how the projects in each language are developing than they do about the individual merits of the language itself.
Ruby's development started in 1993 (February 24, to be exact). First full release was 1995 (December 21, that one I had to look up :)).
This is very interesting in my opinion: whenever someone asks why one of those languages doesn't do $featureA "like Java", you can just reply: "because Java wasn't a thing back then".
Groovy's creation year is wrong, it should be 2003 not 2004. It was first announced by creator James Strachan on 29 August 2003, and its very first release (Groovy 1.0 beta 1) was on 11 December of that same year.
Unfortunately, someone who became a "despot" of the project at its repository (Codehaus) on 4 May 2004 started referring to himself as Groovy's creator in publicity articles about a year ago. A few months ago, someone even tried deleting the Wikipedia link to James Strachan's webpage announcing the Groovy Language.
It's interesting to me that so many Objective C repositories have so few pushes yet so many forks. I wonder if it's because companies like Facebook and Square "release" open source projects on Github then move on to something else.
Like V-2's top-level comment explains, there are lots of ways to interpret these statistics and it's easy to jump to false conclusions if you don't take the time to look for context.
In Ruby's case, the total number of repositories on Github has continually increased -- it's just that since it was such a huge part of Github's early user base (the Rails community was probably the first big adopter of Github, which makes sense since Github itself is written in Rails) percentage-wise it has dropped significantly as more communities adopt Github.
Just a heads up; the page header (and footer?) does not render correctly on mobile. The top graph is centered and its data is impossible to read because of label overlapping. Interesting analysis nevertheless.
Swift wins the popularity contest: most watches per repository, third most forks per repository (R has most forks per repository). Anyone up for an iOS/Mac App with a statistics backend?
I'm not sure that kind of graph has a name; it looks like diagrams in circuits/code. My guess is that it's a custom kind of graph inspired by the tools they knew for visualizing complex relationships.
But I'd like to learn more about those kinds of graph if I'm wrong :)
According to the link below, a hammock plot is a generalization of a parallel coordinate plot where the lines are replaced by rectangles that are proportional to the number of observations they represent. This plot is different from both the traditional parallel coordinate plot and the hammock plot, since the category's width is proportional to its activity, not the line width. Maybe its type is still unnamed.
Tcl is still actively developed and has a sizable community interested in it. (see http://wiki.tcl.tk)
It doesn't show up much on Github since its main repositories are hosted on Sourceforge and Fossil. That includes the core language and most of the major extensions. Check the wiki for details.
This confirmed some of my suspicions. Ruby seems to be in decline, just like Perl, but not declared dead yet, maybe in a few years. Python looks like it's getting to that point as well, hard to tell though, will be clear in a year. Go is growing fast and already ahead of Perl.
[0] https://github.com/laravel/laravel
[1] https://github.com/laravel/laravel/blob/master/public/css/ap...
[2] https://github.com/primecoin/primecoin
[3] https://github.com/primecoin/primecoin/tree/master/src/qt/lo...
[4] https://github.com/trending