Interesting. To me pip-tools seems like a better fit for this use case, since it lets you specify a requirements.in file that gets compiled to a requirements.txt file. This allows you to selectively pin dependencies of other packages if you have some use case for doing so.
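A minimal sketch of that workflow (file names are the pip-tools defaults; the versions are just illustrative):

    # requirements.in -- only your direct dependencies, loosely specified:
    django<2.0

    # Compile it with:
    #   pip-compile requirements.in
    #
    # which produces a fully pinned requirements.txt along these lines,
    # including transitive dependencies:
    django==1.11.2
    pytz==2017.2          # via django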
Something related that would be nice, though, is having pip list -o only output packages that are in the requirements.in file if it exists, or otherwise only primary dependencies.
My development environment is not pinned. When I set up CI, I do it with both pinned (as it goes to production) and unpinned (from dev) requirements, both generated using pip-chill.
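If I'm remembering pip-chill's flags right, that looks something like this (file names are my own choice here):

    # pinned, for production/CI (only top-level packages, with versions):
    pip-chill > requirements.txt                    # e.g. django==1.11.2

    # unpinned, mirroring the dev environment:
    pip-chill --no-version > requirements-dev.txt   # e.g. django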
Eh, that would miss my two-star django package then.[1] 100 is steep, and django packages are almost universally named django-project-name, so you could just find repos starting with django- that have recent commits.
I would also be interested in seeing if a project has any significant number of stars, to filter out "homework assignments" that likely don't see any real use.
Most Django sites probably aren't public GitHub projects, though.
These are more likely Django apps... it'd be interesting to consider how many of them shouldn't even be mentioning Django at all in their requirements.txt files to avoid clashing with the Django version of the project you're importing their app into.
This gets to the setup.py/requirements.txt confusion.
setup.py is for specifying dependencies of a piece of code you're distributing, while requirements.txt is for specifying a known-good environment to run a piece of code in.
For example, my own personal site has a requirements.txt specifying Django 1.11.2, because that's the version I deploy and test on. But it uses several applications I've written and distribute, and those use setup.py to specify a dependency on Django 1.8, 1.10, or 1.11 (and those all use tox/Travis to test on the full matrix of Python and Django versions they support).
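Concretely, the split looks something like this (names and versions illustrative):

    # setup.py of a distributable app: declare the *range* you support
    from setuptools import setup, find_packages

    setup(
        name='my-django-app',          # hypothetical package name
        version='1.0',
        packages=find_packages(),
        install_requires=['Django>=1.8,!=1.9.*,<2.0'],  # 1.8, 1.10, or 1.11
    )

    # requirements.txt of the deployed site: one known-good pin
    Django==1.11.2

The range in setup.py is what lets two apps with different requirements coexist in the same site; the pin in requirements.txt is what makes deployments reproducible.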
A note about the use of BigQuery here: this problem is one of the very few cases where there is so much data that you'll actually have to pay money to run the query. (the query processes 2.21TB of data; you get 1TB free, then $5/TB).
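So a single run costs about (2.21 - 1.00) TB x $5/TB, i.e. roughly $6, assuming you haven't already used up the free tier that month.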
There might be insight to be gleaned from a better dataset than public repos on GitHub. It's a big leap to make inferences about the use of Django in the wild from this.
I'm happy to see that people are using the LTS release as intended. Not surprised at all that the newest releases are the least used ones. More than a little surprised that version 1.6 still has any users at all, let alone how many it actually does have.
For those not familiar with Django's release history, the 1.6 -> 1.7 release was a very large change in terms of how database migrations are handled. In 1.6 (and earlier) there was no built-in tool for it, but a very popular Django extension library called South was the standard. In version 1.7 the creator of South (Andrew Godwin) wrote a migration tool for Django core based on his previous work with South. There is a migration path from South to core migrations (sketched below), and it's not that scary to do, but it's a little work. That was several years ago at this point, though. I wonder if some projects just abandoned upgrading at 1.6 because of this.
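From memory, the upgrade path looks roughly like this (check the 1.7 release notes before doing it for real):

    # 1. Remove 'south' from INSTALLED_APPS in settings.py.

    # 2. Delete the old South migration files (keep the migrations
    #    packages and their __init__.py files):
    find . -path "*/migrations/*.py" -not -name "__init__.py" -delete

    # 3. Regenerate initial migrations with Django's built-in tool:
    python manage.py makemigrations

    # 4. Apply them; Django 1.7 detects the existing tables and
    #    fake-applies the initial migrations instead of recreating them:
    python manage.py migrate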
Any piece of software that doesn't essentially force an upgrade on you will end up used in a lot of deployments that never upgrade. Not much we can do about that.
Though you're right that people tend to "stick" on a version where the next step up is a trickier upgrade. That's happened a couple times in Django's history -- a lot of early deployments (pre-1.0) pinned for a long time on 0.91 since the Django ORM got completely rewritten for 0.95.
And post-1.0, the switch to class-based generic views also left some people behind. 1.11 is likely to be "sticky" too since it's the last version that will support Python 2 (the next release -- Django 2.0 -- will be Python-3-only, and 1.11's LTS support cycle is timed to expire when upstream support for Python 2.7 does).
It doesn't seem possible to reliably identify forks from the BigQuery GitHub dataset (you could estimate by deduping repos of the same name), which seems like a weakness.
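A rough sketch of that dedupe idea against the public dataset, assuming repo_name is always "owner/repo" and that forks usually keep the repo part of the name:

    -- Count each repo basename once, since the dataset has no fork flag
    -- (table and column names from bigquery-public-data.github_repos):
    SELECT
      SPLIT(repo_name, '/')[OFFSET(1)] AS repo_basename,
      COUNT(DISTINCT repo_name) AS copies
    FROM `bigquery-public-data.github_repos.files`
    WHERE path LIKE '%requirements.txt'
    GROUP BY repo_basename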
Probably because 95% of projects on GitHub are homework assignments for job interviews that never get updated after they're submitted.