Analyzing Django requirement files on GitHub (pyup.io)
143 points by jayfk on June 8, 2017 | 30 comments



> Among all projects, more than 60% use a Django release with one or more known security vulnerabilities. Only 2% are using a secure Django release.

Probably because 95% of projects on GitHub are homework assignments for job interviews that never get updated after they're submitted.


I tend not to pin versions. When I generate requirements files, I use pip-chill (disclaimer: I made it) to avoid listing redundant dependencies.
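For context, a rough sketch of the difference (exact output format may vary by pip-chill version):

    $ pip freeze > requirements.txt   # everything installed, transitive deps included
    $ pip-chill > requirements.txt    # only the packages you installed directly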


Interesting. To me, pip-tools seems like a better fit for this use case: you specify a requirements.in file that gets compiled to a requirements.txt file. This also lets you selectively pin dependencies of other packages if you have some reason to.
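For anyone unfamiliar with it, the pip-tools workflow is roughly this (package names are just examples):

    # requirements.in -- hand-maintained, loosely pinned
    django
    requests

    $ pip-compile requirements.in
    # writes requirements.txt with every package, transitive
    # dependencies included, pinned to an exact version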

Something related that would be nice, though, is having pip list -o only output the dependencies listed in requirements.in when that file exists, or otherwise only the primary dependencies.


You can tell it to generate the list without version numbers so you can get a clean requirements.in for pip-tools to compile.
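So something along these lines should work (flag name as I remember it from pip-chill's docs):

    $ pip-chill --no-version > requirements.in
    $ pip-compile requirements.in   # pip-tools does the pinning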


I didn't use to pin versions either, until last weekend, when I watched https://pypi.python.org/pypi/requests go through 15 revs within 3 days.


I said I tend not to. I never said I never do it. ;-)

My development environment is not pinned. When I set up CI, I run it with both pinned (as it goes to production) and unpinned (from dev) requirements, both generated using pip-chill.


Agreed. Would be interested to hear what happens when you filter for projects that have had any activity in the last month (say).

Jay (at PyUp) - any idea?


The query I ran against the GitHub dataset has no activity/age/popularity data, just plain requirements files.

I'll write a follow up post on this :)


I'd add in a minimum of 100 or so stars also. Anything less than that is probably a personal project and not a package.


Eh, that would miss my two-star Django package then.[1] 100 is steep, and Django packages are almost universally named django-project-name, so you could just look for repos starting with django- that have recent commits.

[1] https://github.com/audiolion/django-groups-cache


You may be interested in the GHTorrent database on BigQuery:

https://bigquery.cloud.google.com/dataset/ghtorrent-bq:ght


I would also be interested in filtering on whether a project has a significant number of stars, to weed out "homework assignments" that likely don't see any real use.


Awesome, looking forward to it, and nice work on PyUp!


Really? I know "professional" projects using Django versions that are seven years old...


One of the companies I work with has been using Django for sites since it was in beta. They are always years behind in releases.

Sometimes you can build a small but profitable business and not do everything right from a technical perspective.

Still, you'd never see the code on a public GitHub repo.


Most Django sites probably aren't public GitHub projects, though.

These are more likely Django apps... it'd be interesting to consider how many of them shouldn't be mentioning Django in their requirements.txt files at all, to avoid clashing with the Django version of the project you're importing their app into.


This gets to the setup.py/requirements.txt confusion.

setup.py is for specifying dependencies of a piece of code you're distributing, while requirements.txt is for specifying a known-good environment to run a piece of code in.

For example, my own personal site has a requirements.txt specifying Django 1.11.2, because that's the version I deploy and test on. But it uses several applications I've written and distribute, and those use setup.py to specify a dependency on Django 1.8, 1.10 or 1.11 (and those all use tox/Travis to test on the full matrix of Python and Django versions they support).
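In concrete terms, it looks roughly like this (names and versions here are illustrative, not copied from my actual projects):

    # setup.py of a reusable app: declare the range you support
    from setuptools import setup, find_packages

    setup(
        name='django-someapp',
        version='1.0',
        packages=find_packages(),
        install_requires=['Django>=1.8,<2.0'],
    )

    # requirements.txt of the deployed site: pin what you actually test on
    Django==1.11.2
    django-someapp==1.0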


A note about the use of BigQuery here: this problem is one of the very few cases where there is so much data that you'll actually have to pay money to run the query (it processes 2.21 TB of data; you get 1 TB free, then it's $5/TB).


Yep. If I remember correctly, I paid ~$8 to run the query. That's one of the reasons I made the raw results public at https://github.com/pyupio/github-requirements
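For anyone curious, the query is roughly of this shape against the public GitHub dataset (a simplified sketch, not the exact query; parsing the file contents happens afterwards):

    SELECT f.repo_name, f.path, c.content
    FROM `bigquery-public-data.github_repos.files` AS f
    JOIN `bigquery-public-data.github_repos.contents` AS c
      ON f.id = c.id
    WHERE f.path LIKE '%requirements%.txt'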


Oh my, X%! Did the upvoters see the many placeholders in the article, or were they asked to vote by someone?


It's pretty interesting regardless; you can eyeball the other percentages. The "Most projects are still on Django 1.8" insight is a good one.

Not sure how this short article could have rocketed to the very top of the front page so quickly, though...


There might be insight to be gleaned from a better dataset than public repos on GitHub. It's a big leap to make inferences about the use of Django in the wild from this one.


I'd love to get my hands on a better dataset than public repos on GitHub. Any ideas?


Snarkiness aside, fixed :)


I'm happy to see that people are using the LTS release as intended. Not surprised at all that the newest releases are the least used ones. More than a little surprised that version 1.6 still has any users at all, let alone how many it actually does have.

For those not familiar with Django's release history: the 1.6 -> 1.7 release was a very large change in terms of how database migrations are handled. In 1.6 (and earlier) there was no built-in tool for it, but a very popular Django extension library called South was the de facto standard. In 1.7, the creator of South (Andrew Godwin) wrote a migration framework for Django core based on his previous work with South. There is a migration path from South to core migrations, and it's not that scary to do, but it's a little work. That was several years ago at this point, though. I wonder if some projects just abandoned upgrading at 1.6 because of this.
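If memory serves, the path is roughly this; double-check against the official "Upgrading from South" notes before doing it on anything real:

    # 1. remove 'south' from INSTALLED_APPS and delete the old numbered
    #    migration files (keep each app's migrations/__init__.py)
    # 2. have Django generate fresh initial migrations
    $ python manage.py makemigrations
    # 3. mark them as applied, since the tables already exist (1.8+ flag)
    $ python manage.py migrate --fake-initial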


Any piece of software that doesn't essentially force an upgrade on you will end up used in a lot of deployments that never upgrade. Not much we can do about that.

Though you're right that people tend to "stick" on a version where the next step up is a trickier upgrade. That's happened a couple times in Django's history -- a lot of early deployments (pre-1.0) pinned for a long time on 0.91 since the Django ORM got completely rewritten for 0.95.

And post-1.0, the switch to class-based generic views also left some people behind. 1.11 is likely to be "sticky" too since it's the last version that will support Python 2 (the next release -- Django 2.0 -- will be Python-3-only, and 1.11's LTS support cycle is timed to expire when upstream support for Python 2.7 does).


Team 2% unite. 1.11 rocks.


Since 1.11 is an LTS release, presumably its share will increase. It'll be the default for new projects now.


Any reason why you didn't exclude forks from this data?


It doesn't seem possible to reliably identify forks from the BigQuery GitHub dataset (you could estimate by deduping repos of the same name), which seems like a weakness.



