Show HN: A midterm test on basic SQL for my journalism class (padjo.org)
120 points by danso on Oct 24, 2014 | 24 comments



This is fantastic!

We're in a technological world now, and not knowing how to use that technology puts one at a severe disadvantage.

I'm very glad to see other fields start using technology and actually teaching it at universities.

I have a friend in the Political Science program at his university, where they are teaching everyone to use the R programming language. He couldn't stop complaining until I laid out everything he could do with it.


I think teaching a declarative language like SQL along with some basic, common queries is a lot easier than teaching R to someone with no technical background. Even as a CS student, it took me a couple weeks to get the hang of R.

In general, I see a trend towards teaching data science and analytics to people who aren't software engineers. I think SQL is an excellent way to bring analytics skills to the masses (more so than Python, R, or Java).


R is taught specifically because it is a statistical analysis language, which is what a Poli Sci student can expect to be doing a lot of after school.

R isn't really "for CS students", it's for statisticians and analytics professionals.

Just like chemists often write their own chemical modelling programs, it takes a lot of domain-specific knowledge to craft the correct algorithms for things like this.


That's good to know. As someone with a VERY solid grasp of SQL, I found R to be baffling (though I've only tinkered with it a few times).


Trying to use R to do statistics -- graphs, linear regression, t-tests, etc. -- is the best way to get your feet wet. Ironically, it's not great for data management and relations. (Although there are very good packages for that sort of thing: data.table, dplyr, and sqldf all come to mind.)


Especially when highlighting the parallels of SQL with Excel.

"Note: Alternatively, if you wanted to reduce the amount of Pivot Table work, you could've done the aggregations inside of SQL."


Cool! It'd be great for more journalists to learn SQL. It was originally designed to let managers run reports without asking the programmers, so perhaps there is hope.

I see you have discussed GROUP BY. In that case, you should also talk about HAVING. It's just like WHERE, but it runs after grouping instead of before. It can be really useful, and a lot of people don't even know it's there.
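
To make the WHERE/HAVING distinction concrete, here's a minimal sketch using Python's built-in sqlite3 module. The `inspections` table and its rows are made up for illustration; they aren't from the midterm.

```python
import sqlite3

# Hypothetical data: restaurant inspections with a violation count per visit.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE inspections (restaurant TEXT, year INTEGER, violations INTEGER)"
)
conn.executemany(
    "INSERT INTO inspections VALUES (?, ?, ?)",
    [
        ("Taco Hut", 2013, 4),
        ("Taco Hut", 2014, 6),
        ("Noodle Bar", 2013, 9),
        ("Noodle Bar", 2014, 1),
        ("Pie Shop", 2014, 2),
    ],
)

# WHERE filters individual rows before grouping: only 2014 inspections count.
# HAVING filters the groups afterwards: only restaurants whose 2014 total exceeds 1.
rows = conn.execute(
    """
    SELECT restaurant, SUM(violations) AS total
    FROM inspections
    WHERE year = 2014
    GROUP BY restaurant
    HAVING SUM(violations) > 1
    ORDER BY restaurant
    """
).fetchall()

for restaurant, total in rows:
    print(restaurant, total)
```

Noodle Bar has 9 violations in 2013, but WHERE throws that row away before the sums are computed, so its 2014 total of 1 fails the HAVING test.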



Did you learn regular expressions as well? I bet they'd be very useful for journalists / journalism majors.


Yes...teaching regexes is one of my goals...since the beginning I've forced them to jump through the hoops of setting up GitHub and submitting Markdown files, to get used to the idea of dealing with plain text (which, of course, is key for understanding CSVs)...so regexes are a natural thing to learn for finding patterns and, on a day-to-day basis, cleaning data...I think in terms of technical skills, it's probably the most useful thing I could teach given how much time is ultimately spent on data munging.
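
As a rough illustration of the kind of day-to-day cleaning a regex handles, here's a small sketch with Python's `re` module. The phone-number column, the pattern, and the `normalize` helper are all assumptions invented for this example, not something from the class materials.

```python
import re

# Hypothetical messy phone numbers, as they might appear in a CSV column.
raw = ["(650) 555-0100", "650.555.0101", "650 555 0102", "n/a"]

# Three digits, three digits, four digits, with optional punctuation between.
PHONE = re.compile(r"\(?(\d{3})\)?[\s.-]*(\d{3})[\s.-]*(\d{4})")


def normalize(value):
    """Return the number as 650-555-0100, or None if no phone number is found."""
    match = PHONE.search(value)
    if match is None:
        return None
    return "-".join(match.groups())


cleaned = [normalize(v) for v in raw]
print(cleaned)
```

The capture groups pull out just the digits, so every variant ends up in one format and anything that doesn't match comes back as None for manual review.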


Have you seen http://software-carpentry.org/ ?

I haven't looked closely enough at what you are doing or what they are doing to pretend to make a comparison, but I guess it isn't too risky to say that they are at least thinking about some of the same things you are.


Yes I have...they're hitting a lot of what I want to do in a class for next quarter, which will be heavily command-line focused. Besides using some of their lessons, hopefully I'll come up with a few of my own to contribute to the project. But I do agree that both scientists and journalists could benefit a lot from learning how to work with their computers at a lower level.


http://regex101.com/ is (I think) a really nice way to learn/use regex. It explains each piece, displays capture groups from the regex applied to sample data, and has built-in documentation. Simple example: http://regex101.com/r/nB1bR1/1


I think this is just excellent. I'm more of a tech person than a journalist, so I don't expect to find anything new for me there in the sense of using technology, yet I will surely go through it, as I find the approach itself interesting enough.

By the way, are there other courses, books, or sets of exercises I could use to learn more about the non-technological, "liberal" side of journalism? I've always wondered what makes a person a journalist, but unfortunately never became one myself.


This is excellent! I've taught a graduate journalism class in the past and danso's approach here is inspiring -- I hope other instructors pay attention.

Though note the class appears to be "COMM273D/Public Affairs Data Journalism," so the students enrolling presumably are willing to put in the effort required to learn a bit about SQL (I don't know if this approach would be as successful in a basic journalism course).


Interesting! I'm teaching journalism students how to build mobile apps on iOS with Objective-C at UT-Austin. For most of them, this is their first experience programming.


It's so great to see such accessible teaching materials for the next generation of data journalists. Almost makes me wish I studied journalism in college.


With the heavy reliance on ORM magic in modern frameworks, how long before students in a class like this know SQL better than some professional developers?


You can use SELECT inside of SELECT now, and MySQL is usually smart enough to plan and optimize the join for you.
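
For anyone who hasn't seen a SELECT inside a SELECT, here's a minimal sketch showing a subquery and the equivalent explicit join returning the same rows. It runs on SQLite via Python's sqlite3 module rather than MySQL so it's self-contained; the `schools` and `scores` tables are hypothetical, and it says nothing about how either engine plans the query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE schools (id INTEGER PRIMARY KEY, name TEXT, county TEXT);
    CREATE TABLE scores (school_id INTEGER REFERENCES schools(id), score INTEGER);
    INSERT INTO schools VALUES
        (1, 'Lincoln High', 'Alameda'),
        (2, 'Grant High', 'Marin'),
        (3, 'Hayes High', 'Alameda');
    INSERT INTO scores VALUES (1, 80), (2, 90), (3, 70);
    """
)

# SELECT inside of SELECT: the inner query finds the Alameda school ids.
subquery = conn.execute(
    """
    SELECT score FROM scores
    WHERE school_id IN (SELECT id FROM schools WHERE county = 'Alameda')
    ORDER BY score
    """
).fetchall()

# The same question written as an explicit join on the foreign key.
joined = conn.execute(
    """
    SELECT scores.score FROM scores
    JOIN schools ON schools.id = scores.school_id
    WHERE schools.county = 'Alameda'
    ORDER BY scores.score
    """
).fetchall()

print(subquery == joined)
```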


Why MySQL and SQLite, but not PostgreSQL?


Besides being the two variants I'm most familiar with, MySQL and SQLite have the widest variety of GUIs and, ostensibly, the most help docs...but the point of the SQL isn't for them to learn even the most basic things about database admin (I decided to skip over indexing and I just provide the SQL dumps for imports)...it's: 1) be able to handle datasets of more than a million rows (Excel and Google Spreadsheets top out at around 1M rows), 2) be able to join datasets on foreign keys (in the past, I've tried Fusion Tables, but the merging function is a bit wonky and pretty inflexible)...and 3) ...something that I've since realized: it's a great way to show the difference between learning how to use software (e.g. Excel, Google Spreadsheets) and learning how to tell the computer explicitly what you are trying to find.

The most practical difference for me is that it's much easier for me to describe a data querying process through pseudo-SQL than it is to describe all the submenus and clicking and highlighting you have to do in a spreadsheet....the tradeoff being, you have to get past the wall of learning SQL syntax. But I was very surprised...the students picked it up very quickly, even a student-athlete who hasn't been able to come to a single class...(and no, I don't think someone is doing his homework for him, since I meet with him weekly and he's since been able to work on SQL and datasets on top of what I've assigned him)...

They're still a long way from understanding SQL to the point of being able to administer a DB, but that's OK, the class is about querying and investigating data...and using SQL to the point where they can wrangle the data to analyze/visualize it in other software (such as a spreadsheet)


You're doing great work! The students that graduate with this SQL knowledge will be empowered to find needles in haystacks.


This is pretty cool - I had no idea journalism students had to know SQL. Thanks for sharing!


Having worked with journalists, I'm still shocked that they don't know SQL. It seems like a perfect fit.



