Show HN: SQL Teaching – Codecademy for SQL

krat0sprakhar · on Feb 19, 2015

As a backend developer, I've been increasingly trying to reduce my reliance on ORMs and instead focus on writing SQL. Owing to my poor SQL chops, I spent a couple of weeks working through @danso's data journalism course[0]. It is a fantastic introduction to SQL, even for all non-technical people (forwarded to my business analyst friends as well).

Amongst other topics, it covers aggregations, joins, grouping and gives a good well-rounded introduction to performance characteristics and best practices. For those wondering "where to go from here", don't look any further and give the course a shot.

[0] - http://www.padjo.org/

PJDK · on Feb 19, 2015

Why the move away from ORMs? I've certainly banged my head against the wall with them any number of times, but I wouldn't want to give up the compile time checking of queries when I make a db change.

collyw · on Feb 19, 2015

Maybe it depends on your ORM.

I use the Django ORM most of the time, and it is great for quick stuff.

Say I have to do a bulk update, with a few conditions thrown in. MySQL becomes too much hassle trying to parse the first digit out of a string ,and updating another field with that. A simple script using the Django ORM won't take long to do that and will be far easier than trying to work around MySQL's limited string functions.

On the other hand Django's ORM doesn't handle some pretty common cases (conditional aggregates are the first thing that comes to mind). Or complex joins on more than one field. I looks around, and there are ways of hacking these things into the ORM, but they really don't seem worth the bother, when you can use SQL for the query.

jeffasinger · on Feb 19, 2015

This is why I'm so excited for django 1.8, it introduces the ability to use pretty much any SQL function in aggregates, annotations and order bys, including CASE.

collyw · on Feb 22, 2015

Where can I read about this?

PJDK · on Feb 19, 2015

Good point. I'm coming from the slightly spoiled .NET/LINQ world so you get (I think) 100% of SQL functionality built in.

leereeves · on Feb 19, 2015

I think learning SQL when you already know LINQ is a bit like learning C when you know C#.

Knowing one makes it easier to learn the other, and there's value to understanding the underlying abstraction layer, but you'll avoid using it if you can.

rjbwork · on Feb 19, 2015

I disagree. I love C#/LINQ, but after a couple years of Linq2SQL and EF, I'm back to preferring raw SQL via Dapper. I think it's simply a better programming model.

akbar501 · on Feb 19, 2015

I moved away from ORMs a while ago b/c I found myself bumping into edge cases too frequently and/or spending too much time trying to figure out how to write complex queries in the ORMs syntax.

In general, the move to SQL was initially painful, but is now a breeze.

With SQL, I don't have to learn an abstraction that is limited to a specific language. So whether I'm working in Java, Python or JavaScript/Node.js, the SQL is the same.

Also, there are numerous technologies that now provide an SQL-like language. For example, learning SQL made it easy to learn CQL (Cassandra), HiveQL, etc.

leereeves · on Feb 19, 2015

You can keep an ORM and use SQL only when needed, for example passing raw queries to the ORM, or using a view in the database engine.

Or use homegrown database wrapper module(s) and keep compile time checking in the vast majority of the code.

veb · on Feb 19, 2015

I believe that visualization plays a huge part in learning. In reality, you'll never see "current tables" looking like that. I checked Google Images ("sqlite result") and it seems that you could benefit heaps by designing your tables a bit better so they look like http://i.stack.imgur.com/9CXVO.png perhaps.

I would also pay a little more attention to explaining what something is, and why you do it like that. Instead to me it feels more like "copy and paste this command, click run" and so you're not really learning at all. Why SELECT? Why am I using *?

I realise these explanations may come in the later lessons, but from the very beginning I should know what these are (syntax-wise). Then, as things become progressively different I can understand and follow.

I love sites like this, I think they're so beneficial but my main gripes are usually just that: the results, they really need to feel more real-world, and practical-like. Then, I want to know the why/what/how about everything in what I'm doing.

That's just me though. I may be good at SQL, but I'm damn sure I've probably missed a lot of simple things!

Is there a way to sign my email up for updates? I'd like to follow the progress!

EDIT: Ah there is: https://github.com/rhc2104/sqlteaching

rhc2104 · on Feb 19, 2015

Thanks for the feedback!

SQL Teaching is designed for non-technical people that want to learn the basics of data analysis. Just learning SELECT with WHERE clauses and basic joins get them pretty far.

The Github repo is: https://github.com/rhc2104/sqlteaching , so I guess following that repo is a way of following progress.

veb · on Feb 19, 2015

Heh, found it just before your comment. :)

Keep up the great work.

chernevik · on Feb 19, 2015

IMHO, I think the material here is too brief, and insufficiently progressive, to help a beginner move to confidence. To master concepts, people need repetition, and application of concepts to problems of sufficient difference to see the fundamentals of the concept apart from any particular. Giving people a dozen tasks, each introducing some different concept, won't leave them with a basis for a confident understanding.

I also think beginners need:

- A strong grasp of DISTINCT. This prepares them for the division of tables into subsets by GROUP BY, which is vital to an understanding of how aggregation and window functions operate. ORDER BY can used to show beginners how they can first organize the data, and see how different values in different fields can be used to segregate records into groups.

- WHERE is important but actually not that hard to communicate. The key things about WHERE are one, you can specify true / false expressions whose evaluation dictates inclusion or exclusion of a record (this is crucial to understanding how a JOIN works), and two, you can use IN to compare a record's value to a _list_ of values. Once someone grasps this, it becomes very easy for them to see how a subquery works.

- A strong grasp of how subqueries work. One of the great strengths of SQL is the ability to encapsulate complexity into a subquery and pass that result on to another query for further use.

- A thorough walk-through on "how" JOINs work. What does it mean to JOIN two records? To JOIN two tables? How do we "filter" that join of two tables? (All of this at a _conceptual_ level, taking care to note that how we might think through a complicated join is _not_ how the operation is implemented by the software.) What is the difference between an INNER JOIN and a LEFT JOIN, and how are these differences useful?

- Some discussion of just what exactly SQL is, what it is used for, and what it isn't used for. The tool has to be placed into some context so that beginners are oriented to where they will encounter it and what they can use it for. In particular they need to understand its interface, and the textual / non-visual nature of that interface. Most people are accustomed to working with computers through GUIs. The textual nature of SQL makes it very different, in profound ways, from the vast majority of tools people use on computers.

buckbova · on Feb 19, 2015

If your code is littered with DISTINCT clauses either you are doing something wrong or the data design is poor.

DISTINCT leads to expensive SORT operations and effectively poor performance.

swyman · on Feb 19, 2015

I'm genuinely curious. Would you mind elaborating or pointing me to another resource that goes into detail on why this is the case?

buckbova · on Feb 20, 2015

The poor design part or the performance part?

The use of DISTINCT everywhere smells of denormalized tables. Now if you are in data warehouse or reporting position, then this likely makes sense. I tend to work in transactional applications and keep the redundancy down to an absolute minimum. I abstract away some of the complicated queries with views, procs, and functions where allowable.

As for performance, different rdbms implement this differently, but the general query plan category for GROUP BY and DISTINCT is SORT. If you are querying something where you have multiple subqueries with DISTINCT and you and ordering the final results, you are adding extra sort operations to the query plan, hence hurting performance.

I tend to design everything to optimize read operations because there tends to be much more read than write in systems I work in. For me this means denormalized and heavily indexed where table/index scans are extremely rare. On smaller data sets, some rdbms always scan tables because it saves operations based on table statistics.

I hope this helps.

xtrumanx · on Feb 19, 2015

I guess this is the client-side alternative to SQLZOO[0] which is a fantastic resource I used to use back in the day to sharpen my SQL skills.

Been doing SQL for so long now but only recently have I been exposed to how powerful indices and learning how to read the execution plan is.

I realize this is beginner stuff based on a local browser-based SQLite but I wish more learning resources like this and SQLZOO had an advanced section to discuss performance issues you may encounter once you have a large enough dataset. For years I just assumed we just had so much data that having to wait for the query to complete was natural until I tried figuring out how to resolve a deadlock issue we've been having an accidentally figured out how to tune queries to run instantaneously.

There are probably many people out like my former self who know the basics and also have to maintain huge databases.

By the way, if you're one of them, "Use the Index, Luke"[1] is where I learned everything I needed to know about creating indices (though I don't think it has much about reading execution plans).

[0] http://sqlzoo.net [1] use-the-index-luke.com

hobs · on Feb 19, 2015

It depends on what engine you are using once you are talking about xplans, if you are into SQL Server and you are looking for a great book to reference thats almost all practical "this is how stuff is going to be interpreted", check out http://www.amazon.com/SQL-Server-Query-Performance-Tuning/dp... by Grant Fritchey (the previous book had Sajal Dam)

Additionally, SQLPass puts out a great DVD every year and in the past year or two they had some great talks about how to read xplans, why they are actually lies, and going into things like statistics io, measuring recompile/cpu time, determining if implicit conversions are causing issues, etc etc etc

xtrumanx · on Feb 19, 2015

It seems that the SQLPass requires registration to enter the "Session Recordings" page so I'll have to look into it later.

I googled "execution plan lie" and the only stuff that comes up is regarding how Oracle's Explain Plan sometimes provided inaccurate information. Is that what you were referring to?

I guess explains why Microsoft labelled their version of Explain Plan "Estimated Execution Plan". I mostly just include the actual execution plan when running the query so I can get the real execution plan and the io stats all in one go.

hobs · on Feb 19, 2015

Actual execution plan's costs in SQL Server still contain a lot of estimates, I have seen SQL Server guide you in the wrong direction if you are looking at the thing "costing" the most, when in fact I find when I am doing perf tuning my biggest first steps are: 1. Is the code pants on head stupid and not set based or not sargable 2. Do the query plans clearly misinterpret the data and why that is

My current process is: 1. set statistics io on; set nocount on; 2. Grab actual execution plan, throwing it in sql sentry plan explorer (free edition) (holy crap good) 3. Finding which estimates are off in the breakdown in 2 4. Investigating specific code areas where estimates are vastly different than actual, fixing whatever issue (implicit conversions, udfs, old stats, etc) arises 5. Consider adding indexes if needed 6. Paste outputs of both stats into http://www.statisticsparser.com/ (you can have it print things as headers, so I like print 'test a' go exec sp_proc @params go print 'test b' go exec sp_proc2 @params ) 7. Compare if I did it right or not by checking overall reads, cpu time, etc

Update: I have the videos in question on my dropbox, I dont know if they would like a public link, so if you dont want to sign up let me know via my email in my profile and I will send them to you.

brudgers · on Feb 19, 2015

One thing that makes CodeAcademy successful is an attention to gamification. I think sketching some of that out is a priority for the next iteration. Clear game mechanics is what keeps people coming back to a platform like this.

andrewstuart2 · on Feb 19, 2015

I would most definitely love to see some table design thrown in there as well, with ER diagrams, cardinality, and normalization (bonus points for really explaining why well). I just noticed this is on GitHub, so if I get a chance, I'd certainly like to contribute :-)

From my talks with a few of my great programmer friends, it seems this is an area that more people could use some bolstering.

clay_to_n · on Feb 19, 2015

In the 'Select specific columns' lesson, FROM isn't capitalized. Not an error but could cause confusion. Same with AND in its lesson. It might also be nice in the first lesson to explain that the caps aren't necessary, but make it more understandable.

In the first lesson, after the paragraph "Imagine we have a table...", it might help to display an actual table there. I was a bit confused, reading through the whole page and not visually seeing any table. Only after I started writing the command did I realize I had a table underneath the input box.

Overall, looks really good! Not a bad place to start for developers who haven't used SQL before but want to get a simple working knowledge of it.

teddyuk · on Feb 19, 2015

<personalOpinionPleaseDontCry>Capitalization doesn't make it more readable - just use a ide (or whatever) with syntax colouring and your sorted. </personalOpinionPleaseDontCry>

vertex-four · on Feb 19, 2015

It is a convention, much the same as variable/function/class naming conventions and whitespace conventions, that makes it easier to read an arbitrary piece of code when you come up to it for the first time.

rhc2104 · on Feb 19, 2015

Thanks for letting me know! Someone created a pull request to fix this, and I have pushed the change live.

contradictioned · on Feb 19, 2015

Also: "SQL Island", where you text-adventure-like play a game through SQL http://wwwlgis.informatik.uni-kl.de/extra/game/

solaris999 · on Feb 19, 2015

That took me to the German version, you can get to the English version (which is at the same url) through this page: http://wwwlgis.informatik.uni-kl.de/cms/courses/informations...

adamnemecek · on Feb 19, 2015

There's also Schemaverse https://schemaverse.com/

I've never played it but it looks interesting.

Orangeair · on Feb 19, 2015

Is there by any chance an English version available?

contradictioned · on Feb 19, 2015

Mh on my settings it was english by default. With this parameter it should work: http://wwwlgis.informatik.uni-kl.de/extra/game/?lang=en

Orangeair · on Feb 19, 2015

Weird. It did not want to be in English for me on Chrome or IE. Thanks for the link.

nissimk · on Feb 19, 2015

Looks good for the basics, but as some other folks mentioned here, it would benefit from some articles about design, and even more importantly, rationale. I hear from so many programmers "why?" They don't understand the benefits of relational. This perspective ranges from old school programmers that just want to store data in the file system, to newcomers that like mongoDB or other object stores.

In my point of view, the main answer to "why relational" is that with an rdbms you can answer ad-hoc questions by writing a query rather than writing a program. Even as a programmer, having the power of SQL on the server is really great for analytics.

Furthermore, as an ORM hater, I would say that query results should be processed in your programming language as a table structure rather than converting to specific objects per table. If you're building a CRUD system and the screens update one or two objects/tables at a time, then an ORM is useful, but if you're doing any kind of analytics, any object structure other than a table or list of dictionaries is cumbersome and encourages code over sql which is a bad thing.

kiyoto · on Feb 19, 2015

>Looks good for the basics, but as some other folks mentioned here, it would benefit from some articles about design, and even more importantly, rationale.

I agree about rationale, but probably in a different way. For a great many people, especially in the "data analyst camp" (i.e., people coming from finance/consulting background with mind-blowing Excel skills), SQL is not something they want to learn in the vacuum but more like necessary evil. For them, it's hugely valuable to learn how and why SQL can be useful.

http://blog.treasuredata.com/2014/12/05/learn-sql-by-calcula... is my humble attempt to address that issue, and it seems to have resonated well with the data analyst type folks.

Just offering a non-programmer's perspective.

thanksgiving · on Feb 19, 2015

> encourages code over sql which is a bad thing

Not saying you are incorrect but you made an assertion here. Can you please elaborate?

astine · on Feb 19, 2015

Not certain about the OP's reasoning, but doing aggregate functions on the database server is often more performant than doing the same in your application. This is, in part, because of the smaller result set sizes.

Also, it's useful having logic baked into the database rather than your application in the case that you need a different application to connect to your database. It's a problem, for example, if you're maintaining your foreign key relationships through ActiveRecords's relations and ignoring it on the server. It makes it much easier for junk data to get in the database.

AlisdairO · on Feb 19, 2015

Good stuff!

You might want to be a little careful with ensuring an aggressive query timeout. I ran the following:

select * from family_members f1, family_members f2, family_members f3, family_members f4, family_members f5, family_members f6, family_members f7, family_members f8

This took a while to run - I didn't go any further as I didn't want to damage the site during your launch, but add a few more cartesian products in there and it could hurt.

rhc2104 · on Feb 19, 2015

Thanks for the heads up. I actually use Sql.js (SQLite in JavaScript), so there is currently no server-side component of the website.

AlisdairO · on Feb 19, 2015

Ah, fantastic! I assumed based on the mention of SQLite on the front page.

mdellabitta · on Feb 19, 2015

This is pretty much the DOS version of "why are you hitting yourself"

AlisdairO · on Feb 19, 2015

Indeed :-)

I actually had no idea sql.js existed up until now. Seems like it could be pretty useful.

asafira · on Feb 19, 2015

I was looking for a way to sharpen my SQL skills a few months ago, and after being annoyed with some of the very, very beginner stuff out there I found this:

http://www.sql-ex.ru/

The site can be slow sometimes, but I have found queries/puzzles that (a) have shown me use-cases for functions and capabilities I have never used and (b) can be challenging! Later they apparently focus more on optimization, but I haven't gotten there. (Especially since they have hundreds of these more difficult queries). I've done about 70ish so far.

What do you guys think about it? Overall, never going to be useful, or a good resource to get better with SQL?

villek · on Feb 19, 2015

Looks like a great start!

One thing that would be essential is better feedback on errors. Currently, if I enter an invalid SQL query I’m only told that it was incorrect. For beginners, this can be frustrating.

avinassh · on Feb 19, 2015

I have learnt SQL from https://sqlzoo.net It's a really good site with options of other database system syntax also.

hobs · on Feb 19, 2015

As someone who uses a LOT of SQL, and who has taught people before, I like this as a start, but it definitely needs more visualzation, and I recommend how sqlzoo represents tables in the top right with an overlay, its a simple and clear break down.

Something like http://blog.codinghorror.com/a-visual-explanation-of-sql-joi... may help with additional explanations.

I also think that things like sargability need to be discussed early on because while sql/set based stuff can be odd to grok at first, knowing why you are doing the things you do is important in laying the right foundations.

AlisdairO · on Feb 19, 2015

This is a self-plug, but you might find http://pgexercises.com useful as well.

avinassh · on Feb 22, 2015

Hey your exercises are really good. I am half way through it and so far I am enjoying it a lot.

AlisdairO · on Feb 22, 2015

fab - I'm really glad you like it!

MegaLeon · on Feb 19, 2015

Returns a "webpage not available" for me. However, if I access it from http://sqlzoo.net/wiki/Main_Page it seems fine.

elyase · on Feb 19, 2015

I also like http://datamonkey.pro.

uberneo · on Feb 19, 2015

looks good and might be helpful for somebody starting with SQL . You can add some advanced version as well with some more complex examples and use cases with inner queries , UPSERTS n all

domoarevil · on Feb 19, 2015

Cool, will check, thanks.

Despite the clunky UI, I've found that www.sql-ex.com provides the best medium to tricky problems for those wanting a refresher. (MS T-SQL centric.)

zer00 · on Feb 19, 2015

This is awesome!

One thing though, any reason you have Mary making 10% less than Dave in your first data set? Seems like kind of a weird thing to include.

warkid · on Feb 19, 2015

Cool! One should be able to run his code by pressing something like Ctrl/Alt-Enter, instead of reaching for mouse all the time.

nodesocket · on Feb 19, 2015

Very nice. Love to see a few more intermediate examples using LEFT and RIGHT OUTER and INNER JOIN, HAVING, UNION, and sub-queries.

gauravgupta · on Feb 19, 2015

Looks useful. You should submit this on http://hackr.io as well.

allworknoplay · on Feb 20, 2015

Good start. Many others have suggested other query types. I would also suggest that you cover inserts as well.

mohdmaqboolalam · on Feb 19, 2015

completed the course thanks for the tutorials.