As a backend developer, I've been increasingly trying to reduce my reliance on ORMs and instead focus on writing SQL. Owing to my poor SQL chops, I spent a couple of weeks working through @danso's data journalism course[0]. It is a fantastic introduction to SQL, even for all non-technical people (forwarded to my business analyst friends as well).
Amongst other topics, it covers aggregations, joins, grouping and gives a good well-rounded introduction to performance characteristics and best practices. For those wondering "where to go from here", don't look any further and give the course a shot.
Why the move away from ORMs? I've certainly banged my head against the wall with them any number of times, but I wouldn't want to give up the compile time checking of queries when I make a db change.
I use the Django ORM most of the time, and it is great for quick stuff.
Say I have to do a bulk update, with a few conditions thrown in. MySQL becomes too much hassle trying to parse the first digit out of a string ,and updating another field with that. A simple script using the Django ORM won't take long to do that and will be far easier than trying to work around MySQL's limited string functions.
On the other hand Django's ORM doesn't handle some pretty common cases (conditional aggregates are the first thing that comes to mind). Or complex joins on more than one field. I looks around, and there are ways of hacking these things into the ORM, but they really don't seem worth the bother, when you can use SQL for the query.
This is why I'm so excited for django 1.8, it introduces the ability to use pretty much any SQL function in aggregates, annotations and order bys, including CASE.
I think learning SQL when you already know LINQ is a bit like learning C when you know C#.
Knowing one makes it easier to learn the other, and there's value to understanding the underlying abstraction layer, but you'll avoid using it if you can.
I disagree. I love C#/LINQ, but after a couple years of Linq2SQL and EF, I'm back to preferring raw SQL via Dapper. I think it's simply a better programming model.
I moved away from ORMs a while ago b/c I found myself bumping into edge cases too frequently and/or spending too much time trying to figure out how to write complex queries in the ORMs syntax.
In general, the move to SQL was initially painful, but is now a breeze.
With SQL, I don't have to learn an abstraction that is limited to a specific language. So whether I'm working in Java, Python or JavaScript/Node.js, the SQL is the same.
Also, there are numerous technologies that now provide an SQL-like language. For example, learning SQL made it easy to learn CQL (Cassandra), HiveQL, etc.
I believe that visualization plays a huge part in learning. In reality, you'll never see "current tables" looking like that. I checked Google Images ("sqlite result") and it seems that you could benefit heaps by designing your tables a bit better so they look like http://i.stack.imgur.com/9CXVO.png perhaps.
I would also pay a little more attention to explaining what something is, and why you do it like that. Instead to me it feels more like "copy and paste this command, click run" and so you're not really learning at all. Why SELECT? Why am I using *?
I realise these explanations may come in the later lessons, but from the very beginning I should know what these are (syntax-wise). Then, as things become progressively different I can understand and follow.
I love sites like this, I think they're so beneficial but my main gripes are usually just that: the results, they really need to feel more real-world, and practical-like. Then, I want to know the why/what/how about everything in what I'm doing.
That's just me though. I may be good at SQL, but I'm damn sure I've probably missed a lot of simple things!
Is there a way to sign my email up for updates? I'd like to follow the progress!
SQL Teaching is designed for non-technical people that want to learn the basics of data analysis. Just learning SELECT with WHERE clauses and basic joins get them pretty far.
IMHO, I think the material here is too brief, and insufficiently progressive, to help a beginner move to confidence. To master concepts, people need repetition, and application of concepts to problems of sufficient difference to see the fundamentals of the concept apart from any particular. Giving people a dozen tasks, each introducing some different concept, won't leave them with a basis for a confident understanding.
I also think beginners need:
- A strong grasp of DISTINCT. This prepares them for the division of tables into subsets by GROUP BY, which is vital to an understanding of how aggregation and window functions operate. ORDER BY can used to show beginners how they can first organize the data, and see how different values in different fields can be used to segregate records into groups.
- WHERE is important but actually not that hard to communicate. The key things about WHERE are one, you can specify true / false expressions whose evaluation dictates inclusion or exclusion of a record (this is crucial to understanding how a JOIN works), and two, you can use IN to compare a record's value to a _list_ of values. Once someone grasps this, it becomes very easy for them to see how a subquery works.
- A strong grasp of how subqueries work. One of the great strengths of SQL is the ability to encapsulate complexity into a subquery and pass that result on to another query for further use.
- A thorough walk-through on "how" JOINs work. What does it mean to JOIN two records? To JOIN two tables? How do we "filter" that join of two tables? (All of this at a _conceptual_ level, taking care to note that how we might think through a complicated join is _not_ how the operation is implemented by the software.) What is the difference between an INNER JOIN and a LEFT JOIN, and how are these differences useful?
- Some discussion of just what exactly SQL is, what it is used for, and what it isn't used for. The tool has to be placed into some context so that beginners are oriented to where they will encounter it and what they can use it for. In particular they need to understand its interface, and the textual / non-visual nature of that interface. Most people are accustomed to working with computers through GUIs. The textual nature of SQL makes it very different, in profound ways, from the vast majority of tools people use on computers.
The use of DISTINCT everywhere smells of denormalized tables. Now if you are in data warehouse or reporting position, then this likely makes sense. I tend to work in transactional applications and keep the redundancy down to an absolute minimum. I abstract away some of the complicated queries with views, procs, and functions where allowable.
As for performance, different rdbms implement this differently, but the general query plan category for GROUP BY and DISTINCT is SORT. If you are querying something where you have multiple subqueries with DISTINCT and you and ordering the final results, you are adding extra sort operations to the query plan, hence hurting performance.
I tend to design everything to optimize read operations because there tends to be much more read than write in systems I work in. For me this means denormalized and heavily indexed where table/index scans are extremely rare. On smaller data sets, some rdbms always scan tables because it saves operations based on table statistics.
I guess this is the client-side alternative to SQLZOO[0] which is a fantastic resource I used to use back in the day to sharpen my SQL skills.
Been doing SQL for so long now but only recently have I been exposed to how powerful indices and learning how to read the execution plan is.
I realize this is beginner stuff based on a local browser-based SQLite but I wish more learning resources like this and SQLZOO had an advanced section to discuss performance issues you may encounter once you have a large enough dataset. For years I just assumed we just had so much data that having to wait for the query to complete was natural until I tried figuring out how to resolve a deadlock issue we've been having an accidentally figured out how to tune queries to run instantaneously.
There are probably many people out like my former self who know the basics and also have to maintain huge databases.
By the way, if you're one of them, "Use the Index, Luke"[1] is where I learned everything I needed to know about creating indices (though I don't think it has much about reading execution plans).
It depends on what engine you are using once you are talking about xplans, if you are into SQL Server and you are looking for a great book to reference thats almost all practical "this is how stuff is going to be interpreted", check out http://www.amazon.com/SQL-Server-Query-Performance-Tuning/dp... by Grant Fritchey (the previous book had Sajal Dam)
Additionally, SQLPass puts out a great DVD every year and in the past year or two they had some great talks about how to read xplans, why they are actually lies, and going into things like statistics io, measuring recompile/cpu time, determining if implicit conversions are causing issues, etc etc etc
It seems that the SQLPass requires registration to enter the "Session Recordings" page so I'll have to look into it later.
I googled "execution plan lie" and the only stuff that comes up is regarding how Oracle's Explain Plan sometimes provided inaccurate information. Is that what you were referring to?
I guess explains why Microsoft labelled their version of Explain Plan "Estimated Execution Plan". I mostly just include the actual execution plan when running the query so I can get the real execution plan and the io stats all in one go.
Actual execution plan's costs in SQL Server still contain a lot of estimates, I have seen
SQL Server guide you in the wrong direction if you are looking at the thing "costing" the most, when in fact I find when I am doing perf tuning my biggest first steps are:
1. Is the code pants on head stupid and not set based or not sargable
2. Do the query plans clearly misinterpret the data and why that is
My current process is:
1. set statistics io on; set nocount on;
2. Grab actual execution plan, throwing it in sql sentry plan explorer (free edition) (holy crap good)
3. Finding which estimates are off in the breakdown in 2
4. Investigating specific code areas where estimates are vastly different than actual, fixing whatever issue (implicit conversions, udfs, old stats, etc) arises
5. Consider adding indexes if needed
6. Paste outputs of both stats into http://www.statisticsparser.com/ (you can have it print things as headers, so I like print 'test a' go exec sp_proc @params go print 'test b' go exec sp_proc2 @params )
7. Compare if I did it right or not by checking overall reads, cpu time, etc
Update: I have the videos in question on my dropbox, I dont know if they would like a public link, so if you dont want to sign up let me know via my email in my profile and I will send them to you.
One thing that makes CodeAcademy successful is an attention to gamification. I think sketching some of that out is a priority for the next iteration. Clear game mechanics is what keeps people coming back to a platform like this.
I would most definitely love to see some table design thrown in there as well, with ER diagrams, cardinality, and normalization (bonus points for really explaining why well). I just noticed this is on GitHub, so if I get a chance, I'd certainly like to contribute :-)
From my talks with a few of my great programmer friends, it seems this is an area that more people could use some bolstering.
In the 'Select specific columns' lesson, FROM isn't capitalized. Not an error but could cause confusion. Same with AND in its lesson. It might also be nice in the first lesson to explain that the caps aren't necessary, but make it more understandable.
In the first lesson, after the paragraph "Imagine we have a table...", it might help to display an actual table there. I was a bit confused, reading through the whole page and not visually seeing any table. Only after I started writing the command did I realize I had a table underneath the input box.
Overall, looks really good! Not a bad place to start for developers who haven't used SQL before but want to get a simple working knowledge of it.
<personalOpinionPleaseDontCry>Capitalization doesn't make it more readable - just use a ide (or whatever) with syntax colouring and your sorted.
</personalOpinionPleaseDontCry>
It is a convention, much the same as variable/function/class naming conventions and whitespace conventions, that makes it easier to read an arbitrary piece of code when you come up to it for the first time.
Looks good for the basics, but as some other folks mentioned here, it would benefit from some articles about design, and even more importantly, rationale. I hear from so many programmers "why?" They don't understand the benefits of relational. This perspective ranges from old school programmers that just want to store data in the file system, to newcomers that like mongoDB or other object stores.
In my point of view, the main answer to "why relational" is that with an rdbms you can answer ad-hoc questions by writing a query rather than writing a program. Even as a programmer, having the power of SQL on the server is really great for analytics.
Furthermore, as an ORM hater, I would say that query results should be processed in your programming language as a table structure rather than converting to specific objects per table. If you're building a CRUD system and the screens update one or two objects/tables at a time, then an ORM is useful, but if you're doing any kind of analytics, any object structure other than a table or list of dictionaries is cumbersome and encourages code over sql which is a bad thing.
>Looks good for the basics, but as some other folks mentioned here, it would benefit from some articles about design, and even more importantly, rationale.
I agree about rationale, but probably in a different way. For a great many people, especially in the "data analyst camp" (i.e., people coming from finance/consulting background with mind-blowing Excel skills), SQL is not something they want to learn in the vacuum but more like necessary evil. For them, it's hugely valuable to learn how and why SQL can be useful.
Not certain about the OP's reasoning, but doing aggregate functions on the database server is often more performant than doing the same in your application. This is, in part, because of the smaller result set sizes.
Also, it's useful having logic baked into the database rather than your application in the case that you need a different application to connect to your database. It's a problem, for example, if you're maintaining your foreign key relationships through ActiveRecords's relations and ignoring it on the server. It makes it much easier for junk data to get in the database.
This took a while to run - I didn't go any further as I didn't want to damage the site during your launch, but add a few more cartesian products in there and it could hurt.
I was looking for a way to sharpen my SQL skills a few months ago, and after being annoyed with some of the very, very beginner stuff out there I found this:
The site can be slow sometimes, but I have found queries/puzzles that (a) have shown me use-cases for functions and capabilities I have never used and (b) can be challenging! Later they apparently focus more on optimization, but I haven't gotten there. (Especially since they have hundreds of these more difficult queries). I've done about 70ish so far.
What do you guys think about it? Overall, never going to be useful, or a good resource to get better with SQL?
One thing that would be essential is better feedback on errors. Currently, if I enter an invalid SQL query I’m only told that it was incorrect. For beginners, this can be frustrating.
As someone who uses a LOT of SQL, and who has taught people before, I like this as a start, but it definitely needs more visualzation, and I recommend how sqlzoo represents tables in the top right with an overlay, its a simple and clear break down.
I also think that things like sargability need to be discussed early on because while sql/set based stuff can be odd to grok at first, knowing why you are doing the things you do is important in laying the right foundations.
looks good and might be helpful for somebody starting with SQL . You can add some advanced version as well with some more complex examples and use cases with inner queries , UPSERTS n all
Amongst other topics, it covers aggregations, joins, grouping and gives a good well-rounded introduction to performance characteristics and best practices. For those wondering "where to go from here", don't look any further and give the course a shot.
[0] - http://www.padjo.org/