*Actually it doesn't.* Can you elaborate on what you mean? The presence of dupli...

davidmathers · on April 23, 2009

People who get hot and bothered about that are almost invariably cranks, in my experience.

Honestly I think you, being a database expert, don't fully appreciate how bad the situation really is. Relational databases are almost universally not understood by programmers. Why? I think it might have something to do with the fact that programmers can't actually use the relational database as a relational database.

They have to programmatically construct a string (seriously, wtf?) to do some relational processing "over there" in the scary database and then serialize the result and import it into a data structure in their language--the complete destruction of the relational idea.

Using a relational database should be similar to using jQuery in javascript programming. The anti-"relational database" backlash is huge right now. And since most of the people involved don't understand what a relational database is they don't understand that it isn't relational databases that they have a problem with.

Have you seen this new thing called Redis? It's "like a key-value store except values can be sets." Wait, it's a "structure server." Pretty soon someone is going to have the brilliant idea of letting the keys be sets also.

I predict that within 3 years the anti-relational people are going to invent the relational database, without realizing it, and be amazed at how awesome it is. Because for the first time in their lives they'll be able to actually use an actual relational database.

That's my database rant.

P.S. As for the SQL != Relational people like Chris Date and Hugh Darwen I don't see how they could be considered cranks. Bitter old men? Maybe. But cranks? No.

silentbicycle · on April 23, 2009

Good point, about redis.

I think part of it, also, is that a lot of peoples' exposure to relational databases is via MySQL and PHP, which is ... not representative.

Most people don't understand why the N-normal forms are significant, etc., because they learned SQL from a simple "how to save and load rows" kind of tutorial, rather than learning it from the big relational ideas outward. (I'm a "learn how it works as a whole" guy, but I recognize that most people aren't.) If you don't know why they matter, it just seems like a bunch of bizarre busywork, and then you get lousy performance.

davidmathers · on April 23, 2009

The presence of duplicates is orthogonal, if that's what you're referring to.

Yes. How is it orthogonal?

noblethrasher · on April 23, 2009

Because a relation is a function that maps tuples to logical values (true, false, null, etc). We just happen to define these function as table lookups so duplicate rows are indeed not a problem. Thinking of tables as sets instead of as functions causes the confusion.

davidmathers · on April 23, 2009

Dude, noblethrasher, what you just said does not make sense. I have no idea what it means or how to parse it.

noblethrasher · on April 23, 2009

My bad.

Suppose I want to define a function that squares integers. Well, I can either define it as a formula:

Square i => i * i

or I can list all of the integers in a table

1 1

2 4

3 9

4 16

etc...

Both of those are perfectly valid definitions for the Square function. The signature of the function is Int -> Int. If a row is listed more than once (e.g. the row 3 9) it still doesn't affect the definition of my function. 3 still maps to 9.

Now suppose I want to define a relation (remember, a relation is a function) of employees that work in my company. In a typical RDMS I might create an Employee "table" with the following schema:

emp_id, first_name, last_name.

The signature of this function is Int -> String -> String -> Bool

It just so happens that people (tuples) explicitly listed in the table map to true and anyone else maps to false.

Conceptually it's the same as if I had a function called Employees where I could do something like this:

  Employees(12, 'bob', 'morton')

which would return true or false depending on whether or not Mr. Morton is an employee in the company. The difference is that in SQL, what is returned is not the result of the function but the definition of the function itself which just happens to look like... a table.

davidmathers · on April 23, 2009

remember, a relation is a function

That's backwards. In a Venn diagram relations are the big circle and functions are the smaller circle inside it. A function is a deterministic relation.

If you mean to say a table is a function, then yes. All relations, including functions, can be represented using a table.

If a row is listed more than once (e.g. the row 3 9) it still doesn't affect the definition of my function. 3 still maps to 9.

This is the issue really. I think what you're saying is:

  {1,2,3,3} == {1,2,3}

That's true mathematically, but that's not what's happening in SQL. Try creating the table to represent the square function. Now try inserting (3,9) into it twice. SQL won't let you because it forces tables to represent relations. Not queries though.

In my example {1,2,3} is the direct representation of the value of the set and {1,2,3,3} is a different representation. The tables and queries in SQL are direct representations. If a SQL query looks like |1|2|3|3| then what you actually have, mathematically, is {{1,2,3},{3}}, aka multi-set or bag.

  {{1,2,3},{3}} != {1,2,3}

That's what's happening when a row is listed more than once. It does affect the definition of your function.

noblethrasher · on April 24, 2009

> That's backwards. In a Venn diagram relations are the big circle and functions are the smaller circle inside it. A function is a deterministic relation.

That's partially right. In a Venn diagram, whenever you see a little circle S contained in a big circle T what you have is a relation that says S is a subset of T. Specifically this is a relation that maps a set of sets to {TRUE, FALSE}. Fundamentally we have sets and mappings[1] between sets. A relation is a mapping between some set and a set of logical values. So relations are mappings but not all mappings are relations. In the case of SQL a relation is a well-defined mapping so we call it a 'function'.

> If a SQL query looks like |1|2|3|3| then what you actually have, mathematically, is {{1,2,3},{3}}, aka multi-set or bag.

No sir. What you have, formally, is a programmatically generated relation/function (under a closed-world assumption) in which one of the entries in the mapping has been overly specified. Your relation maps the set {1,2,3, 4 ...} to the set {TRUE, FALSE}. If you pass it the value 1 or 2 or 3 you get TRUE. If you pass it the value 4 you get FALSE. The domain of this function is {1,2,3, 4, ...} and the codomain (or range) is {TRUE, FALSE}. Having an entry listed twice in the definition is similar to the following:

  f(x) = 0 when x = 0 and x * x if x is real.

That function has been overly specified. Now if we had a table with entries like this:

  1 Fred Jones

  2 James Smith

  3 Jane Doe

  4 Alice Baker

  3 George Washington

then we have a problem since the 'function' is now ambiguous (and hence not a function).

Now, back to my employee table example. If I have a table called Employees defined as:

EMP_ID (int), FIRST_NAME (string), LAST_NAME (string)

where EMP_ID is the primary key then what I have is a composition of functions (a relation in SQL is always a function)

One function, we'll call it G maps full names to Integers and has this 'signature': S x S -> Z where S is the set of strings and Z is the set of integers.

The other function, we'll call it H maps Integers to {TRUE, FALSE} so my table is actually equivalent to the function H of G.

Now I realize that this is all abstract and a bit pedantic but it is nonetheless what SQL and relational algebra are all about even if we don't ordinarily think of it like this.

[1] Formally, mappings, functions, transformations and operations are all the same things just used in specific contexts.

silentbicycle · on April 23, 2009

I think what he's saying is that a relation can be seen as a function that returns whether or not something is present, rather than a set of the results. I think.