ONLamp.com    
 Published on ONLamp.com (http://www.onlamp.com/)
 See this if you're having trouble printing code examples


aboutSQL

More on JOINS

06/01/2001

In the last column, I introduced you to one of the most important workhorse commands in SQL -- the JOIN command. Now we'll take a good look at what joins are and how to create them by focusing on the INNER JOIN, the most commonly used SQL join.

Because the JOIN clause is probably one of the most used and most confusing parts of SQL, I'll do my best to demystify it over the next few columns.

So, tell me again, what's a Join?

One of the fundamental precepts of SQL is that databases are relational. As we discussed in previous columns on database design, one of the goals of good database development is to use relationships to decompose data tables into the most atomic constituents possible.

While that's the point of good database design, the point of a good SQL statement is to reverse the process and pull that data back into some semblance of its original shape. In essence, we must create a virtual table from the information stored in two (or more!) difference tables in the database.

As I discussed last week, you can use the SELECT ... WHERE statement to combine two different tables on a common field:

SELECT * FROM Artist,CD WHERE (CD.ArtistID=Artist.ArtistID)

which can be translated into the equivalent JOIN statement

SELECT * FROM Artist INNER JOIN CD ON CD.ArtistID=Artist.ArtistID

Comment on this articleWhat are your questions, or your comments, about using the JOIN statements.
Post your comments

Also in aboutSQL:

SQL Data Types

Working With Tables

Data Definition Language

Introducing SQL Sets

SQL Subqueries

In both cases, we're using a SELECT statement to pull data from the tables in the database, so the only real difference is syntax, right? Well, yes. These queries test the equality of the tables -- finding the places where the two columns in the join overlap. So obviously the question becomes ...

Why use the JOIN statement at all?

Well, there isn't a completely compelling reason to prefer one way of writing a JOIN command over another. Didn't think that would be my answer, did you? But's important that you understand that they are equivalent statements if you work with SQL created by other people -- as long as you can translate in your head you should be fine.

BUT ... Oracle doesn't give you the option of using the INNER JOIN syntax. Oracle uses the symbolic syntax exclusively and will throw an error if you try to use the inner join method. Most other databases are more forgiving.

One argument in favor of using the explicit JOIN syntax, especially as we learn more about the various kinds of joins, is that the text makes it explicit which kind of join we're trying. I find x LEFT OUTER JOIN y clearer than the symbolic x *= y. The downside of this approach is that your SQL code will simply not work in Oracle. Here's one area where your choice of databases makes a difference in how you'd develop SQL code.

A little more on syntax

You'll notice that in both examples of the SQL JOIN, the join was performed where

CD.ArtistID=Artist.ArtistID

Note that we've fully-qualified the fields that will be joined -- this is required whenever a column name is ambiguous. This has not been a problem when we've been dealing with single tables (DBMSes make sure column names are unique in a single table), but will become an increasingly common issue as we join tables together. DBMSes will generate an error when the query is executed if the column names are ambiguous.

While we're looking at syntax, we can also examine using an alias for a table name. Typing SQL statements like:

SELECT * FROM Artist,CD WHERE CD.ArtistID=Artist.ArtistID

can become more time-consuming as more tables are added to the JOIN statement. It's extremely common to see DBAs to create queries that have shorter table names by aliasing the table name to a one- or two-letter abbreviation. For example,

SELECT * FROM Artist AS A, CD AS C WHERE C.ArtistID=A.ArtistID

The AS statement can be used in many DBMS to assign an alias to a table name that can be used throughout the SQL statement. Of course, Oracle is slightly different (and many other databases support this notation as well).

SELECT * FROM Artist A, CD C WHERE C.ArtistID=A.ArtistID

No AS keyword, otherwise everything is the same.

If we were after just the name of the artist and the album for a summary list, we could combine all these techniques together into the following block of SQL

SELECT A.ArtistName, C.Title FROM Artist A, CD C WHERE C.ArtistID=A.ArtistID

This doesn't look like that much of a savings, but wait until we're writing complex multi-table joins with aggregate functions and filtering rules and then you'll appreciate these tips!

Next steps

In this article, I showed you a little bit about the INNER JOIN statement and some general syntax that applies to all of the SQL JOIN statements. Next week, I'll jump into the various other types of joins -- CROSS, LEFT OUTER, RIGHT OUTER, and NATURAL.

Until then, feel free to contact me with comments and questions about this column, or use the Talk Back feature to share your comments with other readers. Believe it or not, I read all the e-mail you send me and answer the vast majority of it. I'm contemplating using this column to address real-world SQL problems -- be it a thorny SQL challenge, a poorly written SQL query you inherited from another developer, or a newbie question.

If you'd like to see that sort of analysis, let me know -- and if you need that sort of analysis, drop me a line. I won't promise to solve your problem, but I'd like to see what questions are out there. I've gotten some mail already and am incorporating the suggestions into planning for future columns.

John Paul Ashenfelter is president and CTO of TransitionPoint.com, a technology development and analysis group focusing on web database applications.


Read more aboutSQL columns.

Return to ONLamp.com.

Copyright © 2009 O'Reilly Media, Inc.