ONLamp.com    
 Published on ONLamp.com (http://www.onlamp.com/)
 See this if you're having trouble printing code examples


aboutSQL

Introducing SQL Sets

08/02/2001

SQL has many language elements that are not commonly introduced in quick surveys of the language. Because SQL is based on relational algebra, it should come as no surprise that the standard contains several set-oriented operators.

In the interest of completeness, and to provide a tool that is extremely useful in dealing with heterogeneous data, it's time we take a look at the UNION operator.

SQL and sets

If you were to dredge up the memories of middle school, you'll probably remember vague notions of learning about sets. If you have ever studied logic, you may have even worked with Venn diagrams.

Sets are simply groups of discrete elements. In the strict mathematical world, each set is made up of unique numerical elements; in the SQL world, a set is a collection of rows of data that may or may not contain duplicates. You've actually been working with SQL sets for quite a while -- queries are simply a set (strictly speaking, a subset) of a database table.

There are three simple ways to compare sets:

While all of these functions are defined in the SQL standard (at least in the SQL-92 standard), the exact implementation varies widely by vendor. Many database engines leave out some or all of these functions because they are not used as frequently as other commands. Or vendors may omit them because the function of the operators can be recreated using SQL and/or the database engine's programming language. In rough order of implementation, most databases allow the UNION operator, some provide the INTERSECT operator, and a few provide the functionality of the EXCEPT operator using a different operator name (for example, Oracle's MINUS operator).

Comment on this articleAsk John Paul your questions about the UNION operator and SQL sets in general.
Post your comments

Also in aboutSQL:

SQL Data Types

Working With Tables

Data Definition Language

Using SQL sets

I'm sure that some of you may be wondering why I dedicated a column to this dusty corner of the SQL world. Fear not! There's a very good reason to learn about SQL sets, particularly the UNION operator.

Working with sets provides the flexibility to manipulate two tables that otherwise have no relationship -- where joins are not possible or useful. Let's start with an simple example. You have a database with a table of employees, and a table of suppliers.


Employees
EmployeeID EmployeeName Address ZIP
61 Sue Smith 100 South St. 12345
62 David Jones 2525 1st St. 12345


Suppliers
SupplierID Contact Address ZIP
A1-674 Troy Parker 1100 Main St. 23456
ZZ-1A1 Claire Smith-Jones 400 East Main, Apt 5 56789

If you were sending Christmas cards to all suppliers and employees, you could certainly use a single table of name, address, and ZIP code data that could be the data source for a mail merge. Of course, you could create a new table, execute two SQL statements to insert the appropriate columns, and call it a day.

SQL in a Nutshell

SQL in a Nutshell
By Kevin Kline with Daniel Kline, Ph.D.
January 2001
1-56592-744-3, Order Number: 7443
224 pages, $29.95

But what if this is an activity that you need to repeat frequently? Or what if the dataset is large enough that it is impractical to create a new database table? Or what if you simply don't have the permissions to do anything other than SELECT queries in your database? A UNION query is the answer!

The syntax for a UNION query is very straightforward:

query 1 UNION [ALL] query2 [ORDER BY sort_order]

The optional ALL keyword indicates the UNION should include all duplicates which would be ignored by default. So to create our table of addresses, we can use the following SQL statement

SELECT Contact AS Name, Address, ZIP FROM Suppliers
UNION
SELECT EmployeeName AS Name, Address, ZIP FROM Employees

which would produce a table something like the following:

Name Address ZIP
Sue Smith 100 South St. 12345
David Jones 2525 1st St. 12345
Troy Parker 1100 Main St. 23456
Claire Smith-Jones 400 East Main, Apt 5 56789

This doesn't seem like such a big deal. But what if the data tables were somewhat more heterogeneous? One interesting thing about the UNION operator is that in general, any two queries can be joined as long as they have the same number of columns. Note that the data types don't have to be the same. That's pretty useful!

In the past, we've discussed using one-to-one tables to store specific data such as information about a music CD or a book that is linked to the table of common data about items in a store. If you instead chose to create a table of CDs and a table of books, you could create a SQL statement like the following that will generate a single result set consisting heterogeneous data:

SELECT CatalogID, Price, Description, PlayingTime AS Custom1, NumOfTracks AS Custom2, Artist AS Custom3, Label AS Custom4 FROM CDs
UNION
SELECT CatalogID, Price, Description, PageCount AS Custom1, Author AS Custom2, PublishDate AS Custom3, ISBN AS Custom4 FROM Books

In this scenario, not only do the columns in the new set contain values that mean different things, they can even contain different types of data (playing time is probably a time or text field, the page count is probably an integer).

Next steps

Why don't more people use UNION queries? And what about INTERSECT and EXCEPT? The truth is that these operations can be computationally intensive and there are often alternatives -- such as pulling out two data sets and using a C++ program to create the union, intersection, or difference set. But if your database supports them, they can occasionally be a lifesaver.

Starting in the next column, we're going to switch from discussing SQL for queries to discussing SQL statements for constructing tables and databases. That topic will occupy us for several columns and lead directly in to issues such as triggers, constraints, and all the myriad other tools that can save you time and effort. Until then, feel free to contact me with comments and questions.

John Paul Ashenfelter is president and CTO of TransitionPoint.com, a technology development and analysis group focusing on web database applications.

Read more aboutSQL columns.


Return to ONLamp.com.

John Paul Ashenfelter is president and CTO of TransitionPoint.com, a technology development and analysis group focusing on web database applications.

Copyright © 2009 O'Reilly Media, Inc.