SQL Problems with old style outer JOINs
SQL
Download (.zip)
-- examine some issues with the old-style \ -- outer join using a simple example of only two tables, -- Customers and Orders: DROP TABLE Orders GO DROP TABLE Customers GO CREATE TABLE Customers ( Cust_ID int PRIMARY KEY, Cust_Name char(20) ) GO
CREATE TABLE Orders ( OrderID int PRIMARY KEY, Cust_ID int REFERENCES Customers(Cust_ID) ) GO
INSERT Customers VALUES (1, 'Cust 1') INSERT Customers VALUES (2, 'Cust 2') INSERT Customers VALUES (3, 'Cust 3') INSERT Orders VALUES (10001, 1) INSERT Orders VALUES (20001, 2) GO -- At a glance, in the simplest case, -- the new-style and old-style syntax appear -- to work the same. Here’s the new syntax: SELECT 'Customers.Cust_ID'=Customers.Cust_ID, Customers.Cust_Name, 'Orders.Cust_ID'=Orders.Cust_ID FROM Customers LEFT JOIN Orders ON Customers.Cust_ID=Orders.Cust_ID
--And here’s the old-style syntax: SELECT 'Customers.Cust_ID'=Customers.Cust_ID, Customers.Cust_Name, 'Orders.Cust_ID'=Orders.Cust_ID FROM Customers, Orders WHERE Customers.Cust_ID *= Orders.Cust_ID
-- As you begin to add restrictions, things get tricky. -- What if you want to filter out Cust 2? -- With the new syntax it’s easy, but remember -- not to filter out the row with NULL -- that the outer join just preserved!
SELECT 'Customers.Cust_ID'=Customers.Cust_ID, Customers.Cust_Name, 'Orders.Cust_ID'=Orders.Cust_ID FROM Customers LEFT JOIN Orders ON Customers.Cust_ID=Orders.Cust_ID WHERE Orders.Cust_ID <> 2 OR Orders.Cust_ID IS NULL
-- Now try to do this query using the old-style syntax -- and filter out Cust 2:
SELECT 'Customers.Cust_ID'=Customers.Cust_ID, Customers.Cust_Name, 'Orders.Cust_ID'=Orders.Cust_ID FROM Customers, Orders WHERE Customers.Cust_ID *= Orders.Cust_ID AND (Orders.Cust_ID <> 2 OR Orders.Cust_ID IS NULL)
-- Notice that this time, we don’t get rid of Cust 2. -- The check for NULL occurs before the JOIN, -- so the outer-join operation puts Cust 2 back. -- This result might be less than intuitive, -- but at least we can explain and defend it. -- That’s not always the case, as you’ll see in a moment. -- If you look at the preceding query, -- you might think that we should have filtered out -- Customers.Cust_ID rather than Orders.Cust_ID. -- How did we miss that? -- Surely this query will fix the problem:
SELECT 'Customers.Cust_ID'=Customers.Cust_ID, Customers.Cust_Name, 'Orders.Cust_ID'=Orders.Cust_ID FROM Customers, Orders WHERE Customers.Cust_ID *= Orders.Cust_ID AND (Customers.Cust_ID <> 2 OR Orders.Cust_ID IS NULL)
-- Oops! Same result. -- The problem here is that Orders.Cust_ID IS NULL -- is now being applied after the outer join, -- so the row is presented again. -- If we’re careful and understand exactly -- how the old outer join is processed, -- we can get the results we want -- with the old-style syntax for this query. -- We need to understand that the OR Orders.Cust_ID IS NULL -- puts back Cust_ID 2, so just take that out. -- Here is the code:
SELECT 'Customers.Cust_ID'=Customers.Cust_ID, Customers.Cust_Name, 'Orders.Cust_ID'=Orders.Cust_ID FROM Customers, Orders WHERE Customers.Cust_ID *= Orders.Cust_ID AND Customers.Cust_ID <> 2
-- Finally! This is the result we want. -- And if you really think about it, -- the semantics are even understandable -- (although different from the new style). -- So maybe this is much ado about nothing; -- all we have to do is understand how it works, right? -- Wrong. Besides the issues of joins with -- more than two tables and the lack of a full outer join, -- we also can’t effectively deal with subqueries and views. -- For example, let’s try creating a view with -- the old-style outer join: GO DROP VIEW Cust_old_OJ GO CREATE VIEW Cust_old_OJ AS (SELECT Orders.Cust_ID, Customers.Cust_Name FROM Customers, Orders WHERE Customers.Cust_ID *= Orders.Cust_ID) GO -- A simple select from the view looks fine: SELECT * FROM Cust_old_OJ
-- But restricting from this view doesn’t -- seem to make sense: SELECT * FROM Cust_old_OJ WHERE Cust_ID <> 2 AND Cust_ID IS NOT NULL
-- If we expand the view to the full select and -- we realize that Cust_ID is Orders.Cust_ID, -- not Customers.Cust_ID, perhaps we can understand -- why this happened. But we still can’t filter out -- those rows! -- In contrast, if we create the view with -- the new syntax and correct semantics, -- it works exactly as expected: GO DROP VIEW Cust_new_OJ GO CREATE VIEW Cust_new_OJ AS (SELECT Orders.Cust_ID, Customers.Cust_Name FROM Customers LEFT JOIN Orders ON Customers.Cust_ID=Orders.Cust_ID ) GO
SELECT * FROM Cust_new_OJ WHERE Cust_ID <> 2 AND Cust_ID IS NOT NULL
-- In these examples, the new syntax performed -- the outer join and then applied the restrictions -- in the WHERE clause to the result. In contrast, -- the old style applied the WHERE clause to the -- tables being joined and then performed the outer join, -- which can reintroduce NULL rows. -- This is why the results often seemed bizarre. -- However, if that behavior is what you want, -- you could apply the criteria in the JOIN clause -- instead of in the WHERE clause.
-- The following example uses the new syntax to -- mimic the old behavior. The WHERE clause is shown -- here simply as a placeholder to make clear that -- the statement Cust_ID <> 2 is in the JOIN section, -- not in the WHERE section.
SELECT 'Customers.Cust_ID'=Customers.Cust_ID, Customers.Cust_Name, 'Orders.Cust_ID'=Orders.Cust_ID FROM Customers LEFT JOIN Orders ON Customers.Cust_ID=Orders.Cust_ID AND Orders.Cust_ID <> 2 WHERE 1=1
-- As you can see, the row for Cust 2 was -- filtered out from the Orders table before the join, -- but because it was NULL, -- it was reintroduced by the OUTER JOIN operation.
|