The EXISTS Flaw


In September 1989, C. J. Date published an article titled 'Beware of SQL EXISTS!' in Database Programming & Design. Even though the article appeared over 6 years ago, the EXISTS flaw is still alive and well in modern RDBMSs. This section details the EXISTS flaw documented by Date, using an improved version of the example used by Date.

Example Database

The example database contains a single table, the sp table (shipments). The example table has 3 rows:
sp   sno  pno  qty
     ---- ---- ----
     S1   P1   NULL
     S2   P1   200
     S3   P1   1000
The meaning of a given row is that the indicated supplier (sno) supplies the indicated part (pno) in the indicated quantity (qty). The null in the row (S1,P1,NULL) means value unknown; in other words, supplier S1 does supply part P1, but the relevant quantity is not known, missing.

Example Queries

The basic query we wish to make is - find supplier numbers for suppliers who are known to supply part P1, but not in a quantity of 1,000. The correct answer is S2. All suppliers in the table supply P1, but we do know S3 supplies the part in quantity 1,000 and we don't know in what quantity S1 supplies the part. The only supplier we do know that supplies P1 but definitely not in quantity 1,000 is S2, therefore S2 is a supplier who is known to supply part P1, but not in a quantity of 1,000.

A SQL query to retrieve this result would be:

SELECT DISTINCT spx.sno
FROM   sp spx
WHERE  spx.pno = 'P1'
AND    1000 NOT IN
       ( SELECT spy.qty
         FROM   sp spy
         WHERE  spy.sno = spx.sno
         AND    spy.pno = 'P1') ;
According to ANSI SQL, this query on the example database should return only S2, but when we transform the query into an equivalent version, using EXISTS instead, we obtain:
SELECT DISTINCT spx.sno
FROM   sp spx
WHERE  spx.pno = 'P1'
AND    NOT EXISTS
       ( SELECT spy.qty
         FROM   sp spy
         WHERE  spy.sno = spx.sno
         AND    spy.pno = 'P1'
         AND    spy.qty = 1000) ;
whose result according to ANSI SQL should be S1, S2. This is the wrong answer because we don't definitely known whether S1 supplies P1 in quantity 1,000. We don't know if the null quantity for part P1 from supplier S1 might be 1,000 (it might be 200 or 500 or some other value; the actual value is missing).

As Date explains - "... suppose the variable spx takes on the value (S1,P1,NULL). Then the EXISTS argument logically becomes:

( SELECT spy.qty
  FROM   sp spy
  WHERE  spy.sno = 'S1'
  AND    spy.pno = 'P1'
  AND    spy.qty = 1000)
This evaluates to the empty set since there is no row (S1,P1,1000) in table sp. The EXISTS reference therefore evaluates to false (... EXISTS in SQL is defined to return false if its argument set is empty, and true otherwise). The NOT EXISTS therefore evaluates to true and the overall logically becomes:
SELECT DISTINCT spx.sno
FROM   sp spx
WHERE  spx.pno = 'P1'
       /* true for spx = (S1,P1,1000) */
AND    true
Hence, supplier number S1 appears in the final result (remember that variable spx currently represents the row for supplier S1). Error!"

Date implies in the article's title and in his explanation that the error is with SQL EXISTS. However, as examined in the next section, The Problem is Bigger than Date Thought!.

Return to Contents Page: DBMSs Process Subqueries Correctly?     Return to Issues Page


Copyright © 1998 FFE Software, Inc. All Rights Reserved WorldWide