Monday, April 26, 2010

ANSI join syntax vs. non-ANSI join syntax

Recently I've participated in discussion about the SQL ANSI join syntax vs. non-ANSI. In my opinion, non-ANSI join syntax is much more simpler to read and 99% of the statements could be effectivlly written using Oracle syntax. The only case when i use ANSI syntax is FULL OUTER JOIN, because it's simple to use, to understand and to write. One of the additional arguments of using ANSI syntax was the following statement:

 

SQL> with A as
2 (select 1 COL1, 'A' COL2
3 from DUAL
4 union all
5 select 2, 'B'
6 from DUAL
7 union all
8 select 3, 'C' from DUAL),
9 B as
10 (select 1 COL1, 'D' COL2
11 from DUAL
12 union all
13 select 2, 'E' from DUAL)
14 select A.COL2, B.COL2 from A left outer join
15 B on (A.COL1 = B.COL1 and B.COL2 in ('D', 'E'))
16 /

COL2 COL2
---- ----
A D
B E
C

SQL>




as you can see LEFT OUTER JOIN is beign using with IN clause which is not allowed in non-ANSI syntax:


 

SQL> with
2 a as (select 1 col1, 'A' col2 from dual
3 union all
4 select 2, 'B' from dual
5 union all
6 select 3, 'C' from dual),
7 b as (select 1 col1, 'D' col2 from dual
8 union all
9 select 2, 'E' from dual)
10 select a.col2, b.col2
11 from a, b
12 where a.col1 = b.col1 (+)
13 and b.col2 in ('D','E')
14 /

COL2 COL2
---- ----
A D
B E

SQL>
SQL> with
2 a as (select 1 col1, 'A' col2 from dual
3 union all
4 select 2, 'B' from dual
5 union all
6 select 3, 'C' from dual),
7 b as (select 1 col1, 'D' col2 from dual
8 union all
9 select 2, 'E' from dual)
10 select a.col2, b.col2
11 from a, b
12 where a.col1 = b.col1 (+)
13 and b.col2(+) in ('D','E')
14 /

with
a as (select 1 col1, 'A' col2 from dual
union all
select 2, 'B' from dual
union all
select 3, 'C' from dual),
b as (select 1 col1, 'D' col2 from dual
union all
select 2, 'E' from dual)
select a.col2, b.col2
from a, b
where a.col1 = b.col1 (+)
and b.col2(+) in ('D','E')

ORA-01719: outer join operator (+) not allowed in operand of OR or IN

SQL>



Yes, it's not allowed, but could we re-write the query with non-ANSI syntax without much overhead? Yes, we are:

 

SQL> with
2 a as (select 1 col1, 'A' col2 from dual
3 union all
4 select 2, 'B' from dual
5 union all
6 select 3, 'C' from dual),
7 b as (select 1 col1, 'D' col2 from dual
8 union all
9 select 2, 'E' from dual)
10 select a.col2, b.col2
11 from a, b
12 where a.col1 = b.col1 (+)
13 and (b.col2 in ('D','E') or b.col2 is null)
14 /

COL2 COL2
---- ----
A D
B E
C

SQL>



let's check the execution plans for ANSI syntax and non-ANSI:

 

SQL> select * from v$version
2 /

BANNER
--------------------------------------------------------------------------------
Oracle Database 11g Enterprise Edition Release 11.1.0.7.0 - 64bit Production
PL/SQL Release 11.1.0.7.0 - Production
CORE 11.1.0.7.0 Production
TNS for Linux: Version 11.1.0.7.0 - Production
NLSRTL Version 11.1.0.7.0 - Production




ANSI:

 


---------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time |
---------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 3 |00:00:00.01 |
| 1 | NESTED LOOPS OUTER| | 1 | 6 | 3 |00:00:00.01 |
| 2 | VIEW | | 1 | 3 | 3 |00:00:00.01 |
| 3 | UNION-ALL | | 1 | | 3 |00:00:00.01 |
| 4 | FAST DUAL | | 1 | 1 | 1 |00:00:00.01 |
| 5 | FAST DUAL | | 1 | 1 | 1 |00:00:00.01 |
| 6 | FAST DUAL | | 1 | 1 | 1 |00:00:00.01 |
| 7 | VIEW | | 3 | 2 | 2 |00:00:00.01 |
|* 8 | VIEW | | 3 | 2 | 2 |00:00:00.01 |
| 9 | UNION-ALL | | 3 | | 6 |00:00:00.01 |
| 10 | FAST DUAL | | 3 | 1 | 3 |00:00:00.01 |
| 11 | FAST DUAL | | 3 | 1 | 3 |00:00:00.01 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

8 - filter("A"."COL1"="B"."COL1")



non-ANSI syntax:

 

----------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | OMem | 1Mem | Used-Mem |
----------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 3 |00:00:00.01 | | | |
|* 1 | FILTER | | 1 | | 3 |00:00:00.01 | | | |
|* 2 | HASH JOIN OUTER| | 1 | 1 | 3 |00:00:00.01 | 1348K| 1348K| 907K (0)|
| 3 | VIEW | | 1 | 3 | 3 |00:00:00.01 | | | |
| 4 | UNION-ALL | | 1 | | 3 |00:00:00.01 | | | |
| 5 | FAST DUAL | | 1 | 1 | 1 |00:00:00.01 | | | |
| 6 | FAST DUAL | | 1 | 1 | 1 |00:00:00.01 | | | |
| 7 | FAST DUAL | | 1 | 1 | 1 |00:00:00.01 | | | |
| 8 | VIEW | | 1 | 2 | 2 |00:00:00.01 | | | |
| 9 | UNION-ALL | | 1 | | 2 |00:00:00.01 | | | |
| 10 | FAST DUAL | | 1 | 1 | 1 |00:00:00.01 | | | |
| 11 | FAST DUAL | | 1 | 1 | 1 |00:00:00.01 | | | |
----------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - filter((INTERNAL_FUNCTION("B"."COL2") OR "B"."COL2" IS NULL))
2 - access("A"."COL1"="B"."COL1")



What we see is that both execution plans has 11 operations, but the main different is in JOIN operations. In case ANSI we have NESTED LOOPS OUTER which loops 3 times through B for each record in A. In case non-ANSI syntax we have HASH JOIN OUTER join which executed only once but with additional overhead with filter predicate.

As result i could state that non-ANSI example looks much better especially in case with much data in source, let's check:

ANSI:

 

with
a as (select level col1, dbms_random.string(1, 3) col2 from dual connect by level <= 1000),
b as (select level col1, dbms_random.string(1, 3) col2 from dual connect by level <= 100)
select A.COL2, B.COL2 from A left outer join
B on (A.COL1 = B.COL1 and B.COL2 in ('D', 'E'))
/

-----------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time |
-----------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 1 |00:00:01.47 |
| 1 | SORT AGGREGATE | | 1 | 1 | 1 |00:00:01.47 |
| 2 | NESTED LOOPS OUTER | | 1 | 1 | 1000 |00:00:01.46 |
| 3 | VIEW | | 1 | 1 | 1000 |00:00:00.01 |
| 4 | CONNECT BY WITHOUT FILTERING | | 1 | | 1000 |00:00:00.01 |
| 5 | FAST DUAL | | 1 | 1 | 1 |00:00:00.01 |
| 6 | VIEW | | 1000 | 1 | 0 |00:00:01.46 |
|* 7 | VIEW | | 1000 | 1 | 0 |00:00:01.46 |
| 8 | CONNECT BY WITHOUT FILTERING| | 1000 | | 100K|00:00:00.11 |
| 9 | FAST DUAL | | 1000 | 1 | 1000 |00:00:00.01 |
-----------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

7 - filter(("A"."COL1"="B"."COL1" AND INTERNAL_FUNCTION("B"."COL2")))



non-ANSI:

 

--------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | OMem | 1Mem | Used-Mem |
--------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 1 |00:00:00.01 | | | |
| 1 | SORT AGGREGATE | | 1 | 1 | 1 |00:00:00.01 | | | |
|* 2 | FILTER | | 1 | | 900 |00:00:00.01 | | | |
|* 3 | HASH JOIN OUTER | | 1 | 1 | 1000 |00:00:00.01 | 1452K| 1452K| 1264K (0)|
| 4 | VIEW | | 1 | 1 | 1000 |00:00:00.01 | | | |
| 5 | CONNECT BY WITHOUT FILTERING| | 1 | | 1000 |00:00:00.01 | | | |
| 6 | FAST DUAL | | 1 | 1 | 1 |00:00:00.01 | | | |
| 7 | VIEW | | 1 | 1 | 100 |00:00:00.01 | | | |
| 8 | CONNECT BY WITHOUT FILTERING| | 1 | | 100 |00:00:00.01 | | | |
| 9 | FAST DUAL | | 1 | 1 | 1 |00:00:00.01 | | | |
--------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - filter((INTERNAL_FUNCTION("B"."COL2") OR "B"."COL2" IS NULL))
3 - access("A"."COL1"="B"."COL1")



As result we have elapsed time for NESTED LOOPS OUTER operation equal to 00:00:01.46 in ANSI case and 00:00:00.01 for HASH JOIN OUTER in non-ANSI case. It's important to note that in ANSI case we have 100000(100K) records in step ID = 8 which lead to use more resourses and consequently more time for execution.

Any way the example doesn't show that ANSI join syntax is something to be avoid, but shows that ANSI / non-ANSI join syntax may be treated differently by SQL engine.

1 comment:

electronic signature for sharepoint said...

I never knew about these two variation of join syntax.Although I am using this in my SQL queries.Thanks for writing as it added to my knowledge