oracle执行计划中的Semi-Join和Anti-Join(6)

Prerequisites and Hints that Impact Anti-Joins

The key prerequisite for the anti-join access paths is that the subquery of a NOT IN clause cannot be capable of returning a null value. As mentioned before, this means that either the columns being selected must have NOT NULL constraints or there must be predicates in the WHERE clause of the subquery to ensure there are no nulls. Even if every row in the table has a non-null value in a nullable column, you must still add the extra predicate to the WHERE clause or else Oracle will refuse to use the merge and hash anti-join access paths.

Assuming that an anti-join access path is possible, Oracle chooses the anti-join algorithm to use while evaluating execution plans. In Oracle 9i an anti-join can be performed using the nested loops, merge, or hash join algorithms. As with a conventional join, Oracle will choose the join algorithm with the lowest cost, or it may choose to use the filter technique instead of an anti-join altogether. In Oracle 8i an anti-join can be performed using the merge or hash join algorithms, but as with semi-joins, Oracle 8i will not choose the join algorithm with the lowest cost. Instead, the always_anti_join instance parameter dictates which approach Oracle 8i should use. (Oracle 8i does not implement the nested loops anti-join access path, so if the always_anti_join parameter is set to nested_loops, the filter technique is used instead.) The always_anti_join instance parameter is obsolete in Oracle 9i.

Oracle provides the NL_AJ, MERGE_AJ, and HASH_AJ hints in order for you to manipulate the anti-join process if you need to. As with the semi-join hints, the hint is applied to the subquery and not the main body of the query itself.

Anti-Join Example #1

Suppose you want a list of customers who have not placed an order within the last ten days. You might start with a query that looks like:

        SELECT   C.short_name, C.customer_id
        FROM     customers C
        WHERE    NOT EXISTS
                 (
                 SELECT 1
                 FROM   orders O
                 WHERE  O.customer_id = C.customer_id
                 AND    O.order_date > SYSDATE - 10
                 )
        ORDER BY C.short_name;
The execution plan looks like:

        Rows     Row Source Operation
        -------  ---------------------------------------------------
             11  SORT ORDER BY (cr=18707 r=301 w=0 time=22491917 us)
             11   FILTER  (cr=18707 r=301 w=0 time=22491555 us)
           1000    TABLE ACCESS FULL CUSTOMERS (cr=38 r=36 w=0 time=15277 us)
            989    VIEW  (cr=18669 r=265 w=0 time=22365647 us)
            989     HASH JOIN  (cr=18669 r=265 w=0 time=22347234 us)
         100000      INDEX RANGE SCAN ORDERS_N1_CUST_ID (cr=2207 r=208 w=0 time=277584 us)(object id 34555)
        5338680      INDEX RANGE SCAN ORDERS_N2_ORD_DATE (cr=16462 r=57 w=0 time=9036516 us)(object id 34556)
You can see that the query took over 22 seconds to run and performed 18,707 logical reads. Oracle did not use an anti-join access path in this execution plan, instead opting to read every row in the CUSTOMERS table and use the subquery as a filter by executing it once for each customer. Although Oracle found a clever way to hash join two indexes on the ORDERS table and thereby make the subquery very efficient, the query overall is still resource intensive because the subquery must execute 1000 times—once for each customer.

This query seems like a good candidate for an anti-join access path because executing the subquery once and performing a merge or hash anti-join might be more efficient than performing the subquery once for each customer as a filter operation. We could rewrite the query using a NOT IN clause instead of NOT EXISTS to see if Oracle chooses an anti-join access path:

        SELECT   C.short_name, C.customer_id
        FROM     customers C
        WHERE    C.customer_id NOT IN
                 (
                 SELECT O.customer_id
                 FROM   orders O
                 WHERE  O.order_date > SYSDATE - 10
                 )
        ORDER BY C.short_name;
Unfortunately, the results are quite disappointing—elapsed time shot up to 695 seconds! The execution plan looks like:

        Rows     Row Source Operation
        -------  ---------------------------------------------------
             11  SORT ORDER BY (cr=5360749 r=4870724 w=0 time=695232973 us)
             11   FILTER  (cr=5360749 r=4870724 w=0 time=695232475 us)
           1000    TABLE ACCESS FULL CUSTOMERS (cr=38 r=129 w=0 time=61614 us)
            989    TABLE ACCESS BY INDEX ROWID ORDERS (cr=5360711 r=4870595 w=0 time=695088325 us)
        5359590     INDEX RANGE SCAN ORDERS_N2_ORD_DATE (cr=16520 r=0 w=0 time=22299946 us)(object id 34556)
You can see that Oracle did not choose an anti-join access path. Instead, Oracle is still reading every row from the CUSTOMERS table and executing the subquery once for each of the 1000 customers. Moreover, Oracle is no longer hash joining the two ORDERS indexes to make the subquery very efficient. For each customer, Oracle is now grabbing all orders from the last 10 days and sifting through them to find those belonging to the customer currently being processed. It turns out that the CUSTOMER_ID column in the ORDERS table is nullable, and so Oracle conceptually rewrites the query in order to implement the special null value semantics of NOT IN. The rewritten query looks like:

        SELECT   C.short_name, C.customer_id
        FROM     customers C
        WHERE    NOT EXISTS
                 (
                 SELECT 1
                 FROM   orders O
                 WHERE  O.order_date > SYSDATE – 10
                 AND    NVL (O.customer_id, C.customer_id) = C.customer_id
                 )
        ORDER BY C.short_name;
You can see how the implicit NVL() function that Oracle inserts during the transformation would be necessary in order to cause the predicate to evaluate to false if a null value is returned. You can also see how this defeats the index on the CUSTOMER_ID column of the ORDERS table.

You might wonder why the CUSTOMER_ID column in the ORDERS table can be null. Let’s suppose the developers tell you that this handles a special case and you should not worry about it. The developers want the query to ignore nulls. Thus you can add an extra predicate to the WHERE clause of the subquery in order to meet the prerequisite for anti-join access paths. The resulting query is:

        SELECT   C.short_name, C.customer_id
        FROM     customers C
        WHERE    C.customer_id NOT IN
                 (
                 SELECT O.customer_id
                 FROM   orders O
                 WHERE  O.order_date > SYSDATE - 10
                 AND    O.customer_id IS NOT NULL
                 )
        ORDER BY C.short_name;
The execution plan becomes:

        Rows     Row Source Operation
        -------  ---------------------------------------------------
             11  SORT ORDER BY (cr=311 r=132 w=98 time=1464035 us)
             11   HASH JOIN ANTI (cr=311 r=132 w=98 time=1463770 us)
           1000    TABLE ACCESS FULL CUSTOMERS (cr=38 r=34 w=0 time=37976 us)
          20910    VIEW  (cr=273 r=98 w=98 time=1318222 us)
          20910     HASH JOIN  (cr=273 r=98 w=98 time=1172207 us)
          20910      INDEX RANGE SCAN ORDERS_N2_ORD_DATE (cr=58 r=0 w=0 time=43870 us)(object id 34556)
         100000      INDEX FAST FULL SCAN ORDERS_N1_CUST_ID (cr=215 r=0 w=0 time=196953 us)(object id 34555)
You can see that Oracle now chooses to perform a hash anti-join. The CUSTOMERS table gets a full table scan as before, but the critical difference is that now the subquery executes just once. The two indexes on the ORDERS table are hash joined once in order to develop a list of order dates for each customer, and this is hash anti-joined to the CUSTOMERS table in order to yield the list of customers with no orders in the last 10 days. The result is a query that completes in 1.46 seconds and performs just 311 logical reads.

In this example we did not need to use hints to get Oracle to find the most efficient execution plan. All we had to do was write the query in a way that made it eligible for the anti-join access paths. (This entailed using NOT IN instead of NOT EXISTS, and adding a predicate to the subquery’s WHERE clause to make it impossible for the subquery to return a null value.)

Anti-Join Example #2

Along the lines of the previous example, suppose you needed to find out if a specific customer has not placed an order within the last 10 days. Your query might look like:

        SELECT   C.short_name, C.customer_id
        FROM     customers C
        WHERE    C.customer_id = 22
        AND      NOT EXISTS
                 (
                 SELECT 1
                 FROM   orders O
                 WHERE  O.order_date > SYSDATE - 10
                 AND    O.customer_id = C.customer_id
                 )
        ORDER BY C.short_name;
The execution plan looks like:

        Rows     Row Source Operation
        -------  ---------------------------------------------------
              0  TABLE ACCESS BY INDEX ROWID CUSTOMERS (cr=6 r=0 w=0 time=341 us)
              0   INDEX UNIQUE SCAN CUSTOMERS_PK (cr=6 r=0 w=0 time=333 us)(object id 34552)
              1    TABLE ACCESS BY INDEX ROWID ORDERS (cr=4 r=0 w=0 time=170 us)
              2     INDEX RANGE SCAN ORDERS_N1_CUST_ID (cr=2 r=0 w=0 time=65 us)(object id 34555)
As you can see, Oracle uses a filter operation instead of an anti-join. This is not surprising, because we used the NOT EXISTS construct instead of NOT IN. The query performed 6 logical reads and completed in 341 microseconds. Customer 22 has 110 orders in the system. By looking at the row counts in the execution plan, you can see that Oracle stopped the filter process as soon as it found the first order placed by the customer within the last 10 days. There is no need to look at all of the orders placed by this customer because we’ve already found one that excludes the customer from the result set.

This is an example of a query with an anti-join where the main query is extremely selective—we are only looking at one customer record. This begs the question: Would this be a good opportunity to use the nested loops anti-join access path? It is hard to imagine an execution plan for this query that is more efficient than what we have just seen, but let’s try. The query rewritten with a NOT IN clause and an NL_AJ hint looks like:

        SELECT   C.short_name, C.customer_id
        FROM     customers C
        WHERE    C.customer_id = 22
        AND      C.customer_id NOT IN
                 (
                 SELECT /*+ NL_AJ */ O.customer_id
                 FROM   orders O
                 WHERE  O.order_date > SYSDATE - 10
                 AND    O.customer_id IS NOT NULL
                 )
        ORDER BY C.short_name;
The execution plan is:

        Rows     Row Source Operation
        -------  ---------------------------------------------------
              0  SORT ORDER BY (cr=115 r=86 w=0 time=16824 us)
              0   NESTED LOOPS ANTI (cr=115 r=86 w=0 time=16724 us)
              1    TABLE ACCESS BY INDEX ROWID CUSTOMERS (cr=3 r=2 w=0 time=467 us)
              1     INDEX UNIQUE SCAN CUSTOMERS_PK (cr=2 r=2 w=0 time=429 us)(object id34552)
             10    TABLE ACCESS BY INDEX ROWID ORDERS (cr=112 r=84 w=0 time=16213 us)
            110     INDEX RANGE SCAN ORDERS_N1_CUST_ID (cr=2 r=0 w=0 time=308 us)(object id 34555)
You can see that Oracle obeyed our hint and used the nested loops anti-join access path. Unfortunately, the number of logical reads went up from 6 to 115 and the elapsed time went up by a factor of 50 to 16824 microseconds. Why? The row counts in the execution plan tell the story. Oracle looked at all 110 orders from the customer and found that 10 were placed less than 10 days ago. Oracle did a lot of unnecessary work. In this regard the filter access path seems much smarter than the nested loops anti-join access path. This behavior was seen on an Oracle 9.2.0.4 database. Let’s hope Oracle fixes this in a future release. Until then, I can’t think of a case where the nested loops anti-join access path would be useful.

By the way, it turns out that Oracle would have chosen the nested loops anti-join access path even without the NL_AJ hint. So it turns out that this query was more efficient when written with a NOT EXISTS than a NOT IN. Again, hopefully this is just an anomaly that will be fixed in a future release of Oracle.

Anti-Join Example #3

I recently used the concepts presented in this section to bring a client’s query down from 33 minutes (where one CPU on the server was pegged) to less than four seconds. It turned out to be another case of the implicit NVL() function in a NOT IN clause defeating an index. This time it was a query that showed a count of calls to the customer service center placed by users who did not belong to corporate customers. The original query looked like this:

        SELECT COUNT(*)
        FROM   calls C
        WHERE  C.requestor_user_id NOT IN
               (
               SELECT CM.member_user_id
               FROM   corp_members CM
               );
(Once again names have been changed to protect the innocent.) The execution plan looked like this:

        Rows     Row Source Operation
        -------  ---------------------------------------------------
              1  SORT AGGREGATE (cr=12784272 r=1864678 w=0 time=1978321835 us)
              0   FILTER  (cr=12784272 r=1864678 w=0 time=1978321817 us)
         184965    TABLE ACCESS FULL CALLS (cr=3588 r=1370 w=0 time=979769 us)
          61032    TABLE ACCESS FULL CORP_MEMBERS (cr=12780684 r=1863308 w=0 time=2948544268 us)
It appears that Oracle is reading rows from the CALLS table one at a time and, for each row, is scanning the CORP_MEMBERS table until it finds a row with a matching MEMBER_USER_ID. Since there are over 180,000 calls in the system, it takes over 180,000 partial scans of the CORP_MEMBERS table to complete this query. (The CORP_MEMBERS table has 584 blocks below its high water mark.)

A quick check of the CORP_MEMBERS table shows that the MEMBER_USER_ID column is indexed. However, the column is also nullable and so Oracle cannot use the index because of the implicit NVL() function associated with the NOT IN clause.

Since the client saw records in the CORP_MEMBERS table with a null MEMBER_USER_ID as an unrelated special case, they wanted the query to ignore such records. So my first step was to rewrite the query with the extra WHERE predicate in the subquery that makes the NOT IN clause ignore nulls:

        SELECT COUNT(*)
        FROM   calls C
        WHERE  C.requestor_user_id NOT IN
               (
               SELECT CM.member_user_id
               FROM   corp_members CM
               WHERE  CM.member_user_id IS NOT NULL
               );
The resulting execution plan was:

        Rows     Row Source Operation
        -------  ---------------------------------------------------
              1  SORT AGGREGATE (cr=3790 r=3906 w=420 time=5450615 us)
              0   HASH JOIN ANTI (cr=3790 r=3906 w=420 time=5450591 us)
         184965    TABLE ACCESS FULL CALLS (cr=3588 r=3480 w=0 time=646554 us)
          81945    INDEX FAST FULL SCAN CORP_MEMBERS_USER_ID (cr=202 r=6 w=0 time=113178 us)(object id 34580)
Now not only did Oracle use the index on the MEMBER_USER_ID column when accessing the CORP_MEMBERS table, but more importantly Oracle also switched to a hash anti-join access path. This means that Oracle only had to access the CORP_MEMBERS table once instead of more than 180,000 times. The execution time dropped by 99% to 5.45 seconds.

The users were thrilled with the improvement and the problem was considered resolved. But to better understand Oracle’s data access paths and join methods, I benchmarked two more versions of the query. First I added a MERGE_AJ hint to cause a merge anti-join instead of a hash anti-join. This version of the query performed the same number of logical reads as the hash anti-join but used a little more CPU time. In this situation it looked like the merge anti-join used up more CPU time on sorting than did the hash anti-join.

I also tried rewriting the query with a NOT EXISTS clause to see how it would perform using a filter operation instead of an anti-join access path. This version of the query looked like:

        SELECT COUNT(*)
        FROM   calls C
        WHERE  NOT EXISTS
               (
               SELECT 1
               FROM   corp_members CM
               WHERE  CM.member_user_id = C.requestor_user_id
               );
The execution plan was:

        Rows     Row Source Operation
        -------  ---------------------------------------------------
              1  SORT AGGREGATE (cr=125652 r=3489 w=0 time=3895569 us)
              0   FILTER  (cr=125652 r=3489 w=0 time=3895547 us)
         184965    TABLE ACCESS FULL CALLS (cr=3588 r=3489 w=0 time=906699 us)
          61032    INDEX RANGE SCAN CORP_MEMBERS_USER_ID (cr=122064 r=0 w=0 time=1842121 us)(object id 34580)
This version of the query performed a lot more logical reads than the hash anti-join version, but actually used less CPU time. This query appeared to hammer on the blocks of the CORP_MEMBERS_USER_ID index very heavily. Most likely the whole index segment was sitting in the buffer cache and thus the logical reads were relatively inexpensive. Meanwhile, the large sort required by a hash anti-join was avoided and the result was a net reduction in CPU time.

These examples demonstrate the value of the anti-join access paths and the significance of the way NOT IN treats null values. Again, your gains may not be as spectacular as those shown here, but hopefully you can see from this discussion how the proper use of anti-joins and NOT EXISTS and NOT IN clauses can make a certain class of queries much more efficient. These examples also demonstrate the importance of the difference between NOT EXISTS and NOT IN, and the apparent behavior (although not fully documented as far as I can tell) that Oracle typically will not apply anti-join access paths to a NOT EXISTS clause.

Conclusion

The semi-join and anti-join are two special ways of joining data from multiple tables in a query. Although SQL does not have a special symbol or keyword to designate a semi- or anti-join, we use the EXISTS, IN, NOT EXISTS, and NOT IN constructs to indicate when one of these joins is desired. Over the years Oracle has gotten better at optimizing semi-joins and anti-joins, and new data access paths make it possible to write very efficient queries that use these two types of joins. However, there are certain prerequisites to be met before Oracle can use these access paths, and sometimes hints or specific coding styles are necessary in order for the query optimizer to find the best execution plan possible.

Understanding how semi-joins and anti-joins work—and how Oracle implements the relevant data access paths—will enable you to write very efficient SQL and dramatically speed up an entire class of queries. This in turn will make your employer (or client) very happy.
标签: 暂无标签
oraunix

写了 199 篇文章,拥有财富 1026,被 339 人关注

转播转播 分享分享 分享淘帖
回复

使用道具

成为第一个吐槽的人

您需要登录后才可以回帖 登录 | 加入社区

本版积分规则

意见
反馈