The optimizer eliminates tables that are redundant due to primary key-foreign key constraints. Consider the following example tables:
create table jobs ( job_id NUMBER PRIMARY KEY, job_title VARCHAR2(35) NOT NULL, min_salary NUMBER, max_salary NUMBER );
create table departments ( department_id NUMBER PRIMARY KEY, department_name VARCHAR2(50)); create table employees ( employee_id NUMBER PRIMARY KEY, employee_name VARCHAR2(50), department_id NUMBER REFERENCES departments(department_id), job_id NUMBER REFERENCES jobs(job_id));
and the query:
select e.employee_name from employees e, departments d where e.department_id = d.department_id;
In this query, the join to departments is redundant. The only column referenced in the query appears in the join predicate, and the primary key-foreign key constraint guarantees that there is at most one match in departments for each row in employees. Hence, the query is equivalent to:
select e.employee_name from employees e where e.department_id is not null;
The optimizer will generate this plan for the query:
------------------------------------------- Id Operation Name ------------------------------------------- 0 SELECT STATEMENT * 1 TABLE ACCESS FULL EMPLOYEES ------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 1 - filter("E"."DEPARTMENT_ID" IS NOT NULL)Note that the IS NOT NULL predicate is not necessary if the column has a NOT NULL constraint on it. The optimizer will also eliminate tables that are semi-joined or anti-joined. Consider the following query:
select e.employee_id, e.employee_name from employees e where not exists (select 1 from jobs j where j.job_id = e.job_id);
Since employees.job_id is a foreign key to jobs.job_id, any non-null value in employees.job_id must have a match in jobs. So only employees with null values for employees.job_id will appear in the result. Hence, this query is equivalent to:
select e.employee_id, e.employee_name from employees e where job_id is null;
and the optimizer can choose this plan:
------------------------------------------- Id Operation Name ------------------------------------------- 0 SELECT STATEMENT * 1 TABLE ACCESS FULL EMPLOYEES ------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 1 - filter("E"."JOB_ID" IS NULL)
Suppose employees.job_id has a NOT NULL constraint:
alter table employees modify job_id not null;
In this case, there could not possibly be any rows in EMPLOYEES, and the optimizer could choose this plan:
------------------------------------------- Id Operation Name ------------------------------------------- 0 SELECT STATEMENT * 1 FILTER 2 TABLE ACCESS FULL EMPLOYEES ------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 1 - filter(NULL IS NOT NULL)
The "NULL IS NOT NULL" filter is a false constant predicate, that will prevent the table scan from even taking place.
The optimization is available for ANSI compliant joins. For this query:
select employee_name from employees e inner join jobs j on e.job_id = j.job_id;
...the optimizer can eliminate JOBS and produce this plan:
------------------------------------------- Id Operation Name ------------------------------------------- 0 SELECT STATEMENT 1 TABLE ACCESS FULL EMPLOYEES -------------------------------------------
Table elimination is available for outer joins, which does not require PK-FK constraints. For the example, we require a new table and an addition to EMPLOYEES:
create table projects ( project_id NUMBER UNIQUE, deadline DATE, priority NUMBER ); alter table employees add project_id number;
Now consider a query that outer joins employees and projects:
select e.employee_name, e.project_id from employees e, projects p where e.project_id = p.project_id (+);
The outer join guarantees that every row in employees will appear at least once in the result. The unique constraint on projects.project_id guarantees that every row in employees will match at most one row in projects. Together, these two properties guarantee that every row in employees will appear in the result exactly once. Since no other columns from projects are referenced, projects can be eliminated, and the optimizer can choose this plan:
------------------------------------------- Id Operation Name ------------------------------------------- 0 SELECT STATEMENT 1 TABLE ACCESS FULL EMPLOYEES -------------------------------------------
All of the example queries in this post are very simple, and one would be unlikely to write a query where the join is so obviously unnecessary. There are many real world scenarios where table elimination may be helpful, including machine-generated queries and elimination of tables in views. For example, a set of tables might be exposed as a view, which contains a join. The join may be necessary to retrieve all of the columns exposed by the view. But some users of the view may only access a subset of the columns, and in this case, the joined table can be eliminated.
For example, consider the view:
create view employee_directory_v as select e.employee_name, d.department_name, j.job_title from employees e, departments d, jobs j where e.department_id = d.department_id and e.job_id = j.job_id;
This view might be exposed to a simple employee directory application. To lookup employee names by job title, the application issues a query:
select employee_name from employee_directory_v where department_name = 'ACCOUNTING';
Since the job_title column is not referenced, jobs can be eliminated from the query, and the optimizer can choose this plan:
-------------------------------------------- Id Operation Name -------------------------------------------- 0 SELECT STATEMENT * 1 HASH JOIN 2 TABLE ACCESS FULL EMPLOYEES * 3 TABLE ACCESS FULL DEPARTMENTS -------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 1 - access("E"."DEPARTMENT_ID"="D"."DEPARTMENT_ID") 3 - filter("D"."DEPARTMENT_NAME"='ACCOUNTING')
Note that prior to Oracle Database 12c Release 2, multi-column primary key-foreign key constraints were not supported.
The WHERE clause in the last example should read,
WHERE department_name='ACCOUNTING';
Thanks Ted! Have made the correction in the text.
Cheers,
Maria
https://asktom.oracle.com/pls/apex/f?p=100:11:0::::P11_QUESTION_ID:1277688700346608142#9535983800346671794