
Celebrating the joy and power of Oracle SQL with the Oracle Developer Advocate team

  • March 3, 2020

How to Skip Duplicate Key Errors (ORA-00001) in Oracle Database

Chris Saxon
Developer Advocate

It's the bane of data loads everywhere. Having 99% of a process complete, only for it to fail on the last few rows:

insert into target_table
  select *
  from   massive_table
  where  last_row_is_a_duplicate = 'Y';

ORA-00001: unique constraint (...) violated

This typically aborts the whole process. Rolling everything back.

Meaning you need to clean your data. Then run the whole load again.

Aaaaaargghhh!

This is particularly problematic if the load takes hours to complete.

Fortunately Oracle Database has several methods you can use to skip duplicate rows and stop this happening. In this post you'll see how you can do this by:

  • Skipping existing rows with a not exists subquery
  • Ignoring duplicates with the ignore_row_on_dupkey_index hint
  • Capturing the failing rows with DML error logging
  • Delaying validation until commit with deferrable constraints

But first, a quick recap on what primary keys and unique constraints are and why you have them.

What is a Primary Key or Unique Constraint?

Being able to locate one specific row in a table is a fundamental feature of SQL. In fact it's a requirement of first normal form.

Take the username for accounts in your application. To ensure someone has entered the correct password and pull up their details on login you need to find the row for that username. And be sure each username appears at most once in the accounts table.

Declaring the username unique enforces this second property. And raises an error if anyone tries to insert a duplicate name:

create table accounts (
  username varchar2(10)
    not null
    unique,
  given_name varchar2(512)
    not null
);

insert into accounts ( username, given_name )
  values ( 'chrissaxon', 'Chris' );
insert into accounts ( username, given_name )
  values ( 'chrissaxon', 'Christopher'  );

ORA-00001: unique constraint (CHRIS.SYS_C0018134) violated

This guarantees that each username appears at most once.

A primary key is a special case of a unique constraint. It has these extra properties:

  • All its columns are mandatory (not null)
  • Each table can have at most one primary key.

So how do you choose between using a primary key or unique constraint?

Some tables can have many unique identifiers. A common reason for this is surrogate keys – sequence assigned or GUID values that have no meaning outside of the application. For example you may add an account_id column to the accounts table:

create table accounts (
  account_id integer
    generated as identity
    not null
    primary key,
  username varchar2(10)
    not null
    unique,
  given_name varchar2(512)
    not null
);

The table can only have one primary key. So at least one of account_id and username must be a unique constraint. But which should be the primary key?

The accounts table is likely to have many child tables that refer to a particular row in it. Such as orders, invoices, and payments. To ensure these child rows point to a valid account, create a foreign key from the child table to the parent.

The columns foreign keys point to should be immutable. Unchanging. This avoids problems with having to cascade updates of key values from parent to children. For example, if someone wants to change their username.

Both primary keys and unique constraints can be the target of foreign keys. But because you can only have one primary key, by convention foreign keys point to this constraint.

So by making account_id the primary key, you're telling other developers:

"This is the immutable value. Use this as the target for foreign keys."

Creating unique constraints improves data quality. Usually when someone tries to insert an existing value – such as creating a new account with an existing username – you want to stop this happening.

But sometimes you may want to skip this error. Say when loading keys that already exist.

Skip Duplicate Rows with a Subquery


If the source data are likely to contain key values that are already in the table, it's best to avoid loading these completely.

Do this with a not exists subquery:

insert into accounts ( username, given_name )
  select username, given_name
  from   accounts_stage acst
  where  not exists (
    select * from accounts acct
    where  acct.username = acst.username
  );

This will only load the usernames from accounts_stage that are missing from accounts.

This only works with insert ... select. There's no query in insert ... values, so this approach won't work there as-is.
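
If you do need it for a single row, one workaround is to rewrite the statement as a select from dual. A quick sketch using the same tables:

insert into accounts ( username, given_name )
  select 'chrissaxon', 'Chris'
  from   dual
  where  not exists (
    select * from accounts acct
    where  acct.username = 'chrissaxon'
  );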

And what if the source contains duplicates? How do you skip the extra rows there?

Ignore Duplicate Rows with the ignore_row_on_dupkey_index Hint

The simplest method is to add a hint to the query. Added in 11.2, the ignore_row_on_dupkey_index hint silently ignores duplicate values:

insert /*+ ignore_row_on_dupkey_index ( acct ( username ) ) */
  into accounts acct ( username, given_name )
  select username, given_name from accounts_stage;

So if accounts_stage has duplicate usernames, this will add one of the rows and bypass the other.

This also works when using the values version of insert:

insert /*+ ignore_row_on_dupkey_index ( acct ( username ) ) */
  into accounts acct ( username, given_name )
  values ( 'chrissaxon', 'Chris' );

To use the hint, put the table name or alias followed by either:

  • The name of a unique index
  • A comma-separated list of columns in a unique index

Note you can only ignore one constraint in the table. Not an issue if you want to skip extra usernames and have auto-generated primary keys. But a limitation if you have many unique keys on a table.

While adding a hint is easy, it has a couple of other major drawbacks:

  • You don't know which row will be ignored
  • There's nothing to tell you that there were duplicate values

So if you want to know if there were any duplicates, you need to post-process the data to check.
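
For example, one rough check is to look for usernames staged more than once; only one row for each will have made it into the table:

select username, count (*)
from   accounts_stage
group  by username
having count (*) > 1;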

If the source contains identical rows - those where every column has the same value - this may not bother you. Of course, in that case you're better off adding distinct to the query:

insert into accounts ( username, given_name )
  select distinct username, given_name
  from   accounts_stage;

But if it's only the usernames that clash, it's likely you want to know. And find out which rows you skipped. To do this you need to record which rows failed.

Store Duplicate Values with DML Error Logging

DML error logging enables insert, update, and delete statements to store exceptions in another table. Allowing the statement to complete without error.

To use it, first you must create the error logging table with dbms_errlog.create_error_log. By default this creates a table named err$_<tablename>.

Then use the log errors clause in your SQL:

exec dbms_errlog.create_error_log ( 'accounts' );

insert into accounts ( username, given_name )
  values ( 'chrissaxon', 'Chris' )
  log errors into err$_accounts reject limit unlimited;

insert into accounts ( username, given_name )
  values ( 'chrissaxon', 'Christopher' )
  log errors into err$_accounts reject limit unlimited;

insert into accounts ( username, given_name )
  select username, given_name from accounts_stage
  log errors into err$_accounts reject limit unlimited;

The reject limit defaults to zero. Meaning you'll still get an exception if there are any duplicates. Setting this to unlimited allows you to ignore all ORA-00001 errors, while the insert still succeeds!

Of course, if there are lots of errors in the load, this may be a sign of bigger problems. And you want to stop processing. So if you want to abort the load when there are more than, say, 100 failing rows, set reject limit 100.

After the load is complete, you'll need to query the error logging table to see if there were any issues.
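
The error table includes a copy of the original column values plus details of each failure, so a check could look something like this:

select ora_err_number$, ora_err_mesg$, username, given_name
from   err$_accounts;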

This could be a problem if there are lots of loads running at the same time. How do you know which process the errors belong to?

To combat this problem, add a tag to the load. Then use this to fish out the offending data afterwards:

insert into accounts ( username, given_name )
  values ( 'chrissaxon', 'Christopher' )
  log errors into err$_accounts ('load name') 
  reject limit unlimited;
  
select * from err$_accounts
where  ora_err_tag$ = 'load name';

All the techniques so far assume that you can safely ignore duplications. At least while running the load transaction.

But sometimes you may need to load all copies. Then post-process the data to keep the "correct" one.

The final method allows you to do this - provided you can do the validation in the same transaction!

Delay Constraint Enforcement with Deferrable Constraints


You can delay the point the database validates a constraint until commit by declaring it as deferrable:

create table accounts (
  username varchar2(10)
    not null
    unique deferrable initially deferred,
  given_name varchar2(512)
    not null
);

insert into accounts ( username, given_name )
  values ( 'chrissaxon', 'Chris' );
insert into accounts ( username, given_name )
  values ( 'chrissaxon', 'Christopher' );

The key difference between this and the previous methods is that the target table may contain duplicates during the transaction. The database only raises ORA-00001 when you commit. At which point it rolls back all the rows!

commit;

ORA-02091: transaction rolled back
ORA-00001: unique constraint (CHRIS.SYS_C0018128) violated

select count (*) from accounts;

COUNT(*)   
          0 

So before you end the transaction, you need to find and delete any duplicate rows.

This can be useful if you have a multi-pass load where you run many insert statements. And you can only determine which rows to keep at the end.
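
For example, here's one way to remove the extra rows before committing. This sketch arbitrarily keeps the row with the lowest rowid per username; in practice you'd correlate on whatever rule decides which row is the "correct" one:

delete from accounts acct
where  acct.rowid > (
  select min ( dup.rowid )
  from   accounts dup
  where  dup.username = acct.username
);

commit;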

Note deferrable keys create a non-unique index to police the constraint. This makes primary key lookups less efficient. It also breaks the ignore_row_on_dupkey_index hint, as this requires a unique index:

insert /*+ ignore_row_on_dupkey_index ( acct ( username ) ) */
  into accounts acct ( username, given_name )
  values ( 'chrissaxon', 'Chris' );
  
ORA-38913: Index specified in the index hint is invalid

HT to Richard Foote for pointing out this limitation of the hint with deferrable constraints.

Which Approach is Best?

As always, there is an element of subjectivity to "best". And it depends on your goals. To help you decide, answer these questions:

  • Why are you getting duplicate rows?
  • What action do you want to take when you try to insert them?

If the source data are clean, but you may be re-inserting existing values, then changing your SQL is the way to go. The most straightforward is to add a not exists clause. Or if you want to overwrite the non-key columns with the latest data, use merge, as sketched below.
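
A rough sketch of that merge, using the accounts and accounts_stage tables from earlier (this assumes accounts_stage has at most one row per username):

merge into accounts acct
using accounts_stage acst
on    ( acct.username = acst.username )
when matched then
  update set acct.given_name = acst.given_name
when not matched then
  insert ( username, given_name )
  values ( acst.username, acst.given_name );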

But when you're loading messy data, it's possible you'll have more problems than just ORA-00001 errors. Such as trying to put null into not null columns. In which case, DML error logging is the way to go, as this will capture all the issues.

For the most part, the ignore_row_on_dupkey_index hint should be a last resort. Handy if you need a quick fix. But definitely a kludgy workaround!

Similarly, deferrable primary key or unique constraints are probably a sign that you need to redesign your process. Also you can't make an existing constraint deferrable. You need to drop and recreate the constraint – and almost certainly its index too – to do this. Making it impractical to add this trick to most existing tables. But in cases where you have to load copies, then clean them after it may be the only viable method.
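
If you do go this route on an existing table, the change looks roughly like this. The constraint name here is made up; look up the real one in user_constraints. And if the index was created separately from the constraint, you'll need to drop and recreate it too:

alter table accounts drop constraint accounts_username_uk;

alter table accounts add constraint accounts_username_uk
  unique ( username )
  deferrable initially deferred;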

Join the discussion

Comments ( 2 )
  • SILVIO COMMODARO JUNIOR Monday, April 27, 2020
    Very interesting. I hadn't known about the hint.
  • venkatesh puram Monday, August 31, 2020
    Thank you for such a excellent explanation with examples...i loved it.