In the Protecting People's Privacy story we faced a problem. We needed to ensure that we were not sending emails to or storing personal details about people who had opted out. This is a mutually exclusive relationship. To ensure we met this requirement we needed to do two things:
- Enforce the mutually exclusive relationship using database constraints.
- Exclude opted-out email addresses from all queries generating email lists.
There is no simple answer for how to do this. In the story we looked at four different methods of implementing the requirements:
- Store all of the details in one big table. Use a check constraint to validate that the appropriate columns are null when the opt out flag is set.
- Split the demographic and opt out information into two tables. Police the fact that an email could only appear in one table via a check constraint on a materialized view (MV).
- Split the demographic and opt out information into two tables with a master table storing all the email addresses. Ensure that the child tables have no overlapping entries either by:
- Placing a trigger on the demographic and opt out tables to maintain the master list.
- Adding an email type column to all tables and using referential integrity to ensure the tables have no duplicate entries.
Each of these approaches comes with their own advantages and disadvantages. Let's look at them.
One table with a check constraint
This is the most obvious solution. It's simple to implement - just add an opted out column to the demographic table. Then create a check constraint to ensure that the other columns are null when this is set.
This approach has a number of drawbacks. The biggest of these is ensuring that all queries have the correct where clause. Without this you can accidentally generate mailing lists including opted out emails.
There some steps you can take to mitigate this.
If you're on Oracle Enterprise Edition you can use Virtual Private Database. This can add the where clause automatically, based on a policy. This ensures queries only return opted-in or out emails as appropriate. Correctly implementing this can be complex.
An easier and more widely available option is to create two views. One with the demographic columns where the opt out flag isn't set. The other showing emails where the opt out is set. Provided no one has direct access to the underlying table this can work well. Developers must still take care to ensure they use the views rather than the base table.
This method also has system maintenance implications. You must update the check constraint when adding new columns to the table. Developers can easily overlook this without good documentation and governance.
Provided you have good development processes this technique can work.
Two tables supported by a MV
The major advantage of storing the emails separately is people are unlikely to generate a list including opted-in and opted-out email addresses. To do this you have to union the two tables together. This requires conscious effort.
The downside is Oracle doesn't have a statement to create a unique key across two tables. Therefore you must implement your own solution to validate the rule. One way to do this is to create a MV. This uses a query checking that the join of the two tables returns no rows. Provided you set it refresh on commit, a check constraint on the MV can enforce the relationship.
This solution comes with a number of caveats however.
The key issue for people with highly concurrent environments is performance. Introducing the MV adds overhead when committing transactions including the tables used in the MV. Oracle serializes this refresh process - i.e. only one session at a time can update the MV. This can become a serious bottleneck in applications where many users are triggering refreshes at the same time.
Laurent Schneider raised another important issue. Fast refreshable materialized views are brittle. There are a number of actions, such as truncating dependent tables, which can make them unusable. Oracle doesn’t validate constraints on MVs when this happens. This is a silent failure. Therefore you can have unexpected situations where the constraint is not enforced.
The final three considerations regard the maintainability of a system using MVs. Oracle only validates MV constraints on refresh. If you have transactions with many DML operations it may not be clear which statement lead to the exception. This makes it harder to debug the application.
Constraints on MVs also increase the learning curve for people new to the system. The logic applying the check is effectively separated from the dependent tables. This can confuse developers as to the cause of an exception.
The upside is you can add new columns to the demographic and opt out tables without needing to update the constraints.
Provided your rate of concurrent transactions is low and you've guarded against actions that invalidate MVs this approach can work well. If either of these are not the case then you're better off using a different technique.
Two tables supported by a master table
This method has two sub implementations. One relies on triggers to maintain the master table. The second adds an email type column to all tables. The type columns enable you to enforce the rule with foreign keys, check and unique constraints.
As with the MV approach, the clear benefit of these are reduced risk that people will accidentally generate a list including opted in and opted out addresses. The presence of the master table does reintroduce this risk however - particularly in the trigger only solution where the master table has no context to the email addresses. The best way to mitigate this is to ensure that no end users or applications have select privileges on the master table.
Both of these approaches perform better than using a MV to enforce the constraint. This is particularly noticeable in highly concurrent environments.
They also allow you to add new columns to tables without having to modify any constraints. The same applies if you need add more tables to the distributed key (provided you use lookup table on the master table instead of a check constraint if using types!)
The main drawback of relying on the triggers are the don't validate existing data. For new development this isn't an issue. If you're applying this technique to an existing system you must manually validate the data once the triggers are in place.
The downside is regards system maintainability. It's not clear just by looking at the triggers that they're used to enforce the distributed key. To overcome this ensure you include descriptive comments in the triggers explaining why they exist.
Using email type
This is a convoluted technique to implement. To do so, you need to:
- Add a type columns to all the tables
- Create an extra unique constraint on the master table over (email, type)
- Ensure that the type column in the child tables is a constant value
- Place foreign keys on the child tables back to the master using (email, type)
This leads to several "redundant" columns and constraints. On 11g and higher you can reduce storage overhead by using virtual columns for the type columns on the child tables. If you're on 12c, you can also make these invisible so they are automatically excluded from insert statements, select stars and so on. This can make the system harder to learn however. Whether this possible saving in coding effort offsets the learning curve is decision you'll have to make!
The upside is you can infer the rule just by looking at the table constraints (particularly when you're familiar with the pattern).
It's also easy to add attributes common to both demographics and opt outs to the master table, e.g. insert date. This method is also a way to implement entity subtype relationships.
If you'd like to test these techniques out for yourself use this script.
You may be thinking that the email example is a niche problem and these implementations aren't relevant to you. Mutually exclusive relationships come in many forms. It's worth keeping these in your toolbox. Here's some examples:
- A competition is open to everyone except employees.
- A person can't serve time in a prison and be employed as a guard at the same time
- A bank account can only be one of saving, checking, loan, etc.
- A social media "event" can be a like, follow, share, …
If you want scripts to try these approaches out yourself to see how they compare, use these scripts:
I also want to say thanks to SalimWer, whose question on OTN provided the initial idea for the story and Karthick_Arp and Stew Ashton for their answers that are the basis of some solutions. Any other link to real people or events is purely coincidental!
Overall using a master table with type columns is my favourite approach.
Which method do you like best to implement a mutually exclusive relationship in Oracle?
Are there any other or better ways to implement the requirements?
Let us know in the comments!