X
  • October 14, 2014

Questions that make you go Hmmmm..... #3 Why 3NF?

Guest Author
I was reading my various Oracle forums the other day. This question was posed:
"I have a table columns like id,password,age,gender and interests. now I need to insert multiple interests to single user.please help me..thank you in advance. "
I was sickened that the people who first answered the question were suggesting things like this:
"My first thought would be to store the interests in comma-delimited format."
There were several suggestions along this line (including an interesting suggestion on using XML and Oracle's object oriented features). Honestly, I was just taken aback at the total lack of understanding on 3NF and why you develop a database using 3NF. Why not use object oriented features? There are a lot of reasons some of which I describe here:
=====================================================================
Assume this set of inserts:
INSERT INTO nested_table VALUES (1, my_tab_t('A'));
INSERT INTO nested_table VALUES (2, my_tab_t('B', 'C'));
INSERT INTO nested_table VALUES (3, my_tab_t('c', 'a', 'F'));
COMMIT;
Tell me which a or A or c or C is valid and represents the source of truth. If C changes to Z does c also change to z or to Z or does it stay c? How do you know if all C represent all other C and if it does how in tarnation are you going to manage the sweeping change of C and c to Z? Application logic? Are you going to ensure that everyone else who manipulates this data in the future will follow these rules and how much code and time will it take? How are you going to validate these values? Application code? What happens when a Z code is added? What is the cost and effort required to add that new code to the application? This is called a rat hole. As you scale to millions of records and as new code comes into play that breaks these already tenuous relationships you end up with CRAP data - Completely Ridiculous Absolutely Poor data.
======================================================================
This isn't the only reason to avoid such designs of course. This is not the purpose of this post though.
In this blog post, I want to provide what I think is the correct answer using 3NF...
First of all, we have requirements stated in the OP and some I've added using best practices....
1. We need to record the following information about a person:

a. A unique identifier (primary key)

b. Last_name (I added this requirement)

c. First_name (again, I added it)

d. Age - I'm changing this to date of birth as it's a more accurate data point.

e. Gender - Seems self evident. Unless someone has a sex change.

f. Interest - one person may have many interests and a given interest may be had by more than one person.


Note the change from age to date of birth. This is one of the HUGE benefits of normalization. You discover business rules during the process. The discipline of modeling will cause you to ask questions as you develop the model that would not appear if you are just flying by the seat of your pants.
3NF and logical modeling in particular is more than a way to get to a physical model. It's a way to ferret out business rules and discover relationships, rules and data needs that you might have not thought about. This is one of the reasons that logical modeling is so important. If it's done right, logical modeling really defines the nature of data, and it also surfaces many things (like relationships) that no one thought about. Logical modeling is rarely done but it's really important.
So, we need to define entities. These are logical constructs NOT the physical manifestation of the requirements (yet). After consideration, our entities are:
person
gender
interest
Note that these are named using a singular noun phrase. This is the accepted way of naming an entity. The name should be meaningful and might include acronyms and can be shortened but it needs to make sense. If the name is to long (over 30 characters in most cases) then shorten first by removing vowels. For example people would be shortened to ppl (but obvioiusly you would not do this since people is < 30 characters.
We would also express the relationships of these entities. For example:
person has a 1:many relationship with interest.
Interest has a many:1 relationship with person
Person has a 1:1 relationship with gender
gender has a 1:many relationship with person.
In the resulting physical model we will define and enforce these relationships with foreign keys (usually - some complex relationships might require a bit more work). Not creating foreign keys is not acceptable in any model. Tom Kyte has talked about this quite often - for example here:
http://www.oracle.com/technetwork/issue-archive/2009/09-may/o39asktom-096149.html
Enforcing these relationships in application code is really a common but really bad practice. This isn't the post to discuss that issue though. Let's fast forward now to our physical database design. We have decided that we need these tables:
drop table people cascade constraints;
drop table gender cascade constraints;
drop table interest cascade constraints;
drop table assoc_person_interest;
create table people
(people_id number primary key, last_name varchar2(30), first_name varchar2(30),
gender_code number);
create table gender
(gender_id number primary key, gender_name varchar2(30) );
create table interest
(interest_id number primary key, interest_name varchar2(30) );
Notice that we have created primary keys in both people and gender.
Notice that we have a many to many relationship between people and interest. This is because a given person might have none, one or many interests. How we deal with this issue is really the crux of the question that the poster had.
The suggestion to create a single column and put in the data as a comma delimited list of values comes with many problems. How do you ensure that the data is consistent? For example, we might enter flying and FLYING as the interest for two different people. What if we want to change the definition of what an interest is - say from PILOTAGE to FLYING? Querying comma delimited data is also problematic, will not perform and will not scale.
So, what do we do?
We create another table. People have different names for these tables, Some call them
intersection tables, or join tables or associative tables. Whatever you want to call them,
their purpose is to associate two tables in many to many relationships.
In our case, our associative table will look like this:
create table assoc_person_interest
(people_id number, interest_id number);
alter table assoc_person_interest
add constraint fk_people
foreign key (people_id) references people (people_id);
alter table assoc_person_interest
add constraint fk_interest
foreign key (interest_id) references interest (interest_id);
(I deliberately left out things like cascading deletes at this point just in case you wondered).
Now, a given person can have more than one interest, but this list of interests are not recorded in the person table. Using a join table gives us a great amount of flexibility, allowing us to define any number of interests that a person might have. The foreign key relationships prevent us from assigning the same interest to a given user more than once, which is a pretty good idea since it will save us space and duplication of data (which is a bad thing).
So, now we have our model built. Will it work? If so, how do we use it?
We will need to follow the foreign key constraints. First we need to load our codeset
tables:
delete interest;
insert into interest values (1,'HOCKEY');
insert into interest values (2,'FLYING');
insert into interest values (3,'ART');
insert into interest values (4,'DRAWING');
commit;
=======================================
SQL> delete interest;
0 rows deleted.
SQL> insert into interest values (1,'HOCKEY');
1 row created.
SQL> insert into interest values (2,'FLYING');
1 row created.
SQL> insert into interest values (3,'ART');
1 row created.
SQL> insert into interest values (4,'DRAWING');
1 row created.
SQL> commit;
Commit complete.
SQL>
===================================
Now onto gender
delete gender;
insert into gender values (1, 'MALE');
insert into gender values (2, 'FEMALE');
insert into gender values (3, 'UNSURE');
commit;
====================================
SQL> insert into gender values (1, 'MALE');
1 row created.
SQL> insert into gender values (2, 'FEMALE');
1 row created.
SQL> insert into gender values (3, 'UNSURE');
1 row created.
SQL>
SQL> commit;
Commit complete.
=============================================
Now - we are ready to begin adding person records.
Note that the business rules do not demand that a person must have an interest.
So we have modeled this as an optional relationship. If there was a mandatory rule then we would have to add some additional code to manage that.
Let's enter a person record along with their interests...
-- enter information on a person.
insert into people values (1, 'FREEMAN', 'ROBERT', 1);
-- enter my interests
insert into assoc_person_interest values (1, 2);
commit;
===================
SQL> insert into people values (1, 'FREEMAN', 'ROBERT', 1);
1 row created.
SQL> commit;
SQL> insert into assoc_person_interest values (1, 2);
1 row created.
SQL> commit;
Commit complete.
======================================
If we try to insert a record that is invalid:
-- enter information on a person.
insert into people values (2, 'FREEMAN', 'CARRIE', 1);
-- enter my interests
insert into assoc_person_interest values (2, 6);
commit;
-- insert a valid interest for an invalid person
insert into assoc_person_interest values (9, 2);
commit;
-- Note that the first insert will be successful but the second one will fail.
-- unless we rollback the transaction.
===========================================
SQL> insert into people values (2, 'FREEMAN', 'CARRIE', 1);
1 row created.
SQL> insert into assoc_person_interest values (2, 6);
insert into assoc_person_interest values (2, 6)
*
ERROR at line 1:
ORA-02291: integrity constraint (ROBERT.FK_INTEREST) violated - parent key not
found
SQL> commit;
Commit complete.
=============================================
So, we can't assign an invalid interest. Also we can not add an invalid person.
Notice though how the transaction is not rolled back with the second failure. The
commit saves the first insert.
To query these tables and display the related records is easy as seen here:
SQL> l
1 select a.last_name, a.first_name, c.interest_name
2 from people a, assoc_person_interest b, interest c
3 where a.people_id=b.people_id
4* and b.interest_id=c.interest_id
SQL> /
LAST_NAME FIRST_NAME INTEREST_NAME
---------- ---------- ------------------------------
FREEMAN ROBERT FLYING
FREEMAN CARRIE DRAWING
FREEMAN ROBERT DRAWING
Of course there might be other issues to deal with like people who do not have any interests, in which case we will need to use outer joins like this:
Let's add another person without any interests:
SQL> insert into people values (3, 'HANKS','DULE''',1);
SQL> commit;
Now - what is the result of our query?
SQL> l
1 select a.last_name, a.first_name, c.interest_name
2 from people a, assoc_person_interest b, interest c
3 where a.people_id=b.people_id
4* and b.interest_id=c.interest_id
SQL> /
LAST_NAME FIRST_NAME INTEREST_NAME
---------- ---------- ------------------------------
FREEMAN ROBERT FLYING
FREEMAN CARRIE DRAWING
FREEMAN ROBERT DRAWING
Whoops, where is Mr. Hanks? We need to change our query just a bit to surface him:
1 select a.last_name, a.first_name, c.interest_name
2 from people a, assoc_person_interest b, interest c
3 where a.people_id=b.people_id (+)
4* and b.interest_id =c.interest_id (+)
SQL> /
LAST_NAME FIRST_NAME INTEREST_NAME
---------- ---------- ------------------------------
FREEMAN ROBERT FLYING
FREEMAN ROBERT DRAWING
FREEMAN CARRIE DRAWING
HANKS DULE'
Here we have used an outer join. There are different ways to do an outer join... I've used the old Oracle (+) syntax.
So - there is a quick and dirty answer to the question - how do I store multiple data values for a given person.

Be the first to comment

Comments ( 0 )
Please enter your name.Please provide a valid email address.Please enter a comment.CAPTCHA challenge response provided was incorrect. Please try again.