Taking a step back ... Database Normalisation 1NF, 2NF, 3NF, BCNF and 4NF (Part 1)
I am not as passionate about databases as I am about many, many other things, but database normalisation has kept me interested for some time. I was recently faced with explaining database normalisation from chaos to X-Normal Form (NF), so let me try and summarise my understanding for the benefit of others who might be wondering. Part 1 looks at chaos to 3NF, while part 2 looks at Higher-Level Normal Forms.
---------------------------------------------------------------------------------
Initial hypothetical data
The shown data is structured in a typical report view, which unfortunately presents a few challenges:
- AIRCRAFT_NUMBER is probably intended as a primary key (PK), but contains nulls.
- Data inconsistencies and anomalies are invited with open arms.
- Should mission class be bombing run, bomb run ... are these the same and captured incorrectly?
- Update anomalies ... if we update the PILOT_NUMBER for Alexander, we are potentially required to update a number of rows.
- Insertion anomalies ... to insert a new pilot, we have to create a phantom plane if pilot is not assigned to any aircraft yet.
- Deletion anomalies ... assume a pilot retires and we need to delete his information. Multiple roes may need to be deleted and we would loose aircraft information as well.
While the above view works for reports and day-day spreadsheets, we need to consider the relational database environment to overcome the inconsistencies and anomalies. We will briefly cover data normalisation, using the table above as a starting point.
---------------------------------------------------------------------------------
First Normal Form (1NF)
Characteristics
- Table Format
- No repeating groups
- Primary key (PK) identified
Steps
- Eliminate repeating groups
- Present data in a tabular format, where each cell has a single value and there are no repeating groups.
- Identify the Primary Key (PK)
- At a first glance the AIRCRAFT_NUMBER seems a good candidate for a PK, but would not uniquely identify all of the remaining row attributes.
- The combination of AIRCRAFT_NUMBER and PILOT_NUMBER is a PK candidate that will uniquely identify all row attributes.
- Identify all dependencies
- AIRCRAFT_NUMBER, PILOT_NUMBER --> AIRCRAFT_NAME, PILOT_NAME, MISSION_CLASS, FLYING_HOUR, COST_HOUR
- Primary Key (PK) dependency. The PK is also a composite key.
- AIRCRAFT_NUMBER --> AIRCRAFT_NAME
- Partial dependency ... aircraft name is only dependent on a part of the composite AIRCRAFT_NUMBER, PILOT_NUMBER key.
- PILOT_NUMBER --> PILOT_NAME
- Partial dependency ... pilot name is only dependent on a part of the composite AIRCRAFT_NUMBER, PILOT_NUMBER key.
- PILOT_NUMBER --> PILOT_NAME, FLYING_HOUR, COST_HOUR
- MISSION_CLASS --> COST_HOUR
- Transitive dependency .... COST_HOUR non-prime/non-key attribute is dependent on non-prime/non-key MISSION_CLASS attribute
- Look at the revised table view
- Create a dependency diagram
The "good" dependencies are shown above the dependency diagram and the problematic ones below the diagram.
---------------------------------------------------------------------------------
Second Normal Form (2NF)
Characteristics
- 1NF
- No partial dependencies
Steps
- Separate key components
- AIRCRAFT_NUMBER
- PILOT_NUMBER
- AIRCRAFT_NUMBER PILOT_NUMBER
- Assign dependent attributes to each key component
- AIRCRAFT ( AIRCRAFT_NUMBER, AIRCRAFT_NAME )
- PILOT ( PILOT_NUMBER, PILOT_NAME, MISSION_CLASS, COST_HOUR )
- FLIGHT ( AIRCRAFT_NUMBER, PILOT_NUMBER, FLYING_HOUR )
- Revise dependency diagram
- As we will notice, all partial dependencies have been sent to outer space.
---------------------------------------------------------------------------------
Third Normal Form (3NF)
Characteristics
- 2NF
- No transitive dependencies
Steps
- Identify new determinants
- Identify dependent attributes
- MISSION_CLASS --> COST_HOUR
- Remove dependent attributes from transitive dependencies
- PILOT ( PILOT_NUMBER, PILOT_NAME, MISSION_CLASS, COST_HOUR ) becomes
PILOT ( PILOT_NUMBER, PILOT_NAME ) - MISSION ( MISSION_CLASS, FLYING_HOURS )
- Revise dependency diagram
- View the final 3NF database
- SQL Server Diagram View
- Crowfeet Diagram View
---------------------------------------------------------------------------------
In summary moving from 1NF to 3NF:
Right, now the DB gurus can jump in ...