Featured

Database Normalization: A Step-by-Step Guide to First, Second, and Third Normal Forms

Normalization is actually not a component of relational databases, but it’s so common and prevalent that it’s as inseparable from relational databases as their components. Therefore, it makes sense to talk about it.

 

Normalization is a development inquiry through standardized steps involving best practices of relational data organization. Once we go through these steps and modify our data structures accordingly, we can expect to end up with an improved blueprint.

 

Senior developers can usually design normalized data structures on their first attempts—that comes with years of practice and experience. However, it’s good to have a guideline for novice developers to check their structures against!

 

Now that we know about tables and their relationships, the best way to understand normalization is to see it in action. The table below contains our starting point. It’s a table containing a company’s drivers, their managers, and the plate numbers of cars they’re allowed to drive.

 

Initial Look of Driver Data

 

This is what a basic spreadsheet page of drivers could look like. Can you spot the problems on this table already? If not, keep reading!

 

First Normal Form

According to the first normal form (1NF), tables should have only two dimensions. We should therefore get rid of repeating columns that cause multidimensionality in the table.

 

Our initial table features such a case: car_1, car_2, and car_3 are repeating columns. What we need to do in 1NF is to break it down into a flat structure, listing each car in a different row. The table below shows the end result.

 

First Normal Form Applied to Driver Data

 

As you can see, our table is two dimensional now. It looks nicer and cleaner, but not nice and clean enough! Let’s try to improve our data structure further.

 

Second Normal Form

According to the second normal form (2NF), redundant data should be eliminated. We need to break columns causing redundancy into distinct tables.

 

In our example, the car column has redundant data because we had to duplicate driver master data for each car the drivers can drive. Instead of having single rows for drivers 1022 and 1023, we have three of each, making six in total.

 

This table showcases the table we need. We ought to store driver-car assignments in a separate table.

 

Driver-Car Assignments after Second Normal Form

 

Our driver master table is free of cars now and contains only one row per driver without any redundancies, as shown here.

 

Driver Master Data After Second Normal Form

 

Now, we’re almost there. Just one more step!

 

Third Normal Form

According to the third normal form (3NF), we ought to eliminate columns that don’t belong to the table PK.

 

If we look at the above table closely, it’s is supposed to be a driver table, but we see manager data there, such as their names and extension numbers. That’s not good. Drivers should have their own table, and managers should have their own table. Let’s start by giving managers a dedicated table, as shown here.

 

Manager Master Data after Third Normal Form

 

Now that we have a distinct manager table, we can simply refer to it as a FK. This table shows the final driver data form.

 

Driver Master Data after Third Normal Form

 

Results

By going through the minimum steps of normalization, we’ve arrived at the ERD shown in the following figure, which is free from redundancies and potential conflicts!

 

Final ERD of Normalized Tables

 

We can go even further and create a distinct master data table for cars, but let’s stop here.

 

Mind you that having a normalized data structure is not a luxury—it’s one of the fundamental expectations from a decent developer, and for good reason too! Normalized data structures feature many benefits, such as the following:

  • Elimination of data redundancy
  • Improvement of data integrity
  • Enhanced query performance
  • Simplified database maintenance
  • Improved data relationships 

Further Normal Forms

1NF, 2NF and, 3NF are the minimum steps of normalization. Once our tables align with those forms, their structure will provide a good degree of redundancy prevention while maintaining manageable complexity.

 

There are further forms, such as the Boyce-Codd normal form (BCNF), the fourth normal form (4NF) and the fifth normal form (5NF). Since this post is based on SQL, we won’t delve deeper into those topics. Understanding the basics of normalization is enough for us, but feel free to do more research if you want to!

 

Editor’s note: This post has been adapted from a section of the book SQL: The Practical Guide by Kerem Koseoglu. Dr. Koseoglu is a seasoned software engineer, author, and educator with extensive experience in global software development projects. In addition to his expertise in ABAP, he is also proficient in database-driven development using Python and Swift. He is the author of Design Patterns in ABAP Objects (SAP PRESS), as well as several best-selling technical books in Turkey. He has a Ph.D. in organizational behavior.

Recommendation

SQL: The Practical Guide
SQL: The Practical Guide

Whether you’re managing large datasets or writing your first query, build your SQL skills with this practical guide! Learn the basic elements of relational databases and walk through the syntax, features, and applications of SQL. Then dive into the sublanguages of SQL and apply them for tasks such as modifying tables and querying data. See SQL in action with an example PostgreSQL database and detailed code samples from real-world applications!

Learn More
Rheinwerk Computing
by Rheinwerk Computing

Rheinwerk Computing is an imprint of Rheinwerk Publishing and publishes books by leading experts in the fields of programming, administration, security, analytics, and more.

Comments