Normalization is actually not a component of relational databases, but it’s so common and prevalent that it’s as inseparable from relational databases as their components. Therefore, it makes sense to talk about it.
Normalization is a development inquiry through standardized steps involving best practices of relational data organization. Once we go through these steps and modify our data structures accordingly, we can expect to end up with an improved blueprint.
Senior developers can usually design normalized data structures on their first attempts—that comes with years of practice and experience. However, it’s good to have a guideline for novice developers to check their structures against!
Now that we know about tables and their relationships, the best way to understand normalization is to see it in action. The table below contains our starting point. It’s a table containing a company’s drivers, their managers, and the plate numbers of cars they’re allowed to drive.
This is what a basic spreadsheet page of drivers could look like. Can you spot the problems on this table already? If not, keep reading!
According to the first normal form (1NF), tables should have only two dimensions. We should therefore get rid of repeating columns that cause multidimensionality in the table.
Our initial table features such a case: car_1, car_2, and car_3 are repeating columns. What we need to do in 1NF is to break it down into a flat structure, listing each car in a different row. The table below shows the end result.
As you can see, our table is two dimensional now. It looks nicer and cleaner, but not nice and clean enough! Let’s try to improve our data structure further.
According to the second normal form (2NF), redundant data should be eliminated. We need to break columns causing redundancy into distinct tables.
In our example, the car column has redundant data because we had to duplicate driver master data for each car the drivers can drive. Instead of having single rows for drivers 1022 and 1023, we have three of each, making six in total.
This table showcases the table we need. We ought to store driver-car assignments in a separate table.
Our driver master table is free of cars now and contains only one row per driver without any redundancies, as shown here.
Now, we’re almost there. Just one more step!
According to the third normal form (3NF), we ought to eliminate columns that don’t belong to the table PK.
If we look at the above table closely, it’s is supposed to be a driver table, but we see manager data there, such as their names and extension numbers. That’s not good. Drivers should have their own table, and managers should have their own table. Let’s start by giving managers a dedicated table, as shown here.
Now that we have a distinct manager table, we can simply refer to it as a FK. This table shows the final driver data form.
By going through the minimum steps of normalization, we’ve arrived at the ERD shown in the following figure, which is free from redundancies and potential conflicts!
We can go even further and create a distinct master data table for cars, but let’s stop here.
Mind you that having a normalized data structure is not a luxury—it’s one of the fundamental expectations from a decent developer, and for good reason too! Normalized data structures feature many benefits, such as the following:
1NF, 2NF and, 3NF are the minimum steps of normalization. Once our tables align with those forms, their structure will provide a good degree of redundancy prevention while maintaining manageable complexity.
There are further forms, such as the Boyce-Codd normal form (BCNF), the fourth normal form (4NF) and the fifth normal form (5NF). Since this post is based on SQL, we won’t delve deeper into those topics. Understanding the basics of normalization is enough for us, but feel free to do more research if you want to!
Editor’s note: This post has been adapted from a section of the book SQL: The Practical Guide by Kerem Koseoglu. Dr. Koseoglu is a seasoned software engineer, author, and educator with extensive experience in global software development projects. In addition to his expertise in ABAP, he is also proficient in database-driven development using Python and Swift. He is the author of Design Patterns in ABAP Objects (SAP PRESS), as well as several best-selling technical books in Turkey. He has a Ph.D. in organizational behavior.