The Ultimate Guide to Normalization in DBMS: From 1NF to BCNF

Mitaksh Jain

7 min read

Last Updated: 30 September 2024

| Published On: 30 September 2024

What is Normalization?
1NF: First Normal Form
2NF: Second Normal Form
3NF: Third Normal Form
BCNF: Boyce-Codd Normal Form
Level up Your Skills With Emeritus

View All

Imagine stacking up books in a library at random instead of their genre. It may be convenient initially, but you will struggle to find a specific title later. You will then go through the shelves to find that you have stacked the same title twice or forgot an important title. This is what happens to databases without normalization. Now, most Database Management Systems (DBMS) use Structured Query Language (SQL) to store and manage data simultaneously. The role of normalization in DBMS is crucial to prevent data anomalies and improve query performance.

Every aspiring developer must know how to leverage normalization in DBMS if they aim to succeed in the field. So, let’s dive deep into the concept, explore its various forms, and why it is critical for creating scalable and reliable databases.

What is Normalization?

It is important to understand the definition of normalization in DBMS in detail before we learn its various forms. SQL is, without reservation, the backbone of many database systems, making it one of the most popular programming languages, with 51% of developers (1) using it worldwide.

However, these systems need a proper design to prevent databases from becoming inefficient and riddled with errors. Cue normalization, a database design technique that organizes data into tables and columns to minimize redundancy and ensure data integrity.

The concept involves breaking down large, complex tables into smaller, more manageable ones while maintaining relationships between them. Here are a few objectives of normalization in DBMS:

1. Eliminate Redundancy: Normalization reduces storage space and the risk of inconsistencies because it avoids duplicating data.

2. Improve Data Integrity: It helps prevent anomalies such as insertion, deletion, and update anomalies, ensuring that data is accurate and consistent.

3. Enhance Query Performance: Normalized databases are generally efficient for querying and updating data.

4. Facilitate Database Maintenance: A normalized database is easier to maintain and modify over time than one with no normalization.

Normalization chiefly involves a series of steps, each known as a “normal form”. The most common normal forms are as follows:

First Normal Form (1NF)
Second Normal Form (2NF)
Third Normal Form (3NF)
Boyce-Codd Normal Form (BCNF)

ALSO READ: What is Programming and How to Code Your Future Career With It

1NF: First Normal Form

First Normal Form (1NF) is certainly the most basic level of normalization in DBMS. What’s more, it stipulates that every attribute in a table must be atomic. In other words, each column in a table should contain a single value, not a group of values. A table is in 1NF if it meets the following criteria:

1. Key Characteristics

A. Atomic Values

Every field must hold only one piece of data. For example, you cannot have multiple phone numbers in one field. Each phone number must be in its row.

B. Unique Rows

Another key aspect is that each row should be distinct; no two rows should have the same data. The table must also have a primary key to ensure this uniqueness.

C. No Repeating Groups

There cannot be repeated datasets within a table. A customer’s multiple orders should appear in a separate row instead of a single field.

2. Violations of 1NF

Sr. No.	Student Name	Subjects
1	Anjali	Maths, Science
2	Rehman	English, History, Maths

Consider the table above. The subject column holds multiple values (Maths, Science for Anjali and English, History, Maths for Rahul). This violates the rule of atomicity because each field should contain a single value. As a result, you have to break down the multi-valued field into separate rows.

Sr. No.	Student Name	Subject
1	Anjali	Maths
1	Anjali	Science
2	Rehman	English
2	Rehman	History

The aforementioned table is in the 1NF, with each cell containing a single value besides no repeating groups.

ALSO WATCH: Importance of Analytics in Digital Business by Pawan Kumar, Head Data Science & Analytics

2NF: Second Normal Form

The second normal form states that every non-key attribute in a relation should significantly depend on the primary key. In short, no part of the primary key should be able to determine a non-key attribute without the other parts. Moreover, a table in 2NF must fulfill all the requirements of 1NF.

1. Characteristics of 2NF

A. Full Dependency

All non-key attributes must depend on the overall primary key, not just a portion of it.

B. No Partial Dependency

It is equally important to have no partial dependencies between non-key attributes and parts of the primary key.

2. Violations of 2NF

Consider a table storing order details to illustrate 2NF:

Order ID	Product ID	Customer ID	Product Name	Customer Name
1001	201	100	Laptop	Ahana
1002	202	100	Smartphone	Ahana

In the table above, the product name and customer name unquestionably depend on the customer ID portion of the primary key (Order ID, Customer ID). Hence, this is a partial dependency and results in a violation of 2NF. We will, thus, need to create separate tables to normalize the above table. You will create a table for orders (order ID, customer ID), products (product ID, name), and customers (customer ID, customer name). You can then bring data integrity with normalization in DBMS.

ALSO READ: What are the Unique Benefits of Keys in DBMS?

3NF: Third Normal Form

Every table in the Third Normal Form (3NF) must first be in the 2NF. Furthermore, it must eliminate transitive dependencies. This means that non-key attributes should specifically depend on the primary key and not other non-key attributes. There should also be no indirect relationships where one non-key attribute depends on another non-key attribute.

Violations of 3NF

Let’s take an example of a table that is in 2NF but violates 3NF:

Student ID	Student Name	Course ID	Course Name	Instructor ID	Instructor Name
1	Alizeh	101	Maths	501	Dr More
2	Benny	102	English	502	Dr Das

The primary key could be (Student ID and Course ID) since a student can take multiple courses, and each combination is unique. There is an explicitly transitive dependency between the non-key columns— Instructor Name and Instructor ID.

This is because the instructor’s name depends on the instructor ID (a non-key column), not on the primary key. In essence, the instructor’s name indirectly depends on the primary key through the instructor ID.

You need to split the table into two separate ones to remove the transitive dependency. As a result, you will have a course table with information related to the course and instructor. The instructor’s table will then store information about instructors.

ALSO READ: SQL vs Python: Which is the Better Programming Language for You

BCNF: Boyce-Codd Normal Form

Boyce-Codd Normal Form is a stricter version of the Third Normal Form (3NF). It states that every non-trivial dependency in a relation must undoubtedly be a superkey. A superkey is any set of attributes that can definitively identify a tuple (row) in a relation. Lastly, there should be no non-trivial dependencies that do not involve a superkey.

Violations of BCNF

Here’s a table that particularly deals with course offerings:

Course ID

Department ID

Department Name

Department Name is undeniably dependent transitively on Course ID through the Department ID. This, therefore, violates BCNF as the Department Name is not fully dependent on the Course ID, which is not a superkey. So, you have to create separate tables to normalize the above example into BCNF.

Course Table: CourseID, Department ID…
Department Table: Department ID, Department Name…

Level up Your Skills With Emeritus

In conclusion, the importance of normalization in DBMS cannot be stressed enough. It is evidently indispensable to the process of designing reliable databases. Consequently, you must have a firm grasp of database normalization to become proficient in SQL and database management.

There is a range of online IT courses at Emeritus if you want to deepen your knowledge of SQL, database design, and other IT skills. These courses will not only enhance your expertise but also offer practical examples relevant to the industry. The curriculum covers topics such as SQL, database optimization, and full-stack development. Join one of these courses at Emeritus to unlock your potential and emphatically take your career to soaring heights.

Write to us at content@emeritus.org

Source:

https://survey.stackoverflow.co/2024/technology#most-popular-technologies

Data Science programming language

Related courses

The Ultimate Guide to Normalization in DBMS: From 1NF to BCNF

What is Normalization?