Well-formed Data – Part 3: Normalization

Our lives are made up almost exclusively of relationships and that includes data.  In fact, it’s hard to think of examples of data that doesn’t involve a relationship.  Customers (hopefully) have many Orders.  Companies have multiple employees.  People (hopefully) have multiple friends, who also have multiple friends, and may even share some of each others friends.  People may have multiple phone numbers, e-mail addresses, and so on. 

Admittedly, this may be the geekiest of topics on creating well-formed data.  In part 3 of our 4 part series, we discuss one way to make sure your data follows the ”1 Concept per Row” rule.  Normalization is used whenever one concept relates to another concept.  Given that almost all data is about relationships and we want to analyze that data, we have to know a bit about normalization which is all about those relationships.  There are whole college classes dedicated to this topic, so we won’t be able to cover it all in this 8:52 video, but hopefully you’ll understand enough that you can do the basics yourself or talk intelligently to your data warehouse person.

In the video we discuss, 2 types of relationships…1 to many (e.g. 1 person may have 1 or more orders) and many to many (e.g. people have many friends that can overlap).  These require slightly different solutions.  There is a 3rd type of relationship, 1 to 1, however, dealing with that one doesn’t require normalization.  That data can just be stored in the same spreadsheet since it follows the “1 Concept per Row” rule.  The examples use Excel and the vLookup() function to illustrate the concepts.

Let me know what you think!

In the final video, we’ll wrap up our tour of well-formed data with a discussion on how to deal with missing data.

Leave a Reply

  

  

  

You can use these HTML tags

<a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>