A slowly changing dimension approach is not always the right choice

First of, what is a “a slowly changing dimension”. A slowly changing dimension is a table in which you record changes over time using start and end dates to mark each period of change. This approach is also known as a “type 2 dimension”, but seeing as this is a rather technical term and less self-explanatory I will use the term “slowly changing approach”. A example of where you might want to use this approach is “employee contract information”. The figure below illustrates two contracts each regarding the same employee but a different period in which they were active:

More information regarding this method can be found here. This is a very powerfull way to record history. However, as we concluded in my previous post, this way of recording changes is not very well supported by most clients (Excel in particular). Getting to information quicker and more “user friendly” is our (business intelligence consultants) primary goal. So what are the alternatives? Well, as always, this depends on the scenario. What I would like to share with you is how one scenario might scream the “slowly changing dimension” approach while it is actually more suitable for a different approach.

The example business scenario is “service management”, in particular tracking the progress of registered cases. Each case has a status which tells us where the case is in the process. Each case can have the following statuses: Unassigned – Assigned – In Progress – Testing – Delivered (for the sake of simplicty I have chosen for a small set of statuses). The main questions the customer wants be able to answer are:

  • The duration of each case in each status.
  • The total case duration.
  • The total time to fix (equals the duration of “Unassigned” plus “Assigned” plus “In Progress” plus “Testing”)

Using the “slowly changing dimension” approach our model would look like this:

The business user would be able to easily see the case duration by selecting the column “Duration in hours”. Answering the question “What is the total time to fix” is more difficult, here the business user has to be aware of the start date and end date that marks each status change. He or she will have to use the column “Is current status” in combination with the column “Time to fix” to answer this question.

Here is an alternative model which I think better fits the scenario:

Using the model shown above the business user can answer all of the stated question by simply selecting the appropiate column without any “special” conditions.

The business user should not need to be aware of any conditions to answer any of the stated questions. This is, what I think, should be the driving force behind every dimensional model you design. The choice for a modeling technique should be business driven. Always go back to the set of questions they want to get answered. You will most often be suprised to see that it is not that complex.

Point in time analysis, using the many to many approach

Being able to perform a point in time analysis is a common business requirement. Most of the business users I know use Microsoft Excel 2007 / 2010 as their main client application for analysis (with Microsoft Analysis Services). The business users expect Excel to support point in time analysis. Unfortunately this can not be done in Excel in a straightforward way.

What is point in time analysis? Imagine we record the planning per project per resource per day. Each change in the planning is recorded as a seperate line in our data warehouse fact table and tagged with a start and end date, the start and end date represent the period in which the record (planning) was current. This design pattern is also known as “Slowly Changing Fact” and is similar to the Slowly Changing Dimension pattern. 

Using the start and end date we can see what the current planning was on a certain date. For example; I have recorded the planning (and its changes) for december 2010:

Planning Facts

I would like to know what my planning was on 12 december 2010 for project “Student Lifecycle Intelligence”. In Excel this means stating the following: “Current from >= 12 december and Current to <= 12 december”. This is rather technical and not intuitive for the user. The user wants to be able to say “Give me the current planning on 12 december 2010″.

The solution can be found in the design of the underlying dimensional model. This design is based on the “many to many revolution” article by Marco Russo. This is a must read for any dimensional modeler.

The proposed design pattern takes all unique “valid from” and “valid to” combinations and places these in a seperate dimension. This dimension will not be shown to the user. A factless table called “ValidFromToTime” is placed between the newly created (hidden) dimension “ValidFromTo” and the known dimension “Time”.

This model can be used in SSAS and exposes a different time dimension (role played) from which the user can select a particular date. In our example the user would select 12 december 2010 as the date.

There are still a few drawbacks to this approach, for example, what is keeping the user from selecting a month instead of a particular date and thus accumulating planned hours which should not be accumulated. You could restrict this using dimension – attribute security in SSAS.