Why Business Intelligence on OLTP is evil

In a previous post we saw main differences between OLTP and OLAP databases. In this post I will show through a practical example what can and will probably go wrong when you try to map a OLTP data to a multidimensional model.

Let's consider a classic, normalized OLTP sales schema, i. e. a schema used in typical operation datastore.


 

For sake of simplicity the schema shows only attributes we will use in our example, an actual schema would be quite more complex and provide much more analytic alternatives.

In our case a to each client is assigned a single account manager that pertains to a branch office. Most clients are divisions of large corporations. Client sales history is tracked through their invoices, composed of multiple details. Each detail is bound to one of our products that pertains to one of our brands.

An happy start

Let's figure a simple query. We would like to compare sales of a particular brand between different sales branches. You can immediately see that this involves a join of 7 tables! I am not talking in terms of performance (that would anyway be an issue too), I am just telling you shall write 6 times something like:

WHERE TABLE_X.KEY = TABLE_Y.FOREIGN_KEY AND …

That would not be so terrible if all queries were similar, but, if instead of analyzing by branch you would like to analyze by client corporation you will need a different join (this time 1 table less). Real situations involving more than 20 tables  are not uncommon when database is well structured and correctly normalized.

This model is ultra-simplified, we have not considered, for example client buyer. Now think on all combination of possible paths. Your team will pass life writing and maintaining SQL.

OK, I know there is a visual tool that will write SQL for you by simply connecting tables. So your team will pass life visually connecting tables. And, performance problem (joining 7 tables) is still a pending issue. But at least … Your user is happy!

Things grow up

Since you have a lot of clever developers and a lot of budget, you accepted the OLTP+SQL challenge and your sales manager seems quite satisfied with first results. So satisfied he tells financial director about your success, and financial director comes to you...

“Look I basically need similar thing but … Our fiscal year do not match calendar year. That is obviously not a problem because everybody knows that computers can do date math...”.

Now your SQL gets harder, inflated by date math, but your budget gets a little larger, thanks to additional contribution of financial director.

Satisfied sales manager is back to you with an additional requirement. He would like to analyze sales by geography. “It is so trivial” he says “just add sales region to branch record!”

It happens that your sales application is part of an acquired ERP, so you can't simply put your hands on those tables. After contacting your supplier you get a quotation for implementing such customization. At this point your sales manager a little less satisfied to know that he needs to spend extra money for what he considered a trivial thing. But things go on.

ERP implementor is quite competent and takes care of putting sales region into a field called USER_1. You are happy to discover that there are still 9 other USER_X fields, just in case... Now all your SQL is a little less readable but, at least, everybody knows what was stored into USER_1.

Bypassing performance issues

Your company is growing and database too. Queries get slower and user satisfaction suffers. So you contact again ERP implementor and tell the problem. Thanks god! This time the suggested an easy solution: just switch to a higher performance database. You just need a little extra budget your users will not dare negate (ongoing joke is that response time was renamed “delivery date”).

New database engine is finally up and running and … oh ...! SQL date math is not compatible! Damn: thanks to financial director, date math is spread in a lot of queries and you already told your user that switching DB engine would be a snap. OK you are going to explain. Just a some more time and a little extra budget.

At least! New database engine went live but far away from your expectations. Response time improved by 60% but you needed more. Data is still growing and you know that in a few months things can collapse. No magic wands or extra lives left, Your last resort is what you have always been told to avoid: data redundancy. You start creating summary tables for most heavy queries and finally your user smiles again.

But maintenance is now becoming a nightmare. Hundreds of queries to be adjusted whenever something changes plus summary tables to be kept in sync. Whenever one of such sync goes wrong you hear users crying. Until one day you face a new problem.

Fatal Business Dynamics

Joe, an account manager, is moved from LA to NY branch office and surprise... all past statistics of LA sales history were updated as if Joe had always worked in at NY branch. This is obvious to you: it simply reflects the change of association between account manager and branch, but not to your user that get furious.

Other changes are on the way due to a large restructuring. Also client merges will shortly affect corporations and product-brand structure is going to change.

The solution could be to transform all simple associations into link tables with from-time to-time attributes, but you know that the complexity of queries would become unmanageable.

After so much effort, investment and upgrades you have reached a dead end. Your only sin: directly accessing OLTP data for long range, strategic, business intelligence.

All those problems could be avoided should you have built a data warehouse based on multidimensional star schema, that is the subject of a future post. Stay tuned.


Last modified on 2011-05-23 by Administrator