User Analytics: How to visualize information

In previous posts you have seen how to collect User Analytics. In this post you will see how to analyze it through OLAP by mean of simple but illustrative Inquiry. You will see how to generate a relational Data Warehouse to capture User Analytics data and how to create inquiries that satisfy your needs.

For your reference here is a list of preceding posts:

  1. User Analytics: Why you need them
  2. User Analytics: How to Collect Data

OLAP stands for On-line Analytic Processing and is usually implemented as either ROLAP (based on relational technology) or MOLAP (data stored in memory). Let's see how you can build an OLAP application for User Analytics.

OLAP require a definition of multidimensional model. In this example you will use PISA that automates most tedious tasks and making ROLAP as easy as MOLAP.

Let's start defining your model. The User dimension first:

Dimension> name=User
  Level> name=Email
  Level> name=Company
  Level> name=Country
  Level> name=Skill

PISA is configured in SOX (a simplified form of XML) that is easy to write and understand (any text editor can do the job). You define a dimension and immediately under it (indented) the bottom level, in this case email. You also define that email is the natural key. Over email you define three levels Company, Country and skill, at same level. You define from Inner to outer level.

Now let's go to dimension ruled by Counter:

Dimension> name=Counter
  Level> name=CounterName

Dimension> name=Task
  Level> name=CounterName
  Level> name=Task
  Level> name=Module
  Level> name=Application

Dimension> name=Severity
  Level> name=CounterName
  Level> name=Severity

Aha, you have defined three dimensions here instead of one! Why? Your model will require crossing, for example, counter with task, but most MDX implementations do not allow using same dimension on different axis. So you shall define three distinct dimensions pointing to same table (this is the reason of table = *, PISA takes care to do the rest).

First dimension is at finest granularity (the counter itself). Second dimension (Task) will allow you to drill down from application to task and finally to specific counter. Severity dimension is used to drill down from different severities to specific errors (counter).

And finally your facts:

Fact> name=Usage
  Dimension> name=User
  Dimension> name=Date
  Dimension> name=Counter
  Dimension> name=Task
  Dimension> name=Severity
  Measure> name=Count

You simply list the dimensions involved in each different fact ans the captured measure. You could obviously add computed measures, fields length and so on, but, to produce an initial model that's enough.

Then you ask PISA to generate your database and PISA, with a mouse click creates and submits required SQL. PISA also dynamically generates the schema to be used by ROLAP engine (Mondrian in this example).

Hungry of results, you still have no data. So you ask PISA to generate an import template for your model. PISA generates an empty spreadsheet and a configuration file for it. The idea is to develop all your model through spreadsheet input, once your multidimensional model fits your needs, you simply modify configuration files to feed data from production data sources.

Here is the template generated by PISA with 9 data rows manually added to try the model:

Last step is to import data (again, just a button click) and now you can interactively explore user analytics model:

As you can see you can visually explore dimensions, hierarchies, levels, members and functions. To submit an inquiry simply drag and drop them. PISA is different from usual cross-tabulation tools that are usually limited. A rich set of operations let you assemble complex formulas and inquiries and PISA takes care of mapping them to MDX.

Let's see a simple example. You would like to compare how many fatal errors for a specific task make users in different clients to know if some training is needed in some case. Here is your query for the specific task: “Create Prospect”:

And here is the result produced by PISA. You can see that users at “ACME” are quite more skilled (lower error rate).

This query is apparently quite simple, but PISA has generated a reasonable chunk of MDX that otherwise you should have written yourself, and formatted the result for interactive visualization and plot.

Note: PISA is simply a way to simplify your work, you could have done everything by building an application using simple relational technology (SQL).

Last modified on 2011-05-24 by Administrator