Flex 3:Feature Introductions: OLAPDataGrid
From Adobe Labs
| Table of contents |
OLAP Introduction
On-Line Analytical Processing or OLAP analysis enables analyzing data selectively from various points-of-view. Such an analysis is often useful in data-mining, trend analysis and study of the relationship between various data items.
The heart of OLAP is in the concept of a cube. A cube keeps organized and summarized data into a multidimensional structure defined by a set of dimensions and measures. Storing pre-aggregated data facilitates quick answers to the multidimensional cross-tab queries.
Many state-of-the-art implementations for OLAP analysis are available which are known for delivering scalable, reliable and secure solutions through a combination of server and client technologies. They are capable of dealing with very large amount of data sets and providing very quick results. The data in all these solutions sits on non-volatile medium.
But many a times such an analysis is needed for comparatively smaller data sets and it is plausible to explore some lighter weight solutions. The new OLAP feature targets at providing an in-memory representation of a cube which is capable of providing answers to multidimensional queries.
It follows widely accepted server client model and thus the architecture of the OLAP feature can also be broken down into server and client components. On the server side, there is OLAP API which allows creation of an in-memory cube which can be queried with the help of query APIs. When such an in-memory cube (or OLAPCube) is queried the returned result can be fetched to the new OLAPDataGrid component which serves as a consumer of the query results.
Leveraging the capabilities of AdvancedDataGrid, OLAPDataGrid provides a customizable and configurable UI which is capable of consuming the result set of a multidimensional query and displaying it in a cross-tab fashion.
Walk through via sample
Let's take an example to have a clearer picture about the capabilities of OLAP. Say we have the following flat data
| Age | Gender | Marital Status | Country | Zip Code | Area Code | Through ADs? | Purchased? | Time Spent on URL |
|---|---|---|---|---|---|---|---|---|
| 53 | Male | Married | US | 07074 | 973 | Yes | Yes | 252 |
| 53 | Male | Married | US | 07074 | 973 | Yes | Yes | 252 |
| 41 | Female | Married | US | 07074 | 973 | Yes | Yes | 325 |
| 29 | Male | Single | US | 07074 | 973 | No | No | 460 |
| 21 | Male | Single | US | 07074 | 973 | No | No | 635 |
| 36 | Male | Single | US | 07078 | 973 | No | Yes | 370 |
| 45 | Male | Single | US | 07078 | 973 | No | Yes | 296 |
| 54 | Male | Married | US | 07078 | 973 | No | No | 247 |
| 21 | Female | Single | US | 07078 | 212 | No | No | 500 |
| 46 | Male | Married | US | 07078 | 212 | Yes | Yes | 290 |
| 41 | Female | Married | US | 07078 | 718 | Yes | Yes | 325 |
| 32 | Male | Single | US | 07078 | 212 | No | No | 417 |
This data depicts website activity for a consumers product company. The Company logs the various details like gender, marital status of the customer along with whether the customer purchased something or not. In this flat form, it doesn't give any insight into which gender from which country purchases more products or whether most of the visitors are married or not or visitors coming through ads make more purchase or not? These questions are actually multidimensional queries and the analysis required is a cross-tab or in our words OLAP analysis of the data.
So let's see how answers to these queries can be given using OLAP.
- Define the Cube Schema - To build your cube you should first
decide the dimensions of the cube. Dimension allows categorizing fields
into independent entities like Time, Geography, Person etc. Each
dimension is a collections of attributes, with each attribute
corresponding to one column in the flat data table. For the sample data, let's say we choose three dimensions - Geography, Customer and Purchase.
Dimensions → Customer Geography Purchase Attributes → Age Gender Marital Status Country Zip Code Area Code Through ADs? Purchased? Refer to the samples to find out the code needed for this.
- Building the cube - Once the cube schema is defined, the
next step is to provide a data source to the cube. After assigning the
dataprovider, cube build process can be triggered by calling
cube.refresh()
- Cube events (Optional) - During the build process the cube
dispatches progress events like CUBE_PROGRESS and CUBE_COMPLETE which can be listened and appropriate actions like providing a progress feedback can be taken.
- Creating a query - An OLAP Query has three parts: column
axis, row axis and a slicer axis. While querying an OLAPCube we need to
put OLAPSet on each of these axis.
It will be easier to understand if we start with a simple example. Let us pick up Gender and Marital Status as two attributes and analyze our web activity data.
Refer to in the sample 1 to find the code required for this.
Complex queries can be made by using OLAPSet APIs like union, crossjoin and hierarchize. - Displaying it using OLAPDataGrid - Once the query returns one can specify the returned result as dataProvider to the OLAPDataGrid component i.e
olapGrid.dataProvider = result as IOLAPResult. Thus if we queried for a Gender vs Marital Status Analysis and the query result is given to the OLAPDataGrid Component, the resulting cross-tab would look like:Gender Marital Status Married Single All Male 3211 9548.5 12759.5 Female 3274.5 2107.5 5831.5 All 6935.5 11655.5 18591
BREAKTIME: Wasn't this easy? You have your cube built and ready to get queried. Let's see how this can be achieved.
Diving deeper
Now when the basic picture is clear, let's walk a bit more in OLAP and explore more features in OLAP
- Complex Queries -
As mentioned earlier, query APIs support creation of complex queries with crossjoin, union and hierarchize as the building blocks.
Refer to ComplexQueries sample for the code required to achieve this. - Slicing and Filtering -
Filtering decides what members have to be included on row and column axes. Say for example if you put age on row axis and you are only interested in the age group of 20-35, you can provide a filter to exclude the others.
The Slicer axis allows slicing or restricting the result along some values.
Say one wants to analyzeGender vs Marital Statusbut this time he is interested in only US data, he can put[Geography].[Country].[US]on slicer axis. The query result will have data considered only for US.
P.S:- Slicing is conceptually different from Filtering. Slicing does not affect the selection of the axis members, but rather the values that go into them. Refer to SlicerAndFilterSample for the code required to achieve this. - Custom Aggregators -
The default aggregation method used by an OLAPCube is SUM operation. Imagine we are dealing with a average value of a stock, where SUM may not be the aggregation method a user wants. OLAPCube supports"SUM","AVG","MIN","MAX", and"COUNT"aggregation methods. It also allows use of any custom aggregators by implementing theIOLAPCustomAggregatorinterface.
Refer to CustomAggregators sample for the code required to achieve this. - Customizing OLAPDataGrid -
OLAPDataGrid component can be fully customized and the users can specify the renderers, styles, wordwrapping and formatting of their choice. OLAPDataGrid exposesheaderRendererProvidersanditemRendererProvider, using which a user can do such customizations. A user for example can specify to use a red font color for cell data. A user may also want to use a picture (country flag) as the header renderer for countries instead of just showing US and UK.
Refer to CustomOLAPDataGrid sample for the code required to achieve this.
FAQs
- Do I need a seperate server for this? - NO
OLAP is an in-memory representation of cube which gets created on the client machine on the fly. There is nothing like a separate server involved in the cube deployment process. - When you say compartively small, how small? - It works reasonably well up to 50000 rows. Though there is not a straight forward correlation of the number of rows with performance, but as obvious it increases with the number of rows. (Actually it depends on the data which is there i.e the number of fields, the data in those fields etc, which decides the size of the cube and hence the query performance). We still recommend analysis of small to medium datasets
- Can OLAPDataGrid talk with other OLAP Servers? - YES
OLAPDataGrid consumes an IOLAPResult and if your backend is some real OLAP server or even some other cube implementation you can implement the IOLAPResult interface and thus can make OLAPDataGrid display your query results. - Can the query result be consumed by a custom UI (FLEX or non-FLEX)? - YES
An OLAPCube returns an IOLAPResult as the query result. An IOLAPResult resembles the MDDataSet format defined by XML/A standards. There are variety of OLAP clients available which understand MDDataset format and an IOLAPResult can easily be dumped as an xml which can then be fed to these clients.
Summary
This is a brief overview of the features offered by OLAP API AND OLAPDataGrid Component. Please refer to ASdocs and Help file for the detail description of each of these features.
OLAP API and OLAPDataGrid is a new feature, and we are looking forward for your valuable feedback for improving the feature for the future releases.
