OLAP learning and instance Building
Theoretical preparation:
I. OLAP
1. What is OLAP?
OLAP (on-lineanalysis processing) is a fast analysis technology that shares multidimensional information. OLAP uses multidimensional database technology to enable users to observe data from different angles; OLAP is used to support complex analysis operations and focuses on the decision-making support for management personnel. It can meet the requirements of analysts for fast and flexible Big Data repeat complex queries, the query results are displayed in an intuitive and easy-to-understand form to assist in decision-making.
2. Related Concepts
(1) dimension
It is a specific angle for people to observe data. It is a type of attribute set that considers the problem to form a dimension (such as time dimension and geographical dimension ).
(2) level)
A specific angle (I .e. a dimension) of people's observation data can also have different descriptions (such as time dimension: date, month, quarter, and year ). That is, the dimension level.
(3) member (member)
A value of a dimension is the description of the position of a data item in a dimension. ("A month or a day of a year" is a description of the time dimension ).
(4) Measurement)
The value of a multi-dimensional array, such as "salary of a month or a day of a year ".
(5) drilling (drill-up and drill-down)
Change the dimension level and the granularity of the analysis. Drill-up refers to summarizing low-level data to high-level summary data or reducing dimensions. Drill-up refers to the opposite. It refers to penetrating summarized data into details or adding new dimensions.
(6) slice and cut surface
After you select a value for a certain dimension, you are concerned about the distribution of measurement data on the remaining dimension. If there are only two remaining dimensions, the slice is used. If there are three or more dimensions, the slice is used.
(7) Rotating
Is to change the direction of the dimension, that is, to reschedule the placement of the dimension in the table (such as row and column Interchange)
(8) Star Mode
A fact table consists of fact tables and dimension tables. A fact table includes all the foreign keys and measurements of the analysis dimension. A dimension table corresponds to the analysis angle and contains description and classification information in addition to the primary key.
(9) snowflake Mode
Sometimes, the definition of dimension tables becomes complicated. For example, for product dimension, the dimension table must be divided by product type, some special products, and other brands, product brands and product categories are not the same. Therefore, a single dimension table is not an ideal solution. You can use the following method. This data model is called the snowflake model.
2. Mondrian Learning
2.1 Mondrian Architecture
1) underlying database
2)Storage layerDatabase (data warehouse) Creation
The process of converting the original underlying database into a star or snowflake model
3)Dimension LayerSchema file (key part)
The data warehouse at the storage layer is converted into a schema file, which can be accessed through Schema-workbench or handwriting.
4)Presentation LayerCompile a JSP file to show that it is displayed at the presentation layer provided by July.
July is the taglib of Mondrian performance layer,
July is based entirely on JSP + taglib;
Another thing that July may not be used to is that it is completely based on taglib rather than the familiar MVC pattern.
However, it can easily display multi-dimensional data to end users.
The following is the Mondrian architecture diagram provided on the official website. It is clear that the entire project consists of the underlying database, the data warehouse at the storage layer, the schema file at the dimension layer, and the presentation layer.
Mondrian provides a query API for the client.
So far, Mondrian does not have a common API for OLAP queries. Therefore, Mondrian provides its private API.
Even so, a person who often uses JDBC will also find it familiar. The difference is that it uses the MDX query language, not the SQL
The following Java snippet shows how to connect to Mondrian, execute a query, and finally print the result.
importmondrian.olap.*; import java.io.PrintWriter; Connectionconnection = DriverManager.getConnection("Provider=mondrian;" +"Jdbc=jdbc:odbc:MondrianFoodMart;"+"Catalog=/WEB-INF/FoodMart.xml;",null,false); Query query =connection.parseQuery("SELECT {[Measures].[Unit Sales], [Measures].[StoreSales]} on columns," +" {[Product].children} on rows " +"FROM[Sales] " +"WHERE ([Time].[1997].[Q1], [Store].[CA].[SanFrancisco])"); Result result =connection.execute(query); result.print(newPrintWriter(System.out));
Similar to JDBC, a connection is created by drivermanager. A query object is similar to a JDBC Statement. It is created by passing an MDX statement. the result object is similar to the JDBC resultset, except that it stores multi-dimensional data.
You can view the javadoc In the Mondrian help document to learn more about Mondrian APIs.
2.2 prepare development tools and environment
Environment required for this test:
Operating System: Windows 7;
Web server: tomcat6.0;
Relational Database: MySQL;
Development Tool: myeclipse;
Driver: mysql-connector-java-3.1.12-bin.jar
(1) Configure JDK and tomcat Environment Variables
(2) create an underlying database
/*NavicatMySQL Data Transfer SourceServer : localhost_3306SourceServer Version : 50096SourceHost : localhost:3306SourceDatabase : accessinfo TargetServer Type : MYSQLTargetServer Version : 50096FileEncoding : 65001 Date:2015-03-02 11:36:40*/ SETFOREIGN_KEY_CHECKS=0; -------------------------------- Tablestructure for `dim_ip`------------------------------DROPTABLE IF EXISTS `dim_ip`;CREATETABLE `dim_ip` ( `id` smallint(6) NOT NULL auto_increment, `dip` varchar(255) default NULL, PRIMARY KEY (`id`))ENGINE=InnoDB AUTO_INCREMENT=7 DEFAULT CHARSET=latin1; --------------------------------Records of dim_ip------------------------------INSERTINTO `dim_ip` VALUES ('1', '61.183.248.218');INSERTINTO `dim_ip` VALUES ('2', '61.144.207.115'); -------------------------------- Tablestructure for `dim_site`------------------------------DROPTABLE IF EXISTS `dim_site`;CREATETABLE `dim_site` ( `id` smallint(6) NOT NULL auto_increment, `dSiteID` int(11) default NULL, PRIMARY KEY (`id`))ENGINE=InnoDB AUTO_INCREMENT=8 DEFAULT CHARSET=latin1; --------------------------------Records of dim_site------------------------------INSERTINTO `dim_site` VALUES ('1', '542');INSERTINTO `dim_site` VALUES ('2', '548');INSERTINTO `dim_site` VALUES ('3', '543');INSERTINTO `dim_site` VALUES ('4', '552');INSERTINTO `dim_site` VALUES ('5', '551');INSERTINTO `dim_site` VALUES ('6', '549'); -------------------------------- Tablestructure for `dim_time`------------------------------DROPTABLE IF EXISTS `dim_time`;CREATETABLE `dim_time` ( `id` smallint(6) NOT NULL auto_increment, `signinTime` varchar(10) NOT NULL default '', PRIMARY KEY (`id`))ENGINE=InnoDB AUTO_INCREMENT=6 DEFAULT CHARSET=latin1; -- ------------------------------Records of dim_time------------------------------INSERTINTO `dim_time` VALUES ('1', '2015-3-1');INSERTINTO `dim_time` VALUES ('2', '2015-2-28');INSERTINTO `dim_time` VALUES ('3', '2015-2-17');INSERTINTO `dim_time` VALUES ('4', '2015-2-19');INSERTINTO `dim_time` VALUES ('5', '2015-2-11'); -------------------------------- Tablestructure for `fact_logs`------------------------------DROPTABLE IF EXISTS `fact_logs`;CREATETABLE `fact_logs` ( `fID` varchar(20) NOT NULL default '', `fSiteID` varchar(20) default NULL, `fTime` varchar(10) default NULL, `fIP` varchar(20) default NULL, `fCount` int(11) default NULL, PRIMARY KEY (`fID`))ENGINE=InnoDB DEFAULT CHARSET=latin1; -- ------------------------------Records of fact_logs------------------------------INSERTINTO `fact_logs` VALUES ('1', '1', '4', '1', '87');INSERTINTO `fact_logs` VALUES ('10', '2', '5', '2', '14');INSERTINTO `fact_logs` VALUES ('2', '1', '4', '2', '128');INSERTINTO `fact_logs` VALUES ('3', '3', '4', '1', '5');INSERTINTO `fact_logs` VALUES ('4', '4', '4', '2', '4');INSERTINTO `fact_logs` VALUES ('5', '5', '4', '2', '5');INSERTINTO `fact_logs` VALUES ('6', '6', '4', '2', '3');INSERTINTO `fact_logs` VALUES ('7', '2', '4', '2', '4');INSERTINTO `fact_logs` VALUES ('8', '5', '5', '2', '15');INSERT INTO `fact_logs` VALUES ('9','6', '5', '2', '13');
An access log fact table is analyzed in three dimensions: site, IP address, and date. Fact table records.
Fact_logs is a fact table, where dim_ip, dim_site, and dim_time represent three dimension tables.
(3) Definition Mode
You can use Schema-Workbench to generate accessinfo. xml and copy it to E: \ apache-Tomcat-7.0.40 \ webapps \ Mondrian \ WEB-INF \ queries
<Schema name="Access record warehouse"> <Dimensiontype="StandardDimension" visible="true" name="AccessTime"> <Hierarchyvisible="true" hasAll="true" allMemberName="AllTime" primaryKey="id"> <Tablename="dim_time" alias=""> </Table> <Level name="Sign Time"visible="true" table="dim_time"column="signinTime" internalType="String"uniqueMembers="true"> </Level> </Hierarchy> </Dimension> <Dimensiontype="StandardDimension" visible="true" name="WebsiteNum"> <Hierarchyvisible="true" hasAll="true" allMemberName="AllSite" primaryKey="id"> <Tablename="dim_site" alias=""> </Table> <Levelname="Sign Site" visible="true" table="dim_site"column="dSiteID" internalType="int"uniqueMembers="false"> </Level> </Hierarchy> </Dimension> <Dimensiontype="StandardDimension" visible="true" name="UserIP"> <Hierarchyvisible="true" hasAll="true" allMemberName="AllIP" primaryKey="id"> <Tablename="dim_ip" alias=""> </Table> <Levelname="Sign IP" visible="true" table="dim_ip"column="dip" type="String"uniqueMembers="false"> </Level> </Hierarchy> </Dimension> <Cubename="Access Analysis" visible="true"cache="true" enabled="true"> <Tablename="fact_logs" alias=""> </Table> <DimensionUsagesource="Access Time" name="Access Time"visible="true" foreignKey="fTime"> </DimensionUsage> <DimensionUsagesource="Website Num" name="WebSite Num"visible="true" foreignKey="fSiteID"> </DimensionUsage> <DimensionUsagesource="User IP" name="User IP" visible="true"foreignKey="fIP"> </DimensionUsage> <Measurename="Amount" column="fCount" datatype="Integer"aggregator="sum" visible="true"> </Measure> </Cube></Schema>
(4) Write accessinfo. jsp in the presentation layer, which is located in E: \ apache-Tomcat-7.0.40 \ webapps \ Mondrian \ WEB-INF \ queries
<%@ page import="mondrian.olap.*"%><%@ page session="true"contentType="text/html; charset=ISO-8859-1" %><%@ tagliburi="http://www.tonbeller.com/jpivot" prefix="jp" %><%@ taglib prefix="c"uri="http://java.sun.com/jstl/core" %><jp:mondrianQuery id="query01"jdbcDriver="com.mysql.jdbc.Driver"jdbcUrl="jdbc:mysql://localhost/accessInfo"catalogUri="/WEB-INF/queries/AccessInfo.xml"jdbcUser="root" jdbcPassword="root" connectionPooling="false">select NON EMPTY {[Measures].[Amount]} ONCOLUMNS, NON EMPTY{([Access Time].[All Time], [Website Num].[All Site]) } ONROWS from [Access Analysis] where [User IP].[All IP].[61.144.207.115]</jp:mondrianQuery>
The metric mesures is the specific Log Access volume (amount). The dimensions are access time, website num, and user IP. Expand and close the drill-up and drill-down operations in OLAP corresponding to all time and all site, where [userip] in mdx. [61.144.207.115] indicates a specific aspect. You can modify the value of [user IP] to obtain the Log Access values of time and site under different aspect.
OLAP learning and instance Building