First, Installation Preparation
1. Operating system: CentOS 7.x
2. Time synchronization
The clocks of all nodes in the cluster must be synchronized, using NTP or Chrony.
3. Users
Create a hadoop group and a hadoop user, and set up passwordless SSH login for that user.
4. Hadoop HA cluster
2.7.3
5. HBase
1.x
6. Hive
1.2.1, using MySQL to store metadata
7. Prepare the directory
# mkdir /install
# chown -R hadoop:hadoop /install
8. Kylin
1.6.0. This release supports HBase 1.x: apache-kylin-1.6.0-hbase1.x-bin.tar.gz
$ tar xf apache-kylin-1.6.0-hbase1.x-bin.tar.gz -C /install
$ mv /install/apache-kylin-1.6.0-bin /install/kylin
# means the command is run as the root user
$ means the command is run as an ordinary user
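Two of the preparation items above, clock synchronization and passwordless SSH, are easy to spot-check from one node. The following is only a sketch under assumptions: the host names in NODES are hypothetical placeholders, and the clock check relies on the "synchronized: yes" line that timedatectl prints on CentOS 7.

```shell
# Spot-check passwordless SSH and clock sync on every node.
# NODES is a hypothetical host list; replace it with your own node names.
NODES="${NODES:-node1 node2 node3}"

# Reads `timedatectl` output on stdin; succeeds if sync is reported.
# CentOS 7 prints "NTP synchronized: yes"; newer systemd versions print
# "System clock synchronized: yes", so both spellings are accepted.
clock_is_synced() {
  grep -Eq '(NTP|System clock) synchronized: yes'
}

# Succeeds only if SSH to the host works without a password prompt.
ssh_is_passwordless() {
  ssh -o BatchMode=yes -o ConnectTimeout=5 "$1" true 2>/dev/null
}

check_all() {
  for node in $NODES; do
    if ! ssh_is_passwordless "$node"; then
      echo "$node: passwordless ssh FAILED"
    elif ssh -o BatchMode=yes "$node" timedatectl | clock_is_synced; then
      echo "$node: ssh ok, clock synced"
    else
      echo "$node: ssh ok, clock NOT synced"
    fi
  done
}

# Run the checks only when the script is called with the argument "run".
if [ "${1:-}" = "run" ]; then
  check_all
fi
```

Saved as a script and run as the hadoop user with the argument run, it prints one status line per node.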
Second, Environment Variable Configuration
On every node, add the following to the hadoop user's .bashrc:
export HADOOPROOT=/install
export HADOOP_HOME=$HADOOPROOT/hadoop
export ZOOKEEPER_HOME=$HADOOPROOT/zookeeper
export HBASE_HOME=$HADOOPROOT/hbase
export HIVE_HOME=$HADOOPROOT/hive1.2
export HCAT_HOME=$HIVE_HOME/hcatalog
export KYLIN_HOME=$HADOOPROOT/kylin
export CATALINA_HOME=$KYLIN_HOME/tomcat
export hive_dependency=$HIVE_HOME/conf:$HIVE_HOME/lib/*:$HCAT_HOME/share/hcatalog/hive-hcatalog-core-1.2.1.jar
PATH=$PATH:$HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$ZOOKEEPER_HOME/bin
PATH=$PATH:$HBASE_HOME/bin:$FLUME_HOME/bin:$HIVE_HOME/bin:$HCAT_HOME/bin
PATH=$PATH:$CATALINA_HOME/bin:$KYLIN_HOME/bin
export PATH
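A typo in one of these variables usually shows up only later, when Kylin fails to find its dependencies. As a minimal sketch (the function name check_homes is made up for this illustration), the variables can be verified right after reloading .bashrc:

```shell
# Check that each named variable is set and points to an existing directory.
# check_homes is a hypothetical helper, not part of Hadoop or Kylin.
check_homes() {
  status=0
  for name in "$@"; do
    eval "value=\${$name:-}"   # look up the variable by name
    if [ -z "$value" ]; then
      echo "$name is not set"
      status=1
    elif [ ! -d "$value" ]; then
      echo "$name=$value is not a directory"
      status=1
    else
      echo "$name=$value ok"
    fi
  done
  return $status
}

# Check the variables from .bashrc when called with the argument "run".
if [ "${1:-}" = "run" ]; then
  check_homes HADOOP_HOME ZOOKEEPER_HOME HBASE_HOME HIVE_HOME \
              HCAT_HOME KYLIN_HOME CATALINA_HOME
fi
```

The function returns a non-zero status if any variable is missing or wrong, so it can also gate a start script.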
The basic configuration is now ready; the installation continues with the following steps.
Third, Kylin Configuration
Modify bin/kylin.sh:
export KYLIN_HOME=/install/kylin
export HBASE_CLASSPATH_PREFIX=${tomcat_root}/bin/bootstrap.jar:${tomcat_root}/bin/tomcat-juli.jar:${tomcat_root}/lib/*:$hive_dependency:$HBASE_CLASSPATH_PREFIX
Compression issues
Snappy is not used in this setup. To use it, the Hadoop source code has to be recompiled so that the native library supports Snappy.
Snappy gives a reasonable compression ratio, so the intermediate and final results of a job occupy less storage space.
1. kylin.properties
1) Set the REST server
kylin.rest.servers=192.168.56.201:7070
The time zone defaults to PST; change it to China time
kylin.rest.timezone=GMT+8
2) Do not enable compression; commenting the line out is enough
#kylin.hbase.default.compression.codec=snappy (commented out, or set to none)
3) Define the Kylin job jar used for MR jobs and the HBase coprocessor jar, which improves performance (optional additions)
kylin.job.jar=/installsoftware/kylin-1.6.0/lib/kylin-job-1.6.0.jar
kylin.coprocessor.local.jar=/installsoftware/kylin-1.6.0/lib/kylin-coprocessor-1.6.0
2. kylin_job_conf.xml
Do not use compression:
set mapreduce.map.output.compress to false
set mapreduce.output.fileoutputformat.compress to false
3. kylin_hive_conf.xml
Do not use compression: set hive.exec.compress.output to false
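Before starting Kylin it can help to confirm that the kylin.properties edits from item 1 actually took effect, in particular that the Snappy codec line is commented out or set to none. The helper names below (get_prop, compression_disabled) are invented for this sketch, and the file location $KYLIN_HOME/conf/kylin.properties is an assumption about the install layout.

```shell
# Print the value of an active (uncommented) key=value line.
# $1 = properties file, $2 = key. Commented lines start with '#',
# so their first field never equals the key.
get_prop() {
  awk -F= -v key="$2" '$1 == key { sub(/^[^=]*=/, ""); print; exit }' "$1"
}

# Succeeds if HBase compression is off: the key is absent, commented out,
# or explicitly set to "none".
compression_disabled() {
  codec=$(get_prop "$1" kylin.hbase.default.compression.codec)
  [ -z "$codec" ] || [ "$codec" = "none" ]
}
```

For example, `compression_disabled "$KYLIN_HOME/conf/kylin.properties" && echo "compression is off"`, or `get_prop "$KYLIN_HOME/conf/kylin.properties" kylin.rest.timezone` to read back the time zone.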
Fourth, Starting the Services
(Figure: Kylin working principle diagram)
Starting the supporting services
1. First check that the clocks are synchronized
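2. Start ZooKeeper on the 3 nodes, then HDFS, YARN, the job history server and HBase
zkServer.sh start
start-dfs.sh
start-yarn.sh (or start-all.sh in place of the two commands above)
mr-jobhistory-daemon.sh start historyserver
The history server has to be started on all NodeManager (NM) nodes; this can be written as a script.
start-hbase.sh
$ hbase shell
> list
You can also start the Hive client to take a look
$ hive
> show tables;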
Checks
1. Check the basic services
Hadoop, HBase, Hive, environment variables, working directory
2. Hive dependency check
find-hive-dependency.sh
3. HBase dependency check
find-hbase-dependency.sh
Start Kylin
bin/kylin.sh start
Stop the processes
bin/kylin.sh stop
stop-hbase.sh
mr-jobhistory-daemon.sh stop historyserver
stop-yarn.sh
stop-dfs.sh
zkServer.sh stop
This can be written as a script.
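Since both the start and the stop sequence can be scripted, here is one possible sketch that wraps them in a single file. Several things are assumptions rather than part of the original setup: the ZK_NODES host names, the use of ssh to reach each ZooKeeper node (which requires zkServer.sh to be on the PATH of non-interactive sessions), and the DRY_RUN switch. The history server is started only on the local node here; extend the loop if it has to run elsewhere.

```shell
# Start or stop the whole stack in the order used in this guide.
# ZK_NODES is a hypothetical host list; DRY_RUN=1 only prints the commands.
ZK_NODES="${ZK_NODES:-node1 node2 node3}"

run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

start_stack() {
  for node in $ZK_NODES; do run ssh "$node" zkServer.sh start; done
  run start-dfs.sh
  run start-yarn.sh
  run mr-jobhistory-daemon.sh start historyserver
  run start-hbase.sh
  run "$KYLIN_HOME/bin/kylin.sh" start
}

stop_stack() {
  # Reverse order: Kylin first, ZooKeeper last.
  run "$KYLIN_HOME/bin/kylin.sh" stop
  run stop-hbase.sh
  run mr-jobhistory-daemon.sh stop historyserver
  run stop-yarn.sh
  run stop-dfs.sh
  for node in $ZK_NODES; do run ssh "$node" zkServer.sh stop; done
}

case "${1:-}" in
  start) start_stack ;;
  stop)  stop_stack ;;
esac
```

Running it first with DRY_RUN=1 and the argument start shows the exact command sequence without touching the cluster.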
Fifth, Login
http://node1:7070/kylin
Log in with ADMIN/KYLIN
Sixth, Sample Data Test
After starting Kylin, run bin/sample.sh
View the contents of the sample.sh script
It actually works on the data and scripts in the sample_cube directory
Restart the Kylin service
Look at Hive and HBase
Kylin's metadata information in Hive
There is one cube definition by default, and it has to be built.
Monitor the entire build process in the Monitor tab
After a successful build, the cube's status changes to READY
How long the cube build takes depends on the performance of the cluster
Seventh, Query Time Comparison
Test statements
select part_dt, sum(price) as total_selled, count(distinct seller_id) as sellers from kylin_sales group by part_dt order by part_dt;
select part_dt, sum(price) as total_selled, count(distinct seller_id) as sellers from kylin_sales where part_dt < '2013-01-01' group by part_dt order by part_dt;
Hive execution
Time taken: 168.643 seconds, Fetched: 365 row(s)
Kylin
First run: 1.33s
Second run: 0.38s
Third run: 0.33s
Fourth run: 0.34s
There appears to be a cache.
select part_dt, sum(price) as total_selled, count(distinct seller_id) as sellers from kylin_sales group by part_dt having sum(price) > ... order by part_dt
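The Kylin timings can also be collected from the command line through Kylin's REST query endpoint instead of the web UI. This is a rough sketch, not the method used for the numbers above: KYLIN_URL, the credentials and the project name learn_kylin are assumptions to adjust, the helper names are made up, the SQL must fit on a single line, and date +%s.%N requires GNU date.

```shell
# Time one SQL query against Kylin's REST API (POST /kylin/api/query).
# URL, credentials and project name are assumptions; adjust as needed.
KYLIN_URL="${KYLIN_URL:-http://node1:7070/kylin}"
KYLIN_AUTH="${KYLIN_AUTH:-ADMIN:KYLIN}"
KYLIN_PROJECT="${KYLIN_PROJECT:-learn_kylin}"

# Build the JSON request body. $1 = single-line SQL, $2 = project.
# Backslashes and double quotes in the SQL are escaped for JSON.
query_payload() {
  sql=$(printf '%s' "$1" | sed 's/\\/\\\\/g; s/"/\\"/g')
  printf '{"sql":"%s","project":"%s","acceptPartial":false}' "$sql" "$2"
}

# Send the query, discard the result, and print the elapsed wall time.
time_query() {
  start=$(date +%s.%N)
  curl -s -u "$KYLIN_AUTH" -H 'Content-Type: application/json' \
       -d "$(query_payload "$1" "$KYLIN_PROJECT")" \
       "$KYLIN_URL/api/query" > /dev/null
  end=$(date +%s.%N)
  awk -v s="$start" -v e="$end" 'BEGIN { printf "%.2fs\n", e - s }'
}
```

Calling time_query several times in a row with the first test statement makes the effect of caching on repeated runs visible.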
With this, the initial installation and deployment of Kylin is complete.
Kylin Installation Deployment