Original link: http://www.open-open.com/lib/view/open1455673846058.html
Environment
- CentOS 6.5 64-bit
- JDK 1.8.0_20
- Elasticsearch 1.7.3
- Logstash 1.5.6
- Kibana 4.1.4
Introduction
Elasticsearch is a well-known open source search engine, and many companies now use the ELK technology stack for log analysis. Sina, for example, uses ELK to process 3.2 billion records per day (the original post links to a detailed introduction).
Our data volume is nowhere near Sina's: normally around 60 million records a day, and up to a billion at peak. Inspired by the Sina case, we built our own simple data analysis system on ELK. The reasons for choosing it at the start were: (1) I like to tinker; (2) I cannot do front-end work, but the Kibana that comes with ELK can be used directly; (3) Hadoop/HBase, Storm and other big data stacks have a learning cost and are too hard to pick up in the short term; (4) the number of machines available to us is also rather meager.
Environment setup
- Install Java, set JAVA_HOME, and add its bin directory to the PATH environment variable
Elasticsearch
- Download Elasticsearch, then unzip it to /opt
- Running /opt/elasticsearch-1.7.3/bin/elasticsearch -d starts it in the background, but in order to manage the three ELK processes together, I chose Supervisor for unified management
- After starting Elasticsearch, we need to turn off the tokenizer (analysis of string fields). Data analysis does not need it and it actually causes problems, although it is essential when Elasticsearch serves as a search engine.
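The post does not say how the tokenizer was turned off. One common way in Elasticsearch 1.x is an index template whose dynamic template maps every string field as not_analyzed. The sketch below only illustrates that idea: the index pattern "events-*", the template name, and the URL are hypothetical, not taken from the original setup.

```python
import json
import urllib.request


def build_not_analyzed_template(index_pattern):
    """Build an Elasticsearch 1.x index template body that stores all
    string fields as single, untokenized terms (index: not_analyzed)."""
    return {
        "template": index_pattern,  # hypothetical pattern, e.g. "events-*"
        "mappings": {
            "_default_": {
                "dynamic_templates": [
                    {
                        "strings_not_analyzed": {
                            "match_mapping_type": "string",
                            "mapping": {
                                "type": "string",
                                "index": "not_analyzed",
                            },
                        }
                    }
                ]
            }
        },
    }


def put_template(es_url, name, body):
    """Send the template with PUT <es_url>/_template/<name>.
    es_url and name are assumptions, e.g. "http://localhost:9200"."""
    request = urllib.request.Request(
        url=f"{es_url}/_template/{name}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )
    with urllib.request.urlopen(request) as response:
        return response.status


if __name__ == "__main__":
    # Print the body only; calling put_template needs a running cluster.
    print(json.dumps(build_not_analyzed_template("events-*"), indent=2))
```

A template only affects indices created after it is installed, so it would have to be in place before the first daily index is written.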
Kibana
- Download Kibana, then unzip it to /opt
- Run /opt/kibana-4.1.4-linux-x64/bin/kibana; it is likewise managed by Supervisor
- Then just visit http://YourIP:5601
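The Supervisor configuration itself is not shown in the post. As a rough sketch, the three [program:x] sections could be generated as below. The Elasticsearch and Kibana commands are the paths given above; the Logstash path and config file name are hypothetical. Supervisor expects the programs it manages to stay in the foreground, so the -d flag is left off the Elasticsearch command here.

```python
import configparser
import io

# Commands for the three ELK processes. The Logstash entry is an
# assumption: the post gives neither its install path nor its config file.
PROGRAMS = {
    "elasticsearch": "/opt/elasticsearch-1.7.3/bin/elasticsearch",
    "kibana": "/opt/kibana-4.1.4-linux-x64/bin/kibana",
    "logstash": "/opt/logstash-1.5.6/bin/logstash agent -f /opt/logstash-1.5.6/kafka-to-es.conf",
}


def render_supervisor_conf(programs):
    """Render one [program:<name>] section per process as INI text."""
    parser = configparser.ConfigParser(interpolation=None)
    for name, command in programs.items():
        parser[f"program:{name}"] = {
            "command": command,
            "autostart": "true",    # start together with supervisord
            "autorestart": "true",  # restart the process if it exits
        }
    buffer = io.StringIO()
    parser.write(buffer, space_around_delimiters=False)
    return buffer.getvalue()


if __name__ == "__main__":
    print(render_supervisor_conf(PROGRAMS))
```

The printed text could be saved to a file that supervisord includes, after which supervisorctl status would list all three processes in one place.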
Logstash
Our data comes from a Kafka topic in JSON format and is written to Elasticsearch, with a separate index for each day.
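In Logstash, per-day indices are normally produced by a date pattern in the elasticsearch output's index setting. The Python sketch below only illustrates the routing logic, which is to derive the index name from each event's timestamp in UTC. The prefix "events" and the epoch-millisecond field "ts" are hypothetical; the post does not show the real event schema.

```python
import json
from datetime import datetime, timezone


def daily_index_name(prefix, epoch_millis):
    """Return '<prefix>-YYYY.MM.dd' for the UTC day of the timestamp."""
    day = datetime.fromtimestamp(epoch_millis / 1000, tz=timezone.utc)
    return f"{prefix}-{day:%Y.%m.%d}"


def route_event(raw_message, prefix="events"):
    """Parse one JSON message (as read from the Kafka topic) and pick
    the daily index it belongs to. The 'ts' field is an assumption."""
    event = json.loads(raw_message)
    return daily_index_name(prefix, event["ts"]), event
```

One index per day keeps each index small and makes it cheap to drop old data by deleting whole indices.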
Simple data analysis
- After running for four hours, there were almost 8.9 million records.
Let's take a look at the OS versions of the devices (Android 4.4.4 has the most devices, almost 3 million)
Device model distribution
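Kibana builds charts like these from terms aggregations, which count documents per distinct field value. A minimal sketch of an equivalent request body and of reading the response follows. The field names "os_version" and "device_model" are hypothetical, and the fields are assumed to be not_analyzed so that a value such as "Android 4.4.4" is counted as one term.

```python
def terms_aggregation(field, top_n=10):
    """Build a search body that returns no hits, only the top_n most
    frequent values of `field` with their document counts."""
    return {
        "size": 0,
        "aggs": {
            "by_value": {
                "terms": {"field": field, "size": top_n},
            }
        },
    }


def bucket_counts(response):
    """Extract (value, count) pairs from a terms aggregation response."""
    buckets = response["aggregations"]["by_value"]["buckets"]
    return [(bucket["key"], bucket["doc_count"]) for bucket in buckets]


# Hypothetical field names; the real event schema is not shown in the post.
os_version_query = terms_aggregation("os_version")
device_model_query = terms_aggregation("device_model", top_n=20)
```

Either body could be sent to the _search endpoint of the daily indices; terms buckets come back ordered by document count, largest first, by default.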
Simple data analysis based on ELK