The previous post set up the ZooKeeper cluster (see http://www.cnblogs.com/lianliang/p/6533670.html); this one continues with building the Kafka cluster. 1. First download the Kafka .tgz package from http://kafka.apache.org/downloads and unzip it to the /opt/soft/kafka/ directory; after unzipping, create a logs folder for Kafka
directory, open a command-line window and execute the gradle idea command. After a long wait, the console shows a build-successful message, indicating that the Kafka source compilation is complete. (7) The development tool used is IntelliJ IDEA 14.1.7 (other, higher versions can also be used). (8) Install the Scala plugin in IntelliJ IDEA; the plugin version installed here is 1.5.4. (9) Import the compiled
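For reference, the command sequence implied here looks roughly as follows (the bootstrap step follows the Kafka build README of that era and is an assumption, since the original text starts mid-sentence):

cd kafka-0.10.0.0-src
gradle          # bootstrap the Gradle wrapper and dependencies
gradle idea     # generate the IntelliJ IDEA project files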
Now let's dive into the details of this solution and I'll show you how you can import data into Hadoop in just a few steps.
1. Extract data from RDBMS
All relational databases keep a log file that records the latest transactions. The first step of our solution is to obtain this transaction data and enable Hadoop to parse the transaction format. (The original author does not explain how to parse these transaction logs; it may involve
First, install JDK and ZooKeeper (omitted here)
Second, install and run Kafka
Download
http://kafka.apache.org/downloads.html
After downloading, unzip to any directory; the author uses D:\Java\Tool\kafka_2.11-0.10.0.1
1. Enter the Kafka configuration directory, D:\Java\Tool\kafka_2.11-0.10.0.1
2. Edit the file "server.properties"
3. Find and edit log.dirs=d:\java\tool\kafka_2.11-0.10.0.1\
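For context, the relevant server.properties entries typically look like the sketch below; only log.dirs is the edit the steps above call for, and the kafka-logs folder name is an assumption since the original path is cut off:

broker.id=0
port=9092
log.dirs=d:/java/tool/kafka_2.11-0.10.0.1/kafka-logs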
navigate to a specific file. When offset=368776, this first step locates the segment pair 00000000000000368769.index|log.
The second step is to find the message within the segment file located in step one: for offset=368776, look up the metadata position in 00000000000000368769.index to obtain the physical offset into 00000000000000368769.log, and then search sequentially in 0000000000
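To make the two-step lookup concrete, here is a minimal Scala sketch (this is not Kafka's actual implementation; the segment base offsets are assumed for illustration):

// hypothetical base offsets of the on-disk segment files
val segmentBaseOffsets = Vector(0L, 368769L, 737337L)

// step 1: find the last segment whose base offset is <= the target offset
def locateSegment(target: Long): Long =
  segmentBaseOffsets.takeWhile(_ <= target).last

val target = 368776L
val base = locateSegment(target)   // 368769 -> 00000000000000368769.index|log
val relative = target - base       // 7, looked up in the sparse .index file
println(f"segment $base%020d, relative offset $relative")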
[figure omitted] 5. Consumption performance: the message structure is optimized and consumption is stateless, which keeps it inexpensive, with no need for a B+Tree index. [figure omitted] In general, Kafka's performance is outstanding; it is often a substitute fo
the maximum is no more than 3 times.
2. Log file flush-to-disk strategy
To significantly improve producer write throughput, writes should be batched and flushed to disk periodically. Recommended configuration:
# flush data to disk after every 10,000 messages written by the producer
log.flush.interval.messages=10000
# flush data to disk every 1 second
log.flush.interval.ms=1000
3. Log retention policy configuration
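The retention settings themselves are cut off in the original; for illustration, typical server.properties entries look like this (the values are assumptions, not the author's recommendations):

# keep log segments for 72 hours before deletion
log.retention.hours=72
# roll a new segment once the active one reaches 1 GB
log.segment.bytes=1073741824
# how often to check for segments eligible for deletion
log.retention.check.interval.ms=300000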
Run with the --info or --debug option to get more log output.
BUILD FAILED
Total time: 22.997 secs
Workaround: edit the kafka-0.10.0.0-src/build.gradle file (e.g. with vim) and add the following:
ScalaCompileOptions.metaClass.daemonServer = true
ScalaCompileOptions.metaClass.fork = true
ScalaCompileOptions.metaClass.useAnt = false
ScalaCompileOptions.metaClass.useCompileDaemon = false
Error when executing gradle idea
Failure:
the /zookeeper/ directory already configured on 192.168.2.240 to 192.168.2.241 and 192.168.2.242, then change the contents of the corresponding myid files to 2 and 3.
(5) Start the ZooKeeper cluster: execute the start command on the 3 servers respectively:
/opt/zookeeper/bin/zkServer.sh start
Three, installation and configuration of the Kafka cluster
There are 5 servers altogether, with the following IP addresses:
192.168.2.240 node1
192.168.2.241 node2
192.168.2.242 node3
192.168.2.243 node4
192.168.2.244 node5
1. Unzip the installation file to the /opt/
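The steps after unzipping are cut off here; for the 5-node layout above, the per-broker edits to config/server.properties would typically look like this sketch (the values are assumptions; broker.id must be unique per node):

broker.id=1                      # 2..5 on the other nodes
listeners=PLAINTEXT://192.168.2.240:9092
zookeeper.connect=192.168.2.240:2181,192.168.2.241:2181,192.168.2.242:2181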
Deployment Readiness
Configure the log collection system (Flume + Kafka); versions:
apache-flume-1.8.0-bin.tar.gz
kafka_2.11-0.10.2.0.tgz
Suppose the Ubuntu environment is deployed on three worker nodes:
192.168.0.2
192.168.0.3
192.168.0.4
Flume Configuration Instructions
Suppose Flume's working directory is /usr/local/flume, and it monitors a log file (such as /tmp
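The rest of the configuration is cut off; a minimal sketch of such an agent (the monitored file /tmp/test.log, the agent/component names, and the topic flume-log are assumptions) could be:

agent1.sources = src1
agent1.channels = ch1
agent1.sinks = k1
agent1.sources.src1.type = exec
agent1.sources.src1.command = tail -F /tmp/test.log
agent1.sources.src1.channels = ch1
agent1.channels.ch1.type = memory
agent1.sinks.k1.type = org.apache.flume.sink.kafka.KafkaSink
agent1.sinks.k1.kafka.bootstrap.servers = 192.168.0.2:9092,192.168.0.3:9092,192.168.0.4:9092
agent1.sinks.k1.kafka.topic = flume-log
agent1.sinks.k1.channel = ch1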
Replica backup mechanism in Kafka: Kafka copies each partition's data to multiple servers. Any given partition has one leader and zero or more followers, and the number of backups can be set through the broker configuration (specified by the replication-factor parameter). The leader handles all read and write requests, while followers need to stay synchronized with the leader. Followers behave like consumers: they consume
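For example, with the 0.10.x command-line tools the backup count is specified when creating a topic (the topic name and partition count here are assumptions):

bin/kafka-topics.sh --create --zookeeper 192.168.2.240:2181 --replication-factor 3 --partitions 5 --topic test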
Kafka is a high-throughput distributed publish-subscribe messaging system. It can replace a traditional message queue to decouple data processing and to buffer unprocessed messages, and it offers high throughput with support for partitioning, multiple replicas, and redundancy, so it is widely used in large-scale message data processing applications. Kafka supports Java and many other language clients and can be used in
The Streams API allows an application to act as a stream processor, receiving an input stream from one or more topics and producing an output stream to one or more topics, effectively transforming input streams into output streams.
The Connector API allows you to build and run reusable producers or consumers that connect message topics to applications or data systems.
For example, a connector to a relational database can capture all changes to a table.
Kafka's client-to-server communication uses a simple, high-performance, language-independent TCP protocol
First, installation. Kafka relies on ZooKeeper, so make sure the ZooKeeper cluster is installed correctly and functioning properly before installing Kafka. Although Kafka ships with a built-in ZooKeeper, it is recommended to deploy a separate ZooKeeper cluster, because other frameworks may also need to use ZooKeeper. (a) Kafka: http://mirrors.hust.edu.cn/apache/kaf
System.setProperty("spark.sql.warehouse.dir", "d:\\tools\\spark-2.0.0-bin-hadoop2.6")
System.setProperty("hadoop.home.dir", "d:\\tools\\hadoop-2.6.0")
// the company's environment:
System.setProperty("spark.sql.warehouse.dir", "d:\\developtool\\spark-2.0.0-bin-hadoop2.6")
println("Success to Init ...")
val url = "jdbc:postgresql://172.16.12.190:5432/dataex_tmp"
val prop = new Properties()
prop.put("user", "postgres")
prop.put("password", "issing")
val conf = new SparkConf().setAppName("WordCount").setMaster("local")
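The fragment breaks off here; under the definitions above, a typical continuation that actually reads the PostgreSQL table would be the following sketch (the table name events is an assumption):

import org.apache.spark.sql.SparkSession

// build a session from the conf defined above and read the table via JDBC
val spark = SparkSession.builder().config(conf).getOrCreate()
val df = spark.read.jdbc(url, "events", prop)   // "events" is a hypothetical table
df.show()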
Why are we building this system? Kafka is a messaging system originally developed at LinkedIn as the basis for LinkedIn's activity stream and operational data processing pipeline. It is now used by several different types of companies for multiple kinds of data pipelines and messaging systems. Activity stream data is the most common data that all sites use to report on their usage. Activity data incl
the same key would arrive at the same partition. When consuming from a topic, it is possible to configure a consumer group with multiple consumers. Each consumer in a consumer group reads messages from a unique subset of partitions in each topic it subscribes to, so each message is delivered to one consumer in the group, and all messages with the same key arrive at the same consumer. What makes Kafka unique is the
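As a concrete illustration of the keyed delivery described above, here is a minimal Scala producer sketch (the broker address, topic, and key are assumptions, not from the original):

import java.util.Properties
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}

val props = new Properties()
props.put("bootstrap.servers", "localhost:9092") // assumed broker address
props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer")
props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer")

val producer = new KafkaProducer[String, String](props)
// both records share the key "user-42", so they hash to the same partition
// and are read in order by the same consumer within a consumer group
producer.send(new ProducerRecord("events", "user-42", "login"))
producer.send(new ProducerRecord("events", "user-42", "purchase"))
producer.close()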
Although the IP address was used in the connection configuration, the hosts file pointed the server's host name to the local address (127.0.0.1), which in principle should be fine. And indeed the connection succeeded, yet messages were never sent successfully. Checking the log carefully reveals: INFO:kafka.conn: the host name was indeed resolved to the local address, but the port had not changed accordingly ...