Last night I spent three hours reading "The Log: What every software engineer should know about real-time data's unifying abstraction".
Today I got Kafka running in Docker containers. There are several examples on GitHub, but they are all too complicated.
So I wrote the simplest possible Python demo for you: https://github.com/xuqinghan/docker-kafka
Compared with deploying Taiga last week, Kafka is worth writing up by hand; there are basically no pitfalls. A brief record:
First, docker-compose.yml.
version: '3.1'

services:
  zoo:
    image: zookeeper
    restart: always
    hostname: zookeeper
    volumes:
      # - ./zookeeper/conf:/conf
      - ./zookeeper/data:/data
      - ./zookeeper/datalog:/datalog

  kafka:
    build: kafka/
    restart: always
    volumes:
      - ./kafka/config:/kafka/config
    ports:
      - "9092:9092"
    depends_on:
      - zoo

  producer:
    stdin_open: true
    tty: true
    restart: always
    build: ./app
    volumes:
      - ./app:/app
    depends_on:
      - zoo
      - kafka
    command: ['python3', 'producer.py']

  consumer:
    stdin_open: true
    tty: true
    restart: always
    build: ./app
    volumes:
      - ./app:/app
    depends_on:
      - zoo
      - kafka
    command: ['python3', 'consumer.py']
There are 4 containers in total: 1 ZooKeeper (which keeps the coordination and state data, somewhat like the backend in Celery, though really more like Git), 1 Kafka (similar to the broker), plus one producer and one consumer.
Let me go through them one by one.
1 zookeeper
There is an official image: https://hub.docker.com/_/zookeeper/. Just use it; there is no need to write a build section.
But look carefully at the Dockerfile on that official page: the locations of /data and /datalog are not what some articles claim (they are not under /var/...).
Mount local folders from the build directory onto /data and /datalog.
2 kafka
Following Kafka's official quickstart at https://kafka.apache.org/quickstart, installation is very simple, so the Dockerfile is short:
FROM java:openjdk-8-jre
LABEL author="xuqinghan"
LABEL purpose='kafka'

# ENV DEBIAN_FRONTEND noninteractive
RUN apt-get update && apt-get install -y wget
RUN wget -q http://mirrors.hust.edu.cn/apache/kafka/1.0.0/kafka_2.11-1.0.0.tgz
RUN tar -xzf kafka_2.11-1.0.0.tgz -C /home
RUN mv /home/kafka_2.11-1.0.0 /kafka

WORKDIR /kafka
# CMD ["/bin/bash"]
CMD ["/kafka/bin/kafka-server-start.sh", "/kafka/config/server.properties"]
Be careful not to get ahead of yourself: do not change openjdk-8-jre to openjdk-9-jre, or it will fail with an error.
Then download the Kafka package locally as well (it is only about 47 MB), extract its /config directory so the configuration can be edited outside the container, and mount it back in via docker-compose.
The changes are mainly in server.properties:
# Zookeeper connection string, e.g. "127.0.0.1:3000,127.0.0.1:3001,127.0.0.1:3002";
# an optional chroot can be appended as the root directory for all kafka znodes.
zookeeper.connect=zoo:2181

# Timeout in ms for connecting to zookeeper
zookeeper.connection.timeout.ms=6000
Note: because the containers sit on the bridge network that docker-compose creates, you connect using the ZooKeeper service name from docker-compose.yml (mine is zoo), not localhost.
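As a quick sanity check (a small sketch I'm adding here, not part of the repo), you can open a Python shell in the app container and ask the broker, addressed by its service name, which topics it knows about:

from kafka import KafkaConsumer

# inside the compose network the broker is reachable as 'kafka:9092', not localhost
consumer = KafkaConsumer(bootstrap_servers='kafka:9092')
print(consumer.topics())  # set of topic names the broker currently knows
consumer.close()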
3 Producer and Consumer
They share a single Dockerfile, and producer.py and consumer.py live in the same folder; docker-compose.yml simply starts one service from each.
Dockerfile
FROM python
LABEL author="xuqinghan"
LABEL purpose='kafka'

RUN apt update
# RUN apt install -y nginx supervisor
RUN pip3 install setuptools
RUN pip3 install kafka-python

ENV PYTHONIOENCODING=utf-8
RUN mkdir -p /app
WORKDIR /app
CMD ["/bin/bash"]
This exists only to test Kafka, so it is unusually simple: only kafka-python is installed. Some articles say this client can lose data and recommend the C++-based client instead; as a beginner there is no need to worry about that, just use it.
And then
producer.py
from kafka import KafkaProducer
import time

# connect to Kafka
producer = KafkaProducer(bootstrap_servers='kafka:9092')

def emit():
    for i in range(100):
        print(f'send message {i}')
        str_res = f'{i}'
        producer.send('foobar', str_res.encode())
        time.sleep(1)

if __name__ == '__main__':
    emit()
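If the data-loss worry mentioned earlier bothers you, kafka-python itself has a few knobs for it. A rough sketch of my own (not part of the repo): request acknowledgement from all in-sync replicas and block on the future that send() returns.

from kafka import KafkaProducer
from kafka.errors import KafkaError

# acks='all' waits for the full set of in-sync replicas; retries resends on transient errors
producer = KafkaProducer(bootstrap_servers='kafka:9092', acks='all', retries=3)

future = producer.send('foobar', b'42')
try:
    metadata = future.get(timeout=10)  # blocks until the broker confirms the write
    print(metadata.topic, metadata.partition, metadata.offset)
except KafkaError as exc:
    print('send failed:', exc)
producer.flush()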
consumer.py
from kafka import KafkaConsumer, TopicPartition

consumer = KafkaConsumer(bootstrap_servers='kafka:9092')
# consumer.assign([TopicPartition('foobar', 1)])
consumer.subscribe('foobar')
print('consumer connected')

for msg in consumer:
    print(msg)
    res = msg.value.decode()
    print(f'received data: {res}')
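The commented-out assign line hints at the other consumption mode: instead of subscribing and letting Kafka assign partitions, you can pin the consumer to a specific partition and seek wherever you like. A small sketch, assuming the topic has a partition 0:

from kafka import KafkaConsumer, TopicPartition

consumer = KafkaConsumer(bootstrap_servers='kafka:9092')
tp = TopicPartition('foobar', 0)
consumer.assign([tp])            # manual assignment, no consumer-group rebalancing
consumer.seek_to_beginning(tp)   # start from the earliest available offset

for msg in consumer:
    print(msg.offset, msg.value.decode())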
If Kafka is not configured otherwise, topics are created dynamically by default, so there is no need to create them with the shell scripts that ship with Kafka.
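If auto-creation were turned off, you could also create the topic from Python instead of via those shell scripts. A sketch assuming a reasonably recent kafka-python (the admin client is not available in the oldest releases):

from kafka.admin import KafkaAdminClient, NewTopic

# create the 'foobar' topic explicitly with one partition and one replica
admin = KafkaAdminClient(bootstrap_servers='kafka:9092')
admin.create_topics([NewTopic(name='foobar', num_partitions=1, replication_factor=1)])
admin.close()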
Note that only bytes can be sent. The kafka-python documentation also has examples for JSON and the like (http://kafka-python.readthedocs.io/en/master/); I'll skip that here.
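For reference, the pattern in those docs looks roughly like this: plug in a serializer/deserializer so you can send dicts instead of raw bytes (a sketch adapted from the linked documentation):

import json
from kafka import KafkaProducer, KafkaConsumer

# the serializer turns each dict into UTF-8 JSON bytes before sending
producer = KafkaProducer(
    bootstrap_servers='kafka:9092',
    value_serializer=lambda v: json.dumps(v).encode('utf-8'))
producer.send('foobar', {'i': 1, 'msg': 'hello'})

# the deserializer turns the bytes back into a dict on the way out
consumer = KafkaConsumer(
    'foobar',
    bootstrap_servers='kafka:9092',
    value_deserializer=lambda m: json.loads(m.decode('utf-8')))
for msg in consumer:
    print(msg.value['msg'])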
The final result: the consumer container prints each message the producer sends, one per second.
Summary
From last night to today I caught up on logs and stream processing, then tried it out. The overall feeling: isn't this just Git at the system level? Producers push commits, consumers pull them.
Now it seems everything is recorded (the whole change process) and everything is scripted; everything can be played back and replayed.
Development: code changes are managed with Git, and the repository contains the entire history of commits;
Deployment: there are Docker scripts and CI/CD systems with DSL-like scripts, so every detail of the deployment process is recorded;
Runtime: with Kafka, the events and user operations that used to go largely unrecorded are all written down. The various sub-systems' database tables become almost disposable: if they are modified or data is lost, just replay the log and produce them again. This holds for a great many applications (a small sketch of what replaying looks like follows below).
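To make "replay the log" concrete, here is a small sketch of my own: point a consumer with a fresh group id at the topic and read from the earliest offset, and the whole history streams past again so downstream state can be rebuilt.

from kafka import KafkaConsumer

# a new group_id plus auto_offset_reset='earliest' re-reads the topic from the start
# ('rebuild-2' is just a hypothetical group name)
consumer = KafkaConsumer(
    'foobar',
    bootstrap_servers='kafka:9092',
    group_id='rebuild-2',
    auto_offset_reset='earliest')

for msg in consumer:
    print(msg.offset, msg.value.decode())  # rebuild downstream state from each event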
If even the tiny software systems I write can do this, then governments and Internet giants can certainly record everyone's behavior.
The moral and social outlook of the future must be different from the present.
Kafka+docker+python