My Sina Weibo: http://weibo.com/freshairbrucewoo
You are welcome to get in touch so that we can exchange ideas and improve our skills together.
The following is a presentation on distributed log collection systems that I gave within my company. I am sharing it here in the hope that it helps anyone who needs it.
1. Distributed log Collection System: Background
Many companies' platforms generate large volumes of logs every day (typically streaming data, such as search-engine page views and queries). Processing these logs requires a dedicated log system, which generally needs the following features:
(1) it acts as a bridge between the application systems and the analysis systems, decoupling them;
(2) it supports both near-real-time online analysis systems and offline analysis systems such as Hadoop;
(3) it is highly scalable: when the data volume grows, it can be scaled horizontally by adding nodes.
2. Distributed log Collection System: Facebook scribe
(1) scribe introduction and system architecture
(2) scribe Technical Architecture
(3) scribe deployment structure
(4) main functions and usage of scribe
(5) specific application instance of scribe
(6) scribe Extension
(7) scribe research experience
3. Scribe Introduction
Scribe is Facebook's open-source log collection system, and it is widely used inside Facebook. It is built as a Thrift service on top of a non-blocking C++ server. It collects logs from many log sources and stores them in a central storage system (such as NFS or a distributed file system) for centralized statistical analysis and processing. It provides a scalable, highly fault-tolerant solution for the "distributed collection, unified processing" of logs.
4. Scribe System Architecture
As shown in the figure, Scribe collects data from various data sources, places it on a shared queue, and then pushes it to the back-end central storage system. When the central storage system fails, Scribe temporarily writes the logs to local files; once the central storage system recovers, Scribe uploads the local logs to it.
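The buffer-then-replay behaviour described above can be modelled in a few lines of Python. This is only a toy sketch, not Scribe's implementation: the class name `BufferedForwarder`, the `send_to_central` callable, and the in-memory deque (standing in for the local files) are all assumptions made for illustration.

```python
from collections import deque


class BufferedForwarder:
    """Toy model of buffer-then-replay; not Scribe's actual code."""

    def __init__(self, send_to_central):
        # send_to_central is a hypothetical callable that raises
        # ConnectionError while the central store is unreachable.
        self.send_to_central = send_to_central
        self.local_buffer = deque()  # stands in for the local files

    def log(self, message):
        # Queue behind anything already buffered so order is preserved.
        self.local_buffer.append(message)
        return self.flush()

    def flush(self):
        # Replay buffered messages oldest-first; stop at the first failure.
        while self.local_buffer:
            try:
                self.send_to_central(self.local_buffer[0])
            except ConnectionError:
                return False  # central store still down, keep buffering
            self.local_buffer.popleft()
        return True
```

A message is removed from the buffer only after the send succeeds, so an outage delays delivery but does not drop or reorder messages in this model.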
5. Technical Architecture of scribe
As shown in the figure, the underlying data communication framework of the Scribe server is Thrift, which is also open source and widely used at Facebook. Scribe also uses Boost, the de facto standard C++ library, mainly for its shared pointers and file-related functions. Thrift in turn uses the libevent library and socket programming.
6. Scribe deployment structure
7. Main Functions of scribe
1. Support for multiple store types: seven built in, and extensible
2. Automatic log splitting: by file size and by time
3. Flexible clients:
(1) support for many common languages (provided by Thrift);
(2) can be integrated into the application system or run as an independent client
4. Log classification (Facebook has hundreds of log categories)
5. Other functions
(1) connection pool
(2) flexible log cache size
(3) multithreading (message queue)
(4) logs can be forwarded between Scribe servers
6. All of the above can be flexibly configured through the configuration file.
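To make the "split by file size and time" item concrete, here is a small Python sketch of a rotation decision. It does not reproduce Scribe's own logic or configuration keys; `RotationPolicy` and its parameters are hypothetical names chosen for illustration.

```python
import time


class RotationPolicy:
    """Decides when to start a new output file (illustrative only)."""

    def __init__(self, max_bytes, max_age_seconds, now=time.time):
        self.max_bytes = max_bytes
        self.max_age_seconds = max_age_seconds
        self.now = now  # injectable clock, which makes testing easy
        self.opened_at = self.now()
        self.bytes_written = 0

    def record_write(self, num_bytes):
        # Call after every write to the current file.
        self.bytes_written += num_bytes

    def should_rotate(self):
        too_big = self.bytes_written >= self.max_bytes
        too_old = self.now() - self.opened_at >= self.max_age_seconds
        return too_big or too_old

    def rotated(self):
        # Call after opening a new file to reset both limits.
        self.opened_at = self.now()
        self.bytes_written = 0
```

Whichever limit is reached first triggers the split, so a quiet category still gets a new file on schedule and a busy one never grows without bound.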
8. Scribe usage plan
(1) Integration with the application system that generates the logs
Scribe integrates well with all kinds of application systems because it provides client development kits for almost all common programming languages.
(2) The application system writes log files locally, and an independent client program collects them. Independent clients can likewise be developed in various languages; we use Python to develop ours.
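As a rough idea of what the core of an independent client might look like, the sketch below reads the lines appended to a log file since the last poll and hands each one to a `send` callable. The function name and the `send` parameter are assumptions; in a real client `send` would wrap the call to the Scribe server made through the Thrift-generated bindings, which is left out here so that the example stays self-contained.

```python
def collect_new_lines(path, offset, send):
    """Send complete lines appended after `offset`; return the new offset."""
    with open(path, "rb") as f:
        f.seek(offset)
        while True:
            line = f.readline()
            if not line.endswith(b"\n"):
                # End of file, or a line the application is still writing:
                # leave it for the next poll.
                break
            send(line.decode("utf-8", errors="replace").rstrip("\n"))
            offset = f.tell()
    return offset
```

The offset only advances past complete lines, so a half-written line is picked up in full on a later poll instead of being sent in two pieces.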
9. Specific application instance of scribe
1. Facebook: naturally, Scribe is widely used there, mainly to process logs at Facebook scale. Once a new log category appears, Scribe handles it automatically (Facebook has hundreds of log categories).
2. Twitter: Rainbird, a distributed real-time statistics system, uses Scribe
3. My company:
(1 )*****
(2 )*****
(3 )*****
(4 )*****
(5 )*****
(6 )*****
4. Miscellaneous
10. Scribe Extension: Problems
Although Scribe is an excellent system, it still has some shortcomings. We can extend Scribe to address them. We found the following main problems:
1. Single points of failure
Single points of failure (SPOF) exist in three places:
(1) the central server
(2) the local server
(3) the log collection client program
2. Log loss
Logs may be lost when log files are split.
3. Historical log collection
4. No timely notification when a Scribe server fails
11. Scribe Extension: Problem Solution
We propose the following solutions to the problems listed above:
1. Single point of failure (SPOF) of the central server
Deploy multiple central servers; the local server can then switch between them automatically, as set up in its configuration file.
2. The other problems are solved by the Python client we wrote.
Our Python client is based on an open-source project. The open-source Python client is very simple: it only tracks a single log file and forwards the data from that file to the local Scribe server.
12. Scribe Extension: Python Client
The Python client we developed mainly implements the following functions:
1. Solving the single point of failure (SPOF) of the local Scribe server
We can configure multiple local Scribe servers (via the configuration file, which is quite flexible). The Python script switches between the configured servers automatically: it switches away when a Scribe server fails, and switches back once the failed local Scribe server has been restarted.
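One way such switch-and-switch-back logic could be written is shown below. This is a sketch under assumptions, not the client described here: `FailoverPool`, the `retry_after` cool-down, and the idea of treating servers as opaque strings from a configuration file are all illustrative choices.

```python
import time


class FailoverPool:
    """Pick a server by preference, skipping ones that failed recently."""

    def __init__(self, servers, retry_after=30.0, now=time.monotonic):
        self.servers = list(servers)  # ordered by preference
        self.retry_after = retry_after  # seconds before retrying a failed server
        self.now = now  # injectable clock for testing
        self.failed_at = {}  # server -> time of its last failure

    def mark_failed(self, server):
        self.failed_at[server] = self.now()

    def mark_ok(self, server):
        self.failed_at.pop(server, None)

    def pick(self):
        # Prefer the first configured server that is healthy or whose
        # cool-down has expired; this is what switches back automatically.
        for server in self.servers:
            failed = self.failed_at.get(server)
            if failed is None or self.now() - failed >= self.retry_after:
                return server
        # Everything failed recently: retry the one that failed longest ago.
        return min(self.servers, key=lambda s: self.failed_at[s])
```

Because `pick` always walks the list in preference order, a restarted primary is used again as soon as its cool-down has passed, without any explicit "switch back" step.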
2. Solving the log loss problem
The open-source Python client scans the log file for changes at regular intervals, so a log switch between two scans can cause log loss. We detect changes the same way, but when a log split occurs we check once more whether the file that was split off has been collected completely.
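The "check the split-off file again" idea can be sketched as follows. The sketch assumes a POSIX system where the application rotates by renaming the log file, so the open handle keeps pointing at the old file and a changed inode at the original path signals a split. `RotationAwareTailer` is a hypothetical name, not the class used in our client.

```python
import os


class RotationAwareTailer:
    """Follow a log file and finish the old file when it is rotated."""

    def __init__(self, path):
        self.path = path
        self.handle = open(path, "rb")

    def _drain(self):
        # Read every complete line currently available on the open handle.
        lines = []
        while True:
            pos = self.handle.tell()
            line = self.handle.readline()
            if not line.endswith(b"\n"):
                self.handle.seek(pos)  # keep partial data for the next poll
                break
            lines.append(line.rstrip(b"\n").decode("utf-8", errors="replace"))
        return lines

    def poll(self):
        lines = self._drain()
        try:
            current = os.stat(self.path).st_ino
            rotated = current != os.fstat(self.handle.fileno()).st_ino
        except FileNotFoundError:
            rotated = False  # new file not created yet, keep the old handle
        if rotated:
            lines += self._drain()  # re-check the split-off file once more
            self.handle.close()
            self.handle = open(self.path, "rb")
            lines += self._drain()
        return lines
```

Lines written to the old file just before the rename are still read through the old handle, which closes the window in which a plain periodic scan would lose them.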
3. Historical log collection
Logs may already have been generated before our Python client starts running. Collecting this backlog is also a new feature.
4. Solving the client's own SPOF problem
Our Python client may itself go down. When it starts again, it must ensure that logs are neither collected twice nor lost. Our solution is to serialize the state of the collected log files (mainly the position reached in each file).
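A minimal sketch of such a checkpoint is shown below. The text above only says the state is serialized; the JSON format, the function names, and the write-then-rename step are assumptions made for this example.

```python
import json
import os


def save_checkpoint(checkpoint_path, positions):
    """Persist {log file path: byte offset}, replacing the old file atomically."""
    tmp_path = checkpoint_path + ".tmp"
    with open(tmp_path, "w", encoding="utf-8") as f:
        json.dump(positions, f)
        f.flush()
        os.fsync(f.fileno())  # make sure the data is on disk before renaming
    os.replace(tmp_path, checkpoint_path)


def load_checkpoint(checkpoint_path):
    """Return the saved positions, or an empty dict on first start."""
    try:
        with open(checkpoint_path, encoding="utf-8") as f:
            return json.load(f)
    except FileNotFoundError:
        return {}
```

Writing to a temporary file and renaming it means a crash in the middle of a save leaves the previous checkpoint intact. Lines sent after the last saved offset may be sent again after a restart, so how often the checkpoint is saved determines how many duplicates are possible.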
5. Ensuring that log files are collected in the order in which they were generated
The order in which logs were generated follows the creation time of their files, so sorting the files by that time gives the collection order.
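The ordering step might look like the sketch below. Note the assumption: a file's creation time is not portably available from Python on Linux, so the last-modification time is used here as a stand-in, which for rotated files that are no longer written to gives the same order. The function name is hypothetical.

```python
import os


def files_in_generation_order(paths):
    """Sort log files oldest-first, using modification time as a proxy."""
    # The path is a tie-breaker so the result is deterministic.
    return sorted(paths, key=lambda p: (os.stat(p).st_mtime_ns, p))
```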
6. Timely notification mechanism
To notify the relevant people promptly when a Scribe server goes down, we developed an email notification mechanism: when a local Scribe server fails, an email is sent.
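A notification of this kind can be sent with Python's standard library alone. The sketch below is an assumption about how it could be done, not the code from our client; the function names, the message wording, and the default SMTP host are placeholders.

```python
import smtplib
from email.message import EmailMessage


def build_alert(server, sender, recipients):
    """Build the alert email for a Scribe server that cannot be reached."""
    msg = EmailMessage()
    msg["Subject"] = f"Scribe server unreachable: {server}"
    msg["From"] = sender
    msg["To"] = ", ".join(recipients)
    msg.set_content(
        f"The log client could not reach the Scribe server at {server}."
    )
    return msg


def send_alert(msg, smtp_host="localhost"):
    # Assumes an SMTP relay that accepts mail without authentication.
    with smtplib.SMTP(smtp_host) as smtp:
        smtp.send_message(msg)
```

Building the message separately from sending it makes the alert text easy to check without a mail server.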
13. Scribe research experience
How can we learn from our day-to-day work?
1. Each person is responsible for only a limited part of development in a company. How can we start from what we develop and go on to study and learn more?
2. The study of Scribe is one example.
14. Conclusion: some of the above content comes from the Internet. I have added my own understanding and hope it helps those who need it.