Install and configure Scribe

Source: Internet
Author: User

Scribe Introduction

Scribe is an open-source distributed log collection system developed by Facebook and widely used by Internet companies. It collects logs from various log sources and stores them in a central storage system (such as NFS or a distributed file system) for centralized statistical analysis and processing.

It provides a scalable and highly fault-tolerant solution for "distributed collection, unified processing" of logs. When the network or a machine of the central storage system fails, Scribe stores the logs locally or in another location. After the central storage system recovers, Scribe re-transmits the stored logs to it. Scribe is usually used in combination with Hadoop: Scribe pushes logs to HDFS, and Hadoop periodically processes them with MapReduce jobs.

System Architecture

[Image: Scribe system architecture (clip_image002), http://www.bkjia.com/uploads/allimg/131228/0131404932-0.jpg]

Common deployment structures:

[Image: common deployment structure (clip_image004), http://www.bkjia.com/uploads/allimg/131228/0131406442-1.jpg]

Preparations

System environment: CentOS 5.8 x86_64

Install the dependency packages:

yum -y install gcc gcc-c++ m4 autoconf automake libtool libicu-devel python-devel libevent-devel

Refresh the dynamic linker cache:

/sbin/ldconfig

Download the installation package:

mkdir -p /data/software
cd /data/software/
wget http://sourceforge.net/projects/boost/files/boost/1.44.0/boost_1_44_0.tar.gz/download
wget http://archive.apache.org/dist/incubator/thrift/0.4.0-incubating/thrift-0.4.0.tar.gz
wget https://github.com/downloads/facebook/scribe/scribe-2.1.tar.gz --no-check-certificate

Note: When downloading from GitHub with wget, you must specify the --no-check-certificate option.

Compile and install boost:

cd /data/software/
tar zxvf boost_1_44_0.tar.gz
cd boost_1_44_0
./bootstrap.sh --prefix=/usr/local/boost
./bjam --prefix=/usr/local/boost install
echo "/usr/local/boost/lib/" >> /etc/ld.so.conf
/sbin/ldconfig

Compile and install thrift:

Thrift requires Python's distutils.core module. If the check does not report OK, upgrade python-devel to a later version.

[Image: distutils.core check output (clip_image006), http://www.bkjia.com/uploads/allimg/131228/0131405611-2.jpg]
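
The distutils.core requirement can also be checked by hand. The following is only a sketch, not the check that the Thrift configure script itself runs; the helper name has_module is made up for illustration. It avoids newer library calls so that it should also run on the old Python shipped with CentOS 5.

```python
def has_module(name):
    """Return True if this interpreter can import the named module."""
    try:
        # __import__ also works on very old Python versions.
        __import__(name)
        return True
    except ImportError:
        return False


if __name__ == "__main__":
    # distutils.core is the module the text says Thrift needs.
    if has_module("distutils.core"):
        print("OK")
    else:
        print("MISSING")
```

Run it with the same python binary that the Thrift build will pick up, since several interpreters may be installed side by side.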

cd /data/software/
tar zxvf thrift-0.4.0.tar.gz
cd thrift-0.4.0
./configure --prefix=/usr/local/thrift --with-csharp=no --with-java=no --with-erlang=no --with-perl=no --with-php=no --with-ruby=no --with-py=yes --with-libevent --with-boost=/usr/local/boost/
make
make install
echo "/usr/local/thrift/lib/" >> /etc/ld.so.conf
/sbin/ldconfig

Compile and install thrift-fb303:

cd /data/software/thrift-0.4.0/contrib/fb303/
./bootstrap.sh --with-boost=/usr/local/boost/
./configure --prefix=/usr/local/fb303 --with-boost=/usr/local/boost/ --with-thriftpath=/usr/local/thrift/
make
make install

Install scribe

Compile and install:

cd /data/software/
tar zxvf scribe-2.1.tar.gz
cd scribe-2.1
./bootstrap.sh --prefix=/usr/local/scribe --with-thriftpath=/usr/local/thrift/ --with-fb303path=/usr/local/fb303/ --with-boost=/usr/local/boost/
make
make install

Understand the configuration file

The Scribe configuration file has two parts: global configuration and storage configuration.

Global configuration items

  • port: the port on which the scribe server listens. The default value is 0. You can specify it with the -p command-line option or in the configuration file. In the source code, the value is assigned to the variable port.

  • max_msg_per_second: The default value is 0, in which case the parameter is ignored. Recent changes have made this parameter less relevant; max_queue_size should be used for throttling instead. It is used in scribeHandler::throttleDeny.

  • max_queue_size (in bytes): maximum size of the queue that receives messages. The default value is 5,000,000 bytes. It is used in scribeHandler::Log.

  • check_interval (in seconds): controls how often each storage is checked. The default value is 5.

  • new_thread_per_category (yes/no): If yes, a new thread is created for every category seen. Otherwise, a single thread is created for each storage defined in the configuration file. For prefix storage or default storage, setting this parameter to "no" causes all messages matching the category to be processed by a single storage; otherwise, a new storage is created for each unique category name. The default value is "yes".

  • num_thrift_server_threads: number of threads listening for incoming messages. The default value is 3.

  • max_conn: maximum number of connections.


Storage Configuration

The Scribe server decides how to write log messages based on the storage types and parameters defined in the configuration. Each storage must specify the message category it handles, with three exceptions:

  • Default storage: the 'default' category handles any category that is not handled by another storage. There can be only one default storage.

  • Prefix storage: if the specified category ends with an asterisk (*), the storage handles all categories that start with the specified prefix.

  • Multiple categories: 'categories=' can be used in a single storage definition to cover multiple categories.

  • In the above three cases, a sub-directory is created for each unique category in file storage (unless new_thread_per_category is set to false).
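
The three matching rules above can be sketched as a small lookup function. This is an illustration only, not Scribe's actual code: the function name resolve_store, the dict layout, the space-separated form of 'categories', and the precedence order (exact name, then longest prefix, then default) are all assumptions.

```python
def resolve_store(category, stores):
    """Pick the storage definition that handles `category`.

    `stores` is a list of dicts holding either a "category" key (a name,
    a prefix ending in "*", or "default") or a "categories" key with
    several space-separated names. Precedence is an assumption.
    """
    prefix_matches = []
    default = None
    for store in stores:
        names = store.get("categories", store.get("category", "")).split()
        for name in names:
            if name == "default":
                default = store
            elif name.endswith("*"):
                if category.startswith(name[:-1]):
                    prefix_matches.append((len(name), store))
            elif name == category:
                return store  # an exact match wins immediately
    if prefix_matches:
        # Prefer the most specific (longest) matching prefix.
        return max(prefix_matches, key=lambda m: m[0])[1]
    return default


stores = [
    {"category": "default", "type": "file"},
    {"category": "web*", "type": "network"},
    {"categories": "ads clicks", "type": "buffer"},
]
print(resolve_store("web_access", stores)["type"])
```

With the sample list, "web_access" falls to the prefix definition, "clicks" to the multi-category definition, and any other name to the default storage.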

Storage configuration variables

  • category: determines which messages are handled by this storage.

  • type: storage type. One of file, buffer, network, bucket, thriftfile, null, and multi.

  • target_write_size: The default value is 16,384 bytes. Determines how large the message queue for a given category may grow before the messages are processed.

  • max_batch_size: The default value is 1,024,000 bytes (may not be available in the open-source version). Determines how much data from the in-memory storage queue is handled at a time. Together with the buffer file rotation size, it controls how large a single thrift call can be.

  • max_write_interval: The default value is 10 seconds. Determines how long messages for a given category may queue before they are processed.

  • must_succeed (yes/no): whether to re-queue messages and retry if a storage fails to process them. If it is set to 'no' and a storage cannot process the messages, they are discarded. The default value is yes. It is strongly recommended to use a buffer storage with a secondary storage to handle failed logs.
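
How target_write_size and max_write_interval work together can be shown with a toy queue. This is a simplified model, not Scribe's implementation; the class name CategoryQueue and the injectable clock are made up for illustration, and the defaults are the values quoted above.

```python
import time


class CategoryQueue:
    """Toy model of one category's in-memory message queue."""

    def __init__(self, target_write_size=16384, max_write_interval=10,
                 clock=time.time):
        self.target_write_size = target_write_size    # bytes
        self.max_write_interval = max_write_interval  # seconds
        self.clock = clock
        self.messages = []
        self.queued_bytes = 0
        self.last_flush = clock()

    def add(self, message):
        self.messages.append(message)
        self.queued_bytes += len(message)

    def should_flush(self):
        """True once the queue is big enough or has waited long enough."""
        if not self.messages:
            return False
        waited = self.clock() - self.last_flush
        return (self.queued_bytes >= self.target_write_size
                or waited >= self.max_write_interval)

    def flush(self):
        """Hand the queued batch to the caller and reset the counters."""
        batch, self.messages = self.messages, []
        self.queued_bytes = 0
        self.last_flush = self.clock()
        return batch
```

A small target_write_size or max_write_interval gives lower latency at the cost of more, smaller writes, which is the trade-off the sample configuration later in this article makes with max_write_interval=1.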

Storage Types

File Storage Configuration

File storage writes messages to a file.

  • file_path: file path. The default value is "/tmp".

  • base_filename: base name of the file. The default value is the category name.

  • use_hostname_sub_directory (yes/no): create a sub-directory named after the server's host name. The default value is no.

  • sub_directory: create a sub-directory with the specified name.

  • rotate_period: how often a new file is created. The value can be "hourly", "daily", "never", or number[suffix]; "never" is the default. The suffix can be "s", "m", "h", "d", or "w", for seconds (the default), minutes, hours, days, and weeks respectively.

  • rotate_hour: The value ranges from 0 to 23. The default value is 1. If rotate_period is daily, this determines the hour of the day at which a new file is created.

  • rotate_minute: The value ranges from 0 to 59. The default value is 15. If rotate_period is daily or hourly, this determines how many minutes after the hour a new file is created.

  • max_size: maximum file size. The default value is 1,000,000,000 bytes. Determines approximately how large a file may grow before rotating to a new file.

  • write_meta: "yes" or any other value; false is the default. If the file was rotated, its last line contains "scribe_meta" followed by the next file name.

  • fs_type: file system type. Two types are supported: "std" and "hdfs". "std" is the default.

  • chunk_size: The default value is 0. If a chunk size is specified, no message in the file spans a chunk boundary unless the message itself is larger than the chunk size.

  • add_newlines: 0 or 1. The default value is 0. If it is set to 1, a newline is written after every message.

  • create_symlink: "yes" or any other value. The default value is yes. If yes, a symbolic link is maintained that points to the most recently written file.

  • write_stats: yes/no. The default value is yes. Whether to create a scribe_stats file for each storage to keep track of the files written.

  • max_write_size: The default value is 1,000,000 bytes. File storage tries to flush data to the file system in chunks of max_write_size bytes. max_write_size cannot exceed max_size. Suppose a number of messages were buffered because of target_write_size and file storage is then called to save them: it writes them in chunks of at least max_write_size bytes at a time, and the last write may be smaller than max_write_size.

  • write_category: write to the following category.

  • rotate_on_reopen: rotate the file when it is reopened.
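
The rotate_period syntax described in the list above is easy to misread, so here is a small parser for it. It is a sketch based only on the description in this article, not on Scribe's source; the function name parse_rotate_period and the returned tuple shape are made up for illustration.

```python
import re

# Seconds per unit for the suffixes listed above.
_SUFFIX_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400, "w": 604800}
_PERIOD_RE = re.compile(r"^(\d+)([smhdw]?)$")


def parse_rotate_period(value):
    """Return (kind, seconds) for a rotate_period setting.

    Keywords come back as ("never" | "hourly" | "daily", None);
    number[suffix] values come back as ("interval", seconds).
    """
    value = value.strip()
    if value in ("never", "hourly", "daily"):
        return (value, None)
    match = _PERIOD_RE.match(value)
    if not match:
        raise ValueError("unrecognized rotate_period: %r" % value)
    number, suffix = match.groups()
    # A bare number is interpreted as seconds, the default unit.
    return ("interval", int(number) * _SUFFIX_SECONDS[suffix or "s"])


print(parse_rotate_period("10m"))
```

The keywords are kept separate from the numeric form because "hourly" and "daily" rotate at a clock time set by rotate_hour and rotate_minute rather than after a fixed interval.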

Network Storage Configuration

Network storage sends messages to other scribe servers. Scribe keeps a persistent connection open as long as it is able to send messages; it re-opens a connection only on error or if the downstream machine is overloaded. Under normal operation, scribe sends messages in batches based on the number of messages currently queued for sending. If scribe is backed up and buffering messages to local disk, it sends messages in chunks based on the buffer file size.

  • remote_host: name or IP address of the remote host to send messages to.

  • remote_port: the port on the remote host.

  • timeout: socket timeout in milliseconds. The default value is DEFAULT_SOCKET_TIMEOUT_MS, which is set to 5000 in store.h.

  • use_conn_pool: "yes" or any other value. The default value is false. Whether to use a connection pool instead of opening separate connections to each remote host.

  • smc_service:

  • service_options:

  • service_cache_timeout:

  • ignore_network_error:

  • dynamic_config_type:

Buffer Storage Configuration

This is the most commonly used store. It contains two sub-stores: a primary store and a secondary store. Logs are first written to the primary store. If the primary store fails, scribe saves the logs to the secondary store. After the primary store recovers, scribe copies the data in the secondary store to the primary store. The secondary store supports only two store types: file and null.

  • max_queue_length: 2,000,000 messages by default. If the number of messages in the queue exceeds this value, the buffer store switches to the secondary store.

  • buffer_send_rate: The default value is 1. Determines how many times, within each check_interval, a group of messages is read from the secondary store and sent to the primary store.

  • retry_interval: The default value is 300 seconds. How long to wait before retrying the primary store after a write to it fails.

  • retry_interval_range: The default value is 60 seconds. The actual retry interval is chosen randomly within this range around the specified retry_interval.

  • replay_buffer: yes/no. The default value is yes. If it is set to 'no', the buffer store does not remove messages from the secondary store and send them to the primary store.
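
The failover and replay behaviour described above can be modelled in a few lines. This is a deliberately simplified sketch, not Scribe's state machine: the class name BufferStoreSketch, the in-memory list standing in for the file-backed secondary store, and the batch_size parameter are all assumptions made for illustration.

```python
class BufferStoreSketch:
    """Toy buffer store: write to primary, fall back to secondary, replay."""

    def __init__(self, primary, replay_buffer=True):
        self.primary = primary        # callable(list_of_messages) -> bool
        self.secondary = []           # stands in for the secondary store
        self.replay_buffer = replay_buffer

    def log(self, messages):
        # Only use the primary directly when nothing is waiting in the
        # secondary, so that replayed messages keep their order.
        if not self.secondary and self.primary(messages):
            return "primary"
        self.secondary.extend(messages)
        return "secondary"

    def periodic_check(self, buffer_send_rate=1, batch_size=2):
        """Run once per check_interval; returns how many messages moved."""
        if not self.replay_buffer:
            return 0
        sent = 0
        for _ in range(buffer_send_rate):
            batch = self.secondary[:batch_size]
            if not batch or not self.primary(batch):
                break
            del self.secondary[:len(batch)]
            sent += len(batch)
        return sent
```

A larger buffer_send_rate drains the secondary store faster after an outage, at the cost of a heavier burst on the recovering primary.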

Bucket Storage Configuration

Bucket storage hashes messages into multiple files, using a prefix of each message as the key. Buckets can be defined implicitly or explicitly. An implicitly defined bucket storage must have a sub-storage named "bucket", which can be file storage, network storage, or thriftfile storage.

  • num_buckets: number of buckets to hash into. The default value is 1. Messages that cannot be hashed into any bucket are placed in a special bucket number 0.

  • bucket_type: "key_hash", "key_modulo", or "random".

  • delimiter: must be a valid ASCII code (1 or greater); otherwise the default delimiter ':' is used. The message prefix up to the first occurrence of the delimiter is used as the key for hash/modulo. "random" does not use the delimiter.

  • remove_key: yes/no. The default value is no. Whether to remove the key prefix from the message.

  • bucket_subdir: if a single bucket definition is used, the name of each sub-directory is generated from this name and the bucket number.
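
The key extraction and bucket selection described above might look roughly like this. It is a sketch under stated assumptions, not Scribe's algorithm: the function name bucket_number is made up, zlib.crc32 is only a stand-in for whatever hash Scribe uses for "key_hash", and numbering the regular buckets 1..num_buckets (with 0 reserved) is an assumption drawn from the "special bucket number 0" remark.

```python
import random
import zlib


def bucket_number(message, num_buckets=1, bucket_type="key_hash",
                  delimiter=":", rng=None):
    """Return the bucket for `message`; 0 means "could not be hashed"."""
    if bucket_type == "random":
        # Random bucketing ignores the delimiter entirely.
        return (rng or random).randint(1, num_buckets)
    key, found, _rest = message.partition(delimiter)
    if not found:
        return 0  # no delimiter, so there is no key to hash
    if bucket_type == "key_modulo":
        try:
            return int(key) % num_buckets + 1
        except ValueError:
            return 0  # key is not a number
    if bucket_type == "key_hash":
        # crc32 is a stand-in hash chosen for this illustration.
        return zlib.crc32(key.encode("utf-8")) % num_buckets + 1
    raise ValueError("unknown bucket_type: %r" % bucket_type)


print(bucket_number("7:hello", num_buckets=4, bucket_type="key_modulo"))
```

Messages that share a key always land in the same bucket with "key_hash" and "key_modulo", which is what makes per-key files possible; "random" only spreads load.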

Null Storage Configuration

Null storage ignores all messages of the given category. It has no parameters.

Multi Storage Configuration

A multi storage forwards all messages to multiple sub-storages. It may have any number of sub-storages, named "store0", "store1", "store2", and so on.

  • report_success: "all" or "any". The default value is "all". Whether all sub-storages or any one sub-storage must successfully record a message for the multi storage to report the message as successfully logged.
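
The difference between the two report_success settings comes down to one line each. The following is an illustrative sketch only; the function name multi_store_result is made up.

```python
def multi_store_result(substore_results, report_success="all"):
    """Combine per-sub-storage success flags into one overall result."""
    if report_success == "all":
        return all(substore_results)
    if report_success == "any":
        return any(substore_results)
    raise ValueError("report_success must be 'all' or 'any'")


# One of three sub-storages failed to log the message.
print(multi_store_result([True, False, True], "all"))
print(multi_store_result([True, False, True], "any"))
```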

Thriftfile Storage Configuration

Thriftfile storage is similar to file storage, except that it stores messages in a Thrift TFileTransport file.

  • file_path: file path. The default value is "/tmp".

  • base_filename: base name of the file. The default value is the category name.

  • rotate_period: how often a new file is created. The value can be "hourly", "daily", "never", or number[suffix]; "never" is the default. The suffix can be "s", "m", "h", "d", or "w", for seconds (the default), minutes, hours, days, and weeks respectively.

  • rotate_hour: The value ranges from 0 to 23. The default value is 1. If rotate_period is daily, this determines the hour of the day at which a new file is created.

  • rotate_minute: The value ranges from 0 to 59. The default value is 15. If rotate_period is daily or hourly, this determines how many minutes after the hour a new file is created.

  • max_size: maximum file size. The default value is 1,000,000,000 bytes. Determines approximately how large a file may grow before rotating to a new file.

  • fs_type: file system type. Currently only "std" is supported, and it is the default.

  • chunk_size: The default value is 0. If a chunk size is specified, no message in the file spans a chunk boundary unless the message itself is larger than the chunk size.

  • create_symlink: "yes" or any other value. The default value is yes. If yes, a symbolic link is maintained that points to the most recently written file.

  • flush_frequency_ms: in milliseconds. If not specified, the TFileTransport default of 300 is used. Determines how often the thrift file is synced to disk.

  • msg_buffer_size: in bytes. If not specified, the TFileTransport default of 0 is used. If the value is non-zero, writes larger than this size are rejected.

Configuration example

mkdir -p /usr/local/scribe/conf    # server scripts, test tools, and configuration files
mkdir -p /usr/local/scribe/logs    # log files
mkdir -p /usr/local/scribe/data    # data files
cd /usr/local/scribe/conf/

Edit the script for starting the scribe Server

vim start_scribe_service.sh
#!/bin/sh
export LANG=de_DE.UTF-8
/usr/local/scribe/bin/scribed /usr/local/scribe/conf/scribe_service.conf 1>/usr/local/scribe/logs/scribe_service.log 2>&1 &

Edit the scribe service configuration file

vim scribe_service.conf
port=1463
max_msg_per_second=2000000
check_interval=3

<store>
category=default
type=buffer
target_write_size=20480
max_write_interval=1
buffer_send_rate=1
retry_interval=30
retry_interval_range=10

<primary>
type=network
remote_host=192.168.1.25
remote_port=1463
</primary>

<secondary>
type=file
fs_type=std
file_path=/data/scribe/data
base_filename=log
max_size=3000000
</secondary>
</store>
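
To sanity-check a configuration file of this shape before starting the server, a small reader for the key=value and nested-tag layout can help. This is a sketch, not the parser scribed uses; the function name parse_scribe_conf, the nested-dict result, and the treatment of '#' as a comment marker are assumptions. SAMPLE is a shortened version of the configuration in this article.

```python
def parse_scribe_conf(text):
    """Parse key=value lines and <tag>...</tag> blocks into nested dicts."""
    root = {}
    stack = [("", root)]  # (tag name, dict being filled)
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # assumed comment syntax
        if not line:
            continue
        if line.startswith("</") and line.endswith(">"):
            name = line[2:-1].strip()
            if len(stack) == 1 or stack[-1][0] != name:
                raise ValueError("unexpected closing tag: " + line)
            stack.pop()
        elif line.startswith("<") and line.endswith(">"):
            name = line[1:-1].strip()
            child = {}
            # Several blocks may share a name, so keep them in a list.
            stack[-1][1].setdefault(name, []).append(child)
            stack.append((name, child))
        else:
            key, found, value = line.partition("=")
            if not found:
                raise ValueError("expected key=value: " + line)
            stack[-1][1][key.strip()] = value.strip()
    if len(stack) != 1:
        raise ValueError("unclosed tag: <%s>" % stack[-1][0])
    return root


SAMPLE = """
port=1463
<store>
category=default
type=buffer
<primary>
type=network
remote_host=192.168.1.25
remote_port=1463
</primary>
<secondary>
type=file
file_path=/data/scribe/data
max_size=3000000
</secondary>
</store>
"""

conf = parse_scribe_conf(SAMPLE)
print(conf["store"][0]["primary"][0]["remote_host"])
```

All values stay strings here; converting port or max_size to integers is left to the caller.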

Edit the scribe_ctrl Tool

vim scribe_ctrl
#!/usr/bin/env python
import sys
from fb303_scripts import *

# thrift python packages need to be installed
import thrift
from thrift import protocol, transport
from thrift.transport import TTransport
from thrift.protocol import TBinaryProtocol

if (len(sys.argv) > 2):
    port = int(sys.argv[2])
else:
    port = 1463

if (len(sys.argv) > 1):
    retval = fb303_simple_mgmt.service_ctrl(sys.argv[1],
                                            port,
                                            trans_factory = TTransport.TFramedTransportFactory(),
                                            prot_factory = TBinaryProtocol.TBinaryProtocolFactory())
    sys.exit(retval)
else:
    print 'Usage: scribe_ctrl command [port]'
    print ' commands: stop counters status version name alive'
    sys.exit(2)

Edit Test Tool

vim scribe_cat
#!/usr/bin/python
import sys
from scribe import scribe
from thrift.transport import TTransport, TSocket
from thrift.protocol import TBinaryProtocol

if len(sys.argv) == 2:
    category = sys.argv[1]
    host = '127.0.0.1'
    port = 1463
elif len(sys.argv) == 4 and sys.argv[1] == '-h':
    category = sys.argv[3]
    host_port = sys.argv[2].split(':')
    host = host_port[0]
    if len(host_port) > 1:
        port = int(host_port[1])
    else:
        port = 1463
else:
    sys.exit('usage (message is stdin): scribe_cat [-h host[:port]] category')

log_entry = scribe.LogEntry(dict(category=category, message=sys.stdin.read()))
socket = TSocket.TSocket(host=host, port=port)
transport = TTransport.TFramedTransport(socket)
protocol = TBinaryProtocol.TBinaryProtocol(trans=transport, strictRead=False, strictWrite=False)
client = scribe.Client(iprot=protocol, oprot=protocol)
transport.open()
result = client.Log(messages=[log_entry])
transport.close()

if result == scribe.ResultCode.OK:
    sys.exit()
elif result == scribe.ResultCode.TRY_LATER:
    print >> sys.stderr, "TRY_LATER"
    sys.exit(84)  # 'T'
else:
    sys.exit("Unknown error code.")

Grant execution permission

chmod +x start_scribe_service.sh
chmod +x scribe_ctrl
chmod +x scribe_cat

Start the scribe Service

./start_scribe_service.sh

View service startup ports

netstat -nutpl|grep 1463

[Image: netstat output for port 1463 (clip_image008), http://www.bkjia.com/uploads/allimg/131228/0131403553-3.jpg]
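
The same check can be scripted, for example from a monitoring job. This is a minimal sketch that only tests whether something accepts TCP connections on the port; it does not confirm that the listener is actually scribed. The function name port_is_open is made up for illustration, and 1463 is the port from the sample configuration.

```python
import socket


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    print(port_is_open("127.0.0.1", 1463))
```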

View server logs

tail -f /usr/local/scribe/logs/scribe_service.log

[Image: scribe server log output (clip_image010), http://www.bkjia.com/uploads/allimg/131228/013140II-4.jpg]

Control tools:

cd /usr/local/scribe/conf/

Usage:

scribe_ctrl command [port]

The available command parameters are as follows:

  • status - returns 'alive' if the server is running normally

  • version - returns the version of the current Scribe server

  • alive - returns the server running time

  • stop - stops the Scribe server

  • reload - reloads the Scribe configuration file

  • counters - returns the following statistics (if not zero):

  • received good: number of messages received since the Scribe server was started

  • received bad: number of invalid messages received

  • sent: number of messages sent to another Scribe server

  • denied for queue size: number of requests denied because the message queue was full

  • denied for rate: number of requests denied because of rate limits

  • retries

  • requeue: number of times Scribe had to resend messages to a store (if must_succeed is enabled)

  • lost: number of messages that were not recorded (recommended configuration: use buffer stores to avoid losing messages)

  • received blank category: number of messages received without a category

The usage is illustrated as follows:

[Image: scribe_ctrl usage example (clip_image011), http://www.bkjia.com/uploads/allimg/131228/01314035I-5.png]

Test the scribe function

The test commands are as follows:

cd /usr/local/scribe/conf/
echo "hello world" | ./scribe_cat test
cd /usr/local/scribe/data/

The test results are shown as follows:

[Image: test results (clip_image012), http://www.bkjia.com/uploads/allimg/131228/013140A55-6.png]
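
To confirm from a script that the test message reached disk, one option is to look for the most recently written file of the category. This is a sketch that assumes file storage creates one sub-directory per category under the data directory, as described in the storage configuration section; the function name newest_log_file is made up, and the directory in the example call is the one used by the test commands in this article.

```python
import os


def newest_log_file(data_dir, category):
    """Return the most recently modified regular file for a category."""
    category_dir = os.path.join(data_dir, category)
    if not os.path.isdir(category_dir):
        return None
    paths = [os.path.join(category_dir, name)
             for name in os.listdir(category_dir)]
    # Skip symlinks so that a "current" link is not counted twice.
    files = [p for p in paths if os.path.isfile(p) and not os.path.islink(p)]
    if not files:
        return None
    return max(files, key=os.path.getmtime)


if __name__ == "__main__":
    print(newest_log_file("/usr/local/scribe/data", "test"))
```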

This completes the basic installation and configuration of scribe. If you have any questions, please contact me. ^_^

Next, how to apply scribe to your business: http://cyr520.blog.51cto.com/714067/1265181

This article is from the "small Cui's growth path" blog. Please be sure to retain this source: http://cyr520.blog.51cto.com/714067/1209485
