Collecting binary log files with rsync

Source: Internet
Author: User
Tags hadoop fs

Please credit the source when reposting this article.

  1. Rsync Introduction

Rsync is a Unix utility that synchronizes files and directories between two computers and uses delta encoding to reduce the amount of data transferred. An important feature of rsync, not found in most similar programs or protocols, is that mirroring takes only one transmission in each direction. Rsync can copy or display directory contents and copy files, optionally with compression and recursion.

In daemon mode, rsync listens on TCP port 873 by default and serves files either over the native rsync protocol or through a remote shell such as RSH or SSH. The machine being backed up is the server, and the machine that stores the backup is the client. Rsync must be installed on both ends, and the rsync service must be running on the server. The server also needs the configuration file rsyncd.conf.

Its features are as follows:

  1. It can mirror an entire directory tree and file system.
  2. It easily preserves the permissions, timestamps, and soft links of the original files.
  3. It can be installed without special permissions.
  4. Fast: the first synchronization copies everything, but later synchronizations transfer only the files that have changed. Rsync can also compress data during transmission, so it uses less bandwidth.
  5. Secure: files can be transferred over SCP, ssh, and similar methods, or over a direct socket connection.
  6. It supports anonymous transfers, which is convenient for mirroring websites.

  1. Rsync binary log collection solution
  1. Requirement Analysis
    1. Log category

    By operating system: Windows logs and Linux logs.

    By how they grow: either a large number of individual files is generated in a single directory within a short time (append directory), or a single file is appended to over a long time (append file). The two types can be mixed.

    1. Collection time requirements

    Collection cycle: real-time, short cycle (within one day), daily, or long cycle (longer than one day).

  2. Collection scheme
    1. Collection Architecture

    The log collection process has three parts: the data source, the transfer station, and the destination. Rsync backs up files from the data source to the transfer station, and the hadoop upload command then transfers them from the transfer station to the destination.
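
    The two hops can be sketched as a small shell script. This is a minimal sketch under stated assumptions, not the author's actual script: the function names (run, collect, upload, pipeline), the variable names, the DRY_RUN switch, and the placeholder source user@server::Test are all hypothetical; the local path, password file, and HDFS address are the example values used in the appendices.

```shell
#!/bin/bash
# Sketch of the collection pipeline: data source -> transfer station -> HDFS.
# All names and default values below are illustrative assumptions.
RSYNC_SRC="${RSYNC_SRC:-user@server::Test}"        # hypothetical rsync module on the data source
LOCAL_DIR="${LOCAL_DIR:-/home/wx/desktop/winevt}"  # backup directory on the transfer station
HDFS_DEST="${HDFS_DEST:-hdfs://master:8020/}"      # destination in HDFS
PASSWORD_FILE="${PASSWORD_FILE:-/etc/rsyncd.pas}"  # rsync password file on the client

# Run a command, or only print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Hop 1: back up files from the data source to the transfer station.
collect() {
    run rsync -av --password-file="$PASSWORD_FILE" "$RSYNC_SRC" "$LOCAL_DIR/"
}

# Hop 2: upload the local backup from the transfer station to HDFS.
upload() {
    run hadoop fs -put "$LOCAL_DIR" "$HDFS_DEST"
}

# Upload only if the backup succeeded.
pipeline() {
    collect && upload
}
```

    Calling the pipeline function with DRY_RUN=1 set prints the two commands without running them, which is a cheap way to check the configured paths before scheduling the job.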

    1. Data Source Processing

    The data source (rsync server) comes in two types: the append-directory type (tcpdump capture files, for example) and the append-file type (system logs, for example). Append-file sources are further divided into files that can be read directly (Linux logs, for example) and files that cannot (Windows logs, for example).

    Data source processing generally has two parts: installing and configuring the rsync server, and moving files where necessary so that they can be backed up.

  3. Server Configuration of rsync

    The rsync server runs on either Windows or Linux. For installing and configuring a Windows server, see Appendix 1. For configuring a Linux server, see Appendix 2.

  4. File Transfer

    When a file cannot be read directly because of permission issues (Windows logs, for example), it must first be copied to a path that can be read; copy.bat (Appendix 3) does this. When files cannot be backed up directly because they are still being written (tcpdump capture files, for example), the completed files must first be moved elsewhere; mv.sh (Appendix 4) does this.

    1. Transfer Station handling

    Processing on the transfer station (rsync client) has two parts: backing up files to the local disk, and uploading the local backup to the destination (Hadoop HDFS). The upload handles the two file types differently. Files from an append-directory source can be uploaded directly. For files from an append-file source, HDFS has no update function, so the outdated copy in HDFS must be deleted before the new file is uploaded.
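
    The two upload cases can be sketched as one shell function. This is only an illustration under stated assumptions: the function name upload_backup and the mode names append-dir and append-file are hypothetical, and the -f flag of hadoop fs -rm is added here so that the first run does not fail when HDFS holds no old copy yet.

```shell
#!/bin/bash
# Sketch: upload a local backup to HDFS according to how the source grows.
# upload_backup MODE LOCAL_PATH HDFS_DIR   (hypothetical helper)
#   MODE = append-dir  : files can be uploaded directly
#   MODE = append-file : the outdated HDFS copy is deleted before the upload
upload_backup() {
    local mode="$1" local_path="$2" hdfs_dir="$3"
    local name
    name=$(basename "$local_path")

    case "$mode" in
        append-dir)
            hadoop fs -put "$local_path" "$hdfs_dir"
            ;;
        append-file)
            # HDFS cannot update a file in place: delete the old copy, then upload.
            # -f keeps rm from failing when there is no old copy yet.
            hadoop fs -rm -r -f "${hdfs_dir%/}/$name" &&
                hadoop fs -put "$local_path" "$hdfs_dir"
            ;;
        *)
            echo "unknown mode: $mode" >&2
            return 1
            ;;
    esac
}
```

    For example, upload_backup append-file /home/wx/desktop/winevt hdfs://master:8020/ removes hdfs://master:8020/winevt and then uploads the fresh copy. Chaining the two commands with && means the upload never starts if the delete fails.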

  5. Client configuration of rsync

    The rsync client runs Linux, where rsync generally does not need to be installed separately. Once the rsync service on the server is running, run the rsync command on the client to back up the files. For details, see Appendix 5.

  6. Upload files to the destination

    For append-file sources, delete the existing copy in HDFS before uploading the new one. For details, see Appendix 6.

    1. Destination Processing

    The destination is the Hadoop HDFS file system. It runs on Linux, and the Hadoop service must be started.

  7. Configuration item

    Collecting binary log files with rsync requires a few configuration items. On the data source, the main item is the storage path of the log files. The transfer station does two jobs, backing up to the local disk and uploading to HDFS, so its items are the local path where the server's logs are backed up and the HDFS address they are uploaded to, plus the cycle and time at which each job runs. The destination only needs the Hadoop service installed and started, so it has no configuration items.

    1. Appendix
  8. Appendix 1: configure the Windows Server
    1. Install rsync on Windows

    Install rsync for Windows Server.

    From the Start menu, open the "Start a Unix bash shell" program.

    In the terminal that opens (similar to CMD), enter the following command:

    $ /bin/activate-user.sh

    Enter L

    Enter Administrator

    Press Enter to finish

    1. Start opensshd

    Open "Control Panel" --> "Administrative Tools" --> "Services":

    Find the opensshd service and start it.

    1. Configure the rsyncd.conf file

    Edit rsyncd.conf in the installation directory (C:/installation directory/rsyncd.conf). Comments go on their own lines, because rsyncd.conf treats a # that follows a value as part of that value.

    use chroot = false
    strict modes = false
    hosts allow = 172.16.28.10
    # Log storage path
    log file = /var/log/rsyncd.log
    pid file = /var/run/rsyncd.pid
    lock file = /var/run/rsyncd.lock
    max connections = 2
    # Server IP address
    address = 172.16.27.234
    # Server port
    port = 873
    syslog facility = local3
    timeout = 300

    # Module Definitions
    # Remember cygwin naming conventions: C:\work becomes /cygdrive/c/work

    # Module name
    [Test]
    # Files to be backed up; this path means C:/copy. Windows log files cannot
    # be backed up directly, so copy them to the copy folder in advance.
    path = /cygdrive/c/copy
    comment = This is a Windows test program
    ignore errors = yes
    read only = yes
    list = no
    transfer logging = yes
    auth users = demo
    secrets file = /etc/rsyncd.secrets

    1. Start rsync

    Open "Control Panel" --> "Administrative Tools" --> "Services":

    Locate the rsyncserver service and start it.

    1. Configure the rsyncd.secrets file

    Edit the /etc/rsyncd.secrets file so that it contains one line per user in the following format:

    username:password

    The Windows server is now configured.

  9. Appendix 2: configure the Linux Server
    1. Configure the rsyncd.conf file

    Create the /etc/rsyncd.conf file. Its content is similar to the rsyncd.conf file on Windows.

    1. Configure the rsyncd.secrets file

    As on Windows, create a file whose lines have the format username:password. On Linux, the file's permissions must be set to 600; otherwise the backup fails. This is a security requirement: only the owner of the file may read and write it, and other users have no access.
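
    Creating the file with the right permissions can be sketched as follows. The helper name write_secrets and its arguments are hypothetical and only illustrate the username:password format and the 600 permission described above.

```shell
#!/bin/bash
# Sketch: create an rsync secrets file that only its owner can read and write.
# write_secrets FILE USER PASSWORD   (hypothetical helper)
write_secrets() {
    local file="$1" user="$2" password="$3"
    # umask 077 in a subshell makes a newly created file private from the
    # start; chmod 600 then fixes the mode if the file already existed.
    (
        umask 077
        printf '%s:%s\n' "$user" "$password" > "$file"
    ) && chmod 600 "$file"
}
```

    Run as root, write_secrets /etc/rsyncd.secrets demo followed by the chosen password would produce the file that the secrets file setting in rsyncd.conf points to; ls -l should then show -rw------- for it.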

    1. Start rsync

    Run the following command as root:

    # /usr/bin/rsync --daemon --config=/etc/rsyncd.conf

    The --config option specifies the location of rsyncd.conf. It can be omitted when the file is /etc/rsyncd.conf.

    The Linux server is now configured.

  10. Appendix 3: copy.bat

    copy.bat copies the Windows logs, which cannot be read directly, to the copy folder, which can. Add it as a task in the Task Scheduler (in Control Panel) so that it runs automatically at regular intervals. The content of copy.bat is:

    xcopy "C:\windows\system32\winevt\logs" "C:\Copy" /e /i /d /h /r /y

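  11. Appendix 4: mv.sh

    This script moves completed tcpdump capture files out of the directory that tcpdump is writing to. Usage:

    ./mv.sh /source/path/ /target/path

    The code is as follows:

    #!/bin/bash
    # Filename: scan_mv.sh
    # Every second, move all files except the newest one (which tcpdump is
    # still writing) from the source directory ($1) to the target directory ($2).
    # Assumes that file names contain no whitespace.
    while true
    do
        all_file=$(ls -t "$1")        # newest file first
        count=1
        for i in $all_file
        do
            if [ "$count" -eq 1 ]; then
                count=2               # skip the newest file
            else
                mv "$1/$i" "$2"
            fi
        done
        sleep 1
    done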

  12. Appendix 5: configure the Linux Client
    1. Configure the rsyncd.pas file

    The path of this file is user-defined; /etc/rsyncd.pas is used here as an example. The file contains only the user's password, which must match the password in rsyncd.secrets.

    1. Run rsync

    Use rsync to back up the Windows logs, for example:

    rsync -av --password-file=/etc/rsyncd.pas [email protected]::Test /home/wx/desktop/winevt/

    To run the command automatically at regular intervals, add it to the /etc/crontab file in crontab format, for example:

    50 19 * * * root rsync -av --password-file=/etc/rsyncd.pas [email protected]::Test /home/wx/desktop/winevt/

  13. Rsync command

    The rsync command takes the following forms:

    rsync [OPTION]... SRC DEST
    rsync [OPTION]... SRC [USER@]HOST:DEST
    rsync [OPTION]... [USER@]HOST:SRC DEST
    rsync [OPTION]... [USER@]HOST::SRC DEST
    rsync [OPTION]... SRC [USER@]HOST::DEST
    rsync [OPTION]... rsync://[USER@]HOST[:PORT]/SRC [DEST]

    These six forms correspond to six working modes:

    1) Copy local files. This mode is used when neither the SRC nor the DEST path contains a ":" separator.

    2) Copy from the local machine to a remote machine through a remote shell program (such as RSH or SSH). This mode is used when the DEST path contains a single colon ":" separator.

    3) Copy from a remote machine to the local machine through a remote shell program (such as RSH or SSH). This mode is used when the SRC path contains a single colon ":" separator.

    4) Copy files from a remote rsync server to the local machine. This mode is used when the SRC path contains the "::" separator.

    5) Copy files from the local machine to a remote rsync server. This mode is used when the DEST path contains the "::" separator.

    6) List the files on a remote machine. This works like an rsync transfer, except that the local destination is omitted from the command.
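
    The separator rules above can be sketched as a small classifier. It is a simplified illustration, not rsync's own argument parser: the function names endpoint_kind and rsync_mode are hypothetical, and corner cases such as local file names that contain a colon are ignored.

```shell
#!/bin/bash
# Sketch: report which rsync working mode a SRC/DEST pair selects.

# Classify one argument as "daemon", "shell", or "local".
endpoint_kind() {
    case "$1" in
        rsync://*|*::*) echo daemon ;;  # rsync:// URL or double colon
        *:*)            echo shell ;;   # single colon: remote shell
        *)              echo local ;;   # no colon: local path
    esac
}

# rsync_mode SRC [DEST]
rsync_mode() {
    local src dest
    if [ $# -lt 2 ]; then
        # With no destination, the source is listed instead of copied.
        echo "6: list files (no copy)"
        return
    fi
    src=$(endpoint_kind "$1")
    dest=$(endpoint_kind "$2")
    case "$src,$dest" in
        local,local)  echo "1: local copy" ;;
        local,shell)  echo "2: push via remote shell" ;;
        shell,local)  echo "3: pull via remote shell" ;;
        daemon,local) echo "4: pull from rsync daemon" ;;
        local,daemon) echo "5: push to rsync daemon" ;;
        *)            echo "unsupported: both ends remote" ;;
    esac
}
```

    For the client command in Appendix 5, where the source has the form USER@HOST::Test and the destination is a local directory, rsync_mode reports mode 4.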

  14. Parameter description

    -v, --verbose  detailed output mode
    -q, --quiet  reduced output mode
    -c, --checksum  turn on checksums: decide which files to transfer by checksum instead of by time and size
    -a, --archive  archive mode: transfer files recursively and keep all file attributes; equal to -rlptgoD
    -r, --recursive  process subdirectories recursively
    -R, --relative  use relative path information
    -b, --backup  create backups: if the destination already has a file with the same name, the old file is renamed to filename~. Use the --suffix option to specify a different backup suffix.
    --backup-dir=DIR  store the backup files (for example filename~) in DIR
    --suffix=SUFFIX  define the backup file suffix
    -u, --update  update only: skip files that already exist in DEST and are newer than the file to be backed up (do not overwrite newer files)
    -l, --links  keep soft links
    -L, --copy-links  treat soft links like regular files
    --copy-unsafe-links  copy only the links that point outside the SRC directory tree
    --safe-links  ignore links that point outside the SRC directory tree
    -H, --hard-links  keep hard links
    -p, --perms  keep file permissions
    -o, --owner  keep file owner information
    -g, --group  keep file group information
    -D, --devices  keep device file information
    -t, --times  keep file time information
    -S, --sparse  handle sparse files specially to save space in DEST
    -n, --dry-run  show which files would be transferred, without transferring them
    -W, --whole-file  copy whole files, without incremental detection
    -x, --one-file-system  do not cross file system boundaries
    -B, --block-size=SIZE  block size used by the checksum algorithm; the default is 700 bytes
    -e, --rsh=COMMAND  specify the remote shell (such as rsh or ssh) used for synchronization
    --rsync-path=PATH  path of the rsync command on the remote server
    -C, --cvs-exclude  ignore files automatically in the same way CVS does, to exclude files that should not be transferred
    --existing  update only files that already exist in DEST; do not back up new files
    --delete  delete files in DEST that do not exist in SRC
    --delete-excluded  also delete the excluded files on the receiving end
    --delete-after  delete after the transfer finishes
    --ignore-errors  delete even when I/O errors occur
    --max-delete=NUM  delete at most NUM files
    --partial  keep files that were not completely transferred, to speed up a later retransmission
    --force  force deletion of directories even if they are not empty
    --numeric-ids  do not map numeric user and group IDs to user and group names
    --timeout=TIME  I/O timeout, in seconds
    -I, --ignore-times  do not skip files that have the same time and size
    --size-only  when deciding whether to back up a file, check only its size, not its time
    --modify-window=NUM  timestamp window used to decide whether two files have the same time; the default is 0
    -T, --temp-dir=DIR  create temporary files in DIR
    --compare-dest=DIR  also compare against the files in DIR to decide whether to back up
    -P  equivalent to --partial --progress
    --progress  show progress during the transfer
    -z, --compress  compress file data during the transfer
    --exclude=PATTERN  exclude files that match PATTERN from the transfer
    --include=PATTERN  do not exclude files that match PATTERN
    --exclude-from=FILE  exclude files that match the patterns listed in FILE
    --include-from=FILE  do not exclude files that match the patterns listed in FILE
    --version  print version information
    --address  bind to a specific address
    --config=FILE  specify another configuration file instead of the default rsyncd.conf
    --port=PORT  specify another rsync service port
    --blocking-io  use blocking I/O for the remote shell
    --stats  show transfer statistics for the files
    --log-format=FORMAT  specify the log format
    --password-file=FILE  read the password from FILE
    --bwlimit=KBPS  limit I/O bandwidth, in KBytes per second
    -h, --help  display help information
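
    A few of these options combine as in the following sketch, which mirrors one local directory into another. The function name mirror_dir is hypothetical, and the choice of -a with --delete is just one reasonable combination, not a recommendation from the source.

```shell
#!/bin/bash
# Sketch: mirror one local directory into another with common options.
# mirror_dir SRC_DIR DEST_DIR [extra rsync options...]   (hypothetical helper)
mirror_dir() {
    local src="$1" dest="$2"
    shift 2
    # -a        archive mode (recursive, keeps file attributes)
    # --delete  remove files from DEST_DIR that no longer exist in SRC_DIR
    # The trailing slash on the source copies the directory's contents
    # rather than the directory itself.
    rsync -a --delete "$@" "${src%/}/" "${dest%/}/"
}
```

    Passing -n -v as extra options gives a dry run that lists what would be transferred or deleted without changing anything, which is worth doing before the first real run because --delete removes files from the destination.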

  15. Appendix 6: Upload to destination

    Upload the local backup to HDFS, then list it to check the result:

    hadoop fs -put /home/wx/desktop/winevt hdfs://master:8020/

    hadoop fs -ls hdfs://master:8020/winevt

    To delete the existing copy first, run:

    hadoop fs -rm -r hdfs://master:8020/winevt

    To run the commands automatically at regular intervals, add them to the /etc/crontab file in crontab format, for example:

    52 19 * * * root hadoop fs -rm -r hdfs://master:8020/winevt

    54 19 * * * root hadoop fs -put /home/wx/desktop/winevt hdfs://master:8020/

  16. Example: backing up tcpdump files
    1. Save tcpdump capture files

    Save the capture files generated by tcpdump in a specified folder:

    Switch to the copy folder and run tcpdump -C 10 -w tcpdump, which saves capture files named tcpdump in the copy folder and starts a new file each time the current one reaches about 10 MB. Then run the shell script mv.sh to move the completed files to the tcpdump folder (./mv.sh /home/wx/desktop/copy/ /home/wx/desktop/tcpdump).

    1. Upload the tcpdump files to HDFS

    To run the upload automatically at regular intervals, add the command to the /etc/crontab file in crontab format, for example:

    54 19 * * * root hadoop fs -put /home/wx/desktop/tcpdump hdfs://master:8020/
