Rsync is a Unix utility that synchronizes files and directories between two computers and uses delta encoding to reduce the amount of data transferred. An important feature of rsync not found in most similar programs or protocols is that the mirroring takes place with only one transmission in each direction. Rsync can copy or display directory contents and copy files, optionally using compression and recursion.
In daemon mode, rsync listens on TCP port 873 by default and serves files either over the native rsync protocol or through a remote shell such as RSH or SSH. The machine being backed up is the server, and the machine that holds the backup is the client. Rsync must be available on both ends, and the server must provide the configuration file rsyncd.conf.
- Requirement Analysis
- Log category
Operating System: Windows logs and Linux logs
Append mode: either a large number of individual files are generated in a single directory within a short time, or content is appended to a single file over a long time. The two modes can also be mixed.
- Collection time requirements
Collection cycle: real-time, short cycle (within one day), daily, or long cycle (longer than one day).
- Collection scheme
- Collection Architecture
The log collection process consists of three parts: data source processing, transfer station processing, and the destination. Rsync backs up files from the data source to the transfer station, and the Hadoop upload function then transfers the files from the transfer station to the destination.
- Data Source Processing
Data sources (rsync servers) fall into two types: append-directory sources (tcpdump is the example here) and append-file sources (the system log is the example here). Append-file sources are further divided into readable files (for example, Linux logs) and unreadable files (for example, Windows logs).
Data Source processing is generally divided into two parts: rsync server installation configuration and necessary file transfer.
- Server Configuration of rsync
Rsync servers run on either Windows or Linux. For installation and configuration on Windows, see Appendix 1. For configuration on Linux, see Appendix 2.
- File Transfer
When a file cannot be read directly by users because of permission restrictions (such as Windows logs), it must first be copied to a path that users can read; copy.bat (Appendix 3) does this. When files cannot be backed up directly because they are still being read or written (for example, tcpdump traffic files), mv.sh (Appendix 4) is used to move the completed files elsewhere first.
- Transfer Station handling
Processing at the transfer station (rsync client) has two parts: backing up files to the local machine and uploading the local backup to the destination (Hadoop HDFS). Uploads are handled differently for the two file types: files from an append-directory source can be uploaded directly; for append-file sources, because HDFS has no update function, the expired file must be deleted before the new file is uploaded.
- Client configuration of rsync
The rsync client runs on Linux, where rsync generally does not need to be installed separately. Once the rsync service on the server is running, run the rsync command on the client to back up the files. For details, see Appendix 5.
- Upload files to the destination
The existing file must be deleted before the new one is uploaded. For details, see Appendix 6.
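The delete-then-upload step can be sketched as a small shell function. This is only an illustration under stated assumptions: the function name upload_replace and the HADOOP override variable are made up here, and the -f flag of "hadoop fs -rm" (which suppresses the error when the target does not exist yet) is assumed to be available in the installed Hadoop version.

```shell
#!/bin/bash
# Sketch: replace a directory on HDFS with a fresh local copy.
# HADOOP can be overridden (for example HADOOP="echo hadoop") for a dry run.
HADOOP=${HADOOP:-hadoop}

# upload_replace LOCAL_DIR HDFS_PARENT   (hypothetical helper)
# Removes HDFS_PARENT/<basename of LOCAL_DIR>, then uploads LOCAL_DIR.
upload_replace() {
    local local_dir=$1 hdfs_parent=$2
    local name
    name=$(basename "$local_dir")
    # HDFS cannot update a file in place, so remove the expired copy first.
    $HADOOP fs -rm -r -f "${hdfs_parent%/}/$name"
    # Upload the new copy into the parent directory.
    $HADOOP fs -put "$local_dir" "${hdfs_parent%/}/"
}

# Example call (paths taken from this article):
# upload_replace /home/wx/desktop/winevt hdfs://master:8020/
```

Setting HADOOP="echo hadoop" before calling the function prints the two commands instead of running them, which is a convenient way to check the paths first.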
- Destination Processing
The destination is the Hadoop HDFS file system. It runs on Linux, and the Hadoop service must be started.
- Configuration item
Collecting binary log files with rsync requires several configuration items. On the data source, the main item is the storage path of the log files. The transfer station does two jobs, backing up to the local machine and uploading to HDFS, so its configuration items are the local directory where the server's logs are backed up and the HDFS upload address; the execution cycle and time of each job must also be configured. The destination only needs Hadoop installed and running, so it has no configuration items.
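As an illustration, the transfer station's configuration items could be gathered in one shell settings file. This is a sketch, not part of the original setup: the variable names and the check_config helper are assumptions, while the values are the ones used in the appendices of this article.

```shell
#!/bin/bash
# Hypothetical settings for the transfer station (all names are assumptions).
SRC_MODULE="Test"                      # rsync module exported by the data source
BACKUP_DIR="/home/wx/desktop/winevt/"  # local directory that receives the backup
HDFS_URL="hdfs://master:8020/"         # upload address on HDFS
SCHEDULE="50 19 * * *"                 # cron schedule: every day at 19:50

# check_config: return non-zero if any required setting is empty.
check_config() {
    local name value missing=0
    for name in SRC_MODULE BACKUP_DIR HDFS_URL SCHEDULE; do
        # Look up the variable whose name is stored in $name.
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "missing setting: $name" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

A backup or upload script could source this file and call check_config before doing any work, so that a missing path is reported instead of silently producing a wrong command.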
- Appendix
- Appendix 1: configure the Windows Server
- Install rsync on Windows
Install rsync for Windows Server
Open the "Start a Unix bash shell" program from the Start menu.
In the terminal that opens (similar to CMD), enter the following command:
$ /bin/activate-user.sh
Enter L
Enter Administrator
Press Enter to finish
- Start opensshd
Open "Control Panel" --> "Administrative Tools" --> "Services":
Find the opensshd service and start it.
- Configure the rsyncd.conf file
Edit C:/installation directory/rsyncd.conf. Comments are placed on their own lines because rsyncd.conf treats a # that follows other content as part of the value.
use chroot = false
strict modes = false
hosts allow = 172.16.28.10
# log storage path
log file = /var/log/rsyncd.log
pid file = /var/run/rsyncd.pid
lock file = /var/run/rsyncd.lock
max connections = 2
# server IP address
address = 172.16.27.234
# server port
port = 873
syslog facility = local3
timeout = 300

# Module Definitions
# Remember cygwin naming conventions: c:\work becomes /cygwin/c/work
# Module name
[Test]
# Files to be backed up; this path refers to C:/copy. Windows log files cannot be
# backed up directly, so they must be copied to the copy folder in advance.
path = /cygdrive/c/copy
comment = This is a Windows test program
ignore errors
read only = yes
list = no
transfer logging = yes
auth users = demo
secrets file = /etc/rsyncd.secrets
- Start rsync
Open "Control Panel" --> "Administrative Tools" --> "Services":
Locate the rsyncserver service and start it.
- Configure the rsyncd.secrets file
Edit the /etc/rsyncd.secrets file with the following content:
username:password
So far, the Windows server has been configured.
- Appendix 2: configure the Linux Server
- Configure the rsyncd.conf file
Create the /etc/rsyncd.conf file. Its content is similar to the rsyncd.conf configuration file on Windows.
- Configure the rsyncd.secrets file
As on Windows, create a file in the format username:password. Note that on Linux the file's permissions must be set to 600; otherwise the backup fails. For security reasons, only the owner of the file may read and write it, and other users have no permissions.
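Creating the secrets file with the required permissions might look like the following sketch. The helper name write_secrets is an assumption, and the password shown in the example call is a placeholder, not a value from this setup.

```shell
#!/bin/bash
# write_secrets FILE USER PASSWORD   (hypothetical helper)
# Writes one "user:password" line and restricts the file to mode 600.
write_secrets() {
    local file=$1 user=$2 password=$3
    # Use a restrictive umask so the file is never readable by other users,
    # not even briefly between creation and chmod.
    ( umask 077 && printf '%s:%s\n' "$user" "$password" > "$file" )
    # Owner may read and write; group and others get no permissions.
    chmod 600 "$file"
}

# Example call (placeholder password):
# write_secrets /etc/rsyncd.secrets demo 'change-me'
```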
- Start rsync
Run:
# /usr/bin/rsync --daemon --config=/etc/rsyncd.conf
The --config option specifies the location of rsyncd.conf; it can be omitted if the file is in /etc.
So far, the Linux server configuration is complete.
- Appendix 3: copy.bat
copy.bat copies the unreadable Windows logs to the copy folder, where they can be read directly. Add the task to Task Scheduler in Control Panel so that it runs automatically at regular intervals. The content of copy.bat is:
xcopy "C:\windows\system32\winevt\logs" "C:\copy" /e /i /d /h /r /y
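- Appendix 4: Overview of mv.sh
This script moves collected tcpdump files out of the directory that tcpdump writes to. Usage:
./mv.sh /source/path /target/path
The code is as follows:
#!/bin/bash
# Filename: scan_mv.sh
# Every second, move all files except the newest one (which may still be
# being written) from the source directory ($1) to the target directory ($2).
while true
do
    # List the source directory, newest file first.
    all_file=$(ls -t "$1")
    count=1
    for i in $all_file
    do
        if [ "$count" -eq 1 ]; then
            # Skip the newest file.
            count=2
        else
            new_file="$1/$i"
            mv "$new_file" "$2"
        fi
    done
    sleep 1
done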
- Appendix 5: configure the Linux Client
- Configure the rsyncd.pas file
The path is user-defined; /etc/rsyncd.pas is used here as an example. The file contains only the user's password, which must match the password in rsyncd.secrets.
- Run rsync
Use rsync to back up the Windows logs.
Take this command as an example, where USER is the rsync user (demo in the server configuration) and HOST is the server's IP address:
rsync -av --password-file=/etc/rsyncd.pas USER@HOST::Test /home/wx/desktop/winevt/
To run the command automatically, add it to the /etc/crontab file in the following format so that it runs on a regular schedule:
50 19 * * * root rsync -av --password-file=/etc/rsyncd.pas USER@HOST::Test /home/wx/desktop/winevt/
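For readers unfamiliar with the format, an /etc/crontab entry has five time fields (minute, hour, day of month, month, day of week), followed by the user to run as and then the command, so the entry above runs every day at 19:50 as root. The sketch below builds such a daily line; the function name crontab_line is made up for illustration.

```shell
#!/bin/bash
# crontab_line MINUTE HOUR USER COMMAND...   (hypothetical helper)
# Prints an /etc/crontab entry that runs COMMAND every day at HOUR:MINUTE.
crontab_line() {
    local minute=$1 hour=$2 user=$3
    shift 3
    # The three asterisks mean: every day of month, every month, every weekday.
    printf '%s %s * * * %s %s\n' "$minute" "$hour" "$user" "$*"
}

# Example: print the upload entry used in this article.
# crontab_line 54 19 root hadoop fs -put /home/wx/desktop/winevt hdfs://master:8020/
```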
- Rsync command
The rsync command can take the following forms:
rsync [OPTION]... SRC DEST
rsync [OPTION]... SRC [USER@]HOST:DEST
rsync [OPTION]... [USER@]HOST:SRC DEST
rsync [OPTION]... [USER@]HOST::SRC DEST
rsync [OPTION]... SRC [USER@]HOST::DEST
rsync [OPTION]... rsync://[USER@]HOST[:PORT]/SRC [DEST]
These six command forms correspond to six working modes of rsync:
1) Copy local files. This mode is used when neither the SRC nor the DEST path contains a colon ":" separator.
2) Use a remote shell program (such as RSH or SSH) to copy content from the local machine to the remote machine. This mode is used when the DEST path contains a single colon ":" separator.
3) Use a remote shell program (such as RSH or SSH) to copy content from the remote machine to the local machine. This mode is used when the SRC path contains a single colon ":" separator.
4) Copy files from the remote rsync server to the local machine. This mode is used when the SRC path contains the double colon "::" separator.
5) Copy files from the local machine to the remote rsync server. This mode is used when the DEST path contains the double colon "::" separator.
6) List the files on the remote machine. This is similar to an rsync transfer, except that the local destination is omitted from the command.
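The way the colon separators select a mode can be illustrated with a small shell function. This is a simplified sketch of the rules listed above, not rsync's actual argument parsing (which has additional cases, such as colons inside local path names); the function name rsync_mode is made up for illustration.

```shell
#!/bin/bash
# rsync_mode SRC [DEST]   (hypothetical helper)
# Prints which of the six working modes the arguments would select.
rsync_mode() {
    local src=$1 dest=${2-}
    if [ -z "$dest" ]; then
        echo "6: list files (no destination given)"
        return
    fi
    # Check the source first; "::" and rsync:// must be tested before ":".
    case $src in
        rsync://*|*::*) echo "4: pull from rsync daemon"; return ;;
        *:*)            echo "3: pull via remote shell"; return ;;
    esac
    case $dest in
        rsync://*|*::*) echo "5: push to rsync daemon" ;;
        *:*)            echo "2: push via remote shell" ;;
        *)              echo "1: local copy" ;;
    esac
}

# Example:
# rsync_mode user@host::Test /home/wx/desktop/winevt/
```

The backup command used in this article, with a "::" in the source, falls under mode 4.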
- Parameter description
-v, --verbose: verbose output
-q, --quiet: quiet output
-c, --checksum: turn on checksum verification, forcing files to be checked by checksum
-a, --archive: archive mode; transfer files recursively and preserve all file attributes, equivalent to -rlptgoD
-r, --recursive: recurse into subdirectories
-R, --relative: use relative path information
-b, --backup: make backups; if a file with the same name already exists at the destination, the old file is renamed with a ~ suffix (filename~). Use the --suffix option to specify a different backup suffix.
--backup-dir=DIR: store backup files in the directory DIR
--suffix=SUFFIX: define the backup file suffix
-u, --update: update only; skip files that already exist in DST and are newer than the source file (newer files are not overwritten)
-l, --links: preserve symbolic links
-L, --copy-links: treat symbolic links like regular files
--copy-unsafe-links: copy only links that point outside the SRC directory tree
--safe-links: ignore links that point outside the SRC directory tree
-H, --hard-links: preserve hard links
-p, --perms: preserve file permissions
-o, --owner: preserve file owner information
-g, --group: preserve file group information
-D, --devices: preserve device file information
-t, --times: preserve file time information
-S, --sparse: handle sparse files specially to save space in DST
-n, --dry-run: show which files would be transferred, without transferring them
-W, --whole-file: copy whole files without incremental (delta) detection
-x, --one-file-system: do not cross file system boundaries
-B, --block-size=SIZE: block size used by the checksum algorithm; the default is 700 bytes
-e, --rsh=COMMAND: specify the remote shell (such as RSH or SSH) used for data synchronization
--rsync-path=PATH: specify the path of the rsync command on the remote server
-C, --cvs-exclude: automatically ignore files in the same way CVS does, to exclude files that should not be transferred
--existing: update only files that already exist in DST; do not back up new files
--delete: delete files in DST that do not exist in SRC
--delete-excluded: also delete files on the receiving side that are excluded by the exclude rules
--delete-after: delete after the transfer finishes
--ignore-errors: delete even when I/O errors occur
--max-delete=NUM: delete at most NUM files
--partial: keep files that were not completely transferred for any reason, to speed up subsequent retransmission
--force: force deletion of directories even if they are not empty
--numeric-ids: do not map numeric user and group IDs to user names and group names
--timeout=TIME: I/O timeout, in seconds
-I, --ignore-times: do not skip files that have the same time and size
--size-only: when deciding whether to back up a file, check only the file size, not the file time
--modify-window=NUM: timestamp window within which file times are considered equal; the default is 0
-T, --temp-dir=DIR: create temporary files in DIR
--compare-dest=DIR: also compare against files in DIR to decide whether a file needs to be backed up
-P: equivalent to --partial --progress
--progress: display the progress of the backup
-z, --compress: compress file data during transfer
--exclude=PATTERN: exclude files matching PATTERN from the transfer
--include=PATTERN: do not exclude files matching PATTERN
--exclude-from=FILE: exclude files matching the patterns listed in FILE
--include-from=FILE: do not exclude files matching the patterns listed in FILE
--version: print version information
--address: bind to a specific address
--config=FILE: specify a different configuration file instead of the default rsyncd.conf
--port=PORT: specify a different rsync service port
--blocking-io: use blocking I/O for the remote shell
--stats: report transfer statistics for some files
--progress: show the actual transfer progress during transmission
--log-format=FORMAT: specify the log file format
--password-file=FILE: read the password from FILE
--bwlimit=KBPS: limit I/O bandwidth, in KBytes per second
-h, --help: display help information
- Appendix 6: Upload to destination
hadoop fs -put /home/wx/desktop/winevt hdfs://master:8020/
hadoop fs -ls hdfs://master:8020/winevt
To delete the existing copy first, run
hadoop fs -rm -r hdfs://master:8020/winevt
To run the commands automatically, add them to the /etc/crontab file in the following format so that they run on a regular schedule:
52 19 * * * root hadoop fs -rm -r hdfs://master:8020/winevt
54 19 * * * root hadoop fs -put /home/wx/desktop/winevt hdfs://master:8020/
- Example: backing up tcpdump files
- Save the tcpdump traffic files
Save the traffic files generated by tcpdump in a specified folder:
Switch to the copy folder and run the command tcpdump -C 10 -w tcpdump to save the traffic files, named tcpdump, in the copy folder. Then run the shell script mv.sh to move the completed files to the tcpdump folder (./mv.sh /home/wx/desktop/copy/ /home/wx/desktop/tcpdump).
- Upload the tcpdump files to HDFS
Add the command to the /etc/crontab file in the following format so that it runs automatically on a regular schedule:
54 19 * * * root hadoop fs -put /home/wx/desktop/tcpdump hdfs://master:8020/