This article describes a PHP implementation of a log collection system. It may serve as a useful reference for anyone with similar needs.
A recent project required collecting logs from remote servers. To avoid expanding our technology stack, we implemented the collector in PHP.
Notes from the implementation process are recorded below:
1. Active collection. Because we have many servers, an architecture such as Flume would require installing software on every machine, which adds operational cost. Instead, the collector actively pulls the logs, so nothing needs to be installed on the producers (the servers).
2. SSH connection. Each server is configured to allow SSH access, and the PHP ssh2 extension lets the collector connect remotely and read server content.
3. Unified log layout. Log files on every server follow the same directory convention, which simplifies the program logic.
4. CLI operation. The collector is a long-running program, so it runs in CLI mode; be aware that the CLI SAPI may load a different php.ini than the web SAPI.
5. SSH connection failures. Network problems occasionally cause the SSH connection or authentication to fail; in that case, wait and retry.
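The delayed retry described above can be sketched as a small generic helper. The `retry()` function and the hostname/credentials in the commented usage are hypothetical; the actual ssh2 calls require the ext-ssh2 extension.

```php
<?php
// Hypothetical retry helper: call $fn up to $maxAttempts times,
// sleeping $delaySec between failed attempts (signaled by exceptions).
function retry(callable $fn, int $maxAttempts = 3, int $delaySec = 5) {
    for ($i = 1; ; $i++) {
        try {
            return $fn();
        } catch (Exception $e) {
            if ($i >= $maxAttempts) {
                throw $e;            // give up after the last attempt
            }
            sleep($delaySec);        // back off before retrying
        }
    }
}

// Sketch of an SSH connection guarded by the helper (placeholder host/credentials):
// $session = retry(function () {
//     $conn = ssh2_connect('log-server.example', 22);
//     if ($conn === false || !ssh2_auth_password($conn, 'collector', 'secret')) {
//         throw new Exception('SSH connect/auth failed');
//     }
//     return $conn;
// });
```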
6. Log rotation and compression. Operations typically truncates and compresses the logs at a fixed time each day, so there are two kinds of files to read: compressed and uncompressed logs, each handled separately.
7. Timestamps in the log. A one-second timestamp is not precise enough to distinguish requests, so we add the millisecond field $msec; within the same millisecond, entries from the same source IP with the same UA can be treated as one request.
8. Reading the directory. The ssh2 extension registers a stream wrapper, so the remote directory can be listed with opendir()/readdir() on an "ssh2.sftp://..." path. After filtering out unwanted files, sort by file creation time and process them one by one.
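A minimal sketch of the listing step, assuming the filenames end in `.log` or `.gz`. In production the directory argument would be the remote "ssh2.sftp://..." path; the function itself works on any path the stream layer can open.

```php
<?php
// List log files in a directory, skipping dot entries and anything that is not
// a .log or .gz file, then sort by creation time (filectime).
function listLogFiles(string $dir): array {
    $files = [];
    $dh = opendir($dir);
    while (($entry = readdir($dh)) !== false) {
        if ($entry === '.' || $entry === '..') {
            continue;
        }
        if (!preg_match('/\.(log|gz)$/', $entry)) {
            continue;               // filter out unwanted files
        }
        $files[] = $dir . '/' . $entry;
    }
    closedir($dh);
    usort($files, function ($a, $b) {
        return filectime($a) <=> filectime($b);   // oldest first
    });
    return $files;
}
```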
9. Reading compressed files. file_get_contents() can leave the interface unresponsive for a long time, so I use fopen() and fread() to read in stages, 8 KB at a time (no larger), printing a progress indicator every so many reads.
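The staged read can be sketched as below. The chunk size and the progress interval are parameters here; the path can be local or any stream-wrapper URL such as "ssh2.sftp://...".

```php
<?php
// Read a file in 8 KB chunks instead of one file_get_contents() call,
// printing a progress marker to STDERR every $reportEvery chunks.
function readFileChunked(string $path, int $chunkSize = 8192, int $reportEvery = 128): string {
    $fp = fopen($path, 'rb');
    $data = '';
    $chunks = 0;
    while (!feof($fp)) {
        $data .= fread($fp, $chunkSize);
        if (++$chunks % $reportEvery === 0) {
            fwrite(STDERR, '.');    // progress indicator, one dot per N chunks
        }
    }
    fclose($fp);
    return $data;
}
```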
10. Caching compressed files. After a successful read, save the file to a cache directory for backup and reuse. If the program crashes or is rerun, check the cache directory first; if a cached copy exists, skip the network read.
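A cache-first fetch might look like the following sketch. The function name and the md5-based cache filename are assumptions; in production `$remotePath` would be an "ssh2.sftp://..." URL and the network read would use the chunked approach described above.

```php
<?php
// Cache-first fetch: reuse a local cached copy if one exists; otherwise read
// the remote file and save it to the cache directory for the next run.
function fetchWithCache(string $remotePath, string $cacheDir): string {
    $cacheFile = $cacheDir . '/' . md5($remotePath);
    if (is_file($cacheFile)) {
        return file_get_contents($cacheFile);   // rerun: skip the network read
    }
    $data = file_get_contents($remotePath);     // in practice: chunked fread()
    file_put_contents($cacheFile, $data);       // keep a backup for next time
    return $data;
}
```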
11. Decompression. gzdecode() works, but it can exhaust PHP's memory on large files; adjust php.ini and raise memory_limit.
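For illustration, gzdecode() round-trips a gzip-compressed string; the `ini_set()` call shows raising the limit at runtime as an alternative to editing php.ini (the 1024M value is an arbitrary example):

```php
<?php
// gzdecode() inflates the whole buffer in memory; for large archives raise the
// limit first (or set memory_limit in php.ini for the CLI SAPI).
ini_set('memory_limit', '1024M');

$compressed = gzencode("127.0.0.1 \"GET / HTTP/1.1\" 200\n");
$plain = gzdecode($compressed);     // original log text restored
```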
12. Recording completed compressed files. Once a compressed file has been fully processed, record it in the database so that later runs of the program do not process it again.
13. Processing uncompressed logs. An uncompressed log is still growing, so no caching is needed. Record in the database the current file pointer (via ftell()/fseek()) and the file's creation date.
14. Detecting rotation of uncompressed logs. If the file's date differs from the recorded date, or the file is smaller than the recorded size, the file has been rotated and the pointer must be reset.
Otherwise, seek directly (fseek()) to the last processed position and continue from there.
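The resume decision in points 13-14 can be isolated into one small function. The `$state` array layout (`date`, `size`, `offset`) is an assumption standing in for whatever the database row holds:

```php
<?php
// Decide where to resume in a growing log file. $state is what the previous
// run saved: the file's creation date, its size, and the ftell() offset.
// A different date or a shrunken file means the log was rotated.
function resumeOffset(array $state, string $fileDate, int $fileSize): int {
    if ($state['date'] !== $fileDate || $fileSize < $state['size']) {
        return 0;                   // rotated/truncated: reset the pointer
    }
    return $state['offset'];        // unchanged: continue where we left off
}

// Usage sketch against a real handle:
// $fp = fopen($path, 'rb');
// fseek($fp, resumeOffset($state, $fileDate, filesize($path)));
// ... read the new lines ...
// $state = ['date' => $fileDate, 'size' => filesize($path), 'offset' => ftell($fp)];
```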
15. Splitting log lines. A regular expression that splits on spaces and delimiters is enough; a third-party library such as LogParser can also be used. To save memory, use an iterator that returns one line at a time.
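A sketch of both ideas: a generator yields one parsed line at a time so the whole log never sits in memory as an array, and a regex keeps bracketed and quoted fields intact. The regex assumes a combined-log-style format and is illustrative only.

```php
<?php
// Yield the fields of each log line lazily. The pattern matches, in order:
// a [bracketed] field, a "quoted" field, or a run of non-space characters.
function parseLines(string $text): Generator {
    foreach (explode("\n", $text) as $line) {
        if ($line === '') {
            continue;
        }
        preg_match_all('/\[[^\]]*\]|"[^"]*"|\S+/', $line, $m);
        yield $m[0];                // one array of fields per line
    }
}
```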
16. Deduplication. Before collecting, read back each server's last recorded log timestamp (in milliseconds), IP, and UA.
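Combining this checkpoint with the millisecond rule from point 7, a duplicate check might look like the following sketch. The function name and the `$last` array layout are assumptions:

```php
<?php
// Hypothetical duplicate check against the last record stored per server:
// anything older than the checkpoint was already collected, and at the exact
// same millisecond the same IP + UA is treated as the same request.
function isDuplicate(array $last, float $msecTs, string $ip, string $ua): bool {
    if ($msecTs < $last['ts']) {
        return true;                // strictly older: already collected
    }
    if ($msecTs === $last['ts']) {
        return $ip === $last['ip'] && $ua === $last['ua'];
    }
    return false;                   // newer than the checkpoint: keep it
}
```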
17. Saving the logs. I used MySQL. Executing one INSERT per log line wastes a great deal of run time; instead, accumulate around 4,000 rows and insert them in a single batch.
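The batching can be sketched with a small buffer class. The `BatchWriter` name and the callback-based flush are assumptions; in production the callback would build one multi-row INSERT (e.g. with PDO), and the 4,000-row batch size follows the article.

```php
<?php
// Buffer rows and flush them in batches instead of one INSERT per line.
// $flush receives the accumulated rows whenever the batch fills up.
class BatchWriter {
    private array $rows = [];
    public function __construct(private $flush, private int $batchSize = 4000) {}

    public function add(array $row): void {
        $this->rows[] = $row;
        if (count($this->rows) >= $this->batchSize) {
            $this->finish();
        }
    }

    public function finish(): void {    // also call once at the end of a run
        if ($this->rows !== []) {
            ($this->flush)($this->rows);
            $this->rows = [];
        }
    }
}
```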
18. Error handling. Besides SSH connection failures, the collector can read a half-written log line, which makes the field split fail; throw an exception at that point, let the main program catch it, and run again.