Building enterprise security: an open source SIEM platform. SIEM (security information and event management), as the name implies, is a management system for security information and events, and for most enterprises it is not a cheap system. Drawing on the author's experience, this article introduces how to use open source software to analyze data offline and identify attacks through attack modeling.
System architecture review
Taking the database as an example: logstash collects the mysql query log and backs it up to an hdfs cluster in near real time, and hadoop scripts analyze the log offline to identify attacks.
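The data path described in this article can be sketched as:
mysql query log -> logstash -> kafka -> hdfs -> hadoop offline analysis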
Database log collection. There are three common ways to collect database logs:
Mirroring: most database auditing products support this mode. They analyze mirrored database traffic, decode the database protocol, identify SQL statements, and extract SQL logs.
Agent: the typical approach is a db-proxy; Baidu, Sohu, Meituan, Jingdong and others have related open source products. The front end accesses the real database servers through the db-proxy, so SQL logs can be collected directly in the db-proxy.
Client: install a client on the database server to collect SQL logs; the typical way is to collect them with logstash. This article explains the client approach; the other approaches are essentially similar.
logstash installation and configuration
Download logstash from https://www.elastic.co/downloads/logstash (the latest version at the time of writing is 5.2.1).
Enable the mysql query log
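The original does not show the mysql settings, so as a minimal sketch, the general query log can be enabled in my.cnf (the file path matches the logstash input below; mysql needs a restart to pick up these settings):

[mysqld]
general_log      = 1
general_log_file = /var/log/mysql/mysql.log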
Configure logstash
input {
  file {
    type => "mysql_sql_file"
    path => "/var/log/mysql/mysql.log"
    start_position => "beginning"
    sincedb_path => "/dev/null"
  }
}
output {
  kafka {
    # logstash 5.x kafka output options; older 1.x releases used
    # broker_list and compression_codec for the same settings.
    bootstrap_servers => "localhost:9092"
    topic_id => "test"
    compression_type => "snappy"  # one of ["none", "gzip", "snappy", "lz4"], default: "none"
  }
}
Run logstash
bin/logstash -f mysql.conf
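To verify that events are reaching kafka, you can consume the topic with kafka's console consumer from the kafka install directory (older kafka releases use --zookeeper localhost:2181 instead of --bootstrap-server):

bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic test --from-beginning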
Log example
2017-02-16T23:29:00.813Z localhost 170216 19:10:15   37 Connect   debian-sys-maint@localhost on
2017-02-16T23:29:00.813Z localhost 37 Quit
2017-02-16T23:29:00.813Z localhost 38 Connect   debian-sys-maint@localhost on
2017-02-16T23:29:00.813Z localhost 38 Query     SHOW VARIABLES LIKE 'pid_file'
The simplest approach is not to split fields at all. If you want to automatically cut out fields such as the database name and time, refer to the links below (a grok sketch follows them):
grok syntax
https://github.com/elastic/logstash/blob/v1.4.2/patterns/grok-patterns
grok syntax debugging
http://grokdebug.herokuapp.com/
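As an illustration, a minimal grok filter for query-log lines that carry a timestamp might look like the following; the pattern is an assumption and should be validated against your own log format with the debugger above:

filter {
  if [type] == "mysql_sql_file" {
    grok {
      # Sketch pattern for lines such as "170216 19:10:15   37 Query  SELECT ...".
      # Continuation lines without a timestamp need a second pattern
      # or a multiline codec.
      match => { "message" => "%{NUMBER:log_date} %{TIME:log_time}\s+%{NUMBER:thread_id} %{WORD:command}\s*%{GREEDYDATA:argument}" }
    }
  }
}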
Common attack signatures
Use wavsep to build a test range environment; for setup details see my other article, "WAVSEP-based shooting range build guide".
Scan the range's links with a SQL injection scanner.
Analyze the attack signatures. Two are listed below; more can be summarized from your own scan logs. Signature one:
2017-02-16T23:29:00.993Z localhost 170216 19:19:12 46 Query SELECT username, password FROM users WHERE username='textvalue' UNION ALL SELECT NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL#' AND password='textvalue2'
Enumerating data with union queries produces long runs of NULL fields. Signatures two and three: enumerating the database structure uses INFORMATION_SCHEMA, and some scanners additionally use GROUP BY x) a)
2017-02-16T23:29:00.998Z localhost 46 Query SELECT username, password FROM users WHERE username='textvalue' AND (SELECT 7473 FROM (SELECT COUNT(*), CONCAT(0x7171716271, (SELECT (CASE WHEN (8199=8199) THEN 1 ELSE 0 END)), 0x717a627871, FLOOR(RAND(0)*2)) x FROM INFORMATION_SCHEMA.PLUGINS GROUP BY x) a)-- LFpQ' AND password='textvalue2'
hadoop offline processing
hadoop is based on the map/reduce model. Simplified, it can be understood as:
cat data.txt | ./map | ./reduce
In the simplest case, we only need to develop the map program, processing the log data line by line in the map and matching attack behavior.
The script below is developed in Perl; a python version would be similar.
#!/usr/bin/perl -w
# Map script: match attack signatures line by line.
# The rule covers the signatures above: runs of NULL columns,
# INFORMATION_SCHEMA enumeration, and the "GROUP BY x) a)" pattern.
my $rule = "(null,){3,}|information_schema|GROUP BY x\\) a\\)";
my $line = "";
while ($line = <STDIN>)
{
    if ($line =~ /$rule/i)
    {
        print($line);
    }
}
Run it in hadoop.
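A typical hadoop streaming invocation might look like the following; the streaming jar location and the HDFS input/output paths are assumptions for illustration:

hadoop jar $HADOOP_HOME/share/hadoop/tools/lib/hadoop-streaming-*.jar \
    -input /logs/mysql/ \
    -output /logs/mysql_alerts \
    -mapper map.pl \
    -reducer NONE \
    -file map.pl

-reducer NONE runs a map-only job, matching the map-only detection above.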
Production environment
The rules in a production environment are far more complex than this and need continual additions; the above is only an example;
Writing only a map produces many duplicate alarms; a reduce is needed for aggregation (a minimal sketch follows these notes);
Incident response needs to know which database a SQL injection hit and which account was used; this requires adding field cutting in logstash (see the grok sketch above);
Ideally, incident response can trace a SQL injection back to the corresponding web link; this requires correlating the web accesslog with the SQL log. A more mature approach is based on machine learning, learning a time-based correlation matrix;
Collecting SQL directly with a client requires mysql to enable the query log, which noticeably hurts server performance. The large companies I know of mostly use db-proxy based access, and collecting in the db-proxy is recommended;
Rule-based SQL injection detection has its bottlenecks; although it is a step forward relative to detection at the web log and traffic layers, SQL semantic analysis is the inevitable next step.
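As noted above, a reduce is needed to aggregate duplicate alarms. A minimal sketch in the same Perl style (it only counts identical alert lines; a production reduce would group by normalized SQL statement or source):

#!/usr/bin/perl -w
# Reduce script: aggregate duplicate alert lines emitted by the map.
# Hadoop streaming delivers the map output sorted, so identical lines
# arrive together; a hash keeps the sketch simple either way.
my %count;
while (my $line = <STDIN>)
{
    chomp $line;
    $count{$line}++;
}
foreach my $alert (sort keys %count)
{
    print "$count{$alert}\t$alert\n";
}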