Enterprise security construction: building an open source SIEM platform (final part)

Source: Internet
Author: User

SIEM (security information and event management), as the name suggests, is a system for managing security information and events. For most businesses it is not a cheap security system. Drawing on the author's experience, this article describes how to use open source software to analyze data offline and use algorithms to mine unknown attacks.

Review of the system architecture


Taking web server logs as an example: logstash collects the web server's access logs and backs them up to an HDFS cluster in near real time, and Hadoop scripts then analyze them offline for attacks.
Custom log format
Enable httpd's custom log format so that the User-Agent and Referer are recorded.
<IfModule logio_module>
    # You need to enable mod_logio.c to use %I and %O
    LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\" %I %O" combinedio
</IfModule>
CustomLog "logs/access_log" combined
Log example
"GET /wordpress/ HTTP/1.1" 200 17443 "http://180.76.190.79:80/" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.21 (KHTML, like Gecko) Chrome/41.0.2228.0 Safari/537.21"
"GET /wordpress/wp-json/ HTTP/1.1" 200 51789 "-" "print env"
180.76.152.166 - - [26/Feb/2017:13:12:38 +0800] "GET /wordpress/wp-admin/load-styles.php?c=0&dir=ltr&load[]=dashicons,buttons,forms,l10n,login&ver=Li4vLi4vLi4vLi4vLi4vLi4vLi4vLi4vLi4vLi4vZXRjL3Bhc3N3ZAAucG5n HTTP/1.1" 200 35841 "http://180.76.190.79:80/" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.21 (KHTML, like Gecko) Chrome/41.0.2228.0 Safari/537.21"
"GET /wordpress/ HTTP/1.1" 200 17442 "http://180.76.190.79:80/" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.21 (KHTML, like Gecko) Chrome/41.0.2228.0 Safari/537.21"
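To make the fields of this log format concrete, here is a minimal Python sketch that splits one "combined" access-log line into named fields. The names COMBINED_RE and parse_line are illustrative and not part of the original toolchain; the sample line is taken from the 1.php access log shown further down in this article.

```python
import re

# Apache "combined" format:
# host ident user [time] "request" status bytes "referer" "user-agent"
COMBINED_RE = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_line(line):
    """Return a dict of fields for one access-log line, or None if it does not match."""
    match = COMBINED_RE.match(line)
    if match is None:
        return None
    fields = match.groupdict()
    fields["status"] = int(fields["status"])
    return fields


sample = (
    '125.33.206.140 - - [26/Feb/2017:13:09:47 +0800] '
    '"GET /wordpress/wp-admin/1.php HTTP/1.1" 200 17 "-" '
    '"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_3) AppleWebKit/537.36 '
    '(KHTML, like Gecko) Chrome/50.0.2661.102 Safari/537.36"'
)
entry = parse_line(sample)
print(entry["path"], entry["status"], entry["referer"])
```

The request path and the Referer are the two fields the offline analysis below relies on.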
In the test environment, add a test file, 1.php, under the wordpress directory; its content is simply phpinfo.


Access log for 1.php
[root@instance-8lp4smgv logs]# cat access_log | grep 'wp-admin/1.php'
125.33.206.140 - - [26/Feb/2017:13:09:47 +0800] "GET /wordpress/wp-admin/1.php HTTP/1.1" 200 17 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.102 Safari/537.36"
125.33.206.140 - - [26/Feb/2017:13:11:19 +0800] "GET /wordpress/wp-admin/1.php HTTP/1.1" 200 17 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.102 Safari/537.36"
125.33.206.140 - - [26/Feb/2017:13:13:44 +0800] "GET /wordpress/wp-admin/1.php HTTP/1.1" 200 17 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.102 Safari/537.36"
127.0.0.1 - - [26/Feb/2017:13:14:19 +0800] "GET /wordpress/wp-admin/1.php HTTP/1.1" 200 17 "-" "curl/7.19.7 (x86_64-redhat-linux-gnu) libcurl/7.19.7 NSS/3.14.0.0 zlib/1.2.3 libidn/1.18 libssh2/1.4.2"
127.0.0.1 - - [26/Feb/2017:13:16:04 +0800] "GET /wordpress/wp-admin/1.php HTTP/1.1" 200 107519 "-" "curl/7.19.7 (x86_64-redhat-linux-gnu) libcurl/7.19.7 NSS/3.14.0.0 zlib/1.2.3 libidn/1.18 libssh2/1.4.2"
125.33.206.140 - - [26/Feb/2017:13:16:12 +0800] "GET /wordpress/wp-admin/1.php HTTP/1.1" 200 27499 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.102 Safari/537.36"
[root@instance-8lp4smgv logs]#
Hadoop offline processing
Hadoop is based on the map/reduce model.
Mapper script
localhost:work maidou$ cat mapper-graph.pl
#!/usr/bin/perl -w
# Sample input line:
# "GET /wordpress/ HTTP/1.1" 200 17443 "http://180.76.190.79:80/" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.21 (KHTML, like Gecko) Chrome/41.0.2228.0 Safari/537.21"
my $line = "";
while ($line = <STDIN>)
{
    # Match successful (2xx) GET requests; capture the request path and the Referer
    if ($line =~ /"GET (\S+) HTTP\/1.[01]" 2\d+ \d+ "(\S+)"/)
    {
        my $path = $1;
        my $ref = $2;
        # Strip the query string from the path
        if ($path =~ /(\S+)\?(\S+)/)
        {
            $path = $1;
        }
        # Strip the query string from the Referer
        if ($ref =~ /(\S+)\?(\S+)/)
        {
            $ref = $1;
        }
        # Keep only requests referred from this site, or with an empty Referer ("-")
        if (($ref =~ /^http:\/\/180/) || ("-" eq $ref))
        {
            my $line = $ref . "::" . $path . "\n";
            #printf("$ref::$path\n");
            print($line);
        }
    }
}
Reducer script
localhost:work maidou$ cat reducer-graph.pl
#!/usr/bin/perl -w
my %result;
my $line = "";
while ($line = <STDIN>)
{
    if ($line =~ /(\S+)\:\:(\S+)/)
    {
        # De-duplicate: record each "ref::path" pair only once
        unless (exists($result{$line}))
        {
            $result{$line} = 1;
        }
    }
}
foreach my $key (sort keys %result)
{
    if ($key =~ /(\S+)\:\:(\S+)/)
    {
        my $ref = $1;
        my $path = $2;
        # Filter on the webshell file suffixes you care about, commonly php and jsp.
        # Whitelist filtering risks missing files; alternatively, use a blacklist
        # to exclude the file types you want to ignore.
        if ($path =~ /(\.php)$/)
        {
            my $output = $ref . " -> " . $path . "\n";
            print($output);
        }
    }
}
An example of the generated result is:
- -> http://180.76.190.79/wordpress/wp-admin/1.php
- -> http://180.76.190.79/wordpress/wp-admin/admin-ajax.php
- -> http://180.76.190.79/wordpress/wp-admin/customize.php
http://180.76.190.79/wordpress/ -> http://180.76.190.79/wordpress/wp-admin/edit-comments.php
http://180.76.190.79/wordpress/ -> http://180.76.190.79/wordpress/wp-admin/profile.php
http://180.76.190.79/wordpress/ -> http://180.76.190.79/wordpress/wp-login.php
http://180.76.190.79/wordpress/ -> http://180.76.190.79/wordpress/xmlrpc.php
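For experimenting without a Hadoop cluster, the map, sort, and reduce flow above can be approximated locally. The following is a loose Python sketch of the two Perl scripts, not the author's code: the function names are illustrative, the query string is cut at the first "?", and the sample lines are abbreviated from the logs shown earlier (shortened query string and user agents).

```python
import re

# Successful (2xx) GET requests: capture the request path and the Referer
REQUEST_RE = re.compile(r'"GET (\S+) HTTP/1\.[01]" 2\d+ \d+ "(\S+)"')


def strip_query(url):
    """Drop everything from the first '?' onward."""
    return url.split("?", 1)[0]


def map_line(line):
    """Mapper step: return 'referer::path' for a matching line, else None."""
    match = REQUEST_RE.search(line)
    if match is None:
        return None
    path = strip_query(match.group(1))
    ref = strip_query(match.group(2))
    # Keep only referers from this site, or an empty referer ("-")
    if ref.startswith("http://180") or ref == "-":
        return "%s::%s" % (ref, path)
    return None


def reduce_pairs(pairs):
    """Reducer step: de-duplicate pairs and keep only .php targets."""
    edges = []
    for pair in sorted(set(pairs)):
        ref, path = pair.split("::", 1)
        if path.endswith(".php"):
            edges.append("%s -> %s" % (ref, path))
    return edges


# Abbreviated sample lines (assumption: shortened for readability)
log_lines = [
    '127.0.0.1 - - [26/Feb/2017:13:14:19 +0800] '
    '"GET /wordpress/wp-admin/1.php HTTP/1.1" 200 17 "-" "curl/7.19.7"',
    '125.33.206.140 - - [26/Feb/2017:13:09:47 +0800] '
    '"GET /wordpress/wp-admin/1.php HTTP/1.1" 200 17 "-" "Mozilla/5.0"',
    '180.76.152.166 - - [26/Feb/2017:13:12:38 +0800] '
    '"GET /wordpress/wp-admin/load-styles.php?c=0&dir=ltr HTTP/1.1" 200 35841 '
    '"http://180.76.190.79:80/" "Mozilla/5.0"',
]

pairs = [p for p in (map_line(line) for line in log_lines) if p is not None]
edges = reduce_pairs(pairs)
for edge in edges:
    print(edge)
```

The two requests for 1.php collapse into a single edge, which is the de-duplication the reducer performs before the data is loaded into the graph database.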
Graph algorithm
Import the generated data into the graph database neo4j, then look for nodes that match the characteristics of a webshell:
In-degree and out-degree are both 0
In-degree and out-degree are both 1, and the node points to itself
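The two characteristics can be checked on a plain edge list before any graph database is involved. The sketch below is a minimal illustration under stated assumptions: the function name is made up, an empty referer ("-") is treated as "no inbound link" rather than as a node of its own, and the last edge uses a hypothetical self-referencing URL that does not come from the logs.

```python
from collections import defaultdict


def find_webshell_candidates(edges):
    """edges: iterable of (referer, page) pairs; '-' stands for an empty referer."""
    in_degree = defaultdict(int)
    out_degree = defaultdict(int)
    self_loop = set()
    pages = set()
    for ref, page in edges:
        pages.add(page)
        if ref == "-":
            continue  # direct visit: no page links here, so no edge is counted
        pages.add(ref)
        out_degree[ref] += 1
        in_degree[page] += 1
        if ref == page:
            self_loop.add(page)
    candidates = []
    for page in sorted(pages):
        # Characteristic 1: nothing links to the page and it links to nothing
        isolated = in_degree[page] == 0 and out_degree[page] == 0
        # Characteristic 2: the only edge is the page pointing to itself
        only_self = (in_degree[page] == 1 and out_degree[page] == 1
                     and page in self_loop)
        if isolated or only_self:
            candidates.append(page)
    return candidates


edges = [
    ("-", "http://180.76.190.79/wordpress/wp-admin/1.php"),
    ("http://180.76.190.79/wordpress/", "http://180.76.190.79/wordpress/wp-login.php"),
    ("http://180.76.190.79/wordpress/", "http://180.76.190.79/wordpress/xmlrpc.php"),
    # Hypothetical page that only ever refers to itself
    ("http://example.invalid/x.php", "http://example.invalid/x.php"),
]
candidates = find_webshell_candidates(edges)
for url in candidates:
    print(url)
```

Legitimate pages that are reached by bookmark or typed URL would also be flagged by this rule, which is why whitelisting is needed in practice.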
neo4j
neo4j is a high-performance NoSQL graph database that stores structured data in a graph (network) rather than in tables. It is receiving increasing attention because it is embeddable, high-performance, and lightweight.


neo4j installation
Download the installation package from https://neo4j.com/ and install it; the default configuration is fine.
Starting neo4j: taking my Mac as an example, it can be started from the GUI. The default username/password is neo4j/neo4j, and the first login requires a password change.


GUI management interface


Python API library installation
sudo pip install neo4j-driver
Download JPype
https://pypi.python.org/pypi/JPype1
Install JPype
tar -zxvf JPype1-0.6.2.tar.gz
cd JPype1-0.6.2
sudo python setup.py install
The code that imports the data into the graph database is as follows:
B0000000B60544:freebuf liu.yan$ cat load-graph.py
import re
from neo4j.v1 import GraphDatabase, basic_auth

nodes = {}
index = 1
driver = GraphDatabase.driver("bolt://localhost:7687", auth=basic_auth("neo4j", "maidou"))
session = driver.session()
file_object = open('r-graph.txt', 'r')
try:
    for line in file_object:
        # Each input line has the form "<referer> -> <page>"
        matchObj = re.match(r'(\S+) -> (\S+)', line, re.M | re.I)
        if matchObj:
            # Note the naming: "path" holds the edge source, "ref" the edge target
            path = matchObj.group(1)
            ref = matchObj.group(2)
            if path in nodes.keys():
                path_node = nodes[path]
            else:
                path_node = "Page%d" % index
                nodes[path] = path_node
                sql = "create (%s:Page {url:\"%s\", id:\"%d\", in:0, out:0})" % (path_node, path, index)
                index = index + 1
                session.run(sql)
                #print sql
            if ref in nodes.keys():
                ref_node = nodes[ref]
            else:
                ref_node = "Page%d" % index
                nodes[ref] = ref_node
                sql = "create (%s:Page {url:\"%s\", id:\"%d\", in:0, out:0})" % (ref_node, ref, index)
                index = index + 1
                session.run(sql)
                #print sql
            # Match the existing nodes by url: variable names do not carry over between
            # queries, so "create (Page1)-[:IN]->(Page2)" would create new, empty nodes
            sql = "match (a:Page {url:\"%s\"}), (b:Page {url:\"%s\"}) create (a)-[:IN]->(b)" % (path, ref)
            session.run(sql)
            #print sql
            sql = "match (n:Page {url:\"%s\"}) SET n.out=n.out+1" % path
            session.run(sql)
            #print sql
            sql = "match (n:Page {url:\"%s\"}) SET n.in=n.in+1" % ref
            session.run(sql)
            #print sql
finally:
    file_object.close()
    session.close()
The generated directed graph is as follows:




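Query for nodes whose in-degree is 1 and out-degree is 0, or nodes whose in-degree and out-degree are both 1 and that point to themselves. Because an empty Referer is also recorded as a "-" node, a page that no other page links to shows an in-degree of 1 and an out-degree of 0.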
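One way to express this query from Python is sketched below. It is an assumption-laden illustration rather than the author's query: it presumes Page nodes carry the in and out counters and that IN relationships connect the Page nodes themselves; SUSPECT_QUERY, find_suspect_pages, and FakeSession are made-up names, and FakeSession only stands in for a real driver session so the sketch runs without a database.

```python
# Cypher for the two patterns; `in` and `out` are the counters kept on each Page node.
# Backticks are used because IN is also a Cypher keyword.
SUSPECT_QUERY = (
    "MATCH (n:Page) "
    "WHERE (n.`in` = 1 AND n.`out` = 0) "
    "OR (n.`in` = 1 AND n.`out` = 1 AND (n)-[:IN]->(n)) "
    "RETURN n.url AS url"
)


def find_suspect_pages(session):
    """Run the query on an open session and return the matching URLs."""
    return [record["url"] for record in session.run(SUSPECT_QUERY)]


class FakeSession(object):
    """Stand-in for a driver session (assumption: used only for a dry run)."""

    def run(self, query):
        # A real session would execute the query; this returns one canned record
        return [{"url": "http://180.76.190.79/wordpress/wp-admin/1.php"}]


suspects = find_suspect_pages(FakeSession())
print(suspects)
```

With a real database, pass the session created by driver.session() instead of FakeSession().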


Optimization points
In actual production use, the false positives we encountered fall into the following categories:
Home pages and various index pages (the first false positive is one of these)
phpmyadmin, zabbix, and other operations management backends
hadoop, elk, and other open source software consoles
API interfaces
These can be handled effectively by whitelisting. More troublesome is the effect of scanners on the results (the second false positive is such a case), which requires removing the interference through scanner fingerprinting or more sophisticated algorithms.
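A whitelist plus a simple scanner fingerprint could be applied before the graph is built. The sketch below is only an illustration: the path patterns and the User-Agent markers are hypothetical examples chosen for this demo and would need to be replaced with values observed in your own environment.

```python
import re

# Hypothetical whitelist of known-good paths (index pages, admin consoles)
WHITELIST_PATTERNS = [
    re.compile(r"/index\.(php|jsp)$"),
    re.compile(r"^/phpmyadmin/"),
    re.compile(r"^/zabbix/"),
]

# Hypothetical User-Agent fragments used as a crude scanner fingerprint
SCANNER_AGENT_MARKERS = ("sqlmap", "nikto", "nmap", "acunetix")


def is_whitelisted(path):
    """True if the path matches any whitelisted pattern."""
    return any(pattern.search(path) for pattern in WHITELIST_PATTERNS)


def is_scanner(user_agent):
    """True if the User-Agent contains a known scanner marker."""
    agent = user_agent.lower()
    return any(marker in agent for marker in SCANNER_AGENT_MARKERS)


def filter_requests(requests):
    """Keep only (path, user_agent) pairs that are neither whitelisted nor scanner traffic."""
    return [
        (path, agent)
        for path, agent in requests
        if not is_whitelisted(path) and not is_scanner(agent)
    ]


requests = [
    ("/wordpress/index.php", "Mozilla/5.0"),
    ("/wordpress/wp-admin/1.php", "sqlmap/1.0"),
    ("/wordpress/wp-admin/1.php", "Mozilla/5.0"),
    ("/phpmyadmin/sql.php", "Mozilla/5.0"),
]
kept = filter_requests(requests)
print(kept)
```

User-Agent matching is easy for a scanner to evade, so it is better treated as a first-pass filter than as a reliable fingerprint.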
Postscript
Using algorithms to mine unknown attack behavior is a very popular research direction at present. This article introduces only one algorithm, chosen because it is relatively easy to understand and implement. The algorithm is not my invention; many security companies have already put it into practice to a greater or lesser extent. Space is limited, so I will introduce other algorithms one by one in later articles of this enterprise security construction series. Algorithms, or machine learning, in essence find statistical regularities in large data sets, so precise alerting is hard to achieve, and at the current stage they still need to be supported by various rules and models. For discovering unknown attacks, however, they are genuinely useful.