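Enterprise Security Construction: Building an Open-Source SIEM Platform. SIEM (security information and event management) is, as the name suggests, a management system for security information and events, and for most enterprises it is not a cheap security system. Drawing on the author's experience, this article describes how to analyze data offline with open-source software and how to use algorithms to uncover unknown attack behavior.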
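System architecture recap
Taking web server logs as an example: logstash collects the web server's access logs and backs them up to an HDFS cluster in near real time, and Hadoop scripts then analyze them offline for attack behavior.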
Custom log format
Enable a custom log format in httpd so that the User-Agent and Referer are recorded
<IfModule logio_module>
# You need to enable mod_logio.c to use %I and %O
LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\" %I %O" combinedio
</IfModule>
CustomLog "logs/access_log" combined
Log examples
180.76.152.166 - - [26/Feb/2017:13:12:37 +0800] "GET /wordpress/ HTTP/1.1" 200 17443 "HTTP://180.76.190.79:80/" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.21 (KHTML, like Gecko) Chrome/41.0.2228.0 Safari/537.21"
180.76.152.166 - - [26/Feb/2017:13:12:37 +0800] "GET /wordpress/wp-json/ HTTP/1.1" 200 51789 "-" "print `env`"
180.76.152.166 - - [26/Feb/2017:13:12:38 +0800] "GET /wordpress/wp-admin/load-styles.php?c=0&dir=ltr&load[]=dashicons,buttons,forms,l10n,login&ver=Li4vLi4vLi4vLi4vLi4vLi4vLi4vLi4vLi4vLi4vZXRjL3Bhc3N3ZAAucG5n HTTP/1.1" 200 35841 "HTTP://180.76.190.79:80/" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.21 (KHTML, like Gecko) Chrome/41.0.2228.0 Safari/537.21"
180.76.152.166 - - [26/Feb/2017:13:12:38 +0800] "GET /wordpress/ HTTP/1.1" 200 17442 "HTTP://180.76.190.79:80/" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.21 (KHTML, like Gecko) Chrome/41.0.2228.0 Safari/537.21"
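To make the log fields easier to work with, the following is a minimal Python sketch that splits a line in the format above into named fields. The regular expression, the function names parse_line and decode_ver, and the idea of Base64-decoding the ver parameter are illustrative assumptions, not part of the original toolchain; the ver value in the third sample line merely looks like Base64 and may be worth decoding during review.

```python
import base64
import re
from urllib.parse import urlsplit, parse_qs

# Assumed pattern for lines in the "combined" layout shown above:
# host, ident, user, time, request, status, size, referer, user-agent.
COMBINED = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<uri>\S+) (?P<proto>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\S+) "(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_line(line):
    """Return a dict of log fields, or None if the line does not match."""
    m = COMBINED.match(line)
    return m.groupdict() if m else None


def decode_ver(uri):
    """Try to Base64-decode the `ver` query parameter; return bytes or None."""
    values = parse_qs(urlsplit(uri).query).get("ver")
    if not values:
        return None
    try:
        return base64.b64decode(values[0], validate=True)
    except ValueError:  # not valid Base64
        return None


sample = ('180.76.152.166 - - [26/Feb/2017:13:12:37 +0800] '
          '"GET /wordpress/wp-json/ HTTP/1.1" 200 51789 "-" "print `env`"')
fields = parse_line(sample)
print(fields["uri"], fields["referer"], fields["agent"])
```

A suspicious User-Agent such as the one in the sample, or a decoded parameter that contains a path traversal sequence, is the kind of field a reviewer would look at first.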
Test environment
Add a test file 1.php, whose content is phpinfo, under the wordpress directory
Access log entries for 1.php
[root@instance-8lp4smgv logs]# cat access_log | grep 'wp-admin/1.php'
125.33.206.140 - - [26/Feb/2017:13:09:47 +0800] "GET /wordpress/wp-admin/1.php HTTP/1.1" 200 17 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.102 Safari/537.36"
125.33.206.140 - - [26/Feb/2017:13:11:19 +0800] "GET /wordpress/wp-admin/1.php HTTP/1.1" 200 17 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.102 Safari/537.36"
125.33.206.140 - - [26/Feb/2017:13:13:44 +0800] "GET /wordpress/wp-admin/1.php HTTP/1.1" 200 17 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.102 Safari/537.36"
127.0.0.1 - - [26/Feb/2017:13:14:19 +0800] "GET /wordpress/wp-admin/1.php HTTP/1.1" 200 17 "-" "curl/7.19.7 (x86_64-redhat-linux-gnu) libcurl/7.19.7 NSS/3.14.0.0 zlib/1.2.3 libidn/1.18 libssh2/1.4.2"
127.0.0.1 - - [26/Feb/2017:13:16:04 +0800] "GET /wordpress/wp-admin/1.php HTTP/1.1" 200 107519 "-" "curl/7.19.7 (x86_64-redhat-linux-gnu) libcurl/7.19.7 NSS/3.14.0.0 zlib/1.2.3 libidn/1.18 libssh2/1.4.2"
125.33.206.140 - - [26/Feb/2017:13:16:12 +0800] "GET /wordpress/wp-admin/1.php HTTP/1.1" 200 27499 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.102 Safari/537.36"
[root@instance-8lp4smgv logs]#
Offline processing with Hadoop
Hadoop is based on the map/reduce model
Map script
localhost:work maidou$ cat mapper-graph.pl
#!/usr/bin/perl -w
# Sample input line:
#180.76.152.166 - - [26/Feb/2017:13:12:37 +0800] "GET /wordpress/ HTTP/1.1" 200 17443 "HTTP://180.76.190.79:80/" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.21 (KHTML, like Gecko) Chrome/41.0.2228.0 Safari/537.21"
my $line="";
# Log lines arrive on standard input
while($line=<STDIN>)
{
    # Only successful (2xx) GET requests; capture the path and the referer
    if( $line=~/"GET (\S+) HTTP\/1\.[01]" 2\d+ \d+ "(\S+)"/ )
    {
        my $path=$1;
        my $ref=$2;
        # Strip the query string from the path and the referer
        if( $path=~/(\S+)\?(\S+)/ )
        {
            $path=$1;
        }
        if( $ref=~/(\S+)\?(\S+)/ )
        {
            $ref=$1;
        }
        # Keep requests referred by our own site, or with no referer ("-")
        if( ($ref=~/^HTTP:\/\/180/i)|| ( "-" eq $ref ) )
        {
            my $out=$ref."::".$path."\n";
            #printf("$ref::$path\n");
            print($out);
        }
    }
}
Reducer script
localhost:work maidou$ cat reducer-graph.pl
#!/usr/bin/perl -w
my %result;
my $line="";
# Deduplicate the "ref::path" pairs emitted by the mapper
while($line=<STDIN>)
{
    chomp($line);
    if( $line=~/(\S+)\:\:(\S+)/ )
    {
        $result{$line}=1;
    }
}
foreach my $key (sort keys %result)
{
    if( $key=~/(\S+)\:\:(\S+)/ )
    {
        my $ref=$1;
        my $path=$2;
        # This is only an example: filter on the webshell file extensions you care
        # about (php and jsp are common). Whitelist-style filtering risks missed
        # detections; you can instead blacklist the file types you want to ignore.
        if( $path=~/(\.php)$/ )
        {
            my $output=$ref." -> ".$path."\n";
            print($output);
        }
    }
}
Sample output:
- -> HTTP://180.76.190.79/wordpress/wp-admin/1.php
- -> HTTP://180.76.190.79/wordpress/wp-admin/admin-ajax.php
- -> HTTP://180.76.190.79/wordpress/wp-admin/customize.php
HTTP://180.76.190.79/wordpress/ -> HTTP://180.76.190.79/wordpress/wp-admin/edit-comments.php
HTTP://180.76.190.79/wordpress/ -> HTTP://180.76.190.79/wordpress/wp-admin/profile.php
HTTP://180.76.190.79/wordpress/ -> HTTP://180.76.190.79/wordpress/wp-login.php
HTTP://180.76.190.79/wordpress/ -> HTTP://180.76.190.79/wordpress/xmlrpc.php
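Before submitting a job to the cluster, the map and reduce logic can be prototyped locally. The sketch below roughly mirrors the two Perl scripts in Python; the function names and the site_prefix and suffix parameters are hypothetical, and the query-string stripping is a simplification of the Perl regular expressions rather than an exact port.

```python
import re

# Same request pattern as the mapper: successful GET, then the referer.
REQUEST = re.compile(r'"GET (\S+) HTTP/1\.[01]" 2\d+ \d+ "(\S+)"')


def strip_query(url):
    """Drop everything from the first '?' onward."""
    return url.split("?", 1)[0]


def map_line(line, site_prefix="HTTP://180"):
    """Return a (referer, path) pair, or None if the line is not of interest."""
    m = REQUEST.search(line)
    if not m:
        return None
    path, ref = strip_query(m.group(1)), strip_query(m.group(2))
    # Keep requests with no referer or a referer on our own site.
    if ref == "-" or ref.lower().startswith(site_prefix.lower()):
        return (ref, path)
    return None


def reduce_edges(edges, suffix=".php"):
    """Deduplicate the pairs and keep only paths with the given suffix."""
    return ["%s -> %s" % (ref, path)
            for ref, path in sorted(set(edges))
            if path.endswith(suffix)]


lines = [
    '127.0.0.1 - - [26/Feb/2017:13:14:19 +0800] '
    '"GET /wordpress/wp-admin/1.php HTTP/1.1" 200 17 "-" "curl/7.19.7"',
]
edges = [pair for pair in map(map_line, lines) if pair]
print("\n".join(reduce_edges(edges)))
```

Running the same functions over a day of real logs is a cheap way to check the filters before the Hadoop job is scheduled.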
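Graph algorithm
Import the generated data into the graph database neo4j. Nodes that match the characteristics of a webshell are those with:
in-degree and out-degree both 0
in-degree and out-degree both 1, with the node pointing to itself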
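The two criteria can also be checked without a graph database. The following is a minimal in-memory sketch, assuming the input is a list of (referer, path) pairs and that a "-" referer contributes no edge; the function name find_suspicious is hypothetical.

```python
from collections import defaultdict


def find_suspicious(edges):
    """Return pages with in/out degree 0/0, or 1/1 with a self-loop."""
    in_deg = defaultdict(int)
    out_deg = defaultdict(int)
    self_loop = set()
    pages = set()
    for ref, path in set(edges):      # count each distinct edge once
        pages.add(path)
        if ref == "-":                # no referer: page is seen, no edge added
            continue
        pages.add(ref)
        out_deg[ref] += 1
        in_deg[path] += 1
        if ref == path:
            self_loop.add(path)
    suspicious = []
    for page in sorted(pages):
        i, o = in_deg[page], out_deg[page]
        if (i == 0 and o == 0) or (i == 1 and o == 1 and page in self_loop):
            suspicious.append(page)
    return suspicious


edges = [
    ("-", "/wordpress/wp-admin/1.php"),   # never linked from any page
    ("-", "/wordpress/"),
    ("/wordpress/", "/wordpress/wp-login.php"),
    ("/shell.php", "/shell.php"),         # only ever refers to itself
]
print(find_suspicious(edges))
```

The intuition is that a normal page is reached from, or links to, other pages of the site, while a webshell is usually requested directly or only posts back to itself.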
neo4j
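neo4j is a high-performance NoSQL graph database that stores structured data in a network (a graph) rather than in tables. Because it is embeddable, fast, and lightweight, it has been attracting more and more attention.
Installing neo4j
Download the installer from https://neo4j.com/ and install it; the default configuration is fine
Starting neo4j
Taking my Mac as an example, simply start it through the GUI. The default credentials are neo4j/neo4j, and you will be asked to change the password at first login
GUI management interface
Installing the Python API library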
sudo pip install neo4j-driver
Download JPype
https://pypi.python.org/pypi/JPype1
Install JPype
tar -zxvf JPype1-0.6.2.tar.gz
cd JPype1-0.6.2
sudo python setup.py install
The code that imports the data into the graph database is as follows:
B0000000B60544:freebuf liu.yan$ cat load-graph.py
import re
# Import path used by the 1.x series of neo4j-driver
from neo4j.v1 import GraphDatabase, basic_auth

nodes = {}   # url -> node name, so that each page is created only once
index = 1    # id assigned to the next new node

driver = GraphDatabase.driver("bolt://localhost:7687", auth=basic_auth("neo4j", "maidou"))
session = driver.session()
file_object = open('r-graph.txt', 'r')
try:
    for line in file_object:
        # Each line produced by the reducer looks like "<referer> -> <path>"
        matchObj = re.match(r'(\S+) -> (\S+)', line, re.M | re.I)
        if matchObj:
            ref = matchObj.group(1)
            path = matchObj.group(2)
            # Create a Page node the first time a URL is seen
            for url in (ref, path):
                if url not in nodes:
                    nodes[url] = "Page%d" % index
                    sql = "create (%s:Page {url:\"%s\", id:\"%d\", in:0, out:0})" % (nodes[url], url, index)
                    index = index + 1
                    session.run(sql)
                    #print sql
            # Variables are scoped to a single query, so match both existing
            # nodes again before linking them
            sql = "match (a:Page {url:\"%s\"}), (b:Page {url:\"%s\"}) create (a)-[:IN]->(b)" % (ref, path)
            session.run(sql)
            #print sql
            # Update the out-degree of the source and the in-degree of the target
            sql = "match (n:Page {url:\"%s\"}) SET n.out=n.out+1" % ref
            session.run(sql)
            #print sql
            sql = "match (n:Page {url:\"%s\"}) SET n.in=n.in+1" % path
            session.run(sql)
            #print sql
finally:
    file_object.close()
    session.close()
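The resulting directed graph is as follows
Query for nodes with in-degree 1 and out-degree 0, or for nodes whose in-degree and out-degree are both 1 and that point to themselves. Because requests with an empty referer are also recorded as coming from a "-" node, a page that nothing else links to shows an in-degree of 1 and an out-degree of 0 rather than 0 and 0.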
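The two queries might be expressed in Cypher roughly as follows. This is a sketch under assumptions: it presumes Page nodes carry the in and out counters set during import, that IN relationships connect the Page nodes themselves, and that session is an open driver session; the constant and function names are hypothetical.

```python
# The property name `in` is backtick-quoted because IN is also a Cypher keyword.
NO_REFERER_ONLY = (
    "MATCH (n:Page) WHERE n.`in` = 1 AND n.out = 0 "
    "RETURN n.url AS url"
)
SELF_POINTING = (
    "MATCH (n:Page)-[:IN]->(n) WHERE n.`in` = 1 AND n.out = 1 "
    "RETURN n.url AS url"
)


def suspicious_urls(session):
    """Run both queries on an open session and return the distinct URLs."""
    urls = set()
    for query in (NO_REFERER_ONLY, SELF_POINTING):
        for record in session.run(query):
            urls.add(record["url"])
    return sorted(urls)
```

Note that the first query also returns ordinary leaf pages that have exactly one real referrer, so in practice it may need an extra condition that the single incoming edge comes from the "-" node.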
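Areas for improvement
In real production use, the false positives we ran into fall into the following categories:
Home pages and index pages of all kinds (the first false positive was of this kind)
Operations and management back ends such as phpmyadmin and zabbix
Consoles of open-source software such as hadoop and elk
API endpoints
These can be handled effectively by whitelisting them over a short period. The more troublesome problem is the effect of scanners on the results (the second false positive was of this kind); removing that interference requires scanner fingerprints or more sophisticated human-versus-machine detection algorithms.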
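A whitelist and a crude scanner filter can be sketched as below. The path patterns and the user-agent substrings are illustrative assumptions only; a real deployment would build both lists from its own traffic, and real scanner fingerprints vary and can be spoofed.

```python
import re

# Hypothetical whitelist patterns for the false-positive categories above.
WHITELIST = [re.compile(p, re.I) for p in (
    r"/index\.\w+$",     # home and index pages
    r"/phpmyadmin/",     # operations back ends
    r"/zabbix/",
    r"/api/",            # API endpoints
)]

# Illustrative user-agent substrings for well-known scanners.
SCANNER_AGENTS = ("sqlmap", "nikto", "nessus", "acunetix")


def is_whitelisted(path):
    return any(p.search(path) for p in WHITELIST)


def is_scanner(user_agent):
    ua = user_agent.lower()
    return any(token in ua for token in SCANNER_AGENTS)


def filter_alerts(alerts):
    """alerts: iterable of (path, user_agent); keep those worth reviewing."""
    return [(path, ua) for path, ua in alerts
            if not is_whitelisted(path) and not is_scanner(ua)]


alerts = [
    ("/wordpress/index.php", "Mozilla/5.0"),
    ("/phpmyadmin/sql.php", "Mozilla/5.0"),
    ("/wordpress/wp-admin/1.php", "sqlmap/1.0"),
    ("/wordpress/wp-admin/1.php", "Mozilla/5.0"),
]
print(filter_alerts(alerts))
```

Filtering on the user agent alone only removes scanners that announce themselves, which is why the text points to fingerprinting and human-versus-machine detection for the rest.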
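Afterword
Using algorithms to uncover unknown attack behavior is a very popular research direction at the moment. This article covers only one algorithm that is relatively easy to understand and implement. It is not my invention, and quite a few security companies have put it into practice to some degree. Space is limited, so I will introduce other algorithms, from simple to advanced, in further articles in this enterprise security series. Algorithms, or machine learning, essentially surface the trends that underlying regularities produce across large data sets, so precise alerting is hard to achieve. At this stage they still need to be backed up by various rules and models, but for uncovering unknown attack behavior they are indeed a surprise weapon.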