Foreword
This article describes how to build an enterprise SIEM platform with open-source software. SIEM (Security Information and Event Management) is, as the name suggests, a system for managing security information and events, and for most businesses it is not a cheap system to buy. Drawing on the author's experience, this article covers building the platform itself; in-depth data analysis is left to the next article.
SIEM development
Comparing Gartner's rankings of global SIEM vendors in 2009 and 2016, we can clearly see that Splunk, built on a big-data architecture, has risen rapidly. The traditional big vendors still occupy the Leaders quadrant on the strength of complete security product lines and mature sales channels, while the smaller vendors have gradually dropped out of it. Most importantly, the storage architecture has gradually shifted from optional disk arrays plus commercial databases to scalable big-data architectures, and support for cloud environments has also become a trend.
In the open-source SIEM space, the typical projects are OSSIM and OpenSOC. OSSIM stores its data in MySQL and integrates a variety of log sources and tools, including the well-known Snort, Nmap, Nessus, and Ntop; it is a good choice when the data volume is small, and the new version's interface is quite slick.
A complete SIEM includes at least the following features:
Vulnerability management
Asset discovery
Intrusion detection
Behavior analysis
Log storage and retrieval
Alert management
Cool reports
In my view the core of these is intrusion detection, behavior analysis, and log storage and retrieval, and this article focuses on the technical architecture that supports those three functions.
OpenSOC introduction
OpenSOC is an open-source project that Cisco announced at BroCon 2014. Cisco never actually released the source code; it only published the technical framework. We took the architecture OpenSOC published and, combined with our own practice, turned it into a working solution. OpenSOC is built entirely on open-source big-data frameworks such as Kafka, Storm, Spark, and Elasticsearch, which are inherently strong at scaling out. The focus of this article is a SIEM built along the lines of OpenSOC.
The framework diagram OpenSOC publishes is hard to understand at first sight, so we break it down along two dimensions, data collection and data processing, using the common task of collecting SSH login logs from Linux servers as the running example.
Data collection dimension
The data collection dimension has to gather the raw data, store it, and provide a UI for interactive search. The typical scenario is that, after a security incident, you trace the attacker's behavior and assess the damage by searching the logs.
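In the architecture described below the logs end up in Elasticsearch, so this kind of tracing is usually just an index query, either through Kibana or programmatically. The following is a minimal sketch using elasticsearch-py; the index name and the field names are assumptions that depend on how the logs were parsed.

# Minimal sketch: find failed SSH logins from one suspicious IP over the last week.
# Index name and field names are assumptions.
from elasticsearch import Elasticsearch

es = Elasticsearch(["http://localhost:9200"])

query = {
    "query": {
        "bool": {
            "must": [
                {"term": {"src_ip": "203.0.113.7"}},   # suspicious source address
                {"term": {"result": "Failed"}}          # failed logins only
            ],
            "filter": [
                {"range": {"@timestamp": {"gte": "now-7d"}}}
            ]
        }
    }
}

# elasticsearch-py 7.x style; newer clients pass the query as keyword arguments
resp = es.search(index="logstash-sshlog-*", body=query, size=100)
for hit in resp["hits"]["hits"]:
    print(hit["_source"])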
In fact Logstash can write directly to ES, but Storm also needs the data. So the logs are parsed (split into fields) in Logstash, the parsed data is sent to Kafka, and Kafka feeds both Storm and a Logstash pipeline that writes to ES; the splitting could also be done inside Storm instead. Retrieval can then use Kibana directly, which is very convenient; this is the famous ELK stack. ES is best suited to holding hot data for a relatively short period and serving near real-time queries. For long-term retention, and for offline analysis with Hadoop or Spark, the data also needs to be written to HDFS. The most common data flow is therefore: log file → Logstash (parse) → Kafka → Storm for real-time detection, Kafka → Logstash → ES for search with Kibana, and Kafka → HDFS for long-term storage and offline analysis.
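To make the parsing step concrete, here is a minimal sketch of turning an sshd login line into structured fields and publishing it to Kafka. In the architecture above this job is done by Logstash; the Python version below, using kafka-python and a topic name I made up, is only an illustration of what the split data looks like.

# Minimal sketch: parse an sshd login line and publish it to Kafka as JSON.
# Assumes kafka-python and a broker on localhost:9092; the topic name is made up.
import json
import re
from kafka import KafkaProducer

# Typical sshd lines (the exact format varies by distribution):
#   Feb 16 06:32:50 web01 sshd[1234]: Failed password for root from 1.2.3.4 port 22 ssh2
#   Feb 16 06:33:10 web01 sshd[1234]: Accepted password for ops from 10.0.0.8 port 22 ssh2
SSH_RE = re.compile(
    r"(?P<time>\w{3}\s+\d+ [\d:]+) (?P<host>\S+) sshd\[\d+\]: "
    r"(?P<result>Failed|Accepted) password for (?P<user>\S+) from (?P<src_ip>\S+)"
)

producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    value_serializer=lambda d: json.dumps(d).encode("utf-8"),
)

def ship(line):
    match = SSH_RE.search(line)
    if match:
        producer.send("ssh-login", match.groupdict())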
Data processing dimension
Take streaming processing as the example: Storm subscribes to the parsed SSH logs from Kafka, matches them against the detection rules, and writes the results to MySQL or ES.
In this example it is hard to spot a security problem from a single log line in isolation; at most you can flag logins from unusual source IPs. A real deployment needs to consult a knowledge base of commonly used login IPs, usual login times, IP intelligence, and so on, plus a state store that keeps short-lived processing state such as the recent login successes and failures per IP. The flow that is closer to real operation is: subscribe to the parsed log from Kafka, enrich it with the knowledge base and the state store, then apply the detection rules.
Here is an example of the concrete judgment logic. In practice it cannot catch attackers who brute force from a large pool of proxy IPs at the same time, firing a few attempts from each address and then moving on; treat it only as an illustration:
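The sketch below is one possible version of that logic, written in the same Python style as the Storm bolt later in the article. The thresholds, the whitelist of usual login IPs, and the in-memory dictionaries are all assumptions; in production the state would live in Redis or a similar store shared across the Storm workers.

# Minimal sketch of the judgment logic: combine the knowledge base (usual login
# IPs) with short-term state (recent failures per source IP).
from collections import defaultdict, deque
import time

KNOWN_GOOD_IPS = {"10.0.0.8", "192.168.1.100"}   # knowledge base: usual admin IPs (assumed)
FAIL_THRESHOLD = 10                               # failures inside the window => brute force
WINDOW_SECONDS = 300

recent_failures = defaultdict(deque)              # src_ip -> timestamps of recent failures

def check_login(event):
    """event: dict with 'src_ip', 'result' and 'user', as produced by the parsing step."""
    now = time.time()
    ip = event["src_ip"]

    if event["result"] == "Failed":
        q = recent_failures[ip]
        q.append(now)
        while q and now - q[0] > WINDOW_SECONDS:   # forget failures outside the window
            q.popleft()
        # Per-IP counting; as noted above, this misses brute force spread over many proxy IPs.
        if len(q) >= FAIL_THRESHOLD:
            return {"alert": "ssh_bruteforce", "src_ip": ip}

    if event["result"] == "Accepted" and ip not in KNOWN_GOOD_IPS:
        # A successful login from a never-seen address is at least worth a low-priority alert.
        return {"alert": "ssh_login_from_unusual_ip", "src_ip": ip, "user": event["user"]}

    return None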
Expanding the data sources
In a production environment, the SSH login log alone is certainly not enough to handle security incidents and analyse intrusions; we need to collect as many data sources as possible. The following list can serve as a reference:
Linux/Windows system security and operation logs
Web server access logs
Database SQL logs
Network traffic logs
The simplified system architecture is as follows: the log sources are fed through Kafka into Storm, and the alerts are stored in ES. Alerts are mainly viewed through Kibana as well; with limited manpower, no dedicated interface was developed.
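For illustration, below is a minimal sketch of what such an alert document and the indexing call might look like with elasticsearch-py; the index naming scheme and the field layout are my assumptions, not something fixed by the architecture.

# Minimal sketch: write a detection result into Elasticsearch so it can be
# browsed from Kibana. Index name and document fields are assumptions.
from datetime import datetime, timezone
from elasticsearch import Elasticsearch

es = Elasticsearch(["http://localhost:9200"])

alert = {
    "@timestamp": datetime.now(timezone.utc).isoformat(),
    "alert_type": "sqli",                      # e.g. sqli, ssh_bruteforce
    "severity": "high",
    "src_ip": "203.0.113.7",
    "raw_log": "select * from user where id = 1 or 1=1",
}

# elasticsearch-py 7.x style call; newer clients use document= instead of body=
es.index(index="alert-%s" % datetime.now().strftime("%Y.%m.%d"), body=alert)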
Storm topology
Storm topologies can be written in Python; we take processing the SQL log as the example:
Suppose the SQL log has the following format:
"Feb 16 06:32:50" "127.0.0.1" "root @ localhost" "select * from user where id = 1"
General Storm topology
To keep it simple, the spout is a generic one that reads from Kafka, and a single bolt handles the SQL log: it matches the rules and, on a hit, emits "alert": "the original SQL log".
Pseudocode of the core bolt, doSQLCheckBolt:
import re
import storm

# Example rule only: flag always-true conditions and UNION-based probes; a real
# rule set would be far richer.
rules = r'.*(or\s+1\s*=\s*1|union\s+select).*'

class doSQLCheckBolt(storm.BasicBolt):
    def process(self, tup):
        # Log format: "Feb 16 06:32:50" "127.0.0.1" "root@localhost" "select ..."
        words = tup.values[0].split('" "')
        sql = words[3].rstrip('"')
        if re.match(rules, sql, re.IGNORECASE):
            storm.emit(["sqli", tup.values[0]])

doSQLCheckBolt().run()

The topology wires a generic Kafka spout to this bolt (in real code the Python bolt runs behind a Java ShellBolt wrapper, via Storm's multilang protocol):

TopologyBuilder builder = new TopologyBuilder();
builder.setSpout("sqlLog", new KafkaSpout(), 10);
builder.setBolt("sqliAlert", new doSQLCheckBolt(), 3).shuffleGrouping("sqlLog");
Topology submission example
Config conf = new Config();
conf.setDebug(true);
conf.setNumWorkers(2);

LocalCluster cluster = new LocalCluster();
cluster.submitTopology("doSQL", conf, builder.createTopology());
Utils.sleep(10000);
cluster.killTopology("doSQL");
cluster.shutdown();
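Note that LocalCluster runs the topology in-process, which is convenient for testing; on a real cluster the topology would be submitted with StormSubmitter.submitTopology instead.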
Logstash
In this project, the Logstash configuration work actually exceeded the Storm topology development. A few points worth highlighting follow; how the logs are cut into fields depends on your retrieval needs and is quite specific to each environment, so it is not expanded here.
Read from a file
input {
  file {
    path => ["/var/log/*.log", "/var/log/message"]
    type => "system"
    start_position => "beginning"
  }
}

Subscribe from Kafka

input {
  kafka {
    zk_connect => "localhost:2181"
    group_id => "logstash"
    topic_id => "test"
    reset_beginning => false      # boolean (optional), default: false
    consumer_threads => 5         # number (optional), default: 1
    decorate_events => true       # boolean (optional), default: false
  }
}

Write to Kafka

output {
  kafka {
    broker_list => "localhost:9092"
    topic_id => "test"
    compression_codec => "snappy"   # string (optional), one of ["none", "gzip", "snappy"], default: "none"
  }
}
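Note that these option names match the older, Zookeeper-based Kafka plugins from the Logstash 1.x/2.x era; newer Logstash releases connect via bootstrap_servers rather than zk_connect, so check the plugin documentation for the exact option names in your version.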
Write to HDFS

output {
  hadoop_webhdfs {
    workers => 2
    server => "localhost:14000"
    user => "flume"
    path => "/user/flume/logstash/dt=%{+Y}-%{+M}-%{+d}/logstash-%{+H}.log"
    flush_size => 500
    compress => "snappy"
    idle_flush_time => 10
    retry_interval => 0.5
  }
}

Write to Elasticsearch

output {
  elasticsearch {
    host => "localhost"
    protocol => "http"
    index => "logstash-%{type}-%{+YYYY.MM.dd}"
    index_type => "%{type}"
    workers => 5
    template_overwrite => true
  }
}
Postscript
How do we identify intrusion behavior in offline data through behavior analysis and attack modeling? That is the in-depth data analysis mentioned at the beginning, and it will be covered in the next article.