Analyzing the security of open-source components through the WTForms URL XSS

Last Update:2016-03-01 Source: Internet

Author: User

Tags php framework


0x00 Open-source components and open-source applications

Open-source components are essential tools for development. The saying "don't reinvent the wheel" exists precisely because we can call a large number of ready-made, encapsulated components directly during development, which reduces the work of re-implementing the same things.

Open-source components differ from open-source applications: components are aimed at developers, while applications can be used directly by end users. Examples of open-source components include uploadify in JavaScript and PHPExcel in PHP; examples of open-source applications include wordpress, joomla, and ghost, which is written in node.js.

In terms of security, a vulnerability in an open-source component undoubtedly has a far wider impact than one in an open-source application. Yet vulnerabilities in open-source components rarely come to our attention. I have summarized several reasons:

1. Open-source application vulnerabilities are universal: many can be tested across the whole network with one common PoC, which gives them more "commercial value". Component usage varies from developer to developer, so the exploitation threshold is relatively high.
2. The public is more familiar with open-source applications such as wordpress, and few people know which open-source components wordpress uses internally. Accordingly, when a vulnerability appears, people simply regard it as a wordpress vulnerability.
3. Habitual thinking leads people to assume that a "library" should not contain vulnerabilities, so code audits rarely look at defects in imported third-party libraries. As a result, fewer vulnerabilities are found in open-source components.
4. Developers who are able to write open-source components tend to be more skilled and to write higher-quality code, which also makes component vulnerabilities less likely.
5. Component vulnerabilities are often controversial. Many people cannot tell whether the fault lies with the component or with its users, and many issues can only be called "features", although these features are in fact more dangerous than some vulnerabilities.

The first reason is especially palpable in China's current impatient security climate. Consider several high-impact vulnerabilities from a while ago: Java deserialization, joomla code execution, and redis SSH key writing. The latter two clearly drew attention far faster than the first, which became widely noticed only after nearly a year.

The Java deserialization vulnerability happens to be caused by a typical "component" feature. As early as January 28, 2015, a white hat reported a way to achieve arbitrary code execution through Apache Commons Collections, a common Java library, but it attracted little attention (abroad as well). It was not until November that someone proposed using this method to attack applications such as WebLogic, WebSphere, JBoss, Jenkins, and OpenNMS.

This comparison clearly reflects the gap in the attention paid to security vulnerabilities in "open-source components" versus "open-source applications".

I have personally posted several component vulnerabilities on wooyun: from the earlier ThinkPHP framework injection, to the later Tornado file issue, to the XXE in slimphp. I was using all of these components myself and found the issues while reviewing the overall code.

This article uses an example to briefly describe how to review the code of a third-party library, and how to correctly use a third-party library.

0x01 Weak validators in WTForms

WTForms is an important component in Python web development. It provides simple form generation, validation, conversion, and other functions, and is one of the indispensable helper libraries for many Python web frameworks (especially flask).

One important function of WTForms is checking user input, which its documentation calls a validator:

http://wtforms.readthedocs.org/en/latest/validators.html

A validator simply takes an input, verifies it fulfills some criterion, such as a maximum length for a string and returns. Or, if the validation fails, raises a ValidationError. This system is very simple and flexible, and allows you to chain any number of validators on fields.

We can simply use the built-in validators to check data. For example, to require a URL that is "not blank", "at least 10 characters", and "at most 64 characters", you can write the following class:

#!py
from wtforms import Form, StringField
from wtforms.validators import DataRequired, Length, URL

class MyForm(Form):
    url = StringField("Link", validators=[DataRequired(), Length(min=10, max=64), URL()])

Taking flask as an example, you only need to call validate() in the view to check whether the user input is legal:

#!py
@app.route('/', methods=['POST'])
def check():
    form = MyForm(flask.request.form)
    if form.validate():
        pass # right input
    else:
        pass # bad input

This is typical of agile development: it cuts the amount of code the developer has to write.

However, during code review I found that the built-in validators of WTForms cannot be trusted. More precisely, they play no role in security at all.

Take the code above as an example. Can it really check whether the user input is a "URL"? Let's look at the wtforms.validators.URL() class:

#!py
class URL(Regexp):
    def __init__(self, require_tld=True, message=None):
        regex = r'^[a-z]+://(?P<host>[^/:]+)(?P<port>:[0-9]+)?(?P<path>\/.*)?$'
        super(URL, self).__init__(regex, re.IGNORECASE, message)
        self.validate_hostname = HostnameValidation(
            require_tld=require_tld,
            allow_ip=True,
        )

    def __call__(self, form, field):
        message = self.message
        if message is None:
            message = field.gettext('Invalid URL.')

        match = super(URL, self).__call__(form, field, message)
        if not self.validate_hostname(match.group('host')):
            raise ValidationError(message)

It inherits from the Regexp class, so it is really just a regular expression match against user input. Here is its regular expression:

regex = r'^[a-z]+://(?P<host>[^/:]+)(?P<port>:[0-9]+)?(?P<path>\/.*)?$'

This regular expression clearly does not match what developers understand a URL to be. Most developers want an http URL, but the pattern matches far too much. Its most notable property is that it accepts any protocol, and the most common way to abuse that is XSS triggered through the javascript: pseudo-protocol. For example:

javascript://...xss code
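The breadth of the pattern can be checked in isolation with the standard re module. The sketch below copies only the regular expression from the class quoted above; it does not run the hostname check that the real validator performs afterwards, and the sample strings are illustrative assumptions.

```python
import re

# Pattern copied from the URL validator quoted above (regex stage only;
# the hostname validation step of the real validator is not reproduced).
URL_RE = re.compile(
    r'^[a-z]+://(?P<host>[^/:]+)(?P<port>:[0-9]+)?(?P<path>\/.*)?$',
    re.IGNORECASE,
)

# Illustrative inputs: only the first is what a developer usually means by "URL".
candidates = [
    "http://example.com/",
    "ftp://example.com/file",
    "javascript://example.com/",
    "javascript:alert(1)",
]

for candidate in candidates:
    match = URL_RE.match(candidate)
    # Show the extracted host when the pattern matches, otherwise None.
    print(candidate, "->", match.group("host") if match else None)
```

Any scheme followed by :// passes the pattern, while the familiar javascript:alert(1) form is rejected only because it lacks the two slashes.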

WTForms will regard this as a legal URL and it will be stored in the database. In business logic, URLs are usually output into the href attribute of a hyperlink, and href supports executing JavaScript code through the javascript: pseudo-protocol. An XSS attack can therefore be constructed.

Another carelessly written validator is the wtforms.validators.Email() class. Here is its code:

#!py
class Email(Regexp):
    def __init__(self, message=None):
        self.validate_hostname = HostnameValidation(
            require_tld=True,
        )
        super(Email, self).__init__(r'^.+@([^.@][^@]+)$', re.IGNORECASE, message)

    def __call__(self, form, field):
        message = self.message
        if message is None:
            message = field.gettext('Invalid email address.')

        match = super(Email, self).__call__(form, field, message)
        if not self.validate_hostname(match.group(1)):
            raise ValidationError(message)

Look at its regular expression: ^.+@([^.@][^@]+)$. It cannot determine whether the user input is an email address. The leading .+ lets every kind of bad character through into the database.
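A quick way to see what the leading .+ allows is to run the same pattern with the standard re module. This is a sketch of the regex stage only (the hostname check applied to the captured group is not reproduced), and the sample input is an illustrative assumption.

```python
import re

# Pattern copied from the Email validator quoted above.
EMAIL_RE = re.compile(r'^.+@([^.@][^@]+)$', re.IGNORECASE)

# Illustrative input: markup in the local part, a normal-looking domain.
sample = '"><script>alert(1)</script>@example.com'

match = EMAIL_RE.match(sample)
# Only the part after the "@" is captured for the later hostname check.
print(match.group(1) if match else None)
```

Only the text after the @ is captured, so whatever precedes it is never inspected.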

Therefore, I personally call URL() and Email() a "URL finder" and an "Email finder" rather than validators: they cannot verify user input at all and are better suited to locating targets for a crawler.

0x02 Constructing XSS with a weak validator

I actually found this vulnerability on a website I wrote. The site lets visitors enter their blog address, and the backend uses URL() to check that the address is valid. On a user's homepage, other users can click the profile picture to visit that blog. The whole process is reproduced here: https://gist.github.com/phith0n/807869afbe1365015627

#!py
#(๑¯ω¯๑) coding:utf8 (๑¯ω¯๑)
import os
import flask
from flask import Flask
from wtforms.form import Form
from wtforms.validators import DataRequired, URL
from wtforms import StringField

app = Flask(__name__)

class UrlForm(Form):
    url = StringField("Link", validators=[DataRequired(), URL()])

@app.route('/', methods=['GET', 'POST'])
def show_data():
    form = UrlForm(flask.request.form)
    if flask.request.method == "POST" and form.validate():
        url = form.url.data
    else:
        url = flask.request.url
    return flask.render_template('form.html', url=url, form=form)

if __name__ == '__main__':
    app.debug = False
    app.run(os.getenv('IP', '0.0.0.0'), int(os.getenv('PORT', 8080)))

form.html:

#!html
{% if form.url.errors %}{{ form.url.errors|join('') }}{% endif %}

Your input url {{ url }}
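Note that HTML escaping at output time does not neutralize a value of this kind when it ends up in an href attribute. The sketch below uses html.escape from the Python standard library as a stand-in for template auto-escaping (an assumption; a template engine's own escaping may differ in detail), and render_link is a hypothetical helper, not part of flask or WTForms.

```python
import html

def render_link(url):
    """Hypothetical helper: emit an anchor with the value HTML-escaped."""
    escaped = html.escape(url, quote=True)
    return '<a href="{0}">{0}</a>'.format(escaped)

# A value with a javascript: scheme and a U+2028 line separator.
payload = "javascript://www.baidu.com/\u2028alert(1)"

rendered = render_link(payload)
# Escaping only rewrites & < > " ' and the payload contains none of them.
print(html.escape(payload, quote=True) == payload)
```

Escaping protects the HTML structure, but the scheme of the link is left as submitted.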

Demo page: https://flask-form-phith0n.c9users.io/ (available for testing). Is this code vulnerable? Look again at the URL regular expression:

#!py
regex = r'^[a-z]+://(?P<host>[^/:]+)(?P<port>:[0-9]+)?(?P<path>\/.*)?$'
super(URL, self).__init__(regex, re.IGNORECASE, message)

The pattern requires a //, which is annoying: in JavaScript it comments out everything that follows, so the JavaScript cannot be executed directly. The bypass is simple in principle. Because // is a single-line comment, all that is needed is a line break.

But the only flag passed to the regular expression is re.IGNORECASE, without re.S, so once a line feed appears the "." stops matching and the regular expression no longer matches.

However, in JavaScript the line terminators include \n, \r, \u2028, and \u2029, while the regular expression does not treat \u2028 and \u2029 as line breaks, so "." still matches them. I therefore only need to use \u2028 or \u2029 in place of the line feed. (The URL encoding of \u2028 is %E2%80%A8.)

Therefore, enter a URL like the following, where \u2028 stands for the literal U+2028 character (submitted as %E2%80%A8):

javascript://www.baidu.com/\u2028alert(1)
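The difference between an ordinary line feed and U+2028 can be confirmed with the standard re and urllib.parse modules. This sketch again exercises only the regular expression copied from the validator, not the full WTForms class.

```python
import re
from urllib.parse import unquote

# Same pattern and flag as in the URL validator: IGNORECASE only, no re.S.
URL_RE = re.compile(
    r'^[a-z]+://(?P<host>[^/:]+)(?P<port>:[0-9]+)?(?P<path>\/.*)?$',
    re.IGNORECASE,
)

with_line_feed = "javascript://www.baidu.com/\nalert(1)"
with_u2028 = "javascript://www.baidu.com/\u2028alert(1)"

# "." refuses "\n" without re.S, so the first value is rejected.
print(URL_RE.match(with_line_feed) is not None)
# U+2028 is an ordinary character to "." and the value is accepted.
print(URL_RE.match(with_u2028) is not None)
# The percent-encoded form decodes to the same character.
print(unquote("%E2%80%A8") == "\u2028")
```

To the regular expression the second value is one continuous line, whereas a JavaScript engine ends the // comment at U+2028 and treats alert(1) as code.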

Submit the URL above, then click the resulting link to trigger the script.

This vulnerability is typical: no developer would expect such a deep threat to be hidden in a piece of code like this. Some may think my demo does not reflect a real problem, so I browsed github briefly and in less than five minutes found a project with the same issue: https://github.com/1jingdian/1jingdian (the site has been closed, but the code can still be viewed).

https://github.com/1jingdian/1jingdian/blob/master/application/forms/user.py

#!py
class SettingsForm(Form):
    motto = StringField('sday')
    blog = StringField('blog', validators=[Optional(), URL(message='incorrect link format')])
    weibo = StringField('weibo', validators=[Optional(), URL(message='incorrect link format')])
    douban = StringField('douban', validators=[Optional(), URL(message='incorrect link format')])
    zhihu = StringField('zhihu', validators=[Optional(), URL(message='incorrect link format')])

Here, all four links are verified with URL(). Once validate() passes, they are stored in the database.

Then, on the profile page, the user's information is loaded and passed into the template user/profile.html.

https://github.com/1jingdian/1jingdian/blob/master/application/controllers/user.py#L14

#!py
def profile(uid, page):
    user = User.query.get_or_404(uid)
    votes = user.voted_pieces.paginate(page, 20)
    return render_template('user/profile.html', user=user, votes=votes)

Following into profile.html: https://github.com/1jingdian/1jingdian/blob/master/application/templates/user/profile.html

#!html
{% from "macros/_user.html" import render_user_profile_header %}
...
{{ render_user_profile_header(user, active="votes") }}

The template calls the macro render_user_profile_header. Following it further: https://github.com/1jingdian/1jingdian/blob/master/application/templates/macros/_user.html#L37

#!html
{% macro render_user_profile_header(user, active="creates") %}
   ...
   {% if user.blog %} {% endif %}
   {% if user.weibo %} {% endif %}
   {% if user.douban %} {% endif %}

user.blog, user.weibo, and user.douban are placed into the href attribute of link tags. This is the same pattern as in my demo above: lax filtering of the input URL ultimately leads to XSS.
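A stricter check than the regular expression is easy to sketch with the standard library. The following is only one possible approach under stated assumptions: is_safe_link, ALLOWED_SCHEMES, and FORBIDDEN_CHARS are hypothetical names introduced here, not part of WTForms or of the project above, and the allowlist should be adapted to what the application actually needs.

```python
from urllib.parse import urlsplit

ALLOWED_SCHEMES = {"http", "https"}      # hypothetical allowlist
FORBIDDEN_CHARS = "\t\r\n\u2028\u2029"   # tabs, line breaks, JS line separators

def is_safe_link(value):
    """Hypothetical check: accept only http(s) URLs that have a host."""
    if any(ch in value for ch in FORBIDDEN_CHARS):
        return False
    try:
        parts = urlsplit(value)
    except ValueError:  # raised for some malformed hosts
        return False
    # urlsplit lowercases the scheme, so "HTTP" and "http" compare equal.
    return parts.scheme in ALLOWED_SCHEMES and bool(parts.netloc)

for candidate in (
    "https://example.com/blog",
    "javascript://example.com/",
    "javascript://www.baidu.com/\u2028alert(1)",
):
    print(repr(candidate), is_safe_link(candidate))
```

A function like this could be called in addition to the built-in validator before the value is stored or rendered, so that the decision no longer depends on a pattern that accepts any scheme.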

0x03 Who is responsible for open-source component vulnerabilities?

This is a controversial topic. Many people think that vulnerabilities attributed to open-source components are really caused by developers not using the components properly. In my view, an issue belongs to the open-source component if it meets the following conditions:

1. The developer follows the conventional usage described in the documentation.
2. The documentation does not describe the security issues that this usage can cause.
3. The same development approach produces no vulnerability in other similar components, but does produce one in this component.

For example, take "WooYun: a design defect in ThinkPHP may lead to getshell". The first condition is satisfied, since the S function is used in the normal way. The documentation does also mention security:

However, I think that is not enough. It only says that you can set the... parameter to avoid the cache file name being "guessed". The documentation does not explain the danger of the cache file name being guessed, nor does it require the parameter to be set. So the framework's maintainers should bear at least half of the blame.

Another example: "WooYun: XXE vulnerability in slim, an international PHP framework (a typical form of XXE)". A developer can introduce an XXE vulnerability simply by receiving POST parameters in the normal way, so this vulnerability has nothing to do with the developer.

Another example: "WooYun: unreasonable design in the ThinkPHP framework can easily lead to SQL injection". By modifying the logical operator, an attacker changes the developer's intended judgment flow, which causes a security problem. Compare ThinkPHP with Codeigniter: in CI the logical operator sits in a different place than in TP, namely in the "key" position:

Under normal circumstances the key position is not controlled by the user. The same development approach therefore causes no problem in CI but does in TP, so I think ThinkPHP is to blame here as well.

Now consider the WTForms issue discussed in this article. In fact, WTForms can largely be let off the hook, because its documentation does indicate that the validator is not rigorous:

Of course, such a vague reminder is hard to act on for the many people who have no security background.

0x04 How developers can cope with the potential "security features" of components

So how should developers without a security background cope with the potential "security features" of components?

First of all, I think it is necessary to review code frequently. I often reread and audit code I have written as if it were an open-source application, and in doing so I frequently find security issues I had not noticed before.

During code review, we need to follow the calls into the source code of third-party libraries rather than looking only at the code we wrote ourselves, in order to discover potential features. These features are often the root cause of a vulnerability.

In addition, the ability to read documentation is extremely important. A large number of "framework features" are in fact described in the framework's documentation. Many developers prefer to look at examples rather than read the instructions in detail, feeling that code is more intuitive than text (perhaps also because of their English reading ability). This is dangerous from a security standpoint, because sample code is usually the simplest code the project provides and may omit many necessary security measures.

Finally, a certain security foundation is essential for every developer, for reasons that need no repeating.

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
