Prevents malicious website operations (playing with the System)

Source: Internet
Author: User
Tags sha1 hash

By Eli White

Learn how to prevent MySQL-PHP websites from being played.

Released in April 2011

A new term "gaming" appears in the Web application dictionary )". In this context, playing games does not mean that people play online games. It means someone sees your website as a prey and tries to influence it.

One of the most obvious possible examples is online voting. For example, a person tries to create an account or other means to play with a single person to vote multiple times, leading the ranking of his/her candidates. However, this concept applies to any situation where users may wish to manipulate the website for profit.

Why is someone doing this? One of the situations where you can quickly post a large number of comments on an article is to make the page full of messy comments so that you cannot read the real comments. They may log in/out frequently to increase the number of logins, or constantly reload their own pages to make these pages more popular. In the worst case, some people may hate your website and want to invalidate the website data. In any case, this is always a vulnerability that you do not want to expose.

In this article, I will introduce some methods to help prevent websites from being played. Some of these methods rely on closing the script, and some try to find the person who manually executes this behavior.

Early Anti-Spam Information Methods

In my article "cracking down on spam information and building healthier websites", I introduced some methods to ensure that user comments are effective. For the same purpose, these methods can also be used as defense measures. Because the alternative method (for example, creating and logging on to multiple accounts) is tedious, most system-playing attempts are automated. If the configured obstacle makes the script invalid, or at least makes it more difficult to write, you can discard the script.

Speed Limit

There is also a very effective defense measure: to limit the speed of Operation execution. Generally, if someone writes a script to perform operations on your website, they will not consider making it "act" like a person ". Therefore, this script generally sends intensive requests to you. This is especially true for GreaseMonkey scripts that are completely written in JavaScript to automatically execute browser operations.

In this case, you only need to track the time when the operation occurred, manually test the person to make a wise decision, and click the speed of the relevant link. Then set a threshold value so that if the voting speed is higher than the possible voting speed, you will know that these votes are generated by scripts. Then, you can ignore the validity of all these votes.

For example, if you list multiple items (such as movies) on a certain page of the website, you are asked to score each item by Level 1 to 5. In MySQL, the original mode of this "movie score" table may be similar to the following:

CREATE TABLE `ratings` ( `id`         INTEGER UNSIGNED AUTO_INCREMENT NOT NULL PRIMARY KEY,`user_id`    INTEGER UNSIGNED NOT NULL,`movie_id`   INTEGER UNSIGNED NOT NULL,`rating`     TINYINT UNSIGNED NOT NULL) ENGINE=innodb DEFAULT CHARACTER SET utf8 COLLATE utf8_general_ci;CREATE UNIQUE INDEX `user_movie` ON `ratings`(`user_id`, `movie_id`);


Someone may write a script and try to score "1" for each item. To limit the speed of scoring movies, modify the table to store additional data:

ALTER TABLE `ratings`   ADD COLUMN `invalid` TINYINT UNSIGNED NOT NULL DEFAULT 0,   ADD COLUMN `created` TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP;


Now we can track when a link is clicked, and there is a way to mark the score as invalid. After that, let us run some manual tests and find that the fastest speed for a person is to click five links in five seconds. Considering that some users may be faster than you, we assume that you can click five links in four seconds. Now, before adding a new score for a movie, you can run the following SQL statement to check whether the speed limit is exceeded:

SELECT `id` FROM `ratings`WHERE `user_id` = 42   AND `created` > DATE_SUB(NOW(), interval 4 second);


This code returns all entries generated in the past four seconds. If there are four or more items (because 5th items are to be added), you must set an invalid field when inserting the current score row for the movie. You also need to set all the original scores as invalid.

Check Source

Another information you can use in the survey is the operation source. View all available information to track who is performing the operation.

The simplest way is to view specific users. If you are tracking logon users as in the previous example, it is easier to achieve this. If you do not require logon, you can store session IDs in each DB row to track the operation. In this way, you can track individual users who move on the website and perform different operations.

One of the largest indicators that can be used when tracking information by user is to view the overall activity. For example, if many users leave few historical records in the system, either creating a new one or playing too much for a movie .. . This may be doubtful. Someone may have created a large number of accounts to try to increase the score of their movies. A better solution is to discount the number of votes of new users in the overall solution.

For the movie scoring system, a simple version is to exclude all the scores of users who scored less than five times in the overall average score of a movie. The following are the SQL statements that execute this operation (warning: do not perform this operation on large databases, You need to rewrite it ):

SELECT AVG(`rating`) FROM `ratings` WHERE `movie_id` = 1     AND `user_id` IN        (SELECT `user_id` FROM `ratings`          GROUP BY `user_id` HAVING count(`id`) >= 5)


Of course, users tracking and executing operations can only help you with this step, because the website players may be quite tricky and they may expect this. To this end, it may be helpful to start tracking the IP addresses of any operation and tracking the HTTP Referrer of the operation at the same time. These may provide different perspectives for analysis operations. First, let's look at the IP address.

When you view an IP address, you can find out the number of activity activities for a single address. In this case, multiple user accounts are used. You need to try to find out whether a person has used a large number of accounts, logged on/deregistered each account, and tried to use these accounts to affect the vote. This situation requires caution. Employees or students in the company or school may use the same IP address. Of course, this is also a meaningful statistical information. If a large number of people vote for a specific object from a single IP address (or IP address range), you may encounter collusion. (In the case of movie scoring, it may be that the film company asks employees to rate their movies as 5 stars. This should not be considered as valid data .)

The HTTP referrer of the trail operation is similar. If you know which original HTTP addresses are bringing them to your website, you can also find the source of collusion. Maybe it's just a public webpage that says, "Everyone is going to vote for this ". Although its purpose may be simple, it usually leads to blind voting, rather than true and honest input. It should be noted that to do this correctly, you need to track not only the "last" HTTP referrer, because it is usually your own website (or in some AJAX scenarios, it is even blank ). Instead, you need to capture the referrer as part of the session when the user visits your website for the first time. And then record it with all operations.

You can consider adding this data to the Cine Classification Database, as shown below:

ALTER TABLE `ratings`    ADD COLUMN `ip_addr` INTEGER UNSIGNED NULL,    ADD COLUMN `refer` VARCHAR(1048) NULL;


One of the following issues needs to be considered is a bit complicated. You will notice that I define the "refer" field as a 1,048 character VARCHAR. Here, you need to make some significant sacrifices for the application. The URL may contain up to 2,000 characters. If your VARCHAR is smaller than the length listed above, you may lose data that is very important to you. Although, on the other hand, recording all this information for each score may make the database very large, which is very painful.

One solution is to record only the MD5 or SHA1 hash values of HTTP referrer. In this way, you can still programmatically detect collusion and restrict the field to 32 (or 40) characters, greatly reducing your pain. In this way, you will not be able to actually view the original URL or access the original URL to satisfy your curiosity. If you still want to obtain this information, you can create a separate query table to combine the complete URL with its associated hash values (or in this example, use only one auto-incrementing key) to push data into the query table. However, as a result, additional database overhead and maintenance work are increased. These are the sacrifices you always make when trying to track as much information as possible to fight back the website.

Pattern Matching

At this point, we have explained in detail how to seek out a system player Method Based on "Source" and "whether it is like a script. If you want to understand more profound situations, you need to go deep into the pattern matching layer.

In this example, we will find the suspicious mode in the database. The simpler mode is the timing and sorting of operations. For example, in our score system, can you see that 50 identical accounts vote for multiple movies in exactly the same order and with the same score? In this case, this is a mode that does not exist in normal website usage. Therefore, you can decide whether there is a script to perform these operations, or someone logs on to each account in sequence according to the workbook of the Set account.

Similarly, you can view the operation timing. We have discussed the speed limit before, but now we can only look at the Suspicious timing. What happens if an account accurately scores every 10 minutes? Or, what if someone randomly selects a comment from a group of accounts every other minute? This interesting timing mode can guide you through suspicious activities.

These analysis tasks do not occur in real time on your website. These tasks may be highly costly queries and need to be processed offline. They can be performed as manual flows by the administrator who searches for the activity or at night through automatic flows, mark any suspicious items found the previous day.

Friendship

One of the biggest sources of collusion (whether intentional or unintentional) is friendship. However, if your website has social skills (as many websites currently do), you will have a huge data source to identify this. Most of these websites support the concept of "dating" between accounts. In this case, you can create a tree to show who is associated with whom. You can check the entire course tree to see if the mode is displayed. All people in a circle of friends may make identical behaviors and give the same score to a movie.

On the surface, some of them will obviously happen. Real friends share similar interests, so they like similar things. But there is a limit. At some points, it will go beyond the "conventional" similar taste and enter the field of collusion. If people in a circle of friends always perform identical operations, you will not get valid data from them.

You can further extend this mode. Many websites now allow you to link your Twitter, Facebook, or other social media website accounts. It may be used as a login authentication method, or as a way to "find friends. If you do this, you will be able to perform in-depth analysis that you previously could not do. (Warning: You are here to start crossing the boundary between the white hat and the black hat .)

For example, you can track which accounts on the website are exactly linked to Twitter accounts. You can call the Twitter API to obtain a list of friends of each user. In this case, you can build a relationship chart between accounts on your website that may not serve as a friend link but are actually related to Twitter. In this way, you can combine various contacts, and these contacts are invisible to you, either because you are not very concerned about users, either the user intends to hide collusion attempts and thinks that the linked account for friends on your website will hide the relationship between them.

This will become a never-ending Data Mining source. You can explore the relationships and operations in depth and try to find the patterns hidden from you.

Set traps

The topic of the pattern is discussed in depth and how to detect the pattern after it occurs. Now let's talk about the last step. This step may be a little easier, and it is absolutely effective.

This is the concept of a trap. When it first happened, it captured the gamer. This is usually done using cookies. Here, cookies have two specific purposes that can help us track and detect people trying to play with the system.

The first use is called "Login cookie ". The idea is to set a cookie when the browser accesses the website. This cookie will be retained in the same place no matter how many times you log on or log off. (This is different from the session ID. The session ID should be generated after each access .) Then, you can save the account name and logon cookie value in the database query table. This allows you to track individual users who log on using multiple accounts (a single browser ). This can be combined with the IP address tracking mentioned above to gain a deeper look at the dynamic IP address and the gateway.

The second use of cookie is essentially similar, but there is a fundamental difference, it is the account cookie. In this case, tracing is performed when a person actually creates multiple accounts. In execution, the effect is similar to the previous one. When the browser visits your website for the first time, you will return a unique identifier in the account cookie. However, in this case, the Account creation value is recorded as part of the account information. The trick is that you reset the cookie whenever you log on to it to match the cookie value of the original account.

If the user logs out immediately (or at any time in the future) and then clicks "add account" to try to create his/her account, you can track that the user is actually the same user.

In this way, the account groups of a single user can be linked together and operated accordingly based on the information. Similarly, this requires caution. If only a few accounts are linked together, this may be a family where each person has an account. However, when the number of linked accounts increases, these accounts are doubtful.

Summary

We hope that you will never encounter any problems. Unfortunately, the world is not always so beautiful. Detection and play are never easy. You are always struggling to find out what you are playing with, And you want not to generate false positives by mistake. The method provided in this article is just a starting point. You can accept these concepts, apply them in your own applications, and then check whether users can discover anything.


Eli WhiteHe has been using PHP for a long time. He is the author of PHP 5 in Practice. He has been involved in many large PHP projects, including Digg, TripAdvisor, and harbo Space Telescope projects. He often gives speeches at PHP meetings to share his knowledge. For more information about Eli, visit eliw.com.

Related Article

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.