Anti-crawler: Using ASP.NET MVC Filters and Caching (Pitfalls Included)

Source: Internet
Author: User
Tags: httpcontext

Background:

To balance what community members contribute against what they ask for, we introduced "help coins". When a user's points ("help points") reach a certain amount, a certain number of help coins "drop". To make things more fun, once help coins drop, every user can "pick them up", and whoever picks them up first gets them.

But this creates a problem. Help coins can be bought and sold, so they have real value, and malicious users will inevitably run crawlers that scan constantly, resulting in this kind of situation:

Note: after checking, it turned out that our user "Steve Jobs" did not use a crawler at all; he clicked every one of them by hand! What else can you say? All I can do is express my admiration...

So we need a mechanism to stop this kind of crawler behavior.

General Idea:

We have one very convenient premise here: only registered users can "pick up" help coins. So instead of blocking crawlers by IP address (which would require obtaining the client's real IP), we can block the registered user directly, which is very convenient.

So how do we tell whether a request comes from a real user or a crawler? We decided to use the simplest method: record how often the user visits. When a user's visit frequency exceeds a set value (for example, 10 visits in 5 minutes), the user is treated as a "crawler suspect".
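As a rough illustration of the "visits per time window" rule, here is a framework-free C# sketch. It assumes the example limit from the text (10 visits in 5 minutes), keys the counter by user ID, and uses a plain Dictionary in place of HttpContext.Cache; the names RecordVisit and counters are made up for this sketch. Like the cache-based version in this article, it is a fixed window: counting starts when the first visit creates the record and resets once the window has passed.

```csharp
using System;
using System.Collections.Generic;

// Assumed thresholds, taken from the example in the text: 10 visits per 5 minutes.
var window = TimeSpan.FromMinutes(5);
const int maxVisits = 10;

// userId -> (start of the current window, visits counted in that window).
// A plain Dictionary stands in for HttpContext.Cache (hypothetical).
var counters = new Dictionary<string, (DateTime Start, int Count)>();

// Records one visit; returns true when the user has gone over the limit
// inside the current fixed window.
bool RecordVisit(string userId, DateTime now)
{
    if (!counters.TryGetValue(userId, out var entry) || now - entry.Start >= window)
    {
        // No record yet, or the old window has expired: start a new one.
        entry = (now, 0);
    }
    entry.Count++;
    counters[userId] = entry;
    return entry.Count > maxVisits;
}
```

With these numbers the first 10 visits in a window pass and the 11th is flagged, which matches the `>= MaxVisit` check in the filter shown later (there the first visit creates the counter with Value = 1).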

In addition, to handle false positives (some users really are that quick with their hands), we also give users an "unlock" function: entering a verification code proves they are not a crawler.

Detailed Design:

The most important question is: what do we use to record how often a user visits?

A database? That felt unnecessary. This data does not need to be kept long-term, and an I/O operation on every single visit would cost an unacceptable amount of performance, so we decided to keep it in memory.

But what data structure should hold it?

In the end, we chose to use the cache to hold the simplest possible "user ID → visit count" key-value pairs, because:

    • The cache's automatic expiration purges stale entries, so the recorded visit count always covers a bounded period of time.
    • Cache reads and writes are fast and put no pressure on performance.

Of course, there are still some problems here. For example, suppose the cache lifetime is 5 minutes and the maximum number of visits is 10. At 0:10 the cache entry is created and starts counting visits; by 0:14 there have been 7 visits in total, no problem. But at 0:15 the entry expires and is cleared, so at 0:16 the cache only holds data for the minute from 0:15 to 0:16, not for the past 5 minutes (0:11 to 0:16). So a user can pace the crawler: visit 9 times, rest, visit 9 more times 5 minutes later, rest for another 5 minutes, and so on...
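The gap described above comes from using a fixed window. A common alternative, which is not what this article implements, is a sliding-window log: keep each visit's timestamp and count only the visits within the last 5 minutes measured back from the current request. Below is a minimal sketch assuming the same example limit of 10 visits per 5 minutes, with a Dictionary standing in for the cache; RecordVisitSliding and logs are made-up names.

```csharp
using System;
using System.Collections.Generic;

// Assumed limit from the text's example: 10 visits per 5 minutes.
var window = TimeSpan.FromMinutes(5);
const int maxVisits = 10;

// userId -> timestamps of that user's visits within the trailing window.
var logs = new Dictionary<string, Queue<DateTime>>();

// Sliding-window log: counts visits in the 5 minutes before "now", instead of
// from whenever the cache entry happened to be created.
bool RecordVisitSliding(string userId, DateTime now)
{
    if (!logs.TryGetValue(userId, out var log))
    {
        log = new Queue<DateTime>();
        logs[userId] = log;
    }
    // Forget visits that have slid out of the window.
    while (log.Count > 0 && now - log.Peek() >= window)
        log.Dequeue();
    log.Enqueue(now);
    return log.Count > maxVisits;
}
```

This closes the boundary gap: 9 visits just before a fixed window would reset and a few more right after it are now counted together. The price is memory (one timestamp per visit instead of a single counter per user), and it still cannot stop a crawler that genuinely paces itself under the limit.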

Oh well, is there really nothing I can do? But if he is willing to crawl at that rate, I honestly don't mind; go ahead and crawl slowly. Alternatively, we could do more monitoring in the background: record every user's visits, compute statistics, and look for anomalies. At that point you might really need a database (to improve performance, you could keep a DataTable in memory and synchronize it to the database periodically). For now, though, it is not necessary.

There is one more question: is recording the visit frequency enough on its own?

In the scheme above, if the visit frequency is recorded in the cache and the cached data alone decides whether further access is allowed, there is a problem: once the cache entry expires, the user can freely access the target page again! In effect, the lock releases itself when the entry expires.

I don't think that is reasonable: once a user has been identified as a crawler, the lock should only be lifted manually (by passing the verification code check). So we added a Locked field to the user table in the database. When a user is locked, it is set to the current time; after the user unlocks, it is set back to null.
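A minimal sketch of these Locked semantics, with a Dictionary<string, DateTime?> standing in for the database column (hypothetical; on the real site it is a field of the user table) and made-up helper names LockUser, UnlockUser and IsLocked:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for the Locked column of the user table:
// null means "not locked", a value is the time the user was locked.
var lockedAt = new Dictionary<string, DateTime?>();

// Called when the visit limit is exceeded.
void LockUser(string userId, DateTime now) => lockedAt[userId] = now;

// Called only after the user has passed the verification-code check.
void UnlockUser(string userId) => lockedAt[userId] = null;

// Checked on every request before the cache is consulted, so the lock does
// not disappear when the cached visit counter expires.
bool IsLocked(string userId) =>
    lockedAt.TryGetValue(userId, out var at) && at.HasValue;
```

Because IsLocked never looks at the cache, expiry of the visit counter cannot lift the lock; only UnlockUser, called after the verification code check, does.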

Implementation:

For reuse, we put the check-and-record logic in an authorization filter (a subclass of AuthorizeAttribute), inside its OnAuthorization() method.

The code itself is fairly simple, just if...else... The logic is:

            /// 1. Check in the database whether the current user is locked.
            /// 2. If locked, intercept directly. Otherwise:
            /// 3. Check the cache for the current user's visit record:
            ///    3.1 None: create a new cache entry. Otherwise:
            ///    3.2 Check the user's visit count:
            ///        3.2.1 If the limit has been reached, intercept and lock the user in the database. Otherwise:
            ///        3.2.2 Increment the user's visit count.

The code, with condensed comments, is as follows:
    using System;
    using System.Web;
    using System.Web.Caching;
    using System.Web.Mvc;
    using Autofac;

    public class NeedLogon : AuthorizeAttribute
    {
        public override void OnAuthorization(AuthorizationContext filterContext)
        {
            HttpContextBase context = filterContext.HttpContext;

            // Autofac: resolve an ISharedService instance
            ISharedService service = AutofacConfig.Container.Resolve<ISharedService>();
            _NavigatorModel model = service.Get();   // current user info from the database

            // Return early to avoid nesting if...else {}
            if (model.Locked.HasValue)
            {
                // model.Locked comes from the database: the user is already locked, intercept
                VisitTooMuch(filterContext);   // rejects the request (defined elsewhere, not shown)
                return;
            }

            string cacheKey = CacheKey.MAX_VISIT + model.Id;

            // Interestingly, an int value type cannot be used directly here; it must be a reference type.
            // Read the entry once, so it cannot expire between the null check and its use.
            VisitCounter amount = context.Cache[cacheKey] as VisitCounter;
            if (amount == null)
            {
                amount = new VisitCounter { Value = 1 };
                // Create a new cache entry with an absolute expiration
                context.Cache.Add(cacheKey, amount, null,
                    DateTime.Now.AddSeconds(Config.Seconds),
                    Cache.NoSlidingExpiration, CacheItemPriority.Normal, null);
            }
            else if (amount.Value >= Config.MaxVisit)
            {
                // Lock the user in the database
                service.LockCurrentUser();
                BaseService.Commit();
                // Clear the cache entry right away
                context.Cache.Remove(cacheKey);
                VisitTooMuch(filterContext);
                return;
            }
            else
            {
                // Cannot be used: currentVisitAmount++;
                //                 context.Cache[cacheKey] = currentVisitAmount;
                amount.Value++;
            }
        }
    }

    public class VisitCounter
    {
        public int Value { get; set; }
    }

Look closely at the code and you will notice two things. These are pits that Brother Fei (yours truly) fell into! O(╥﹏╥)o

1. Why introduce the VisitCounter class?

The cache holds an instance of this class, which really just wraps an int Value. Why bother? Why not use an int directly? Can't the cache just store an int?

Oh, no! Damn.

Storing it is no problem, and reading it back is fine too, but updating (incrementing) it is a problem. How would you update it?

            // Take the value out of the cache
            currentVisitAmount = Convert.ToInt32(context.Cache[cacheKey]);
            // Increment it
            currentVisitAmount++;
            // Save it back
            context.Cache[cacheKey] = currentVisitAmount;

That doesn't work. For the detailed explanation, see: Cached item never expiring.

Simply put, the statement context.Cache[cacheKey] = currentVisitAmount; is equivalent to inserting the item again as a cache entry that never expires. I never saw that coming! This bug nearly drove Brother Fei crazy; cache problems are hard enough to debug as it is, without this kind of gotcha on top.

So what's the solution? Store a reference-type value in the cache, then change the value inside that instance without touching the cache entry itself. I won't repeat the code here.
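To see why the value type trips you up, here is a toy model in plain C#. It only imitates the behavior described above: a cache Add() with an absolute expiration, and an indexer setter that stores the item with no expiration. The dictionary and the Add, SetViaIndexer and Get helpers are hypothetical stand-ins, not the System.Web.Caching API. StrongBox<int> from System.Runtime.CompilerServices is a real BCL class with a public Value field; it plays the role of VisitCounter here.

```csharp
using System;
using System.Collections.Generic;
using System.Runtime.CompilerServices;

// Toy model of the cache, for illustration only: Add() keeps an absolute
// expiration, while the indexer-style setter stores the item with no expiration.
var cache = new Dictionary<string, (object Value, DateTime? Expires)>();
void Add(string key, object value, DateTime absoluteExpiration) => cache[key] = (value, absoluteExpiration);
void SetViaIndexer(string key, object value) => cache[key] = (value, null);
object Get(string key) => cache.TryGetValue(key, out var e) ? e.Value : null;

var expires = new DateTime(2020, 1, 1, 0, 15, 0);

// Variant 1: a plain int. The cache holds a boxed copy, so incrementing the
// local variable changes nothing until it is written back, and writing it
// back through the indexer throws the expiration away.
Add("int", 1, expires);
int n = (int)Get("int");
n++;
SetViaIndexer("int", n);

// Variant 2: a reference type. Mutating the cached object updates it in
// place; no write-back is needed, so the expiration is untouched.
Add("box", new StrongBox<int>(1), expires);
((StrongBox<int>)Get("box")).Value++;
```

After it runs, both entries hold 2, but only the "box" entry still has its expiration. One caveat: Value++ is a read-modify-write, so two concurrent requests from the same user could lose an increment. If that matters, Interlocked.Increment(ref box.Value) works on a field such as StrongBox<T>.Value, though not on a property such as VisitCounter.Value.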

2. Clear the user's cache entry at the same time as locking the user

I took a small detour here.

At first, I cleared the user's cache entry when unlocking the user:

        [NeedLogon]
        public ActionResult Unlock()
        {
            string userId = GetCurrentUserId();
            string cacheKey = CacheKey.MAX_VISIT + userId;
            HttpContext.Cache.Remove(cacheKey);
            return View(new ImageCodeModel());
        }

The result: for some reason, it worked sometimes and not other times. When I pointed my local code at the server database, ran it in debug mode and stepped through it line by line, everything was fine; but once I published the code to the server, boom. I couldn't debug there, only write logs and the like. I don't even want to think about that pit...

Then I suddenly noticed a "bad code smell": duplication. See how this cacheKey is also built in NeedLogon.OnAuthorization()? Shouldn't reused code be encapsulated? So my first move was to extract a method that builds the cacheKey, something like string GetVisitLimitCacheKey(). But this method had to be callable from both Unlock() in the controller and OnAuthorization() in the filter, so where should it go?

Then came a flash of insight: why put Cache.Remove in Unlock() at all?

In fact, once the user is locked, his cache entry is useless. Since the database already records that he is locked, NeedLogon.OnAuthorization() intercepts him without needing the cache at all! Removing the cache entry as early as possible also improves performance a little.

Most importantly, the code becomes more cohesive: the cacheKey is used in a single method and all cache operations happen in the same class, which avoids scattered, coupled code. Much more elegant!
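Pulled out of MVC, the filter's decision flow after this change can be sketched in plain C#. All names are hypothetical stand-ins: lockedAt for the Locked column, cache for HttpContext.Cache (expiry is checked by hand here, whereas the real cache purges entries itself), and "MaxVisit_" for CacheKey.MAX_VISIT. What it shows is the point above: the key is built in one place, and the counter is removed at the moment of locking, so Unlock() never has to touch the cache.

```csharp
using System;
using System.Collections.Generic;
using System.Runtime.CompilerServices;

// Hypothetical stand-ins for the Locked column and HttpContext.Cache.
var lockedAt = new Dictionary<string, DateTime?>();
var cache = new Dictionary<string, (StrongBox<int> Counter, DateTime Expires)>();
var window = TimeSpan.FromSeconds(300);
const int maxVisit = 10;

// The only place that knows how the key is built.
string GetVisitLimitCacheKey(string userId) => "MaxVisit_" + userId;

// Returns true if the request may proceed.
bool Authorize(string userId, DateTime now)
{
    // 1-2. A user locked in the database is rejected without touching the cache.
    if (lockedAt.TryGetValue(userId, out var at) && at.HasValue) return false;

    string key = GetVisitLimitCacheKey(userId);
    // Treat an expired entry as missing.
    if (!cache.TryGetValue(key, out var entry) || now >= entry.Expires)
    {
        // 3.1 No record yet: start counting with this visit.
        cache[key] = (new StrongBox<int>(1), now + window);
        return true;
    }
    if (entry.Counter.Value >= maxVisit)
    {
        // 3.2.1 Limit reached: lock in the database and drop the counter right away.
        lockedAt[userId] = now;
        cache.Remove(key);
        return false;
    }
    // 3.2.2 Under the limit: count this visit (mutates the shared counter object).
    entry.Counter.Value++;
    return true;
}
```

With a limit of 10, visits 1 to 10 pass, the 11th locks the user and removes the counter, and the user stays blocked even an hour later, until the Locked value is cleared.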


Finally, please do me a small favor and take a quick survey: would you like to be a "good person"?

I forgot to give the inviter and invitation code for registering: FEI, 1786. Or just click Register.
