Anti-crawler: using ASP.NET MVC filters and the cache (and the pits I fell into)

Source: Internet
Author: User
Tags httpcontext


Background:






To balance what community members contribute against what they ask for, we introduced "help coins". When a user's points (help points) reach a certain amount, a certain number of help coins "drop". To make things more fun, once the coins drop, any user can "pick them up": whoever picks them up first owns them.






But this creates a problem. Help coins have value and can be bought and sold, so malicious users will inevitably scan for them constantly with crawlers, resulting in this situation:






Note: after verification, it turned out that the user "Steve Jobs" was not actually using a crawler. He was clicking by hand, one click at a time! What else can you say? I can only express my admiration...



So we need a mechanism to stop this kind of crawler behavior.






General idea:






We start from a very convenient premise: only registered users can "pick up" help coins. So we do not need to block crawlers by IP address (which requires obtaining the real IP). We can block the registered user account directly, which is much simpler.



So how do we tell whether a request comes from a real user or from a crawler? We decided to use the simplest method: record the frequency of visits. When a user's visit frequency exceeds a set value (for example, 10 visits in 5 minutes), the user is flagged as a suspected crawler.



In addition, to guard against false positives (some users really do have fast hands), we should also give the user an "unlock" function: entering a verification code proves that the user is not a crawler.









Detailed design:






One of the most important questions is: what do we use to record how often a user visits?



A database? That seems unnecessary. This data does not need long-term retention, and performing an I/O operation on every visit is not acceptable for performance. So we decided to keep it in memory.



But which data structure should hold the record?



In the end, we chose the cache, storing the simplest possible key-value pair of user ID and visit count, because:


    • The cache's automatic expiration purges stale data, so the recorded visit count always covers a bounded period of time.
    • Cache reads and writes are fast, so there is no pressure on performance.
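
To make the idea concrete, here is a minimal sketch of a per-user counter whose entry expires a fixed time after it is created. It is a plain in-memory stand-in, not the ASP.NET cache itself: the names VisitCounterStore and Record are hypothetical, and the clock is passed in as a parameter so the behavior can be checked without a web host.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for the cache: one counter per user, each with an
// absolute expiration that is fixed when the entry is created.
public class VisitCounterStore
{
    private class Entry
    {
        public int Count;
        public DateTime ExpiresAt;
    }

    private readonly Dictionary<string, Entry> _entries = new Dictionary<string, Entry>();
    private readonly TimeSpan _window;

    public VisitCounterStore(TimeSpan window)
    {
        _window = window;
    }

    // Records one visit at time `now` and returns the count in the current window.
    public int Record(string userId, DateTime now)
    {
        Entry entry;
        if (!_entries.TryGetValue(userId, out entry) || now >= entry.ExpiresAt)
        {
            // No entry yet, or the old one has expired: start a new window.
            entry = new Entry { Count = 0, ExpiresAt = now + _window };
            _entries[userId] = entry;
        }
        entry.Count++;
        return entry.Count;
    }
}

public static class Program
{
    public static void Main()
    {
        var store = new VisitCounterStore(TimeSpan.FromMinutes(5));
        var start = new DateTime(2020, 1, 1, 0, 10, 0);

        Console.WriteLine(store.Record("user-42", start));               // 1
        Console.WriteLine(store.Record("user-42", start.AddMinutes(4))); // 2
        Console.WriteLine(store.Record("user-42", start.AddMinutes(5))); // 1 (old entry expired)
    }
}
```

With the real cache, the expiry bookkeeping is done by the framework, so only the counting remains.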


Of course, a few problems remain. Suppose the cache lifetime is 5 minutes and the maximum number of visits is 10. At 0:10 the cache entry is created and starts accumulating visits; by 0:14 it has counted 7 visits, no problem. But at 0:15 the entry expires and is cleared, so at 0:16 the cache holds only the data for the minute from 0:15 to 0:16, not the data for the past 5 minutes (0:11 to 0:16). A user can therefore tune the crawler to visit 9 times, rest for 5 minutes, visit another 9 times, rest for another 5 minutes, and so on.
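
The gap between "count per cache entry" and "count over the past 5 minutes" can be shown with a small simulation. This is only a sketch under assumed numbers (a 5-minute lifetime and a made-up visit schedule); WindowDemo and both method names are hypothetical. It compares the highest count any single cache entry would reach with the highest count in any trailing 5-minute interval.

```csharp
using System;
using System.Collections.Generic;

public static class WindowDemo
{
    // Highest count reached during any single cache entry's lifetime. A new
    // entry starts at the first visit after the previous entry has expired.
    public static int MaxPerCacheWindow(IList<DateTime> visits, TimeSpan window)
    {
        int max = 0, count = 0;
        DateTime expiresAt = DateTime.MinValue;
        foreach (DateTime visit in visits)
        {
            if (visit >= expiresAt)
            {
                count = 0;
                expiresAt = visit + window;
            }
            count++;
            max = Math.Max(max, count);
        }
        return max;
    }

    // Highest number of visits inside any trailing interval of length `window`.
    // Assumes `visits` is sorted in ascending order.
    public static int MaxPerTrailingWindow(IList<DateTime> visits, TimeSpan window)
    {
        int max = 0, oldest = 0;
        for (int newest = 0; newest < visits.Count; newest++)
        {
            while (visits[newest] - visits[oldest] >= window) oldest++;
            max = Math.Max(max, newest - oldest + 1);
        }
        return max;
    }

    public static void Main()
    {
        var t0 = new DateTime(2020, 1, 1, 0, 10, 0);
        var visits = new List<DateTime> { t0 };                          // 0:10:00 opens the first entry
        for (int i = 0; i < 9; i++) visits.Add(t0.AddSeconds(291 + i));  // 0:14:51 .. 0:14:59
        for (int i = 0; i < 10; i++) visits.Add(t0.AddSeconds(300 + i)); // 0:15:00 .. 0:15:09

        Console.WriteLine(MaxPerCacheWindow(visits, TimeSpan.FromMinutes(5)));    // 10
        Console.WriteLine(MaxPerTrailingWindow(visits, TimeSpan.FromMinutes(5))); // 19
    }
}
```

In this schedule no cache entry ever counts more than 10 visits, yet 19 visits land within 19 seconds around the moment the first entry expires.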



Hmm, that is true. Is there really nothing I can do about it? But if he can live with such a low frequency, I do not really mind: go ahead and crawl, slowly. Alternatively, we could do more monitoring in the background, recording every user's visits and analyzing the statistics to find anomalies. At that point a database might really be needed (to improve performance, you could keep a DataTable in memory and synchronize it to the database periodically). For the time being, though, that is not necessary.






There is one more question: is it enough to record only the user's visit frequency?



If, as in the scheme above, only the visit count is kept in the cache and the cached data alone decides whether the user may continue, there is a problem: once the cache entry expires, the user has free access to the target page again! Expiry amounts to automatic unlocking.



I think that is still not right. Once a user has been identified as a crawler, unlocking should only be possible manually (by verification code). So we add a Locked field to the user table in the database: it is set to the current time when the user is locked, and it is null when the user is not locked (or has been unlocked).
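
As a rough sketch, the Locked field maps naturally onto a nullable DateTime. The User class and its Lock and Unlock methods below are hypothetical; only the Locked field and its null-or-timestamp meaning come from the design above.

```csharp
using System;

// Hypothetical user entity; only the Locked column is taken from the text.
public class User
{
    public int Id { get; set; }

    // null = not locked; otherwise the time at which the user was locked.
    public DateTime? Locked { get; set; }

    public bool IsLocked
    {
        get { return Locked.HasValue; }
    }

    public void Lock(DateTime now)
    {
        Locked = now;
    }

    public void Unlock()
    {
        Locked = null;
    }
}

public static class Program
{
    public static void Main()
    {
        var user = new User { Id = 1 };
        Console.WriteLine(user.IsLocked); // False

        user.Lock(new DateTime(2020, 1, 1, 0, 15, 0));
        Console.WriteLine(user.IsLocked); // True

        user.Unlock();
        Console.WriteLine(user.IsLocked); // False
    }
}
```

Storing the lock time rather than a plain flag also records when the lock happened, at no extra cost.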






Implementation:






For reuse, we use an authorization filter (a class derived from AuthorizeAttribute) and do the checking and recording in its OnAuthorization() method.



The code itself is fairly simple. It is just if...else logic:


// 1. Check the database to see whether the current user is locked.
// 2. If the user is locked, intercept immediately. Otherwise:
// 3. Look up the current user's visit count in the cache.
//    3.1 If there is none, create a new cache entry for the user. Otherwise:
//    3.2 Check the user's visit count.
//        3.2.1 If the limit has been reached, intercept and lock the user in the database. Otherwise:
//        3.2.2 Increment the user's visit count.




The condensed code, with comments, is as follows:
using System;
using System.Web;
using System.Web.Caching;
using System.Web.Mvc;
using Autofac;

// ISharedService, AutofacConfig, _NavigatorModel, CacheKey, Config, BaseService
// and visitTooMuch() belong to the project and are not shown here.
public class NeedLogOn : AuthorizeAttribute
{
    public override void OnAuthorization(AuthorizationContext filterContext)
    {
        HttpContextBase context = filterContext.HttpContext;

        // Autofac: resolve the ISharedService instance
        ISharedService service = AutofacConfig.Container.Resolve<ISharedService>();
        _NavigatorModel model = service.Get(); // Current user information from the database

        // Return early to reduce if...else nesting
        if (model.Locked.HasValue)
        {
            // model.Locked comes from the database: the user is already locked, so intercept
            visitTooMuch(filterContext);
            return;
        }

        string cacheKey = CacheKey.MAX_VISIT + model.Id;

        // Interesting: an int value type cannot be used directly here, a reference type is required.
        // Read the entry once, so it cannot expire between a null check and a second read.
        VisitCounter amount = context.Cache[cacheKey] as VisitCounter;
        if (amount == null)
        {
            amount = new VisitCounter { Value = 1 };
            // Create a new cache entry with an absolute expiration
            context.Cache.Add(cacheKey, amount, null,
                DateTime.Now.AddSeconds(Config.Seconds),
                Cache.NoSlidingExpiration, CacheItemPriority.Normal, null);
        }
        else
        {
            if (amount.Value >= Config.MaxVisit)
            {
                // Lock the user in the database
                service.LockCurrentUser();
                BaseService.Commit();

                // Clear the cache entry immediately
                context.Cache.Remove(cacheKey);

                visitTooMuch(filterContext);
                return;
            }
            else
            {
                // Do not use: currentVisitAmount++;
                //             context.Cache[cacheKey] = currentVisitAmount;
                // See: https://stackoverflow.com/questions/2118067/cached-item-never-expiring
                amount.Value++;
            }
        }
    }
}

public class VisitCounter
{
    public int Value { get; set; }
}


Look closely at the code and you will notice two odd things. These are the pits that I, Brother Fei, fell into! O(╥﹏╥)o



1. Why introduce the VisitCounter class?



The cache holds an instance of this class, which in fact just wraps an int Value. Why on earth? Why not use an int directly? Can't an int be stored in the cache?



Well, no! Damn.



Storing it is no problem, and reading it back is fine too, but updating (incrementing) it is a problem. How would you update it?


// Read the value from the cache
int currentVisitAmount = Convert.ToInt32(context.Cache[cacheKey]);

// Increment
currentVisitAmount++;
// Write it back (this is the line that causes the problem)
context.Cache[cacheKey] = currentVisitAmount;


This does not work. For the detailed explanation, see the Stack Overflow question "Cached item never expiring": https://stackoverflow.com/questions/2118067/cached-item-never-expiring



Simply put, the statement context.Cache[cacheKey] = currentVisitAmount; is equivalent to re-inserting a cache item that never expires. I never saw that coming! This bug nearly drove Brother Fei crazy: debugging the cache is troublesome enough as it is, without tricks like this.



So what is the solution? Store a reference-type value in the cache, and then change a property on that instance, without touching the cache entry itself. The code is not repeated here.
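
The difference between the two kinds of value can be seen without a web host. The sketch below uses a plain Dictionary<string, object> as an assumed stand-in for the cache (the real cache also stores its values as object); it shows that incrementing a copied int leaves the stored value unchanged, while incrementing a property on a stored object is visible through the cache without writing the entry again.

```csharp
using System;
using System.Collections.Generic;

public class VisitCounter
{
    public int Value { get; set; }
}

public static class Program
{
    public static void Main()
    {
        // Stand-in for the cache: values are stored as object.
        var cache = new Dictionary<string, object>();

        // Value type: reading unboxes a copy, so the increment never reaches the cache
        // unless the entry is written again.
        cache["int"] = 1;
        int local = (int)cache["int"];
        local++;
        Console.WriteLine(cache["int"]); // 1

        // Reference type: the cache and the local variable refer to the same object,
        // so the entry does not need to be written again.
        cache["counter"] = new VisitCounter { Value = 1 };
        var counter = (VisitCounter)cache["counter"];
        counter.Value++;
        Console.WriteLine(((VisitCounter)cache["counter"]).Value); // 2
    }
}
```

Because the entry is never re-inserted, its original expiration stays in place.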






2. Clear the user's cache entry at the moment the user is locked



Here I took a small detour.



At first, I cleared the user's cache entry when unlocking the user.


[NeedLogOn]
public ActionResult Unlock()
{
    string userId = getCurrentUserId();
    string cacheKey = CacheKey.MAX_VISIT + userId;
    HttpContext.Cache.Remove(cacheKey);

    return View(new ImageCodeModel());
}



The result: for reasons I could not figure out, it worked only some of the time. I ran the local code against the server database in debug mode, stepping through line by line: OK, no problem. But once the local code was published to the server, duang! There is no debugger there, only logs to write. I would rather not think about that pit again...


Then I suddenly noticed a "bad code smell": duplication. Do you see how the same cacheKey is also built in NeedLogOn.OnAuthorization()? Shouldn't reused code be encapsulated? So my first idea was to extract a method that returns the cache key, something like string GetVisitLimitCacheKey(). But that method would have to be callable from both Unlock() in the controller and OnAuthorization() in the filter, so where should it live?



Then came a flash of inspiration: why does Cache.Remove have to be written in Unlock() at all?



In fact, once a user is locked, his cache entry is useless. The database already marks him as locked, so NeedLogOn.OnAuthorization() intercepts him without needing the cache at all! Removing the cache entry as early as possible may even help performance a little.



Most importantly, the code is more cohesive: cacheKey is used within a single method, and all cache operations happen in the same method of the same class, which avoids scattering and coupling the code. More elegant!













