I will not go through much code here; explaining the principle is enough.
Adblock's ad blocking is actually divided into two parts:
1. Intercepting URL requests
This covers the typical case where a DIV element on the page embeds an iframe or image element, which then loads an ad link, a GIF image, or something similar.
The rule base for this part is fairly complex. There are tens of thousands of rules in total, and even the subset relevant to domestic sites may run to 1000 or so.
But I don't think it needs to be that complicated: extract the 100 most common rules, and those 100 should be enough to block roughly 80% of ad requests.
The syntax of these rules is basically prefix or suffix string matching against the domain and path of the URL. Some additional attributes may be attached (in theory, these attributes should be available from the NetworkRequest object).
The authors of Adblock Plus actually use JavaScript to translate these URL matching rules into regular expressions, and then filter the target URLs with those regular expressions.
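To make the rule-to-regex idea concrete, here is a minimal sketch in Python. It is not the Adblock Plus implementation; it only assumes the commonly documented filter tokens ("||" for a domain anchor, "|" for a start or end anchor, "*" for a wildcard, "^" for a separator character) and ignores rule options after "$". The function name filter_to_regex is made up for this illustration.

```python
import re

def filter_to_regex(rule: str) -> re.Pattern:
    """Translate a simplified Adblock-style URL filter into a compiled regex."""
    # Options such as "$script,third-party" are dropped; this sketch ignores them.
    pattern = rule.split("$", 1)[0]
    parts = []
    i, n = 0, len(pattern)
    if pattern.startswith("||"):
        # "||" anchors the match at the start of a domain label:
        # any scheme, followed by an optional chain of subdomains.
        parts.append(r"^[a-z][a-z0-9+.-]*://([^/?#]*\.)?")
        i = 2
    while i < n:
        ch = pattern[i]
        if ch == "|" and i == 0:
            parts.append("^")            # anchor at the start of the URL
        elif ch == "|" and i == n - 1:
            parts.append("$")            # anchor at the end of the URL
        elif ch == "*":
            parts.append(".*")           # wildcard
        elif ch == "^":
            # separator: anything except letters, digits, "_", "-", ".", "%",
            # or the end of the URL
            parts.append(r"(?:[^\w\-.%]|$)")
        else:
            parts.append(re.escape(ch))  # literal character
        i += 1
    return re.compile("".join(parts), re.IGNORECASE)

# Example usage with made-up rules and URLs
domain_rule = filter_to_regex("||ads.example.com^")
print(bool(domain_rule.search("http://ads.example.com/banner.gif")))
print(bool(domain_rule.search("http://example.com/ads.example.com/")))
```

Compiling each rule once and reusing the compiled pattern keeps the per-request cost down to the regex match itself.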
Of course, this can also be implemented in Java. As already mentioned, the rules are essentially prefix or suffix matches: prefixes can be handled with a trie, and suffixes can be reversed and then treated as prefixes.
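The trie idea can be sketched as follows. The example is in Python rather than Java purely for brevity, and the class names Trie and PrefixSuffixMatcher, as well as the sample rules, are assumptions made for this illustration. Suffix rules are stored reversed, and the URL is reversed before the lookup, so a single data structure serves both cases.

```python
_END = ""  # sentinel key marking the end of a stored string

class Trie:
    def __init__(self):
        self.root = {}

    def add(self, key):
        node = self.root
        for ch in key:
            node = node.setdefault(ch, {})
        node[_END] = True

    def has_prefix_of(self, text):
        """Return True if any stored key is a prefix of text."""
        node = self.root
        for ch in text:
            if _END in node:
                return True
            node = node.get(ch)
            if node is None:
                return False
        return _END in node

class PrefixSuffixMatcher:
    def __init__(self, prefixes, suffixes):
        self.prefix_trie = Trie()
        self.suffix_trie = Trie()
        for p in prefixes:
            self.prefix_trie.add(p)
        for s in suffixes:
            # store suffixes reversed so they can be matched as prefixes
            self.suffix_trie.add(s[::-1])

    def matches(self, url):
        return (self.prefix_trie.has_prefix_of(url)
                or self.suffix_trie.has_prefix_of(url[::-1]))

# Example usage with made-up rules
matcher = PrefixSuffixMatcher(["http://ads.example.com/"], ["/banner.gif", ".swf"])
print(matcher.matches("http://site.example.org/img/banner.gif"))
```

Each lookup walks at most the length of the URL, regardless of how many rules are loaded.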
For keywords in the query string, you can use the AC (Aho-Corasick) algorithm, which is based on a precompiled automaton. (I think the AC algorithm is really just a simplified version of a regular expression of the form key1|key2|key3|...|keyN.)
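For reference, a compact Aho-Corasick sketch in Python is shown here. It follows the textbook construction (a keyword trie plus failure links built breadth-first); the class name and the sample keywords are assumptions for illustration and are not taken from any ad-blocking rule list.

```python
from collections import deque

class AhoCorasick:
    def __init__(self, keywords):
        self.goto = [{}]   # goto[state][char] -> next state
        self.fail = [0]    # failure link for each state
        self.out = [[]]    # keywords recognised when entering each state
        for kw in keywords:
            if kw:         # skip empty keywords
                self._insert(kw)
        self._build_failure_links()

    def _insert(self, kw):
        state = 0
        for ch in kw:
            nxt = self.goto[state].get(ch)
            if nxt is None:
                nxt = len(self.goto)
                self.goto.append({})
                self.fail.append(0)
                self.out.append([])
                self.goto[state][ch] = nxt
            state = nxt
        self.out[state].append(kw)

    def _build_failure_links(self):
        # Depth-1 states keep their failure link at the root (0).
        queue = deque(self.goto[0].values())
        while queue:
            state = queue.popleft()
            for ch, nxt in self.goto[state].items():
                queue.append(nxt)
                f = self.fail[state]
                while f and ch not in self.goto[f]:
                    f = self.fail[f]
                self.fail[nxt] = self.goto[f].get(ch, 0)
                # inherit keywords that end at the failure target
                self.out[nxt] = self.out[nxt] + self.out[self.fail[nxt]]

    def search(self, text):
        """Return (start_index, keyword) for every keyword occurrence in text."""
        state, found = 0, []
        for i, ch in enumerate(text):
            while state and ch not in self.goto[state]:
                state = self.fail[state]
            state = self.goto[state].get(ch, 0)
            for kw in self.out[state]:
                found.append((i - len(kw) + 1, kw))
        return found

# Example usage with made-up keywords
ac = AhoCorasick(["ad", "ads", "banner", "track"])
print(ac.search("/ads/banner?track=1"))
```

The automaton is built once from the keyword list, and each URL is then scanned in a single pass, which is the advantage over trying every keyword separately.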
Once a URL matches, return true from the shouldOverrideUrlLoading function. Returning true signals that the request has been handled, but nothing is actually done with it, so the URL request is effectively blocked. (shouldOverrideUrlLoading is primarily meant for special scheme protocols; here it is being repurposed for URL request interception.) There also seems to be a separate shouldIgnoreNetworkRequest.
2. Handling ad content embedded in the page DOM
Based on the rules, locate these DOM elements with CSS3 selectors, then set their display to none !important.
Adblock Plus's filtering of page content is site-specific, that is, rules are looked up by an exact match on the domain string. A simple HashMap performs well enough here.
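The lookup described above could be sketched like this. The rule table HIDE_RULES, the selectors in it, and the function names are hypothetical; the sketch only shows an exact-match dictionary lookup on the host name followed by generating one CSS rule that hides the matched elements.

```python
from urllib.parse import urlsplit

# Hypothetical per-site element hiding rules: domain -> CSS selectors
HIDE_RULES = {
    "example.com": ["#top-banner", "div.ad-box"],
    "news.example.org": ["iframe[src*='ads']"],
}

def selectors_for(url, rules=HIDE_RULES):
    """Return the selectors registered for the exact host of the given URL."""
    host = urlsplit(url).hostname or ""
    return rules.get(host, [])

def build_hide_css(selectors):
    """Build one CSS rule that hides every matched element."""
    if not selectors:
        return ""
    return ", ".join(selectors) + " { display: none !important; }"

# Example usage
print(build_hide_css(selectors_for("http://example.com/page")))
```

The resulting stylesheet text would then be injected into the page; how the injection happens depends on the host browser and is outside this sketch.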
The problem is that some ad content is loaded with a delay (via a timer). For that content, the only option is to wait a few seconds for the ads to appear and only then inject and run the JS script.
Of course, this approach is not very good. It would be better for the browser kernel to run a daemon that monitors DOM mutation events: when it detects that a new DOM node has been added (this only applies after DOMContentLoaded, of course), it sends a notification to the client, and the client then schedules the ad-hiding script to run again.
Both kinds of processing above look effective but can be defeated in practice. Imagine a site that requires users to visit an ad server to obtain a special cookie, and only that cookie lets them access resources. Then the first method is useless. The second is even simpler to defeat: in principle, if ads and normal content are mixed together, no computer algorithm can tell which is advertising and which is normal content, unless people are employed to maintain the rules, or legislation is passed.
From a personal point of view, it is fine to let some ads download and display. If users find one annoying, provide a UI entry point for adding a rule. As for the more obnoxious kinds, such as pop-up windows and flashing GIFs, it is fine to simply kill them.
Implementation principle of Adblock ad blocker plugin