Notes on an optimization: how would you filter a blacklist out of millions of products?

Source: Internet
Author: User

Problem Description:

I recently ran into a problem: the program holds a collection of more than a million products and has to filter out the ones on a product blacklist. The blacklist is configured by hand by the operations team, and it has grown from a handful of entries to hundreds and then to thousands. As a result, the filtering step in the program now takes far too long. How can the filter's run time be shortened?


Analysis:


Suppose the product collection is Map<id, Product> skuMap, where id is the product ID and Product is the product type. Its size is 1,000,000+ entries. The blacklist is re-read every 10 minutes in the form long[] skus; each array element is a product ID, and the array holds on the order of thousands of entries.



Filter scheme:


Scheme one:


Pseudo code:


for (Product p : skuMap.values()) {
    for (long id : skus) {
        if (p.getId() == id) {
            // this product is on the blacklist: filter it out
            p.setFlag(false);
        }
    }
}


This is the scheme as originally written. The idea is simple: two nested for loops.

However, as the blacklist grew from a few entries to hundreds and finally to thousands, this code took longer and longer to run!
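To make the cost of scheme one concrete, here is a minimal runnable sketch of the nested loop that also counts comparisons. The Product class, its getFlag accessor, and the NestedLoopFilter name are assumptions made for illustration; the article only shows getId and setFlag.

```java
import java.util.HashMap;
import java.util.Map;

class NestedLoopFilter {
    // Hypothetical product type: only the id and a visibility flag matter here.
    static final class Product {
        private final long id;
        private boolean flag = true; // true = not filtered out
        Product(long id) { this.id = id; }
        long getId() { return id; }
        boolean getFlag() { return flag; }
        void setFlag(boolean flag) { this.flag = flag; }
    }

    // Scheme one: compare every product against every blacklisted id.
    // Returns the number of comparisons so the products * blacklist cost is visible.
    static long filter(Map<Long, Product> skuMap, long[] skus) {
        long comparisons = 0;
        for (Product p : skuMap.values()) {
            for (long id : skus) {
                comparisons++;
                if (p.getId() == id) {
                    p.setFlag(false);
                }
            }
        }
        return comparisons;
    }

    public static void main(String[] args) {
        Map<Long, Product> skuMap = new HashMap<>();
        for (long id = 1; id <= 1_000; id++) {
            skuMap.put(id, new Product(id));
        }
        long[] skus = {3, 500, 999};
        System.out.println("comparisons = " + filter(skuMap, skus));
        System.out.println("product 500 kept? " + skuMap.get(500L).getFlag());
    }
}
```

The comparison count is always the product count multiplied by the blacklist size, which is why every new blacklist entry adds another full pass over the products.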



Scheme two:


How to shorten the time?


In scheme one, every one of the 1,000,000+ products has to traverse the whole of skus to complete the filter logic, so the number of loop iterations is on the order of 1,000,000+ * 1000+, that is, over a billion. That is far too many! Reduce the number of iterations and the run time drops with it.


Pseudo code:


Arrays.sort(skus);

for (Product p : skuMap.values()) {
    // binarySearch returns an index >= 0 only when the id is present in skus
    int result = Arrays.binarySearch(skus, p.getId());
    if (result >= 0) {
        p.setFlag(false);
    }
}


Once skus has been read, we sort the IDs in it, and each product then does a binary search in skus. For a blacklist of 1000+ entries, a binary search needs at most about 11 comparisons. So in the worst case the number of iterations becomes 1,000,000+ * 11, which is on the order of 10 million.

Scheme two therefore cuts the number of iterations from over a billion to roughly 10 million.

Even so, the filter time printed in the log was still 300s+, which is also unacceptable!
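Scheme two can be sketched as a small runnable class. This is only an illustration under assumptions: the Product class, getFlag, and the BinarySearchFilter name are hypothetical, and the sketch sorts a copy of the blacklist so the caller's array is left untouched, which the article's pseudocode does not do.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

class BinarySearchFilter {
    // Hypothetical product type: only the id and a visibility flag matter here.
    static final class Product {
        private final long id;
        private boolean flag = true; // true = not filtered out
        Product(long id) { this.id = id; }
        long getId() { return id; }
        boolean getFlag() { return flag; }
        void setFlag(boolean flag) { this.flag = flag; }
    }

    // Scheme two: sort the blacklist once, then binary-search it for each product.
    static void filter(Map<Long, Product> skuMap, long[] skus) {
        long[] sorted = skus.clone(); // sort a copy; the input array is not modified
        Arrays.sort(sorted);
        for (Product p : skuMap.values()) {
            // A non-negative result means the id was found in the blacklist.
            if (Arrays.binarySearch(sorted, p.getId()) >= 0) {
                p.setFlag(false);
            }
        }
    }

    public static void main(String[] args) {
        Map<Long, Product> skuMap = new HashMap<>();
        for (long id = 1; id <= 1_000; id++) {
            skuMap.put(id, new Product(id));
        }
        filter(skuMap, new long[]{999, 3, 500});
        System.out.println("product 500 kept? " + skuMap.get(500L).getFlag());
        System.out.println("product 501 kept? " + skuMap.get(501L).getFlag());
    }
}
```

Note the direction of the test: Arrays.binarySearch returns a negative value when the key is absent, so a product is blacklisted only when the result is zero or greater.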



Scheme three:


Why iterate over skuMap and search skus for each product? Why not iterate over the blacklist and look each entry up in the product collection instead?


Pseudo code:


for (long sku : skus) {
    if (skuMap.containsKey(sku)) {
        skuMap.get(sku).setFlag(false);
    }
}


The outer loop is now over the blacklist, and the inner step is a HashMap lookup, which takes roughly constant time on average. The number of iterations drops at once to 1000+, and the program's filter time is shortened to a few seconds!
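A minimal runnable sketch of scheme three follows. The Product class, getFlag, the HashLookupFilter name, and the returned count are assumptions added for illustration. The sketch also uses a single get plus a null check in place of containsKey followed by get, which is a small variation on the pseudocode and not the article's exact form.

```java
import java.util.HashMap;
import java.util.Map;

class HashLookupFilter {
    // Hypothetical product type: only the id and a visibility flag matter here.
    static final class Product {
        private final long id;
        private boolean flag = true; // true = not filtered out
        Product(long id) { this.id = id; }
        long getId() { return id; }
        boolean getFlag() { return flag; }
        void setFlag(boolean flag) { this.flag = flag; }
    }

    // Scheme three: loop over the short blacklist and look each id up in the map.
    // Returns how many products were actually filtered out.
    static int filter(Map<Long, Product> skuMap, long[] skus) {
        int filtered = 0;
        for (long sku : skus) {
            Product p = skuMap.get(sku); // long is boxed to Long, matching the key type
            if (p != null) {
                p.setFlag(false);
                filtered++;
            }
        }
        return filtered;
    }

    public static void main(String[] args) {
        Map<Long, Product> skuMap = new HashMap<>();
        for (long id = 1; id <= 1_000; id++) {
            skuMap.put(id, new Product(id));
        }
        // 5000 is not in the map, so it is skipped.
        int filtered = filter(skuMap, new long[]{3, 500, 5000});
        System.out.println("filtered = " + filtered);
    }
}
```

One thing to watch in code like this: the lookup key must box to the map's key type. If the blacklist were an int[] while the map is keyed by Long, every lookup would miss without any error.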



This article is from the "Boundless Mind Infinite" blog, please be sure to keep this source http://zhangfengzhe.blog.51cto.com/8855103/1707609
