Crawling Zhenai.com with Go: displaying user profile data

Source: Internet
Author: User

A crawler written in Golang scraped Zhenai.com, collected more than 30,000 user profiles, and stored them in Elasticsearch. As shown below, a query returns more than 30,000 user profiles.

Image.png

Let's take a look at the final result:

42.gif

The page is rendered with the Go html/template package.

Rendering the template:

    func (s SearchResultView) Render(w io.Writer, data model.SearchResult) error {
        return s.template.Execute(w, data)
    }

The model.SearchResult data structure is as follows:

    type SearchResult struct {
        Hits        int64
        Start       int
        Query       string
        PrevFrom    int
        NextFrom    int
        CurrentPage int
        TotalPage   int64
        Items       []interface{}
        //Items []engine.Item
    }

The HTML template, which begins with <!DOCTYPE html>, uses the variables, functions, conditionals, and loops of the template syntax.
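As a rough illustration of those syntax elements, here is a minimal sketch. The page struct and the template text are hypothetical stand-ins, not the project's actual template; they only show a variable, a conditional, and a loop in html/template.

```go
package main

import (
	"html/template"
	"os"
	"strings"
)

// page is a hypothetical, reduced stand-in for model.SearchResult.
type page struct {
	Query string
	Hits  int
	Items []string
}

// tmplText uses a variable ({{.Query}}), a conditional ({{if}})
// and a loop ({{range}}).
const tmplText = `<h1>Results for {{.Query}}</h1>
{{if gt .Hits 0}}<ul>{{range .Items}}<li>{{.}}</li>{{end}}</ul>{{else}}<p>No results</p>{{end}}`

// render executes the template against p and returns the generated HTML.
func render(p page) (string, error) {
	tpl, err := template.New("results").Parse(tmplText)
	if err != nil {
		return "", err
	}
	var sb strings.Builder
	if err := tpl.Execute(&sb, p); err != nil {
		return "", err
	}
	return sb.String(), nil
}

func main() {
	out, err := render(page{Query: "female", Hits: 2, Items: []string{"Alice", "Bob"}})
	if err != nil {
		panic(err)
	}
	os.Stdout.WriteString(out + "\n")
}
```

When Hits is 0, the else branch is rendered and the list is skipped entirely.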

Defining template functions:
In the template code, the href attributes of the previous-page and next-page <a> tags use the custom template functions Add and Sub to compute the previous and next page numbers, which are then sent to the backend (this is done in the template rather than in JavaScript).

The html/template package provides limited built-in functionality, so user-defined functions are often needed to help render a page. Here is how template functions are used. The template package supports them when a new template is created: the Funcs method registers a custom set of functions with the template, and files subsequently parsed into that template can call those functions directly.

Function declaration:

    // Funcs adds the elements of the argument map to the template's function map.
    // It panics if a value in the map is not a function with appropriate return
    // type. However, it is legal to overwrite elements of the map. The return
    // value is the template, so calls can be chained.
    func (t *Template) Funcs(funcMap FuncMap) *Template {
        t.text.Funcs(template.FuncMap(funcMap))
        return t
    }

The Funcs method registers our template functions. It takes a parameter of type FuncMap:

    // FuncMap is the type of the map defining the mapping from names to
    // functions. Each function must have either a single return value, or two
    // return values of which the second has type error. In that case, if the
    // second (error) argument evaluates to non-nil during execution, execution
    // terminates and Execute returns that error. FuncMap has the same base type
    // as FuncMap in "text/template", copied here so clients need not import
    // "text/template".
    type FuncMap map[string]interface{}

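How to use:

Define two functions, Add and Sub, in the Go code:

    // Sub performs subtraction; it is used in the template to subtract 1.
    func Sub(a, b int) int {
        return a - b
    }

    // Add performs addition; it is used in the template to add 1.
    func Add(a, b int) int {
        return a + b
    }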

Binding the functions to the template:

Create a map of type FuncMap, where each key is the name of a template function and each value is one of the functions just defined.
Inject the FuncMap into the template.

    filename := "../view/template_test.html"
    // Name the variable tpl so that it does not shadow the template package.
    tpl, err := template.New(path.Base(filename)).Funcs(template.FuncMap{"Add": Add, "Sub": Sub}).ParseFiles(filename)
    if err != nil {
        t.Fatal(err)
    }

Using the functions in the template:

For example, for the previous-page link in the HTML template:

    {{Sub .CurrentPage 1}}

When rendered, this subtracts 1 from the CurrentPage value.
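Putting the pieces together, the following is a minimal self-contained sketch of pagination links built with Add and Sub. The pager struct, the template text, and the ?page= query parameter are assumptions made for illustration; the project's real template and URL scheme may differ.

```go
package main

import (
	"fmt"
	"html/template"
	"strings"
)

// Sub and Add mirror the two helper functions defined in the article.
func Sub(a, b int) int { return a - b }
func Add(a, b int) int { return a + b }

// pager is a hypothetical, reduced version of model.SearchResult.
type pager struct {
	CurrentPage int
	TotalPage   int
}

// pagerText renders a "Previous" link unless we are on the first page
// and a "Next" link unless we are on the last page.
const pagerText = `{{if gt .CurrentPage 1}}<a href="?page={{Sub .CurrentPage 1}}">Previous</a>{{end}}` +
	`{{if lt .CurrentPage .TotalPage}}<a href="?page={{Add .CurrentPage 1}}">Next</a>{{end}}`

// renderPager registers the functions BEFORE parsing, then executes.
func renderPager(p pager) (string, error) {
	tpl, err := template.New("pager").
		Funcs(template.FuncMap{"Add": Add, "Sub": Sub}).
		Parse(pagerText)
	if err != nil {
		return "", err
	}
	var sb strings.Builder
	if err := tpl.Execute(&sb, p); err != nil {
		return "", err
	}
	return sb.String(), nil
}

func main() {
	out, err := renderPager(pager{CurrentPage: 2, TotalPage: 3})
	if err != nil {
		panic(err)
	}
	fmt.Println(out)
}
```

If the Funcs call were moved after Parse, parsing would fail because Add and Sub would not yet be defined.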

Notes:

1. The functions must be injected before ParseFiles is called, because the parser needs to know the functions when it parses the template.

2. A Template object can hold multiple templates, each with its own name. If you look at the implementation of ParseFiles, you will see that it uses the file name as the template name inside the Template object. So either give the Template object the same name as your file (probably not generally practical) or use ExecuteTemplate instead of just Execute.

3. The name of the template is the bare file name of the template, not the complete path. If the template name is wrong, executing it produces:

error: template: “…” is an incomplete or empty template

I ran into the third point myself today: the template name must be the file's base name, not a name that includes the path. See the following code:

    func TestTemplate3(t *testing.T) {
        //filename := "crawler/frontend/view/template.html"
        filename := "../view/template_test.html"
        //file, _ := os.Open(filename)
        t.Logf("basename:%s\n", path.Base(filename))
        // Problem: the template is named with the full path, not the base name.
        tpl, err := template.New(filename).Funcs(template.FuncMap{"Add": Add, "Sub": Sub}).ParseFiles(filename)
        if err != nil {
            t.Fatal(err)
        }
        page := common.SearchResult{}
        page.Hits = 123
        page.Start = 0
        item := engine.Item{
            Url:  "http://album.zhenai.com/u/107194488",
            Type: "zhenai",
            Id:   "107194488",
            Payload: model.Profile{
                Name:       "Neon",
                Age:        28,
                Height:     157,
                Marriage:   "Unmarried",
                Income:     "5001-8000 yuan",
                Education:  "Secondary School",
                Occupation: "Program Daughter",
                Gender:     "Female",
                House:      "Has bought a house",
                Car:        "Has bought cars",
                Hukou:      "Xuhui District, Shanghai",
                Xinzuo:     "Aquarius",
            },
        }
        page.CurrentPage = 1
        page.TotalPage = 10
        page.Items = append(page.Items, item)
        afterHtml, err := os.Create("template_test1.html")
        if err != nil {
            t.Fatal(err)
        }
        // The error returned by Execute is not checked here.
        tpl.Execute(afterHtml, page)
    }

Here template.New(filename) is passed the file name including its path, so template_test1.html is empty after the code runs. The test itself still passes, because the error returned by Execute is not checked, but when the page is rendered in the browser, the following error is reported:

 template: “…” is an incomplete or empty template

So use the file's base name instead:

tpl, err := template.New(path.Base(filename)).Funcs(template.FuncMap{"Add": Add, "Sub": Sub}).ParseFiles(filename)

After the code is run with this change, template_test1.html is rendered with content.

The rest of the template syntax, such as variables, conditionals, and loops, is relatively simple and caused me no problems. I did not use other features such as template nesting, so they are not covered here.

A problem encountered when querying:

The search page shows 10 records per page. Page 1000 works normally, but requesting page 1001 or any later page reports the following error:


Image.png

With a REST client tool, the error is clearer:

    {
      "error": {
        "root_cause": [
          {
            "type": "query_phase_execution_exception",
            "reason": "Result window is too large, from + size must be less than or equal to: [10000] but was [10010]. See the scroll api for a more efficient way to request large data sets. This limit can be set by changing the [index.max_result_window] index level setting."
          }
        ],
        "type": "search_phase_execution_exception",
        "reason": "all shards failed",
        "phase": "query",
        "grouped": true,
        "failed_shards": [
          {
            "shard": 0,
            "index": "dating_profile",
            "node": "bjhldvt6qeartvhmbkht4q",
            "reason": {
              "type": "query_phase_execution_exception",
              "reason": "Result window is too large, from + size must be less than or equal to: [10000] but was [10010]. See the scroll api for a more efficient way to request large data sets. This limit can be set by changing the [index.max_result_window] index level setting."
            }
          }
        ]
      },
      "status": 500
    }

A search showed that this is caused by a limit in Elasticsearch's default deep-paging mechanism. A drawback of ES paging is that if, for example, there are 5,010 documents and you only want documents 5,000 through 5,010, ES still loads the first 5,000 documents into memory. To keep an excessively deep paging request from exhausting the memory of the ES server, ES limits deep pagination by default: from + size may not exceed 10,000. That is exactly why the "Result window is too large" exception appears when results beyond the 10,000th are requested. (For page 1001, the backend computes (1001 - 1) * 10 = 10000 as the from value of the ES query. With a size of 10, from + size is 10010, and ES by default requires from + size to be less than or equal to index.max_result_window, the maximum window value.)
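The arithmetic can be checked with a short sketch. The function and constant names here are hypothetical; the values 10 and 10000 are the page size and default window described above.

```go
package main

import "fmt"

const (
	pageSize        = 10    // records per page, as in the article
	maxResultWindow = 10000 // Elasticsearch default for index.max_result_window
)

// fromForPage converts a 1-based page number into the "from" offset.
func fromForPage(page int) int {
	return (page - 1) * pageSize
}

// withinWindow reports whether from + size stays within the limit.
func withinWindow(page int) bool {
	return fromForPage(page)+pageSize <= maxResultWindow
}

func main() {
	for _, p := range []int{1000, 1001} {
		from := fromForPage(p)
		fmt.Printf("page %d: from=%d from+size=%d ok=%t\n", p, from, from+pageSize, withinWindow(p))
	}
	// page 1000: from=9990 from+size=10000 ok=true
	// page 1001: from=10000 from+size=10010 ok=false
}
```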

To solve this problem, you can raise index.max_result_window, the maximum window value for deep paging, as follows:

curl -XPUT http://127.0.0.1:9200/dating_profile/_settings -d '{ "index" : { "max_result_window" : 50000}}'

Here dating_profile is the name of the index to modify, and 50000 is the new window size. After the window is adjusted, results beyond the 10,000th can be retrieved.
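For readers who prefer to issue the same request from Go instead of curl, here is a hedged sketch using only the standard library. The helper names are hypothetical, and the Content-Type header is an addition that is not in the curl command above; it is set because recent Elasticsearch versions expect a content type on requests that carry a body.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

// indexSettings models the JSON body {"index": {"max_result_window": N}}.
type indexSettings struct {
	Index struct {
		MaxResultWindow int `json:"max_result_window"`
	} `json:"index"`
}

// settingsBody returns the JSON body for the given window size.
func settingsBody(window int) ([]byte, error) {
	var s indexSettings
	s.Index.MaxResultWindow = window
	return json.Marshal(s)
}

// newSettingsRequest builds (but does not send) the PUT request.
func newSettingsRequest(baseURL, index string, window int) (*http.Request, error) {
	body, err := settingsBody(window)
	if err != nil {
		return nil, err
	}
	req, err := http.NewRequest(http.MethodPut, baseURL+"/"+index+"/_settings", bytes.NewReader(body))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Content-Type", "application/json")
	return req, nil
}

func main() {
	req, err := newSettingsRequest("http://127.0.0.1:9200", "dating_profile", 50000)
	if err != nil {
		panic(err)
	}
	fmt.Println(req.Method, req.URL)
	// To apply the setting, send the request against a running cluster,
	// for example with http.DefaultClient.Do(req).
}
```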

Precautions

The change above solves our problem, but it introduces another issue that needs attention: once the window value is increased, deeper pages can be requested, but at the cost of more server memory and CPU. Consider whether large paging requests in your business scenario could cause OutOfMemory problems in the cluster. Deep paging is also discussed in the official ES documentation:

https://www.elastic.co/guide/en/elasticsearch/guide/current/pagination.html


The key point is as follows:

Depending on the size of your documents, the number of shards, and the hardware you are using, paging 10,000 to 50,000 results (1,000 to 5,000 pages) deep should be perfectly doable. But with big-enough from values, the sorting process can become very heavy indeed, using vast amounts of CPU, memory, and bandwidth. For this reason, we strongly advise against deep paging.

In other words, depending on document size, the number of shards, and the hardware used, paging 10,000 to 50,000 results (1,000 to 5,000 pages) deep should be entirely feasible. With large enough from values, however, the sorting process becomes very heavy and consumes a great deal of CPU, memory, and bandwidth. For this reason, deep paging is strongly discouraged.

ES is a search engine, so it is better suited to searching than to traversing large result sets. In most scenarios there is no need to fetch more than 10,000 results; returning only the first 1,000, for example, is enough. If you really do need to traverse and display a large amount of data, consider whether some other storage is more appropriate. Alternatively, depending on the business scenario, you can use the Elasticsearch scroll API (similar to an iterator, but with a time window) instead.
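One way to follow this advice without raising the window is to cap the number of pages the frontend offers. The sketch below is not from the project; the function names are hypothetical and it assumes the page size of 10 and the default window of 10,000 discussed above.

```go
package main

import "fmt"

// maxPage returns the last page that can be requested without
// from + size exceeding the result window.
func maxPage(window, pageSize int) int {
	return window / pageSize
}

// displayPages returns the number of pages to offer for a given hit
// count, capped so that no link points beyond the result window.
func displayPages(hits int64, pageSize, window int) int {
	// Round up so that a partially filled last page still counts.
	total := int((hits + int64(pageSize) - 1) / int64(pageSize))
	if limit := maxPage(window, pageSize); total > limit {
		return limit
	}
	return total
}

func main() {
	fmt.Println(displayPages(30000, 10, 10000)) // capped at 1000
	fmt.Println(displayPages(95, 10, 10000))    // 10
}
```

With such a cap in place, a value like TotalPage never leads to a request that Elasticsearch would reject.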

With this change the problem is solved, as shown here:

Image.png

Project code: https://github.com/ll837448792/crawler


