A Go Crawler: What Games Do TapTap Users Like?

Source: Internet
Author: User

Preamble

When it comes to web crawlers, the first language that comes to mind is Python, which is very popular for machine learning, crawling, and data analysis. But I have been learning Go recently, so I wrote this one in Go.

TapTap Community

TapTap is a high-quality game-sharing community; you could call it the Steam of mobile phones. The users there are of a high caliber, with many core players. Seeing them put so much care into writing long reviews amazed me, so I decided to use the site to practice crawling data. Let's look at the results first.

First, the results: what types of games do the players here like?

By word frequency of the game tags on the download list:

Tags such as single-player, anime-style ("two-dimensional"), MOBA, and strategy stand out.

Now let's add the player rating as a weight. Each game's score is based on ratings from up to tens of thousands of players, and when several games share the same tag, their scores are averaged.
See what has changed?
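The per-tag averaging described above can be sketched as follows. This is a minimal illustration, not the article's exact code: the `game` type, the `averageScoreByTag` function, and the sample data are all made up for the example.

```go
package main

import (
	"fmt"
	"sort"
)

// game is a hypothetical, trimmed-down record used only for this sketch.
type game struct {
	Name  string
	Score float64
	Tags  []string
}

// averageScoreByTag returns, for each tag, the mean score of all games
// that carry that tag.
func averageScoreByTag(games []game) map[string]float64 {
	sum := make(map[string]float64)
	count := make(map[string]int)
	for _, g := range games {
		for _, tag := range g.Tags {
			sum[tag] += g.Score
			count[tag]++
		}
	}
	avg := make(map[string]float64, len(sum))
	for tag, total := range sum {
		avg[tag] = total / float64(count[tag])
	}
	return avg
}

func main() {
	// Made-up sample data, not taken from TapTap.
	games := []game{
		{"A", 9.0, []string{"puzzle", "indie"}},
		{"B", 7.0, []string{"puzzle"}},
		{"C", 8.0, []string{"strategy"}},
	}
	avg := averageScoreByTag(games)

	// Map iteration order is random, so sort the tags for stable output.
	tags := make([]string, 0, len(avg))
	for tag := range avg {
		tags = append(tags, tag)
	}
	sort.Strings(tags)
	for _, tag := range tags {
		fmt.Printf("%s;%.1f\n", tag, avg[tag])
	}
}
```

A tag carried by one highly rated game can outrank a tag shared by many average games, which is why the weighted word cloud looks so different from the frequency-based one.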

The word cloud is completely different. Tags such as visual illusion, imaginative ("brain hole"), and philosophy now score highest, and these reflect what players really prefer. Why does adding the weight change things so much? Let's see which games score so high.

It turns out they are games such as Monument Valley, "Ape Riding", and "Ash". Monument Valley (tag: visual illusion) even reached a score of 10 points (7,951 ratings)!

That game really won me over too. Even my mother and my wife, who don't play games, are very fond of it.


From here on, the scoring weight is included, to see what players really want.

Next, let's analyze the New list.

Game names (based on rank weight + scoring weight)

Compare our analysis with the actual list. What's different?

As you can see, once the scoring weight is added, games like "My Name MT4" and "Ace War: Code Hero", which rank near the top but have a poor reputation, almost disappear from our word cloud. (So on TapTap, paying to push a game up the list doesn't help much. The players have sharp eyes, haha.)
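To see why a top-ranked game with a poor score nearly vanishes, here is a small sketch of one way to combine the two weights: multiply the rank weight (150 minus the rank) by the score. The function name `combinedWeight` and the sample ranks and scores are assumptions made for illustration.

```go
package main

import (
	"fmt"
	"math"
)

// listSize is the number of entries on each leaderboard (150 in this article).
const listSize = 150

// combinedWeight multiplies the rank weight by the player score, so a game
// needs both a good position and a good rating to get a large weight.
func combinedWeight(rank int, score float64) int {
	return int(math.Ceil(float64(listSize-rank) * score))
}

func main() {
	// Made-up values: a top-ranked game with a poor score
	// versus a mid-ranked game with an excellent score.
	fmt.Println(combinedWeight(1, 3.0))  // 149 * 3.0 = 447
	fmt.Println(combinedWeight(50, 9.5)) // 100 * 9.5 = 950
}
```

With these numbers the game at rank 50 ends up with roughly twice the weight of the game at rank 1, so it is drawn much larger in the word cloud.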

Reserve (Pre-registration) List

Game names (based on rank weight + scoring weight)

Here you can see what players are looking forward to in upcoming releases: "Full-time Awakening", "Fortnite", and others are highly anticipated.

Most Played List

Game names (based on rank weight + scoring weight)

"PUBG: Exciting Battlefield" also stands out, so it seems TapTap players like battle royale ("eating chicken") games too.

Implementation

goquery: parsing HTML
iconv-go: encoding conversion
sego: Chinese word segmentation
WordArt: generating the word cloud

This is a simple version for now. In the full version I also want to crawl a game's player reviews and run word segmentation and sentiment analysis on them.

First analyze the HTML structure to find which elements hold the game information, then parse them with goquery.

In Google Chrome, press F12 to find the elements easily.

Then define a struct to hold the data:

type GameInfo struct {
    Rank     int      // rank
    TapTapID string   // game ID
    Name     string   // game name
    Company  string   // company name
    Score    float64  // game rating
    IconUrl  string   // icon URL
    Type     string   // game type
    tags     []string // tags
}

Parsing a Single Game's Information

// Parse the information for a single game.
func ParseGameInfoCell(selection *goquery.Selection) {
    gameInfo := GameInfo{}
    nameA := selection.Find(".card-middle-title")
    // The game ID is the last path segment of the title link.
    gameInfo.TapTapID = nameA.AttrOr("href", "")
    gameInfo.TapTapID = gameInfo.TapTapID[strings.LastIndex(gameInfo.TapTapID, "/")+1:]
    gameInfo.Name = nameA.Find("h4").Text()
    gameInfo.Company = selection.Find(".card-middle-author").Find("a").Text()
    score, _ := strconv.ParseFloat(selection.Find(".middle-footer-rating").Find("span").Text(), 64)
    gameInfo.Score = score
    gameInfo.IconUrl = selection.Find(".card-left-image").Find("img").AttrOr("src", "")
    // The bit size was garbled in the source listing; 64 is assumed here.
    tempRank, _ := strconv.ParseInt(selection.Find(".top-card-order-text").Text(), 10, 64)
    gameInfo.Rank = int(tempRank)
    gameInfo.Type = selection.Find(".card-middle-footer").Find("a").Text()
    tagsAList := selection.Find(".card-tags").Find("a")
    tagsAList.Each(func(i int, selectionA *goquery.Selection) {
        gameInfo.tags = append(gameInfo.tags, selectionA.Text())
    })
    gameInfoList = append(gameInfoList, gameInfo)
    //fmt.Printf("%v\n", gameInfo)
}
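The game ID is taken from the last path segment of the title link. Here is a standalone sketch of that one step using only the standard library. The helper name `lastPathSegment` and the example URL shape are assumptions for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// lastPathSegment returns everything after the final "/" in href.
// If href contains no "/", strings.LastIndex returns -1, so the slice
// starts at 0 and the whole string is returned unchanged.
func lastPathSegment(href string) string {
	return href[strings.LastIndex(href, "/")+1:]
}

func main() {
	// Hypothetical link shape; the real href may differ.
	fmt.Println(lastPathSegment("https://www.taptap.com/app/12345"))
	fmt.Println(lastPathSegment("12345"))
}
```

Because a missing "/" yields index -1, the expression is safe even when the `href` attribute is absent and the fallback empty string is used.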

But a problem soon came up: the leaderboard data is paginated, and each request returns only 30 entries. Looking at the "More" button, I found that it fetches the next batch asynchronously via an Ajax request to this link:

https://www.taptap.com/ajax/top/played?page=2&total=30

Here page is the page number. With 150 entries in total and 30 per page, there are 5 pages, so we can loop 5 times to request all the data.
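The page arithmetic can be sketched like this. The helper names `pageCount` and `pageURLs` are made up for the example; the base URL is the one shown above.

```go
package main

import (
	"fmt"
	"strconv"
)

const baseURL = "https://www.taptap.com/ajax/top/"

// pageCount returns how many pages are needed to cover total entries,
// rounding up when the last page is only partly filled.
func pageCount(total, perPage int) int {
	return (total + perPage - 1) / perPage
}

// pageURLs builds one request URL per page for the given leaderboard type.
func pageURLs(rankType string, total, perPage int) []string {
	n := pageCount(total, perPage)
	urls := make([]string, 0, n)
	for page := 1; page <= n; page++ {
		urls = append(urls, baseURL+rankType+"?page="+strconv.Itoa(page))
	}
	return urls
}

func main() {
	for _, u := range pageURLs("played", 150, 30) {
		fmt.Println(u)
	}
}
```

Computing the page count instead of hard-coding 5 keeps the loop correct if the list size or page size ever changes.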

func ReqRankPage(page int) {
    res, err := http.Get("https://www.taptap.com/ajax/top/" + rankTypeName + "?page=" + strconv.Itoa(page))
    if err != nil {
        log.Fatal(err)
    }
    defer res.Body.Close()
    if res.StatusCode != 200 {
        log.Fatalf("status code error: %d %s", res.StatusCode, res.Status)
    }
    jsonBs, err := ioutil.ReadAll(res.Body)
    if err != nil {
        log.Fatal(err)
    }
    // The response is JSON; the HTML fragment for this page is in Data.Html.
    tPageJson := TPageJson{}
    err = json.Unmarshal(jsonBs, &tPageJson)
    if err != nil {
        fmt.Println("JSON parse error:", err)
    }
    var htmlRead io.Reader = strings.NewReader(tPageJson.Data.Html)
    doc, err := goquery.NewDocumentFromReader(htmlRead)
    if err != nil {
        log.Fatal(err)
    }
    doc.Find(".taptap-top-card").Each(func(i int, selection *goquery.Selection) {
        ParseGameInfoCell(selection)
    })
}
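The Ajax response wraps the HTML fragment in a JSON envelope, which is why the body is decoded before it is handed to goquery. The sketch below shows only the decoding step with the standard library. The lowercase JSON key names, the `pageJSON` type, the `decodePage` helper, and the sample payload are all assumptions for illustration, not confirmed details of the TapTap API.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// pageJSON mirrors the envelope assumed here: a success flag plus a data
// object that carries the HTML fragment for one page.
type pageJSON struct {
	Success bool `json:"success"`
	Data    struct {
		HTML string `json:"html"`
		Next string `json:"next"`
	} `json:"data"`
}

// decodePage extracts the HTML fragment from a response body.
func decodePage(body []byte) (string, error) {
	var p pageJSON
	if err := json.Unmarshal(body, &p); err != nil {
		return "", fmt.Errorf("decode page JSON: %w", err)
	}
	if !p.Success {
		return "", errors.New("server reported failure")
	}
	return p.Data.HTML, nil
}

func main() {
	// Made-up payload in the assumed shape.
	sample := []byte(`{"success":true,"data":{"html":"<div class=\"taptap-top-card\"></div>","next":""}}`)
	html, err := decodePage(sample)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(html)
}
```

Returning an error instead of printing and carrying on means a malformed response never reaches the HTML parser as an empty document.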

Full Code

package main

import (
    "bytes"
    "encoding/json"
    "fmt"
    "github.com/PuerkitoBio/goquery"
    "io"
    "io/ioutil"
    "log"
    "math"
    "net/http"
    "strconv"
    "strings"
)

type TPageJson struct {
    Success bool          `json:"success"`
    Data    TPageDataJson `json:"data"`
}

type TPageDataJson struct {
    Html string `json:"html"`
    Next string `json:"next"`
}

type GameInfo struct {
    Rank     int      // rank
    TapTapID string   // game ID
    Name     string   // game name
    Company  string   // company name
    Score    float64  // game rating
    IconUrl  string   // icon URL
    Type     string   // game type
    tags     []string // tags
}

var gameInfoList []GameInfo
var rankTypeName = "reserve"
var rankTypes = []string{"download", "new", "reserve", "sell", "played"}

func main() {
    for _, typeName := range rankTypes {
        gameInfoList = []GameInfo{}
        rankTypeName = typeName
        // Each leaderboard has 5 pages of data (150 entries in total, 30 per page).
        for i := 1; i <= 5; i++ {
            ReqRankPage(i)
        }
        // Generate the tag dictionary and the game-name dictionary.
        GenerateTags()
        GenerateGameNames()
        fmt.Println("Generate leaderboard:", rankTypeName, "finished")
    }
}

func GenerateGameNames() {
    var tagsBuffer bytes.Buffer
    tagsBuffer.WriteString("word;weight\n")
    for _, gameInfo := range gameInfoList {
        //weightSize := 150 - gameInfo.Rank // rank weight only
        //weightSize := int(math.Ceil(float64(150-gameInfo.Rank) * gameInfo.Score)) // rank weight multiplied by score
        weightSize := int(math.Ceil(gameInfo.Score * 100)) // score weight only
        tagsBuffer.WriteString(gameInfo.Name)
        tagsBuffer.WriteString(";")
        tagsBuffer.WriteString(strconv.Itoa(weightSize))
        tagsBuffer.WriteString("\n")
    }
    WriteFile(rankTypeName+"_names_score.csv", tagsBuffer.String())
}

func GenerateTags() {
    tagsCountDic := make(map[string]int)
    tagsScoreDic := make(map[string]float64)
    var tagsBuffer bytes.Buffer
    // Columns: tag, number of games carrying the tag, average score x 100.
    tagsBuffer.WriteString("word;count;weight\n")
    for _, gameInfo := range gameInfoList {
        for _, tag := range gameInfo.tags {
            tagsCountDic[tag]++
            tagsScoreDic[tag] += gameInfo.Score * 100
        }
    }
    for key, value := range tagsCountDic {
        tagsBuffer.WriteString(key)
        tagsBuffer.WriteString(";")
        tagsBuffer.WriteString(strconv.Itoa(value))
        tagsBuffer.WriteString(";")
        tagsBuffer.WriteString(strconv.Itoa(int(tagsScoreDic[key] / float64(value))))
        tagsBuffer.WriteString("\n")
    }
    WriteFile(rankTypeName+"_tags_score.csv", tagsBuffer.String())
}

func WriteFile(name, content string) {
    data := []byte(content)
    if ioutil.WriteFile(name, data, 0644) == nil {
        fmt.Println("Write file succeeded:", name)
    }
}

func ReqRankPage(page int) {
    res, err := http.Get("https://www.taptap.com/ajax/top/" + rankTypeName + "?page=" + strconv.Itoa(page))
    if err != nil {
        log.Fatal(err)
    }
    defer res.Body.Close()
    if res.StatusCode != 200 {
        log.Fatalf("status code error: %d %s", res.StatusCode, res.Status)
    }
    jsonBs, err := ioutil.ReadAll(res.Body)
    if err != nil {
        log.Fatal(err)
    }
    tPageJson := TPageJson{}
    err = json.Unmarshal(jsonBs, &tPageJson)
    if err != nil {
        fmt.Println("JSON parse error:", err)
    }
    var htmlRead io.Reader = strings.NewReader(tPageJson.Data.Html)
    doc, err := goquery.NewDocumentFromReader(htmlRead)
    if err != nil {
        log.Fatal(err)
    }
    doc.Find(".taptap-top-card").Each(func(i int, selection *goquery.Selection) {
        ParseGameInfoCell(selection)
    })
}

// Parse the information for a single game.
func ParseGameInfoCell(selection *goquery.Selection) {
    gameInfo := GameInfo{}
    nameA := selection.Find(".card-middle-title")
    gameInfo.TapTapID = nameA.AttrOr("href", "")
    gameInfo.TapTapID = gameInfo.TapTapID[strings.LastIndex(gameInfo.TapTapID, "/")+1:]
    gameInfo.Name = nameA.Find("h4").Text()
    gameInfo.Company = selection.Find(".card-middle-author").Find("a").Text()
    score, _ := strconv.ParseFloat(selection.Find(".middle-footer-rating").Find("span").Text(), 64)
    gameInfo.Score = score
    gameInfo.IconUrl = selection.Find(".card-left-image").Find("img").AttrOr("src", "")
    // The bit size was garbled in the source listing; 64 is assumed here.
    tempRank, _ := strconv.ParseInt(selection.Find(".top-card-order-text").Text(), 10, 64)
    gameInfo.Rank = int(tempRank)
    gameInfo.Type = selection.Find(".card-middle-footer").Find("a").Text()
    tagsAList := selection.Find(".card-tags").Find("a")
    tagsAList.Each(func(i int, selectionA *goquery.Selection) {
        gameInfo.tags = append(gameInfo.tags, selectionA.Text())
    })
    gameInfoList = append(gameInfoList, gameInfo)
    //fmt.Printf("%v\n", gameInfo)
}

With this, the crawled data is written to files, and a word cloud can be generated from each file for analysis.
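The files are plain semicolon-separated text with a "word;weight" header. Since Go iterates maps in random order, a sorted variant gives the same file on every run. This is a small sketch with a made-up helper name (`buildCSV`) and sample weights.

```go
package main

import (
	"fmt"
	"sort"
	"strconv"
	"strings"
)

// buildCSV renders word weights as "word;weight" lines, sorted by word so
// that the output does not depend on map iteration order.
func buildCSV(weights map[string]int) string {
	words := make([]string, 0, len(weights))
	for w := range weights {
		words = append(words, w)
	}
	sort.Strings(words)

	var b strings.Builder
	b.WriteString("word;weight\n")
	for _, w := range words {
		b.WriteString(w)
		b.WriteString(";")
		b.WriteString(strconv.Itoa(weights[w]))
		b.WriteString("\n")
	}
	return b.String()
}

func main() {
	// Made-up weights for illustration.
	fmt.Print(buildCSV(map[string]int{"strategy": 820, "MOBA": 760}))
}
```

A stable order makes it easy to diff the output of two crawls and see which weights actually changed.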

Summary

This is my first crawler, so it isn't very well written, and many crawler techniques are not covered in this article, such as dealing with anti-crawling measures and account login. I also wrote this to get more practice with Go, which may become my main development language later.

Next up: crawling NetEase Cloud Music. Hehe.
