A strange like count
As the saying goes, eight or nine things out of ten in life don't go the way we want, and barely two or three of them can be shared with anyone. Fortunately we were born in the internet era: if you can't find anyone to talk to in real life, you can still vent online. "Tree hole" products are platforms where you can confide anonymously on the internet.
I came across one such platform by accident: http://www.6our.com/. When I feel miserable, reading about other people's unhappiness makes me realize that God still cares about me quite a bit (though I'm not sure China falls under his jurisdiction). But I noticed something odd. Every secret has like and dislike buttons, yet I never saw a secret with fewer than 2 likes. I posted one myself and it had 2 likes right away. So my guess was that the site's developers initialize every secret's like count to 2. As a stubborn programmer, though, I only trust what I have verified by hand, so I wrote a crawler in shell to fetch the like count of every secret. The crawler code:
#!/bin/bash
##########################################################
# Tree Hole site like-count crawler
##########################################################

# env
cd "$(dirname "$0")"
source utils.sh

# Initialize thread control: crawl with 10 concurrent workers to avoid killing the Tree Hole site
init_thread 250 10

# Initialize business-related variables
url="http://www.6our.com/qiushi?&p="
# The link text must match the label the site actually uses for its last-page link
total_page_num=$(curl_ "${url}1" | grep -oE "<a href='/qiushi\?&p=[0-9]+'>Last page</a>" | grep -oE "[0-9]+")
log "total_page_num $total_page_num"

# Start crawling the list pages
for page_num in $(seq 1 "$total_page_num"); do
    # Take a token from the pipe; blocks while 10 workers are already running
    read -u250
    {
        cur_page_url="${url}${page_num}"
        log "url ${cur_page_url} begin"
        # Emit one "<secret id> <like count>" row per secret
        curl_ "$cur_page_url" | grep -oE "id=\"yes-[0-9]+\">[0-9]+" | sed -n 's/id="yes-//; s/">/ /p' >> shudong-id-yes.data
        log "url ${cur_page_url} end"
        # Give the token back
        echo "" >&250
    } &
done
wait
log "All done"
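To make the extraction step concrete, here is a minimal sketch that runs the same grep and sed pipeline on a made-up HTML fragment instead of the live site. The markup in `sample_html` is an assumption about what a list page might look like (only the `id="yes-<id>">` prefix comes from the crawler's pattern), and `extract_likes` is a hypothetical helper name.

```shell
#!/bin/bash
# Hypothetical sample; the real page markup may differ.
sample_html='<a id="yes-55840">1</a><a id="no-55840">0</a><a id="yes-55841">12</a>'

# Print "<secret id> <like count>" for every like counter found on stdin.
extract_likes() {
    grep -oE 'id="yes-[0-9]+">[0-9]+' | sed -n 's/id="yes-//; s/">/ /p'
}

printf '%s\n' "$sample_html" | extract_likes
```

For this sample the pipeline prints `55840 1` and `55841 12`, one row per line; the dislike counter is skipped because its id does not start with `yes-`.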
The shared library it sources, utils.sh:
##################################
# Utility library for common helper functions
##################################

# Ha! A simple log4shell
log() {
    echo "[$(date +'%F %T')] $*"
}

# Thread controller wrapper
# $1  file descriptor number to use for the pipe
# $2  number of threads to use
init_thread() {
    pipe_num=$1
    thread_num=$2
    fifo_path="/tmp/fifo_path_$(date +%s)_${1}_${2}"
    mkfifo "$fifo_path"
    # Open the FIFO read/write on the chosen descriptor, then unlink the file
    eval "exec ${pipe_num}<>${fifo_path}"
    rm "$fifo_path"
    # One line in the pipe per allowed worker
    for i in $(seq 1 "$thread_num"); do
        echo "" >&${pipe_num}
    done
    return $pipe_num
}

# A thin wrapper around curl
# 1. Spoof the User-Agent
# 2. Persist cookies the way a browser does
# 3. Silent mode, no progress statistics
# "$@" is appended last
curl_() {
    curl -s --user-agent "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/64.0.3282.119 Safari/537.36" -b cookie -c cookie "$@"
}

# Similar to Stream.map(): lets a custom function be called in a pipeline
# $1  function name
map() {
    function_name=$1
    while read -r line; do
        $function_name "$line"
    done
}
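The FIFO trick in init_thread works like a counting semaphore: each line in the pipe is a token, `read` takes one (and blocks when none are left), and `echo` puts it back. Here is a minimal sketch of the idea without any network access. The descriptor number 9, the limit of 3, the `sleep` stand-in for real work, and the name `run_limited` are all arbitrary choices for illustration.

```shell
#!/bin/bash
# Run $2 dummy jobs with at most $1 in flight, using a FIFO as a token pool.
run_limited() {
    local limit=$1 jobs=$2 fifo i
    fifo=$(mktemp -u)          # unused temp name for the FIFO
    mkfifo "$fifo"
    exec 9<>"$fifo"            # open read/write so the open call does not block
    rm "$fifo"                 # the open descriptor keeps the pipe alive

    # Fill the pool with one token per allowed job.
    for i in $(seq 1 "$limit"); do echo >&9; done

    for i in $(seq 1 "$jobs"); do
        read -r -u9            # take a token; blocks while $limit jobs are running
        {
            sleep 0.1          # stand-in for the real work
            echo "job $i done"
            echo >&9           # give the token back
        } &
    done
    wait
    exec 9>&-                  # close the descriptor
}

run_limited 3 6
```

All six jobs finish, but never more than three run at once; the order of the "done" lines can vary between runs.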
Let's see how many records were crawled:
Just under 50,000. If none of these has fewer than 2 likes, my guess is correct. First, a look at the data format:
The first column is the secret ID and the second is that secret's like count. Now count how many rows have 1 in the second column:
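Both counts can be taken with `wc` and `awk`. This is a minimal sketch, assuming the two-column "id likes" format just described. The rows written to the temp file are made-up sample data, not crawl results, and `count_rows` and `count_likes_equal` are hypothetical helper names.

```shell
#!/bin/bash
# Made-up sample in the "<secret id> <like count>" format.
data_file=$(mktemp)
printf '%s\n' '55840 1' '55841 12' '55842 2' '55843 1' > "$data_file"

# Total number of crawled rows in file $1.
count_rows() {
    wc -l < "$1" | tr -d ' '
}

# Number of rows in file $1 whose second column equals like count $2.
count_likes_equal() {
    awk -v n="$2" '$2 == n { c++ } END { print c + 0 }' "$1"
}

echo "rows: $(count_rows "$data_file")"
echo "rows with 1 like: $(count_likes_equal "$data_file" 1)"
```

Against the real crawl output the call would be `count_likes_equal shudong-id-yes.data 1`.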
Well... this is awkward. Let's pick a few and check whether the page really shows the same thing:
The URL pattern of the detail page is http://www.6our.com/article/{article_id}. Pick an id, build the URL http://www.6our.com/article/55840, and open it:
It really has only one like. I opened a few more and each of them also had exactly one like. Good thing I verified it, otherwise I would have drawn the wrong conclusion.
Something I can do
I had assumed the programmers behind the site were quietly showing their users some care, but that turned out not to be the case. So I wondered whether I could do something for these people myself. I registered an account:
Then I wrote a script that watches the page and replies to each secret according to its content, to send a little encouragement. The script:
#!/bin/bash
##################################################################
# Tree Hole encourager
##################################################################

# env
cd "$(dirname "$0")"
source utils.sh

# Simulate login and save the cookie
login() {
    username="foo"
    passwd="bar"
    # Not sure what __hash__ is for, but send it along anyway
    hash_param=$(curl_ "http://www.6our.com/index.php/User/Index/login" | grep -oE "[0-9a-z]+_[0-9a-z]+" | tail -n 1)
    # The success marker must match the text the site actually returns
    curl_ -d "account=${username}&password=${passwd}&remember_me=1&submit=&__hash__=${hash_param}" "http://www.6our.com/index.php/User/Index/checkLogin" | grep "Login Successful" >> /dev/null
    if [ $? -ne 0 ]; then
        log "Login failed."
        exit 1
    else
        log "Login success"
    fi
}

# Reply to a secret
# $1  secret id
# $2  reply content
replay() {
    local id=$1
    local content=$2
    # Check existing comments to avoid duplicate replies; "duplicate" here means
    # one reply per secret, not one reply per pattern
    my_name="tree cave encouragement"
    curl_ -d "id=$id" "http://www.6our.com/index.php/Reply/showReply" | grep "$my_name" >> /dev/null
    if [ $? -eq 0 ]; then
        return
    fi
    result=$(curl_ -d "pid=${id}&anonymous=0&arcontent=${content}" "http://www.6our.com/index.php/Reply/cHeckReply2")
    if [ "$result" = "1" ]; then
        log "replay $id $content success"
    else
        log "replay $id $content failed"
    fi
    # Avoid replying too fast
    sleep 3
}

# Check for a specific pattern and reply with specific content
# $1  secret element
# $2  pattern, as a Perl regular expression
# $3  reply content
check_pattern_and_replay() {
    local content=$1
    local id=$(echo "$content" | grep -oP 'id="content-\d+"' | grep -oP '\d+')
    local pattern=$2
    local replay_content=$3
    echo "$content" | grep -P "$pattern" >> /dev/null
    [ $? -eq 0 ] && replay "$id" "$replay_content"
}

# Process a single secret
# $1  secret element, containing the id and the content
process_single() {
    local content=$1
    # Passionate young people
    check_pattern_and_replay "$content" "need help|obstacle|difficulty|dream|work hard" "Come on, tomorrow will be better!"
    # Lonely: a thousand tender feelings and no one to tell them to
    check_pattern_and_replay "$content" "(annoying|hate|dislike).*social" "Dealing with people really is hard"
    # Suicidal tendencies
    check_pattern_and_replay "$content" "die|suicide|I'm dead" "As long as you live there is hope"
    # Rescue confidence about looks
    check_pattern_and_replay "$content" "ugly" "Think you're ugly? Look at how I turned out and get your confidence back :)"
}

# Monitor the first page
monitor() {
    while true; do
        curl_ "http://www.6our.com/qiushi" | tr -d "\r\n" | grep -oP 'id="content-\d+".+?</div>' | map "process_single"
        log "look first page over"
        sleep 10
    done
}

login
monitor
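The matching step can be tried offline. Below is a minimal sketch of how one secret element yields an id and is tested against the rules. The `sample` markup is an assumption (only the `id="content-<id>"` prefix comes from the script's pattern), `secret_id` and `pick_reply` are hypothetical helpers, and the keywords are simplified stand-ins. Unlike the script, which checks every rule and relies on the duplicate check before replying, this sketch stops at the first matching rule.

```shell
#!/bin/bash
# Hypothetical element as it might come out of the monitor pipeline.
sample='id="content-55840">I keep chasing my dream but it is so hard</div>'

# Extract the numeric secret id from an element.
secret_id() {
    printf '%s\n' "$1" | grep -oP 'id="content-\d+"' | grep -oP '\d+'
}

# Print the reply of the first rule whose pattern matches the element, if any.
pick_reply() {
    local element=$1
    if printf '%s\n' "$element" | grep -qP 'dream|difficulty'; then
        echo "Come on, tomorrow will be better!"
    elif printf '%s\n' "$element" | grep -qP '(hate|dislike).*social'; then
        echo "Dealing with people really is hard"
    fi
}

echo "$(secret_id "$sample"): $(pick_reply "$sample")"
```

For the sample element this prints `55840: Come on, tomorrow will be better!`; an element that matches no rule produces an empty reply.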
The result:
All the script code is on GitHub: https://github.com/cc11001100/6our-robot
Note:
While debugging the regular expressions I wrote one of the form "foo|". The empty alternative after the bar matches anything, so during testing some replies went to secrets they did not fit. I hit Ctrl-C as soon as I noticed, but a few of the wrong comments were still left behind. Lesson learned; I will be more careful from now on.
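The pitfall is easy to reproduce, and a small guard can catch it before a pattern goes live. This is a minimal sketch, assuming a grep with `-P` support as used in the script; `matches_everything` is a hypothetical helper, and it only catches patterns that match the empty string, not every overly broad pattern.

```shell
#!/bin/bash
# Succeed if the Perl-style pattern matches an empty line,
# which means it would match every secret.
matches_everything() {
    printf '\n' | grep -qP -- "$1"
}

for pattern in 'foo|' 'foo|bar'; do
    if matches_everything "$pattern"; then
        echo "'$pattern' matches everything, refusing to use it"
    else
        echo "'$pattern' looks fine"
    fi
done
```

The first pattern is rejected because its empty alternative matches an empty line; the second passes.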
A Linux shell crawler that implements an auto-reply robot for the Tree Hole site