"The white sun sets behind the mountains, ___." The next line is naturally "the Yellow River flows into the sea". But what about "The sun and moon hurry on and never linger, ___, ___, and I fear the beauty will grow old"? How should the two middle lines be filled in?
A recent task at work involved 1500 fill-in-the-blank questions on classical Chinese poetry that had no answers, and the answers needed to be matched to the questions. Fortunately the question metadata was complete: each question names the source work and its author. The natural approach was to crawl the web for the corresponding text and then find the answers by string matching. The results so far are good, with answers found for essentially all of the questions. This post records the process as a summary.
1. Fetching the poem text
After searching online for quite a while, I found that Baidu Hanyu (Baidu Chinese) has a fairly complete collection of classical poetry in a fairly regular format. The crawl itself is simple. Inspecting the site in the browser reveals its search interface: http://hanyu.baidu.com/hanyu/ajax/sugs
It takes a single parameter, mainkey, which is a URL-encoded string. The interface returns a list of matches, which is then filtered by author name. The detailed code follows:
    baseURL := "http://hanyu.baidu.com/hanyu/ajax/sugs?"
    client := &http.Client{}
    u, _ := url.Parse(baseURL)
    q := u.Query()
    q.Set("mainkey", name)
    u.RawQuery = q.Encode()
    // Add headers
    req, _ := http.NewRequest("GET", u.String(), nil)
    req.Header.Add("User-Agent", `Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36`)
    req.Header.Add("DNT", "1")
    req.Header.Add("Host", "hanyu.baidu.com")
    req.Header.Add("Accept-Language", "zh-CN,zh;q=0.8")
    req.Header.Add("Referer", "http://hanyu.baidu.com/shici/detail?pid=be520db056da43238035dc18bb1e1798&tn=sug_click")
    resp, errDo := client.Do(req)
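As a small illustration of the URL-encoding step, the sketch below builds only the request URL with the standard library and sends nothing. The helper name buildSuggestURL is hypothetical, and writing the parameter name in lowercase as mainkey is an assumption.

```go
package main

import (
	"fmt"
	"net/url"
)

// buildSuggestURL is a hypothetical helper: it returns the suggestion
// endpoint with the search text percent-encoded in the query string.
// The lowercase parameter name "mainkey" is an assumption.
func buildSuggestURL(name string) (string, error) {
	u, err := url.Parse("http://hanyu.baidu.com/hanyu/ajax/sugs")
	if err != nil {
		return "", err
	}
	q := u.Query()
	q.Set("mainkey", name)
	u.RawQuery = q.Encode() // Encode escapes non-ASCII bytes as %XX
	return u.String(), nil
}

func main() {
	s, err := buildSuggestURL("离骚")
	if err != nil {
		panic(err)
	}
	fmt.Println(s)
}
```

Letting url.Values do the encoding avoids hand-escaping the Chinese title, and checking the error from url.Parse keeps a malformed base URL from going unnoticed.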
Once the response has been received, the entry whose author matches is picked out of the list.
    // If there are several search results, keep the one whose author matches
    respJSON.ForEach(func(key, value gjson.Result) bool {
        // First check whether display_name is present
        displayName := value.Get("display_name.0").String()
        sid := value.Get("sid.0").String()
        if len(displayName) == 0 {
            // Not the record we want
            return true
        }
        // Check the type
        typeStr := value.Get("type.0").String()
        if typeStr == "poemline" {
            // A single line matched: use its source poem instead
            displayName = value.Get("source_poem.0").String()
            sid = value.Get("source_poem_sid.0").String()
        }
        literatureAuthor := value.Get("literature_author.0").String()
        // Does the author match?
        if literatureAuthor == author {
            searchResult.sid = sid
            searchResult.displayName = displayName
            searchResult.author = literatureAuthor
            return false
        }
        return true // keep iterating
    })
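The same filtering idea can be tried offline with only the standard library. The sketch below is not the project's code: it swaps gjson for encoding/json, and it assumes the response is a JSON array of records whose fields are lists (which is what paths such as display_name.0 suggest). The sample data, the sid values, and the names record, first and pickByAuthor are all made up for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// record mirrors the fields read by the filter. Each field is assumed to be
// a list of which only the first element is used.
type record struct {
	DisplayName      []string `json:"display_name"`
	SID              []string `json:"sid"`
	Type             []string `json:"type"`
	SourcePoem       []string `json:"source_poem"`
	SourcePoemSID    []string `json:"source_poem_sid"`
	LiteratureAuthor []string `json:"literature_author"`
}

// SearchResult holds the record that was selected.
type SearchResult struct {
	SID, DisplayName, Author string
}

// first returns the first element of xs, or "" if xs is empty.
func first(xs []string) string {
	if len(xs) == 0 {
		return ""
	}
	return xs[0]
}

// pickByAuthor returns the first record whose author equals author.
func pickByAuthor(raw []byte, author string) (SearchResult, bool, error) {
	var records []record
	if err := json.Unmarshal(raw, &records); err != nil {
		return SearchResult{}, false, err
	}
	for _, r := range records {
		name, sid := first(r.DisplayName), first(r.SID)
		if name == "" {
			continue // no display name: not a usable record
		}
		if first(r.Type) == "poemline" {
			// A single line matched: point at the poem it comes from.
			name, sid = first(r.SourcePoem), first(r.SourcePoemSID)
		}
		if a := first(r.LiteratureAuthor); a == author {
			return SearchResult{SID: sid, DisplayName: name, Author: a}, true, nil
		}
	}
	return SearchResult{}, false, nil
}

// sample is invented test data, not a real response.
const sample = `[
 {"display_name":["登鹳雀楼"],"sid":["a1"],"type":["poem"],"literature_author":["畅当"]},
 {"display_name":["白日依山尽"],"sid":["b2"],"type":["poemline"],
  "source_poem":["登鹳雀楼"],"source_poem_sid":["c3"],"literature_author":["王之涣"]}
]`

func main() {
	res, ok, err := pickByAuthor([]byte(sample), "王之涣")
	fmt.Println(res, ok, err)
}
```

Returning a found flag alongside the result lets the caller tell "no author matched" apart from a decoding error, so it can fall back to another data source.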
searchResult now holds the match. Its sid is used to fetch the poem's detail page, from which the text is parsed.
    func getContent(sid string) (content string, err error) {
        baseURL := "http://hanyu.baidu.com/shici/detail"
        result := make([]string, 0)
        client := &http.Client{}
        u, _ := url.Parse(baseURL)
        q := u.Query()
        q.Set("pid", sid)
        u.RawQuery = q.Encode()
        req, _ := http.NewRequest("GET", u.String(), nil)
        req.Header.Add("User-Agent", `Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36`)
        req.Header.Add("DNT", "1")
        req.Header.Add("Host", "hanyu.baidu.com")
        req.Header.Add("Accept-Language", "zh-CN,zh;q=0.8")
        req.Header.Add("Referer", "http://hanyu.baidu.com/shici/detail?pid=be520db056da43238035dc18bb1e1798&tn=sug_click")
        resp, errDo := client.Do(req)
        // Check the error first: calling errDo.Error() on a nil error would panic
        if errDo != nil {
            err = errors.New("cannot connect to Baidu Hanyu: " + errDo.Error())
            return
        }
        // Assumption: the status code in the original check was lost; 200 OK is assumed
        if resp.StatusCode != http.StatusOK {
            err = errors.New("cannot connect to Baidu Hanyu: " + resp.Status)
            return
        }
        docm, errDoc := goquery.NewDocumentFromResponse(resp)
        if errDoc != nil {
            err = errors.New("parsing doc error: " + errDoc.Error())
            return
        }
        // The poem text is stored in the #body_p element and can be read with the PuerkitoBio/goquery library
        pSelect := docm.Find("#body_p")
        pSelect.Each(func(pos int, selection *goquery.Selection) {
            content := strings.TrimSpace(selection.Text())
            result = append(result, content)
        })
        content = strings.Join(result, "")
        return
    }
At present two sites are crawled: Baidu Hanyu and Gushiwen (the Ancient Poetry network). If a better data source turns up, it only needs to implement the Spider interface and be registered in the mapSpiderManifest() function.
    type Spider interface {
        getContent(SearchResult) (string, error)
        findContent(string, string) (SearchResult, error)
    }

    func mapSpiderManifest() map[string]Spider {
        // Initialize and register all spiders
        spiderMap := make(map[string]Spider)
        // Baidu Hanyu
        baiduSpider := new(BaiduSpider)
        spiderMap["BaiduSpider"] = baiduSpider
        // Gushiwen (Ancient Poetry network)
        gushiwenSpider := new(GushiwenSpider)
        spiderMap["GushiwenSpider"] = gushiwenSpider
        return spiderMap
    }
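To show what plugging in another source might look like, here is a minimal sketch with a hypothetical in-memory spider. The names memorySpider and fetch, the field layout of SearchResult, and the meaning of the two findContent arguments (title and author) are assumptions made for the example, not the project's actual definitions.

```go
package main

import (
	"errors"
	"fmt"
)

// SearchResult identifies one poem at one source (field layout assumed).
type SearchResult struct {
	SID, DisplayName, Author string
}

// Spider is the two-method contract every data source implements.
type Spider interface {
	findContent(title, author string) (SearchResult, error)
	getContent(sr SearchResult) (string, error)
}

// memorySpider is a hypothetical source backed by a map instead of a website.
type memorySpider struct {
	poems map[string]string // key: title + "/" + author
}

func (m *memorySpider) findContent(title, author string) (SearchResult, error) {
	key := title + "/" + author
	if _, ok := m.poems[key]; !ok {
		return SearchResult{}, errors.New("not found: " + key)
	}
	return SearchResult{SID: key, DisplayName: title, Author: author}, nil
}

func (m *memorySpider) getContent(sr SearchResult) (string, error) {
	text, ok := m.poems[sr.SID]
	if !ok {
		return "", errors.New("no content for " + sr.SID)
	}
	return text, nil
}

// fetch asks each registered spider in turn and returns the first text found.
// Map iteration order is random, so no spider has priority over another.
func fetch(spiders map[string]Spider, title, author string) (string, error) {
	for _, s := range spiders {
		sr, err := s.findContent(title, author)
		if err != nil {
			continue // this source does not have the poem
		}
		if text, err := s.getContent(sr); err == nil {
			return text, nil
		}
	}
	return "", errors.New("no spider could find " + title)
}

func main() {
	spiders := map[string]Spider{
		"MemorySpider": &memorySpider{poems: map[string]string{
			"登鹳雀楼/王之涣": "白日依山尽,黄河入海流。欲穷千里目,更上一层楼。",
		}},
	}
	fmt.Println(fetch(spiders, "登鹳雀楼", "王之涣"))
}
```

Because callers only see the Spider interface, a new site can be added without touching the matching logic. If one source should be preferred over another, an ordered slice would be a better fit than a map.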
2. Searching for the missing lines
Dictation of classical texts was a common exercise back in school: a passage is given, a few of its clauses are picked at random and blanked out, and students write them from memory. The questions generally fall into a few patterns:
- Blank at the start: ___, [___, ...], who would not think of home.
- Blank at the end: All the way, ___, [___, ...].
- Blank in the middle: The moon rises over the eastern hills, ___, white dew lies across the river,
Whatever the pattern, from the point of view of a single blank, its answer can be determined only if a hint clause sits directly before or after it. Such a blank can find its answer on its own, so let's call it an autonomous blank. A blank with no hint on either side has to wait until a neighboring blank has been filled, and then use that neighbor's answer as its own hint. A diagram makes this clearer:
The gray blocks in the figure have a hint next to them, so their answers can be found by stepping through the crawled text and filling in the blank. The search algorithm is shown in the following code:
    // Given newFind.preString, find blankString and postString
    func makeWithPreContent(contentsSplit []string, newFind *Find) {
        for l := range contentsSplit {
            if isEqual(contentsSplit[l], newFind.preString) && l < len(contentsSplit)-1 {
                newFind.blankString = contentsSplit[l+1]
                if l < len(contentsSplit)-2 {
                    newFind.postString = contentsSplit[l+2]
                }
                newFind.blankFinish = true
            }
        }
    }

    // Given newFind.postString, find blankString and preString
    func makeWithPostContent(contentsSplit []string, newFind *Find) {
        for l := range contentsSplit {
            if isEqual(contentsSplit[l], newFind.postString) && l > 0 {
                newFind.blankString = contentsSplit[l-1]
                if l-1 > 0 {
                    newFind.preString = contentsSplit[l-2]
                }
                newFind.blankFinish = true
            }
        }
    }

    // Split content by punctuation; returns the clauses and the punctuation marks
    func splitByPunctuation(s string) ([]string, []string) {
        // Full-width and ASCII punctuation (character set reconstructed; the exact set is an assumption)
        regPunctuation := regexp.MustCompile(`[,,。.??!!;;::]`)
        // Match the punctuation marks and save them, then split the string
        toPun := regPunctuation.FindAllString(s, -1)
        result := regPunctuation.Split(s, -1)
        // Drop the empty piece left after the final punctuation mark
        if len(result[len(result)-1]) == 0 {
            result = result[:len(result)-1]
        }
        // Trim surrounding spaces and remove quotation marks
        regQuoting := regexp.MustCompile(`[“”"]`)
        for i := range result {
            result[i] = strings.TrimSpace(result[i])
            result[i] = regQuoting.ReplaceAllString(result[i], "")
        }
        return result, toPun
    }
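The lookup can be tried in isolation on a short poem. The following is a simplified, self-contained sketch of the same idea, not the project's code: splitClauses, after and before are hypothetical helpers, clauses are compared with plain string equality rather than isEqual, and the punctuation set is an assumption.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// punct matches the full-width and ASCII punctuation used to cut clauses.
var punct = regexp.MustCompile(`[,。?!;:,.?!;:]`)

// splitClauses cuts text at punctuation and drops empty pieces.
func splitClauses(text string) []string {
	var out []string
	for _, part := range punct.Split(text, -1) {
		if part = strings.TrimSpace(part); part != "" {
			out = append(out, part)
		}
	}
	return out
}

// after returns the clause that follows hint, if there is one.
func after(clauses []string, hint string) (string, bool) {
	for i, c := range clauses {
		if c == hint && i+1 < len(clauses) {
			return clauses[i+1], true
		}
	}
	return "", false
}

// before returns the clause that precedes hint, if there is one.
func before(clauses []string, hint string) (string, bool) {
	for i, c := range clauses {
		if c == hint && i > 0 {
			return clauses[i-1], true
		}
	}
	return "", false
}

func main() {
	clauses := splitClauses("白日依山尽,黄河入海流。欲穷千里目,更上一层楼。")
	fmt.Println(after(clauses, "白日依山尽"))
	fmt.Println(before(clauses, "更上一层楼"))
}
```

Exact string equality is the weakest part of this sketch. Real question text may differ from the crawled text in variant characters or stray whitespace, which is presumably why the original code goes through a separate isEqual helper.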
Once every autonomous block has its answer, each one can be treated as the head of a doubly linked list. All that remains is to traverse each list and resolve every node with the same search algorithm. Traversal of a list stops when the next node is nil or is itself an autonomous block, and then the next list is processed. However complicated the question, the blanks can be completed automatically this way.
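The propagation from filled blanks to their neighbors can be sketched without a linked list. The version below is a simplification under stated assumptions, not the project's implementation: a question is a slice of slots in which an empty string marks a blank, clauses are matched by exact equality, and the function (hypothetically named fillBlanks) repeats passes over the slice until no further blank can be filled.

```go
package main

import "fmt"

// fillBlanks fills every "" slot whose left or right neighbour is known,
// using the neighbour's position in the poem's clauses, and repeats until
// a pass makes no progress. Slots that cannot be resolved stay "".
func fillBlanks(clauses, slots []string) []string {
	out := append([]string(nil), slots...) // work on a copy
	index := func(s string) int {
		for i, c := range clauses {
			if c == s {
				return i
			}
		}
		return -1
	}
	for changed := true; changed; {
		changed = false
		for i := range out {
			if out[i] != "" {
				continue
			}
			// A known left neighbour: take the clause that follows it.
			if i > 0 && out[i-1] != "" {
				if p := index(out[i-1]); p >= 0 && p+1 < len(clauses) {
					out[i] = clauses[p+1]
					changed = true
					continue
				}
			}
			// A known right neighbour: take the clause that precedes it.
			if i+1 < len(out) && out[i+1] != "" {
				if p := index(out[i+1]); p > 0 {
					out[i] = clauses[p-1]
					changed = true
				}
			}
		}
	}
	return out
}

func main() {
	clauses := []string{"日月忽其不淹兮", "春与秋其代序", "惟草木之零落兮", "恐美人之迟暮"}
	question := []string{"日月忽其不淹兮", "", "", "恐美人之迟暮"}
	fmt.Println(fillBlanks(clauses, question))
}
```

Each pass either fills at least one blank or ends the loop, so the function always terminates. The linked-list traversal described above reaches the same blanks in a single walk per autonomous block, which matters more when questions are long.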
3. Results
Results on some commonly tested classical prose and poems:
1. Former Ode on the Red Cliffs (Qian Chibi Fu)
2. Li Sao
Project Address: Ancientpoetryfillblank