Found a bug: the previous crawl wrote the response with the wrong file mode. Since urlopen().read() returns bytes, the file must be opened with "wb". Finally spotted it, silly mistake!
json.dump(): serializes a Python object to JSON and writes it to a file object.
json.load(): reads JSON from a file object and converts it back into Python types (json.loads() does the same for a JSON-formatted string).
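A minimal round-trip illustrating the two functions above; io.StringIO stands in for a real file so the sketch is self-contained:

```python
import json
import io

data = {"city": "Beijing", "count": 3}

buf = io.StringIO()
json.dump(data, buf)        # serialize the Python dict to a file-like object
buf.seek(0)
restored = json.load(buf)   # parse the JSON back into Python types
print(restored == data)     # True
```

The string-based counterparts json.dumps()/json.loads() work the same way but return/accept a str instead of using a file object.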
import urllib.request
import json
import jsonpath  # third-party package: pip install jsonpath

url = "https://www.lagou.com/lbs/getAllCitySearchLabels.json"
headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/66.0.3359.181 Safari/537.36"}

request = urllib.request.Request(url, headers=headers)
html = urllib.request.urlopen(request).read()  # bytes

# the response is bytes, so the file must be opened in "wb" mode
with open("lagou.txt", "wb") as f:
    f.write(html)

# json.loads() converts the JSON-formatted bytes into Python objects
unicodestr = json.loads(html)

city_list = jsonpath.jsonpath(unicodestr, "$..name")
for item in city_list:
    print(item)

# json.dumps() escapes non-ASCII characters to \uXXXX by default;
# ensure_ascii=False returns a real Unicode string instead
array = json.dumps(city_list, ensure_ascii=False)

with open("lagou.json", "wb") as f:
    # encode the Unicode string to UTF-8 before writing bytes
    f.write(array.encode("utf-8"))
XPath fuzzy query:
//div[contains(@attribute_name, "substring to match")]
For example, //div[contains(@class, "job")] selects every div whose class attribute contains the substring "job".
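A short sketch of contains() in action, assuming the third-party lxml package is installed (the HTML snippet and class names are made up for illustration):

```python
from lxml import etree  # assumes lxml is installed: pip install lxml

html = """
<div class="job-list">jobs</div>
<div class="city-list">cities</div>
<div class="banner">ad</div>
"""
tree = etree.HTML(html)

# contains(@class, "list") matches any div whose class attribute
# contains the substring "list", so the banner div is skipped
divs = tree.xpath('//div[contains(@class, "list")]')
for d in divs:
    print(d.text)  # jobs, cities
```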
Crawler -- json, jsonpath, XPath fuzzy query