Python 3.5 + Tesseract + adb: an answer assistant for Watermelon Video or King of the Mind quiz games

Last Update: 2018-01-24 Source: Internet

Author: User


I have been playing the recent live quiz games. Some questions are really hard, and 10 seconds is not enough time to search Baidu by hand, so I wrote a helper script. When a question appears, the script searches Baidu automatically in about 2 seconds. The remaining 7 or 8 seconds can be spent analyzing the results and answering, which improves the odds of winning.

For the source code, see my GitHub.

How it works: use the adb command to capture the phone's video screen, then crop the screenshot in Python and run OCR on it to extract the question and the answers. The script then searches Baidu for the result. If you need help setting up this environment, contact me. Because the OCR runs locally, recognition costs nothing and has no usage limits.
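The capture step can be sketched as follows. This is a minimal sketch rather than the author's code: `fix_line_endings`, `capture_screenshot`, and the file name `screen.png` are hypothetical, and only the `adb shell screencap -p` command comes from the article. It assumes `adb` is on the PATH and a phone is connected. On some setups `adb shell` turns every `\n` in the output into `\r\n`, which corrupts the PNG, so the sketch undoes that only when the PNG signature is not already intact.

```python
import subprocess

# Every valid PNG file starts with these eight bytes.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def fix_line_endings(raw: bytes) -> bytes:
    """Undo LF -> CRLF translation, but only if the data is not already a valid PNG."""
    if raw.startswith(PNG_SIGNATURE):
        return raw  # untouched stream; replacing here would corrupt it
    return raw.replace(b"\r\n", b"\n")


def capture_screenshot(path: str = "screen.png") -> str:
    """Grab the phone screen through adb and save it as a PNG file (hypothetical helper)."""
    result = subprocess.run(
        ["adb", "shell", "screencap", "-p"],
        stdout=subprocess.PIPE,
        check=True,  # raise CalledProcessError if adb exits with an error
    )
    data = fix_line_endings(result.stdout)
    if not data.startswith(PNG_SIGNATURE):
        raise RuntimeError("adb did not return PNG data; is the phone connected?")
    with open(path, "wb") as f:
        f.write(data)
    return path
```

Calling `capture_screenshot()` with a connected phone should leave a PNG on disk that `PIL.Image.open` can read; checking the signature first turns a missing device into a clear error instead of a confusing OCR failure later.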

Code on GitHub

ocr_b1.py: automatically searches Baidu for the recognized question, then opens a browser to display the search results.

# -*- coding: utf-8 -*-
import subprocess
import time
import webbrowser

import pytesseract
from PIL import Image


def main():
    """Main function."""
    op = yes_or_no('Make sure that ADB is enabled on your phone and your computer is connected, '
                   'then open the watermelon video and use this program. '
                   'Are you sure you want to start? ')
    if not op:
        print('bye')
        return
    # Start the capture -> OCR -> search loop (each round calls the next one)
    ocr_subject_parent()
    # Offline test against saved screenshots:
    # for root, sub_dirs, files in os.walk('e:/temporarily received file/zhihu answer/million/'):
    #     for file in files:
    #         print('found image:' + file)
    #         img = Image.open('e:/temporarily received file/zhihu answer/million/' + file)
    #         ocr_subject(img)


def yes_or_no(prompt, true_value='y', false_value='n', default=True):
    """Ask whether everything is ready before the program starts."""
    default_value = true_value if default else false_value
    prompt = '{}{}/{} [{}]: '.format(prompt, true_value, false_value, default_value)
    i = input(prompt)
    if not i:
        return default
    while True:
        if i == true_value:
            return True
        elif i == false_value:
            return False
        prompt = 'Please input {} or {}: '.format(true_value, false_value)
        i = input(prompt)


def screenImg(true_value='', default=True):
    """Wait for Enter; any other input ends the loop."""
    prompt = 'When a question appears, press Enter to recognize it: '
    i = input(prompt)
    if not i:
        return default
    return i == true_value


def ocr_subject(p):
    """OCR the question area, search Baidu for it, then wait for the next question."""
    p = cut_img(p)
    # Path to the Tesseract executable on the author's machine; adjust for your install
    pytesseract.pytesseract.tesseract_cmd = 'e:/Program Files (x86)/Tesseract-OCR/tesseract'
    subject = pytesseract.image_to_string(p, lang='chi_sim')
    subject = "".join(subject.split())  # drop all whitespace and line breaks
    # The question text follows its number ("3.question"); keep the full text if OCR missed the period
    parts = subject.split('.')
    subject = parts[1] if len(parts) > 1 else subject
    print(subject)
    openPage(subject)
    ocr_subject_parent()


def ocr_subject_parent():
    """Capture the phone screen through adb and hand the image to the OCR step."""
    result = screenImg()
    if result:
        start = time.time()
        # screenshot.check_screenshot()
        process = subprocess.Popen('adb shell screencap -p', shell=True, stdout=subprocess.PIPE)
        binary_screenshot = process.stdout.read()
        # adb shell converts \n to \r\n; undo that so the PNG is valid
        binary_screenshot = binary_screenshot.replace(b'\r\n', b'\n')
        with open('autojump.png', 'wb') as f:
            f.write(binary_screenshot)
        # screenshot.pull_screenshot()
        img = Image.open('autojump.png')
        print("time consumed:" + str(time.time() - start))
        ocr_subject(img)


def openPage(subject):
    """Open the Baidu results page for the question in the default browser."""
    url = 'https://www.baidu.com/s?wd={}'.format(subject)
    webbrowser.open(url)
    webbrowser.get()


def cut_img(img):
    """Crop the question area; the pixel box matches the author's screenshot size."""
    region = img.crop((70, 260, 1025, 570))
    # region.save("temp/cut_first.png")
    return region


if __name__ == '__main__':
    main()
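The question clean-up and URL step in the script above can be made a little more defensive. The following is a sketch under stated assumptions, not the author's code: `strip_question_number` and `build_search_url` are hypothetical helpers, the set of separators after the question number (`.`, `．`, `、`) is a guess at what OCR may produce, and `quote` is added so that spaces and non-ASCII characters are percent-encoded before the URL is opened.

```python
import re
from urllib.parse import quote


def strip_question_number(text: str) -> str:
    """Remove all whitespace and a leading number such as '12.' from OCR output."""
    compact = "".join(text.split())
    # Assumed separators after the number: ASCII period, full-width period, ideographic comma
    return re.sub(r"^\d+[.．、]", "", compact)


def build_search_url(question: str) -> str:
    """Build a Baidu search URL with the query percent-encoded."""
    return "https://www.baidu.com/s?wd=" + quote(question)


if False:  # usage sketch; flip to True to open a browser
    import webbrowser
    webbrowser.open(build_search_url(strip_question_number("12. example question")))
```

Unlike `subject.split('.')[1]`, the regular expression only removes a number at the very start of the text, so a period inside the question itself does not truncate it.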

ocr_bw2.py: searches Baidu for the question combined with each answer option, scrapes the number of results Baidu reports for each combination, and prints the counts to the console.

# -*- coding: utf-8 -*-
__author__ = 'zjy'

import subprocess
import threading
import time
import urllib.request
import webbrowser
from urllib.parse import quote

import pytesseract
from PIL import Image


def main():
    """Main function."""
    op = yes_or_no('Make sure that ADB is enabled on your phone and your computer is connected, '
                   'then open the watermelon video and use this program. '
                   'Are you sure you want to start? ')
    if not op:
        print('bye')
        return
    # Start the capture -> OCR -> search loop (each round calls the next one)
    ocr_subject_parent()
    # Offline test against saved screenshots:
    # for root, sub_dirs, files in os.walk('e:/temporarily received file/zhihu answer/million/'):
    #     for file in files:
    #         print('found image:' + file)
    #         img = Image.open('e:/temporarily received file/zhihu answer/million/' + file)
    #         ocr_subject(img)


def yes_or_no(prompt, true_value='y', false_value='n', default=True):
    """Ask whether everything is ready before the program starts."""
    default_value = true_value if default else false_value
    prompt = '{}{}/{} [{}]: '.format(prompt, true_value, false_value, default_value)
    i = input(prompt)
    if not i:
        return default
    while True:
        if i == true_value:
            return True
        elif i == false_value:
            return False
        prompt = 'Please input {} or {}: '.format(true_value, false_value)
        i = input(prompt)


def screenImg(true_value='', default=True):
    """Wait for Enter; any other input ends the loop."""
    prompt = 'When a question appears, press Enter to recognize it\n'
    i = input(prompt)
    if not i:
        return default
    return i == true_value


def ocr_subject(p):
    """OCR the question area, then look up each answer option."""
    subImg = cut_img(p)
    # Path to the Tesseract executable on the author's machine; adjust for your install
    pytesseract.pytesseract.tesseract_cmd = 'e:/Program Files (x86)/Tesseract-OCR/tesseract'
    subject = pytesseract.image_to_string(subImg, lang='chi_sim')
    subject = "".join(subject.split())  # drop all whitespace and line breaks
    # The question text follows its number ("3.question"); keep the full text if OCR missed the period
    parts = subject.split('.')
    subject = (parts[1] if len(parts) > 1 else subject).replace("\"", "")
    print(subject)
    ocr_answer(p, subject)
    # openPage(subject)
    # print("end:" + str(time.time()))
    ocr_subject_parent()


def getSearchNum(key):
    """Return the text that follows Baidu's result-count marker for the query."""
    key = quote(key)
    # print(key)
    url = 'http://www.baidu.com/s?wd={}'.format(key)
    # print(url)
    response = urllib.request.urlopen(url)
    page = response.read().decode("utf-8")
    # Marker means "Baidu found related results for you"; it is 10 characters long,
    # so the count text starts at i + 10. index() raises ValueError if it is missing.
    i = page.index('百度为您找到相关结果')
    start = i + 10
    end = i + 25
    return page[start:end]


def ocr_answer(p, subject):
    """OCR the three answer areas in parallel, one thread per option."""
    regions = cut_question(p)
    pytesseract.pytesseract.tesseract_cmd = 'e:/Program Files (x86)/Tesseract-OCR/tesseract'
    for region in regions:
        t = threading.Thread(target=ocr_answer_thread, args=(region, subject))
        t.start()


def ocr_answer_thread(p, subject):
    """Recognize one answer option and print it with its Baidu result count."""
    answer = pytesseract.image_to_string(p, lang='chi_sim')
    answer = "".join(answer.split())
    v = getSearchNum(subject + ' ' + answer)
    print(answer + ' ' + v)
    # print(time.time())


def ocr_subject_parent():
    """Capture the phone screen through adb and hand the image to the OCR step."""
    result = screenImg()
    if result:
        start = time.time()
        # print("start:" + str(start))
        # screenshot.check_screenshot()
        process = subprocess.Popen('adb shell screencap -p', shell=True, stdout=subprocess.PIPE)
        binary_screenshot = process.stdout.read()
        # adb shell converts \n to \r\n; undo that so the PNG is valid
        binary_screenshot = binary_screenshot.replace(b'\r\n', b'\n')
        with open('autojump.png', 'wb') as f:
            f.write(binary_screenshot)
        # screenshot.pull_screenshot()
        img = Image.open('autojump.png')
        ocr_subject(img)


def openPage(subject):
    """Open the Baidu results page for the question in the default browser."""
    url = 'https://www.baidu.com/s?wd={}'.format(subject)
    webbrowser.open(url)
    webbrowser.get()


def cut_img(img):
    """Crop the question area; the pixel box matches the author's screenshot size."""
    region = img.crop((70, 260, 1025, 570))
    # region.save("temp/cut_first.png")
    return region


def cut_question(img):
    """Crop the three answer areas, top to bottom."""
    regions = []
    question1 = img.crop((70, 590, 1025, 768))
    question2 = img.crop((70, 769, 1025, 947))
    question3 = img.crop((70, 948, 1025, 1130))
    regions.append(question1)
    regions.append(question2)
    regions.append(question3)
    # question1.save("temp/cut_1.png")
    # question2.save("temp/cut_2.png")
    # question3.save("temp/cut_3.png")
    return regions


if __name__ == '__main__':
    main()
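The result-count lookup in this script slices a fixed 15 characters after the marker text, which leaves stray characters in the output. A hedged alternative is to pull the number out with a regular expression and rank the options by it. In this sketch `parse_result_count` and `rank_answers` are hypothetical helpers, the marker is assumed to be the Chinese phrase 百度为您找到相关结果 ("Baidu found related results for you"), and the surrounding format (an optional 约, digits with commas, then 个) is an assumption about Baidu's page at the time.

```python
import re

# Assumed page format: "百度为您找到相关结果约1,230,000个"
COUNT_PATTERN = re.compile(r"百度为您找到相关结果约?([\d,]+)个")


def parse_result_count(page: str) -> int:
    """Return the reported number of results, or 0 if the marker is not found."""
    match = COUNT_PATTERN.search(page)
    if match is None:
        return 0
    return int(match.group(1).replace(",", ""))


def rank_answers(counts: dict) -> list:
    """Sort (answer, count) pairs from most to fewest reported results."""
    return sorted(counts.items(), key=lambda item: item[1], reverse=True)


# Usage with made-up page snippets for three answer options
pages = {
    "A": "<span>百度为您找到相关结果约1,230,000个</span>",
    "B": "<span>百度为您找到相关结果约45,600个</span>",
    "C": "no marker on this page",
}
ranking = rank_answers({name: parse_result_count(html) for name, html in pages.items()})
```

A higher count is only a heuristic: it tends to fail on negatively phrased questions ("which of these is not ..."), so the ranking is a hint rather than an answer.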

Because the second approach does not give a usable result for many questions, I prefer the first method. Recognition generally takes between 0.5 and 0.6 seconds.

Finally, ocr_zh.py can be used for the King of the Mind quiz game.

That is all for this article. I hope it is helpful for your learning.

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
