Python 3.5 + tesseract + adb: an answering assistant for Watermelon Video or King of the Mind

Source: Internet
Author: User

I have been playing the recent Q&A games. Some questions are really hard, and 10 seconds is not enough to search Baidu by hand, so I wrote a helper script. When a question appears, the script searches Baidu automatically in about 2 seconds. The remaining 7 or 8 seconds can be spent analyzing the results and answering, which improves the odds of winning.

The source code is on my GitHub.

How it works: use the adb command to capture a screenshot of the video playback screen on the phone, crop the question and answer regions with Python, and recognize the text with OCR. The recognized text is then searched on Baidu. If you need help setting up this environment, contact me. Because the OCR runs locally, recognition is free and there are no usage limits.
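The capture step can be sketched as follows. This is a minimal sketch, assuming `adb` is on the PATH and a phone is connected; the helper names `normalize_screencap` and `capture_screen` are hypothetical and do not come from the original scripts. On Windows, `adb shell screencap -p` may turn every `\n` in the PNG stream into `\r\n`, so the bytes are normalized before the image is decoded.

```python
import subprocess

# Every valid PNG file starts with these eight bytes.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def normalize_screencap(raw: bytes) -> bytes:
    """Undo the \\n -> \\r\\n translation that `adb shell` may apply on Windows."""
    if raw.startswith(PNG_SIGNATURE):
        return raw  # already a clean PNG stream, leave it untouched
    return raw.replace(b"\r\n", b"\n")


def capture_screen() -> bytes:
    """Return the phone's current screen as PNG bytes (needs adb and a connected device)."""
    result = subprocess.run(
        ["adb", "shell", "screencap", "-p"],
        stdout=subprocess.PIPE,
        check=True,
    )
    return normalize_screencap(result.stdout)
```

The returned bytes can be opened without a temporary file, for example with `Image.open(io.BytesIO(capture_screen()))`.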

Code on GitHub

Ocr_b1_py: automatically searches Baidu for the question, then opens a browser to display the search results.

# -*- coding: UTF-8 -*-
import subprocess
import time
import webbrowser

import pytesseract
from PIL import Image


def main():
    """Main function."""
    op = yes_or_no('Make sure that ADB is enabled on your phone and your computer is connected, '
                   'then open the watermelon video and use this program. Are you sure you want to start? ')
    if not op:
        print('bye')
        return
    # Core recursion
    ocr_subject_parent()
    # for root, sub_dirs, files in os.walk('e:/temporarily received file/zhihu answer/million/'):
    #     for file in files:
    #         print('found image:' + file)
    #         img = Image.open('e:/temporarily received file/zhihu answer/million/' + file)
    #         ocr_subject(img)


def yes_or_no(prompt, true_value='y', false_value='n', default=True):
    """Check whether the user is ready to start the program."""
    default_value = true_value if default else false_value
    prompt = '{}{}/{} [{}]: '.format(prompt, true_value, false_value, default_value)
    i = input(prompt)
    if not i:
        return default
    while True:
        if i == true_value:
            return True
        elif i == false_value:
            return False
        prompt = 'Please input {} or {}: '.format(true_value, false_value)
        i = input(prompt)


def screenImg(true_value='', default=True):
    prompt = 'When a question appears, press Enter to recognize it'
    i = input(prompt)
    if not i:
        return default
    # Any non-empty input stops the recognition loop
    return i == true_value


def ocr_subject(p):
    # Crop notes: the crop starts at 530 and ends at 940
    # Crop notes: the crop ends at 260
    p = cut_img(p)
    pytesseract.pytesseract.tesseract_cmd = 'e:/Program Files (x86)/Tesseract-OCR/tesseract'
    subject = pytesseract.image_to_string(p, lang='chi_sim')
    # Remove all whitespace from the OCR output
    subject = "".join(subject.split())
    # Keep the text after the question number, e.g. "3.question" -> "question"
    subject = subject.split('.')[1]
    print(subject)
    openPage(subject)
    # Wait for the next question
    ocr_subject_parent()


def ocr_subject_parent():
    result = screenImg()
    if result:
        start = time.time()
        # screenshot.check_screenshot()
        process = subprocess.Popen('adb shell screencap -p', shell=True, stdout=subprocess.PIPE)
        binary_screenshot = process.stdout.read()
        # Undo the \n -> \r\n conversion in the adb output so the PNG is valid
        binary_screenshot = binary_screenshot.replace(b'\r\n', b'\n')
        with open('autojump.png', 'wb') as f:
            f.write(binary_screenshot)
        # screenshot.pull_screenshot()
        img = Image.open('autojump.png')
        print("time consumed:" + str(time.time() - start))
        ocr_subject(img)


def openPage(subject):
    url = 'https://www.baidu.com/s?wd={}'.format(subject)
    webbrowser.open(url)


def cut_img(img):
    # PIL expects the crop box as one (left, upper, right, lower) tuple
    region = img.crop((70, 260, 1025, 570))
    # region.save("temp/cut_first.png")
    return region


if __name__ == '__main__':
    main()
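The question clean-up in `ocr_subject` and the URL building in `openPage` could be made more defensive. `subject.split('.')[1]` raises an `IndexError` when the OCR misses the period after the question number, and the question is placed in the URL without encoding. The following is a minimal sketch of one alternative, not the author's code; the helper names `clean_question` and `build_search_url` are hypothetical.

```python
import re
from urllib.parse import quote


def clean_question(ocr_text: str) -> str:
    """Remove whitespace and a leading question number such as '12.' from OCR output."""
    text = "".join(ocr_text.split())
    # Drop a leading number followed by '.', if present; otherwise keep the text as-is.
    return re.sub(r"^\d+[.]", "", text)


def build_search_url(question: str) -> str:
    """Build a Baidu search URL with the question percent-encoded as UTF-8."""
    return "https://www.baidu.com/s?wd={}".format(quote(question))
```

With these helpers, `webbrowser.open(build_search_url(clean_question(text)))` would open the search page even when the question number was not recognized.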

Ocr_bw2.py: searches Baidu for the question plus each answer, fetches the number of results Baidu reports with a crawler, and prints the results to the console.

# -*- coding: UTF-8 -*-
__author__ = 'zjy'

import subprocess
import threading
import time
import urllib.request
import webbrowser
from urllib.parse import quote

import pytesseract
from PIL import Image


def main():
    """Main function."""
    op = yes_or_no('Make sure that ADB is enabled on your phone and your computer is connected, '
                   'then open the watermelon video and use this program. Are you sure you want to start? ')
    if not op:
        print('bye')
        return
    # Core recursion
    ocr_subject_parent()
    # for root, sub_dirs, files in os.walk('e:/temporarily received file/zhihu answer/million/'):
    #     for file in files:
    #         print('found image:' + file)
    #         img = Image.open('e:/temporarily received file/zhihu answer/million/' + file)
    #         ocr_subject(img)


def yes_or_no(prompt, true_value='y', false_value='n', default=True):
    """Check whether the user is ready to start the program."""
    default_value = true_value if default else false_value
    prompt = '{}{}/{} [{}]: '.format(prompt, true_value, false_value, default_value)
    i = input(prompt)
    if not i:
        return default
    while True:
        if i == true_value:
            return True
        elif i == false_value:
            return False
        prompt = 'Please input {} or {}: '.format(true_value, false_value)
        i = input(prompt)


def screenImg(true_value='', default=True):
    prompt = 'When a question appears, press Enter to recognize it\n'
    i = input(prompt)
    if not i:
        return default
    # Any non-empty input stops the recognition loop
    return i == true_value


def ocr_subject(p):
    # Crop notes: the crop starts at 530 and ends at 940
    # Crop notes: the crop ends at 260
    subImg = cut_img(p)
    pytesseract.pytesseract.tesseract_cmd = 'e:/Program Files (x86)/Tesseract-OCR/tesseract'
    subject = pytesseract.image_to_string(subImg, lang='chi_sim')
    # Remove all whitespace from the OCR output
    subject = "".join(subject.split())
    # Keep the text after the question number and drop double quotes
    subject = subject.split('.')[1].replace("\"", "")
    print(subject)
    ocr_answer(p, subject)
    # openPage(subject)
    # print("end:" + str(time.time()))
    # Wait for the next question
    ocr_subject_parent()


def getSearchNum(key):
    key = quote(key)
    # print(key)
    url = 'http://www.baidu.com/s?wd={}'.format(key)
    # print(url)
    response = urllib.request.urlopen(url)
    page = response.read().decode("UTF-8")
    # Baidu's result-count phrase ("Baidu found ... related results for you").
    # The literal is restored to Chinese so that it can match the page; it is
    # 10 characters long, which agrees with the offset used below.
    i = int(page.index('百度为您找到相关结果'))
    start = i + 10
    end = i + 25
    page = page[start:end]
    return page


def ocr_answer(p, subject):
    answer_imgs = cut_question(p)
    pytesseract.pytesseract.tesseract_cmd = 'e:/Program Files (x86)/Tesseract-OCR/tesseract'
    # Recognize and search each answer in its own thread
    for answer_img in answer_imgs:
        t = threading.Thread(target=ocr_answer_thread, args=(answer_img, subject))
        t.start()


def ocr_answer_thread(p, subject):
    answer = pytesseract.image_to_string(p, lang='chi_sim')
    answer = "".join(answer.split())
    v = getSearchNum(subject + ' ' + answer)
    print(answer + ' ' + v)
    # print(time.time())


def ocr_subject_parent():
    result = screenImg()
    if result:
        start = time.time()
        # print("start:" + str(start))
        # screenshot.check_screenshot()
        process = subprocess.Popen('adb shell screencap -p', shell=True, stdout=subprocess.PIPE)
        binary_screenshot = process.stdout.read()
        # Undo the \n -> \r\n conversion in the adb output so the PNG is valid
        binary_screenshot = binary_screenshot.replace(b'\r\n', b'\n')
        with open('autojump.png', 'wb') as f:
            f.write(binary_screenshot)
        # screenshot.pull_screenshot()
        img = Image.open('autojump.png')
        ocr_subject(img)


def openPage(subject):
    url = 'https://www.baidu.com/s?wd={}'.format(subject)
    webbrowser.open(url)


def cut_img(img):
    # PIL expects the crop box as one (left, upper, right, lower) tuple
    region = img.crop((70, 260, 1025, 570))
    # region.save("temp/cut_first.png")
    return region


def cut_question(img):
    # Crop the three answer options below the question
    question1 = img.crop((70, 590, 1025, 768))
    question2 = img.crop((70, 769, 1025, 947))
    question3 = img.crop((70, 948, 1025, 1130))
    # question1.save("temp/cut_1.png")
    # question2.save("temp/cut_2.png")
    # question3.save("temp/cut_3.png")
    return [question1, question2, question3]


if __name__ == '__main__':
    main()
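Instead of slicing a fixed character range out of the page, the result count could be extracted with a regular expression and then used to rank the answers. This is a minimal sketch under one loudly stated assumption: that the Baidu result page contains a phrase of the form `相关结果约1,230,000个`. The page markup may differ or change. The names `parse_result_count` and `best_answer` are hypothetical.

```python
import re
from typing import Dict, Optional


def parse_result_count(page: str) -> Optional[int]:
    """Extract the approximate result count from result-page HTML, or None if absent."""
    # Assumed phrase format: "相关结果约1,230,000个" (the "约" is treated as optional).
    match = re.search(r"相关结果约?([\d,]+)个", page)
    if match is None:
        return None
    return int(match.group(1).replace(",", ""))


def best_answer(counts: Dict[str, Optional[int]]) -> Optional[str]:
    """Return the answer with the highest result count, ignoring answers with no count."""
    known = {answer: n for answer, n in counts.items() if n is not None}
    if not known:
        return None
    return max(known, key=known.get)
```

A higher count is only a rough hint that the question and answer appear together often; it does not prove the answer is correct.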

Because the second method does not work well for many questions, I prefer the first method. Recognition generally takes 0.5 to 0.6 seconds.

Finally, ocr_zh.py can be used for King of the Mind.

That is all for this article. I hope it helps with your learning.
