"Original" uses Python to crawl Leetcode's AC code to GitHub


I have accepted solutions to 105 problems on LeetCode and want to move them into my own GitHub repository, both as a personal solutions bank and as something to show at interviews.

But! But!

LeetCode's online IDE is not that comfortable to use, but I solved most problems directly online, so there is no code on my machine except for the few problems that took half a day of local debugging before they would pass.

So I figured: just use Python to crawl my AC code from LeetCode, drop it into the local GitHub folder, and then sync it up.

About the knowledge involved:

0. Cookies

1. Analysis of the site structure

2. Scripted login

3. Scripted crawling

--------------------------------------------------------------------------------

First, automatic login

I used Python's cookielib + urllib2 + urllib. LeetCode is a Django site, so when you visit the homepage it sends a CSRF token in a cookie, and the login page requires that token to be submitted along with the form. So visit the homepage first, extract the token from the cookie, then request the login page and post the token together with the credentials.

You also have to adjust the request headers. Until I set the Referer I kept getting 403 responses, wtf.

Code:

import urllib2
import cookielib
import urllib


mydir = r'C:/users/user/documents/github/leetcode/'   # local GitHub project folder
myhost = r'https://oj.leetcode.com'


# Visit the homepage first so Django sets the CSRF cookie.
cookie = cookielib.CookieJar()
handler = urllib2.HTTPCookieProcessor(cookie)
urlopener = urllib2.build_opener(handler)
urlopener.open('https://oj.leetcode.com/')

# Pull the CSRF token out of the cookie jar.
csrftoken = ''
for ck in cookie:
    csrftoken = ck.value


login = 'SHADOWMYDX'
mypwd = '**********'  # password


# The login form wants the CSRF token plus the credentials.
values = {'csrfmiddlewaretoken': csrftoken, 'login': login, 'password': mypwd, 'remember': 'on'}
values = urllib.urlencode(values)
# Without a proper Referer the server answers 403.
headers = {'User-Agent': 'Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.1.6) Gecko/20091201 Firefox/3.5.6',
           'Origin': 'https://oj.leetcode.com',
           'Referer': 'https://oj.leetcode.com/accounts/login/'}


request = urllib2.Request('https://oj.leetcode.com/accounts/login/', values, headers=headers)
url = urlopener.open(request)
page = url.read()
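For reference, here is a rough sketch of the same login flow using the requests library instead of urllib2/cookielib. This is my own illustration, not part of the original script; it assumes the CSRF cookie is named 'csrftoken' (the original simply takes the last value in the cookie jar) and reuses the same URLs and form fields as above.

import requests

session = requests.Session()
session.headers.update({
    'User-Agent': 'Mozilla/5.0',
    'Origin': 'https://oj.leetcode.com',
    'Referer': 'https://oj.leetcode.com/accounts/login/',
})

# Visiting the homepage makes Django set the CSRF cookie on the session.
session.get('https://oj.leetcode.com/')
csrftoken = session.cookies.get('csrftoken')  # assumed cookie name

# Post the token together with the credentials, just like the form above.
resp = session.post('https://oj.leetcode.com/accounts/login/', data={
    'csrfmiddlewaretoken': csrftoken,
    'login': 'SHADOWMYDX',
    'password': '**********',
    'remember': 'on',
})
page = resp.text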

Second, crawling the site

This breaks down into several sub-problems: first, find the address of every accepted problem; second, find the address of the accepted submission's code; finally, crawl the AC code and save it into the local GitHub project folder.

Since the LeetCode submission page is a dynamic page rendered by JS, you cannot simply inspect the element with Firebug; the AC code has to be pulled out of the JS source the server sends. The code embedded there is escaped, so a dictionary is needed to convert the special characters back.
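To make that conversion concrete, here is a tiny made-up example (the escaped snippet below is invented for illustration, not actual LeetCode output) of what the replacement table does:

# The JS payload contains literal backslash-u sequences; replacing them
# restores ordinary C++ source.
escaped = 'int main() { return 0\\u003b }\\u000d\\u000a'
for key, val in {'\\u003b': ';', '\\u000d': '\n', '\\u000a': ''}.items():
    escaped = escaped.replace(key, val)
print escaped   # -> int main() { return 0; }

Code: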

def savecode(code, title):
    # Write the recovered C++ source into the local GitHub folder.
    global mydir
    f = open(mydir + title + '.cpp', 'w')
    f.write(code)
    f.close()

def downloadcode(refer, codeadd, title):
    global headers
    global urlopener
    global myhost
    headers['Referer'] = refer
    request = urllib2.Request(codeadd, headers=headers)
    url = urlopener.open(request)
    all = url.read()
    # The submission page embeds the code in a JS call: storage.put('cpp', '...');
    tar = "storage.put('cpp',"
    index = all.find(tar, 0)
    start = all.find('class Solution', index)
    finis = all.find("');", start)
    code = all[start:finis]
    # The keys are literal backslash-u sequences in the fetched text;
    # map them back to plain C++ characters.
    tocpp = {'\\u000d': '\n', '\\u000a': '', '\\u003b': ';', '\\u003c': '<', '\\u003e': '>', '\\u003d': '=',
             '\\u0026': '&', '\\u002d': '-', '\\u0022': '"', '\\u0009': '\t', '\\u0027': "'", '\\u005c': '\\'}
    for key in tocpp.keys():
        code = code.replace(key, tocpp[key])
    savecode(code, title)

def findcode(address, title):
    global headers
    global urlopener
    global myhost
    headers['Referer'] = address
    address += 'submissions/'
    print 'now dealing with ' + address + ': ' + title
    request = urllib2.Request(address, headers=headers)
    url = urlopener.open(request)
    all = url.read()
    # Locate the accepted submission on the submissions page and follow its link.
    tar = 'class="text-danger status-accepted"'
    index = all.find(tar, 0)
    start = all.find('href="', index)
    finis = all.find('">', start)
    downloadcode(address, myhost + all[start + 6:finis], title)

def findadd(page):
    # Walk the problem list and handle every problem marked as accepted.
    index = 0
    while 1:
        index = page.find('class="ac"', index)
        if index != -1:
            index += 1
            start = page.find('<td><a href="', index)
            finis = page.find('">', start)
            tmpfin = page.find('<', finis)
            title = page[finis + 2:tmpfin]
            findcode(myhost + page[start + 13:finis], title)
        else:
            break

Finally, call findadd(page) and you are done.
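For completeness, a minimal driver sketch of my own (not in the original post): the problem-list URL below is an assumption, since the original simply reuses the page returned by the login request.

# Fetch the problem list explicitly and hand it to findadd().
# The '/problems/' path is assumed, not taken from the original.
request = urllib2.Request(myhost + '/problems/', headers=headers)
page = urlopener.open(request).read()
findadd(page)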

Postscript: my first thought was to make a multi-threaded version, but I decided to get the basic functionality working first rather than leave behind yet another unfinished toy.
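As for the multi-threaded idea, here is a minimal sketch of how it could look (my own illustration, not the author's code): spawn one thread per accepted problem and let each run findcode. Note that the functions above mutate the shared global headers dict, so a real version would need to pass the Referer explicitly or guard it with a lock.

import threading

def crawl_all(pairs):
    # pairs is a list of (address, title) tuples collected by a findadd-style scan.
    threads = []
    for address, title in pairs:
        t = threading.Thread(target=findcode, args=(address, title))
        t.start()
        threads.append(t)
    for t in threads:
        t.join()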

"Original" uses Python to crawl Leetcode's AC code to GitHub

Related Article

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.