Subaru Automotive Technical Documentation Download Method

Late last night a friend suddenly came to me for help downloading Subaru's technical documents. At first I assumed he simply couldn't reach the overseas site for some reason, but what I found shocked me: there are more than 1000 of these PDFs.
He had found someone on a foreign forum who could download them, and the source code was posted there, but he couldn't make sense of it.
Forum address is: http://www.subaruoutback.org/forums/138-gen-5-2015-present/280682-2016-owner-s-service-manuals-posted.html
This is the website of the document download: http://techinfo.subaru.com/index.html
Accounts there cost money; the current pricing tiers are:
- 72 hours (3 days) @ $34.95
- 30 days (1 month) @ $299.95
- 365 days (1 year) @ $2499.95
On top of that, they limit each account to no more than 50 PDF downloads per hour... ouch...
I borrowed the code from that thread (the Python script posted in reply #2, plus the timing mechanism from reply #15 that stays under the 50-downloads-per-hour limit) and modified parts of it for the site's current state.
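The throttle itself is just arithmetic: 3600 seconds per hour / 50 files = 72 seconds between downloads, which is exactly where the sleep in the script below comes from. A minimal sketch of the idea (the function names here are mine, purely illustrative):

import time

MAX_PER_HOUR = 50
SLEEP_SECONDS = 3600.0 / MAX_PER_HOUR  # 72 seconds between downloads

def download_throttled(download, urls):
    # Call download(url) for each URL, pausing after each one so the
    # account never exceeds MAX_PER_HOUR downloads in any given hour.
    for url in urls:
        download(url)
        time.sleep(SLEEP_SECONDS)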
I'm sharing it here for anyone who needs it.
My Python environment is Python 2.7.10. If you use version 3.4 you will need to change parts of the code, but only some package names and a few method names under them; it's a simple change (see the notes at the end).
The code is below:
1  # pip install the required packages before importing them
2  import lxml.html, urllib2, urlparse, os, requests, natsort, time
3  from PyPDF2 import PdfFileMerger, PdfFileReader
4
5  # File downloader
6  def download_file(url, url_splitter='/'):
7      local_filename = url.split(url_splitter)[-1]
8      # fake a logged-in session by replaying a browser cookie
9      headers = {
10         "Host": "techinfo.subaru.com",
11         "User-Agent": "lol",
12         "Cookie": "JSESSIONID=F3CB4654BFC47A6A8E9A1859F0445123"
13     }
14     r = requests.get(url, stream=True, headers=headers)
15     with open(local_filename, 'wb') as f:
16         for chunk in r.iter_content(chunk_size=1024):
17             if chunk:
18                 f.write(chunk)
19                 f.flush()
20     return local_filename
21
22 # Grab all the PDFs
23 def grab_files(base_url):
24     res = urllib2.urlopen(base_url)
25     tree = lxml.html.fromstring(res.read())
26     ns = {'re': 'http://exslt.org/regular-expressions'}
27     for node in tree.xpath('//a[re:test(@href, "\.pdf$", "i")]', namespaces=ns):
28         pdflink = urlparse.urljoin(base_url, node.attrib['href'])
29         print pdflink
30         filename = download_file(pdflink)
31         print("Downloading " + filename + " complete\n")
32         print("Sleep")
33         time.sleep(72)  # 3600 s / 50 files = 72 s between downloads
34     return 0
35
36 # Merge the PDFs
37 def merge_pdfs(merged_filename, files_dir=os.getcwd()):
38     pdf_files = natsort.natsorted([f for f in os.listdir(files_dir) if f.endswith(".pdf")])
39     merger = PdfFileMerger()
40     for filename in pdf_files:
41         print("Merging " + filename)
42         merger.append(PdfFileReader(open(os.path.join(files_dir, filename), 'rb')))
43     merger.write(os.path.join(files_dir, merged_filename))
44     print("Merge completed - " + merged_filename)
45     return merged_filename
46
47 # the search-results page listing the PDFs to download
48 grab_files('http://techinfo.subaru.com/search/listResults.html?searchLit=Search&litNum=G2520BE')
49 merge_pdfs('2016_outback_legacy_manual.pdf')
A few notes:
1. The cookie on line 12 has to be borrowed from a browser where you are logged in. The easiest way is Chrome: log in to the site, open the developer tools, and copy the JSESSIONID value from the Cookie request header in the Network tab.
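By the way, requests can also take the session ID as a cookies dict instead of a hand-built Cookie header; a minimal sketch (the JSESSIONID value is a placeholder you must replace with the one copied from your own browser):

import requests

# Placeholder: substitute the JSESSIONID copied from your own logged-in browser.
session_cookies = {'JSESSIONID': 'F3CB4654BFC47A6A8E9A1859F0445123'}

url = 'http://techinfo.subaru.com/search/listResults.html?searchLit=Search&litNum=G2520BE'
r = requests.get(url, cookies=session_cookies)
print(r.status_code)  # quick sanity check that the session is accepted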
2. On line 48, the address is that of the search-results page listing the PDFs you want to download. Subaru's PDF downloads appear to be exposed right on the search page, which is a little remarkable...
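If you need a different document set, the litNum query parameter appears to be what selects it, so you would presumably just swap in another literature number; a hypothetical example ('G9999ZZ' is an invented placeholder, not a real Subaru literature number):

# Hypothetical: point grab_files at another search-results page.
# 'G9999ZZ' is an invented placeholder literature number.
base = 'http://techinfo.subaru.com/search/listResults.html?searchLit=Search&litNum='
grab_files(base + 'G9999ZZ')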
3. Line 49 is the code that merges the PDFs; after all, with 1000+ documents, opening them one by one would be a mess... (you can of course choose not to merge)
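One caveat I would add: appending 1000+ readers keeps every source file open until the final write, which can run into open-file limits or memory. A rough sketch of merging in batches instead, using the same PyPDF2 and natsort packages (the batch size of 100 and the part_*.tmp naming are arbitrary choices of mine):

import os, natsort
from PyPDF2 import PdfFileMerger

def merge_pdfs_batched(merged_filename, files_dir=os.getcwd(), batch_size=100):
    pdf_files = natsort.natsorted(
        [f for f in os.listdir(files_dir) if f.lower().endswith('.pdf')])
    partials = []
    # First pass: merge batch_size files at a time into partial documents,
    # closing each batch's file handles before moving on.
    for i in range(0, len(pdf_files), batch_size):
        merger = PdfFileMerger()
        handles = [open(os.path.join(files_dir, name), 'rb')
                   for name in pdf_files[i:i + batch_size]]
        for fh in handles:
            merger.append(fh)
        part = os.path.join(files_dir, 'part_%04d.tmp' % i)
        merger.write(part)
        for fh in handles:
            fh.close()
        partials.append(part)
    # Second pass: merge the partials into the final document, then clean up.
    merger = PdfFileMerger()
    part_handles = [open(p, 'rb') for p in partials]
    for fh in part_handles:
        merger.append(fh)
    merger.write(os.path.join(files_dir, merged_filename))
    for fh in part_handles:
        fh.close()
    for part in partials:
        os.remove(part)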
OK, with that you can run the code and let it grind away.
If you are on Python 3.4 and adapting the code, note that:
1. Instead of urllib2 you need urllib: in Python 3, urllib2 was merged into the urllib package (urllib.request and urllib.error; the separately distributed urllib3 is a different, third-party library), and as I recall at least one method call has to change along with it, as in the sketch after this list;
2. lxml can be a bigger headache when installing; you can download an already-compiled package directly: http://www.lfd.uci.edu/~gohlke/pythonlibs/#lxml
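For reference, a minimal sketch of what the link-scraping part looks like with the Python 3 imports; I have not run this exact snippet myself, so treat it as a starting point rather than a drop-in replacement:

import lxml.html
import urllib.request
import urllib.parse

def grab_links_py3(base_url):
    # Python 3: urllib2.urlopen -> urllib.request.urlopen
    res = urllib.request.urlopen(base_url)
    tree = lxml.html.fromstring(res.read())
    ns = {'re': 'http://exslt.org/regular-expressions'}
    for node in tree.xpath(r'//a[re:test(@href, "\.pdf$", "i")]', namespaces=ns):
        # Python 3: urlparse.urljoin -> urllib.parse.urljoin
        pdflink = urllib.parse.urljoin(base_url, node.attrib['href'])
        print(pdflink)  # print is a function in Python 3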
Please leave a message if you have questions.