Subaru Automotive Technical Documentation Download Method


Late last night a friend suddenly came to me for help downloading Subaru's technical documents. I figured he just couldn't reach the foreign site for some reason, but what I found shocked me. Good grief, there are more than 1000 of these PDFs!

On a foreign forum he found someone who could download them, and the source code was posted right in the thread, but he couldn't make sense of it.

The forum thread is: http://www.subaruoutback.org/forums/138-gen-5-2015-present/280682-2016-owner-s-service-manuals-posted.html

The document download site is: http://techinfo.subaru.com/index.html

Accounts there cost money, and the pricing is set nice and steep:

    • 72 hours (3 days) @ $34.95
    • 30 days (1 month) @ $299.95
    • 365 days (1 year) @ $2,499.95

On top of that, they limit each account to no more than 50 PDF downloads per hour ... what a pain ...

I borrowed their code (the Python script posted in the second reply of that thread, plus the timing mechanism from reply #15 that keeps you under the 50-files-per-hour limit) and modified parts of it for the current situation. The timing is simple arithmetic: 3600 seconds per hour divided by 50 files means sleeping 72 seconds between downloads.

Now I'm sharing it with anyone who needs it.

My Python environment is Python 2.7.10. If you're on Python 3.4 you'll need to change parts of the code, but it's only package names and a few method names under those packages; it's a simple job (see the notes at the end).

The code is below:

 1  # pip install the required packages first, then import them
 2  import lxml.html, urllib2, urlparse, os, requests, natsort, time
 3  from PyPDF2 import PdfFileMerger, PdfFileReader
 4
 5  # File downloader
 6  def download_file(url, url_splitter='/'):
 7      local_filename = url.split(url_splitter)[-1]
 8      # Simulate a logged-in session by sending the site's session cookie
 9      headers = {
10          "Host": "techinfo.subaru.com",
11          "User-Agent": "lol",
12          "Cookie": "JSESSIONID=F3CB4654BFC47A6A8E9A1859F0445123"
13      }
14      r = requests.get(url, stream=True, headers=headers)
15      with open(local_filename, 'wb') as f:
16          for chunk in r.iter_content(chunk_size=1024):
17              if chunk:
18                  f.write(chunk)
19                  f.flush()
20      return local_filename
21
22  # Grab all the PDFs linked from a page
23  def grab_files(base_url):
24      res = urllib2.urlopen(base_url)
25      tree = lxml.html.fromstring(res.read())
26      ns = {'re': 'http://exslt.org/regular-expressions'}
27      for node in tree.xpath('//a[re:test(@href, "\.pdf$", "i")]', namespaces=ns):
28          pdflink = urlparse.urljoin(base_url, node.attrib['href'])
29          print(pdflink)
30          filename = download_file(pdflink)
31          print("Downloading " + filename + " complete\n")
32          print("Sleep")
33          time.sleep(72)  # 3600 s / 50 files = 72 s, stays under the hourly limit
34      return 0
35
36  # Merge the PDFs
37  def merge_pdfs(merged_filename, files_dir=os.getcwd()):
38      pdf_files = natsort.natsorted([f for f in os.listdir(files_dir) if f.endswith(".pdf")])
39      merger = PdfFileMerger()
40      for filename in pdf_files:
41          print("Merging " + filename)
42          merger.append(PdfFileReader(open(os.path.join(files_dir, filename), 'rb')))
43      merger.write(os.path.join(files_dir, merged_filename))
44      print("Merge completed - " + merged_filename)
45      return merged_filename
46
47  # The search page whose PDF links we want to download
48  grab_files('http://techinfo.subaru.com/search/listResults.html?searchLit=Search&litNum=G2520BE')
49  merge_pdfs('2016_outback_legacy_manual.pdf')

A few notes:

1. The cookie on line 12 has to be fished out of your browser after you log in. The easiest way is Chrome: log in to techinfo.subaru.com, open the developer tools, and copy the JSESSIONID value from any request's Cookie header into the script. (A quick way to verify the cookie works is sketched after this list.)

2. Line 48 is where you put the address of the search results page whose PDFs you want to download. Subaru apparently exposes the PDF downloads right on the search page. A little wonderful ... (A sketch for looping over several searches also follows this list.)

3. Line 49 is the call that merges the PDFs; after all, with 1000+ documents, opening them one by one would drive you crazy ... (You can of course choose not to merge; a merge variant is also sketched below.)
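Before kicking off a thousand downloads, it's worth sanity-checking the copied cookie by requesting one search page and seeing what comes back. A minimal sketch; the "login page" check is my assumption about how an expired session behaves, not something I verified:

    import requests

    # Paste your own JSESSIONID value here (the one below is a placeholder)
    headers = {
        "Host": "techinfo.subaru.com",
        "User-Agent": "lol",
        "Cookie": "JSESSIONID=F3CB4654BFC47A6A8E9A1859F0445123"
    }

    url = 'http://techinfo.subaru.com/search/listResults.html?searchLit=Search&litNum=G2520BE'
    r = requests.get(url, headers=headers)
    print(r.status_code)
    # A dead session typically bounces you to a login page, so a crude
    # check is whether "login" shows up in the response body.
    print("session looks dead" if "login" in r.text.lower() else "session looks alive")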
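And since the script just walks one search results page, you can run it over several document numbers in a row. A sketch reusing grab_files and merge_pdfs from the script above; the second litNum value is a made-up placeholder, look up the real document codes yourself:

    # Hypothetical document codes -- replace with the litNum values you actually need
    lit_nums = ['G2520BE', 'XXXXXXX']

    base = 'http://techinfo.subaru.com/search/listResults.html?searchLit=Search&litNum=%s'
    for lit in lit_nums:
        grab_files(base % lit)
    merge_pdfs('all_manuals_merged.pdf')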
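One more note on the merge: as written, line 42 keeps every file handle open until the final write, which is a lot of handles with 1000+ PDFs. If I remember right, PyPDF2 1.x's PdfFileMerger.append also accepts a plain path string, which lets the merger manage the file objects itself. A variant sketch under that assumption:

    import os, natsort
    from PyPDF2 import PdfFileMerger

    def merge_pdfs_by_path(merged_filename, files_dir=os.getcwd()):
        # Same behavior as merge_pdfs above, but hand PdfFileMerger the
        # paths and let merger.close() release the files at the end.
        pdf_files = natsort.natsorted([f for f in os.listdir(files_dir)
                                       if f.endswith(".pdf")])
        merger = PdfFileMerger()
        for name in pdf_files:
            merger.append(os.path.join(files_dir, name))
        merger.write(os.path.join(files_dir, merged_filename))
        merger.close()
        return merged_filename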

OK, with that done you can just run the code.

If you're porting the code to Python 3.4, note the following:

1. Anything referencing urllib2 needs to reference urllib instead: in Python 3, urllib2 was merged into urllib (split across urllib.request and urllib.error); don't confuse this with the third-party urllib3 package. In actual use, I remember a couple of the methods also have to be switched over to their urllib equivalents; see the sketch after this list.

2. lxml can be a bigger headache to install; if pip can't build it, you can download a pre-compiled package directly from http://www.lfd.uci.edu/~gohlke/pythonlibs/#lxml
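Concretely, the Python 3 changes to this script come down to the imports below, plus making sure every print is a print() call. A sketch of the mapping, not a full tested port:

    # Python 3 equivalents of the Python 2 modules the script uses
    from urllib.request import urlopen   # was: urllib2.urlopen
    from urllib.parse import urljoin     # was: urlparse.urljoin

    # Inside grab_files the two call sites become:
    #   res = urlopen(base_url)
    #   pdflink = urljoin(base_url, node.attrib['href'])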

Please leave a message if you have questions.
