Using Python to access data on the network

Last Update:2017-02-16 Source: Internet

Author: User

Over the past two days I finished the following course:

Accessing network data using Python

https://www.coursera.org/learn/python-network-data/

I wrote and finished the homework, so here are some study notes as a memo.

1. Regular expressions. The later lessons do not use this knowledge, but it is a good skill to have.

Here are the main regular expression elements listed in the course:

Python Regular Expression Quick Guide

    ^         Matches the beginning of a line
    $         Matches the end of the line
    .         Matches any character
    \s        Matches whitespace
    \S        Matches any non-whitespace character
    *         Repeats a character zero or more times
    *?        Repeats a character zero or more times (non-greedy)
    +         Repeats a character one or more times
    +?        Repeats a character one or more times (non-greedy)
    [aeiou]   Matches a single character in the listed set
    [^XYZ]    Matches a single character not in the listed set
    [a-z0-9]  The set of characters can include a range
    (         Indicates where string extraction is to start
    )         Indicates where string extraction is to end

One thing I had not noticed before: with a pattern such as From ([0-9a-z]), the whole pattern has to match the text, but only the part inside the () is extracted.

Also, an escaped dot (\.) does not mean any character; it matches only a literal period.
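The two points above, plus the greedy and non-greedy rows of the quick guide, can be checked with a short Python 3 sketch (the scripts in these notes use Python 2). The sample line is made up for illustration and does not come from the course data.

```python
import re

# Hypothetical sample line, made up for illustration.
line = "From alice@example.com Sat Jan 5 09:14:16 2008"

# The whole pattern must match, but findall returns only the part inside ().
sender = re.findall(r"^From (\S+@\S+)", line)

# An escaped dot matches only a literal period; a bare dot matches any character.
literal_dots = re.findall(r"\.", "a.b-c")
any_chars = re.findall(r".", "a.b-c")

# Greedy + stretches to the last colon, non-greedy +? stops at the first one.
greedy = re.findall(r"F.+:", line)
lazy = re.findall(r"F.+?:", line)

print(sender)
print(literal_dots, any_chars)
print(greedy, lazy)
```

Printing the five results side by side makes it easy to see which part of the text each pattern actually captured.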

Here is the code for the programming assignment:

    import re

    def sumtext(name):
        # Add up every run of digits found in the file.
        total = 0
        with open(name, 'r') as handle:
            for line in handle:
                nums = re.findall('[0-9]+', line)
                if len(nums) >= 1:
                    for num in nums:
                        total += int(num)
        return total

    filedir = raw_input("input fileName:")
    sum1 = sumtext(filedir)
    print sum1

2. Creating a socket connection with Python

This part introduces sockets, which applications use to communicate with each other. Each network application has a corresponding port number, so protocol + hostname + port is enough to find an application and talk to it.
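As a rough illustration of the protocol + hostname + port idea, the Python 3 sketch below splits a URL into those three parts with the standard library. The endpoint helper, the DEFAULT_PORTS table, and the localhost URL are my own additions for illustration, not something from the course.

```python
from urllib.parse import urlsplit

# Well-known default ports for two common protocols.
DEFAULT_PORTS = {"http": 80, "https": 443}

def endpoint(url):
    """Return (protocol, hostname, port), filling in the default port if none is given."""
    parts = urlsplit(url)
    port = parts.port or DEFAULT_PORTS.get(parts.scheme)
    return parts.scheme, parts.hostname, port

print(endpoint("http://data.pr4e.org/intro-short.txt"))
# Hypothetical URL with an explicit port.
print(endpoint("http://localhost:8080/index.html"))
```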

The course shows how to use Telnet to make a request to an HTTP service:

    GET http://www.cnblogs.com/webarn/p/6398989.html HTTP/1.0

This does not always succeed, and I do not think the reason is the one given in the course, that the request is sent too slowly.

Here is a simpler way that I know of:

    curl -XGET http://www.cnblogs.com/webarn/p/6398989.html

Or use Python to open a socket connection directly, with the following code:

    import socket

    # Open a TCP connection to the web server on the HTTP port.
    mysock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    mysock.connect(('data.pr4e.org', 80))
    mysock.send('GET http://data.pr4e.org/intro-short.txt HTTP/1.0\n\n')

    # Read the response in chunks until the server closes the connection.
    while True:
        data = mysock.recv(512)
        if len(data) < 1:
            break
        print data

    mysock.close()
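What travels over the socket is plain text in both directions. The Python 3 sketch below shows, without touching the network, how a request could be built and how a raw response splits into a status line, headers, and a body. The build_request and split_response helpers and the canned response are hypothetical, written only for this illustration. In Python 3, send() expects bytes, which is why the request is encoded.

```python
def build_request(host, path):
    """Build a minimal HTTP/1.0 GET request as bytes."""
    return ("GET %s HTTP/1.0\r\nHost: %s\r\n\r\n" % (path, host)).encode("ascii")

def split_response(raw):
    """Split raw response bytes into (status line, header dict, body bytes)."""
    # A blank line separates the header section from the body.
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return lines[0], headers, body

# Hypothetical canned response so the sketch runs without a network connection.
sample = b"HTTP/1.0 200 OK\r\nContent-Type: text/plain\r\n\r\nhello"
status, headers, body = split_response(sample)

print(build_request("data.pr4e.org", "/intro-short.txt"))
print(status, headers, body)
```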

Alternatively, the browser's developer tools show the request and the response at a glance.

3. Understanding and parsing HTML

Most web pages are written in HTML, the HyperText Markup Language, so the course introduces a Python package for parsing HTML: BeautifulSoup.

The project's home page is:

https://www.crummy.com/software/BeautifulSoup/

See it for usage details.

Next is how it is used in code, again with a small demo from my own homework:

    import urllib
    from BeautifulSoup import *

    url = raw_input('Enter - ')
    html = urllib.urlopen(url).read()
    soup = BeautifulSoup(html)

    # Add up the number inside the <span> of every table row.
    total = 0
    trs = soup('tr')
    for tr in trs:
        if tr.span is not None:
            num = int(tr.span.contents[0])
            total += num
    print total
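For comparison, the same kind of sum can be sketched with only the Python 3 standard library. This swaps BeautifulSoup for html.parser.HTMLParser, so it is a different technique from the one used in the course, and the HTML fragment is made up for illustration.

```python
from html.parser import HTMLParser

class SpanSummer(HTMLParser):
    """Add up the integers found inside <span> elements."""

    def __init__(self):
        super().__init__()
        self.in_span = False
        self.total = 0

    def handle_starttag(self, tag, attrs):
        if tag == "span":
            self.in_span = True

    def handle_endtag(self, tag):
        if tag == "span":
            self.in_span = False

    def handle_data(self, data):
        # Only text that sits inside a <span> and is purely digits is counted.
        text = data.strip()
        if self.in_span and text.isdigit():
            self.total += int(text)

# Hypothetical page fragment, made up for illustration.
html = """
<table>
  <tr><td>Ann</td><td><span class="comments">97</span></td></tr>
  <tr><td>Bob</td><td><span class="comments">3</span></td></tr>
  <tr><td>No count here</td></tr>
</table>
"""

parser = SpanSummer()
parser.feed(html)
print(parser.total)
```

The event-driven parser needs more code than BeautifulSoup for the same job, which is a fair argument for using the package when it is available.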

4. Web services and XML

This part introduces XML, the Extensible Markup Language. It is used mainly to transmit and store data and is fairly readable. Many web service communication protocols are designed around XML.

There is also the concept of a schema. For example, we used to write XSD files to express constraints on an XML data structure, such as whether a field may be omitted or what type a field has. A schema is a set of constraints, something similar to a protocol.

There are also many schema standards.

XML parsing uses a package from Python's standard library:

xml.etree.ElementTree, which parses the XML into a tree structure so that field values can be reached by walking down from the root node.

The code is as follows:

    import urllib
    import xml.etree.ElementTree as ET

    url = raw_input("Enter location: ")
    print 'Retrieving', url
    uh = urllib.urlopen(url)
    data = uh.read()
    print '\nRetrieved', len(data), 'characters'

    # Parse the XML and collect every <comment> element below the root.
    tree = ET.fromstring(data)
    comments = tree.findall('.//comment')
    total = 0
    count = len(comments)
    print 'Count:', count
    for comment in comments:
        total += int(comment.find('count').text)
    print 'Sum:', total
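To try the tree lookup without a network connection, here is a Python 3 sketch that runs the same findall and find calls on an inline document. The XML is a made-up sample that I assume has roughly the shape the script above expects; the element names other than comment and count are my own.

```python
import xml.etree.ElementTree as ET

# Hypothetical document, made up for illustration.
data = """<commentinfo>
  <comments>
    <comment><name>Ann</name><count>97</count></comment>
    <comment><name>Bob</name><count>3</count></comment>
  </comments>
</commentinfo>"""

root = ET.fromstring(data)

# './/comment' finds every <comment> element at any depth below the root.
comments = root.findall(".//comment")

# find() looks up a direct child by tag name; .text is the string it contains.
counts = [int(c.find("count").text) for c in comments]
names = [c.find("name").text for c in comments]

print(len(comments), sum(counts), names)
```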

5. JSON and APIs

This section mentions SOA (service-oriented architecture), which large systems tend to use. As I understand it, each system has a middle layer for communication, and the data protocol and format used for that communication are unified, so the systems can talk to one another. There are of course other issues to consider, such as service discovery. But with an SOA architecture, the various systems can communicate.

The API part of the course uses the Google Maps API and the Twitter API as examples. Any application may expose an API for others to call; an application programming interface is the interface for communicating with a system. The API format is fairly simple and uses RESTful calls. The RESTful style probably deserves an article of its own that looks at how the two relate, but the APIs I have seen are mostly a URL plus parameters, in the form http://www.xxxx.com?para1=11&&param2=11, which can be understood as roughly the earlier protocol + host + port, plus parameters.
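The URL plus parameters form can be sketched in Python 3 with urllib.parse. The endpoint, the parameter names, and the build_url helper below are hypothetical and only illustrate how the query string is encoded and decoded.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

def build_url(base, params):
    """Append URL-encoded query parameters to a base URL."""
    return base + "?" + urlencode(params)

# Hypothetical endpoint and parameter names, made up for illustration.
url = build_url("http://www.example.com/api", {"address": "Ann Arbor, MI", "page": 2})
print(url)

# The receiving side can decode the query string back into a dict of lists.
query = parse_qs(urlsplit(url).query)
print(query)
```

Letting urlencode do the escaping avoids hand-building query strings, where spaces and commas in a value would otherwise break the URL.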

About JSON: JSON is a data exchange format. It has only one version, which has never been revised, it is much lighter than XML, and it has only two data structures, map and list. The rest can be found at json.org. (Chrome crashed three times while I was writing this section, and I nearly crashed too...) In Python, loads parses a string, while load parses a file.
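The difference between loads and load, and the way the two JSON structures map onto Python types, can be seen in a short Python 3 sketch. The JSON text is made up for illustration, and io.StringIO stands in for an open file.

```python
import io
import json

# Hypothetical JSON text, made up for illustration.
text = '{"note": "sample", "comments": [{"name": "Ann", "count": 97}, {"name": "Bob", "count": 3}]}'

# loads parses a string.
info = json.loads(text)

# load parses a file-like object; io.StringIO stands in for an open file here.
same_info = json.load(io.StringIO(text))

# JSON objects become dicts and JSON arrays become lists.
total = sum(item["count"] for item in info["comments"])
print(type(info).__name__, type(info["comments"]).__name__, total)
```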

Here is the code:

    import json
    import urllib

    url = raw_input('Enter location: ')
    print 'Retrieving', url
    uh = urllib.urlopen(url)
    data = uh.read()
    print 'Retrieved', len(data)

    # Parse the JSON text and add up the count of every comment.
    info = json.loads(data)
    print 'Count:', len(info['comments'])
    total = 0
    for comment in info['comments']:
        total += int(comment['count'])
    print 'Sum:', total

Calling an API and then parsing the JSON it returns:

    import urllib
    import json

    serviceurl = 'http://python-data.dr-chuck.net/geojson?'

    while True:
        address = raw_input('Enter location: ')
        if len(address) < 1:
            break

        # Encode the parameters into the query string of the request URL.
        url = serviceurl + urllib.urlencode({'sensor': 'false', 'address': address})
        print 'Retrieving', url
        uh = urllib.urlopen(url)
        data = uh.read()
        print 'Retrieved', len(data), 'characters'

        try:
            js = json.loads(str(data))
        except ValueError:
            js = None

        # Guard against js being None before looking up the status key.
        if not js or 'status' not in js or js['status'] != 'OK':
            print '=== Failure to Retrieve ==='
            print data
            continue

        print json.dumps(js, indent=4)
        print 'place_id:', js['results'][0]['place_id']
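The status check and the place_id lookup can also be pulled into one small function, which makes the failure cases easy to test offline. This is only a Python 3 sketch: first_place_id is a hypothetical helper of my own, and the response strings are made up and much shorter than anything the service would return.

```python
import json

def first_place_id(text):
    """Return the first result's place_id, or None if the response is unusable."""
    try:
        js = json.loads(text)
    except ValueError:
        # Not valid JSON at all.
        return None
    if not isinstance(js, dict) or js.get("status") != "OK":
        return None
    results = js.get("results") or []
    if not results:
        return None
    return results[0].get("place_id")

# Hypothetical responses, made up for illustration.
good = '{"status": "OK", "results": [{"place_id": "abc123"}]}'
bad = '{"status": "ZERO_RESULTS", "results": []}'

print(first_place_id(good), first_place_id(bad), first_place_id("not json"))
```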

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
