Comparison between Python and C programming ideas through examples

Source: Internet
Author: User
Tags numba
This article uses examples to show how the Python way of thinking about a program differs from the C way. The two languages are often compared as representatives of object-oriented and procedural programming. I have been using Python for various data science projects. Python is famous for its ease of use: someone with coding experience can get started, or even use it effectively, within a few days.

It sounds good, but if you use both Python and other languages, such as C, there may be some problems.

Let me give you an example from my own experience. I am proficient in imperative languages such as C and C++, and familiar with older classics such as Lisp and Prolog. I have also used Java, JavaScript, and PHP for a while. So learning Python should be easy for me, right? In fact, it only seemed easy. I dug a hole for myself: I was using Python as if it were C.

Here are the details.

In a recent project, I needed to process geospatial data. The task: given a GPS track of about 25,000 points, repeatedly find the track point closest to a given latitude and longitude. My first step was to look for an existing code snippet that calculates the distance between two points given their latitude and longitude. Such code is available in the public domain, written by John D. Cook.

Everything was ready! All I needed was a Python function that returns the index of the track point closest to the input coordinates (the index into the array of 25,000 points):

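    def closest_distance(lat, lon, trkpts):
        d = 100000.0   # smallest distance found so far
        best = -1      # index of the closest track point so far
        r = trkpts.index
        for i in r:
            # Label-based lookup of this track point's coordinates
            lati = trkpts.loc[i, 'Lat']
            loni = trkpts.loc[i, 'Lon']
            md = distance_on_unit_sphere(lat, lon, lati, loni)
            if d > md:
                best = i
                d = md
        return best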

Here distance_on_unit_sphere is the function from John D. Cook's code, and trkpts is an array containing the coordinates of the GPS track points (in fact, it is a pandas data frame; pandas is a third-party data analysis package for Python).
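The article does not reproduce distance_on_unit_sphere. The following is a minimal sketch of what such a function might look like, assuming it uses the spherical law of cosines and takes coordinates in degrees. It is not John D. Cook's original code; the clamping step and the EARTH_RADIUS_KM constant are additions made for this illustration.

```python
import math

def distance_on_unit_sphere(lat1, lon1, lat2, lon2):
    """Great-circle arc length (in radians) between two points given in degrees."""
    phi1 = math.radians(lat1)
    phi2 = math.radians(lat2)
    dlon = math.radians(lon2 - lon1)
    # Spherical law of cosines
    c = (math.sin(phi1) * math.sin(phi2)
         + math.cos(phi1) * math.cos(phi2) * math.cos(dlon))
    # Clamp to [-1, 1] so rounding error cannot push acos out of its domain
    c = max(-1.0, min(1.0, c))
    return math.acos(c)

# Multiply the arc length by a sphere radius to get a distance.
EARTH_RADIUS_KM = 6371.0  # mean Earth radius, an assumption of this sketch
arc = distance_on_unit_sphere(0.0, 0.0, 0.0, 90.0)  # a quarter of the equator
print(arc * EARTH_RADIUS_KM)  # roughly 10,000 km
```

Because only the ordering of distances matters when searching for the closest point, the arc length can be compared directly without multiplying by a radius.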

The function above is basically what I would have written in C. It iterates over the trkpts array and stores the index of the point closest to the given coordinates in the local variable best.

So far, so good. Although Python syntax is quite different from C, writing this code did not take me long.

The code was quick to write but slow to run. For example, I was given 428 points, called waypoints, and for each waypoint I needed to find the closest point on the track. Finding the closest point for all 428 waypoints took 3 minutes 6 seconds on my laptop.

I then switched to the Manhattan distance, which is an approximation. Instead of computing the exact distance between two points, I add the distance along the east-west axis to the distance along the north-south axis. The function for calculating the Manhattan distance is as follows:

    import math

    def manhattan_distance(lat1, lon1, lat2, lon2):
        # Scale the longitude difference by the cosine of the mean latitude
        lat = (lat1 + lat2) / 2.0
        return abs(lat1 - lat2) + abs(math.cos(math.radians(lat)) * (lon1 - lon2))

In fact, I used an even simpler function that ignores the fact that one degree of latitude spans a greater distance than one degree of longitude. The simplified function is as follows:

    def manhattan_distance1(lat1, lon1, lat2, lon2):
        return abs(lat1 - lat2) + abs(lon1 - lon2)
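To see how much the ignored factor matters, here is a small worked example. It assumes two hypothetical points at latitude 60 that are one degree of longitude apart; the coordinates are made up for illustration and do not come from the article's data.

```python
import math

def manhattan_distance(lat1, lon1, lat2, lon2):
    # Longitude difference scaled by the cosine of the mean latitude
    lat = (lat1 + lat2) / 2.0
    return abs(lat1 - lat2) + abs(math.cos(math.radians(lat)) * (lon1 - lon2))

def manhattan_distance1(lat1, lon1, lat2, lon2):
    # Simplified version: latitude and longitude degrees treated alike
    return abs(lat1 - lat2) + abs(lon1 - lon2)

# Hypothetical points: same latitude (60 degrees), one degree of longitude apart
scaled = manhattan_distance(60.0, 10.0, 60.0, 11.0)
unscaled = manhattan_distance1(60.0, 10.0, 60.0, 11.0)
print(scaled, unscaled)  # about 0.5 versus 1.0, since cos(60 degrees) = 0.5
```

At this latitude the simplified version counts an east-west offset twice as heavily as the scaled one. Whether that changes which track point comes out closest depends on the data.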

The closest-point function becomes:

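    def closest_manhattan_distance1(lat, lon, trkpts):
        d = 100000.0   # smallest distance found so far
        best = -1      # index of the closest track point so far
        r = trkpts.index
        for i in r:
            # Label-based lookup of this track point's coordinates
            lati = trkpts.loc[i, 'Lat']
            loni = trkpts.loc[i, 'Lon']
            md = manhattan_distance1(lat, lon, lati, loni)
            if d > md:
                best = i
                d = md
        return best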

Inlining the Manhattan distance function makes it a little faster still:

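    def closest_manhattan_distance2(lat, lon, trkpts):
        d = 100000.0   # smallest distance found so far
        best = -1      # index of the closest track point so far
        r = trkpts.index
        for i in r:
            # Label-based lookup of this track point's coordinates
            lati = trkpts.loc[i, 'Lat']
            loni = trkpts.loc[i, 'Lon']
            # Manhattan distance computed inline, without a function call
            md = abs(lat - lati) + abs(lon - loni)
            if d > md:
                best = i
                d = md
        return best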

As far as finding the closest point is concerned, this function gives the same result as the one using John's function. I hoped my intuition was right: the simpler, the faster. The program now took 2 minutes 37 seconds, a speedup of 18%. Good, but not exciting enough.

I then decided to use Python properly. That means using the array operations that pandas supports, which come from the numpy package. With these array operations, the code becomes much more concise:

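    import numpy

    def closest(lat, lon, trkpts):
        # Manhattan distance from (lat, lon) to every track point at once
        cl = numpy.abs(trkpts.Lat - lat) + numpy.abs(trkpts.Lon - lon)
        # Index label of the smallest distance
        return cl.idxmin()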

This function returns the same result as the previous one. It took 0.5 seconds to run on my laptop. More than 300 times faster! 300 times, that is, 30,000%. Incredible. The reason for the speedup is that numpy array operations are implemented in C. We therefore get the best of both worlds: the speed of C and the simplicity of Python.
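The percentages quoted above can be checked with a few lines of arithmetic. This sketch only reuses the three timings reported in the text; the variable names are made up for illustration.

```python
# Timings reported in the article, converted to seconds
loop_great_circle = 3 * 60 + 6.0   # 186 s with the great-circle distance
loop_manhattan = 2 * 60 + 37.0     # 157 s with the inlined Manhattan distance
vectorized = 0.5                   # numpy/pandas array version

# Speedup of the Manhattan loop over the great-circle loop
gain = loop_great_circle / loop_manhattan - 1.0
print("{:.0%}".format(gain))       # 18%

# Speedup of the vectorized version over the Manhattan loop
factor = loop_manhattan / vectorized
print("{:.0f}x".format(factor))    # 314x
```

Measured against the original 186-second run instead, the factor is 372, so "more than 300 times" holds either way.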

The lesson is clear: do not write Python code as if it were C. Use numpy array operations instead of iterating over arrays. For me, this was a shift in thinking.
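The comparison can be reproduced on synthetic data. The sketch below assumes a randomly generated track of 2,000 points and three made-up waypoints, not the author's GPS data, and the function names closest_loop and closest_vectorized are introduced here for illustration. It checks that both approaches return the same index and prints their running times, which will vary by machine and pandas version.

```python
import time

import numpy as np
import pandas as pd

def closest_loop(lat, lon, trkpts):
    # C-style traversal: one label-based lookup per coordinate per point
    best, d = -1, float("inf")
    for i in trkpts.index:
        md = abs(lat - trkpts.loc[i, 'Lat']) + abs(lon - trkpts.loc[i, 'Lon'])
        if md < d:
            best, d = i, md
    return best

def closest_vectorized(lat, lon, trkpts):
    # Whole-column arithmetic, executed inside numpy
    cl = np.abs(trkpts.Lat - lat) + np.abs(trkpts.Lon - lon)
    return cl.idxmin()

# Synthetic stand-in for the GPS track (an assumption of this sketch)
rng = np.random.default_rng(0)
trkpts = pd.DataFrame({'Lat': rng.uniform(40.0, 41.0, 2000),
                       'Lon': rng.uniform(-75.0, -74.0, 2000)})
waypoints = [(40.5, -74.5), (40.1, -74.9), (40.9, -74.2)]

t0 = time.perf_counter()
loop_result = [closest_loop(lat, lon, trkpts) for lat, lon in waypoints]
t1 = time.perf_counter()
vec_result = [closest_vectorized(lat, lon, trkpts) for lat, lon in waypoints]
t2 = time.perf_counter()

print("same result:", loop_result == vec_result)
print("loop: {:.4f} s, vectorized: {:.4f} s".format(t1 - t0, t2 - t1))
```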

Update on July 2, 2015. The article was discussed on Hacker News. Some comments missed the fact that I was using a pandas data frame, which I use because it is useful for data analysis in general. If I only wanted a fast way to query the closest point, and had enough time, I would write a quadtree implementation in C or C++.

Second update on July 2, 2015. One comment mentioned that numba can speed up the code. I tried it.

This is what I did; your situation may differ. Note first that results are not necessarily the same across Python installations. In my environment, Anaconda is installed on Windows along with some extension packages, and these packages may interfere with numba.

First, enter the following installation command to install numba:

$ conda install numba

From the response on the command line, I found that numba was already included in my Anaconda installation. Perhaps the installation instructions should be changed.

Recommended numba usage:

    from numba import jit

    @jit
    def closest_func(lat, lon, trkpts, func):
        d = 100000.0   # smallest distance found so far
        best = -1      # index of the closest track point so far
        r = trkpts.index
        for i in r:
            lati = trkpts.loc[i, 'Lat']
            loni = trkpts.loc[i, 'Lon']
            md = abs(lat - lati) + abs(lon - loni)
            if d > md:
                # print d, dlat, dlon, lati, loni
                best = i
                d = md
        return best

I did not see any improvement in running time. I also tried a more aggressive compilation setting:

    from numba import jit

    @jit(nopython=True)
    def closest_func(lat, lon, trkpts, func):
        d = 100000.0   # smallest distance found so far
        best = -1      # index of the closest track point so far
        r = trkpts.index
        for i in r:
            lati = trkpts.loc[i, 'Lat']
            loni = trkpts.loc[i, 'Lon']
            md = abs(lat - lati) + abs(lon - loni)
            if d > md:
                # print d, dlat, dlon, lati, loni
                best = i
                d = md
        return best

This time, running the code failed with an error. It seems that numba cannot compile code that works directly on pandas objects.

Of course, I could spend time changing the data structure so that numba can compile the code. But why would I? The code written with numpy runs fast enough, and I am using numpy and pandas anyway. Why not keep using them?
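For readers curious what such a change of data structure might look like, here is a minimal sketch. It assumes the coordinates are passed as plain numpy arrays rather than as a data frame. The function name closest_position and the fallback decorator are introduced here for illustration, and this is not something the author reports having tried.

```python
import numpy as np

try:
    from numba import njit
except ImportError:
    # Fallback so the sketch still runs (uncompiled) where numba is not installed
    def njit(func):
        return func

@njit
def closest_position(lat, lon, lats, lons):
    # Same loop as before, but over plain numpy arrays
    d = 100000.0
    best = -1
    for i in range(lats.shape[0]):
        md = abs(lat - lats[i]) + abs(lon - lons[i])
        if d > md:
            best = i
            d = md
    return best

# Made-up coordinates for illustration
lats = np.array([40.0, 40.5, 41.0])
lons = np.array([-75.0, -74.5, -74.0])
print(closest_position(40.4, -74.6, lats, lons))  # prints 1
```

With a data frame, the arrays could be obtained as trkpts.Lat.values and trkpts.Lon.values. The function returns a position in the array rather than an index label, so trkpts.index[position] would be needed to recover the label.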

Some also suggested using PyPy. This certainly makes sense, but... I am using Jupyter notebooks on a hosted server (a browser-based interactive Python environment). I use the Python kernel it provides, which is the regular Python 2.7.x kernel. PyPy is not offered as an option.

Cython was suggested as well. Well, if I am going to compile code anyway, I may as well use C or C++ directly. I use Python because it offers notebook-based interactivity and quick prototyping. That is not what Cython is designed for.
