What Python developers need to know before migrating to go

Source: Internet
Author: User
This is a (long) blog that records the migration of a large section of Python/cython code to the Go language experience. If you want to know everything about the whole story, background, etc., please read on. If you are interested only in what you need to know before Python developers enter, click on the link below:

Tips and tricks for migrating from Python to go

Background


Our greatest achievement in repustate technology is the realization of the Arabic sentiment analysis. Arabic is a hard-to-chew bone, and its grammatical form is too complex. Arabic participle (tokenization, dividing a sentence into separate words) is more difficult than English, because Arabic words may contain spaces within them (for example, the position inside the aleph). This does not require secrecy, that is, repustate using support vector Machine (SVM) to get the most likely meaning of the sentence, and then based on the analysis of emotions. We used a total of 22 models (22 support vector machines), and each word in the document would be analyzed. In other words, if a document contains 500 words, there will be more than 10,000 support vector machine comparison operations.

Python


Repustate is almost entirely implemented in Python, because we use Django as the application interface and the site architecture. So you can only keep the code uniform, and use Python to implement the entire Arabic emotion engine. In the process of prototyping and implementation, Python is still very good. Very strong expression ability, strong third-party library resources. If you just serve the web, it's perfect. However, when you need to perform the underlying calculations, you need to do a lot of comparison on the hash table (the dictionary in Python) and the speed slows down. We can only handle 2 to 3 Arabic language files per second, which is too slow. Compare our English emotion engine to handle 500 documents per second.

Bottleneck


So we started the Python parser and studied which parts were slow to execute. Remember when I said that we'd be using 22 support vector machines to process every single word? These processes are serial and do not have parallel operations. Well, our first thought was to change this to something like map/reduce. Long story short: Python is not suitable for using map/reduce. Python doesn't work well when you need concurrency. At the Pycon conference in 2013, Guido mentioned Tulip, who tried to solve the problem in a new project, but it will take a while to launch it. If there is a better choice, why should we wait for it?

Change the go language or go home to farm


My friend at Mozilla told me that most of the code in the Mazilla service's log schema has been switched to go, in part because of the power of the Goroutine (go thread). Go is designed by a group of people in Google that use parallelism as a first-class concept, rather than as a complement to Python's different solutions. So, we started to switch Python to Go.

Although the Go code has not reached the product level, the results have been very encouraging. We reached 1000 documents per second, used less memory, and didn't have to deal with the annoying problems of multi-process/gevent/with Python "Why CTRL + C killed my process" code.

Why do we fall in love with Go


Just knowing a little bit about how the programming language works, (Understanding the difference between interpretation and compilation, and dynamic versus static), says: "Dude, Go is obviously faster." Yes, we can also rewrite the whole thing in Java and get similar performance, but that's not why Go wins. It's easy for you to write the code with Go. I don't know what's going on, but once the code is compiled (it's fast), you'll feel it works (not just running without prompting the error, but logically right). I know it sounds iffy, but it's true. It's like Python solves redundancy problems (or no redundancy), it takes functions as a primary object, and function programming can be done easily. Go threads and channels (channel) make your life so easy. You can also get performance gains from static types, more precise control of memory allocations, and no loss of expression.

What we should have known.


To get rid of those compliments, using Go requires a different mindset than Python. Here are some of the migration notes, when you turn Python into go and randomly jump into my mind:

No built-in collection types (need to use map and check for presence)

Because there is no collection type, it is necessary to implement the methods of intersection, set, etc.

There is no tuple (tuple), you need to design your own structure (struct) or use slice (similar array)

There is no method like __getattr_ () that requires you to check for presence and not set a default value, such as in Python, you can write: value = Dict.get ("A_key", "Default_value")

Need to check for errors (or at least explicitly ignore them)

It is not possible to have unused variables and packages and to comment out some code from time to time

Switch between []byte and string, and regular processing (regexp) uses []byte (rewritable). It's right, but it's still a lot of trouble to convert.

Python syntax is more lenient. You can use an out-of-range index to take a fragment of a string without error, or you can take a fragment using a negative number. Go is not.

You cannot use a mixed-type data structure. This may not be appropriate, but sometimes I have a dictionary in Python that can be mixed with strings and lists. Not in Go, you have to clean up the data structure or custom structure *

You can't assign tuples or lists to separate variables (for example, X, y, x = [1, 2, 3])

Camel case (the first letter of the function/structure will not be exposed to other packages). I prefer Python's lowercase and underlined habits.

You must explicitly check if the error is empty, unlike many types in Python can be used like Boolean (0, empty string, none can be Boolean "false")

Some modules (such as CRYPO/MD5) have insufficient documentation, but the go-nutes on IRC is powerful and has strong support

The number-to-string (int64->string) differs from the []byte-to-string (as long as String ([]byte)), which calls StrConv

The code to read Go is absolutely like a programming language, and Python can be written like a pseudo code. Go uses more non-English numeric characters, using | | and && rather than or and and.

Writing files will have File.write ([]byte) and File.writestring (string), inconsistent with a Python developer's approach to solving the problem.

String insertion is not good, and FMT must be used frequently. Sprintf

Without constructors, the usual habit is to write a NewType () function to return the structure you want

else (or else if) is formatted correctly, and else is the same as the curly brace that is paired with If. Strange.

Different assignment operators are used inside and outside the function, = and: = (Translator Note: This is the author's misunderstanding, = and: = is the difference between explicitly defining a type or automatic type deduction, whereas variables outside the function can only be used =)

If I only want a list of key values (Dict.keys ()) or values (Dict.values ()) or a list of tuples (Dict.items ()), Go does not have a corresponding function and can only iterate on its own

  • Related Article

    Contact Us

    The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

    If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

    A Free Trial That Lets You Build Big!

    Start building with 50+ products and up to 12 months usage for Elastic Compute Service

    • Sales Support

      1 on 1 presale consultation

    • After-Sales Support

      24/7 Technical Support 6 Free Tickets per Quarter Faster Response

    • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.