Repustate Moves from Python to Go for 10 Times Faster Performance

Source: Internet
Author: User

Repustate provides text analysis services to businesses and organizations around the world. As the company has grown, the number of text segments it processes daily has risen from 500 million to 1 billion, including tweets, news articles, blog comments, user feedback, and more. Large-scale text analysis is difficult because two pieces of text are almost never exactly the same, so caching cannot be used to improve efficiency. However, a large piece of text can be split into sentences, and each sentence can then be analyzed in parallel. Repustate recently published a post on its official blog describing how its API has evolved.

The first version of the Repustate API was written in Django. They built a prototype and launched their service on that basis. However, the overhead of each Django request/response cycle was too high. As API traffic grew, reliability problems surfaced and the cost of using Amazon services rose sharply. As a result, they started looking for a Python alternative and chose Flask, which is lightweight and well suited to building APIs. Later, however, they discovered Falcon. They liked the framework very much because it was optimized with Cython, was much faster than Django, and followed REST principles in a simple way. Falcon turned out to be a good remedy: Repustate's average response time dropped, and so did the number of failures and support issues.

Even so, Repustate's performance still could not keep up with growing demand. In particular, concurrency has always been a sore spot for Python. They were on Python 2.7, so the asyncio package from Python 3 was not available to them, and even with asyncio they would still have had to worry about the GIL. In addition, the Falcon-based stack did not allow self-hosted deployments, because Python cannot be packaged and distributed the way Java or C programs can. Many of their customers need to run Repustate inside their own networks, and the only thing Repustate could offer them was a virtual appliance containing the entire technology stack. The appliance runs on both VMware and VirtualBox. This is a viable option, but it is cumbersome and difficult to update and support. So they wanted to be able to ship a single installable binary instead.

They chose Go because it satisfied all of their requirements:

    • Faster than Python
    • Can be compiled into a single binary file
    • Can be deployed to any operating system
    • Easy to implement concurrency

Also, Go's test suite layout looked simpler than their nose tests, and the test function headers were easy to migrate. For example, def test_my_function(): becomes func TestMyFunction(t *testing.T) { with a simple search-and-replace. In addition, goroutines and channels are very easy to use, which makes concurrent text analysis easy to implement.
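To make the concurrency point concrete, here is a minimal sketch of sentence-level parallelism with goroutines and a channel. It is not Repustate's code: scoreSentence, splitSentences, and analyze are hypothetical names, the word lists are invented for illustration, and the period-based splitter is deliberately naive.

```go
package main

import (
	"fmt"
	"strings"
	"sync"
)

// result pairs a sentence's position with its score so the caller can
// restore the original order after the goroutines finish.
type result struct {
	index int
	score int
}

// scoreSentence is a hypothetical stand-in for real sentiment analysis:
// it only counts a few hard-coded positive and negative words.
func scoreSentence(s string) int {
	score := 0
	for _, w := range strings.Fields(strings.ToLower(s)) {
		switch strings.Trim(w, ".,!?") {
		case "good", "great", "fast":
			score++
		case "bad", "slow", "cumbersome":
			score--
		}
	}
	return score
}

// splitSentences is a deliberately naive splitter that breaks on periods.
func splitSentences(text string) []string {
	var out []string
	for _, part := range strings.Split(text, ".") {
		if s := strings.TrimSpace(part); s != "" {
			out = append(out, s)
		}
	}
	return out
}

// analyze scores every sentence in its own goroutine and collects the
// results over a buffered channel.
func analyze(text string) []int {
	sentences := splitSentences(text)
	results := make(chan result, len(sentences))
	var wg sync.WaitGroup
	for i, s := range sentences {
		wg.Add(1)
		go func(i int, s string) {
			defer wg.Done()
			results <- result{index: i, score: scoreSentence(s)}
		}(i, s)
	}
	wg.Wait()
	close(results)

	// Results arrive in arbitrary order; the index puts them back in place.
	scores := make([]int, len(sentences))
	for r := range results {
		scores[r.index] = r.score
	}
	return scores
}

func main() {
	fmt.Println(analyze("Go is fast. The old appliance was cumbersome. Tests pass."))
	// prints [1 -1 0]
}
```

The channel is buffered to the number of sentences so that no goroutine blocks on send. A production service would more likely use a bounded worker pool rather than one goroutine per sentence.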

The entire migration took three months; interested readers can consult Repustate's own account of the process. Here are the results of the migration:

    • API average response time decreased from 100ms to 10ms;
    • The number of EC2 instances required decreased by 85%;
    • Since Go can be compiled into a single binary file, and Go 1.5 makes cross-compiling easy, they are now able to provide a self-hosted version of Repustate;
    • Because Python and Go are similar, they were able to quickly rebuild their unit tests.

In addition, revisiting the Python code led them to make many improvements that had nothing to do with performance. As the post puts it:

If time permits, looking back at the old code is always good. You may be amazed at how bad it can be.

Thanks to Xing for reviewing this article.
