Several solutions for improving Ruby on Rails performance
Introduction
The Ruby on Rails framework has received wide attention since its release. Guided by principles such as "don't repeat yourself" and "convention over configuration", Rails gives Web developers high productivity. The flexibility of ActiveRecord provides easy-to-use persistence without the tedious configuration of Hibernate, and the wide range of Rails plug-ins on GitHub and RubyGems is another strong guarantee of Rails development efficiency. Rails is a truly thorough MVC (Model-View-Controller) framework: it clearly separates Model code, Controller application logic, and View code. Rails developers rarely, if ever, run into the question of which layer a piece of code belongs to; in the Rails world, responsibilities are clearly located, and you can easily decide where code should live. Rails 1.2 and later versions support REST services. With these built-in features, you can develop Web applications that render HTML pages to users with little effort, and you can just as easily expose a REST-based Web Service API. In addition, you can consume a third-party REST-based Web Service as easily as you would use a basic data Model.
By searching the Web you can find many articles along the lines of "build a blog system with Ruby on Rails in 15 minutes" that demonstrate rapid development with Rails. Ruby on Rails is also very friendly to agile development. The Rails framework integrates unit tests and functional tests for your Model and application logic, so you can practice test-driven development without installing any extra plug-ins or libraries. With the support of Watir, you can easily use Ruby code to implement browser-based automated testing. In addition, with plug-ins such as RSpec, you can even practice Behaviour Driven Development (BDD) to make your test code more meaningful and easier for customers to accept.
Although you can list many advantages of Rails at once, and you can compare the lines of code needed to build the same Web application with Java, Hibernate, Spring, and Struts against Ruby on Rails, or compare the piles of books and training material each stack requires, we all inevitably face scalability issues. Many Web applications serve large user bases, and such applications face the performance challenges brought by high concurrency. Rails has not yet convinced everyone on this point: many system architects and engineers remain skeptical about whether Rails is suitable for developing highly loaded, highly concurrent Web applications. However, it is undeniable that with the emergence of heavily loaded applications such as Twitter, Friends for Sale, and Basecamp, Rails is increasingly recognized as scalable, and these mature applications tell us that it is possible to build a Rails application that handles millions of user requests. Criticism of Ruby on Rails's scalability focuses on the following issues: Ruby's own performance is poor, Rails lacks mature and high-performance application servers, database scaling is hard to support, and there is a lack of reliable Rails hosting providers. Starting from these points, this article introduces a scalable deployment architecture for Rails applications, as well as some good practices for developing high-performance Rails applications.
Common Web applications, especially highly loaded ones, must deal not only with deployment questions such as choosing Web and application servers and load balancing, but also with development issues such as background tasks and high-performance full-text search. These have mature solutions in established environments such as Java or PHP, where developers and architects usually have several options and can design the architecture around the specific application. In the younger Rails community they are not yet as complete and mature. This article introduces some concrete practices for developing and deploying high-performance, scalable Rails applications, analyzing and offering solutions to specific problems encountered by Web 2.0 sites, with the aim of providing a concrete reference for Rails developers. The content of this article can be summarized as follows:
Figure 1. Overall structure of this Article
Use Nginx + Passenger to deploy Rails applications instead of Apache + Mongrel
In fact, deploying Rails can be very simple. We need some machines on which to set up our environment; for performance and cost we naturally choose a Linux server distribution such as CentOS, Debian, or Ubuntu Server. Unless otherwise stated, all programs, commands, and code in this article assume Linux. We then need a database to store our data, for which we can choose the free MySQL, and we start developing the Rails code of our Web application. At deployment time we need an application server that can run our Ruby on Rails code. This could be the WEBrick server that ships with Rails (although in production that is extremely unlikely), or FastCGI, Mongrel, Thin, Passenger, and so on. We then put a Web server such as Apache, Nginx, Lighttpd, or HAProxy in front of the application server; the front end can talk to the application server over HTTP, FastCGI, or another protocol, so that users can reach the application through a browser. Figure 2 shows this common Rails application deployment structure.
Figure 2. Rails Application Deployment Structure
For the front-end Web server, Apache is the de facto industry standard and holds the largest share of the Web server market; a huge number of websites around the world deploy their applications on it. Apache is a mature, stable, and powerful Web server that provides extension support for almost all Web development languages and frameworks, including Rails: we can use the mod_fcgid module to communicate with Rails processes over the FastCGI protocol, or use mod_proxy_balancer to distribute HTTP requests to independent back-end Rails servers such as a Thin cluster, a Mongrel cluster, or Apache/Nginx + Passenger. However, as a general-purpose server, Apache falls well short of some lightweight Web servers in performance. The distribution performance of Apache's mod_proxy_balancer module is not high and lags far behind Nginx or HAProxy. In addition, Apache currently does not support an event-driven model; it only supports the prefork (process) model and the worker model, so every connection requires a process or thread. Lightweight Web servers such as Nginx and Lighttpd use kernel event mechanisms to improve performance, greatly reducing the number of threads or processes and lowering system load.
Nginx is a lightweight, efficient, and fast Web server. As an HTTP server and reverse proxy, Nginx delivers high performance. It compiles and runs on most Unix-like operating systems and also has a Windows port. Nginx uses epoll and kqueue as its event models and can handle up to 50,000 concurrent connections; it can directly front Rails and PHP programs for external service, act as a load balancer, and serve as an HTTP proxy server. Nginx is much more efficient than Apache in system resource overhead and CPU usage. When you open the official Nginx website, you may be surprised at how simple the home page of such a well-known Web server is. Nginx is equally simple to install and configure, and you can even use Perl syntax in the configuration file.
Both Apache and Nginx are competent at serving static files. However, as the front end for application servers or as a load balancer, we recommend Nginx over Apache, which is considerably heavier.
For the application server, Mongrel was at one time the most popular deployment option. Its HTTP protocol parsing is written in C, which keeps it efficient. Mongrel uses Ruby's user-level threads to implement concurrency, but Ruby threads are not native threads and Rails is not thread-safe, so Rails code runs under a global lock and there is little difference from a single process. When we deploy a Rails application with Mongrel, we therefore usually use mongrel_cluster to start multiple Mongrel processes in the background. For example, the mongrel_cluster.yml below starts 10 Mongrel processes on ports 8000 through 8009.
Listing 1. Mongrel_cluster.yml Configuration
---
cwd: /var/www/test_app
log_file: log/mongrel.log
port: "8000"
environment: production
debug: false
pid_file: log/mongrel.pid
servers: 10
This deployment model actually limits Mongrel's ability to handle highly concurrent applications. When a request involves a large amount of data, a Mongrel process is often tied up for a long time; under heavy concurrency all Mongrel instances can end up blocked, nothing is returned to the front-end server, and the whole Web application grinds to a halt. For this reason Mongrel is no longer widely used: many websites prefer the older FastCGI deployment over Mongrel, Mongrel has not been updated for a long time, and its developers are no longer working on it. Mongrel is therefore not the most recommended Rails application server today.
Passenger is a Rails runtime environment similar to mod_php, rather than an independent HTTP server like Mongrel. Passenger supports the mainstream Apache and Nginx Web servers. It is extremely convenient to use and performs well; according to the benchmark results published on the official Passenger website, Passenger outperforms Mongrel. Passenger is currently arguably the best choice. It inherits the Rails "don't repeat yourself" spirit: to deploy an application with Passenger you only need to upload the Rails project files to the target server, and you do not even need to restart the Web server. It is very simple.
To install and run Passenger in the Nginx environment, you only need to perform the following operations:
Listing 2. Installing Passenger
gem install passenger
passenger-install-nginx-module
The following code shows how to configure Passenger on Nginx:
Listing 3. Configuring Passenger on Nginx
http {
  ...
  server {
    listen 80;
    server_name www.test.com;
    root /var/www/test/public;
    passenger_enabled on;
  }
  ...
}
Building a Ruby on Rails application server with Nginx + Passenger can significantly improve performance. At the same time, you can use Ruby Enterprise Edition to improve Ruby's own performance. This distribution is also developed by Phusion; it adopts a copy-on-write friendly garbage collection strategy and uses tcmalloc to improve memory allocation. According to the figures published on its website, Ruby Enterprise Edition can reduce memory consumption by 33% compared with the standard Ruby release.
Nginx + Passenger deploys the application server on a single machine, but when concurrency grows too large, one machine cannot provide enough processing capacity. In that case we usually add load balancing, which is no different from a general load-balancing strategy. Because Nginx's reverse proxy is better at load balancing than Apache's reverse proxy module, we can place another Nginx Web server in front of several Nginx + Passenger machines, and we can even use LVS to obtain still more processing capacity. A sketch of such a front-end configuration follows.
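As an illustration, the front-end Nginx can distribute requests to several back-end Nginx + Passenger machines with an upstream block roughly like the one below. This is only a sketch: the host names and ports are hypothetical and not taken from the article's setup.

# Hypothetical front-end load balancer configuration (sketch)
http {
    upstream passenger_backends {
        server app1.test.com:80;    # Nginx + Passenger machine 1
        server app2.test.com:80;    # Nginx + Passenger machine 2
        server app3.test.com:80;    # Nginx + Passenger machine 3
    }

    server {
        listen 80;
        server_name www.test.com;

        location / {
            proxy_pass http://passenger_backends;
            proxy_set_header Host $host;
        }
    }
}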
Use Starling and Workling to asynchronously execute Rails background tasks
During Web application development, each Web request needs to return quickly, so operations that involve heavy computation can be executed asynchronously in the background. A large batch of data updates, for example, only needs to be triggered from the Web application and then executed in the background. Other operations need to run on a schedule, such as data backups, statistical analysis, image processing, and mail delivery. Putting these operations inside the Web request and making the user wait is clearly inappropriate, and it also puts a serious load on the machine. For Rails applications, besides hurting the user experience, such operations block a Rails server instance and lower overall throughput. For operations like these we can use a task queue: the operations to be performed are queued in order, and a background process takes tasks off the queue and executes them. The queue can be backed by a database, memcached, ActiveMQ, or Amazon SQS, while the back-end process can be a cron job plus script/runner, or BackgrounDRb in Rails. The solution introduced here uses the Starling message queue, contributed by Twitter's developers and based on the memcached protocol, together with the Workling plug-in.
Starling is an open-source message queue that the Twitter development team extracted from the Twitter project. Although Starling is not exactly what Twitter runs in production, we can still trust Starling's performance and high-concurrency handling. Similar tools include BackgrounDRb, background_job, and background_fu. BackgrounDRb uses DRb to transmit messages in its queue, but it has a problem: it takes a pessimistic lock when updating the queue, which is unacceptable here. background_job and background_fu are database-backed message queues, and database performance cannot be guaranteed under heavy load. Starling is a message queue based on the memcached protocol, which makes it more efficient and easier to scale. In general, you can run a Starling server on each application server and run the background workers on the same machine or on other machines that talk to it.
Run the following commands to install Starling:
Listing 4. Installing Starling
gem sources -a http://gems.github.com/
sudo gem install starling-starling
mkdir /var/spool/starling
To talk to the Starling server we need memcache-client. Version 1.5.0 of this gem has some problems, which are fixed in fiveruns-memcache-client; the fiveruns-memcache-client gem is installed automatically as a dependency of the starling-starling gem.
After installing Starling, start it with the command sudo starling -d -p 15151. The -p option specifies the port to listen on, and the -d option is usually added so that Starling runs in the background as a daemon.
To understand how Starling works and what it does for us, we can start Starling and then run a simple test from irb:
Listing 5. Test Starling
>> require 'starling'
=> true
>> starling = Starling.new('127.0.0.1:15151')
=> MemCache: 1 servers, ns: nil, ro: false
>> starling.set('test_queue', 123)
=> nil
>> loop { puts starling.get('test_queue'); sleep 1 }
123
nil
nil
...
Here we can see that the connection to the Starling server is established; we then put a value onto the queue and poll the queue in a loop. The first read returns the value we inserted and subsequent reads return nil, which is exactly what we expect.
Next we install Workling:
Listing 6. Installing Workling
script/plugin install git://github.com/purzelrakete/workling.git
Workling supports several back ends for background tasks, including the Starling server installed above. After installing Starling, we add the following line to environment.rb in the Rails application to configure Workling to use Starling:
Listing 7. Using Workling
Workling::Remote.dispatcher = Workling::Remote::Runners::StarlingRunner.new
The Workling configuration file is workling.yml. Like other Rails configuration files, workling.yml can hold different settings for different environments; only the production configuration is listed here.
Listing 8. Workling Configuration
production:
  listens_on: localhost:15151, localhost:15152, localhost:15153
  sleep_time: 2
  reset_time: 30
  memcache_options:
    namespace: myapp
The listens_on parameter is the address and port of the Starling server(s) that workling_client connects to; multiple Starling addresses are allowed here, which means you can start several Starling servers and have one workling_client consume from all of them. sleep_time is how long workling_client waits between attempts to fetch data from the queue, and reset_time defines how long workling_client waits before rebuilding the server connection after a memcache error. The namespace parameter in memcache_options defines the namespace to use, which is very useful when the same Starling server is shared by several different Rails applications.
You can start the workling_client process with script/workling_client start. Sometimes one workling_client is not enough; you can modify script/workling_client so that it supports multiple instances, in which case a new workling_client instance is started every time you run script/workling_client start, as in the options below. A sketch of an actual worker and how it is dispatched follows Listing 9.
Listing 9. Workling configuration for multiple clients
options = {
  :app_name   => "workling",
  :ARGV       => ARGV,
  :dir_mode   => :normal,
  :dir        => File.join(File.dirname(__FILE__), '..', 'log'),
  :log_output => true,
  :multiple   => true,
  :backtrace  => true,
  :monitor    => true
}
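To give a concrete picture of how a background task is then written and triggered, the sketch below defines a hypothetical worker and dispatches a job to it from a controller. The class, method, and model names are illustrative and not taken from the original article; the sketch assumes Workling's documented convention that classes under app/workers inherit from Workling::Base and are invoked through generated asynch_-prefixed class methods, so check the plug-in's README for the exact dispatch syntax of your version.

# app/workers/newsletter_worker.rb -- hypothetical worker, shown only as a sketch.
# With the StarlingRunner configured above, asynchronous calls are pushed onto the
# Starling queue and picked up later by a workling_client process.
class NewsletterWorker < Workling::Base
  def deliver_digest(options)
    user = User.find(options[:user_id])
    Newsletter.deliver_digest(user)   # assumes an ActionMailer class named Newsletter
  end
end

# In a controller action: return to the user immediately, do the heavy work in the background.
#   NewsletterWorker.asynch_deliver_digest(:user_id => current_user.id)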
Use Memcached and cache-money to cache data
Rails provides four caching mechanisms: Page Cache, Action Cache, Fragment Cache, and the ActiveRecord cache. The Page Cache is the most efficient: it caches an entire page as a static HTML file, which is very effective for pages that do not change often. The Action Cache caches a whole action; the difference from the Page Cache is that the HTTP request still reaches the Rails application server and all before filters run first, so if a page requires a logged-in user you can put the authentication step in a before filter. The Fragment Cache caches part of a page, so different expiration policies can be applied to different parts of the same page. For these caching mechanisms we can write sweepers to handle expiration and clearing. The ActiveRecord cache is a newer mechanism added in recent Rails releases: it uses the SQL query cache to reuse the results of identical SQL statements within the same action. A few typical cache declarations are sketched below.
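As an illustration of the first three mechanisms, a Rails 2.x-style controller might declare caching roughly as follows; the controller, action, and cache key names are hypothetical.

# Hypothetical controller showing page and action caching (Rails 2.x style).
class BlogsController < ApplicationController
  caches_page   :show    # the rendered page is written out as a static HTML file
  caches_action :index   # cached after before_filters (for example, authentication) have run

  def show
    @blog = Blog.find(params[:id])
  end
end

# In a view, a fragment cache wraps part of the template:
#   <% cache('recent_posts') do %> ... <% end %>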
Rails's caching can effectively improve website performance. By default, Rails stores the cache on the file system, which is not a suitable store for production because file I/O throughput is limited. Rails can also keep the cache in the memory of the current process, but if several Rails instances are running they cannot share it. We recommend memcached as the cache store, which is also the most popular choice.
Memcached was developed by Danga Interactive to speed up access to LiveJournal.com, a site with several thousand dynamic page views per second and 7 million users. Memcached is a high-performance distributed memory object caching system based on a hash table of key/value pairs; its daemon is written in C, and clients exist for many other languages. Memcached greatly reduces database load and makes data access faster. The memcached client for Rails was already installed in the steps above, so we only need the following configuration in the Rails application:
Listing 10. Memcached Configuration
config.cache_store = :mem_cache_store, 'localhost:11211'
With this configuration, memcached is used as the cache store. Besides Rails's built-in caching mechanisms, we can also use Rails.cache directly to cache data in memcached. For example, we can cache the total number of blogs like this:
Listing 11. Using Rails.cache
blogs_count = Rails.cache.fetch("blogs_count") do
  Blog.count
end
Rails's built-in ActiveRecord caching is limited: it only caches SQL query results within a single action. We need a more powerful ActiveRecord cache, and cache-money was created to solve exactly this problem. As the Twitter website became more and more stable and gradually shed its reputation as the poster child for "Rails can't scale", people came to expect more contributions to the Rails community from the Twitter development team, and cache-money is the plug-in they contributed after Starling. cache-money resembles Hibernate's second-level cache: it is a write-through cache, so when an ActiveRecord object is updated, the updated content is written directly to the cache instead of the cached entry simply being invalidated.
cache-money has many notable features, such as automatic cache expiry (using after_save/after_destroy) and transaction support, even though Rails's ActiveRecord does not provide an after_commit hook. Ordinary cache plug-ins suffer from races when the cache is updated under high concurrency, and this feature is very helpful in solving that problem. You can install cache-money through gem:
Listing 12. Installing cache-money
gem sources -a http://gems.github.com
sudo gem install nkallen-cache-money
require 'cache_money'
Listing 13. Configure config/memcached.yml
production:
  ttl: 604800
  namespace: ...
  sessions: false
  debug: false
  servers: localhost:11211

development:
  ....
Listing 14. Use config/initializers/cache_money.rb to initialize
config = YAML.load(IO.read(File.join(RAILS_ROOT, "config", "memcached.yml")))[RAILS_ENV]
$memcache = MemCache.new(config)
$memcache.servers = config['servers']
$local = Cash::Local.new($memcache)
$lock = Cash::Lock.new($memcache)
$cache = Cash::Transactional.new($local, $lock)

class ActiveRecord::Base
  is_cached :repository => $cache
end
Using cache-money is very convenient and requires no additional work; you only need a simple declaration in your Model, such as:
Listing 15. Configure the Model to use cache_money
class User < ActiveRecord::Base
  index :name
end

class Post < ActiveRecord::Base
  index [:title, :author]
end

class Article < ActiveRecord::Base
  version 7
  index ...
end
Then you can use Rails's ActiveRecord methods and transactions exactly as before, and reads on the indexed attributes are served from the cache, as sketched below. If you change the table structure of the database, you can bump the Model's version number to invalidate the previous cache entries without restarting the memcached server.
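For example, with the index declarations above, ordinary finders are transparently backed by memcached; the rough sketch below reflects the behaviour described in the cache-money documentation and has not been verified against a particular version.

# With "index :name" declared on User, these calls consult memcached before the database.
user = User.find(1)                                            # cached by primary key
bob  = User.find(:first, :conditions => { :name => 'bob' })    # cached via the :name index

# Because cache-money is a write-through cache, the cached entry is rewritten on update
# rather than simply invalidated.
bob.update_attributes(:name => 'robert')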
Use Sphinx + LibMMSeg + Ultrasphinx for full-text search
Many applications require full-text search. Of course you can integrate the search services provided by Google or another search engine, but if you want better control over your search results, or want to reuse them in other ways, you may have to implement full-text search yourself. For full-text Chinese search we generally need to consider two things: the performance of the search tool and the accuracy of Chinese word segmentation. In the Java world, Lucene is the undisputed authority and first choice for full-text search; although it does not support Chinese word segmentation itself, many third-party plug-ins can improve the accuracy and performance of Chinese segmentation. Ferret was once the most popular Rails full-text search plug-in, but this article recommends the more efficient Sphinx. Sphinx was developed by the Russian developer Andrew Aksyonoff; it can index millions of records in a minute or two and return search results in milliseconds. Sphinx integrates well with the database and can index database data directly from its configuration file. In addition, Sphinx provides the SphinxSE storage engine, which can be compiled into MySQL to obtain high-performance indexing at the database level. In Rails you can use Sphinx through the Ultrasphinx plug-in, which lets Rails developers call Sphinx's functionality conveniently.
You can download Sphinx from http://www.sphinxsearch.com/downloads.html.
After installing Sphinx, you can install Ultrasphinx directly from RubyForge:
Listing 16. Installing Ultrasphinx
ruby script/plugin install svn://rubyforge.org/var/svn/fauna/ultrasphinx/trunk
LibMMSeg is a Chinese word segmentation program. The latest version is 0.7.3 and it is written in C++. The segmentation algorithm uses complex maximum matching; it supports both Linux and Windows, and it segments at roughly 300 KB/s (PM-1.2G). Starting with version 0.7.2, the author also provides a Ruby interface, so we can call LibMMSeg directly from Ruby for word segmentation, which is quite convenient. LibMMSeg can be downloaded from http://www.coreseek.cn/opensource/mmseg.
You can add your own custom words to the dictionary file to improve segmentation accuracy in a specific domain; the default dictionary file is data/unigram.txt, which must be UTF-8 encoded. Run the command mmseg -u unigram.txt to generate a file named unigram.txt.uni, then rename that file to uni.lib; this is the compiled dictionary. The commands are shown below.
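The dictionary build described above boils down to the following commands; the directory in the first line is illustrative and depends on where LibMMSeg was unpacked.

cd mmseg/data                   # hypothetical path to the LibMMSeg data directory
mmseg -u unigram.txt            # produces unigram.txt.uni
mv unigram.txt.uni uni.lib      # uni.lib is the dictionary referenced later by charset_dictpath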
To make it easier for Sphinx to use LibMMSeg for Chinese word segmentation, the LibMMSeg developers also created patches for Sphinx. Download the two patch files from http://www.coreseek.cn/opensource/Sphinx/:
http://www.coreseek.com/uploads/sources/sphinx-0.98rc2.zhcn-support.patch
http://www.coreseek.com/uploads/sources/fix-crash-in-excerpts.patch
Then apply the patches:
Listing 17. Installing the Sphinx patch
cd sphinx-0.9.8-rc2
patch -p1 < ../sphinx-0.98rc2.zhcn-support.patch
patch -p1 < ../fix-crash-in-excerpts.patch
After installing these plug-ins and patches, we can configure the Rails application to support full-text search.

First, copy vendor/plugins/ultrasphinx/examples/default.base from the Ultrasphinx plug-in directory to config/ultrasphinx/default.base, open the file, and change

charset_type = utf-8

to

charset_type = zh_cn.utf-8

so that Chinese full-text retrieval is supported. Then add a line below the charset_type setting:

charset_dictpath = /home/test/Search/lib

where the value is the path to the uni.lib dictionary built above. Finally, delete all charset_table-related settings.
Add full-text search support to the Model code in the Rails application:
If there is a Model named Article with two attributes called title and body, and we want full-text search on these two attributes, we can add one line to article.rb:
Listing 18. Using Ultrasphinx
is_indexed :fields => ['created_at', 'title', 'body']
After completing this configuration, we can run rake ultrasphinx:configure to generate the Sphinx configuration file; this command creates development.conf under config/ultrasphinx, which is the Sphinx configuration file. Run rake ultrasphinx:index to build the index. rake ultrasphinx:daemon:start and rake ultrasphinx:daemon:stop start and stop Sphinx's searchd service. searchd listens on port 3313, and all search requests are sent to that port. We can perform a simple test on the console:
Listing 19. Test full-text index
search = Ultrasphinx::Search.new(:class_names => 'Article')
search.run
search.results
Once everything works, we can perform full-text searches in our action code in the same way.
Use Capistrano for quick deployment
When deploying a Rails application you can simply update the code from svn or git, run rake db:migrate to update the database, and then restart the server. After doing this a few times, even with only one machine you will find it tiresome, and if the application runs on several machines, every manual deployment may feel like a nightmare. You can use shell scripts to simplify deployment, but when developing with Rails you can use the Capistrano plug-in for an even simpler process. Put simply, Capistrano is a tool that uses SSH to execute the same commands in parallel on multiple machines and to set up a whole batch of machines; it simplifies the deployment process through built-in and custom tasks.
Listing 20. Installing Capistrano
gem sources -a http://gems.github.com/
gem install capistrano
In config/deploy.rb, configure the addresses of the servers to deploy to, the roles of the various servers, and the user name and password shared by the servers. The following is an example configuration:
Listing 21. Configure Capistrano
set :application, "test_app"           # application name
set :scm_username, "test"              # repository user name
set :scm_password, 'test'              # repository password
set :repository, Proc.new { "--username #{scm_username} --password #{scm_password} svn://localhost/test_app/trunk" }  # repository
set :user, "test"                      # server SSH user name
set :password, 'test'                  # server SSH password
set :deploy_to, "/var/www/#{application}"  # deployment path on the server; the default is /u/apps/#{application}

role :web, 'web.test_app.com'          # front-end Web server
role :app, 'app1.test_app.com', 'app2.test_app.com', 'app3.test_app.com'  # Rails application servers
role :db,  'app1.test_app.com', :primary => true  # the machine that runs the migrate scripts, usually one of the application servers
When using Capistrano for deployment, you usually run tasks with cap sometask. Use cap -h to see all options and cap -T to list all available tasks; for example, cap migrate runs rake db:migrate on the machine whose role is db, and a small sketch of a custom task is given at the end of this section. For more information on using Capistrano, refer to http://wiki.capify.org. In addition, Capistrano can automate deployment in non-Rails environments: after setting up the Ruby environment and the Capistrano gem, install the following plug-in:
Listing 22. Using Capistrano in a non-Rails Environment
gem sources -a http://gems.github.com/
gem install leehambley-railsless-deploy
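To round out the Capistrano section, custom tasks can be added to config/deploy.rb next to the settings shown in Listing 21. The sketch below is a hypothetical example: the task body assumes a Passenger-based application server, which is restarted by touching tmp/restart.txt, and the cap commands listed in the comments are the standard built-in ones.

# Hypothetical custom task in config/deploy.rb: restart the Passenger-backed
# application on every :app server by touching tmp/restart.txt.
namespace :deploy do
  task :restart, :roles => :app do
    run "touch #{deploy_to}/current/tmp/restart.txt"
  end
end

# Typical commands run on the workstation:
#   cap deploy:setup   # create the directory structure on the servers
#   cap deploy         # check out the code and restart the application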
Conclusion
This article has focused on some useful practices for developing and deploying Web applications with Ruby on Rails. It has not specifically covered some general problems that such applications usually face. For example, database optimization and distributed deployment are issues that any large, highly concurrent Web application must address: you can deploy the database in a master-slave arrangement, or split it by database or by table sharding. In addition, when running a Rails server or other background programs, another process is usually needed for monitoring; using God to watch Rails processes, for example, is a common policy in Rails applications. In many cases the more agile and lightweight Rack can be used instead of Rails to provide more efficient services. Moreover, the emergence of Rails hosting providers such as Engine Yard, Joyent, and Heroku has strengthened confidence in using Rails to develop and deploy large, highly concurrent Web applications. Although Ruby on Rails has its shortcomings, developing scalable, high-performance applications with it is by no means impossible. This article hopes to help Rails developers quickly master some concrete practices and build and deploy Web applications with high performance and scalability.