Several programming techniques for speeding up Ruby on Rails
Ruby is often praised for its flexibility. As Dick Sites put it, you can "write programs to write programs". Ruby on Rails extends the core Ruby language, but it is Ruby itself that makes this extension possible. Ruby on Rails uses the flexibility of the language so that you can easily write highly structured programs without much boilerplate or extra code: you get a large amount of standard behavior with no additional work. Although this free behavior is not always perfect, you get a lot of good architecture without much effort.
For example, Ruby on Rails is based on the Model-View-Controller (MVC) pattern, which means that most Rails applications can be clearly divided into three parts. The model contains the behavior required to manage application data. In a Ruby on Rails application, each model generally corresponds to a database table; ActiveRecord, the object-relational mapping (ORM) layer that Ruby on Rails uses by default, manages the interaction between the model and the database, which means that Ruby on Rails programs usually contain little (if any) SQL code. The second part is the view, which contains the code required to create the output sent to the user; it usually consists of HTML, JavaScript, and so on. The final part is the controller, which routes input from the user to the correct model and then renders the response using the appropriate view.
Rails advocates are generally happy to attribute their productivity gains to the MVC pattern and to other features of Ruby and Rails, and they say that fewer programmers can build more functionality in less time. Of course, this means that money spent on software development yields more business value, so Ruby on Rails development is becoming increasingly popular.
However, initial development cost is not the whole story. There are other, ongoing costs to consider, such as maintenance and the hardware the application runs on. Ruby on Rails developers usually use testing and other agile development techniques to reduce maintenance costs. However, it is easy to overlook how efficiently a Rails application runs with large amounts of data. Although Rails simplifies database access, it does not always do so efficiently.
Why do Rails applications run slowly?
There are several basic reasons why Rails applications run slowly. The first is simple: Rails makes assumptions on your behalf in order to speed up development. These assumptions are usually correct and helpful. However, they do not always benefit performance, and they can lead to inefficient use of resources, especially database resources.
For example, ActiveRecord selects all fields in a query by default, using the equivalent of an SQL SELECT * statement. When a table has a large number of columns, especially when some of them are huge VARCHAR or BLOB fields, this behavior can be very problematic in terms of memory usage and performance.
Another significant challenge is the N + 1 problem, which this article discusses in detail. It leads to the execution of many small queries rather than a single large one. For example, ActiveRecord does not know in advance which parent records will have their child records requested, so it generates one child-record query for each parent record. Because every query carries its own overhead, this behavior causes obvious performance problems.
Other challenges have more to do with the habits and attitudes of Ruby on Rails developers. Because ActiveRecord makes so many tasks easy, Rails developers often develop an aversion to SQL and avoid it even when SQL is the more appropriate tool. Creating and processing a large number of ActiveRecord objects is very slow, so in some cases it is faster to write an SQL query directly without instantiating any objects.
Ruby on Rails is often used to reduce the size of development teams, and Ruby on Rails developers usually perform some of the system administration tasks required to deploy and maintain applications in production. If they know little about the application's environment, problems may occur. The operating system and database may not be set up correctly. For example, MySQL my.cnf settings are often left at their defaults in Ruby on Rails deployments, even though the defaults are not optimal. In addition, the monitoring and benchmarking tools needed to give a long-term view of performance may be missing. Of course, this is not to blame Ruby on Rails developers; it is a consequence of non-specialization, and in some cases Rails developers may be experts in both fields.
The last problem is that Ruby on Rails encourages developers to develop in a local environment. This has several advantages, such as lower development latency and a more distributed workflow, but it does mean that you may work only with a limited data set because a workstation is smaller than a production server. The difference between where code is developed and where it is deployed can be a big problem. Even if an application has long performed well with small-scale data on a lightly loaded local server, you may find that it has obvious performance problems with large data sets on a congested server.
Of course, there may be many causes of performance problems in Rails applications. The best way to identify potential performance problems is to use diagnostic tools that give you repeatable, accurate measurements.
Detecting performance problems
One of the best tools is the Rails development log, which is usually located in the log/development.log file on each development machine. It contains a variety of useful metrics: the total time spent responding to a request, the percentage of time spent in the database, and the percentage of time spent generating views. In addition, tools such as development-log-analyzer can be used to analyze this log file.
In production, you can find a lot of valuable information by viewing mysql_slow_log. A more comprehensive introduction is beyond the scope of this article. For more information, see Resources.
One of the most powerful and useful tools is the query_reviewer plug-in (see Resources). This plug-in shows how many queries are executed on a page and how long the page takes to generate. It also automatically analyzes the SQL code generated by ActiveRecord to detect potential problems. For example, if you forgot to index an important column and this causes performance problems, you can easily find the column (for more information about MySQL indexes, see Resources). The plug-in displays all of this information in a pop-up <div> (visible only in development mode).
Finally, do not forget to use tools such as Firebug, YSlow, ping, and tracert to check whether a performance problem is caused by the network or by resource loading.
Next, let's look at some specific Rails performance problems and their solutions.
N + 1 query problems
The N + 1 query problem is one of the biggest problems in Rails applications. For example, how many queries does the code in Listing 1 generate? This code is a simple loop that traverses all the posts in a hypothetical posts table and displays each post's category and body.
Listing 1. Unoptimized Post.all code
<% @posts = Post.all
   @posts.each do |p| %>
  <h1><%= p.category.name %></h1>
  <p><%= p.body %></p>
<% end %>
Answer: The code above generates one query, plus one more query for each row in @posts. Because every query carries its own overhead, this can be a big problem. The culprit is the call to p.category.name. This call applies only to that particular post object, not to the entire @posts array. Fortunately, we can fix this by using eager loading.
Eager loading means that Rails automatically executes the queries required to load the specified child objects up front. Rails will use either a JOIN SQL statement or a strategy that executes multiple queries. Either way, as long as you specify all the child objects that will be used, you never end up in the N + 1 situation, in which each iteration of a loop generates an additional query. Listing 2 is a revision of the code in Listing 1 that uses eager loading to avoid the N + 1 problem.
Listing 2. Optimized Post.all code with eager loading
<% @posts = Post.find(:all, :include => [:category])
   @posts.each do |p| %>
  <h1><%= p.category.name %></h1>
  <p><%= p.body %></p>
<% end %>
This code generates up to two queries, regardless of how many rows are in this posts table.
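To make the query counts concrete, here is a minimal pure-Ruby sketch that needs no database. The FakeDB class, its methods, and the sample data are hypothetical stand-ins invented for illustration; they only count how many "queries" each access pattern would issue and are not how ActiveRecord is implemented.

```ruby
# Hypothetical stand-in for a database: every public lookup counts as one query.
class FakeDB
  attr_reader :query_count

  def initialize(posts, categories)
    @posts = posts
    @categories = categories
    @query_count = 0
  end

  # Stands in for: SELECT * FROM posts
  def all_posts
    @query_count += 1
    @posts
  end

  # Stands in for: SELECT * FROM categories WHERE id = ?
  def category(id)
    @query_count += 1
    @categories.find { |c| c[:id] == id }
  end

  # Stands in for: SELECT * FROM categories WHERE id IN (...)
  def categories_for(ids)
    @query_count += 1
    @categories.select { |c| ids.include?(c[:id]) }
  end
end

CATEGORIES = [{ id: 1, name: "News" }, { id: 2, name: "Tips" }].freeze
POSTS = (1..100).map { |i| { id: i, category_id: 1 + i % 2 } }.freeze

# N + 1 pattern: one query for the posts, then one query per post.
def lazy_query_count
  db = FakeDB.new(POSTS, CATEGORIES)
  db.all_posts.each { |post| db.category(post[:category_id]) }
  db.query_count
end

# Eager pattern: one query for the posts, one query for all their categories.
def eager_query_count
  db = FakeDB.new(POSTS, CATEGORIES)
  posts = db.all_posts
  ids = posts.map { |post| post[:category_id] }.uniq
  by_id = db.categories_for(ids).each_with_object({}) { |c, h| h[c[:id]] = c }
  posts.each { |post| by_id[post[:category_id]] } # in-memory lookup, no query
  db.query_count
end

puts "lazy:  #{lazy_query_count} queries"
puts "eager: #{eager_query_count} queries"
```

With 100 posts, the lazy version issues 101 queries and the eager version issues 2, no matter how many posts there are.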
Of course, not all cases are so simple. Handling complex N + 1 query situations requires more work. So is it worth the effort? Let's do some quick tests.
Test N + 1
Using the script in Listing 3, you can find out how slow, or how fast, these queries can be. Listing 3 shows how to use ActiveRecord in a stand-alone script to establish a database connection, define tables, and load data. You can then use Ruby's built-in benchmarking library to see which approach is faster, and by how much.
Listing 3. Eager loading benchmark script
require 'rubygems'
require 'faker'
require 'active_record'
require 'benchmark'

# This call creates a connection to our database.
ActiveRecord::Base.establish_connection(
  :adapter  => "mysql",
  :host     => "127.0.0.1",
  :username => "root", # Note that while this is the default setting for MySQL,
  :password => "",     # a properly secured system will have a different MySQL
                       # username and password, and if so, you'll need to
                       # change these settings.
  :database => "test")

# First, set up our database ...
class Category < ActiveRecord::Base
end

unless Category.table_exists?
  ActiveRecord::Schema.define do
    create_table :categories do |t|
      t.column :name, :string
    end
  end
end

Category.create(:name => 'Sara Campbell\'s Stuff')
Category.create(:name => 'Jake Moran\'s Possessions')
Category.create(:name => 'Josh\'s Items')
number_of_categories = Category.count

class Item < ActiveRecord::Base
  belongs_to :category
end

# If the table doesn't exist, we'll create it.
unless Item.table_exists?
  ActiveRecord::Schema.define do
    create_table :items do |t|
      t.column :name, :string
      t.column :category_id, :integer
    end
  end
end

puts "Loading data ..."
item_count = Item.count
item_table_size = 10000
if item_count < item_table_size
  (item_table_size - item_count).times do
    Item.create!(:name => Faker::Name.name,
                 :category_id => (1 + rand(number_of_categories.to_i)))
  end
end

puts "Running tests ..."
Benchmark.bm do |x|
  [100, 1000, 10000].each do |size|
    # Each access to i.category triggers its own query.
    x.report "size: #{size}, with n + 1 problem" do
      @items = Item.find(:all, :limit => size)
      @items.each do |i|
        i.category
      end
    end
    # Categories are loaded up front, so the loop issues no further queries.
    x.report "size: #{size}, with :include" do
      @items = Item.find(:all, :include => :category, :limit => size)
      @items.each do |i|
        i.category
      end
    end
  end
end
This script tests how fast it is to loop through 100, 1,000, and 10,000 objects with and without eager loading through the :include clause. To run the script, you may need to replace the database connection parameters at the top with parameters appropriate for your local environment. You also need to create a MySQL database named test. Finally, you need the ActiveRecord and faker gems, both of which can be obtained by running gem install activerecord faker.
The results produced by running this script on my machine are shown in Listing 4.
Listing 4. Eager loading benchmark script output
-- create_table(:categories)
   -> 0.1327s
-- create_table(:items)
   -> 0.1215s
Loading data ...
Running tests ...
                                      user     system      total        real
size: 100, with n + 1 problem     0.030000   0.000000   0.030000 (  0.045996)
size: 100, with :include          0.010000   0.000000   0.010000 (  0.009164)
size: 1000, with n + 1 problem    0.260000   0.040000   0.300000 (  0.346721)
size: 1000, with :include         0.060000   0.010000   0.070000 (  0.076739)
size: 10000, with n + 1 problem   3.110000   0.380000   3.490000 (  3.935518)
size: 10000, with :include        0.470000   0.080000   0.550000 (  0.573861)
In all cases, the tests that use :include are faster: 5.02, 4.52, and 6.86 times faster, respectively, measured by real (wall-clock) time. Of course, the exact figures depend on your particular situation, but eager loading can clearly lead to significant performance improvements.
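The speedup figures can be re-derived from the real (wall-clock) column of Listing 4. The short sketch below does that arithmetic; the constant and method names are arbitrary choices made for this illustration.

```ruby
# Wall-clock ("real") times in seconds, copied from Listing 4.
REAL_TIMES = {
  100   => { n_plus_one: 0.045996, with_include: 0.009164 },
  1000  => { n_plus_one: 0.346721, with_include: 0.076739 },
  10000 => { n_plus_one: 3.935518, with_include: 0.573861 }
}.freeze

# Speedup = time with the N + 1 problem divided by time with :include,
# rounded to two decimal places.
def speedups(times)
  times.map { |size, t| [size, (t[:n_plus_one] / t[:with_include]).round(2)] }.to_h
end

speedups(REAL_TIMES).each do |size, ratio|
  puts "size #{size}: #{ratio} times faster with :include"
end
```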
Nested eager loading
What if you want to reference a nested relationship, that is, a relationship of a relationship? Listing 5 shows a common situation: looping through all posts and displaying the author's image, where the relationship between Author and Image is a belongs_to relationship.
Listing 5. Nested eager loading use case
<% @posts = Post.all
   @posts.each do |p| %>
  <h1><%= p.category.name %></h1>
  <%= image_tag p.author.image.public_filename %>
  <p><%= p.body %></p>
<% end %>
This code suffers from the same N + 1 problem as before, but the syntax for the fix is less obvious because a relationship of a relationship is used here. So how can we eagerly load the nested relationship?
The answer is to use the hash syntax of the :include clause. Listing 6 shows nested eager loading using the hash syntax.
Listing 6. Nested eager loading solution
<% @posts = Post.find(:all, :include => { :category => [],
                                          :author => { :image => [] } })
   @posts.each do |p| %>
  <h1><%= p.category.name %></h1>
  <%= image_tag p.author.image.public_filename %>
  <p><%= p.body %></p>
<% end %>
As you can see, you can nest hashes and array literals. Note that the only difference between a hash and an array in this example is that a hash can contain nested subentries, whereas an array cannot. Otherwise, the two are equivalent.
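One way to read a nested :include specification is as a list of association paths. The helper below, include_paths, is a hypothetical pure-Ruby function written for this illustration (it is not part of Rails); it only expands a nested hash or array into the paths it names and makes no claim about how ActiveRecord processes the option internally.

```ruby
# Hypothetical helper (not part of Rails): expands a nested :include
# specification into the list of association paths it names.
def include_paths(spec, prefix = [])
  case spec
  when Hash
    # Each key is an association; its value names associations nested under it.
    spec.flat_map do |name, nested|
      path = prefix + [name]
      [path] + include_paths(nested, path)
    end
  when Array
    # Array entries sit side by side at the same level.
    spec.flat_map { |entry| include_paths(entry, prefix) }
  when nil
    []
  else
    # A bare symbol is a single association with nothing nested under it.
    [prefix + [spec]]
  end
end

spec = { :category => [], :author => { :image => [] } }
include_paths(spec).each { |path| puts path.join(".") }
```

For the specification used in Listing 6, this prints category, author, and author.image: the three associations that the loop touches for each post.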
Indirect eager loading
Not all N + 1 issues are easily detectable. For example, how many queries can Listing 7 generate?
Listing 7. Indirect eager loading use case
<% @user = User.find(5)
   @user.posts.each do |p| %>
  <%= render :partial => 'posts/summary', :locals => { :post => p } %>
<% end %>
Of course, determining the number of queries requires some knowledge of the posts/summary partial. This partial is shown in Listing 8.
Listing 8. Indirect eager loading partial: posts/_summary.html.erb
<h1><%= post.user.name %></h1>
Unfortunately, the answer is that Listing 7 and Listing 8 generate an extra query for each row in posts to look up the user's name, even though each post object was generated by ActiveRecord from a User object that is already in memory. In short, Rails does not associate the child records back to their parent record.
The fix is to use self-referential eager loading. Basically, because Rails reloads the parent record for child records that were generated from it, the parent records need to be eagerly loaded, just as if there were a completely separate relationship between the parent and child records. The code is shown in Listing 9.
Listing 9. Indirect eager loading solution
<% @user = User.find(5, :include => { :posts => [:user] })
... snip ...
Although counterintuitive, this technique works in much the same way as the techniques described above. However, it is easy to end up with too much nesting this way, especially if the schema is complex. Simple use cases, such as the one shown in Listing 9, are fine, but complex nesting can be problematic. In some cases, loading too many Ruby objects can be slower than living with the N + 1 problem, especially when the entire tree is not traversed for every object. In that case, other solutions to the N + 1 problem may be more appropriate.
One way is to use caching. Rails V2.1 has simple cache access built in. Using Rails.cache.read, Rails.cache.write, and related methods, you can easily create your own simple caching mechanism, and the back end can be a simple memory store, a file-based store, or a distributed cache server. You can find more information about Rails' built-in cache support in the Resources section. But you don't need to create your own caching solution; you can use a pre-built Rails plugin, such as Nick Kallen's cache-money plugin. This plugin provides write-through caching and is based on code used at Twitter. See Resources for more information.
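The idea behind a read/write cache can be sketched in a few lines of plain Ruby. TinyCache below is a hypothetical in-memory class written only for this illustration; it mimics the read and write calls mentioned above and adds a small fetch helper, but it is not the Rails cache store itself, and the lambda merely stands in for a database query.

```ruby
# Hypothetical in-memory cache, standing in for a real cache back end.
class TinyCache
  def initialize
    @store = {}
  end

  # Returns the cached value, or nil when the key is absent.
  def read(key)
    @store[key]
  end

  # Stores the value under the key and returns the value.
  def write(key, value)
    @store[key] = value
  end

  # Returns the cached value; on a miss, runs the block once and caches its result.
  def fetch(key)
    return @store[key] if @store.key?(key)
    write(key, yield)
  end
end

lookups = 0
slow_author_name = lambda do |id|
  lookups += 1 # stands in for one database query
  "author-#{id}"
end

cache = TinyCache.new
names = [5, 5, 7, 5].map do |id|
  cache.fetch("author/#{id}") { slow_author_name.call(id) }
end

puts names.inspect
puts "lookups: #{lookups}"
```

Four requests for author names result in only two simulated queries, because repeated requests for author 5 are served from the cache.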
Of course, not all Rails issues are related to the number of queries.
Rails grouping and aggregation calculations
One problem you may encounter is work being done in Ruby that should be done by the database. This is a testament to the power of Ruby. It is hard to imagine people voluntarily re-implementing parts of their database code in C without a major incentive, but in Rails it is easy to perform similar calculations on groups of ActiveRecord objects. However, Ruby is generally slower than the database at this kind of work. So don't use pure Ruby to perform calculations, as shown in Listing 10.
Listing 10. Incorrect way to perform group calculations
all_ages = Person.find(:all).group_by(&:age).keys.uniq
oldest_age = Person.find(:all).map(&:age).max
Instead, Rails provides a series of grouping and aggregation functions. You can use them as shown in Listing 11.
Listing 11. The correct way to perform group calculations
all_ages = Person.find(:all, :group => [:age])
oldest_age = Person.calculate(:max, :age)
ActiveRecord::Base#find has a number of options that mirror SQL clauses. More information can be found in the Rails documentation. Note that the calculate method works with any valid aggregate function supported by the database, such as :min, :sum, and :avg. In addition, calculate accepts several options, such as :conditions. Consult the Rails documentation for more detailed information.
However, not everything that can be done in SQL can be done in Rails. If the built-in methods are not enough, you can use custom SQL.
Custom SQL with Rails
Imagine a table that contains people's professions, ages, and the number of accidents they were involved in during the past year. The average age and average accident count for each profession can be retrieved using a custom SQL statement, as shown in Listing 12.
Listing 12. Example of custom SQL with ActiveRecord
sql = "SELECT profession,
              AVG(age) AS average_age,
              AVG(accident_count) AS average_accident_count
         FROM persons
        GROUP
           BY profession"
# Each aliased column is available as a method on the returned rows.
Person.find_by_sql(sql).each do |row|
  puts "#{row.profession}, " <<
       "avg. age: #{row.average_age}, " <<
       "avg. accidents: #{row.average_accident_count}"
end
This script should produce the results shown in Listing 13.
Listing 13. Custom SQL output with ActiveRecord
Programmer, avg. age: 18.010, avg. accidents: 9
System Administrator, avg. age: 22.720, avg. accidents: 8
Of course, this is a very simple example. You can imagine how the SQL in this example could be expanded into a more complex statement. You can also use the ActiveRecord::Base.connection.execute method to run other types of SQL statements, such as an ALTER TABLE statement, as shown in Listing 14.
Listing 14. Custom non-finder SQL with ActiveRecord
ActiveRecord::Base.connection.execute "ALTER TABLE some_table CHANGE COLUMN ..."
Most schema operations, such as adding and removing columns, can be done using Rails' built-in methods. But the ability to execute arbitrary SQL code can be used if needed.
Concluding remarks
As with all frameworks, Ruby on Rails suffers from performance problems if you are not careful. Fortunately, the techniques for monitoring and fixing these problems are relatively simple and easy to learn, and even complex problems can be solved with patience and an understanding of where the performance problem comes from.