Strings in Ruby are mutable, unlike strings in Java and C#, which are immutable. For example:
str1 = "abc"
str2 = "abc"
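A quick way to check how Ruby treats these two literals is to compare them by content and by identity. The sketch below is only illustrative: the helper name compare_literals is made up for this example, and it assumes string literals are not frozen (no `# frozen_string_literal: true` magic comment is in effect).

```ruby
# Assumption: string literals are not frozen in this file.
# compare_literals is a hypothetical helper, not part of Ruby.
def compare_literals
  str1 = "abc"
  str2 = "abc"
  {
    same_content: str1 == str2,      # == compares the characters
    same_object:  str1.equal?(str2)  # equal? compares object identity
  }
end

result = compare_literals
puts "same content: #{result[:same_content]}"
puts "same object:  #{result[:same_object]}"
```

Under that assumption the two strings have equal content but are two separate objects.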
In Java, the JVM maintains an internal table of literal strings, so in Java str1 and str2 refer to the same String object. In Ruby, str1 and str2 are completely different objects. Similarly, an operation on a String object in Java produces a new object, while Ruby modifies the same object in place. For example:
str = "abc"
str.concat("cdf")
After this, str is "abccdf". How does Ruby handle strings internally? We will only discuss the C implementation of Ruby here. If you are interested, first take a look at the article "look at Ruby-object basics". The structure of the String object is defined in ruby.h. Every object in Ruby (classes included) is represented by a struct, and String is no exception:
struct RString {
    struct RBasic basic;
    long len;
    char *ptr;
    union {
        long capa;
        VALUE shared;
    } aux;
};
// ruby.h
Clearly, len is the length of the string and ptr is a char pointer to the actual character data. The union that follows is discussed later. If you read through ruby.h, you will see that almost every object structure defined there contains a struct RBasic, so struct RBasic evidently holds the important information shared by all object structures. Here is RBasic:
struct RBasic {
    unsigned long flags;
    VALUE klass;
};
Here flags is a multi-purpose field that is mostly used to record the struct type; ruby.h predefines a series of macros for this, such as T_STRING (struct RString) and T_ARRAY (struct RArray). klass has type VALUE, and VALUE is itself an unsigned long that can be treated as a pointer (4 bytes on a 32-bit platform, which is more than enough). It points to a Ruby object, namely the object's class; we will go into this in more depth later.
So what are capa and shared in the aux union for? Ruby's String is mutable, which means len can change. If the memory had to be grown or shrunk (with C's realloc() function) every time len changed, the overhead would be huge. The solution is to reserve some extra space: the memory pointed to by ptr is slightly larger than len, so realloc does not have to be called as often, and aux.capa is the length including that extra memory. And what does aux.shared do? It has type VALUE, so it points to an object. aux.shared is used to speed up string creation. Consider a loop:
while true do          # repeat forever
  a = "str"            # create a string with "str" as its content and assign it to a
  a.concat("ing")      # append "ing" to the object that a points to
  p(a)                 # displays "string"
end
Each pass through the loop creates a new "str" object. Allocating a new char[] internally every time would be wasteful, so aux.shared is used to share the char[]: strings created from the same literal share one char[]. When a string is about to be modified, its contents are first copied to non-shared memory and the change is applied to that new copy. This is the so-called "copy-on-write" technique. That covers the internal structure of String, but we have not yet shown how String mutability actually works. Let's write a Ruby extension to test it. We want to implement this Ruby class:
class Test
  def test
    str = "str"
    str.concat("ing")
  end
end
The corresponding C language code is:
#include <stdio.h>
#include "ruby.h"

/* Creates a new string, appends to it, and prints its state before and after. */
static VALUE t_test(VALUE self) {
    VALUE str;
    str = rb_str_new2("str");
    /* VALUE is an unsigned long, so cast it to void * for %p */
    printf("before concat: str: %p, str.aux.shared: %p, str.ptr: %s\n",
           (void *)str, (void *)(RSTRING(str)->aux).shared, RSTRING(str)->ptr);
    rb_str_cat2(str, "ing");
    printf("after concat: str: %p, str.aux.shared: %p, str.ptr: %s\n",
           (void *)str, (void *)(RSTRING(str)->aux).shared, RSTRING(str)->ptr);
    return self;
}

VALUE cTest;

/* Entry point called by Ruby when the extension is required. */
void Init_string_hack() {
    cTest = rb_define_class("Test", rb_cObject);
    rb_define_method(cTest, "test", t_test, 0);
}
// string_hack.c
The rb_define_class function defines the class Test, and rb_define_method adds t_test to the Test class as a method named test. Each call to t_test creates an RString structure with rb_str_new2, appends "ing" to str with rb_str_cat2, and prints some values for tracing. To build the extension, use mkmf to generate a Makefile by writing an extconf.rb:
require 'mkmf'
create_makefile("string_hack")
Run ruby extconf.rb to generate the Makefile, then run make to build the shared library string_hack.so. With the extension built, call it from Ruby:
require 'string_hack'
t = Test.new
(1..3).each { |i| t.test }
Output:
before concat: str: 0x40098a40, str.aux.shared: 0x3, str.ptr: str
after concat: str: 0x40098a40, str.aux.shared: 0x8, str.ptr: string
before concat: str: 0x40098a2c, str.aux.shared: 0x3, str.ptr: str
after concat: str: 0x40098a2c, str.aux.shared: 0x8, str.ptr: string
before concat: str: 0x40098a18, str.aux.shared: 0x3, str.ptr: str
after concat: str: 0x40098a18, str.aux.shared: 0x8, str.ptr: string
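The same experiment can be roughly mirrored in plain Ruby, without a C extension. This is only a sketch: object_id stands in for the address printed by the extension, concat_in_place is a hypothetical helper name, and String.new is used so that the string is never frozen.

```ruby
# Hypothetical helper: builds "str", appends "ing", and reports whether
# the variable still refers to the same object afterwards.
def concat_in_place
  str = String.new("str")       # a fresh, unfrozen String
  id_before = str.object_id     # stands in for the printed address
  str.concat("ing")             # mutates the receiver
  [id_before == str.object_id, str]
end

3.times do
  same_object, content = concat_in_place
  puts "same object: #{same_object}, content: #{content}"
end
```

Each iteration creates a new object, but within one iteration the object is the same before and after concat.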
The extension's output shows that str refers to the same object before and after the concat call: its address does not change, and only the string data pointed to by ptr in str changes. Let's take a quick look at how rb_str_cat2 is implemented:
VALUE
rb_str_cat(str, ptr, len)
    VALUE str;
    const char *ptr;
    long len;
{
    if (len < 0) {
        rb_raise(rb_eArgError, "negative string size (or size too big)");
    }
    if (FL_TEST(str, STR_ASSOC)) {
        rb_str_modify(str);
        REALLOC_N(RSTRING(str)->ptr, char, RSTRING(str)->len + len);
        memcpy(RSTRING(str)->ptr + RSTRING(str)->len, ptr, len);
        RSTRING(str)->len += len;
        RSTRING(str)->ptr[RSTRING(str)->len] = '\0'; /* sentinel */
        return str;
    }
    return rb_str_buf_cat(str, ptr, len);
}

VALUE
rb_str_cat2(str, ptr)
    VALUE str;
    const char *ptr;
{
    return rb_str_cat(str, ptr, strlen(ptr));
}
// string.c
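To tie the pieces together, here is a toy Ruby model of the two ideas behind the aux union: a shared buffer that is copied only when it is about to be written, and a capacity larger than the length so that not every append needs a reallocation. It is a sketch under stated assumptions, not CRuby's code: the ToyString class, its method names, and the doubling growth policy are all invented for illustration.

```ruby
# Toy model only. ToyString and its growth policy are hypothetical;
# the field names merely echo struct RString.
class ToyString
  attr_reader :len, :capa

  def initialize(buffer, shared: false)
    @buf    = buffer        # plays the role of ptr
    @len    = buffer.size   # plays the role of len
    @capa   = buffer.size   # plays the role of aux.capa
    @shared = shared        # true while @buf is borrowed from a literal
  end

  def shared?
    @shared
  end

  # Append bytes, copying a shared buffer first (copy-on-write).
  def cat(bytes)
    unshare if @shared
    needed = @len + bytes.size
    @capa = needed * 2 if needed > @capa  # assumed policy: grow with headroom
    @buf.concat(bytes)
    @len = needed
    self
  end

  def to_s
    @buf.pack("C*")
  end

  private

  # Give this string its own private copy of the buffer.
  def unshare
    @buf = @buf.dup
    @shared = false
  end
end

literal = "str".bytes.freeze            # one buffer shared by both strings
a = ToyString.new(literal, shared: true)
b = ToyString.new(literal, shared: true)

a.cat("ing".bytes)                      # a copies the buffer, then appends
capa_after_first = a.capa
a.cat("s".bytes)                        # fits in the reserved space

puts a.to_s
puts b.to_s
puts a.capa == capa_after_first
```

In this model the first append to a triggers the copy and reserves extra room, the second append reuses that room, and b keeps sharing the untouched literal buffer.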