Using Ruby to process text
Like Perl and Python, Ruby has excellent built-in features that make it a powerful text-processing language. This article briefly introduces Ruby's text-processing capabilities and shows how to use Ruby to process text data in different formats effectively, whether CSV or XML.
Ruby strings
Common acronyms
- CSV: comma-separated values
- REXML: Ruby Electric XML
- XML: Extensible Markup Language
In Ruby, the String class is a powerful way to hold, compare, and manipulate text data. A String can be created by calling String.new or by assigning a string literal.
String literals can be enclosed in single quotes (') or double quotes ("), and the two behave differently. Double-quoted strings support backslash (\) escape sequences, such as \n for a newline, and the #{} operator, which evaluates an expression and inserts its value into the string. Single-quoted strings are simple, literal text (only \' and \\ are treated specially).
Listing 1 is an example.
Listing 1. Processing Ruby strings: defining strings
message = 'Heal the World…'
puts message
message1 = "Take home Rs #{100*3/2} "
puts message1
# Output:
# ./string1.rb
# Heal the World…
# Take home Rs 150
Here, the first string is defined with single quotes and the second with double quotes. In the second string, the expression inside #{} is evaluated before the string is displayed.
Another useful way to define strings, the here document (heredoc), is typically used for multi-line strings.
From here on, I use the interactive Ruby console, irb, for the examples; the irb>> prompt in the listings marks what you type. irb is normally installed along with Ruby. If it is not, install the irb gem. irb is a very useful tool for learning Ruby and its modules. Once it is installed, start it by running the command irb.
Listing 2. Processing Ruby strings: defining multi-line strings
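The following minimal sketch puts the quoting rules side by side and shows that String.new and a literal produce equal strings. The variable names (a, b, single, double) are illustrative only.

```ruby
# String.new and a literal produce equal strings
a = String.new("Rs 150")
b = "Rs 150"
puts a == b                     # true

total = 100 * 3 / 2
single = 'Total: #{total}\n'    # single quotes: no interpolation, \n stays as two characters
double = "Total: #{total}\n"    # double quotes: #{} is evaluated and \n is a newline

puts single.length              # 17
puts double.length              # 11
puts double                     # Total: 150
```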
irb>> str = <<EOF
irb>> hello, world
irb>> how do you feel?
irb>> how r u?
irb>> EOF
"hello, world\nhow do you feel?\nhow r u?\n"
irb>> puts str
hello, world
how do you feel?
how r u?
In Listing 2, everything between <<EOF and the closing EOF becomes part of the string, including the \n (newline) character at the end of each line.
The Ruby String class has a powerful set of methods for manipulating and processing the data stored in strings. The examples in Listings 3, 4, and 5 show some of them.
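A few heredoc variants are worth knowing when writing scripts rather than typing into irb. The sketch below assumes Ruby 2.3 or later for the squiggly (<<~) form; the method name greeting is hypothetical.

```ruby
# Plain heredoc: the body is taken verbatim, and the closing EOF
# must start at the beginning of its line.
plain = <<EOF
hello, world
how do you feel?
EOF

# Squiggly heredoc (Ruby 2.3+): the common leading indentation is removed,
# so the body and terminator can be indented with the surrounding code.
def greeting
  <<~EOS
    hello, world
    how do you feel?
  EOS
end

# Interpolation works in heredocs just as in double-quoted strings...
bill = <<~EOS
  Take home Rs #{100 * 3 / 2}
EOS

# ...unless the terminator is wrapped in single quotes.
raw = <<~'EOS'
  Take home Rs #{100 * 3 / 2}
EOS

p plain == greeting   # true
p bill                # "Take home Rs 150\n"
```

The quoted-terminator form is handy for templates that contain #{} sequences you want to keep literally.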
Listing 3. Handling Ruby strings: concatenating strings
irb>> str = "The world for a horse" # String initialized with a value
The world for a horse
irb>> str*2 # Multiplying with an integer returns a
# new string containing that many times
# of the old string.
The world for a horseThe world for a horse
irb>> str + " Who said it ? " # Concatenation of strings using the '+' operator
The world for a horse Who said it ?
irb>> str<<" is it? " # Concatenation using the '<<' operator.
# Unlike '+', '<<' modifies str itself.
The world for a horse is it?
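The difference between + and << matters whenever a string is shared. This small sketch (variable names are illustrative) checks object identity to show that + builds a new object while << appends to the existing one.

```ruby
str = "The world for a horse"
id_before = str.object_id

joined = str + " Who said it ?"   # '+' builds a new String; str is unchanged
str << " is it?"                  # '<<' appends to str itself

puts str                          # The world for a horse is it?
puts str.object_id == id_before   # true: still the same object
puts joined.equal?(str)           # false: '+' returned a different object
```

When building a long string in a loop, << avoids allocating a new string on every step, which is why it is usually preferred there.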
Listing 4 shows how to extract substrings and manipulate parts of a string.
Listing 4. Processing Ruby strings: extracting and manipulating substrings
irb>> str = "The world for a horse" # Reset str: the '<<' in Listing 3 changed it
The world for a horse
irb>> str[0] # The '[]' operator can be used to extract substrings, just
# like accessing entries in an array.
# The index starts from 0.
"T" # A single index returns the character at that position
# as a one-character string (Ruby 1.8 returned its
# character code, 84).
irb>> str[0,5] # a range can be specified as a pair. The first is the starting
# index, the second is the length of the substring from the
# starting index.
The w
irb>> str[16,5]="Ferrari" # The same '[]' operator can be used
# to replace substrings in a string
# by using the assignment like '[]='
irb>> str
The world for a Ferrari
irb>> str[10..22] # The range can also be specified using [x1..x2]
for a Ferrari
irb>> str[" Ferrari"]=" horse" # A substring can be specified to be replaced by a new
# string. Ruby strings are intelligent enough to adjust the
# size of the string to make up for the replacement string.
irb>> str
The world for a horse
irb>> str.split # split splits the string on the given delimiter
# (whitespace by default) and returns an array of strings.
["The", "world", "for", "a", "horse"]
irb>> str.each_line(' ') { |word| p word.chomp(' ') }
# each_line splits the string on the given record
# separator and passes each piece to the block.
# Here, chomp(' ') cuts off the trailing space.
"The"
"world"
"for"
"a"
"horse"
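Because indexing behaviour changed between Ruby 1.8 and 1.9, here is a short sketch of how the same operations look in current Ruby, plus two variants of split that often come in handy. It uses only core String methods.

```ruby
str = "The world for a horse"

p str[0]                  # "T": a one-character String (Ruby 1.9+)
p str[0].ord              # 84: the character code that Ruby 1.8 returned
p str[-5..-1]             # "horse": negative indexes count from the end
p str[4, 5]               # "world"
p str.index("for")        # 10: position of the first match
p str.split(" ", 2)       # ["The", "world for a horse"]: at most two fields
p "a,b,,c".split(",")     # ["a", "b", "", "c"]: empty fields are kept
```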
The Ruby String class offers many other practical methods, for example for changing case (upcase, downcase, swapcase), getting the length of a string (length), removing the trailing record separator (chomp), scanning a string (scan), and one-way encryption (crypt). Another useful method is freeze, which makes a string unmodifiable: after str.freeze is called, any attempt to modify str raises an error.
Ruby also has so-called destructive methods. Methods whose names end with an exclamation point (!) modify the string in place. The regular methods (without the trailing !) return a modified copy of the string they are called on and leave the original unchanged.
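A quick tour of a few of these methods, including what happens when a frozen string is modified. The sketch rescues RuntimeError because FrozenError (raised since Ruby 2.5) is a subclass of it; the variable names are illustrative.

```ruby
str = "Hello, World"
p str.length              # 12
p str.swapcase            # "hELLO, wORLD"
p "line\n".chomp          # "line": the trailing record separator is removed
p str.scan(/[A-Z]/)       # ["H", "W"]: every match of the pattern

frozen = "constant".freeze
error_raised = false
begin
  frozen << "!"           # modifying a frozen string raises an error
rescue RuntimeError => e  # FrozenError (Ruby 2.5+) inherits from RuntimeError
  error_raised = true
  puts "cannot modify: #{e.class}"
end
p frozen.frozen?          # true
p frozen                  # "constant": unchanged
```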
Listing 5. Handling Ruby strings: modifying strings permanently
irb>> str = "hello, world"
hello, world
irb>> str.upcase
HELLO, WORLD
irb>> str # str remains as is.
hello, world
irb>> str.upcase! # here, str gets modified by the '!' at the end of
# upcase.
HELLO, WORLD
irb>> str
HELLO, WORLD
In Listing 5, upcase! modifies the string in str itself, whereas upcase only returns an upper-cased copy. These ! methods are useful when you want to change a string in place rather than create a new one.
Ruby strings are very powerful. Once data is captured in strings, you can use any number of methods to process it easily and effectively.
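One detail is easy to miss: many ! methods return nil when they have nothing to change, so their return value should not be chained or relied on. A minimal sketch:

```ruby
str = "hello, world"
copy = str.upcase     # returns a new, upper-cased String
p str                 # "hello, world": unchanged
p str.upcase!         # "HELLO, WORLD": str itself is modified
p str.upcase!         # nil: already upper case, nothing changed
p str                 # "HELLO, WORLD"
```

Writing str.upcase!.strip, for example, would raise NoMethodError on the second call, so it is safer to call the ! method on its own line and then keep using str.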
Process CSV files
A CSV file is a common way to represent tabular data. Tabular files are often used to export data from a spreadsheet, such as a list of contacts and their details.
Ruby has a powerful library for processing these files: the csv library, which provides the CSV class, has methods for creating, reading, and parsing CSV files.
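Before working with files, it can help to see the library round-trip data in memory. This sketch uses CSV.generate and CSV.parse with made-up contact data; note that recent Ruby versions ship csv as a bundled gem rather than as part of the core standard library, but require 'csv' works the same way.

```ruby
require 'csv'

# CSV.generate builds CSV text in memory; quoting is handled automatically.
text = CSV.generate do |csv|
  csv << ["Santhosh", "989898"]
  csv << ["Smith, John", "98988"]   # contains a comma, so the field gets quoted
end
puts text
# Santhosh,989898
# "Smith, John",98988

rows = CSV.parse(text)              # an array of arrays, one per row
p rows                              # [["Santhosh", "989898"], ["Smith, John", "98988"]]
```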
Listing 6 shows how to create a CSV file and parse the file using the Ruby csv module.
Listing 6. Processing CSV files: creating and parsing a CSV file
require 'csv'
writer = CSV.open('mycsvfile.csv','w')
begin
print "Enter Contact Name: "
name = STDIN.gets.chomp
print "Enter Contact No: "
num = STDIN.gets.chomp
s = name+" "+num
row1 = s.split
writer << row1
print "Do you want to add more ? (y/n): "
ans = STDIN.gets.chomp
end while ans != "n"
writer.close
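file = File.new('mycsvfile.csv')
lines = file.readlines
file.close
# Join the lines into one String (in Ruby 1.9+, Array#to_s returns the
# array's inspect form, which CSV.parse cannot read as CSV data)
parsed = CSV.parse(lines.join)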
p parsed
puts ""
puts "Details of Contacts stored are as follows..."
puts ""
puts "-------------------------------"
puts "Contact Name | Contact No"
puts "-------------------------------"
puts ""
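# CSV.foreach opens the file for reading and yields each row as an array
# (in Ruby 1.9+, CSV.open with a block yields the CSV object, not rows)
CSV.foreach('mycsvfile.csv') do |row|
puts row[0] + " | " + row[1]
puts ""
end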
Listing 7 shows the output:
Listing 7. Processing CSV files: Creating and parsing a CSV file output
Enter Contact Name: Santhosh
Enter Contact No: 989898
Do you want to add more ? (y/n): y
Enter Contact Name: Sandy
Enter Contact No: 98988
Do you want to add more ? (y/n): n
[["Santhosh", "989898"], ["Sandy", "98988"]]
Details of Contacts stored are as follows...
-------------------------------
Contact Name | Contact No
-------------------------------
Santhosh | 989898
Sandy | 98988
Let's take a quick look at this example.
First, the script loads the csv library (require 'csv').
To create the new CSV file mycsvfile.csv, it calls CSV.open with the 'w' (write) mode. This returns a writer object.
In this example, the CSV file holds a simple contact list that stores each contact's name and phone number. Inside the loop, the user is asked to enter a contact name and phone number. The name and number are joined into one string, which is then split into an array of two strings. (Because split breaks on whitespace, a name that contains a space would produce extra fields.) This array is passed to the writer object, which writes it to the CSV file as one row holding the pair of values.
When the loop ends, the script closes the writer, which saves the data to the file.
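As an alternative to joining and splitting, the fields can be written as an array directly, which keeps names with spaces intact. In this sketch the contacts array is hypothetical sample data that stands in for the STDIN input, and the block form of CSV.open closes the file automatically.

```ruby
require 'csv'

# Hypothetical sample input instead of reading from STDIN
contacts = [["Santhosh Kumar", "989898"], ["Sandy", "98988"]]

CSV.open('mycsvfile.csv', 'w') do |writer|   # block form closes the file on exit
  contacts.each do |name, num|
    writer << [name, num]                    # a two-element array becomes one row
  end
end

puts File.read('mycsvfile.csv')
# Santhosh Kumar,989898
# Sandy,98988
```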
The next step is to parse the created CSV file.
One way to open and parse the file is to create a new File object from the CSV file's name.
Call the readlines method to read all rows in the file into an array named lines.
The lines array is then joined back into a single String object and passed to the CSV.parse method, which parses the CSV data and returns its contents as an array of arrays, one inner array per row.
The script then shows a second way to read the file: it opens the file again for reading and iterates over its rows, each of which is an array of fields. Each row is printed in a fixed format to display the contact details.
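If a CSV file starts with a header row, the library can map each field to its column name. This sketch writes a small hypothetical file, contacts_with_header.csv, and then reads it row by row with CSV.foreach and all at once with CSV.read.

```ruby
require 'csv'

# Hypothetical file with a header row
File.write('contacts_with_header.csv', "name,number\nSanthosh,989898\nSandy,98988\n")

# CSV.foreach opens the file, yields one row at a time, and closes it afterwards.
# With headers: true, each row is a CSV::Row whose fields are looked up by name.
CSV.foreach('contacts_with_header.csv', headers: true) do |row|
  puts "#{row['name']} | #{row['number']}"
end

# CSV.read loads every row at once as an array of arrays; without
# headers: true, the header line is just an ordinary first row.
all_rows = CSV.read('contacts_with_header.csv')
p all_rows.first   # ["name", "number"]
```

CSV.foreach is the better choice for large files, because it never holds the whole file in memory.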
As you can see, Ruby provides a powerful module to process CSV files and data.
Process XML files
For XML files, Ruby provides a powerful built-in library named REXML. This library can be used to read and parse XML documents.
Let's look at an XML file and parse it using Ruby and REXML.
The following simple XML file lists the contents of a typical shopping cart in an online store. It has the following parts:
- Cart -- the root element
- User -- the user making the purchase (the id attribute of cart)
- Item -- an item that the user adds to the shopping cart
- Id, price, and quantity -- the item's code attribute and its price and qty child elements
Listing 8 shows the XML structure:
Listing 8. Processing XML files: example XML file
<cart id="userid">
  <item code="item-id">
    <price>price-per-unit</price>
    <qty>number-of-units</qty>
  </item>
</cart>
Obtain the sample XML file, shoppingcart.xml, from the download section. Now load the file and use REXML to parse its document tree.
Listing 9. Processing XML files: parsing an XML file
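If the downloadable sample is not at hand, the following sketch writes a stand-in shoppingcart.xml. It is an assumption reconstructed from the values shown in Listing 10 (user santhosh, items CS001 to CS004), not the original file; it needs Ruby 2.3+ for the <<~ heredoc.

```ruby
require 'rexml/document'

# Assumed stand-in for the sample file, built from the values in Listing 10
xml = <<~XML
  <cart id="santhosh">
    <item code="CS001"><price>100</price><qty>2</qty></item>
    <item code="CS002"><price>200</price><qty>5</qty></item>
    <item code="CS003"><price>500</price><qty>3</qty></item>
    <item code="CS004"><price>150</price><qty>5</qty></item>
  </cart>
XML

File.write('shoppingcart.xml', xml)

# Quick check that the file is well formed and has the expected items
doc = REXML::Document.new(File.read('shoppingcart.xml'))
puts doc.root.attributes['id']             # santhosh
puts doc.root.get_elements('item').size    # 4
```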
require 'rexml/document'
include REXML
file = File.new('shoppingcart.xml')
doc = Document.new(file)
root = doc.root
puts ""
puts "Hello, #{root.attributes['id']}, Find below the bill generated for your purchase..."
puts ""
sumtotal = 0
puts "-----------------------------------------------------------------------"
puts "Item\t\tQuantity\t\tPrice/unit\t\tTotal"
puts "-----------------------------------------------------------------------"
root.each_element('//item') { |item|
code = item.attributes['code']
qty = item.elements["qty"].text.strip # text of the <qty> element
price = item.elements["price"].text.strip # text of the <price> element
total = price.to_i * qty.to_i
puts "#{code}\t\t #{qty}\t\t #{price}\t\t #{total}"
puts ""
sumtotal += total
}
puts "-----------------------------------------------------------------------"
puts "\t\t\t\t\t\t Sum total : " + sumtotal.to_s
puts "-----------------------------------------------------------------------"
Listing 10 shows the output.
Listing 10. Processing XML files: output of parsing the XML file
Hello, santhosh, Find below the bill generated for your purchase...
-------------------------------------------------------------------------
Item Quantity Price/unit Total
-------------------------------------------------------------------------
CS001 2 100 200
CS002 5 200 1000
CS003 3 500 1500
CS004 5 150 750
-------------------------------------------------------------------------
Sum total : 3450
--------------------------------------------------------------------------
Listing 9 parses the shopping-cart XML file and generates a bill that shows the total for each item and the overall total of the purchase (see Listing 10).
The following describes the procedure.
First, the script loads Ruby's REXML library, which provides the methods for parsing XML files, and includes the REXML module so that its classes can be used without the REXML:: prefix.
It opens the shoppingcart.xml file and creates a Document object from it. This object holds the parsed XML document.
It assigns the document's root element to the variable root. This is the cart element of the XML file.
Each Element object has an attributes object, a hash-like collection of the element's attributes in which the attribute names are the keys and the attribute values are the values. Here, root.attributes['id'] returns the value of the root element's id attribute (santhosh in the output shown in Listing 10).
Next, it initializes sumtotal to 0 and prints the header.
Each Element object also has an elements object, whose each and [] methods give access to child elements. The call root.each_element('//item') iterates over all item elements, selected by the XPath expression //item. Each element also has a text attribute that holds the element's text value.
For each item, the script gets the code attribute of the item element and the text values of its price and qty elements, and then calculates the item total. It prints these details on the bill and adds the item total to the overall total (Sum total).
Finally, it prints the overall total.
As Listing 9 shows, parsing XML files with REXML and Ruby is easy. Generating XML at run time, adding and deleting elements and their attributes, is just as easy.
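The same bill can also be computed with the REXML::XPath module, and printf gives aligned columns instead of tab stops. This is a sketch over a hypothetical two-item cart held in a string so that it runs on its own.

```ruby
require 'rexml/document'

# Hypothetical two-item cart, inline so the sketch is self-contained
xml = <<~XML
  <cart id="santhosh">
    <item code="CS001"><price>100</price><qty>2</qty></item>
    <item code="CS002"><price>200</price><qty>5</qty></item>
  </cart>
XML

doc = REXML::Document.new(xml)
sumtotal = 0
REXML::XPath.each(doc, '//item') do |item|      # every item element in the document
  price = REXML::XPath.first(item, 'price').text.to_i
  qty   = REXML::XPath.first(item, 'qty').text.to_i
  total = price * qty
  printf("%-8s %5d %8d %8d\n", item.attributes['code'], qty, price, total)
  sumtotal += total
end
puts "Sum total: #{sumtotal}"   # Sum total: 1200
```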
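Listing 11. Processing XML files: generating an XML file
require 'rexml/document'
include REXML
doc = Document.new
# add_element returns the new element, which becomes the document root
cart = doc.add_element("cart", {"id" => "user2"})
item = Element.new("item")
item.add_element("price")
item.elements["price"].text = "100"
item.add_element("qty")
item.elements["qty"].text = "4"
cart.elements << item # append the item to the cart
doc.write($stdout) # print the generated XML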
The code in listing 11 creates an XML structure by creating a cart element, an item element, and its child elements, and then fills these child elements with values and adds them to the Document root.
Similarly, to delete elements and attributes, use the delete_element and delete_attribute methods of the Element class.
The approach used in the preceding examples is called tree parsing. Another way to parse XML documents is stream parsing. Stream parsing is faster than tree parsing and suits situations that require fast parsing. It is event-based and uses listeners: whenever the parser encounters a tag, it calls the listener, which does the processing.
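A short sketch of both methods on a hypothetical one-item cart; delete_element accepts an XPath string (as here), an index, or an Element.

```ruby
require 'rexml/document'

doc = REXML::Document.new(
  '<cart id="user2"><item code="CS001"><price>100</price><qty>4</qty></item></cart>'
)
item = doc.root.elements['item']

item.delete_attribute('code')   # remove the code attribute from item
item.delete_element('qty')      # remove the qty child element

out = String.new
doc.write(out)                  # serialize the modified document into out
puts out
```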
Listing 12 shows an example:
Listing 12. Processing XML files: stream parsing
require 'rexml/document'
require 'rexml/streamlistener'
include REXML
class Listener
include StreamListener
def tag_start(name, attributes)
puts "Start #{name}"
end
def tag_end(name)
puts "End #{name}"
end
end
listener = Listener.new
parser = Parsers::StreamParser.new(File.new("shoppingcart.xml"), listener)
parser.parse
Listing 13 shows the output:
Listing 13. Processing XML files: stream parsing output
Start cart
Start item
Start price
End price
Start qty
End qty
End item
Start item
Start price
End price
Start qty
End qty
End item
Start item
Start price
End price
Start qty
End qty
End item
Start item
Start price
End price
Start qty
End qty
End item
End cart
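A stream listener can do real work, not just print tag names. The sketch below, with a hypothetical BillListener class and a two-item cart string, uses the text callback of StreamListener to total the bill without ever building a document tree.

```ruby
require 'rexml/document'
require 'rexml/parsers/streamparser'
require 'rexml/streamlistener'

# Hypothetical listener that totals the bill while the document streams past
class BillListener
  include REXML::StreamListener
  attr_reader :sumtotal

  def initialize
    @sumtotal = 0
    @current = nil                 # element whose text is being read
  end

  def tag_start(name, attributes)
    @current = name
    @price, @qty = 0, 0 if name == 'item'   # start a new item
  end

  def text(value)
    case @current
    when 'price' then @price = value.to_i
    when 'qty'   then @qty = value.to_i
    end
  end

  def tag_end(name)
    @sumtotal += @price * @qty if name == 'item'
    @current = nil                 # ignore text that follows a closing tag
  end
end

xml = '<cart id="santhosh"><item code="CS001"><price>100</price><qty>2</qty></item>' \
      '<item code="CS002"><price>200</price><qty>5</qty></item></cart>'
listener = BillListener.new
REXML::Parsers::StreamParser.new(xml, listener).parse
puts "Sum total: #{listener.sumtotal}"   # Sum total: 1200
```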
In this way, the combination of REXML and Ruby provides a very effective and intuitive way to process and operate XML data.
Conclusion
Ruby has a good set of built-in and external libraries that support fast, powerful, and effective text processing. You can use them to simplify and improve the many text-processing tasks you may encounter. This article is only a brief introduction to Ruby's text-processing capabilities; there is much more to explore.
Ruby is clearly a powerful tool worth having in your toolkit.