Using Ruby to process text

Source: Internet
Author: User

Like Perl and Python, Ruby has excellent text processing capabilities. This article briefly introduces Ruby's text processing features and shows how to use Ruby to process text data in different formats, whether CSV or XML.
Ruby string
Common acronyms

  • CSV: comma-separated values
  • REXML: Ruby Electric XML
  • XML: Extensible Markup Language

A String in Ruby is a powerful way to hold, compare, and manipulate text data. In Ruby, String is a class that can be instantiated by calling String.new or by assigning a literal value.

When assigning a value to a String, you can enclose it in single quotes (') or double quotes ("). The two behave differently. Double-quoted strings support backslash (\) escape sequences and the #{} operator, which evaluates an expression inside the string. Single-quoted strings are taken as plain literal text.

Listing 1 is an example.
Listing 1. Processing Ruby strings: defining strings

message = 'Heal the World…'

puts message

message1 = "Take home Rs #{100*3/2} "

puts message1

Output :

# ./string1.rb

# Heal the World…

# Take home Rs 150

Here, the first string is defined with a pair of single quotes, and the second with a pair of double quotes. In the second string, the expression inside #{} is evaluated before the string is displayed.
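The difference between the two quoting styles can be seen side by side in a small sketch. The variable names (amount, single, double) are made up for this illustration.

```ruby
# Single quotes keep the text literal; double quotes process escapes and #{}.
amount = 100 * 3 / 2

single = 'Take home Rs #{amount}\n'   # no interpolation, \n stays as two characters
double = "Take home Rs #{amount}\n"   # interpolated, \n becomes a newline

puts single
puts double

puts single.length   # length of the literal text
puts double.length   # length of the interpolated text plus one newline
```

The single-quoted version is longer because the placeholder and the backslash sequence are stored as ordinary characters.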

Another useful way to define a string, usually used for multi-line strings, is the here document (heredoc).

From here on, the examples use irb, the interactive Ruby console. irb is normally installed together with Ruby; if it is missing, we recommend that you obtain the irb gem and install it. The console is a very useful tool for learning Ruby and its modules. After installation, start it with the irb command.
Listing 2. Processing Ruby strings: defining multi-line strings

irb>> str = <<EOF

irb>" hello, world

irb>" how do you feel?

irb>" how r u?

irb>" EOF

"hello, world\nhow do you feel?\nhow r u?\n"

irb>> puts str

hello, world
how do you feel?
how r u?

In Listing 2, everything between <<EOF and the closing EOF is considered part of the string, including the \n (line feed) characters.
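As a script rather than an irb session, the same idea might look like the following sketch. It also shows the squiggly heredoc (<<~, available from Ruby 2.3), which the original listing does not use; the names plain, indented, and name are chosen for illustration only.

```ruby
# Plain heredoc: the body is taken literally up to the terminator line.
plain = <<EOF
hello, world
how do you feel?
EOF

# Squiggly heredoc (Ruby 2.3+): common leading indentation is removed,
# and #{} is still evaluated as in a double-quoted string.
name = "world"
indented = <<~TEXT
  hello, #{name}
  how do you feel?
TEXT

p plain
p indented
puts plain == indented
```

The squiggly form is convenient inside indented code, because the heredoc body can follow the surrounding indentation without the spaces ending up in the string.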

The Ruby String class has a powerful set of methods for manipulating and processing the data stored in a string. The examples in Listings 3, 4, and 5 show some of these methods.
Listing 3. Processing Ruby strings: concatenating strings

irb>> str = "The world for a horse" # String initialized with a value

The world for a horse

irb>> str*2      # Multiplying with an integer returns a 
           # new string containing that many times
           # of the old string.

The world for a horseThe world for a horse

irb>> str + " Who said it ? "  # Concatenation of strings using the '+' operator

The world for a horse Who said it ?

irb>> str<<" is it? " # Concatenation using the '<<' operator

The world for a horse is it?
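One detail the listing does not spell out is that + builds a new string while << changes the receiver. A minimal sketch, with variable names (plus, same) invented for this example:

```ruby
str = "The world for a horse"

plus = str + " Who said it?"   # builds a new String, str is unchanged
puts str
puts plus

same = str << " is it?"        # appends to str itself and returns it
puts str
puts same.equal?(str)          # both names refer to the same object
```

This is why, after the << call in Listing 3, str itself holds the longer text.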

Extracting substrings and manipulating parts of a string
Listing 4. Processing Ruby strings: extracting and manipulating

irb>> str = "The world for a horse" # Start again from the original string

irb>> str[0] # The '[]' operator can be used to extract substrings, just 
      # like accessing entries in an array.
      # The index starts from 0.
T # A single index returns the character at that position
      # (Ruby 1.8 returned its ASCII value, 84, instead)

irb>> str[0,5] # a range can be specified as a pair. The first is the starting 
      # index , second is the length of the substring from the
      # starting index.

The w

irb>> str[16,5]="Ferrari" # The same '[]' operator can be used
         # to replace substrings in a string
         # by using the assignment like '[]='
irb>>str

The world for a Ferrari

irb>> str[10..22] # The range can also be specified using [x1..x2] 

for a Ferrari

irb>> str[" Ferrari"]=" horse" # A substring can be specified to be replaced by a new
        # string. Ruby strings are intelligent enough to adjust the
        # size of the string to make up for the replacement string.

irb>> str

The world for a horse

irb>> str.split  # split breaks the string on the given delimiter
        # (whitespace by default) and returns an array of strings.

["The", "world", "for", "a", "horse"]

irb>> str.each_line(' ') { |word| p word.chomp(' ') }

        # each_line processes the string in a block, splitting it on
        # the given record separator (Ruby 1.8 also offered this as each).
        # Here, chomp(' ') cuts off the trailing space.

"The"
"world"
"for"
"a"
"horse"

The Ruby String class also offers many other practical methods, for example for changing the case, getting the string length, removing the record separator, scanning the string, and one-way hashing (crypt). Another useful method is freeze, which makes a string unmodifiable. After str.freeze is called on a String str, str can no longer be modified.
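A few of these methods, and the effect of freeze, are sketched below. The exact exception class raised on a frozen string differs between Ruby versions, so this sketch only prints whatever class it gets rather than naming one.

```ruby
str = "The world for a horse"

puts str.length                 # number of characters
puts str.swapcase               # returns a copy with the case inverted
p str.scan(/o\w/)               # every match of the pattern, as an array
p "one line\n".chomp            # removes the trailing record separator

str.freeze
puts str.frozen?

begin
  str << " is it?"              # any in-place modification now raises
rescue => e
  puts "cannot modify: #{e.class}"
end
```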

Ruby also has so-called destructive methods. A method whose name ends with an exclamation point (!) permanently modifies the string. The regular methods (without the trailing exclamation point) return a modified copy of the string they are called on, whereas the methods with an exclamation point modify that string directly.
Listing 5. Handling Ruby strings: modifying strings permanently

irb>> str = "hello, world"

hello, world

irb>> str.upcase

HELLO, WORLD

irb>>str   # str, remains as is.

hello, world

irb>> str.upcase!  # here, str gets modified by the '!' at the end of 
        # upcase.
HELLO, WORLD

irb>> str

HELLO, WORLD

In Listing 5, the string in str is modified by the upcase! method, whereas the upcase method only returns a copy of the string with the case changed. These ! methods are sometimes useful.
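One thing worth knowing when using the ! variants is that many of them, upcase! included, return nil when there was nothing to change. The sketch below illustrates this; the variable names are chosen for the example.

```ruby
str = "hello, world"

copy = str.upcase       # returns a new string, str is untouched
puts copy
puts str

first  = str.upcase!    # modifies str in place and returns it
second = str.upcase!    # nothing left to change, so this returns nil

puts str
p first
p second

# Because of the nil return, chaining on a bang method is risky:
# str.upcase!.reverse would raise NoMethodError when no change was made.
```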

Ruby Strings are very powerful. Once the data is held in a String, you can use any number of methods to process it easily and effectively.

Process CSV files

A CSV file is a common way to represent tabular data. It is often used to export data from a spreadsheet (for example, a contact list with details).

Ruby has a powerful library for processing these files. csv is the Ruby library responsible for processing CSV files; it provides methods for creating, reading, and parsing them.

Listing 6 shows how to create a CSV file and parse the file using the Ruby csv module.
Listing 6. Process CSV files: Create and parse a CSV file

require 'csv'

writer = CSV.open('mycsvfile.csv','w')

begin

 print "Enter Contact Name: "

 name = STDIN.gets.chomp

 print "Enter Contact No: "

 num = STDIN.gets.chomp

 s = name+" "+num

 row1 = s.split

 writer << row1

 print "Do you want to add more ? (y/n): "

 ans = STDIN.gets.chomp

end while ans != "n"

writer.close

file = File.new('mycsvfile.csv')

lines = file.readlines

# Array#to_s no longer concatenates the elements on Ruby 1.9+, so join them
parsed = CSV.parse(lines.join)

p parsed

puts ""

puts "Details of Contacts stored are as follows..."

puts ""

puts "-------------------------------"

puts "Contact Name | Contact No"

puts "-------------------------------"

puts ""

# On Ruby 1.9+, CSV.open passes the CSV object to the block; foreach yields each row
CSV.foreach('mycsvfile.csv') do |row|

 puts row[0] + " | " + row[1] 

 puts ""
end

Listing 7 shows the output:
Listing 7. Processing CSV files: Creating and parsing a CSV file output

Enter Contact Name: Santhosh

Enter Contact No: 989898

Do you want to add more ? (y/n): y

Enter Contact Name: Sandy

Enter Contact No: 98988

Do you want to add more ? (y/n): n

Details of Contacts stored are as follows...

---------------------------------
Contact Name | Contact No
---------------------------------

Santhosh | 989898

Sandy | 98988

Let's take a quick look at this example.

First, the script loads the csv library (require 'csv').

To create the new CSV file mycsvfile.csv, it opens the file by calling CSV.open. This returns a writer object.

In this example, a CSV file is created that contains a simple contact list, storing contact names and phone numbers. In the loop, the user is asked to enter a contact name and phone number. The name and phone number are joined into one string and then split into an array of two strings. This array is passed to the writer object, which writes it to the CSV file. In this way, a pair of CSV values is stored as one row in the file.
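The join-then-split step only yields two fields when the name contains no spaces. The following sketch, using a hypothetical contact name, compares it with building the row array directly; CSV.generate_line is used here only to show what would be written.

```ruby
require 'csv'

name = "Santhosh Kumar"   # hypothetical contact name containing a space
num  = "989898"

# Approach from the listing: join, then split on whitespace.
split_row = (name + " " + num).split

# Alternative: build the row array directly.
direct_row = [name, num]

puts CSV.generate_line(split_row)
puts CSV.generate_line(direct_row)
```

With the direct array, the name stays in a single field regardless of embedded spaces.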

After the loop ends, the writer is closed, which saves the data in the file.

The next step is to parse the created CSV file.

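One way to open and parse the file is to create a new File object from the name of the new CSV file.

Call the readlines method to read all lines of the file into an array named lines.

The lines array is then converted to a single String object and passed to the CSV.parse method, which parses the CSV data and returns its content as an array of arrays. On Ruby 1.8, lines.to_s concatenated the array elements; on Ruby 1.9 and later, Array#to_s returns an inspect-style representation, so lines.join is the call that produces the concatenated text.

The script then shows another way to open and parse the file: it opens the file again in read mode and receives the rows one at a time in a block. On Ruby 1.8, CSV.open with a block did this; on Ruby 1.9 and later, CSV.open passes the CSV object itself to the block, and CSV.foreach is the method that yields each row. Each row is printed in a simple format to display the contact details and corresponds to one row in the file.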

As you can see, Ruby provides a powerful module to process CSV files and data.
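For quick experiments it can be handy to skip the file altogether. The sketch below builds the same kind of contact list in memory with CSV.generate and reads it back with a header row; the header names name and number are assumptions made for this example, not part of the original listing.

```ruby
require 'csv'

# Build CSV text in memory instead of writing a file.
csv_text = CSV.generate do |csv|
  csv << ["name", "number"]          # header row
  csv << ["Santhosh", "989898"]
  csv << ["Sandy", "98988"]
end

puts csv_text

# Parse it back; with headers: true each row can be read by column name.
table = CSV.parse(csv_text, headers: true)
table.each do |row|
  puts "#{row['name']} | #{row['number']}"
end
```

Reading by column name keeps the code readable when the file has more than a couple of columns.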

Process XML files

For XML files, Ruby provides a powerful built-in library named REXML. This library can be used to read and parse XML documents.

View the following XML file and try to parse it using Ruby and REXML.

The following is a simple XML file that lists the content in a typical shopping cart of an online shopping center. It has the following elements:

  • Cart -- root element
  • User -- user making the purchase
  • Item -- item that the user adds to the shopping cart
  • Id, price, and quantity -- sub-elements of the item

Listing 8 shows the XML structure:
Listing 8. processing XML files: Example XML files

<cart id="userid">

<item code="item-id">

 <price>

 <price/unit>

 </price>

 <qty>

 <number-of-units>

 </qty>

</item>

</cart>

Obtain the sample XML file from the download section. Now, load the XML file and use REXML to walk through the parsed tree.
Listing 9. processing XML files: parsing XML files

require 'rexml/document'

include REXML

file = File.new('shoppingcart.xml')

doc = Document.new(file)

root = doc.root

puts ""

puts "Hello, #{root.attributes['id']}, Find below the bill generated for your purchase..."

puts ""

sumtotal = 0

puts "-----------------------------------------------------------------------"

puts "Item\t\tQuantity\t\tPrice/unit\t\tTotal"

puts "-----------------------------------------------------------------------"

root.each_element('//item') { |item| 

code = item.attributes['code']

# strip the surrounding whitespace so the values print cleanly
qty = item.elements["qty"].text.strip

price = item.elements["price"].text.strip

total = price.to_i * qty.to_i

puts "#{code}\t\t #{qty}\t\t   #{price}\t\t   #{total}"

puts ""

sumtotal += total

}

puts "-----------------------------------------------------------------------"

puts "\t\t\t\t\t\t  Sum total : " + sumtotal.to_s

puts "-----------------------------------------------------------------------"

Listing 10 shows the output.
Listing 10. processing XML files: parsing XML file output

Hello, santhosh, Find below the bill generated for your purchase...

-------------------------------------------------------------------------
Item   Quantity    Price/unit    Total
-------------------------------------------------------------------------
CS001    2       100      200

CS002    5       200      1000

CS003    3       500      1500

CS004    5       150      750

-------------------------------------------------------------------------
               Sum total : 3450
--------------------------------------------------------------------------

Listing 9 parses the shopping cart XML file and generates a bill that shows the total for each item and the sum total of the purchase (see Listing 10).

The following describes the procedure.

First, the script includes Ruby's REXML module, which provides the methods for parsing XML files.

Open the shoppingcart.xml file and create a Document object from it. This object contains the parsed XML file.

Assign the root of the document to the Element object root. It points to the cart tag in the XML file.

Each Element object has an attributes object, which behaves like a hash of the element's attributes, with the attribute name as the key and the attribute value as the value. Here, root.attributes['id'] returns the value of the id attribute of the root element (userid in this example).

Next, initialize sumtotal to 0 and print the header.

Each Element object also has an elements object, which provides the each and [] methods for accessing child elements. Listing 9 traverses all the elements named item (selected by the XPath expression //item). Each element also has a text attribute that contains the text value of the element.

Next, obtain the code attribute of the item element and the text values of the price and qty elements, and then calculate the item total. Print the details to the bill and add the item total to the sum total.

Finally, print the total purchases.
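The same steps can be tried without a file by parsing an XML string. This is only a sketch under stated assumptions: the cart contents are invented sample data in the shape of Listing 8, and elements.each with a child name is used in place of the //item XPath.

```ruby
require 'rexml/document'

# Hypothetical cart data in the shape of Listing 8.
xml = <<~XML
  <cart id="santhosh">
    <item code="CS001"><price>100</price><qty>2</qty></item>
    <item code="CS002"><price>200</price><qty>5</qty></item>
  </cart>
XML

doc  = REXML::Document.new(xml)
root = doc.root

puts root.attributes['id']            # attribute lookup by name

sumtotal = 0
root.elements.each('item') do |item|  # iterate over the <item> children
  price = item.elements['price'].text.to_i
  qty   = item.elements['qty'].text.to_i
  puts "#{item.attributes['code']}: #{price * qty}"
  sumtotal += price * qty
end

puts "Sum total: #{sumtotal}"
```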

This example shows how easy it is to parse XML files using REXML and Ruby. It is just as easy to generate XML on the fly and to add or delete elements and their attributes.
Listing 11. processing XML files: generating XML files

require 'rexml/document'

include REXML

doc = Document.new

doc.add_element("cart1", {"id" => "user2"})

# the cart element just added is the document root
cart = doc.root

item = Element.new("item")

item.add_element("price")

item.elements["price"].text = "100"

item.add_element("qty")

item.elements["qty"].text = "4"

cart.elements << item

The code in listing 11 creates an XML structure by creating a cart element, an item element, and its child elements, and then fills these child elements with values and adds them to the Document root.

Similarly, to delete elements and attributes, use the delete_element and delete_attribute methods of the Element object.
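A minimal sketch of both deletions, using a small made-up cart document similar to the one built in Listing 11:

```ruby
require 'rexml/document'

# Hypothetical document resembling the one generated in Listing 11.
doc = REXML::Document.new(
  '<cart id="user2"><item code="A"><price>100</price><qty>4</qty></item></cart>'
)
item = doc.root.elements['item']

item.delete_element('qty')        # remove the <qty> child
item.delete_attribute('code')     # remove the code attribute

puts doc.to_s
```

After the two calls, the item element keeps only its price child.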

The approach used in the preceding examples is called tree parsing. Another way to parse XML documents is stream parsing. Stream parsing is faster than tree parsing and can be used when fast parsing is required. Stream parsing is event based and uses listeners: when the parser encounters a tag, it calls the listener, which performs the processing.

Listing 12 shows an example:
Listing 12. processing XML files: stream Parsing

require 'rexml/document'

require 'rexml/streamlistener'

include REXML

class Listener

 include StreamListener

 def tag_start(name, attributes)

 puts "Start #{name}"

 end

 def tag_end(name)

 puts "End #{name}"

 end

end

listener = Listener.new

parser = Parsers::StreamParser.new(File.new("shoppingcart.xml"), listener)

parser.parse

Listing 13 shows the output:
Listing 13. processing XML files: stream parsing output

Start cart

Start item

Start price

End price

Start qty

End qty

End item

Start item

Start price

End price

Start qty

End qty

End item

Start item

Start price

End price

Start qty

End qty

End item

Start item

Start price

End price

Start qty

End qty

End item

End cart

In this way, the combination of REXML and Ruby provides a very effective and intuitive way to process and manipulate XML data.
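A listener can do more than print tag names. The sketch below collects the text of every price element through the text callback of StreamListener; the class name PriceListener and the inline XML are assumptions made for illustration, and Document.parse_stream is used as a shortcut for creating the stream parser.

```ruby
require 'rexml/document'
require 'rexml/streamlistener'

# Collects the text of every <price> element while the document streams by.
class PriceListener
  include REXML::StreamListener

  attr_reader :prices

  def initialize
    @prices = []
    @in_price = false
  end

  def tag_start(name, _attributes)
    @in_price = (name == 'price')   # remember whether we are inside <price>
  end

  def text(data)
    @prices << data.strip.to_i if @in_price
  end

  def tag_end(_name)
    @in_price = false
  end
end

# Hypothetical cart data, written without whitespace between tags.
xml = '<cart id="u"><item code="A"><price>100</price><qty>2</qty></item>' \
      '<item code="B"><price>200</price><qty>5</qty></item></cart>'

listener = PriceListener.new
REXML::Document.parse_stream(xml, listener)
p listener.prices
```

Because no tree is built, the listener has to keep track of its own position in the document, here with the @in_price flag.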

Conclusion

Ruby has a good set of built-in and external libraries that support fast, powerful, and effective text processing. You can use them to simplify and improve the various text processing tasks you may encounter. This article is only a brief introduction to Ruby's text processing features; there is much more to explore.

Without a doubt, Ruby is a powerful tool to have at hand.

