Tutorial on using Ruby to process text

Source: Internet
Author: User


Like Perl and Python, Ruby is a capable text-processing language with strong built-in support for strings and common data formats. This article gives a brief introduction to Ruby's text-processing capabilities and shows how to use Ruby to process text data in different formats, such as CSV and XML.
Common abbreviations


    • CSV: comma-separated values
    • REXML: Ruby Electric XML
    • XML: Extensible Markup Language


Ruby strings


A String in Ruby holds text data and provides ways to compare and manipulate it. String is a class; you create an instance by calling String::new (usually written String.new) or by assigning a string literal to a variable.



When assigning values to strings, you can surround the value with single quotes (') or double quotes ("). The two behave differently. Double-quoted strings support escape sequences, which begin with a backslash (\), and the #{} construct, which evaluates an expression inside the string. Single-quoted strings are taken almost literally: only \' and \\ are recognized as escapes.



Listing 1 is an example.
Listing 1. Working with Ruby strings: defining strings


message = 'Heal the world ...'
puts message
message1 = "Take home Rs #{100*3/2}"
puts message1

Output:

# ./string1.rb
# Heal the world ...
# Take home Rs 150


Here, the first string is defined with a pair of single quotes, and the second string is defined with a pair of double quotes. In the second string, the expression in #{} is evaluated before it is displayed.
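
Listing 1 shows interpolation only. As a small sketch of how the two quote styles also differ for escape sequences, the following puts the same characters in both kinds of quotes. The method name quote_demo and the sample text are illustrative and not part of the original listings.

```ruby
# Single quotes keep the text as typed; double quotes process
# escape sequences and #{} interpolation.
def quote_demo(amount)
  single = 'Total:\t#{amount}'   # backslash-t and #{...} stay as typed
  double = "Total:\t#{amount}"   # \t becomes a tab, #{amount} is evaluated
  [single, double]
end

single, double = quote_demo(150)
puts single
puts double
puts single.length   # counts every typed character
puts double.length   # the tab counts as one character
```

The single-quoted version is longer because the backslash, the t, and the whole #{amount} text are stored as ordinary characters.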



Another way to define a string, the here document, is commonly used for multiline strings.



From now on, the examples use the interactive Ruby console, irb, and show its prompt as irb>>. The console is normally installed together with Ruby. If it is missing, install the irb gem. The console is a very useful tool for learning about Ruby and its modules. After installation, start it with the irb command.
Listing 2. Working with Ruby strings: defining multiline strings


irb>> str = <<EOF
irb>> hello, world
irb>> how do you feel?
irb>> how r u?
irb>> EOF
"hello, world\nhow do you feel?\nhow r u?\n"
irb>> puts str
hello, world
how do you feel?
how r u?


In Listing 2, everything between <<EOF and EOF is treated as part of the string, including the \n (newline) characters.
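
In a script, a here document can also contain #{} interpolation, just like a double-quoted string. The sketch below uses the <<~ form, which is available in Ruby 2.3 and later and strips the common leading indentation. The method name greeting is hypothetical, and <<~ is a variation on the plain <<EOF form shown in Listing 2.

```ruby
# A here document in a script. The <<~ form (Ruby 2.3 and later) strips
# the common leading indentation, so the text can be indented with the code.
def greeting(name)
  <<~EOF
    hello, #{name}
    how do you feel?
    how r u?
  EOF
end

text = greeting("world")
puts text
puts text.lines.length   # number of lines held in the string
```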



The Ruby String class has a powerful set of methods for manipulating the data stored in a string. The examples in Listings 3, 4, and 5 show some of these methods.
Listing 3. Working with Ruby strings: concatenating strings


irb>> str = "The world for a horse"   # String initialized with a value
The world for a horse
irb>> str * 2       # Multiplying with an integer returns a
                    # new string containing that many copies
                    # of the old string.
The world for a horseThe world for a horse
irb>> str + " Who said it?"   # Concatenation of strings using the '+' operator
The world for a horse Who said it?
irb>> str << " is it?"   # Concatenation using the '<<' operator
The world for a horse is it?


Extract substrings and manipulate multiple parts of a string
Listing 4. Working with Ruby strings: extracting and manipulating


irb>> str = "The world for a horse"   # Start again from the original string
The world for a horse
irb>> str[0]   # The '[]' operator can be used to extract substrings, just
               # like accessing entries in an array.
               # The index starts from 0.
T              # A single index returns the one-character string at that
               # position. (Ruby 1.8 returned the ASCII value, 84;
               # str[0].ord gives it in Ruby 1.9 and later.)
irb>> str[0,5]   # A range can be specified as a pair. The first is the starting
                 # index, the second is the length of the substring from the
                 # starting index.
The w
irb>> str[16,5] = "Ferrari"   # The same '[]' operator can be used
                              # to replace substrings in a string
                              # by using the assignment form '[]='
irb>> str
The world for a Ferrari
irb>> str[10..22]   # The range can also be specified using [x1..x2]
for a Ferrari
irb>> str[" Ferrari"] = " horse"   # A substring can be specified to be replaced by a new
                                   # string. Ruby strings adjust their size
                                   # to fit the replacement string.
irb>> str
The world for a horse
irb>> str.split   # split divides the string on the given delimiter
                  # (whitespace by default) and returns an array of strings.
["The", "world", "for", "a", "horse"]
irb>> str.each_line(' ') { |word| p word.chomp(' ') }
                  # each_line processes the string block by block,
                  # splitting it on a record separator (String#each in Ruby 1.8).
                  # Here, chomp(' ') cuts off the trailing space.
"The"
"world"
"for"
"a"
"horse"


The String class offers many other practical methods: you can change case, get the string length, remove record separators, scan strings, apply a one-way hash (crypt), and so on. Another useful method is freeze, which makes a string unmodifiable. After it is called on a string str (str.freeze), any attempt to modify str raises an error.
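
The following sketch tries a few of the methods mentioned above and then shows what happens when a frozen string is modified. The method name describe and the sample text are illustrative.

```ruby
# A few of the String methods mentioned above, gathered in one hash.
def describe(text)
  {
    length:  text.length,       # number of characters
    swapped: text.swapcase,     # change case
    chomped: text.chomp,        # remove a trailing record separator
    words:   text.scan(/\w+/)   # scan for all runs of word characters
  }
end

p describe("The world for a horse\n")

str = "The world for a horse".freeze
begin
  str << " is it?"                   # any in-place change now fails
rescue => e
  puts "Cannot modify: #{e.class}"   # FrozenError on Ruby 2.5+, RuntimeError on older 1.9+/2.x
end
puts str.frozen?
```

Methods that return a new string, such as upcase or +, still work on a frozen string because they do not change the original.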



Ruby also has so-called "destructive" methods. A method whose name ends with an exclamation point (!) modifies the string it is called on. The ordinary method (with no exclamation point) leaves the original untouched and returns a modified copy.
Listing 5. Working with Ruby strings: permanently modifying strings


irb>> str = "hello, world"
hello, world
irb>> str.upcase
HELLO, WORLD
irb>> str           # str remains as is.
hello, world
irb>> str.upcase!   # Here, str gets modified by the '!' at the end of upcase.
HELLO, WORLD
irb>> str
HELLO, WORLD


In Listing 5, the upcase! method modifies the string in str itself, whereas the upcase method only returns an uppercase copy and leaves str unchanged. These ! methods are useful when you want to change a string in place instead of creating a new one.
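
One detail worth knowing when using the ! methods: upcase! returns nil when the string is already uppercase, so its return value should not be relied on as the result. A minimal sketch, with hypothetical method names shout_copy and shout_in_place:

```ruby
# upcase returns a modified copy; upcase! changes the receiver itself.
def shout_copy(text)
  text.upcase          # text is left untouched
end

def shout_in_place(text)
  text.upcase!         # returns nil if nothing changed...
  text                 # ...so return the string explicitly
end

greeting = String.new("hello, world")   # String.new gives an unfrozen string
puts shout_copy(greeting)
puts greeting                            # still lowercase
shout_in_place(greeting)
puts greeting                            # now uppercase
p greeting.upcase!                       # nothing left to change
```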



Ruby strings are very powerful. Once data has been captured into strings, you can process it easily and efficiently in any number of ways.



Working with CSV files



A CSV file is a very common way of representing tabular data. It is typically used as a format for data exported from a spreadsheet, such as a contact list with its details.



Ruby has a powerful library for processing these files. The csv library that ships with Ruby handles CSV files and provides the means to create, read, and parse them.



Listing 6 shows how to create a CSV file and use the Ruby CSV module to parse the file.
Listing 6. Working with CSV files: Creating and parsing a CSV file


require 'csv'

writer = CSV.open('mycsvfile.csv', 'w')

begin
  print "Enter contact name: "
  name = STDIN.gets.chomp
  print "Enter contact no: "
  num = STDIN.gets.chomp
  s = name + " " + num
  row1 = s.split
  writer << row1
  print "Do you want to add more? (y/n): "
  ans = STDIN.gets.chomp
end while ans != "n"

writer.close

file = File.new('mycsvfile.csv')
lines = file.readlines
# Ruby 1.8 joined the array with lines.to_s; Ruby 1.9 and later need join.
parsed = CSV.parse(lines.join)
p parsed

puts ""
puts "Details of contacts stored are as follows ..."
puts ""
puts "-------------------------------"
puts "Contact name | Contact no"
puts "-------------------------------"
puts ""

# Ruby 1.8 iterated rows with CSV.open(..., 'r'); CSV.foreach does so today.
CSV.foreach('mycsvfile.csv') do |row|
  puts row[0] + " | " + row[1]
  puts ""
end


Listing 7 shows the output:
Listing 7. Working with CSV files: Output of creating and parsing a CSV file


Enter contact name: Santhosh
Enter contact no: 989898
Do you want to add more? (y/n): y
Enter contact name: Sandy
Enter contact no: 98988
Do you want to add more? (y/n): n
[["Santhosh", "989898"], ["Sandy", "98988"]]

Details of contacts stored are as follows ...

-------------------------------
Contact name | Contact no
-------------------------------

Santhosh | 989898

Sandy | 98988


Let's take a quick look at this example.



First, load the CSV library (require 'csv').

To create the new CSV file mycsvfile.csv, open it with a CSV.open() call in write mode ('w'). This returns a writer object.



This example creates a CSV file that contains a simple list of contacts that stores the contact's name and its phone number. In the loop, the user is asked to enter the contact name and phone number. The name and phone number are concatenated into a string and then split into an array of two strings. This array is passed to the writer object to write to the CSV file. In this way, a pair of CSV values is stored as a row in the file.



After the loop is over, the task is completed. Now close the writer and the data in the file is saved.
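
As an alternative to closing the writer by hand, CSV.open also accepts a block and closes the file when the block ends. The following is a minimal sketch under that approach; the method name write_contacts, the file name contacts.csv, and the fixed contact list (standing in for user input) are assumptions made for illustration.

```ruby
require 'csv'
require 'tmpdir'

# Write rows with the block form of CSV.open, which closes the file
# when the block ends. The contacts array stands in for user input.
def write_contacts(path, contacts)
  CSV.open(path, 'w') do |writer|
    contacts.each { |name, number| writer << [name, number] }
  end
end

Dir.mktmpdir do |dir|
  path = File.join(dir, 'contacts.csv')   # hypothetical file name
  write_contacts(path, [['Santhosh', '989898'], ['Sandy', '98988']])
  puts File.read(path)                    # one comma-separated line per contact
end
```

Passing each row as an array, rather than concatenating and splitting a string, also keeps names that contain spaces in a single field.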



The next step is to parse the CSV file that was created.

One way to open and parse the file is to create a new File object with the name of the new CSV file.

Call the readlines method to read all the lines in the file into an array named lines.



Convert the lines array to a single string and pass it to the CSV.parse method, which parses the CSV data and returns its contents as an array of arrays, one inner array per row. In Ruby 1.8, lines.to_s joined the array elements; in Ruby 1.9 and later, use lines.join instead.
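
To see the array-of-arrays result without touching a file, CSV.parse can be called directly on a string. This is a small sketch; the method name parse_contacts and the sample data are illustrative.

```ruby
require 'csv'

# CSV.parse turns CSV text into an array of rows, each row an array of fields.
def parse_contacts(csv_text)
  CSV.parse(csv_text)
end

rows = parse_contacts("Santhosh,989898\nSandy,98988\n")
p rows
rows.each { |name, number| puts "#{name} | #{number}" }

# A quoted field may contain the delimiter itself.
p parse_contacts(%("Doe, Jane",12345\n))
```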



Here's another way to open and parse the file: open it again in read mode and process it one row at a time. In Ruby 1.8, CSV.open in 'r' mode with a block yielded each row; in current versions, CSV.foreach does this. Each row corresponds to a line in the file and is printed in a format that displays the contact details.
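
The row-at-a-time approach can be sketched on its own with CSV.foreach. The method name format_contacts and the temporary file used here are assumptions for illustration only.

```ruby
require 'csv'
require 'tempfile'

# Read a CSV file one row at a time with CSV.foreach and format each row.
def format_contacts(path)
  lines = []
  CSV.foreach(path) { |row| lines << "#{row[0]} | #{row[1]}" }
  lines
end

Tempfile.create(['contacts', '.csv']) do |file|
  file.write("Santhosh,989898\nSandy,98988\n")
  file.flush                          # make sure the data is on disk before reading
  puts format_contacts(file.path)
end
```

Because CSV.foreach yields one row at a time, it does not need to hold the whole file in memory, which helps with large files.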



As you can see, Ruby provides a powerful module for processing CSV files and data.



Working with XML files



For XML files, Ruby provides a powerful library called REXML, which ships with Ruby (as a bundled gem since Ruby 3.0). This library can be used to read and parse XML documents.



Consider the following XML file, which the next listings parse with Ruby and REXML.

The file is simple: it lists the contents of a typical shopping cart in an online shopping center. It has the following elements:


    • cart--the root element
    • user--the user making the purchase (the id attribute of cart in Listing 8)
    • item--an item that the user added to the shopping cart
    • id, price, and quantity--the details of each item (the code attribute and the price and qty child elements in Listing 8)


Listing 8 shows the structure of this XML:
Listing 8. Working with XML files: Sample XML file


<cart id="userid">
  <item code="item-id">
    <price>price-per-unit</price>
    <qty>number-of-units</qty>
  </item>
</cart>


Get this sample XML file from the download section of the original article, or create one with this structure, and save it as shoppingcart.xml. Now, load the XML file and parse the file tree using REXML.
Listing 9. Working with XML files: Parsing XML files


require 'rexml/document'
include REXML

file = File.new('shoppingcart.xml')
doc = Document.new(file)
root = doc.root

puts ""
puts "Hello, #{root.attributes['id']}, Find below the bill generated for your purchase..."
puts ""

sumtotal = 0

puts "-----------------------------------------------------------------------"
puts "Item\t\tQuantity\t\tPrice/unit\t\tTotal"
puts "-----------------------------------------------------------------------"

root.each_element('//item') { |item|
  code = item.attributes['code']
  qty = item.elements["qty"].text.strip       # strip surrounding whitespace
  price = item.elements["price"].text.strip
  total = price.to_i * qty.to_i
  puts "#{code}\t\t #{qty}\t\t   #{price}\t\t   #{total}"
  puts ""
  sumtotal += total
}

puts "-----------------------------------------------------------------------"
puts "\t\t\t\t\t\t  Sum total : " + sumtotal.to_s
puts "-----------------------------------------------------------------------"


Listing 10 shows the output.
Listing 10. Working with XML files: Output of parsing the XML file


Hello, Santhosh, Find below the bill generated for your purchase...

-------------------------------------------------------------------------
Item     Quantity    Price/unit    Total
-------------------------------------------------------------------------
CS001    2           100           200

CS002    5           200           1000

CS003    3           500           1500

CS004    5           150           750

-------------------------------------------------------------------------
                                   Sum total : 3450
-------------------------------------------------------------------------


Listing 9 parses the shopping cart XML file and generates a bill that shows the total for each item and the total for the whole purchase (see Listing 10).

The process works as follows.

First, load Ruby's REXML module, which provides the methods for parsing XML files.

Open the shoppingcart.xml file and create a Document object from it; this object contains the parsed XML file.

Assign the root of the document to the Element object root. This points to the cart tag in the XML file.



Each Element object has an attributes object that behaves like a hash of the element's attributes, with the attribute name as the key and the attribute value as the value. Here, root.attributes['id'] provides the value of the id attribute of the root element (in this case, the user ID).
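
The attribute lookup can be tried on a small in-memory document. In this sketch the method name root_attribute and the sample XML string are illustrative.

```ruby
require 'rexml/document'

# Read an attribute of the root element through its attributes object.
def root_attribute(xml, name)
  doc = REXML::Document.new(xml)
  doc.root.attributes[name]       # nil when the attribute is absent
end

xml = '<cart id="Santhosh"><item code="CS001"/></cart>'
puts root_attribute(xml, 'id')
p root_attribute(xml, 'missing')
```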



Next, initialize sumtotal to 0 and print the headers.

Each Element object also has an elements object, which provides the each and [] methods for accessing child elements. The code traverses all elements named item under the root element (specified through the XPath expression //item). Each element also has a text attribute that holds the text value of the element.
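
A minimal sketch of the elements object in isolation: each walks the matching child elements, [] picks a child by name, and text returns its content. The method name item_lines and the sample cart, including its prices and quantities, are assumptions for illustration.

```ruby
require 'rexml/document'

# List every <item> child of the root with its quantity and unit price.
def item_lines(xml)
  doc = REXML::Document.new(xml)
  lines = []
  doc.root.elements.each('item') do |item|        # each child named item
    qty   = item.elements['qty'].text.strip       # [] selects a child by name
    price = item.elements['price'].text.strip     # text is the element's content
    lines << "#{item.attributes['code']}: #{qty} x #{price}"
  end
  lines
end

xml = <<~XML
  <cart id="Santhosh">
    <item code="CS001"><price>100</price><qty>2</qty></item>
    <item code="CS002"><price>200</price><qty>5</qty></item>
  </cart>
XML

puts item_lines(xml)
```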



Next, get the code attribute of the item element and the text values of the price and qty elements, and then calculate the total for the item. Print the details to the bill and add the item total to the purchase total (sumtotal).

Finally, print the purchase total.



This example shows how easy it is to parse XML files using REXML and Ruby! It is just as easy to generate XML at run time and to add and delete elements and their attributes.
Listing 11. Working with XML files: Generating XML files


require 'rexml/document'
include REXML

doc = Document.new
doc.add_element("cart1", {"id" => "user2"})
cart = doc.root                       # the cart1 element just added
item = Element.new("item")
item.add_element("price")
item.elements["price"].text = "100"   # illustrative price value
item.add_element("qty")
item.elements["qty"].text = "4"
cart.elements << item


The code in Listing 11 creates an XML structure by creating a cart element, an item element, and its child elements, and then populates the child elements with values and adds them to the Document root.
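
The same kind of structure can be built a little more compactly by using the element that add_element returns, and then serialized with to_s. This is a sketch only; the method name build_cart and the sample values are hypothetical.

```ruby
require 'rexml/document'

# Build a cart document; add_element returns the element it just created.
def build_cart(user_id, code, price, qty)
  doc  = REXML::Document.new
  cart = doc.add_element('cart', { 'id' => user_id })
  item = cart.add_element('item', { 'code' => code })
  item.add_element('price').text = price.to_s
  item.add_element('qty').text   = qty.to_s
  doc
end

doc = build_cart('user2', 'CS005', 100, 4)   # sample values for illustration
puts doc.to_s                                # the document as XML text
```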



Similarly, to delete elements and attributes, use the delete_element and delete_attribute methods of an Element object.
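
A short sketch of both deletions on an in-memory document follows. The method name strip_item and the sample XML are illustrative; delete_element is given an XPath string here to select the child to remove.

```ruby
require 'rexml/document'

# Remove one <item> (selected by its code attribute) and the cart's id attribute.
def strip_item(xml, code)
  doc  = REXML::Document.new(xml)
  cart = doc.root
  cart.delete_element("item[@code='#{code}']")   # first child matching the XPath
  cart.delete_attribute('id')
  doc
end

xml = '<cart id="user2"><item code="CS001"/><item code="CS002"/></cart>'
doc = strip_item(xml, 'CS001')
puts doc.to_s
puts doc.root.elements.size   # number of child elements left
```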



The method used in the previous examples is called tree parsing. Another way to parse XML documents is stream parsing. Stream parsing is faster than tree parsing and suits cases that need fast parsing. It is event-based and uses listeners: when the parser encounters a token in the stream, it invokes the listener, which does the processing.



Listing 12 shows an example:
Listing 12. Working with XML files: Stream parsing


require 'rexml/document'
require 'rexml/streamlistener'
include REXML

class Listener
  include StreamListener

  # Called for every opening tag
  def tag_start(name, attributes)
    puts "Start #{name}"
  end

  # Called for every closing tag
  def tag_end(name)
    puts "End #{name}"
  end
end

listener = Listener.new
parser = Parsers::StreamParser.new(File.new("shoppingcart.xml"), listener)
parser.parse


Listing 13 shows the output:
Listing 13. Working with XML files: Stream parsing output


Start cart
Start item
Start price
End price
Start qty
End qty
End item
Start item
Start price
End price
Start qty
End qty
End item
Start item
Start price
End price
Start qty
End qty
End item
Start item
Start price
End price
Start qty
End qty
End item
End cart


In this way, REXML and Ruby together give you a powerful way to process and manipulate XML data efficiently and intuitively.
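
A listener can do more than print tag names. The sketch below uses the text callback of StreamListener to add up the purchase total while the document streams past, without building a tree. The class name TotalListener, the method name stream_total, and the sample XML are assumptions for illustration, and the code assumes that price appears before qty inside each item, as in Listing 8.

```ruby
require 'rexml/document'
require 'rexml/streamlistener'

# Adds up price * qty for every item as the parser streams through the XML.
class TotalListener
  include REXML::StreamListener
  attr_reader :total

  def initialize
    @total   = 0
    @current = nil    # name of the element whose text is being read
    @price   = 0
  end

  def tag_start(name, _attributes)
    @current = name
  end

  def text(data)
    case @current
    when 'price' then @price = data.to_i            # remember the unit price
    when 'qty'   then @total += @price * data.to_i  # assumes price came first
    end
  end

  def tag_end(_name)
    @current = nil    # ignore whitespace between elements
  end
end

def stream_total(xml)
  listener = TotalListener.new
  REXML::Document.parse_stream(xml, listener)
  listener.total
end

xml = '<cart id="Santhosh">' \
      '<item code="CS001"><price>100</price><qty>2</qty></item>' \
      '<item code="CS002"><price>200</price><qty>5</qty></item>' \
      '</cart>'
puts stream_total(xml)
```

The listener keeps only a running total and the last price seen, so memory use stays small no matter how many items the cart holds.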



Conclusion



Ruby has a good set of built-in and external libraries that support fast, powerful, and efficient text processing. You can use them to simplify and improve the various text-processing tasks that you might encounter. This article is only a brief introduction to Ruby's text-processing capabilities; there is much more to explore.

Without a doubt, Ruby is a powerful tool to have at hand.

