From a String
From a File
From the Internet
Parse Options
Encoding
Original: Parsing an HTML/XML Document
Parsing an HTML/XML document: reading from a string
We've tried to make this easy on you. Really! We're here to make your life easier.
html_doc = Nokogiri::HTML("")
xml_doc = Nokogiri::XML(" ")
The variables html_doc and xml_doc are Nokogiri
Disclaimer: every time we've run a piece about benchmarking or performance numbers on Ruby Inside, a retraction or significant correction has come out shortly thereafter. Benchmarking is hard, ugly, and quite often wrong or biased. It is not useless, however, but if you depend on the results in any way, you should certainly do your own benchmarking to confirm them. Last week, libxml-ruby 1 was released, a significant achievement since it had been under development for seven years. I susp
First, the basic syntax
1. Get a Nokogiri object directly from a string:
html_doc = Nokogiri::HTML("
html_doc and xml_doc here are Nokogiri documents.
2. A Nokogiri object can also be obtained from a file handle:
f = File.open("Blossom.xml")
doc = Nokogiri
# Define the HTML URL
url = "http://mp3.baidu.com/"
# Get the Nokogiri document
doc = Nokogiri::HTML(open(url))
On CentOS, the statement above could not be parsed, no matter how many ways of writing it I tried.
Later I found that it was a Nokogiri installation problem. On CentOS 5.5, even the latest available version of libxml2 is still too old. CentOS is easy to use and stable
Installation
On Ubuntu, you need to install two components, libxml2 and libxslt:
$ apt-get install libxml2 libxslt
Then you can:
$ gem install nokogiri
Available options
Nokogiri provides some options for parsing documents. The commonly used ones are:
NOBLANKS: Remove blank nodes
NOENT: Substitute entities
NOERROR: Suppress error reports
STRICT: Strict parsing; raises an error when the parser encounters an exception in the file
and if that doesn't help, file an issue with the output of 'brew --config':
https://github.com/mxcl/homebrew/issues
Thanks!
Executing bundle...
Gem::Ext::BuildError: ERROR: Failed to build gem native extension.
/Users/wuxj/.rvm/rubies/ruby-2.1.2/bin/ruby -r ./siteconf20140830-1163-1hd6znq.rb extconf.rb
Building nokogiri using packaged libraries.
-----
libiconv is missing. Please visit http://nokogiri.org/tutorials/installing_nokogiri.html to help with
... yes
Building nokogiri using packaged libraries.
checking for gzdopen() in -lz... yes
checking for iconv... yes
************************************************************************
IMPORTANT NOTICE:
Building Nokogiri with a packaged version of libxml2-2.9.2
with the following patches applied:
- 0001-revert-missing-initialization-for-the-catalog-module.patch
- 0002-fix-missing-entities-after-cve-2014-3660-f
a variety of crawl solutions. Some of them convert HTML to other formats, such as JSON, which makes it easier to extract what you want. Other solutions read the HTML and let you define content as a function of the HTML hierarchy in which the data is tagged. One such solution is Nokogiri, which supports parsing HTML and XML documents using the Ruby language. Other open source crawlers include pjscrape for JavaScript and Beautiful Soup for Python. Pjscrape impleme
prompts to install the required components.
gem install rack-cache -v '1.2'
Then run bundle install again, and an error is returned:
An error occurred while installing nokogiri (1.6.6.2), and Bundler cannot continue. Make sure that 'gem install nokogiri -v '1.6.6.2'' succeeds before bundling.
Following the prompt, run gem install nokogiri -v '1.6.6.2' and then run bundle ins
Over the past two days I needed to export the words in my Youdao word book to a local file. Youdao itself provides such a feature, but it is very limited: it can only export each word's spelling, phonetic transcription, and Chinese meaning, which is not convenient for learning and memorizing words. So I wrote this crawler in Ruby. The three most important problems to solve along the way were:
URL redirection
Simulated login to Youdao Dictionary
Cookie management during HTTP access
HTTP redirection
We open to the first page of the diction
search to use open-uri and Nokogiri.
First look at open-uri, which ships with Ruby. To use it you only need to add require 'open-uri' to your code, and it is easy to use.
open("http://www.ruby-lang.org/en") {|f|
  f.each_line {|line| p line}
  p f.base_uri
  p f.content_type      # "text/html"
  p f.charset           # "iso-8859-1"
  p f.content_encoding  # []
  p f.last_modified     # Thu Dec 02:45:02 UTC 2002
}
The open function provides a file object that has been extended to include meta information for some W
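The meta information described above can be collected in one place, as in the sketch below. It assumes Ruby 2.5 or later, where URI.open is the entry point (the snippet above uses the older bare open), and page_metadata is a hypothetical helper name introduced here for illustration, not part of open-uri.

```ruby
require "open-uri"

# Hypothetical helper: fetch a URL and return the meta information
# that open-uri attaches to the opened object.
def page_metadata(url)
  URI.open(url) do |f|
    {
      base_uri:         f.base_uri,         # final URI after any redirects
      content_type:     f.content_type,     # media type, e.g. "text/html"
      charset:          f.charset,          # charset from Content-Type, if any
      content_encoding: f.content_encoding, # array of content encodings
      last_modified:    f.last_modified,    # Time, or nil if the header is absent
      first_line:       f.each_line.first   # first line of the body
    }
  end
end

# Usage (requires network access, so it is left commented out):
# meta = page_metadata("http://www.ruby-lang.org/en")
# puts meta[:content_type]
```

Returning a plain hash keeps the network call in one method, so the rest of a crawler can be exercised without touching the network.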
ERROR: Error: ERROR installing rails, invalid turails
Building native extensions. This could take a while...
ERROR: Error installing rails:
ERROR: Failed to build gem native extension.
current directory: /var/lib/gems/2.3.0/gems/nokogiri-1.8.2/ext/nokogiri
/usr/bin/ruby2.3 -r ./siteconf20180213-11055-1aanqyp.rb extconf.rb
checking if the C compiler accepts ... yes
Building
Building native extensions.
This could take a while...
ERROR: Error installing rails:
ERROR: Failed to build gem native extension.
current directory: /var/lib/gems/2.3.0/gems/nokogiri-1.8.2/ext/nokogiri
/usr/bin/ruby2.3 -r ./siteconf20180213-11055-1aanqyp.rb extconf.rb
checking if the C compiler accepts ... yes
Building nokogiri using packaged libraries.
Usin
Summary of Installing the redmine_dmsf Plugin
Background
Environment
Linux system version
Redmine version
DMSF plugin Version
Installation process
Installing Redmine
Installing the DMSF Plugin
Using full-text search in Redmine
Principles behind DMSF
Basic knowledge of Xapian and Omega
Create/Query Index database manually
DMSF Configuration Modifications
Using full-text se
# Purpose of this program: download images from a web page to local disk and save them by number.
# Written with Ruby 1.9.3 on WinXP SP3.
require 'nokogiri'
require 'open-uri'
# Parse the page at the given URL.
page = Nokogiri::HTML(open("http://www.169bb.com/News/2014-12-20/093288.htm"))
arrlen = page.css('img').length
mypics = Array.new(arrlen)
# Store the parsed image addresses in the mypics array.
for x in 0...arrlen
  mypics[x] =
, configuration settings, log files, Internet messaging, and filtering. The supported scripting languages are listed on the website and include Python, Ruby, Java, PHP, Perl, and JavaScript.
require 'yaml'
ps2 = YAML.load_file('example.yaml')
ps2.each do |it|
  puts it.inspect
end
JSON
ActiveSupport::JSON is available in Rails. It is used as follows:
ActiveSupport::JSON.encode([{:a => 1, :b => 2}, "c", "d"])
=> "[{\"a\":1,\"b\":2},\"c\",\"d\"]"
ActiveSupport::JSON.decode("[{\"a\":1,\"b\":2},\"c\",\"d\"]"
This article shares some code I wrote in Ruby for crawling the images on a web page. It is simple and practical, and readers who need it can use it as a reference.
Some time ago I saw a lot of people writing image-download scripts, so I also wrote a
module CommonHelper
  require 'nokogiri'
  require 'open-uri'
  def down_load_xmz
    site_url = 'http://www.xxx.com