In order to adapt different items to different interested attribute's analytic storage, the data storage structure adopts the vertical attribute list method, that is, a URL page multiple attributes store multiple records, and corresponding storage according to text,html,data,num several typical types.
Create a Nutch database of the UTF-8 character set and execute the table initialization script, referencing the DDL:
CREATE TABLE ' Crawl_data ' (
' URL ' varchar (255) Not NULL,
' Code ' varchar (255) is not NULL,
' Name ' varchar (255) DEFAULT NULL,
' Category ' varchar (255) DEFAULT NULL,
' Order_index ' int (255) is DEFAULT NULL,
' Fetch_time ' datetime not NULL,
' Text_value ' text, ' Html_value ' text,
' Date_value ' datetime DEFAULT NULL,
' Num_value ' decimal (18,2) DEFAULT NULL
) Engine=innodb DEFAULT Charset=utf8;
650) this.width=650; "src=" Http://img.mp.itc.cn/upload/20160706/e4d62fb80ced45759643da9ee15a08f3_th.jpg "style=" border:0px;margin:0px;padding:0px;font-size:0px; "/>
In order to generally get multiple properties of the same URL in the business system or a friendly query display, refer to the following MySQL script implementation to convert the Vertical attribute List property to landscape Multi-column display mode. The relevant reference scripts on the Internet are mostly based on the vertical-to-horizontal SQL script of the pattern of sum summarization of numeric fields, the requirement of this project is a single string-based information conversion, after a period of groping and finally find based on Group_concat can be achieved:
SELECT URL, Fetch_time,
Group_concat (case if code = ' domain ' then text_value ELSE null END) as ' domain ',
Group_concat (case if code = ' name ' then text_value ELSE null END) as ' name ',
Group_concat (case if code = ' Brand ' then text_value ELSE null END) as ' brand ',
Group_concat (case if code = ' category ' then Text_value ELSE null END) as ' category ',
Group_concat (case if code = ' Purpose ' then text_value ELSE null END) as ' purpose ',
Group_concat (case when code = "Price" then num_value ELSE null END) as ' price ',
Group_concat (case if code = ' Refprice ' then num_value ELSE null END) as ' Refprice ',
Group_concat (case if code = ' Primaryimage ' then text_value ELSE null END) as ' primaryimage ' from the Crawl_data GROUP by URL , Fetch_time
650) this.width=650; "src=" Http://img.mp.itc.cn/upload/20160706/1a8a96424466479180c3333a69d7b528_th.jpg "style=" border:0px;margin:0px;padding:0px;font-size:0px; "/>
This article from the "11247808" blog, reproduced please contact the author!
Parse data store MySQL