Hadoop as a diverse data source for Runqian computing reports


Diverse data sources are becoming more and more common in report development, and effective support for collecting and computing over them makes developing such reports very simple. Currently, in addition to traditional relational databases, the data source types supported by the computing report include TXT text, Excel, JSON, HTTP, Hadoop, and MongoDB.

For Hadoop, the computing report can directly access Hive or read data from HDFS to complete data computation and report development. Accessing Hive works the same way as accessing a common database, so we will not go into details here. The following example shows the process of directly accessing HDFS.

Report Description

A text file stored in HDFS records stock trading data (one file per month, named in the pattern stock_record_<month>.txt; see the gzip example later), including the stock code, trading date, and closing price. The report queries the file for a specified month and calculates the average closing price of each stock to analyze the stock price trend. The text content is as follows:

Code    tradingDate          price
120089  2009-01-01 00:00:00  50.24
120123  2009-01-01 00:00:00  10.35
120136  2009-01-01 00:00:00  43.37
120141  2009-01-01 00:00:00  41.86
120170  2009-01-01 00:00:00  194.63

Unlike general report tools, the computing report can directly access HDFS to read and compute the data. The implementation process is as follows.

Copy related jar packages

The Hadoop core and configuration packages, such as commons-configuration-1.6.jar, commons-lang-2.4.jar, and hadoop-core-1.0.4.jar (for Hadoop 1.0.4), need to be loaded when accessing HDFS from the computing report. Copy these jar files to [report installation directory]\report\lib, and also to [esProc installation directory]\esproc\lib if you need to edit and debug the script in the esProc editor.

Write the computing script

Use the esProc editor to write a script (stockFromHdfsTxt.dfx) that reads the HDFS file, filters the data, and returns the result set to the report. To receive the parameters passed from the report, set the script parameters first.

[Figure: script parameter settings (report5_multisource_hadoop_1.jpg)]

Edit the script content:

[Figure: script content (report5_multisource_hadoop_2.jpg)]

A1: uses the hdfsfile function to create a cursor on the HDFS file from the file path and the specified parameter;

A2: groups the cursor by stock code, summing the closing prices and counting the records;

A3: calculates the average closing price of each stock; A4 returns the result set to the report.
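Since the script screenshot is not reproduced here, the following is only a minimal sketch of what the cells might contain. It assumes the hdfsfile/cursor usage shown later in this article, a month parameter named d_date (the name used in the gzip example below), and the Code/price field names from the sample data:

    A1  =hdfsfile("hdfs://192.168.1.210:9000/usr/local/hadoop/data/stock_record_"+d_date+".txt","GBK").cursor@t()
    A2  =A1.groups(Code; sum(price):total, count(price):n)
    A3  =A2.new(Code, total/n:avgPrice)
    A4  result A3

Here @t reads the first line of the file as the field names, and result hands A3 back to the caller; the exact functions and options may differ across esProc versions.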

Edit the report template

Use the report designer to create a report template and set parameters:

[Figure: report template and parameter settings (report5_multisource_hadoop_3.jpg)]

Set up the dataset: choose the esProc dataset type and call the edited script file (stockFromHdfsTxt.dfx).

[Figure: dataset settings (report5_multisource_hadoop_4.jpg)]

The dfx file path can be either an absolute or a relative path; a relative path is resolved against the dfx home directory configured in the options.
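For illustration only (the exact dialog fields depend on the designer version, and the parameter name is an assumption carried over from the script sketch above), the dataset configuration amounts to something like:

    dfx file:       stockFromHdfsTxt.dfx
    dfx parameter:  d_date = <report parameter value>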

 

Edit the report expressions, directly using the result set returned by the esProc script to complete the report.

[Figure: report expressions (report5_multisource_hadoop_5.jpg)]

It is worth noting that to preview the report in the report designer, you need to have copied the Hadoop jar packages to [report installation directory]\report\lib.

In addition to directly accessing HDFS text files, the computing report can also read compressed files in HDFS. The hdfsfile function is still used, and the decompression method is determined by the file extension. For example, to access a Gzip file, write:

=hdfsfile("hdfs://192.168.1.210:9000/usr/local/hadoop/data/stock_record_"+d_date+".gz","GBK")

You only need to include the extension in the URL.
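Assuming the cursor usage sketched earlier, the rest of the script is unchanged: the cursor@t() read and the grouping steps apply to the compressed file exactly as they do to the plain text file.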

        

The above implementation shows that an esProc script can easily read and compute HDFS files. The external esProc script has a visual editing and debugging environment, and the edited script can be reused (called by other reports or programs). However, if the script has been debugged and does not need to be reused, maintaining consistency between two files (the esProc script and the report template) is troublesome. In that case, it is easier to use the script dataset of the computing report directly.

In a script dataset, you can write a script step by step to complete the computing task. The syntax is the same as that of esProc, and you can also directly use the data sources defined in the report (not covered in this example) and the report parameters. Use the script dataset as follows:

1. Click "Add" in the dataset settings window. In the dataset type dialog box that appears, select "script dataset".

2. Write the script in the script dataset editing window that opens (a sketch of such a script follows this list).

[Figure: script dataset editing window (report5_multisource_hadoop_6.jpg)]

The script directly uses the report-defined parameter arg1.

3. Set the report parameters and report expressions in the same way as with the dfx dataset above.
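As a rough sketch of step 2 (same assumptions as the dfx script sketched earlier, with the report parameter arg1 taking the place of d_date; whether the last cell uses result or return may vary by product version):

    A1  =hdfsfile("hdfs://192.168.1.210:9000/usr/local/hadoop/data/stock_record_"+arg1+".txt","GBK").cursor@t()
    A2  =A1.groups(Code; sum(price):total, count(price):n)
    A3  =A2.new(Code, total/n:avgPrice)
    A4  result A3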

 

When deploying the report, you also need to put the Hadoop-related jar files on the application classpath, for example, in the application's WEB-INF\lib directory.
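For example, with the Hadoop 1.0.4 setup described earlier, this means copying commons-configuration-1.6.jar, commons-lang-2.4.jar, and hadoop-core-1.0.4.jar into WEB-INF\lib.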


This article is from the "high performance report data computing" blog. Please be sure to keep the source: http://report5.blog.51cto.com/8028595/1573935
