Hadoop as a report data source

Source: Internet
Author: User

In addition to traditional relational databases, the collection report supports data source types such as txt text files, Excel, JSON, HTTP, Hadoop, MongoDB, and so on.

For Hadoop, the collection report can access Hive directly, and can also read data from HDFS to perform calculations and build reports. Accessing Hive works the same way as accessing an ordinary database over JDBC, so it is not covered here. The following example shows the process of accessing HDFS directly.

Report Description

Stock transaction records are stored as text files in HDFS by month, named Stock_record_yyyymm.txt (for example, Stock_record_200901.txt). Each record includes the stock code, the trading date, and the closing price. The report queries the file for a specified month and computes the average closing price of each stock for trend analysis. The text looks like this:

Code    tradingdate           Price
120089  2009-01-01 00:00:00   50.24
120123  2009-01-01 00:00:00   10.35
120136  2009-01-01 00:00:00   43.37
120141  2009-01-01 00:00:00   41.86
120170  2009-01-01 00:00:00   194.63
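For reference, the monthly file name and a record of this format can be handled in plain Python as follows (an illustrative sketch only, not part of the product; the function names are assumed):

```python
def month_file_name(yyyymm):
    # Build the monthly file name, e.g. "200901" -> "Stock_record_200901.txt"
    return "Stock_record_%s.txt" % yyyymm

def parse_record(line):
    # A record is "code  trading-date  closing-price", whitespace-separated;
    # the first field is the stock code and the last field is the price.
    parts = line.split()
    return {
        "code": parts[0],
        "date": " ".join(parts[1:-1]),
        "price": float(parts[-1]),
    }
```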

Unlike typical reporting tools, the collection report can read and compute over HDFS data directly. The implementation process is as follows.

Copy the required jar packages

You need to load the Hadoop core and configuration packages when accessing HDFS from the collection report, for example: commons-configuration-1.6.jar, commons-lang-2.4.jar, and hadoop-core-1.0.4.jar (for Hadoop 1.0.4). Copy these jars into [collection report installation directory]\report\lib, and into [collector installation directory]\esproc if you need to edit and debug the script in the collector editor.

Write the calculation script

Using the collector editor, write a script (stockFromHdfsTxt.dfx) that reads the HDFS file, filters the data, and returns the result set to the report. Because the script must receive parameters passed from the report, first define the script parameters.


Edit the script:

A1: Use the hdfsfile function to create an HDFS file cursor from the file path and the specified parameters.

A2: Aggregate the closing-price total and the record count by stock code.

A3: Compute the average closing price for each stock; A4 returns the result set to the report.
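The collector's own grid syntax is not reproduced here, but the logic of those steps can be sketched in plain Python, assuming the month's file content has already been fetched from HDFS (hdfs_lines stands in for the hdfsfile cursor; all names are illustrative):

```python
from collections import defaultdict

def average_close_by_code(hdfs_lines):
    # A1: iterate over the file like a cursor, one record at a time.
    # A2: accumulate closing-price totals and record counts per stock code.
    totals, counts = defaultdict(float), defaultdict(int)
    for line in hdfs_lines:
        parts = line.split()
        if not parts:
            continue
        try:
            price = float(parts[-1])
        except ValueError:
            continue  # skip the header line ("Code tradingdate Price")
        totals[parts[0]] += price
        counts[parts[0]] += 1
    # A3/A4: divide totals by counts and return the result set.
    return {code: totals[code] / counts[code] for code in totals}
```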

Edit the report template

Create a new report template in the report designer and set the parameters:

Set up the dataset: use the collector dataset type and invoke the edited script file (stockFromHdfsTxt.dfx).

The dfx file path can be either absolute or relative; a relative path is resolved against the dfx home directory configured in the options.

Edit the report expressions to build the report directly from the result set returned by the collector script.

Note that to preview the report in the report designer, you must also copy the Hadoop-related jars into [collection report installation directory]\report\lib.

In addition to reading plain text files from HDFS, the collection report can read compressed files in HDFS. The hdfsfile function is still used; the file extension determines the decompression method. For example, to access a gzip file you can write:

=hdfsfile("hdfs://192.168.1.210:9000/usr/local/hadoop/data/stock_record_"+d_date+".gz","GBK")

Simply include the extension in the URL.
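The extension-driven decompression that hdfsfile performs is roughly analogous to what Python's standard gzip module does for a local file (a loose analogy only; in the product the file lives on HDFS):

```python
import gzip

def read_gz_lines(path, encoding="GBK"):
    # Open a gzip-compressed text file and yield decoded lines,
    # mirroring hdfsfile's extension-driven decompression.
    with gzip.open(path, "rt", encoding=encoding) as f:
        for line in f:
            yield line.rstrip("\n")
```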

As the above shows, a collector script makes it easy to read and compute over HDFS files. The external script also has a visual editing and debugging environment, and a finished script can be reused (called by other reports or programs). However, if the script is already debugged and does not need to be reused, keeping the two files (the script and the report template) consistent is cumbersome; in that case it is easier to use a script dataset directly in the collection report.

In a script dataset you can write the calculation step by step; the syntax is the same as the collector's, and you can directly use report-defined data sources (not covered in this example) and parameters. To use a script dataset:

1. Click the "Add" button in the dataset settings window; in the dataset type dialog that pops up, select "Script Data Set".

2. Write the script in the script dataset editing window, using the report-defined parameter arg1 directly.

3. Report parameter settings and report expressions are the same as with the collector dataset, so they are not repeated here.

When deploying the report, you also need to put the relevant Hadoop jars on the application's classpath, for example in the web application's WEB-INF\lib.


Collection Report Download: http://www.raqsoft.com.cn/?p=208.

