MaxCompute Studio improves the UDF and MapReduce development experience.



UDF stands for User-Defined Function. MaxCompute provides many built-in functions to meet your computing needs, and you can also create custom functions for computations the built-in functions do not cover. Users can extend MaxCompute with three types of UDFs: User-Defined Scalar Functions (UDF), User-Defined Table-Valued Functions (UDTF), and User-Defined Aggregation Functions (UDAF).

In addition, MaxCompute provides MapReduce programming interfaces: you can use the Java APIs of MaxCompute MapReduce to write MapReduce programs that process data in MaxCompute.

With the end-to-end support provided by MaxCompute Studio, you can quickly get started developing your own UDFs and MapReduce programs and work more efficiently. The following walks through how to use MaxCompute Studio to develop a UDF:

Create a MaxCompute Java Module

First, create a module in IntelliJ IDEA for developing MaxCompute Java programs: choose File | New | Module... and select MaxCompute Java as the module type. Configure the installation paths of the Java JDK and the MaxCompute console, click Next, enter the module name, and click Finish.

The console is configured here for two purposes:

  • Compiling UDFs and MR programs requires the relevant jar files of the MaxCompute framework. These jar files are in the lib directory of the console, and Studio automatically imports them into the module's dependency libraries.

  • Studio integrates with the console, so some actions can be performed conveniently through the console.

At this point, a module for developing MaxCompute Java programs has been created (named jDev in this example). Its main directories are:

  • src (source code directory for user-developed UDF and MR programs)
  • examples (sample code directory, including unit test examples. You can refer to these examples when developing your own programs or writing unit tests)
  • warehouse (schemas and data required for local runs)

Create a UDF

Assume that the UDF we want to implement converts a string to lowercase. (The built-in function TOLOWER already implements this logic; we use this simple requirement only to illustrate how to develop a UDF with Studio.) MaxCompute Studio provides templates for UDF, UDAF, UDTF, Mapper, Reducer, and Driver, so you only need to write your own business logic; the template fills in the framework code automatically.

  • 1. Right-click the src directory and choose New | MaxCompute Java.

  • 2. Enter the class name, for example, myudf.MyLower, select the type (here, UDF), and click OK.

  • 3. The template fills in the framework code automatically. You only need to write the code that converts the string to lowercase.
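As a minimal sketch of step 3, the lowercase logic could look like the following. It assumes the usual MaxCompute Java UDF convention of a class that extends com.aliyun.odps.udf.UDF and exposes a public evaluate method; the base class is left out so the snippet compiles without the MaxCompute SDK, and the code Studio's template actually generates may differ.

```java
import java.util.Locale;

// Hypothetical sketch of myudf.MyLower. In the MaxCompute Java module the class
// would be public and declared as `extends com.aliyun.odps.udf.UDF`; that base
// class is omitted here so the snippet compiles without the MaxCompute SDK.
final class MyLower {
    // A MaxCompute UDF exposes its logic through a method named evaluate;
    // String in and String out corresponds to a STRING -> STRING function.
    public String evaluate(String s) {
        // A SQL NULL arrives as a Java null; return NULL for NULL input.
        if (s == null) {
            return null;
        }
        // Locale.ROOT avoids locale-dependent casing (e.g. the Turkish dotted I).
        return s.toLowerCase(Locale.ROOT);
    }
}
```

Passing NULL through rather than throwing keeps the function safe to apply to nullable columns.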

Test UDF

After a UDF or MR program is developed, the next step is to test it to see whether it behaves as expected. Studio provides two testing methods:

Unit Test

Built on the Local Run framework provided by MaxCompute, a unit test works like any ordinary unit test: you supply input data and assert on the output to test your UDF or MR program. The examples directory contains unit tests of various types that you can refer to when writing your own. Here we create a test class, MyLowerTest, to test MyLower.
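The MaxCompute Local Run test API used in the examples directory is not reproduced here. As a framework-free sketch of the same idea (supply inputs, assert outputs), a MyLowerTest could be table-driven as follows. The test cases are illustrative, and the class carries its own copy of the lowercase logic so it compiles on its own.

```java
import java.util.Locale;
import java.util.Objects;

// Framework-free sketch of a table-driven MyLowerTest. It does not use the
// MaxCompute Local Run API; it only shows the "input -> expected output" pattern.
final class MyLowerTest {
    // Local copy of the MyLower logic so this file compiles on its own.
    static final class MyLower {
        String evaluate(String s) {
            return s == null ? null : s.toLowerCase(Locale.ROOT);
        }
    }

    // Illustrative cases: {input, expected output}.
    private static final String[][] CASES = {
        {"HELLO", "hello"},
        {"MaxCompute Studio", "maxcompute studio"},
        {"already lower", "already lower"},
        {"", ""},
        {null, null},
    };

    // Runs every case, reports mismatches, and returns the failure count.
    static int run() {
        MyLower udf = new MyLower();
        int failures = 0;
        for (String[] c : CASES) {
            String actual = udf.evaluate(c[0]);
            if (!Objects.equals(c[1], actual)) {
                failures++;
                System.out.println("FAIL: evaluate(" + c[0] + ") = " + actual
                        + ", expected " + c[1]);
            }
        }
        return failures;
    }

    public static void main(String[] args) {
        int failures = run();
        System.out.println(failures == 0 ? "all cases passed" : failures + " case(s) failed");
    }
}
```

Keeping the cases in one table makes it cheap to add edge cases such as empty strings and NULL alongside the ordinary inputs.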

Sample Data Test

Many users want to sample some online table data to the local machine for testing, and Studio supports this as well. In the editor, right-click the UDF class MyLower.java and click Run. The Run Configuration dialog box appears; configure the MaxCompute project, table, and column. Here we want to convert the name field of the hy_test table to lowercase.

After you click OK, Studio automatically downloads sample data from the table to the local warehouse directory through Tunnel, reads the data of the specified column, and runs the UDF locally. You can view the log output and the printed results in the console.
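The local part of this sample-data run can be sketched as: locate the configured column, apply the UDF to each value, and print the results. This is a hypothetical illustration, not Studio's implementation; the SampleColumnRun class, the schema, and the rows are made up, and the format of the downloaded warehouse files is not modeled.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical sketch: apply a lowercase UDF to one column of locally
// available sample rows. Not Studio's actual sample-data runner.
final class SampleColumnRun {
    // Same logic as the MyLower UDF (NULL in, NULL out).
    static String lower(String s) {
        return s == null ? null : s.toLowerCase(Locale.ROOT);
    }

    // Applies the UDF to the named column of every row; each row is assumed
    // to list its values in the same order as `schema`.
    static List<String> runOnColumn(List<String> schema, List<String[]> rows, String column) {
        int idx = schema.indexOf(column);
        if (idx < 0) {
            throw new IllegalArgumentException("no such column: " + column);
        }
        List<String> out = new ArrayList<>();
        for (String[] row : rows) {
            out.add(lower(row[idx]));
        }
        return out;
    }

    public static void main(String[] args) {
        // Hypothetical schema and rows standing in for sampled hy_test data.
        List<String> schema = Arrays.asList("id", "name");
        List<String[]> rows = Arrays.asList(
                new String[] {"1", "Alice"},
                new String[] {"2", "BOB"});
        for (String value : runOnColumn(schema, rows, "name")) {
            System.out.println(value);
        }
    }
}
```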

Publish a UDF

Our MyLower.java test has passed. Next, we package it into a jar resource (this step can be done with the IDE; see the user manual) and upload it to the MaxCompute server:

  • 1. From the MaxCompute menu, select Add Resource.

  • 2. Select the MaxCompute project to upload to, the jar package path, and the resource name to register, specify whether to force an update if the resource or function already exists, and click OK.

  • 3. After the jar package is uploaded, register the UDF: select Create Function from the MaxCompute menu.

  • 4. Select the resource jar to use, select the main class (Studio automatically parses the classes contained in the resource jar and lists them for you to choose from), enter the function name, and click OK.

Production and Use

Once the jar resource is uploaded and the function is registered, they appear promptly under the Resources and Functions nodes of the corresponding project in Project Explorer, and double-clicking them displays the decompiled source code. The function can now be used in production. Open the SQL editor of MaxCompute Studio and you can use the mylower function we just wrote, with syntax highlighting and function signature display.

MapReduce

MaxCompute Studio supports a MapReduce development process similar to UDF development. The main differences are as follows:

  • A MapReduce program operates on entire tables, and the input and output tables are specified in the Driver. Therefore, when testing with sample data, you only need to specify the project in the run configuration.

  • After a MapReduce program is developed, you only need to package it into a jar and upload it as a resource; there is no function registration step.

  • To run a MapReduce job in production, you can use the console that Studio integrates with. Right-click the project in the Project Explorer window, select Open in Console, and enter a command similar to the following at the console command line:
    jar -libjars wordcount.jar -classpath D:\odps\clt\wordcount.jar com.aliyun.odps.examples.mr.WordCount wc_in wc_out;
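For reference, the following plain-Java sketch shows what a word-count job computes: the map phase emits (word, 1) for each token, the shuffle groups pairs by word, and the reduce phase sums each group. It does not use the MaxCompute MapReduce API, and splitting on whitespace is an assumption about how the WordCount example tokenizes its input.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java sketch of the word-count computation (map, shuffle, reduce).
// It is not the MaxCompute MapReduce API; whitespace tokenization is assumed.
final class WordCountSketch {
    // Map phase: emit (word, 1) for every token in one input line.
    static List<Map.Entry<String, Long>> map(String line) {
        List<Map.Entry<String, Long>> out = new ArrayList<>();
        for (String word : line.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                out.add(new AbstractMap.SimpleEntry<>(word, 1L));
            }
        }
        return out;
    }

    // Reduce phase: sum the counts collected for one word.
    static long reduce(List<Long> counts) {
        long sum = 0;
        for (long c : counts) {
            sum += c;
        }
        return sum;
    }

    // Shuffle: group map output by key (TreeMap keeps keys sorted), then reduce each group.
    static Map<String, Long> run(List<String> lines) {
        Map<String, List<Long>> groups = new TreeMap<>();
        for (String line : lines) {
            for (Map.Entry<String, Long> kv : map(line)) {
                groups.computeIfAbsent(kv.getKey(), k -> new ArrayList<>()).add(kv.getValue());
            }
        }
        Map<String, Long> result = new TreeMap<>();
        for (Map.Entry<String, List<Long>> g : groups.entrySet()) {
            result.put(g.getKey(), reduce(g.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical input lines standing in for rows of the wc_in table.
        System.out.println(run(Arrays.asList("hello world", "hello maxcompute")));
    }
}
```

The sorted TreeMap makes the output deterministic, mirroring the way MapReduce frameworks typically deliver keys to reducers in sorted order.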
