DataFu Enters the Apache Incubator

DataFu was created in January 2012 and was initially positioned as a collection of user-defined functions (UDFs) for the Pig project. Compared with more general UDF collections such as Piggybank, DataFu focuses more on data mining and statistics, with functions such as quantile calculation and sampling methods. A new library named DataFu Hourglass was added to the project in October 2013. Hourglass is a library for MapReduce that provides the ability to process data incrementally, typically by saving the state of the previous job in HDFS and using it to process the new input. Both subprojects are now part of the incubator.
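The incremental approach described above (save the previous job's state, then combine it with only the new input) can be illustrated with a minimal sketch. This is plain Python, not the Hourglass API: the function names `load_state` and `run_incremental`, the event data, and the use of a local JSON file in place of HDFS are all assumptions made for illustration.

```python
import json
import tempfile
from collections import Counter
from pathlib import Path
from typing import Iterable


def load_state(state_path: Path) -> Counter:
    """Load the aggregate saved by the previous run, or start empty."""
    if state_path.exists():
        return Counter(json.loads(state_path.read_text()))
    return Counter()


def run_incremental(state_path: Path, new_events: Iterable[str]) -> Counter:
    """Merge counts from only the new input into the saved state."""
    totals = load_state(state_path)
    totals.update(new_events)                  # process just the new records
    state_path.write_text(json.dumps(totals))  # persist state for the next run
    return totals


# Hypothetical daily inputs; a local temp file stands in for HDFS here.
state_file = Path(tempfile.mkdtemp()) / "page_counts.json"
day1 = run_incremental(state_file, ["home", "search", "home"])
day2 = run_incremental(state_file, ["search", "profile"])
print(dict(day2))  # {'home': 2, 'search': 2, 'profile': 1}
```

The second run never rereads the first day's events. It only reads the saved counts, which is the property that makes this style of processing cheaper than recomputing over the full history.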

Entering the Apache Incubator is a big step forward for DataFu. Any project must pass a rigorous review and a vote before it can enter the incubator. DataFu, created in early 2012, successfully entered the incubator at the beginning of 2014. Typically, an Apache project incubates for a certain period of time, and once the project's supporting resources (wikis, mailing lists, tutorials, and so on) are in place, DataFu will finish incubation and become an ASF top-level project or a Hadoop subproject.

Having recently entered the Apache Incubator, DataFu has a number of near-term development plans. One of the most important is to provide the same UDFs for Hive and Crunch so that they can serve a wider range of applications. The plans also include porting the project's build system to Gradle, work that the DataFu community is currently carrying out. The benefit of moving the build system from Ant to Gradle is a simpler process for the community to add new functionality.

The DataFu community is still relatively small, but it has grown steadily. A recent contribution from Russell Jurney brought OpenNLP into DataFu 1.3.0. The focus on the mailing list is adding more UDFs in order to make DataFu, as project contributors Matthew Hayes and Sam Shah describe it, the "WD-40 of big data".
