Since its founding in 1999, the Apache Software Foundation (ASF) has built a strong open source ecosystem of its own. Many leading open source projects have grown up in its communities, and an increasing number of projects from around the world are now incubated there. Every project joining the ASF must first pass through the Apache Incubator and meet a set of quality requirements before graduating. Projects that graduate from the incubator either become top-level projects in their own right or become sub-projects of existing top-level projects.
To help you understand the standards of Apache incubation, this article surveys the 11 projects that successfully graduated from the incubator to become independently governed Apache top-level projects between January 1, 2016 and January 19, 2017. You are welcome to share your views on Apache, and your experiences with these projects, in the comments.
1. Apache Beam
Apache Beam was contributed to the Apache Software Foundation by Google and entered incubation on February 1, 2016; on January 10, 2017 it officially graduated to become an Apache top-level project.
The main goal of Apache Beam is to unify the programming paradigms for batch and stream processing, providing a simple, flexible, feature-rich, and highly expressive SDK for processing unbounded, out-of-order, web-scale data sets. The project focuses on the programming model and interface definitions for data processing rather than on any specific execution engine; data pipelines developed with Beam are meant to run on any supported distributed computing engine.
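The core idea, that one pipeline definition serves both batch and streaming, can be illustrated with a minimal pure-Python sketch. This is an illustration of the paradigm only, not the Apache Beam SDK; all names here are invented.

```python
# Sketch of Beam's unifying idea: the same pipeline of transforms applies
# to a bounded batch or an unbounded stream, because both are iterables
# of elements. (Illustration only; not the Apache Beam API.)

def build_pipeline(*transforms):
    """Compose element-wise transforms into a single pipeline function."""
    def run(source):
        for element in source:          # works for lists and generators alike
            for fn in transforms:
                element = fn(element)
            yield element
    return run

pipeline = build_pipeline(str.strip, str.lower)

# Bounded ("batch") input:
batch_result = list(pipeline(["  Hello ", " WORLD "]))

# Unbounded-style ("streaming") input, modeled here as a generator:
def stream():
    yield "  Streaming "
    yield " DATA "

stream_result = list(pipeline(stream()))
```

In real Beam, an execution engine (a "runner") decides how to execute such a pipeline; the pipeline author writes against the model, not the engine.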
2. Apache Eagle
Apache Eagle originated at eBay, where it was first built to solve the problem of monitoring large-scale Hadoop clusters. It entered the Apache incubator on October 26, 2015 and formally graduated as an Apache top-level project on January 10, 2017.
Apache Eagle is an open source monitoring and alerting solution for intelligently identifying security and performance issues in real time on big data platforms such as Apache Hadoop and Apache Spark. Eagle is highly scalable and extensible, offers low latency and dynamic coordination, and supports real-time monitoring of data access behavior: it can immediately detect access to sensitive data or malicious operations and take action in response.
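The detection described above amounts to matching a stream of audit events against alert policies. The toy sketch below shows that shape; the policy format and field names are invented for illustration, and Eagle's real policies are expressed in a declarative query language over streamed events.

```python
# Toy policy matcher in the spirit of Eagle: evaluate each audit event
# against alert policies as it arrives. (Field names are illustrative.)

policies = [
    {"name": "sensitive-read", "path_prefix": "/secure/", "action": "read"},
    {"name": "mass-delete", "path_prefix": "/", "action": "delete"},
]

def evaluate(event):
    """Return the names of all policies the event triggers."""
    return [
        p["name"]
        for p in policies
        if event["action"] == p["action"]
        and event["path"].startswith(p["path_prefix"])
    ]

alerts = evaluate({"user": "joe", "action": "read", "path": "/secure/payroll.csv"})
```

A real deployment evaluates such rules continuously over the audit-log stream, so an alert fires the moment a matching event is seen.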
3. Apache Geode
Originally developed by GemStone Systems as a commercial product, Apache Geode was initially widely used in the financial sector as a transactional, low-latency data engine for Wall Street trading platforms. The code was submitted to the Apache incubator on April 27, 2015, and Geode became an Apache top-level project on November 21, 2016.
Apache Geode is a data management platform that provides real-time, consistent access to data-intensive applications across cloud architectures. It uses dynamic data replication and partitioning to deliver high availability, high performance, high scalability, and fault tolerance. In addition to being a distributed data container, Geode is an in-memory data management system that provides reliable asynchronous event notification and guaranteed message delivery.
4. Apache Twill
Apache Twill entered the Apache incubator on November 14, 2013 and was announced as an Apache top-level project on July 27, 2016.
Apache Twill provides rich built-in capabilities for developing, deploying, and managing common distributed applications, greatly simplifying the operation and management of Hadoop clusters. Using YARN containers and Java threads as its abstractions, it is a key component behind the Cask Data Application Platform (CDAP), an open source integration and application platform that lets developers and organizations easily build, deploy, and manage data applications on Hadoop and Spark.
5. Apache Kudu
Apache Kudu is a data storage system developed by Cloudera. It became an Apache incubation project on December 3, 2015 and officially graduated to an Apache top-level project on July 25, 2016.
Apache Kudu is an open source columnar storage engine built for the Hadoop ecosystem, designed to enable flexible, high-performance analytics pipelines. It supports operations familiar from traditional databases, including real-time inserts, updates, and deletes. It is used by companies and organizations across many industries, including retail, online services, risk management, and digital advertising, as well as by well-known companies such as Xiaomi.
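What distinguishes Kudu from classic columnar formats is that its columnar tables are mutable by primary key. The toy class below sketches that combination; all names are illustrative, and Kudu itself is a distributed C++ engine accessed through Java, Python, and SQL clients.

```python
# Toy mutable columnar table: data lives in per-column lists, and a
# primary-key index allows row-level insert/update/delete, sketching
# the storage model Kudu provides. (Illustration only.)

class ColumnarTable:
    def __init__(self, columns):
        self.columns = columns                     # ordered column names
        self.data = {c: [] for c in columns}       # one list per column
        self.index = {}                            # primary key -> row position

    def insert(self, key, row):
        self.index[key] = len(self.data[self.columns[0]])
        for c in self.columns:
            self.data[c].append(row[c])

    def update(self, key, changes):
        pos = self.index[key]
        for c, v in changes.items():
            self.data[c][pos] = v

    def delete(self, key):
        pos = self.index.pop(key)
        for c in self.columns:
            self.data[c][pos] = None               # tombstone; compacted later

t = ColumnarTable(["id", "metric"])
t.insert(1, {"id": 1, "metric": 10})
t.insert(2, {"id": 2, "metric": 20})
t.update(2, {"metric": 25})
t.delete(1)
```

Analytical scans read a column's contiguous values, while the key index keeps single-row mutations cheap; the real engine adds distribution, replication, and background compaction.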
6. Apache Bahir
The code for Apache Bahir was originally extracted from the Apache Spark project; it was later spun out as a standalone project and announced as an Apache top-level project on June 29, 2016.
Apache Bahir offers a wide range of streaming connectors and SQL data source extensions that broaden the coverage of analytics platforms. Initially built only for Apache Spark, it now also supports Apache Flink, and it may later provide extensions for Apache Beam and other platforms.
7. Apache Zeppelin
Apache Zeppelin is a web-based notebook that supports interactive data analysis and provides a framework for visualizing data. It began in 2013 at NFLabs as a commercial data analysis product called Peloton, entered the Apache incubator on December 23, 2014, and graduated as an Apache top-level project on May 25, 2016.
Apache Zeppelin helps developers process data efficiently without worrying about command-line and cluster details. It supports more than 20 back-end systems, is easy to deploy and use, and lets users mix different languages, exchange data between back ends, adjust layouts, and build custom interactive visualizations against cluster resources. Users can create beautiful data-driven, interactive, and collaborative documents using SQL, Scala, and more.
8. Apache TinkerPop
Apache TinkerPop started in 2009 at Los Alamos National Laboratory. After two releases, the project was submitted to the Apache incubator on January 16, 2015, and it graduated as an Apache top-level project on May 23, 2016.
Apache TinkerPop is a graph computing framework that gives developers the tools they need to build modern graph applications of any size in any application domain. It unifies highly varied graph system models and accelerates development time, covering both online transaction processing (OLTP) and online analytical processing (OLAP), and both single-machine data and large data sets in distributed environments.
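The OLTP side of such a framework is a fluent traversal over a graph. The toy sketch below mimics the shape of a Gremlin-style traversal over an adjacency list; the step names (`out`, `to_list`) echo Gremlin but this is not TinkerPop's API, and the graph data is invented.

```python
# Toy Gremlin-flavored traversal over an adjacency-list graph.
# (Illustration of the traversal idea only; not the TinkerPop API.)

graph = {
    "alice": {"knows": ["bob", "carol"]},
    "bob":   {"knows": ["carol"]},
    "carol": {"knows": []},
}

class Traversal:
    def __init__(self, vertices):
        self.vertices = list(vertices)

    def out(self, label):
        """Step from each current vertex to its outgoing neighbors."""
        self.vertices = [
            n for v in self.vertices for n in graph[v].get(label, [])
        ]
        return self

    def to_list(self):
        return self.vertices

# "friends of friends of alice" as a chained traversal:
friends_of_friends = Traversal(["alice"]).out("knows").out("knows").to_list()
```

In real Gremlin the same chained-step style runs unchanged against an in-memory graph or a distributed graph database, which is exactly the unification the paragraph describes.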
9. Apache Apex
Apache Apex was originally created in 2012 at DataTorrent Inc. It entered the Apache incubator on August 17, 2015 and officially graduated as an Apache top-level project on April 25, 2016.
Apache Apex is an enterprise-grade unified stream and batch processing engine. It provides highly scalable, high-performance, fault-tolerant, stateful, secure, distributed big data processing that is also easy to operate. Its goal is to bring enterprise-class stream processing to Apache Hadoop by building on Hadoop's two major components, YARN and the Hadoop Distributed File System (HDFS).
10. Apache Sentry
Apache Sentry provides centralized, fine-grained access control over the metadata and data stored in a Hadoop cluster. It became an Apache incubation project in August 2013 and graduated as an Apache top-level project on March 25, 2016.
Apache Sentry is a hardened, fine-grained, role-based authorization system that provides access-policy management for the different Hadoop components. Its features include: support for multiple permission models; applying the same access control policy across multiple compute frameworks and data directories; support for Apache Solr (the search project); synchronization between SQL table permissions and HDFS file permissions; audit logging for data management; high availability (HA); and import and export of permission policies between clusters.
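The role-based model works by mapping users to roles and roles to privileges on resources. The sketch below shows a minimal check of that kind; the data layout and names are invented for illustration, while Sentry itself stores such policies centrally and enforces them inside services such as Hive, Impala, and Solr.

```python
# Minimal role-based authorization check in the spirit of Sentry:
# users -> roles -> privileges on resources. (Layout is illustrative.)

user_roles = {"ana": {"analyst"}, "adm": {"admin"}}

role_privileges = {
    "analyst": {("db.sales", "SELECT")},
    "admin":   {("db.sales", "SELECT"), ("db.sales", "INSERT")},
}

def authorized(user, resource, action):
    """True if any of the user's roles grants (resource, action)."""
    return any(
        (resource, action) in role_privileges.get(role, set())
        for role in user_roles.get(user, set())
    )
```

Indirection through roles is what makes the policy portable: granting a new hire access means assigning a role, and the same role grants apply whichever compute framework reads the data.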
11. Apache Arrow
Apache Arrow was originally developed from code in the Apache Drill project. Built through collaboration among a number of open source projects, it provides a specification for columnar in-memory storage and data interchange. It graduated as an Apache top-level project on February 17, 2016.
Apache Arrow accelerates analytical processing by providing a high-performance, columnar in-memory representation. Many processing algorithms benefit from this memory layout. Beyond traditional relational data, Arrow also supports complex data with dynamic schemas: for example, it can handle the JSON data typical of IoT workloads, modern applications, and log files, and it enables greater interoperability among big data solutions.
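The benefit of a columnar layout can be sketched with the standard library alone: scanning one field touches a single contiguous, typed buffer instead of hopping across row objects. Here Python's `array` stands in for Arrow's column buffers; real Arrow adds validity bitmaps, nested types, and a cross-language memory specification.

```python
# Row layout vs. column layout, using stdlib `array` as a stand-in for
# Arrow's contiguous typed column buffers. (Illustration only.)
from array import array

# Row-oriented: a list of dicts, with values scattered across the heap.
rows = [
    {"price": 10.0, "qty": 2},
    {"price": 4.5,  "qty": 1},
    {"price": 3.0,  "qty": 4},
]

# Column-oriented: one contiguous typed buffer per field.
prices = array("d", (r["price"] for r in rows))   # float64 column
qtys   = array("q", (r["qty"] for r in rows))     # int64 column

# An analytical scan reads the two columns sequentially:
total = sum(p * q for p, q in zip(prices, qtys))
```

On real hardware the contiguous buffers are what enable cache-friendly scans and vectorized (SIMD) execution, and sharing the same buffer layout across systems is what removes serialization cost between them.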
Reprinted from http://www.lupaworld.com/article-262239-1.html