List of Apache


From http://projects.apache.org/indexes/quick.html

[Now, Future], 2015-02-06 update.

Apache Accumulo

The Apache Accumulo sorted, distributed key/value store is based on Google's BigTable design. It is built on top of Apache Hadoop, ZooKeeper, and Thrift. It features a few novel improvements on the BigTable design in the form of cell-level access labels and a server-side programming mechanism that can modify key/value pairs at various points in the data management process.

Categories: database
Languages: Java
PMC: Apache Accumulo
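
The idea of a sorted key/value store with cell-level access labels can be sketched in a few lines of plain Python. This is only an illustration and does not use the Accumulo API: the class `TinySortedStore` and its `put` and `scan` methods are hypothetical names, and each cell carries a single label string, whereas real Accumulo labels are richer than this simplification.

```python
from bisect import insort


class TinySortedStore:
    """Toy in-memory sorted key/value store (hypothetical, not the Accumulo API)."""

    def __init__(self):
        # Cells are kept sorted by (row, column, label, value).
        self._cells = []

    def put(self, row, column, label, value):
        """Insert a cell; an empty label means the cell is visible to everyone."""
        insort(self._cells, (row, column, label, value))

    def scan(self, authorizations):
        """Yield (row, column, value) in sorted key order, skipping any cell
        whose label is not among the reader's authorizations."""
        for row, column, label, value in self._cells:
            if label == "" or label in authorizations:
                yield row, column, value
```

A reader scanning with an empty set of authorizations sees only unlabeled cells, while a reader holding the matching label also sees the protected ones, all in key order.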

Apache Ambari

Apache Ambari makes Hadoop cluster provisioning, managing, and monitoring dead simple.

Categories: big-data
Languages: Java, Python, JavaScript
PMC: Apache Ambari

Apache Avro

Apache Avro is a data serialization system.

Categories: library, big-data
Languages: C, C++, C#, Java, PHP, Python, Ruby
PMC: Apache Avro

Apache Chukwa

Chukwa is an open source data collection system for monitoring large distributed systems. Chukwa is built on top of the Hadoop Distributed File System (HDFS) and Map/Reduce framework and inherits Hadoop's scalability and robustness. Chukwa also includes a flexible and powerful toolkit for displaying, monitoring and analyzing results of the collected data.

Categories: hadoop
Languages: Java, JavaScript
PMC: Apache Chukwa

Apache Drill

Apache Drill is a distributed MPP query layer that supports SQL and alternative query languages against NoSQL and Hadoop data storage systems. It was inspired in part by Google's Dremel.

Categories: big-data
Languages: Java
PMC: Apache Drill

Apache Giraph

Apache Giraph is an iterative graph processing system built for high scalability. For example, it is currently used at Facebook to analyze the social graph formed by users and their connections.

Categories: big-data
Languages: Java
PMC: Apache Giraph

Apache Hadoop

Hadoop is a distributed computing platform. This includes the Hadoop Distributed File System (HDFS) and an implementation of MapReduce.

Categories: database
Languages: Java
PMC: Apache Hadoop
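
For readers new to MapReduce, the programming model can be illustrated with a single-process word count in plain Python. This is a minimal sketch of the map, shuffle, and reduce steps, not Hadoop code; the function names `map_phase`, `shuffle`, `reduce_phase`, and `word_count` are made up for this example, and a real Hadoop job distributes each phase across a cluster.

```python
from collections import defaultdict


def map_phase(line):
    """Map step: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle step: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce step: collapse the values for one key into a single count."""
    return key, sum(values)


def word_count(lines):
    """Run the three steps in sequence on an iterable of text lines."""
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())
```

Calling `word_count(["a b a", "b c"])` counts each word across both lines.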

Apache Hama

Apache Hama is an efficient and scalable general-purpose BSP (Bulk Synchronous Parallel) computing engine which can be used to speed up a large variety of compute-intensive analytics applications.

Categories: big-data
Languages: Java
PMC: Apache Hama

Apache HBase

Use Apache HBase software when you need random, realtime read/write access to your Big Data. This project's goal is the hosting of very large tables (billions of rows by millions of columns) atop clusters of commodity hardware. HBase is an open-source, distributed, versioned, column-oriented store modeled after Google's "Bigtable: A Distributed Storage System for Structured Data" by Chang et al. Just as Bigtable leverages the distributed data storage provided by the Google File System, HBase provides Bigtable-like capabilities on top of Hadoop and HDFS.

Categories: database
Languages: Java
PMC: Apache HBase
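
What "versioned, column-oriented" means for reads and writes can be shown with a toy in-memory table in plain Python. This sketch does not use the HBase client API: the class `TinyVersionedTable`, its methods, and the `max_versions` parameter (assumed to be at least 1) are hypothetical names chosen for illustration only.

```python
class TinyVersionedTable:
    """Toy table mapping row -> column -> {timestamp: value} (not the HBase API)."""

    def __init__(self, max_versions=3):
        self._rows = {}
        self._max_versions = max_versions  # assumed to be >= 1

    def put(self, row, column, value, timestamp):
        """Store a new version of a cell and evict the oldest surplus versions."""
        versions = self._rows.setdefault(row, {}).setdefault(column, {})
        versions[timestamp] = value
        for old in sorted(versions)[:-self._max_versions]:
            del versions[old]

    def get(self, row, column, timestamp=None):
        """Return the newest value at or before `timestamp` (newest overall
        when no timestamp is given), or None if no such version exists."""
        versions = self._rows.get(row, {}).get(column, {})
        candidates = [ts for ts in versions if timestamp is None or ts <= timestamp]
        return versions[max(candidates)] if candidates else None
```

Each cell keeps a bounded history, so a read can ask either for the latest value or for the value as of an earlier timestamp.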

Apache Hive

The Apache Hive (TM) data warehouse software facilitates querying and managing large datasets residing in distributed storage. Built on top of Apache Hadoop (TM), it provides:

* Tools to enable easy data extract/transform/load (ETL)
* A mechanism to impose structure on a variety of data formats
* Access to files stored either directly in Apache HDFS (TM) or in other data storage systems such as Apache HBase (TM)
* Query execution via MapReduce

Hive defines a simple SQL-like query language, called HiveQL, that enables users familiar with SQL to query the data. At the same time, this language also allows programmers who are familiar with the MapReduce framework to plug in their custom mappers and reducers to perform more sophisticated analysis that may not be supported by the built-in capabilities of the language. HiveQL can also be extended with custom scalar functions (UDFs), aggregations (UDAFs), and table functions (UDTFs).

Categories: database
Languages: Java
PMC: Apache Hive

Apache Lucene Core

Apache Lucene is a high-performance, full-featured text search engine library written entirely in Java. It is a technology suitable for nearly any application that requires full-text search, especially cross-platform.

Categories: database
Languages: Java
PMC: Apache Lucene

Apache Mahout

Apache Mahout is a scalable machine learning library.

Categories: library
Languages: Java
PMC: Apache Mahout

Apache Nutch

Apache Nutch is a highly extensible and scalable open source web crawler software project. Stemming from Apache Lucene, the project has diversified and now comprises two codebases, namely:

* Nutch 1.x: A well matured, production ready crawler. 1.x enables fine grained configuration, relying on Apache Hadoop data structures, which are great for batch processing.
* Nutch 2.x: An emerging alternative taking direct inspiration from 1.x, but which differs in one key area; storage is abstracted away from any specific underlying data store by using Apache Gora for handling object to persistent mappings. This means we can implement an extremely flexible model/stack for storing everything (fetch time, status, content, parsed text, outlinks, inlinks, etc.) into a number of NoSQL storage solutions.

Being pluggable and modular of course has its benefits: Nutch provides extensible interfaces such as Parse, Index and ScoringFilter for custom implementations, e.g. Apache Tika for parsing. Additionally, pluggable indexing exists for Apache Solr, Elasticsearch, etc. Nutch can run on a single machine, but gains a lot of its strength from running in a Hadoop cluster.

Categories: web-framework
Languages: Java
PMC: Apache Nutch

Apache Oozie

Oozie is a workflow scheduler system to manage Apache Hadoop jobs. Oozie is integrated with the rest of the Hadoop stack, supporting several types of Hadoop jobs out of the box (such as Java map-reduce, Streaming map-reduce, Pig, Hive, Sqoop and DistCp) as well as system specific jobs (such as Java programs and shell scripts).

Categories: big-data
Languages: Java, JavaScript
PMC: Apache Oozie

Apache Pig

Apache Pig is a platform for analyzing large data sets that consists of a high-level language for expressing data analysis programs, coupled with infrastructure for evaluating these programs. The salient property of Pig programs is that their structure is amenable to substantial parallelization, which in turn enables them to handle very large data sets. Pig's infrastructure layer consists of a compiler that produces sequences of Map-Reduce programs. Pig's language layer consists of a textual language called Pig Latin, which has the following key properties:

* Ease of programming. It is trivial to achieve parallel execution of simple, "embarrassingly parallel" data analysis tasks. Complex tasks comprised of multiple interrelated data transformations are explicitly encoded as data flow sequences, making them easy to write, understand, and maintain.
* Optimization opportunities. The way in which tasks are encoded permits the system to optimize their execution automatically, allowing the user to focus on semantics rather than efficiency.
* Extensibility. Users can create their own functions to do special-purpose processing.

Categories: database
Languages: Java
PMC: Apache Pig
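
The notion of a task encoded as a data flow sequence can be mirrored in plain Python. The sketch below is not Pig Latin and does not involve Pig at all; the helpers `filter_rows`, `group_rows`, `total_per_group`, and `run_flow`, as well as the `(user, amount)` record layout, are assumptions made up for this example. It only shows the shape of a flow in which each step consumes the output of the previous one.

```python
from collections import defaultdict


def filter_rows(rows, predicate):
    """Keep only the rows for which the predicate holds."""
    return (row for row in rows if predicate(row))


def group_rows(rows, key):
    """Group rows into lists under the value returned by `key`."""
    groups = defaultdict(list)
    for row in rows:
        groups[key(row)].append(row)
    return groups


def total_per_group(groups, value):
    """Sum `value(row)` over the rows of each group."""
    return {name: sum(value(row) for row in rows) for name, rows in groups.items()}


def run_flow(records):
    """Filter, then group, then aggregate (user, amount) records."""
    positive = filter_rows(records, lambda r: r[1] > 0)
    by_user = group_rows(positive, key=lambda r: r[0])
    return total_per_group(by_user, value=lambda r: r[1])
```

Because every step is a separate, named transformation, the flow stays easy to read and each stage could in principle be optimized or parallelized independently, which is the property the description above attributes to Pig programs.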

Apache Spark

Apache Spark is a fast and general engine for large-scale data processing. It offers high-level APIs in Java, Scala and Python as well as a rich set of libraries including stream processing, machine learning, and graph analytics.

Categories: big-data
Languages: Java, Scala, Python
PMC: Apache Spark

Apache Sqoop

Apache Sqoop (TM) is a tool designed for efficiently transferring bulk data between Apache Hadoop and structured datastores such as relational databases.

Categories: big-data
Languages: Java
PMC: Apache Sqoop

Apache Storm

Apache Storm is a distributed real-time computation system. Similar to how Hadoop provides a set of general primitives for doing batch processing, Storm provides a set of general primitives for doing real-time computation.

Categories: big-data
Languages: Java
PMC: Apache Storm

Apache ZooKeeper

Apache ZooKeeper is an effort to develop and maintain an open-source server which enables highly reliable distributed coordination.

Categories: database
Languages: Java
PMC: Apache ZooKeeper
