The Top Ten Open Source Technologies and Ten Leading Companies in Big Data

Source: Internet
Author: User

Ten open source technologies:

Apache HBase: This big data management platform is modeled on Google's Bigtable. An open source, distributed database written in Java, HBase was originally designed to run on the Hadoop platform, and Facebook has used it to manage the enormous volume of data behind its messaging platform.
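
HBase addresses each value by row key, column family, column qualifier, and timestamp, and can keep several versions of a cell. The sketch below models only that data layout in memory to make it concrete; the class `ToyTable` and its methods are hypothetical and are not the HBase client API.

```python
from collections import defaultdict
from typing import Dict, Iterator, List, Optional, Tuple

class ToyTable:
    """Hypothetical in-memory model of an HBase-style table (not the HBase client API)."""

    def __init__(self, families: List[str], max_versions: int = 3) -> None:
        self.families = set(families)
        self.max_versions = max_versions
        # row key -> (family, qualifier) -> [(timestamp, value), ...], newest first
        self.rows: Dict[str, Dict[Tuple[str, str], List[Tuple[int, str]]]] = defaultdict(dict)

    def put(self, row: str, family: str, qualifier: str, value: str, ts: int) -> None:
        # Column families are declared up front; qualifiers can be added freely.
        if family not in self.families:
            raise KeyError(f"unknown column family: {family}")
        versions = self.rows[row].setdefault((family, qualifier), [])
        versions.append((ts, value))
        versions.sort(key=lambda v: v[0], reverse=True)
        del versions[self.max_versions:]  # keep only the newest max_versions cells

    def get(self, row: str, family: str, qualifier: str) -> Optional[str]:
        # Return the newest version of a cell, or None if it does not exist.
        versions = self.rows.get(row, {}).get((family, qualifier))
        return versions[0][1] if versions else None

    def scan(self, start: str, stop: str) -> Iterator[Tuple[str, dict]]:
        # HBase keeps rows sorted by row key; this sketch sorts on demand
        # to mimic a range scan over [start, stop).
        for key in sorted(self.rows):
            if start <= key < stop:
                yield key, self.rows[key]

table = ToyTable(families=["msg"])
table.put("user1#0001", "msg", "body", "hello", ts=1)
table.put("user1#0001", "msg", "body", "hello again", ts=2)
table.put("user2#0001", "msg", "body", "hi", ts=1)
print(table.get("user1#0001", "msg", "body"))  # -> hello again
print([k for k, _ in table.scan("user1", "user2")])  # -> ['user1#0001']
```

Prefixing row keys with a user ID, as in this example, keeps one user's messages adjacent, so a range scan can read them together.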

Apache Storm: A distributed real-time computation system for processing fast, high-volume data streams. Storm adds reliable real-time data processing to Apache Hadoop and enables low-latency dashboards, security alerts, and improved operations, helping businesses capture opportunities and develop new business more efficiently.
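
A Storm application is a topology in which spouts emit tuples and bolts process them as they arrive. The following single-process sketch imitates that flow with chained Python generators; the function names are hypothetical and this is not Storm's API, only an illustration of per-tuple processing as opposed to batch processing.

```python
from collections import Counter
from typing import Iterable, Iterator, Tuple

def sentence_spout(lines: Iterable[str]) -> Iterator[str]:
    # Hypothetical spout: emits one tuple (a log line) at a time.
    for line in lines:
        yield line

def split_bolt(sentences: Iterable[str]) -> Iterator[str]:
    # Hypothetical bolt: splits each incoming line into lowercase words.
    for sentence in sentences:
        for word in sentence.lower().split():
            yield word

def count_bolt(words: Iterable[str]) -> Iterator[Tuple[str, int]]:
    # Hypothetical bolt: keeps running counts and emits an update for every word,
    # so a dashboard could refresh after each tuple instead of after a whole batch.
    counts: Counter = Counter()
    for word in words:
        counts[word] += 1
        yield word, counts[word]

events = ["ERROR disk full", "login ok", "ERROR timeout"]
updates = list(count_bolt(split_bolt(sentence_spout(events))))
latest = dict(updates)  # the last update per word wins
print(latest["error"])  # -> 2
```

Because the generators are lazy, each line passes through every stage before the next line is read, which is the property that makes low-latency alerts possible.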

Apache Spark: Spark uses in-memory computing. Starting from iterative batch processing, it lets data be loaded into memory and queried repeatedly, and it also unifies several computing paradigms, including data warehousing, stream processing, and graph computation. Spark is implemented in Scala and can run on top of HDFS, so it integrates well with Hadoop, and it can run up to 100 times faster than MapReduce on suitable workloads.
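
The speedup for iterative jobs comes largely from not rereading input on every pass: a MapReduce-style job typically reloads its data each iteration, while Spark can keep a working set in memory (in PySpark, for example, with `RDD.cache()`). The plain-Python sketch below only counts simulated reads for both strategies while fitting a line by gradient descent; `load_points` is a hypothetical stand-in for reading from HDFS.

```python
from typing import Dict, List, Tuple

reads: Dict[str, int] = {"count": 0}

def load_points() -> List[Tuple[float, float]]:
    # Hypothetical stand-in for reading a dataset from HDFS; counts each read.
    reads["count"] += 1
    return [(float(x), 2.0 * x + 1.0) for x in range(10)]

def fit(iterations: int, cached: bool) -> Tuple[Tuple[float, float], int]:
    """Fit y = w*x + b by gradient descent, optionally reusing an in-memory copy."""
    reads["count"] = 0
    data = load_points() if cached else None
    w, b, lr = 0.0, 0.0, 0.01
    for _ in range(iterations):
        # Without caching, every pass rereads the input, as a chain of MapReduce jobs would.
        points = data if cached else load_points()
        n = len(points)
        grad_w = sum((w * x + b - y) * x for x, y in points) / n
        grad_b = sum((w * x + b - y) for x, y in points) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return (w, b), reads["count"]

params_uncached, reads_without_cache = fit(iterations=50, cached=False)
params_cached, reads_with_cache = fit(iterations=50, cached=True)
print(reads_without_cache, reads_with_cache)  # -> 50 1
```

Both runs compute identical parameters; only the number of reads differs, which is where disk-bound iterative jobs lose most of their time.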

Apache Hadoop: Hadoop has quickly become one of the standards for big data management. When managing large datasets, it delivers strong performance for complex distributed applications; the platform is flexible enough to run on commodity hardware, and it can easily integrate structured, semi-structured, and even unstructured data.
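
Hadoop's processing layer follows the MapReduce model: mappers emit key-value pairs from input splits, the framework groups the pairs by key, and reducers combine each group. The sketch below simulates those three phases in one Python process for a word count; it is not Hadoop's Java API, just the model in miniature.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

def map_phase(split: str) -> Iterator[Tuple[str, int]]:
    # Mapper: emit (word, 1) for every word in one input split.
    for word in split.lower().split():
        yield word, 1

def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    # Shuffle and sort: group all intermediate values by key.
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))

def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    # Reducer: combine all values that share one key.
    return key, sum(values)

splits = ["big data big clusters", "data on commodity hardware"]
intermediate = [pair for split in splits for pair in map_phase(split)]
result = dict(reduce_phase(k, v) for k, v in shuffle(intermediate).items())
print(result["big"], result["data"])  # -> 2 2
```

On a real cluster, each split would be mapped on the node that stores it, and the shuffle would move data across the network to the reducers.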

Apache Drill: How big is your dataset? Whatever its size, Drill is designed to handle it. Drill provides an interactive analysis platform that supports HBase, Cassandra, and MongoDB, allowing large-scale data throughput and fast results.

Apache Sqoop: If your data is locked in a legacy system, Sqoop can help. It uses concurrent connections to move data easily from relational database systems into Hadoop, and it lets you customize data types and the mapping of metadata. You can also import data (including only new data, as an incremental import) directly into HDFS, Hive, and HBase.
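
Sqoop's concurrency comes from splitting a table on a column (set with `--split-by`) into value ranges and giving each range to a separate mapper (the count set with `--num-mappers`). The sketch below shows a simplified version of that range splitting; `split_ranges` and `where_clause` are hypothetical helpers, and Sqoop's own boundary arithmetic may differ.

```python
from typing import List, Tuple

def split_ranges(min_id: int, max_id: int, num_mappers: int) -> List[Tuple[int, int]]:
    """Divide [min_id, max_id] into contiguous, non-overlapping integer ranges.

    Simplified illustration of range-based splitting; not Sqoop's exact code.
    """
    if num_mappers < 1:
        raise ValueError("num_mappers must be at least 1")
    if max_id < min_id:
        return []
    total = max_id - min_id + 1
    num_mappers = min(num_mappers, total)  # never create empty ranges
    base, extra = divmod(total, num_mappers)
    ranges, start = [], min_id
    for i in range(num_mappers):
        size = base + (1 if i < extra else 0)  # spread the remainder over the first ranges
        ranges.append((start, start + size - 1))
        start += size
    return ranges

def where_clause(column: str, bounds: Tuple[int, int]) -> str:
    # Each mapper would read only its own slice of the source table.
    lo, hi = bounds
    return f"{column} >= {lo} AND {column} <= {hi}"

ranges = split_ranges(1, 10, 4)
print(ranges)  # -> [(1, 3), (4, 6), (7, 8), (9, 10)]
print(where_clause("id", ranges[0]))  # -> id >= 1 AND id <= 3
```

Range splitting works best when the split column's values are evenly distributed; a skewed key leaves some mappers with far more rows than others.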

Apache Giraph: Giraph is a powerful graph processing platform with good scalability and usability, and it has been adopted by Facebook. Giraph runs in a Hadoop environment and can be deployed directly onto existing Hadoop systems, giving you powerful distributed graph processing while also leveraging your existing big data processing engines.
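
Giraph follows the vertex-centric, bulk-synchronous model of Google's Pregel: in each superstep every vertex that received messages updates its value and sends messages to its neighbors, and the job stops when no messages remain. The sketch below computes single-source shortest paths that way in plain Python; it is an illustration of the model, not Giraph's Java API, and it assumes every edge target also appears as a key in the graph.

```python
import math
from typing import Dict, List, Tuple

Graph = Dict[str, List[Tuple[str, float]]]  # vertex -> [(neighbor, edge weight)]

def shortest_paths(graph: Graph, source: str, max_supersteps: int = 100) -> Dict[str, float]:
    """Vertex-centric single-source shortest paths in bulk-synchronous supersteps."""
    value = {v: math.inf for v in graph}
    # First superstep: only the source receives a "distance 0" message.
    inbox: Dict[str, List[float]] = {source: [0.0]}
    for _ in range(max_supersteps):
        if not inbox:  # no messages in flight: every vertex has halted
            break
        outbox: Dict[str, List[float]] = {}
        for vertex, messages in inbox.items():
            best = min(messages)
            if best < value[vertex]:
                value[vertex] = best
                # Tell each neighbor about a possibly shorter path through this vertex.
                for neighbor, weight in graph[vertex]:
                    outbox.setdefault(neighbor, []).append(best + weight)
        inbox = outbox  # barrier: messages are delivered in the next superstep
    return value

g: Graph = {
    "a": [("b", 1.0), ("c", 4.0)],
    "b": [("c", 2.0), ("d", 6.0)],
    "c": [("d", 3.0)],
    "d": [],
}
distances = shortest_paths(g, "a")
print(distances)  # -> {'a': 0.0, 'b': 1.0, 'c': 3.0, 'd': 6.0}
```

In a distributed run, vertices are partitioned across workers and the barrier between supersteps is what keeps the computation consistent.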

Cloudera Impala: Impala can also be deployed on your existing Hadoop cluster and run queries against all of its data. Like MapReduce, it offers strong batch-processing capability, and it also supports real-time SQL queries; with efficient SQL queries, you can quickly explore the data on your big data platform.

Gephi: Gephi can be used to correlate and quantify information, and its powerful visualizations can give you new insights into your data. It supports multiple chart types and can handle large networks with millions of nodes. Gephi has an active user community and provides a large number of plug-ins that integrate well with existing systems. It can also visualize complex IT connectivity, the nodes of a distributed system, data flows, and other information for analysis.
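
Gephi can open graph files in formats such as GEXF. As a hedged sketch, assuming the GEXF 1.2 draft format, the following uses only Python's standard library to serialize a small directed graph of hypothetical services and the data flows between them; the node names are made up for illustration.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

GEXF_NS = "http://www.gexf.net/1.2draft"

def to_gexf(nodes: List[str], edges: List[Tuple[str, str]]) -> str:
    """Serialize a directed graph as a minimal GEXF 1.2 document."""
    ET.register_namespace("", GEXF_NS)  # write GEXF as the default namespace
    root = ET.Element(f"{{{GEXF_NS}}}gexf", version="1.2")
    graph = ET.SubElement(root, f"{{{GEXF_NS}}}graph", mode="static", defaultedgetype="directed")
    node_parent = ET.SubElement(graph, f"{{{GEXF_NS}}}nodes")
    index = {name: str(i) for i, name in enumerate(nodes)}  # GEXF ids are strings
    for name, node_id in index.items():
        ET.SubElement(node_parent, f"{{{GEXF_NS}}}node", id=node_id, label=name)
    edge_parent = ET.SubElement(graph, f"{{{GEXF_NS}}}edges")
    for i, (src, dst) in enumerate(edges):
        # Each edge records one data flow from src to dst.
        ET.SubElement(edge_parent, f"{{{GEXF_NS}}}edge", id=str(i), source=index[src], target=index[dst])
    return ET.tostring(root, encoding="unicode")

services = ["hdfs", "spark", "dashboard"]
flows = [("hdfs", "spark"), ("spark", "dashboard")]
xml_text = to_gexf(services, flows)
print(xml_text[:60])
```

Saving `xml_text` to a file with a `.gexf` extension should let Gephi load the graph; a real export would usually also carry attributes such as edge weights.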

MongoDB: This solid platform is admired by many organizations and performs well in big data management. MongoDB was originally created by former DoubleClick employees and is now widely used for big data management. It is an open source NoSQL database that stores and processes data as JSON-like documents. The New York Times, Craigslist, and many other companies have adopted MongoDB to help them manage large datasets. (Couchbase Server is another option worth considering.)
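
MongoDB documents in the same collection need not share a schema, and queries are themselves expressed as documents, such as `{"words": {"$gt": 1000}}`. With the real PyMongo driver such a filter would be passed to `collection.find(...)`; the sketch below instead implements a tiny, hypothetical subset of that filter language (top-level fields, equality, `$gt`, `$lt`, `$in`, no array semantics) over plain Python dictionaries.

```python
from typing import Any, Callable, Dict, List

Document = Dict[str, Any]

# Hypothetical subset of MongoDB comparison operators; a missing field never matches.
OPERATORS: Dict[str, Callable[[Any, Any], bool]] = {
    "$gt": lambda field, arg: field is not None and field > arg,
    "$lt": lambda field, arg: field is not None and field < arg,
    "$in": lambda field, arg: field in arg,
}

def matches(doc: Document, query: Document) -> bool:
    """Return True if doc satisfies every condition in the query document."""
    for key, condition in query.items():
        value = doc.get(key)
        if isinstance(condition, dict):
            # Operator form, e.g. {"words": {"$gt": 1000}}
            if not all(OPERATORS[op](value, arg) for op, arg in condition.items()):
                return False
        elif value != condition:
            # Plain equality form, e.g. {"section": "tech"}
            return False
    return True

def find(collection: List[Document], query: Document) -> List[Document]:
    return [doc for doc in collection if matches(doc, query)]

articles = [
    {"_id": 1, "section": "tech", "words": 850, "tags": ["hadoop", "hbase"]},
    {"_id": 2, "section": "sports", "words": 400},  # no "tags" field at all
    {"_id": 3, "section": "tech", "words": 1200, "tags": ["mongodb"]},
]
long_tech = find(articles, {"section": "tech", "words": {"$gt": 1000}})
print([doc["_id"] for doc in long_tech])  # -> [3]
```

The second article simply omits `tags`, which is the kind of schema flexibility that distinguishes a document store from a relational table.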

Ten leading companies:

Amazon Web Services

Forrester calls AWS the "cloud overlord," and any discussion of big data in cloud computing has to mention Amazon. The company's Hadoop product is EMR (Elastic MapReduce). AWS explains that EMR uses Hadoop technology to provide big data management services, but it is not pure open source Hadoop: it has been modified specifically to run on the AWS cloud.

Forrester says EMR has good market prospects. Many companies build services for their customers on EMR, and some use it for data querying, modeling, integration, and management. AWS also continues to innovate; according to Forrester, EMR will in the future scale automatically based on workload. Amazon plans to provide stronger EMR support across its products and services, including the Redshift data warehouse, the newly released Kinesis real-time processing engine, and planned NoSQL database and business intelligence tools. However, AWS does not have its own Hadoop distribution.

Cloudera

Cloudera offers a distribution of open source Hadoop that incorporates many technologies from Apache open source projects, and it has built substantially on those technologies. Cloudera has developed a number of features for its distribution, including Cloudera Manager for management and monitoring and an SQL engine called Impala. Cloudera's Hadoop distribution is based on open source Hadoop, but it is not a purely open source product. When customers need functionality that Hadoop lacks, Cloudera's engineers implement it or find a partner that has the technology. "Cloudera's innovative approach stays true to core Hadoop, but because it enables rapid innovation and responds to customer needs, it differs from other vendors," says Forrester. "At present, the Cloudera platform has more than 200 paying customers, and some of them, with Cloudera's technical support, are effectively managing petabytes of data across more than 1,000 nodes."

Hortonworks

Like Cloudera, Hortonworks is a pure-play Hadoop company. Unlike Cloudera, Hortonworks believes that open source Hadoop is more powerful than any vendor's own Hadoop distribution. Its goal is to build the Hadoop ecosystem and user community and to advance the open source project. The Hortonworks platform is tightly linked to open source Hadoop, and company executives say this benefits users by protecting them from vendor lock-in (if Hortonworks customers want to leave the platform, they can easily switch to another open source platform). This does not mean Hortonworks relies entirely on existing open source Hadoop technology; rather, it contributes all of its development work back to the open source community, for example Ambari, a tool Hortonworks developed to fill gaps in cluster management. Hortonworks's solution is supported by vendors such as Teradata, Microsoft, Red Hat, and SAP.

IBM

When a business considers a large IT project, many people think of IBM first. IBM is one of the main players in the Hadoop space: Forrester says IBM has more than 100 Hadoop deployments, and many of its customers have petabytes of data. IBM has extensive experience in areas such as grid computing, global data centers, and large enterprise data projects. "IBM plans to continue integrating technologies such as SPSS analytics, high-performance computing, BI tools, data management and modeling, and workload management for high-performance computing."

Intel

Like AWS, Intel continually improves and optimizes Hadoop to run on its own hardware, specifically its Xeon processors. This helps users overcome some limitations of the Hadoop system and integrates software and hardware more tightly, and Intel's Hadoop distribution does this well. Forrester points out that Intel launched the product only recently, so the company has plenty of room to improve, and both Intel and Microsoft are considered promising contenders in the Hadoop market.

MapR Technologies

MapR's Hadoop distribution may be the best so far, though many people may not have heard of it. Forrester's survey of Hadoop users shows that MapR received the highest ratings, and its distribution scored highest on architecture and data processing capabilities. MapR has built a distinctive set of features into its distribution, such as Network File System (NFS) support, disaster recovery, and high availability. Forrester says MapR does not yet have the presence of Cloudera and Hortonworks in the Hadoop market; to become a truly big business, MapR needs to strengthen its partnerships and marketing.

Microsoft

Microsoft has kept a low profile in open source, but in the era of big data it has had to make Windows compatible with Hadoop, and it is actively participating in open source projects to promote the broader development of the Hadoop ecosystem. The results can be seen in HDInsight, a product on Microsoft's public cloud, Windows Azure. Microsoft's Hadoop service is based on the Hortonworks distribution and is tailored for Azure.

Microsoft also has a number of other projects, including one called PolyBase, which brings some SQL Server query capabilities to data stored in Hadoop. Forrester says: "Microsoft has a great advantage in the markets for databases, data warehousing, cloud, OLAP, BI, spreadsheets (including PowerPivot), collaboration, and development tools, and it has a huge user base, but it still has a long way to go to become an industry leader in Hadoop."

Pivotal Software

Pivotal is a spin-off that combines parts of EMC's and VMware's big data businesses. Pivotal has been working to build a superior Hadoop distribution and has added new tools on top of open source Hadoop, including an SQL engine called HAWQ and a Hadoop application for solving big data problems. Forrester says the strength of the Pivotal Hadoop platform is that it consolidates many technologies from Pivotal, EMC, and VMware, and that Pivotal's real advantage is the backing of two large companies, EMC and VMware. So far, Pivotal has fewer than 100 users, mostly small and medium-sized customers.

Teradata

For Teradata, Hadoop is both a threat and an opportunity. Data management, especially SQL and relational databases, is Teradata's area of expertise, so the rise of NoSQL platforms like Hadoop could threaten it. Rather than resist, Teradata embraced Hadoop: working with Hortonworks, it integrated SQL technology into the Hadoop platform, allowing Teradata customers to easily use data stored in the Teradata data warehouse on the Hadoop platform.

AMPLab

Turning data into information helps us understand the world, and that is what AMPLab does. AMPLab focuses on machine learning, data mining, databases, information retrieval, natural language processing, and speech recognition, and it works to improve techniques for extracting information from datasets, including opaque ones. Besides Spark, the open source distributed SQL query engine Shark also came out of AMPLab; Shark offers high query efficiency along with good compatibility and scalability. As computer science enters a new era, AMPLab offers flexible solutions that use big data, cloud computing, communications, and other resources and technologies to meet increasingly complex challenges.

(Responsible editor: Lu Guang)
