10 Big data frameworks and tools for Java developers

Source: Internet
Author: User
Tags cassandra memcached solr zookeeper hazelcast

The biggest challenge facing IT developers today is complexity: hardware, operating systems, programming languages and APIs, and the applications we build are all becoming more complex. Drawing on a survey by foreign media, an expert lists some of the tools and frameworks that Java programmers have been using over the last 12 months and that may be useful to you.


Let's take a look at the concept of big data. According to Wikipedia, big data is a broad term for datasets so large or complex that traditional data processing applications cannot handle them.

In many cases, using a SQL database to store and retrieve data is a good choice. In many other cases, however, it no longer serves our purpose; it all depends on the use case.

Below we discuss a number of non-SQL tools for storing and processing data, such as NoSQL databases, full-text search engines, real-time streaming, and graph databases.

1. MongoDB -- the most popular cross-platform, document-oriented database.

MongoDB is a database based on distributed file storage, written in C++. It is designed to provide a scalable, high-performance data storage solution for web applications. Application performance depends on database performance. Among non-relational databases, MongoDB is the most feature-rich and the most like a relational database, and with the release of MongoDB 3.4 its application scenarios expanded further.

The core advantages of MongoDB are its flexible document model, highly available replica sets, and scalable sharded clusters. You can get to know MongoDB from several angles, such as real-time monitoring tools for MongoDB, memory usage and page faults, number of connections, database operations, and replica sets.
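To make the "flexible document model" concrete, here is a minimal sketch in plain Java. It is an illustration only: ordinary maps stand in for documents, and the class and method names (`DocumentModelSketch`, `DocCollection`, `find`) are made up for this example and are not the MongoDB driver API. The point is that documents in one collection need not share the same fields.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustration only: plain Java maps stand in for documents.
// None of these names belong to the real MongoDB Java driver.
final class DocumentModelSketch {

    // A "collection" is simply a list of documents; no schema is enforced.
    static final class DocCollection {
        private final List<Map<String, Object>> docs = new ArrayList<>();

        void insert(Map<String, Object> doc) {
            docs.add(new LinkedHashMap<>(doc));
        }

        // Return every document whose field equals the given value.
        // Documents that lack the field are simply skipped.
        List<Map<String, Object>> find(String field, Object value) {
            List<Map<String, Object>> hits = new ArrayList<>();
            for (Map<String, Object> doc : docs) {
                if (value.equals(doc.get(field))) {
                    hits.add(doc);
                }
            }
            return hits;
        }
    }

    // Three documents with three different shapes in the same collection.
    static DocCollection sampleUsers() {
        DocCollection users = new DocCollection();
        users.insert(Map.of("name", "alice", "city", "Hangzhou"));
        users.insert(Map.of("name", "bob", "city", "Hangzhou",
                            "tags", List.of("java", "nosql")));
        users.insert(Map.of("name", "carol", "address", Map.of("zip", "310000")));
        return users;
    }

    public static void main(String[] args) {
        DocCollection users = sampleUsers();
        System.out.println(users.find("city", "Hangzhou").size());
        System.out.println(users.find("name", "carol"));
    }
}
```

A relational table would need nullable columns or extra tables for `tags` and `address`; in a document store each record simply carries the fields it has.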

2. Elasticsearch -- a distributed RESTful search engine built for the cloud.

Elasticsearch is a Lucene-based search server. It provides a distributed, multitenant-capable full-text search engine with a RESTful web interface. Elasticsearch is a popular enterprise-class search engine developed in Java and released as open source under the terms of the Apache License.

Elasticsearch is not only a full-text search engine but also a distributed real-time document store in which every field is indexed and searchable, and a distributed search engine with real-time analytics. It can scale to hundreds of servers and store and process petabytes of data. Under the hood, Elasticsearch uses Lucene for indexing, so many of its basic concepts originate from Lucene.
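The central Lucene concept that Elasticsearch inherits is the inverted index: a map from each term to the documents that contain it. The sketch below is a toy version in plain Java, assuming a naive tokenizer (lowercase, split on non-word characters); the class `InvertedIndexSketch` and its methods are hypothetical names for this example and are not part of Lucene or Elasticsearch.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy inverted index: term -> sorted set of document ids.
// A sketch of the idea only, not Lucene's actual data structures.
final class InvertedIndexSketch {
    private final Map<String, TreeSet<Integer>> postings = new HashMap<>();

    // Tokenize naively and record the document id under every term.
    void add(int docId, String text) {
        for (String term : text.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (!term.isEmpty()) {
                postings.computeIfAbsent(term, t -> new TreeSet<>()).add(docId);
            }
        }
    }

    // Single-term lookup: one map access, no scan over the documents.
    Set<Integer> search(String term) {
        Set<Integer> hits = postings.get(term.toLowerCase(Locale.ROOT));
        return hits == null ? Collections.emptySet() : Collections.unmodifiableSet(hits);
    }

    // AND query: intersect the posting lists of all terms.
    Set<Integer> searchAll(String... terms) {
        Set<Integer> result = null;
        for (String term : terms) {
            if (result == null) {
                result = new TreeSet<>(search(term));
            } else {
                result.retainAll(search(term));
            }
        }
        return result == null ? Collections.emptySet() : result;
    }

    static InvertedIndexSketch sample() {
        InvertedIndexSketch index = new InvertedIndexSketch();
        index.add(1, "Elasticsearch is a Lucene-based search server");
        index.add(2, "Solr is also based on Lucene");
        index.add(3, "Redis is an in-memory store");
        return index;
    }

    public static void main(String[] args) {
        InvertedIndexSketch index = sample();
        System.out.println(index.search("lucene"));
        System.out.println(index.searchAll("lucene", "search"));
    }
}
```

Because a lookup goes from term to documents rather than scanning every document, query cost depends mainly on the length of the posting lists, not on the total number of documents.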

3. Cassandra -- an open source distributed database management system, originally developed by Facebook, designed to handle large amounts of data across many commodity servers while providing high availability with no single point of failure.

Apache Cassandra is an open source distributed NoSQL database system. It combines the data model of Google BigTable with the fully distributed architecture of Amazon Dynamo. It was open-sourced in 2008 and, thanks to its good scalability, was adopted by Digg, Twitter, and other Web 2.0 sites, becoming a popular distributed structured data storage solution.

Because Cassandra is written in Java, it can in theory run on any machine with JDK 6 or above; the officially tested JDKs are OpenJDK and Sun's JDK. Cassandra's operation commands resemble those of the relational databases we normally use, so anyone familiar with MySQL will find it easy to get started.
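The Dynamo-style architecture mentioned above places nodes on a hash ring and stores each key on the next few nodes clockwise from the key's position, which is how the design avoids a single point of failure. The following is a minimal sketch of that idea, not Cassandra's implementation: it assumes CRC32 as a stand-in hash (Cassandra uses its own partitioners), one token per node, and hypothetical names throughout (`TokenRingSketch`, `replicasFor`, `node-a`, and so on).

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;
import java.util.zip.CRC32;

// Sketch of a Dynamo-style token ring with replication.
// Assumption: CRC32 stands in for the real partitioner hash.
final class TokenRingSketch {
    // token -> node name, kept sorted so the ring can be walked in order
    private final TreeMap<Long, String> ring = new TreeMap<>();

    static long token(String s) {
        CRC32 crc = new CRC32();
        crc.update(s.getBytes(StandardCharsets.UTF_8));
        return crc.getValue();
    }

    void addNode(String name) {
        ring.put(token(name), name);
    }

    // The first replica is the first node at or after the key's token;
    // further replicas are the following nodes, wrapping around the ring.
    List<String> replicasFor(String key, int replicationFactor) {
        List<String> replicas = new ArrayList<>();
        if (ring.isEmpty()) {
            return replicas;
        }
        int wanted = Math.min(replicationFactor, ring.size());
        Long start = ring.ceilingKey(token(key));
        if (start == null) {
            start = ring.firstKey(); // wrap around past the highest token
        }
        for (String node : ring.tailMap(start, true).values()) {
            if (replicas.size() < wanted) replicas.add(node);
        }
        for (String node : ring.headMap(start, false).values()) {
            if (replicas.size() < wanted) replicas.add(node);
        }
        return replicas;
    }

    static TokenRingSketch sample() {
        TokenRingSketch ring = new TokenRingSketch();
        for (String n : new String[] {"node-a", "node-b", "node-c", "node-d"}) {
            ring.addNode(n);
        }
        return ring;
    }

    public static void main(String[] args) {
        System.out.println(sample().replicasFor("user:42", 3));
    }
}
```

Every node can compute the same replica list from the key alone, so no central coordinator is needed to locate data, and losing one node still leaves the other replicas reachable.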

4. Redis -- an open source (BSD-licensed) in-memory data structure store, used as a database, cache, and message broker.

Redis is an open source, networked, in-memory key-value database written in ANSI C, with APIs available in multiple languages. Three main features differentiate Redis from many competitors: it keeps its data entirely in memory and uses disk only for persistence; it has a relatively rich set of data types compared with many key-value stores; and Redis can copy data to any number

5. Hazelcast -- a Java-based open source in-memory data grid.

Hazelcast is an in-memory data grid that gives Java programmers mission-critical transactions and very large-scale in-memory applications. Although Hazelcast has no so-called "master", it still has a leader node (the oldest member). This resembles the leader in ZooKeeper, but the implementation principle is completely different. In addition, data in Hazelcast is distributed: each member holds part of the data plus the corresponding backup data, which also differs from ZooKeeper.

Developers like Hazelcast for its ease of use, but whether to adopt it deserves careful consideration.

6. Ehcache -- a widely used open source Java distributed cache, primarily for general-purpose caching, Java EE, and lightweight containers.

Ehcache is a pure Java in-process caching framework that is fast and lean, and it is the default CacheProvider in Hibernate. Its main features are: it is fast and simple; it offers a variety of cache policies; cached data has two tiers, memory and disk, so capacity is not a concern; cached data is written to disk when the virtual machine restarts; distributed caching is available through RMI, pluggable APIs, and other means; it provides listener interfaces for caches and cache managers; it supports multiple cache manager instances and multiple cache regions per instance; and it provides a caching implementation for Hibernate.
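The two-tier idea (a bounded memory tier that overflows to disk) can be sketched in a few lines of plain Java. This is a simplified illustration under stated assumptions, not Ehcache's API or implementation: the eviction policy is least-recently-used, a second in-memory map stands in for the disk store, and `TwoTierCacheSketch`, `inMemory`, and `onDisk` are hypothetical names.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch of a two-tier cache: a bounded LRU memory tier that overflows
// into a second tier. Assumption: a HashMap stands in for the disk store.
final class TwoTierCacheSketch<K, V> {
    private final Map<K, V> disk = new HashMap<>();
    private final LinkedHashMap<K, V> memory;

    TwoTierCacheSketch(final int maxInMemory) {
        // accessOrder = true makes iteration order least-recently-used first.
        this.memory = new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                if (size() > maxInMemory) {
                    // Overflow: move the coldest entry down instead of dropping it.
                    disk.put(eldest.getKey(), eldest.getValue());
                    return true;
                }
                return false;
            }
        };
    }

    void put(K key, V value) {
        disk.remove(key); // keep each key in exactly one tier
        memory.put(key, value);
    }

    // Look in memory first; on a disk hit, promote the entry back to memory.
    V get(K key) {
        V value = memory.get(key);
        if (value == null) {
            value = disk.remove(key);
            if (value != null) {
                memory.put(key, value);
            }
        }
        return value;
    }

    boolean inMemory(K key) { return memory.containsKey(key); }

    boolean onDisk(K key) { return disk.containsKey(key); }

    public static void main(String[] args) {
        TwoTierCacheSketch<String, Integer> cache = new TwoTierCacheSketch<>(2);
        cache.put("a", 1);
        cache.put("b", 2);
        cache.put("c", 3); // "a" is the least recently used and overflows
        System.out.println(cache.onDisk("a"));
        System.out.println(cache.get("a"));
    }
}
```

Swapping the policy (for example to FIFO) only changes how the memory tier picks its victim; the overflow-and-promote flow stays the same.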

7. Hadoop -- an open source software framework written in Java for distributed storage and processing of very large data sets.

Users can develop distributed programs without knowing the underlying distributed details, taking advantage of the cluster for high-speed computation and storage. Hadoop implements a distributed file system, the Hadoop Distributed File System (HDFS). The core of the Hadoop framework's design is HDFS and MapReduce: HDFS provides storage for massive amounts of data, and MapReduce provides computation over it.
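The MapReduce model itself is small enough to show on a single machine. The sketch below runs the classic word count through explicit map, shuffle, and reduce phases using only the Java standard library. It does not use the Hadoop API; the class `MapReduceSketch` and its method names are made up for this illustration, and in a real cluster the framework would run the map and reduce calls in parallel across nodes and perform the shuffle over the network.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Single-machine sketch of the MapReduce phases; not the Hadoop API.
final class MapReduceSketch {

    // Map phase: turn one input line into (word, 1) pairs.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String word : line.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (!word.isEmpty()) {
                pairs.add(new SimpleEntry<>(word, 1));
            }
        }
        return pairs;
    }

    // Shuffle phase: group all emitted values by key.
    static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>())
                   .add(pair.getValue());
        }
        return grouped;
    }

    // Reduce phase: collapse the values of one key into a single result.
    static int reduce(List<Integer> counts) {
        int sum = 0;
        for (int c : counts) {
            sum += c;
        }
        return sum;
    }

    // Driver: run the three phases in sequence over all input lines.
    static Map<String, Integer> wordCount(String... lines) {
        List<Map.Entry<String, Integer>> emitted = new ArrayList<>();
        for (String line : lines) {
            emitted.addAll(map(line));
        }
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : shuffle(emitted).entrySet()) {
            result.put(e.getKey(), reduce(e.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(wordCount("HDFS stores data", "MapReduce processes data"));
    }
}
```

Because each `map` call sees only one line and each `reduce` call sees only one key, both phases can be spread across machines without the programmer handling the distribution.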

8. Solr -- an open source enterprise search platform, written in Java, from the Apache Lucene project.

Solr is a standalone enterprise search application server that exposes a web-service-like API. Users can submit XML files in a prescribed format to the search server via HTTP requests to build the index, or issue lookup requests via HTTP GET and receive the results in XML format.

Like Elasticsearch, Solr is based on Lucene, but extends it with a richer query language than Lucene's, while being configurable, extensible, and optimized for query performance.
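As a rough illustration of the HTTP GET lookup described above, the sketch below only builds the request URL; it does not contact a server. It targets Solr's standard `select` handler with the `q`, `rows`, and `wt` parameters. The host and port, the core name `articles`, the field `title`, and the helper `SolrQueryUrlSketch.selectUrl` are assumptions made for this example.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Builds a Solr select URL; sending it requires a running Solr server.
// Host, core name, and field name used in main are hypothetical.
final class SolrQueryUrlSketch {

    static String selectUrl(String baseUrl, String core, String query, int rows) {
        // The query string must be URL-encoded (':' becomes %3A, and so on).
        String q = URLEncoder.encode(query, StandardCharsets.UTF_8);
        return baseUrl + "/solr/" + core + "/select"
                + "?q=" + q
                + "&rows=" + rows
                + "&wt=xml"; // ask for the response in XML format
    }

    public static void main(String[] args) {
        System.out.println(selectUrl("http://localhost:8983", "articles", "title:hadoop", 10));
    }
}
```

The resulting URL can be pasted into a browser or passed to any HTTP client; changing `wt` selects a different response format where the server supports one.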

9. Spark -- an open source cluster computing framework and the most active project in the Apache Software Foundation.

Spark is an open source cluster computing environment similar to Hadoop, but differences between the two give Spark an advantage in some workloads. In other words, Spark supports in-memory distributed datasets, which, in addition to providing interactive queries, can also optimize iterative workloads.

Spark is implemented in Scala and uses Scala as its application framework. Unlike Hadoop, Spark and Scala are tightly integrated, and Scala can be as easy as manipulating local collection objects to

10. Memcached -- a general-purpose distributed memory caching system.

Memcached is a distributed caching system, originally developed by Danga Interactive for LiveJournal and now used by many other software projects (such as MediaWiki). As a high-speed distributed cache server, Memcached has the following characteristics: a simple protocol, event handling based on libevent, and built-in in-memory storage.

