A recent survey on tool usage shows which big data tools Java programmers favor most.
The survey question: which tools or frameworks did you like most over the past year?
Respondents could choose from the listed options or write in their own. This article focuses on the big data tools. The Java survey covered the following categories:
- Development language
- Web framework
- Application server
- Database tools
- SQL data
- Big data
- Development tools
- Cloud providers
Now, let's look at the definition of big data on Wikipedia:
Big data, broadly speaking, refers to data sets so large and complex that traditional data processing methods no longer apply.
Traditional SQL databases are sufficient for common scenarios. In other scenarios, a traditional database can handle only a limited volume of data, and a growing number of alternative tools is now available. The right choice depends on the use case.
Now let's look at the various non-SQL tools for storing and manipulating data: NoSQL databases, in-memory caches, full-text search engines, real-time stream processing, graph databases, and more.
Big Data: survey results
- MongoDB: a very popular, cross-platform, document-oriented database.
- Elasticsearch: a distributed RESTful search engine designed for the cloud.
- Cassandra: an open-source distributed database management system. Originally designed and developed at Facebook, it runs on large numbers of commodity servers to process large amounts of data, offering high availability with no single point of failure.
- Redis: an open source (BSD-licensed) in-memory data structure store, used as a database, cache, and message broker.
- Hazelcast: a Java-based in-memory data grid.
- Ehcache: a widely used open source Java distributed cache for Java EE and lightweight containers.
- Hadoop: an open source distributed big data framework written in Java for processing very large data sets. Hadoop is deployed on clusters.
- Solr: an open source enterprise search platform written in Java, from the Apache Lucene project.
- Spark: the most active project at the Apache Software Foundation (ASF), an open-source cluster computing framework.
- Memcached: a general-purpose distributed caching system.
- Apache Hive: provides a SQL-like layer on top of Hadoop, translating SQL statements into MapReduce (MR) programs for execution.
- Apache Kafka: a high-throughput, distributed, publish-subscribe messaging system, first developed at LinkedIn.
- Akka: a toolkit for building highly concurrent, resilient, message-driven applications on the JVM.
- HBase: an open source, distributed, non-relational database modeled on Google's Bigtable paper. It is written in Java and uses HDFS as its underlying storage.
- Neo4j: an open source graph database implemented in Java.
- Couchbase: an open source, distributed, document-oriented NoSQL database optimized for interactive applications.
- Apache Storm: an open source distributed real-time computation system.
- CouchDB: an open-source, document-oriented NoSQL database that stores data as JSON.
- Oracle Coherence: an in-memory data grid solution that lets organizations predictably scale mission-critical applications by providing fast access to frequently used data.
- Titan: a scalable graph database optimized for storing and querying graphs with hundreds of millions of elements distributed across a cluster.
- Amazon DynamoDB: a fast, flexible NoSQL database for applications of any scale that need consistent, millisecond-level latency.
- Amazon Kinesis: a real-time data processing platform on AWS.
- Datomic: a distributed database with full transaction support, designed for the cloud and written in Clojure.
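Several entries in the list mention MapReduce, for example Hive turning SQL statements into MR programs. To give a rough feel for what such a program does, here is a minimal sketch in plain Java. It uses only the JDK, not the Hadoop or Hive APIs, and the class name `MapReduceSketch`, the method `countByKey`, and the sample data are all hypothetical names chosen for illustration. It mimics the work behind a query such as `SELECT tool, COUNT(*) FROM votes GROUP BY tool` (table and column names are likewise assumed).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Illustrative only: a single-process imitation of the map and reduce phases.
// A real Hadoop job would distribute both phases across a cluster.
class MapReduceSketch {

    static Map<String, Long> countByKey(List<String> records) {
        // Map phase: turn every record into a (key, 1) pair.
        List<Map.Entry<String, Long>> pairs = new ArrayList<>();
        for (String record : records) {
            pairs.add(Map.entry(record, 1L));
        }

        // Shuffle and reduce phase: group the pairs by key and sum the values.
        // TreeMap keeps the keys sorted so the result prints in a stable order.
        Map<String, Long> totals = new TreeMap<>();
        for (Map.Entry<String, Long> pair : pairs) {
            totals.merge(pair.getKey(), pair.getValue(), Long::sum);
        }
        return totals;
    }

    public static void main(String[] args) {
        // Hypothetical sample votes, not actual survey data.
        List<String> votes = List.of("mongodb", "redis", "mongodb", "kafka", "redis", "mongodb");
        System.out.println(countByKey(votes));
    }
}
```

The sketch keeps the two phases as separate loops so the correspondence to "map" and "reduce" stays visible. In practice a tool like Hive generates and schedules this kind of work for you.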
Among the big data tools Java programmers use, MongoDB holds a firm first place!