Selection of big data technology routes for Small and Medium-sized Enterprises
Big data is currently used mainly in the Internet and e-commerce sectors, and it is gradually being adopted in the telecom and power industries. Most small and medium-sized enterprises (SMEs) have heard a great deal about big data, but its technical barrier to entry is still high. In terms of technology route, simply adopting the solutions that large companies use may be out of reach for an SME.
My company chose Hadoop, the industry's mainstream solution. A year later, with three engineers on the project, we have only a demo and still have not shipped a release. Big data really is fascinating.
SMEs should choose a big data technology route that fits their own situation. Following the big companies' lead is something we cannot afford. Is there a big data solution suited to SMEs? I have carefully collected a few for your reference.
1. Cassandra + Presto
Cassandra is an open-source distributed NoSQL database system. It was originally developed at Facebook to store inbox data and other simply formatted data. It combines the Google Bigtable data model with the fully distributed architecture of Amazon Dynamo. Facebook open-sourced Cassandra in 2008. Since then, thanks to its excellent scalability, Cassandra has become a popular distributed structured-data store, adopted by well-known Web 2.0 sites such as Digg and Twitter. Cassandra's characteristics:
- Distributed
- Column-based structure
- High scalability
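Cassandra's column-based structure comes from the Bigtable data model: each row is identified by a key and holds its own map of column names to values, so different rows need not share the same columns. The sketch below illustrates that idea in memory with plain Python dicts; the `WideColumnTable` class and its methods are hypothetical and are not Cassandra's storage engine or driver API.

```python
from typing import Any, Dict, List, Tuple

Column = Tuple[str, Any]


class WideColumnTable:
    """Toy Bigtable/Cassandra-style table (hypothetical, for illustration only).

    Each row key owns its own map of column name -> value, so rows do not
    need to share a schema, and columns are returned sorted by name.
    """

    def __init__(self) -> None:
        self._rows: Dict[str, Dict[str, Any]] = {}

    def put(self, row_key: str, column: str, value: Any) -> None:
        # Writing a column that already exists simply overwrites its value.
        self._rows.setdefault(row_key, {})[column] = value

    def get_row(self, row_key: str) -> List[Column]:
        # Sort by column name only; names are unique within a row.
        return sorted(self._rows.get(row_key, {}).items(), key=lambda item: item[0])

    def column_slice(self, row_key: str, start: str, end: str) -> List[Column]:
        """Columns whose names fall in [start, end), in sorted order."""
        return [(c, v) for c, v in self.get_row(row_key) if start <= c < end]


# Usage: one row per user, one column per message, as in an inbox.
inbox = WideColumnTable()
inbox.put("user42", "msg:0002", "lunch?")
inbox.put("user42", "msg:0001", "hello")
inbox.put("user7", "msg:0001", "hi")
inbox.put("user7", "profile:name", "Ann")  # rows may carry different columns
print(inbox.get_row("user42"))  # [('msg:0001', 'hello'), ('msg:0002', 'lunch?')]
# All columns with the "msg:" prefix (";" sorts right after ":").
print(inbox.column_slice("user7", "msg:", "msg;"))  # [('msg:0001', 'hi')]
```

Keeping columns sorted by name is what makes range reads over a single row cheap in this model.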
Cassandra provides the following features:
- Flexible schema
- Scalability
- Multi-data-center support
- Range queries
- List data structures
- Distributed write operations
- Consistent hashing
- Gossip protocol simplifies cluster management
- Real-time updates
- Efficient secondary indexes
- Efficient data compression
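Consistent hashing decides which node owns a given row: node tokens and row keys are hashed onto the same ring, and each key belongs to the first node token clockwise from the key's hash, so adding or removing a node only moves the keys in the affected arcs of the ring. The sketch below assumes MD5 as the hash and a hypothetical `vnodes` count per node; Cassandra's default partitioner actually uses Murmur3, and its token management is more involved, so treat this as an illustration of the technique rather than Cassandra's implementation.

```python
import hashlib
from bisect import bisect_right
from typing import Dict, List


def _hash(value: str) -> int:
    """Map a string to a position on a 2**128 ring (MD5 stands in for Murmur3)."""
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


class ConsistentHashRing:
    def __init__(self, nodes: List[str], vnodes: int = 16) -> None:
        self.vnodes = vnodes
        self._ring: Dict[int, str] = {}  # token -> node name
        self._tokens: List[int] = []  # sorted tokens, for binary search
        for node in nodes:
            self.add_node(node)

    def add_node(self, node: str) -> None:
        # Each physical node owns several tokens ("virtual nodes") for a more even spread.
        for i in range(self.vnodes):
            self._ring[_hash(f"{node}#{i}")] = node
        self._tokens = sorted(self._ring)

    def remove_node(self, node: str) -> None:
        self._ring = {t: n for t, n in self._ring.items() if n != node}
        self._tokens = sorted(self._ring)

    def node_for(self, key: str) -> str:
        if not self._tokens:
            raise ValueError("ring is empty")
        # First token strictly after the key's hash, wrapping around past the end.
        idx = bisect_right(self._tokens, _hash(key)) % len(self._tokens)
        return self._ring[self._tokens[idx]]


# Usage: adding a node only reassigns keys to that new node.
ring = ConsistentHashRing(["node-a", "node-b", "node-c"])
keys = [f"user{i}" for i in range(1000)]
before = {k: ring.node_for(k) for k in keys}
ring.add_node("node-d")
moved = [k for k in keys if ring.node_for(k) != before[k]]
print(len(moved), "of", len(keys), "keys moved to node-d")
```

With plain modulo hashing (`hash(key) % node_count`), adding a fourth node would reassign most keys; with the ring, only the keys falling into the new node's arcs move.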
Presto is an open-source "interactive" SQL query engine written in Java. It was built by Facebook, the original creator of Hive. Presto takes an approach similar to Impala's: it provides an interactive query experience while still working on the existing data sets stored in Hadoop. Like Impala, it must be installed on many "nodes". Presto provides the following features:
- ANSI SQL syntax support (possibly ANSI-92)
- JDBC driver
- Connectors for reading data from existing data sources, including HDFS, Hive, and Cassandra
- Interaction with the Hive metastore for schema sharing
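Presto addresses tables as `catalog.schema.table`, and each catalog is backed by a connector (for example a Cassandra or Hive catalog). This is what lets one SQL engine query several existing data sources. The toy sketch below only illustrates that dispatch idea; the `Connector` protocol, `InMemoryConnector`, `ToyEngine`, and the catalog and table names are all hypothetical and have nothing to do with Presto's actual Java connector SPI.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, List, Optional, Protocol, Tuple

Row = Dict[str, object]


class Connector(Protocol):
    """Hypothetical connector interface: scan the rows of one table."""

    def scan(self, schema: str, table: str) -> Iterable[Row]: ...


@dataclass
class InMemoryConnector:
    # (schema, table) -> rows; stands in for a Cassandra, Hive, or HDFS source.
    tables: Dict[Tuple[str, str], List[Row]] = field(default_factory=dict)

    def scan(self, schema: str, table: str) -> Iterable[Row]:
        return iter(self.tables[(schema, table)])


class ToyEngine:
    def __init__(self) -> None:
        self.catalogs: Dict[str, Connector] = {}

    def register(self, catalog: str, connector: Connector) -> None:
        self.catalogs[catalog] = connector

    def select(
        self,
        qualified_name: str,
        columns: List[str],
        where: Optional[Callable[[Row], bool]] = None,
    ) -> List[Row]:
        """Roughly `SELECT columns FROM catalog.schema.table WHERE where(row)`."""
        catalog, schema, table = qualified_name.split(".")
        rows = self.catalogs[catalog].scan(schema, table)
        return [{c: r[c] for c in columns} for r in rows if where is None or where(r)]


# Usage: two "catalogs" served by different connectors behind one engine.
engine = ToyEngine()
engine.register("cassandra", InMemoryConnector(
    {("mail", "inbox"): [{"user": "u1", "unread": 3}, {"user": "u2", "unread": 0}]}))
engine.register("hive", InMemoryConnector({("web", "logs"): [{"url": "/", "status": 200}]}))
print(engine.select("cassandra.mail.inbox", ["user"], where=lambda r: r["unread"] > 0))
# [{'user': 'u1'}]
```

A real engine would also push filters down to the connector and run scans in parallel across nodes; the toy keeps only the catalog-to-connector lookup.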
For Presto/Cassandra integration, see "Ad-Hoc Analysis over Cassandra Data with Facebook Presto": http://blog.csdn.net/china_world/article/details/39966699
2. Trafodion: transactional SQL on HBase
Trafodion is an open-source project sponsored by HP and developed at HP Labs and HP IT. It provides an enterprise-class SQL-on-HBase solution for transactional or operational big data workloads. Trafodion is licensed under the Apache License, Version 2.0. It builds on the scalability, elasticity, and flexibility of Hadoop, and extends Hadoop to guarantee transactional integrity, enabling new kinds of big data applications to run on Hadoop.
Key features of Trafodion
- Full-featured ANSI SQL language support
- JDBC/ODBC connectivity for Linux/Windows clients
- ACID distributed transaction protection across multiple statements, tables, and rows
- Performance improvements for OLTP workloads with compile-time and run-time optimizations
- Support for large data sets using a parallel-aware query optimizer
Key benefits of Trafodion
- Reuse existing SQL skills and improve developer productivity
- Distributed ACID transactions guarantee data consistency across multiple rows and tables
- Interoperability with existing tools and applications
- Hadoop and Linux distribution neutral
- Easy to add to your existing hadoop infrastructure