NoSQL database Cassandra (Part 1)

Source: Internet
Author: User
Tags: cassandra, zookeeper



As Internet technology has developed, the demands on data storage have grown: capacity, security, backup, high availability, and so on. Popular relational databases include SQL Server, MySQL, and Oracle. Non-relational databases include the key-value stores Redis and Memcached, the document databases MongoDB and CouchDB, and the column-family stores HBase and Cassandra. There is more and more to learn, so when selecting a technology we should follow the principle "there is no best technology, only the most suitable one." Because a business requirement pushed me to try out a new technology, the following notes record my preliminary study of Cassandra for later review.



1. Getting to know Cassandra



Apache Cassandra is a highly scalable, high-performance distributed NoSQL database. It is designed to handle large amounts of data across many servers, providing high availability with no single point of failure.



Cassandra's distributed architecture spreads data across different machines and keeps multiple copies of each piece of data, as set by the replication factor, so that the failure of one machine does not make the data unavailable.
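To make the idea of a replication factor concrete, here is a minimal sketch of how copies of a key could be placed on a ring of nodes. It is an illustration under stated assumptions, not Cassandra's actual implementation: the node names, the use of MD5 as the hash, and the "walk clockwise to the next nodes" rule are all simplifications chosen for the example, and virtual nodes and data-center awareness are left out.

```python
import bisect
import hashlib


def token(value: str) -> int:
    """Map a string to a ring position (illustrative hash, not Cassandra's partitioner)."""
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


class Ring:
    """Toy token ring: each key is stored on `replication_factor` consecutive nodes."""

    def __init__(self, nodes, replication_factor):
        if replication_factor < 1 or replication_factor > len(nodes):
            raise ValueError("replication factor must be between 1 and the node count")
        self.replication_factor = replication_factor
        # Sort nodes by token so the ring can be walked clockwise.
        self._ring = sorted((token(node), node) for node in nodes)
        self._tokens = [t for t, _ in self._ring]

    def replicas(self, key: str):
        """Return the nodes that hold a copy of `key`."""
        # First node whose token is greater than the key's token, wrapping around.
        start = bisect.bisect_right(self._tokens, token(key)) % len(self._ring)
        return [
            self._ring[(start + i) % len(self._ring)][1]
            for i in range(self.replication_factor)
        ]


# Hypothetical four-node cluster with three copies of every key.
ring = Ring(["node-a", "node-b", "node-c", "node-d"], replication_factor=3)
print(ring.replicas("user:42"))
```

With a replication factor of 3 on four nodes, any single node can fail and every key still has two live copies, which is the availability property described above.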



Official website: http://cassandra.apache.org/

Help document: http://cassandra.apache.org/doc/latest/contactus.html



Current mainstream versions: Apache Cassandra 3.11, 3.0, 2.2, and 2.1


Books: few are available at the moment, because Cassandra is relatively new. Searching online for "Cassandra in practice" turns up some material, but the best approach is to study the official documentation.


1.1 Comparison between Cassandra and relational databases


Cassandra | Relational database (RDBMS)
Cassandra is used to process unstructured data. | An RDBMS is used to process structured data.
Cassandra has a flexible schema. | An RDBMS has a fixed schema.
In Cassandra, the keyspace is the outermost container. | In an RDBMS, the database is the outermost container, holding the data that corresponds to an application.


MySQL and other relational databases have the concepts of databases and tables, although the way a database is created differs from product to product. In MySQL and similar systems, you must create a database and a table structure before you can insert data. Redis creates a fixed number of databases according to its configuration file, so you only need SELECT to switch between them. MongoDB is a special case: it has no tables, only databases and collections, and under certain circumstances you can insert data directly without creating them first, which is very convenient. Cassandra has no concept of a database; instead it has keyspaces and tables. Some of its operations resemble those of relational databases such as MySQL, while others differ greatly.
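As a rough illustration of the keyspace-then-table order, the sketch below builds the CQL statements one might paste into cqlsh. The keyspace name demo, the table name users, its two columns, and the choice of SimpleStrategy with a replication factor of 1 are all assumptions made for the example; the helper functions are hypothetical and only assemble text, they do not talk to a cluster.

```python
import re

# Conservative identifier check so names can be embedded in the statement safely.
_IDENTIFIER = re.compile(r"^[A-Za-z][A-Za-z0-9_]*$")


def _check_identifier(name: str) -> str:
    if not _IDENTIFIER.match(name):
        raise ValueError(f"not a simple CQL identifier: {name!r}")
    return name


def create_keyspace_cql(name: str, replication_factor: int = 1) -> str:
    """Build a CREATE KEYSPACE statement (SimpleStrategy is an assumption for single-DC tests)."""
    if replication_factor < 1:
        raise ValueError("replication factor must be at least 1")
    return (
        f"CREATE KEYSPACE IF NOT EXISTS {_check_identifier(name)} "
        f"WITH replication = {{'class': 'SimpleStrategy', "
        f"'replication_factor': {replication_factor}}};"
    )


def create_table_cql(keyspace: str, table: str) -> str:
    """Build a CREATE TABLE statement for a hypothetical two-column table."""
    return (
        f"CREATE TABLE IF NOT EXISTS {_check_identifier(keyspace)}."
        f"{_check_identifier(table)} (id uuid PRIMARY KEY, name text);"
    )


# The keyspace plays the role that a database plays in MySQL: create it first, then the table.
for statement in (create_keyspace_cql("demo"), create_table_cql("demo", "users")):
    print(statement)
```

The order mirrors the MySQL habit of creating the database before the table, which is the similarity the paragraph above points out.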



1.2 Comparison between Cassandra and HBase







HBase | Cassandra
HBase is based on Bigtable (Google). | Cassandra is based on Dynamo (Amazon). It was originally developed at Facebook by a former Amazon engineer, which is one of the reasons Cassandra supports multiple data centers.
HBase uses the Hadoop infrastructure (ZooKeeper, NameNode, HDFS). Organizations that deploy it need knowledge of both Hadoop and HBase. | Cassandra is developed separately from Hadoop, and its tooling and operational knowledge requirements differ from Hadoop's. For analytics, however, many Cassandra deployments use Cassandra + Storm (which uses ZooKeeper) and/or Cassandra + Hadoop.
The HBase-Hadoop infrastructure has several "moving parts": ZooKeeper, NameNode, HBase Master, and DataNodes. ZooKeeper runs as a cluster and is naturally fault-tolerant; the NameNode needs to be clustered for fault tolerance. | Cassandra uses a single node type. All nodes are equal and perform all functions. Any node can act as a coordinator, so there is no single point of failure (SPOF). Adding Storm or Hadoop will, of course, increase the complexity of the infrastructure.
HBase is well suited to range-based scans. | With its default random partitioning, Cassandra does not support range scans over row keys, which can be limiting in some use cases.
HBase provides asynchronous replication of an HBase cluster. | With random partitioning, Cassandra replicates data at the level of a single row.
HBase supports only ordered partitioning. | Cassandra officially supports ordered partitioning, but no production users rely on it, because of the hot spots it creates and the operational difficulty.
Because of ordered partitioning, HBase scales horizontally easily while still supporting row key range scans. | If data is stored in the columns of a Cassandra row to support range scans, the practical limit on the size of a row is tens of megabytes.
HBase supports atomic compare-and-set and transactions within a row. | Cassandra originally did not support atomic compare-and-set; since version 2.0 it offers lightweight transactions (conditional IF clauses in CQL).
HBase does not support read load balancing for a single row; a row is served by only one region server at a time. | Cassandra supports read load balancing for a single row.
Bloom filters can be used in HBase as another form of index. | Cassandra uses Bloom filters for key lookups.
Triggers are supported through the coprocessor feature in HBase. | Cassandra does not support coprocessor functionality.
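Both systems in the comparison rely on Bloom filters, so a small sketch of the data structure may help. This is a generic textbook Bloom filter, not the code either database uses: the bit-array size, the number of hash functions, and the SHA-256 based hashing are assumptions picked for readability.

```python
import hashlib


class BloomFilter:
    """Set-membership filter: never misses an added key, may report false positives."""

    def __init__(self, size_bits: int = 1024, num_hashes: int = 3):
        self.size_bits = size_bits
        self.num_hashes = num_hashes
        self._bits = bytearray((size_bits + 7) // 8)

    def _positions(self, key: str):
        # Derive several bit positions by salting the key with the hash index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.size_bits

    def add(self, key: str) -> None:
        for pos in self._positions(key):
            self._bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key: str) -> bool:
        # False means "definitely absent"; True means "possibly present".
        return all(
            self._bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(key)
        )


# Hypothetical row keys: a store could skip reading a file whose filter says "absent".
bloom = BloomFilter()
bloom.add("row-1")
print(bloom.might_contain("row-1"))
```

The value for key lookups is the "definitely absent" answer: a storage engine can skip a data file entirely when its filter rules the key out, at the cost of an occasional unnecessary read on a false positive.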


In recent years, as big data technology and its industry have developed, Hadoop, Spark, Storm, and related technologies have advanced rapidly. At the same time, engineers with big data skills are in short supply and their salaries have multiplied, which leaves ordinary engineers like me envious. HBase is a pioneer and cornerstone of big data storage and plays a very important role. But its overall footprint is far from small, and its architecture is more complex than Cassandra's, which inevitably adds to system complexity and maintenance burden.



1.3 Internet companies using Cassandra



Outside China:




    • eBay: 200+ TB, 400+ million writes, 100+ million reads. Application scenarios: social signals on the item details page (like, want, own, favorites); the Hunch taste graph for users and items; time series such as mobile notifications, anti-fraud, SOA, monitoring, and log services.

    • Netflix: a large-scale cluster with 288+96+60 instances, 1.1 million writes per second, automatic replication across 3 AWS EC2 zones in the eastern United States, for a total of 3.3 million writes per second.

    • Apple: 75,000+ nodes, tens of PB, millions of ops/s, largest cluster 1,000+ nodes.


In China:


    • According to public data, there should be a cluster of at least 1,500 servers. The reasons given for choosing Cassandra: a small team, tight deadlines, an open source project, no single point of failure, a decentralized design, suitability for online business, understandable code, team members with a coding background, and an active community.

    • Tongdun Technology (Hangzhou): the specific usage is unclear; it is only known that the underlying data storage architecture is mainly based on Cassandra. It is a big data risk control and anti-fraud company that is growing very rapidly.


2. Installation and Practice



1. Environment Requirements


Prerequisites for installing Cassandra, from the official documentation:

    • The latest version of Java 8, either the Oracle Java Standard Edition 8 or OpenJDK 8. To verify that you have the correct version of Java installed, type java -version.

    • For using cqlsh, the latest version of Python 2.7. To verify that you have the correct version of Python installed, type python --version.

In short, the official documentation requires Java 8 and Python 2.7. Many production environments now run CentOS 7.x, which ships with Python 2.7. Check your own system, and if Python 2.7 or Java 8 is missing, install it yourself.
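The Java check can also be scripted. The following is a small sketch that runs java -version and extracts the major version; it assumes the usual version "1.8.0_xxx" style of output and is written for Python 3 (it is a separate helper, unrelated to the Python 2.7 interpreter that cqlsh needs). The function names are made up for the example.

```python
import re
import subprocess


def java_major_version(version_output: str):
    """Extract the Java major version from `java -version` output, or None if not found."""
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
    if not match:
        return None
    first = int(match.group(1))
    # Java 8 and earlier report themselves as 1.x (for example 1.8.0_151).
    if first == 1 and match.group(2) is not None:
        return int(match.group(2))
    return first


def installed_java_major():
    """Run `java -version` and parse the result; None if Java is not on the PATH."""
    try:
        result = subprocess.run(["java", "-version"], capture_output=True, text=True)
    except OSError:
        return None
    # `java -version` normally writes to stderr, so look at both streams.
    return java_major_version(result.stderr + result.stdout)


if __name__ == "__main__":
    major = installed_java_major()
    if major == 8:
        print("Java 8 found")
    else:
        print(f"Java 8 not found (detected: {major})")
```

Keeping the parsing in its own function makes it easy to test against sample strings without needing Java installed.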


2. Common Installation Methods


    • Binary installation

    • Source installation

    • Yum package manager installation

      Installation Instructions page: http://cassandra.apache.org/download/


Binary installation is simple and fast: nothing needs to be compiled, and once the installation package has been downloaded there is little further dependence on the network.



3. Single-Machine installation test



Operating system: CentOS 7.1



Cassandra: 3.11.1



Installation method: yum installation (the machine has Internet access)



Yum repository information, in /etc/yum.repos.d/cassandra.repo:

    [cassandra]
    name=Apache Cassandra
    baseurl=
    gpgcheck=1
    repo_gpgcheck=1
    gpgkey=

Install:

    sudo yum install cassandra

Start the service:

    service cassandra start

Enable the service at boot:

    chkconfig cassandra on
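Once the service has been started, one simple way to confirm that the node is accepting client connections is to probe its CQL port. The sketch below assumes a default configuration, in which the native transport listens on port 9042 on the local machine; if cassandra.yaml was changed, adjust the host and port accordingly. The helper name is made up for the example.

```python
import socket


def is_port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    # 9042 is the default CQL native transport port (an assumption about this install).
    if is_port_open("127.0.0.1", 9042):
        print("Cassandra is accepting CQL connections")
    else:
        print("Nothing is listening on 9042 yet; the node may still be starting")
```

A node can take some time after service start before it opens the port, so a negative result right after startup is not necessarily an error.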


Follow-up posts will cover more Cassandra topics: common keyspace operations, table operations, inserting, deleting, updating, and querying data, daily monitoring, security and backup, high-availability clusters, and related knowledge.









