Overview of the Gossip protocol
Nodes in a Cassandra cluster are all peers, with no primary or secondary roles, and they communicate through a protocol called gossip. Through gossip, each node learns which nodes are in the cluster and what state they are in. Every piece of gossip state carries a version number; by comparing the versions in the messages it receives, a node can work out which of its own entries are stale and need updating, which entries it has that the peer does not, and the nodes then exchange the differences.
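As an illustration only (this is plain Java, not Cassandra's internal Gossiper classes; the addresses and version numbers are invented), the sketch below shows the version-comparison idea: each node keeps a highest-seen version per endpoint, compares it against the digest received from a peer, and decides which entries to request and which to push back.

import java.util.HashMap;
import java.util.Map;

// Toy sketch of the version-number comparison used during a gossip round.
public class GossipVersionSketch {
    public static void main(String[] args) {
        // What this node currently knows: endpoint -> highest state version seen.
        Map<String, Long> local = new HashMap<>();
        local.put("10.0.0.1", 42L);
        local.put("10.0.0.2", 17L);

        // Digest received from a peer in a gossip message.
        Map<String, Long> remoteDigest = new HashMap<>();
        remoteDigest.put("10.0.0.1", 40L);   // peer is behind for 10.0.0.1
        remoteDigest.put("10.0.0.2", 19L);   // peer is ahead for 10.0.0.2
        remoteDigest.put("10.0.0.3", 5L);    // peer knows a node we have never seen

        for (Map.Entry<String, Long> e : remoteDigest.entrySet()) {
            long mine = local.getOrDefault(e.getKey(), -1L);
            if (mine < e.getValue())
                System.out.println("request newer state for " + e.getKey()
                        + " (have v" + mine + ", peer has v" + e.getValue() + ")");
            else if (mine > e.getValue())
                System.out.println("send my newer state for " + e.getKey()
                        + " (have v" + mine + ", peer has v" + e.getValue() + ")");
        }
    }
}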
A column in Cassandra is a triple: {name, value, timestamp}.
Name
The name is required, and it can be produced in two ways:
For a static column family, the name is specified by the administrator who creates the column family.
For a dynamic column family, the name is set dynamically by the client application.
A secondary index can be built on the name (secondary index).
Value
The value is not required; for example, some column families serve as the equivalent of a materialized view and store no value at all.
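A minimal sketch of the {name, value, timestamp} triple as seen from a client, assuming a local node, the DataStax Java driver 3.x, and a hypothetical demo.users table: the schema fixes the column name, the client supplies the value, and the write timestamp stored with the cell can be read back with writetime().

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.Row;
import com.datastax.driver.core.Session;

// Hypothetical table and contact point; illustrates that every cell carries a
// name, a value, and a timestamp.
public class ColumnTripleExample {
    public static void main(String[] args) {
        try (Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
             Session session = cluster.connect()) {

            session.execute("CREATE KEYSPACE IF NOT EXISTS demo "
                    + "WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1}");
            session.execute("CREATE TABLE IF NOT EXISTS demo.users (id int PRIMARY KEY, name text)");

            session.execute("INSERT INTO demo.users (id, name) VALUES (1, 'alice')");

            // writetime() returns the cell's timestamp in microseconds.
            Row row = session.execute(
                    "SELECT name, writetime(name) AS ts FROM demo.users WHERE id = 1").one();
            System.out.println("value = " + row.getString("name")
                    + ", timestamp (microseconds) = " + row.getLong("ts"));
        }
    }
}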
What is replication?
In Cassandra, replication means storing data on multiple nodes to ensure reliability and fault tolerance. When you create a keyspace (roughly equivalent to a schema in a relational database), you must specify a replica placement strategy (replica placement strategy).
What is the replication factor (replication factor)?
This number determines how many copies of each row are kept: a replication factor of 1 means there is exactly one copy of each row, a factor of 2 means two copies, and so on. All replicas are equally important; there is no primary or master replica.
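For example (illustrative numbers): with replication_factor = 3, each row is stored on three different nodes, and a QUORUM read or write must be acknowledged by floor(3/2) + 1 = 2 of those three replicas before the coordinator reports success.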
After some study, we decided to work inside cql3/QueryProcessor.java. Two functions are relevant here. The first is:
public ResultMessage process(String queryString, QueryState queryState, QueryOptions options, long queryStartNanoTime)
This function takes a CQL statement as a String, validates it (checking whether it is legal), and then calls processStatement(prepared, queryState, options, queryStartNanoTime) for the actual processing. We add our benchmark functions in the same class (public void be...).
- CQL drivers and the CQL native protocol

CQL type | Corresponding Java type | Description
int | Integer | 32-bit signed integer
list | List | A collection of one or more ordered elements
map | Map | A JSON-style collection of key/value pairs: {literal: literal, literal: literal, ...}
set | Set | A collection of one or more elements
text | String | UTF-8 encoded string
timestamp | Date | Date plus time, encoded as 8 bytes since the epoch
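As a sketch of how these types surface on the client side (assuming the DataStax Java driver 3.x and a hypothetical demo.profiles table), the typed getters below map int, text, set, list, map, and timestamp columns onto the Java types listed above.

import java.util.Date;
import java.util.List;
import java.util.Map;
import java.util.Set;

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.Row;
import com.datastax.driver.core.Session;

// Hypothetical schema: demonstrates typed getters for several CQL types.
public class CqlTypesExample {
    public static void main(String[] args) {
        try (Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
             Session session = cluster.connect()) {

            session.execute("CREATE KEYSPACE IF NOT EXISTS demo "
                    + "WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1}");
            session.execute("CREATE TABLE IF NOT EXISTS demo.profiles ("
                    + " id int PRIMARY KEY, nickname text, emails set<text>,"
                    + " tags list<text>, scores map<text, int>, created timestamp)");

            session.execute("INSERT INTO demo.profiles (id, nickname, emails, tags, scores, created) "
                    + "VALUES (1, 'bob', {'bob@example.com'}, ['new'], {'game': 10}, "
                    + "'2020-01-01 00:00:00+0000')");

            Row row = session.execute("SELECT * FROM demo.profiles WHERE id = 1").one();
            int id = row.getInt("id");                                  // int       -> Integer/int
            String nickname = row.getString("nickname");                // text      -> String
            Set<String> emails = row.getSet("emails", String.class);    // set<text> -> Set<String>
            List<String> tags = row.getList("tags", String.class);      // list<text>-> List<String>
            Map<String, Integer> scores =
                    row.getMap("scores", String.class, Integer.class);  // map       -> Map
            Date created = row.getTimestamp("created");                 // timestamp -> java.util.Date

            System.out.println(id + " " + nickname + " " + emails + " "
                    + tags + " " + scores + " " + created);
        }
    }
}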
Cassandra vs. HBase
Consistency
Cassandra: Quorum (NRW) policy; uses the gossip protocol and Merkle-tree synchronization to keep data consistent between cluster nodes.
HBase: single node, no replication, strong consistency.
Availability
Cassandra: 1. Data is replicated to the adjacent nodes on the consistent-hash ring, so it exists on multiple nodes and there is no single point of failure. 2. If a node goes down, new data that hashes to that node is automatically routed to the next node for hinted handoff.
A keyspace is a container for application data, and it corresponds to a schema in a relational database. It is used to group column families. Typically, each application in a cluster has one keyspace.
When you create a keyspace, you can specify a replication_factor to indicate how many replicas to keep:
Ways to create a keyspace:
(Method 1: use "Data Modeling" in OpsCenter)
You can also use the cassandra-cli command line (the replication_factor below is just an example value):
create keyspace Charles_learn_cassandra
  with placement_strategy = 'org.apache.cassandra.locator.SimpleStrategy'
  and strategy_options = {replication_factor: 2};
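The legacy cassandra-cli tool has been removed in newer Cassandra releases; there, the same keyspace can be created with a CQL statement, shown here executed through the DataStax Java driver 3.x (a sketch; the contact point and the replication factor of 2 are example values).

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.Session;

// Creates the keyspace with CQL (the same statement works from cqlsh).
public class CreateKeyspaceExample {
    public static void main(String[] args) {
        try (Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
             Session session = cluster.connect()) {
            session.execute("CREATE KEYSPACE IF NOT EXISTS charles_learn_cassandra "
                    + "WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 2}");
        }
    }
}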
Cassandra supports multiple data types at the CQL language level.
CQL type | Corresponding Java type | Description
ascii | String | ASCII string
bigint | Long | 64-bit signed integer
blob | ByteBuffer / byte[] | Binary array
boolean | Boolean | Boolean value
counter | Long | Counter column; supports atomic increment and decrement, but does not support setting the value directly
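As a short sketch of counter usage (DataStax Java driver 3.x, hypothetical demo.page_views table): a counter column can only be incremented or decremented with UPDATE, never INSERTed or assigned an absolute value, and it is read back as a Java long.

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.Row;
import com.datastax.driver.core.Session;

// Counter columns live in their own table and are modified only with
// "UPDATE ... SET c = c + n" / "c = c - n".
public class CounterExample {
    public static void main(String[] args) {
        try (Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
             Session session = cluster.connect()) {

            session.execute("CREATE KEYSPACE IF NOT EXISTS demo "
                    + "WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1}");
            session.execute("CREATE TABLE IF NOT EXISTS demo.page_views "
                    + "(page text PRIMARY KEY, views counter)");

            // Atomically increment the counter for one page.
            session.execute("UPDATE demo.page_views SET views = views + 1 WHERE page = '/index'");

            Row row = session.execute(
                    "SELECT views FROM demo.page_views WHERE page = '/index'").one();
            System.out.println("views = " + row.getLong("views"));
        }
    }
}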
Source: a comparison of various NoSQL databases, http://hi.baidu.com/eastdoor/blog/item/758d0e3eedb5d92471cf6c14.html (Cassandra, MongoDB, CouchDB, Redis, Riak, HBase)
CouchDB
Development language: Erlang
Main advantages: data consistency and ease of use
License: Apache
Protocol: HTTP/REST
Best suited for: data that accumulates and rarely changes, or data that needs to keep many versions.
Example: CRM and CMS systems; multi-site deployment is allowed.
Redis
Development language: ...
Use the nodetool tool under the Cassandra bin directory:
nodetool status
The output shows that one of the nodes has gone offline.
nodetool removenode 199553f1-f310-41e1-b4a4-f5a1fef7d1b8
Wait a moment, and in the Cassandra server terminal you will see:
INFO 10:50:36 Removing host: 199553f1-f310-41e1-b4a4-f5a1fef7d1b8
INFO 10:50:36 Sleeping for 30000ms to ensure /192.168.1.201 does not change
INFO 10:51:06 Advertising removal for /192.16...
All nodes in a Cassandra cluster are peers, so a read or write can be sent to any node in the cluster; that node may not itself store the requested data. The node the client talks to acts as the coordinator node.
Write requests in a single data center:
When the client request arrives at the coordinator node, the coordinator sends the write to every node in the cluster that holds a replica of the target row (the target nodes).
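A sketch of a client-side write (DataStax Java driver 3.x; reuses the hypothetical demo.users table from the earlier sketch): whichever node receives the statement acts as coordinator and forwards the write to all replicas of the row, and the consistency level decides how many replica acknowledgements the coordinator waits for before answering.

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.ConsistencyLevel;
import com.datastax.driver.core.Session;
import com.datastax.driver.core.SimpleStatement;

// The contact point is only an entry point: the receiving node coordinates the
// write. With QUORUM, it reports success once a majority of replicas have acked.
public class CoordinatorWriteExample {
    public static void main(String[] args) {
        try (Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
             Session session = cluster.connect()) {

            SimpleStatement write = new SimpleStatement(
                    "INSERT INTO demo.users (id, name) VALUES (2, 'carol')");
            write.setConsistencyLevel(ConsistencyLevel.QUORUM);

            session.execute(write);
        }
    }
}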
The column family in Cassandra corresponds to a table in a relational database and is used to store rows and columns.
The number of columns in a column family is not fixed
In a relational database, every row contains the same set of fields. In Cassandra, although a column family can define metadata for its columns, the actual number of columns in each row is determined by the client application, so different rows can contain different numbers of columns.
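In CQL 3 this "different rows can have different columns" behaviour is typically modelled with clustering columns inside a partition; a sketch assuming the DataStax Java driver 3.x and a hypothetical demo.events table:

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.Session;

// One partition ("sensor-a") ends up with two cells, another ("sensor-b") with
// one: the storage engine does not force every partition to hold the same data.
public class WideRowExample {
    public static void main(String[] args) {
        try (Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
             Session session = cluster.connect()) {

            session.execute("CREATE KEYSPACE IF NOT EXISTS demo "
                    + "WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1}");
            session.execute("CREATE TABLE IF NOT EXISTS demo.events ("
                    + " sensor text, ts timestamp, value double,"
                    + " PRIMARY KEY (sensor, ts))");

            // The client decides how many (ts, value) cells each sensor partition holds.
            session.execute("INSERT INTO demo.events (sensor, ts, value) "
                    + "VALUES ('sensor-a', '2020-01-01 00:00:00+0000', 1.0)");
            session.execute("INSERT INTO demo.events (sensor, ts, value) "
                    + "VALUES ('sensor-a', '2020-01-01 00:01:00+0000', 2.0)");
            session.execute("INSERT INTO demo.events (sensor, ts, value) "
                    + "VALUES ('sensor-b', '2020-01-01 00:00:00+0000', 9.9)");

            System.out.println(session.execute(
                    "SELECT count(*) FROM demo.events WHERE sensor = 'sensor-a'").one().getLong(0));
        }
    }
}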
Options for nodetool repair:
-local | --in-local-dc | Repair only nodes in the same datacenter.
-pr | --partitioner-range | Repair only the partition ranges that are primary on the replica.
-seq | --sequential | Run a sequential repair.
-st start_token | --start-token start_token | Specify the token (start_token) at which the repair range starts.
-tr | --trace | Trace the repair. Traces are logged to system_traces.events.