1. Cassandra Preparation
Start cqlsh:

CQLSH_HOST=172.16.163.131 bin/cqlsh
cqlsh> CREATE KEYSPACE productlogs WITH REPLICATION = { 'class': 'org.apache.cassandra.locator.SimpleStrategy', 'replication_factor': '2' };

cqlsh> CREATE TABLE productlogs.logs (
           ids uuid,
           app_name text,
           app_version text,
           city text,
           client_time timestamp,
           country text,
           created_at timestamp,
           cs_count int,
           device_id text,
           id int,
           modle_name text,
           province text,
           remote_ip text,
           updated_at timestamp,
           PRIMARY KEY (ids)
       );
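To confirm that the keyspace and table came out as intended, you can describe the table from cqlsh (note that the modle_name column name is kept exactly as defined above, since the rest of the walkthrough uses it):

cqlsh> DESCRIBE TABLE productlogs.logs;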
2. Spark Cassandra Connector jar Package
Create a new empty SBT project, add the connector as a dependency, and package it as spark-cassandra-connector-full.jar.

The point of this step is to avoid relying on the official connector package directly: the official jar does not bundle its dependencies, so using it as-is means tracking down everything it depends on, and different connector versions pull in different packages and versions. For simplicity, build a single full (assembly) jar, as sketched below.
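A minimal sketch of such a build with sbt-assembly; the Scala, plugin, and connector version numbers here are assumptions chosen to match Spark 1.5 and should be adjusted to your cluster:

// project/plugins.sbt: the sbt-assembly plugin produces the bundled "fat" jar
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.14.1")

// build.sbt
name := "spark-cassandra-connector-full"

scalaVersion := "2.10.5"

// connector version is an assumption; pick the release matching your Spark
libraryDependencies += "com.datastax.spark" %% "spark-cassandra-connector" % "1.5.0-M3"

Running sbt assembly then produces the bundled jar under target/scala-2.10/, which you can rename to spark-cassandra-connector-full.jar.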
3. Start spark-shell
/opt/db/spark-1.5.2-bin-hadoop2.6/bin/spark-shell --master spark://u1:7077 --jars ~/spark-cassandra-connector-full.jar
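As an alternative (a sketch using spark-shell's standard --conf flag), the Cassandra host used in step 4 below can also be supplied at launch, so the SparkContext configuration never needs to be touched afterwards:

/opt/db/spark-1.5.2-bin-hadoop2.6/bin/spark-shell --master spark://u1:7077 \
    --jars ~/spark-cassandra-connector-full.jar \
    --conf spark.cassandra.connection.host=172.16.163.131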
The commands that follow are all entered at the spark-shell prompt.
4. Prepare the data source:
// Many write-ups stop the current sc and start a new one; there is no need:
// just add the Cassandra parameter to the existing SparkContext configuration
scala> sc.getConf.set("spark.cassandra.connection.host", "172.16.163.131")

// Read the data source from HDFS
scala> val lines = sc.textFile("/data/logs")

// Import the required namespaces
scala> import org.apache.spark.sql._
scala> import org.apache.spark.sql.types._
scala> import com.datastax.spark.connector._
scala> import java.util.UUID

// Define the schema
scala> val schema = StructType(
           StructField("ids", StringType, true) ::
           StructField("id", IntegerType, true) ::
           StructField("app_name", StringType, true) ::
           StructField("app_version", StringType, true) ::
           StructField("client_time", TimestampType, true) ::
           StructField("device_id", StringType, true) ::
           StructField("modle_name", StringType, true) ::
           StructField("cs_count", IntegerType, true) ::
           StructField("created_at", TimestampType, true) ::
           StructField("updated_at", TimestampType, true) ::
           StructField("remote_ip", StringType, true) ::
           StructField("country", StringType, true) ::
           StructField("province", StringType, true) ::
           StructField("city", StringType, true) :: Nil)

// Map the raw lines into Rows that match the schema
scala> val rowRDD = lines.map(_.split("\t")).map(p => Row(
           UUID.randomUUID().toString(), p(0).toInt, p(1), p(2),
           java.sql.Timestamp.valueOf(p(3)), p(4), p(5), p(6).toInt,
           java.sql.Timestamp.valueOf(p(7)), java.sql.Timestamp.valueOf(p(8)),
           p(9), p(10), p(11), p(12)))

scala> val df = sqlContext.createDataFrame(rowRDD, schema)
scala> df.registerTempTable("logs")

// Look at the result
scala> sqlContext.sql("SELECT * FROM logs LIMIT 1").show
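One caveat: the map above assumes every line has exactly 13 tab-separated fields, with integers at positions 0 and 6 and yyyy-mm-dd hh:mm:ss timestamps at positions 3, 7 and 8, so a single malformed line makes the whole job fail at runtime. A defensive variant (a sketch, same field layout assumed) simply drops lines with the wrong arity:

scala> val rowRDD = lines.map(_.split("\t")).
           filter(_.length == 13).  // skip lines that do not have all 13 fields
           map(p => Row(UUID.randomUUID().toString(), p(0).toInt, p(1), p(2),
             java.sql.Timestamp.valueOf(p(3)), p(4), p(5), p(6).toInt,
             java.sql.Timestamp.valueOf(p(7)), java.sql.Timestamp.valueOf(p(8)),
             p(9), p(10), p(11), p(12)))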
5. Write the data into Cassandra
scala> import org.apache.spark.sql.cassandra._
scala> df.write.format("org.apache.spark.sql.cassandra").
           options(Map("table" -> "logs", "keyspace" -> "productlogs")).
           save()
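For completeness, the connector also offers a lower-level RDD write path via saveToCassandra. A minimal sketch with a hypothetical two-column row (it reuses the com.datastax.spark.connector._ import from step 4 and relies on the connector converting the String to the uuid column type):

scala> sc.parallelize(Seq((UUID.randomUUID().toString, "demo_app"))).
           saveToCassandra("productlogs", "logs", SomeColumns("ids", "app_name"))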
6. Read back the data that was just saved:
scala> import org.apache.spark.sql.cassandra._
scala> val cdf = sqlContext.read.
           format("org.apache.spark.sql.cassandra").
           options(Map("table" -> "logs", "keyspace" -> "productlogs")).
           load()

// "logs" is already taken by the temp table from step 4, so register a new name
scala> cdf.registerTempTable("logs_just_saved")
scala> sqlContext.sql("SELECT * FROM logs_just_saved LIMIT 1").show
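The same read is also available through the connector's RDD API (a sketch; cassandraTable is brought in by the com.datastax.spark.connector._ import from step 4):

scala> val logsRdd = sc.cassandraTable("productlogs", "logs")
scala> logsRdd.first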