Using the Spark Cassandra Connector


1. Cassandra Preparation

Start cqlsh:

    CQLSH_HOST=172.16.163.131 bin/cqlsh

Create the keyspace and table:

    cqlsh> CREATE KEYSPACE productlogs
           WITH REPLICATION = { 'class': 'org.apache.cassandra.locator.SimpleStrategy', 'replication_factor': '2' };

    cqlsh> CREATE TABLE productlogs.logs (
        ids uuid,
        app_name text,
        app_version text,
        city text,
        client_time timestamp,
        country text,
        created_at timestamp,
        cs_count int,
        device_id text,
        id int,
        modle_name text,
        province text,
        remote_ip text,
        updated_at timestamp,
        PRIMARY KEY (ids)
    );

2. Package the Spark Cassandra Connector Jar

Create a new empty sbt project, add the connector as a dependency, and package everything as spark-cassandra-connector-full.jar.

The point of this step is that the official connector jar does not bundle its dependencies, so if you use it directly you have to track down the transitive dependencies yourself, and different connector versions depend on different packages and versions. For simplicity, build a single full (fat) jar.
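A minimal sbt build for such a fat jar might look like the following sketch; the connector version, Scala version, and the sbt-assembly plugin line are assumptions chosen to match Spark 1.5.x, so adjust them to your environment.

```scala
// build.sbt -- a sketch; versions here are assumptions for Spark 1.5.x
name := "spark-cassandra-connector-full"

scalaVersion := "2.10.5"

libraryDependencies ++= Seq(
  "com.datastax.spark" %% "spark-cassandra-connector" % "1.5.0"
)

// project/plugins.sbt would add the sbt-assembly plugin, e.g.:
// addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.14.1")
```

Running `sbt assembly` then produces one jar containing the connector plus its transitive dependencies, which is what gets passed to `--jars` below.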

3. Start Spark-shell

    /opt/db/spark-1.5.2-bin-hadoop2.6/bin/spark-shell --master spark://u1:7077 --jars ~/spark-cassandra-connector-full.jar

The commands that follow are all run inside spark-shell.

4. Prepare the data source:

    // Most articles stop the current sc and then start a new one; that is
    // unnecessary -- just add the Cassandra parameter to the existing sc:
    scala> sc.getConf.set("spark.cassandra.connection.host", "172.16.163.131")

    // Read the data source from HDFS
    scala> val df = sc.textFile("/data/logs")

    // Import the required namespaces
    scala> import org.apache.spark.sql._
    scala> import org.apache.spark.sql.types._
    scala> import com.datastax.spark.connector._
    scala> import java.util.UUID

    // Define the schema
    scala> val schema = StructType(
             StructField("ids", StringType, true) ::
             StructField("id", IntegerType, true) ::
             StructField("app_name", StringType, true) ::
             StructField("app_version", StringType, true) ::
             StructField("client_time", TimestampType, true) ::
             StructField("device_id", StringType, true) ::
             StructField("modle_name", StringType, true) ::
             StructField("cs_count", IntegerType, true) ::
             StructField("created_at", TimestampType, true) ::
             StructField("updated_at", TimestampType, true) ::
             StructField("remote_ip", StringType, true) ::
             StructField("country", StringType, true) ::
             StructField("province", StringType, true) ::
             StructField("city", StringType, true) :: Nil)

    // Apply the schema to the data source
    scala> val rowRDD = df.map(_.split("\t")).map(p => Row(
             UUID.randomUUID().toString(), p(0).toInt, p(1), p(2),
             java.sql.Timestamp.valueOf(p(3)), p(4), p(5), p(6).toInt,
             java.sql.Timestamp.valueOf(p(7)), java.sql.Timestamp.valueOf(p(8)),
             p(9), p(10), p(11), p(12)))
    scala> val df = sqlContext.createDataFrame(rowRDD, schema)
    scala> df.registerTempTable("logs")

    // Check the result
    scala> sqlContext.sql("SELECT * FROM logs LIMIT 1").show
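The per-line parsing inside the `rowRDD` map can be tried out without Spark. The sketch below uses a hypothetical sample line; the real input is tab-separated with 13 fields in this order: id, app_name, app_version, client_time, device_id, modle_name, cs_count, created_at, updated_at, remote_ip, country, province, city (the `ids` uuid is generated, not read from the file).

```scala
import java.util.UUID

// Hypothetical tab-separated log line with the 13 assumed fields
val line = "7\tmyapp\t1.0.2\t2015-11-01 10:00:00\tdev-42\tmx4\t3\t" +
  "2015-11-01 10:00:00\t2015-11-01 10:00:01\t10.0.0.1\tCN\tGuangdong\tShenzhen"

val p = line.split("\t")

val ids        = UUID.randomUUID().toString       // generated primary key
val id         = p(0).toInt                       // numeric fields are converted
val csCount    = p(6).toInt
val clientTime = java.sql.Timestamp.valueOf(p(3)) // "yyyy-MM-dd HH:mm:ss"

println(p.length) // 13
println(id)       // 7
```

Note that `Timestamp.valueOf` throws on malformed input, so on real logs a dirty line will fail the whole Spark job; filtering or wrapping the conversion is worth considering.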

5. Save the Data to Cassandra

    scala> import org.apache.spark.sql.cassandra._
    scala> df.write.format("org.apache.spark.sql.cassandra").
             options(Map("table" -> "logs", "keyspace" -> "productlogs")).
             save()

6. Read Back the Data Just Saved:

    scala> import org.apache.spark.sql.cassandra._
    scala> val cdf = sqlContext.read.
             format("org.apache.spark.sql.cassandra").
             options(Map("table" -> "logs", "keyspace" -> "productlogs")).
             load()
    scala> cdf.registerTempTable("logs_just_saved")
    scala> sqlContext.sql("SELECT * FROM logs_just_saved LIMIT 1").show
