Reprinting is welcome; please cite the source.
Overview
This article briefly describes how to use spark-cassandra-connector to import a JSON file into a Cassandra database, a comprehensive example of using Spark.
Prerequisites
This article assumes you have read Part 3 of this hands-on series and installed the following software:
- JDK
- Scala
- sbt
- Cassandra
- spark-cassandra-connector
Purpose of the experiment
Import data that exists in a JSON file into a Cassandra database. The official tool Cassandra currently provides for this is json2sstable, which I failed to use successfully, owing to my limited knowledge of Cassandra internals.
However, since Spark SQL can read JSON files, and spark-cassandra-connector provides the ability to save an RDD into the database, I wondered whether the two could be combined.
Create keyspace and table
To reduce complexity, we continue to use the keyspace and table from Part 3:
CREATE KEYSPACE test WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1 };
CREATE TABLE test.kv(key text PRIMARY KEY, value int);
Start spark-shell
Consistent with the description in Part 3:
bin/spark-shell --driver-class-path /root/working/spark-cassandra-connector/spark-cassandra-connector/target/scala-2.10/spark-cassandra-connector_2.10-1.1.0-SNAPSHOT.jar:/root/.ivy2/cache/org.apache.cassandra/cassandra-thrift/jars/cassandra-thrift-2.0.9.jar:/root/.ivy2/cache/org.apache.thrift/libthrift/jars/libthrift-0.9.1.jar:/root/.ivy2/cache/org.apache.cassandra/cassandra-clientutil/jars/cassandra-clientutil-2.0.9.jar:/root/.ivy2/cache/com.datastax.cassandra/cassandra-driver-core/jars/cassandra-driver-core-2.0.4.jar:/root/.ivy2/cache/io.netty/netty/bundles/netty-3.9.0.Final.jar:/root/.ivy2/cache/com.codahale.metrics/metrics-core/bundles/metrics-core-3.0.2.jar:/root/.ivy2/cache/org.slf4j/slf4j-api/jars/slf4j-api-1.7.7.jar:/root/.ivy2/cache/org.apache.commons/commons-lang3/jars/commons-lang3-3.3.2.jar:/root/.ivy2/cache/org.joda/joda-convert/jars/joda-convert-1.2.jar:/root/.ivy2/cache/joda-time/joda-time/jars/joda-time-2.3.jar:/root/.ivy2/cache/org.apache.cassandra/cassandra-all/jars/cassandra-all-2.0.9.jar:/root/.ivy2/cache/org.slf4j/slf4j-log4j12/jars/slf4j-log4j12-1.7.2.jar
Preparing the JSON file
Take the people.json file that ships with Spark as an example; its contents are shown below. Note that each line holds one complete JSON object, which is the format jsonFile expects.
{"name":"Andy", "age":30}{"name":"Justin", "age":19}
Data import
Assuming the people.json file is stored in the $SPARK_HOME directory, execute the following statements after starting spark-shell:
sc.stop
import com.datastax.spark.connector._
import org.apache.spark._
val conf = new SparkConf()
conf.set("spark.cassandra.connection.host", "127.0.0.1")
val sc = new SparkContext("local[2]", "Cassandra Connector Test", conf)
val sqlContext = new org.apache.spark.sql.SQLContext(sc)
val path = "./people.json"
val people = sqlContext.jsonFile(path)
// Inferred schema orders fields alphabetically: age is column 0, name is column 1
people.map(p => (p.getString(1), p.getInt(0))).saveToCassandra("test", "kv", SomeColumns("key", "value"))
Note:
- jsonFile returns a SchemaRDD in which each element is of type Row; saveToCassandra cannot be applied to it directly, so a conversion step, the map above, is needed first (see the filtering sketch after this list for another example of such a transformation)
- the getXXX functions used in the map extract field values, which assumes the data types are known in advance
- finally, saveToCassandra triggers the actual storing of the data
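Because the result of jsonFile behaves like an ordinary RDD of Row objects, other transformations can be chained in before the save. Below is a minimal sketch, reusing sc, sqlContext, and people from the session above; the age threshold of 18 is an invented example value, not part of the original experiment.
// Keep only rows whose age (column 0) is at least 18, then save as before
val adults = people.filter(p => p.getInt(0) >= 18)
adults.map(p => (p.getString(1), p.getInt(0))).saveToCassandra("test", "kv", SomeColumns("key", "value"))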
Another point worth documenting: if the table created in Cassandra uses a UUID as its primary key, use the following Scala function to generate UUIDs.
import java.util.UUID
UUID.randomUUID
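As a minimal sketch of how this could be combined with saveToCassandra (the table test.kv_uuid and its schema are hypothetical, invented here for illustration):
// Hypothetical table: CREATE TABLE test.kv_uuid(key uuid PRIMARY KEY, value int);
import com.datastax.spark.connector._
import java.util.UUID
val rows = sc.parallelize(Seq((UUID.randomUUID, 1), (UUID.randomUUID, 2)))
rows.saveToCassandra("test", "kv_uuid", SomeColumns("key", "value"))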
Verification steps
Use cqlsh to check whether the data has actually been written to the test.kv table.
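Alternatively, the data can be read back from the same spark-shell session through the connector's cassandraTable method; a quick sanity check, assuming the session above is still running:
// Read the table back as an RDD of CassandraRow and print each row
val saved = sc.cassandraTable("test", "kv")
saved.collect.foreach(println)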
Summary
This experiment combines the following pieces of knowledge:
- Spark SQL
- Spark RDD transformation functions
- Spark-cassandra-connector