Apache Spark in Practice 4: Using Spark to Import a JSON File into Cassandra

Source: Internet
Author: User
Tags: cassandra, datastax, spark, rdd

Reprinting is welcome; please credit the source.

Overview

This article briefly describes how to use spark-cassandra-connector to import a JSON file into a Cassandra database. It serves as an end-to-end example of using Spark.

Prerequisites

This article assumes that you have read part 3 of this series and installed the following software:

    1. JDK
    2. Scala
    3. sbt
    4. Cassandra
    5. spark-cassandra-connector

Goal of the experiment

The goal is to import data stored in a JSON file into a Cassandra database. The official tool that Cassandra currently provides for this is json2sstable, but I did not manage to get it working, owing to my limited knowledge of Cassandra itself.

However, Spark SQL can read JSON files, and spark-cassandra-connector can save an RDD to the database, so I wondered whether the two could be combined.

Create keyspace and table

To keep things simple, we continue to use the keyspace and table from part 3:

CREATE KEYSPACE test WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1 };
CREATE TABLE test.kv(key text PRIMARY KEY, value int);

Start spark-shell

The command is the same as the one described in part 3.

bin/spark-shell --driver-class-path /root/working/spark-cassandra-connector/spark-cassandra-connector/target/scala-2.10/spark-cassandra-connector_2.10-1.1.0-SNAPSHOT.jar:/root/.ivy2/cache/org.apache.cassandra/cassandra-thrift/jars/cassandra-thrift-2.0.9.jar:/root/.ivy2/cache/org.apache.thrift/libthrift/jars/libthrift-0.9.1.jar:/root/.ivy2/cache/org.apache.cassandra/cassandra-clientutil/jars/cassandra-clientutil-2.0.9.jar:/root/.ivy2/cache/com.datastax.cassandra/cassandra-driver-core/jars/cassandra-driver-core-2.0.4.jar:/root/.ivy2/cache/io.netty/netty/bundles/netty-3.9.0.Final.jar:/root/.ivy2/cache/com.codahale.metrics/metrics-core/bundles/metrics-core-3.0.2.jar:/root/.ivy2/cache/org.slf4j/slf4j-api/jars/slf4j-api-1.7.7.jar:/root/.ivy2/cache/org.apache.commons/commons-lang3/jars/commons-lang3-3.3.2.jar:/root/.ivy2/cache/org.joda/joda-convert/jars/joda-convert-1.2.jar:/root/.ivy2/cache/joda-time/joda-time/jars/joda-time-2.3.jar:/root/.ivy2/cache/org.apache.cassandra/cassandra-all/jars/cassandra-all-2.0.9.jar:/root/.ivy2/cache/org.slf4j/slf4j-log4j12/jars/slf4j-log4j12-1.7.2.jar

Preparing the JSON file

Take the people.json file that ships with Spark as an example. Its contents are as follows:

{"name":"Andy", "age":30}
{"name":"Justin", "age":19}

Data import

Assuming that the people.json file is stored in the $SPARK_HOME directory, start spark-shell and then execute the following statements:

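// Stop the SparkContext created by spark-shell so that a new one can be configured
sc.stop
import com.datastax.spark.connector._
import org.apache.spark._
// Tell the connector where Cassandra is listening
val conf = new SparkConf()
conf.set("spark.cassandra.connection.host", "127.0.0.1")
val sc = new SparkContext("local[2]", "Cassandra Connector Test", conf)
val sqlContext = new org.apache.spark.sql.SQLContext(sc)
val path = "./people.json"
val people = sqlContext.jsonFile(path)
// The inferred schema lists the fields alphabetically (age, name), so name is at index 1 and age at index 0
people.map(p => (p.getString(1), p.getInt(0))).saveToCassandra("test", "kv", SomeColumns("key", "value"))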

Notes:

    1. jsonFile returns an RDD whose elements are of type Row. saveToCassandra cannot be applied to it directly; the rows must first be converted, which is what the map step does
    2. The getXXX functions used in map extract a value whose data type is known in advance
    3. Finally, saveToCassandra is the call that triggers the actual write of the data
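
The conversion described in the notes can be illustrated without a cluster. The following is only a sketch in plain Scala: `FakeRow` and `RowMappingSketch` are hypothetical names, with `FakeRow` standing in for Spark SQL's Row as a sequence of values in schema order, and the sketch assumes the two fields are ordered as (age, name). It shows the positional extraction into (key, value) pairs and one possible way to skip records that lack a field.

```scala
object RowMappingSketch {
  // Hypothetical stand-in for a Spark SQL Row: the values in schema order.
  // ASSUMPTION: the fields are ordered (age, name).
  type FakeRow = Seq[Any]

  // Convert one row into a (key, value) pair. Rows with a missing (null)
  // or wrongly typed field yield None so that they can be skipped.
  def toKeyValue(row: FakeRow): Option[(String, Int)] = row match {
    case Seq(age: Int, name: String) => Some((name, age))
    case _                           => None
  }

  // Convert every row, silently dropping the ones that cannot be converted.
  def convert(rows: Seq[FakeRow]): Seq[(String, Int)] =
    rows.flatMap(r => toKeyValue(r).toList)
}
```

For example, `RowMappingSketch.convert(Seq(Seq(30, "Andy"), Seq(19, "Justin")))` yields `Seq(("Andy", 30), ("Justin", 19))`. In the real job the same idea would be expressed with the Row accessors, for instance by checking `isNullAt` before calling `getInt`.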

One more point worth noting: if the table created in Cassandra uses a UUID as its primary key, the following Scala call generates one:

import java.util.UUID
UUID.randomUUID
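
As a rough illustration of how such a key could be attached to each record before saving, here is a minimal sketch in plain Scala. The names `UuidKeySketch` and `withUuidKey` are hypothetical and not part of Spark or the connector; the sketch only pairs every record with a freshly generated random UUID.

```scala
import java.util.UUID

object UuidKeySketch {
  // Pair each record with a newly generated random (version 4) UUID.
  def withUuidKey[A](records: Seq[A]): Seq[(UUID, A)] =
    records.map(r => (UUID.randomUUID(), r))
}
```

In the Spark job the same call would sit inside the map step, and the table and column names passed to saveToCassandra would be whatever your own schema defines.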

Verification steps

Use cqlsh to check that the data was actually written to the test.kv table.

Summary

This experiment combines the following topics:

    1. Spark SQL
    2. Spark RDD transformation functions
    3. spark-cassandra-connector
