Reprinting is welcome; please cite the source.
Overview
This article briefly describes how to use spark-cassandra-connector to import a JSON file into a Cassandra database, a comprehensive example of using Spark.
Prerequisites
This article assumes you have read Part 3 of this hands-on series and installed the following software:
- JDK
- Scala
- sbt
- Cassandra
- spark-cassandra-connector
Purpose of the experiment
Import data that exists in a JSON file into a Cassandra database. The official tool Cassandra currently provides for this is json2sstable, which I failed to use successfully, owing to my limited knowledge of Cassandra internals.
However, since Spark SQL can read JSON files, and spark-cassandra-connector provides the ability to save an RDD into the database, I wondered whether the two could be combined.
Create keyspace and table
To reduce complexity, we continue to use the keyspace and table from Part 3:
CREATE KEYSPACE test WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1 };
CREATE TABLE test.kv(key text PRIMARY KEY, value int);
Start spark-shell
Consistent with the description in Part 3:
bin/spark-shell --driver-class-path /root/working/spark-cassandra-connector/spark-cassandra-connector/target/scala-2.10/spark-cassandra-connector_2.10-1.1.0-SNAPSHOT.jar:/root/.ivy2/cache/org.apache.cassandra/cassandra-thrift/jars/cassandra-thrift-2.0.9.jar:/root/.ivy2/cache/org.apache.thrift/libthrift/jars/libthrift-0.9.1.jar:/root/.ivy2/cache/org.apache.cassandra/cassandra-clientutil/jars/cassandra-clientutil-2.0.9.jar:/root/.ivy2/cache/com.datastax.cassandra/cassandra-driver-core/jars/cassandra-driver-core-2.0.4.jar:/root/.ivy2/cache/io.netty/netty/bundles/netty-3.9.0.Final.jar:/root/.ivy2/cache/com.codahale.metrics/metrics-core/bundles/metrics-core-3.0.2.jar:/root/.ivy2/cache/org.slf4j/slf4j-api/jars/slf4j-api-1.7.7.jar:/root/.ivy2/cache/org.apache.commons/commons-lang3/jars/commons-lang3-3.3.2.jar:/root/.ivy2/cache/org.joda/joda-convert/jars/joda-convert-1.2.jar:/root/.ivy2/cache/joda-time/joda-time/jars/joda-time-2.3.jar:/root/.ivy2/cache/org.apache.cassandra/cassandra-all/jars/cassandra-all-2.0.9.jar:/root/.ivy2/cache/org.slf4j/slf4j-log4j12/jars/slf4j-log4j12-1.7.2.jar
Preparing the JSON file
Take the people.json file that ships with Spark as an example; its contents are shown below. Note that each line holds one complete JSON object, which is the format jsonFile expects.
{"name":"Andy", "age":30}{"name":"Justin", "age":19}
Data import
Assuming the people.json file is stored in the $SPARK_HOME directory, execute the following statements after starting spark-shell:
sc.stop
import com.datastax.spark.connector._
import org.apache.spark._
val conf = new SparkConf()
conf.set("spark.cassandra.connection.host", "127.0.0.1")
val sc = new SparkContext("local[2]", "Cassandra Connector Test", conf)
val sqlContext = new org.apache.spark.sql.SQLContext(sc)
val path = "./people.json"
val people = sqlContext.jsonFile(path)
// Inferred schema orders fields alphabetically: age is column 0, name is column 1
people.map(p => (p.getString(1), p.getInt(0))).saveToCassandra("test", "kv", SomeColumns("key", "value"))
Note:
- jsonFile returns a SchemaRDD in which each element is of type Row; saveToCassandra cannot be applied to it directly, so a conversion step, the map above, is needed first (see the filtering sketch after this list for another example of such a transformation)
- the getXXX functions used in the map extract field values, which assumes the data types are known in advance
- finally, saveToCassandra triggers the actual storing of the data
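Because the result of jsonFile behaves like an ordinary RDD of Row objects, other transformations can be chained in before the save. Below is a minimal sketch, reusing sc, sqlContext, and people from the session above; the age threshold of 18 is an invented example value, not part of the original experiment.
// Keep only rows whose age (column 0) is at least 18, then save as before
val adults = people.filter(p => p.getInt(0) >= 18)
adults.map(p => (p.getString(1), p.getInt(0))).saveToCassandra("test", "kv", SomeColumns("key", "value"))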
Another point worth documenting: if the table created in Cassandra uses a UUID as its primary key, use the following Scala function to generate UUIDs.
import java.util.UUID
UUID.randomUUID
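As a minimal sketch of how this could be combined with saveToCassandra (the table test.kv_uuid and its schema are hypothetical, invented here for illustration):
// Hypothetical table: CREATE TABLE test.kv_uuid(key uuid PRIMARY KEY, value int);
import com.datastax.spark.connector._
import java.util.UUID
val rows = sc.parallelize(Seq((UUID.randomUUID, 1), (UUID.randomUUID, 2)))
rows.saveToCassandra("test", "kv_uuid", SomeColumns("key", "value"))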
Verification steps
Use cqlsh to check whether the data has actually been written to the test.kv table.
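Alternatively, the data can be read back from the same spark-shell session through the connector's cassandraTable method; a quick sanity check, assuming the session above is still running:
// Read the table back as an RDD of CassandraRow and print each row
val saved = sc.cassandraTable("test", "kv")
saved.collect.foreach(println)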
Summary
This experiment combines the following pieces of knowledge:
- Spark SQL
- Spark RDD transformation functions
- Spark-cassandra-connector