This article shows how to use Spark as the analysis engine and Cassandra as the data store, with a Spring Boot application acting as the Spark driver.

1. Prerequisites

  • Install Spark (this article uses Spark 1.5.1, installed in /opt/spark)
  • Install Cassandra (3.0+)

Create keyspace

CREATE KEYSPACE hfcb WITH REPLICATION = { 'class' : 'SimpleStrategy', 'replication_factor' : 3 };

Create table

USE hfcb;
CREATE TABLE person ( id text PRIMARY KEY, first_name text, last_name text);

Insert Test Data

insert into person (id,first_name,last_name) values('1','wang','yunfei');
insert into person (id,first_name,last_name) values('2','peng','chao');
insert into person (id,first_name,last_name) values('3','li','jian');
insert into person (id,first_name,last_name) values('4','zhang','jie');
insert into person (id,first_name,last_name) values('5','liang','wei');
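Instead of typing the statements by hand, the test rows can also be generated. The following is a minimal standalone sketch using only the JDK; the class PersonInserts and its helper methods are hypothetical and not part of the original project. It prints equivalent INSERT statements for the five test rows and escapes any single quote inside a value by doubling it, as CQL string literals require.

```java
public class PersonInserts {

    // CQL string literals use single quotes; an embedded quote is escaped by doubling it.
    static String quote(String value) {
        return "'" + value.replace("'", "''") + "'";
    }

    // Builds one INSERT statement for the person table.
    static String insertFor(String id, String firstName, String lastName) {
        return "INSERT INTO person (id, first_name, last_name) VALUES ("
                + quote(id) + ", " + quote(firstName) + ", " + quote(lastName) + ");";
    }

    public static void main(String[] args) {
        // The same five test rows used in this article.
        String[][] rows = {
                {"1", "wang", "yunfei"},
                {"2", "peng", "chao"},
                {"3", "li", "jian"},
                {"4", "zhang", "jie"},
                {"5", "liang", "wei"}
        };
        for (String[] row : rows) {
            System.out.println(insertFor(row[0], row[1], row[2]));
        }
    }
}
```

The printed statements can be pasted into cqlsh while the hfcb keyspace is in use.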

2. spark-cassandra-connector Installation

To let Spark 1.5.1 use Cassandra as its data store, add the following jar dependencies to Spark. This article places them in /opt/spark/managed-lib/, but any directory will do:

  • cassandra-clientutil-3.0.2.jar
  • cassandra-driver-core-3.1.4.jar
  • guava-16.0.1.jar
  • cassandra-thrift-3.0.2.jar
  • joda-convert-1.2.jar
  • joda-time-2.9.9.jar
  • libthrift-0.9.1.jar
  • spark-cassandra-connector_2.10-1.5.1.jar

In the /opt/spark/conf directory, create a spark-env.sh file so that the jars in /opt/spark/managed-lib/ are added to Spark's classpath.
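A missing or misnamed jar typically surfaces only at run time as a class-loading error. A small standalone check such as the following sketch can catch that earlier. It lists which of the jars above are absent from a directory; the class ManagedLibCheck is hypothetical, and the default directory /opt/spark/managed-lib is simply this article's example location.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

public class ManagedLibCheck {

    // Jar names copied from the dependency list in this article.
    static final List<String> REQUIRED_JARS = List.of(
            "cassandra-clientutil-3.0.2.jar",
            "cassandra-driver-core-3.1.4.jar",
            "guava-16.0.1.jar",
            "cassandra-thrift-3.0.2.jar",
            "joda-convert-1.2.jar",
            "joda-time-2.9.9.jar",
            "libthrift-0.9.1.jar",
            "spark-cassandra-connector_2.10-1.5.1.jar");

    // Returns the required jars that are not present as regular files in libDir.
    static List<String> missingJars(Path libDir) {
        List<String> missing = new ArrayList<>();
        for (String jar : REQUIRED_JARS) {
            if (!Files.isRegularFile(libDir.resolve(jar))) {
                missing.add(jar);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        Path libDir = Paths.get(args.length > 0 ? args[0] : "/opt/spark/managed-lib");
        List<String> missing = missingJars(libDir);
        if (missing.isEmpty()) {
            System.out.println("All connector jars found in " + libDir);
        } else {
            System.out.println("Missing from " + libDir + ": " + missing);
        }
    }
}
```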


3. Spring Boot Application Development

Add the spark-cassandra-connector and Spark dependencies

<dependency>
    <groupId>com.datastax.spark</groupId>
    <artifactId>spark-cassandra-connector_2.10</artifactId>
    <version>1.5.1</version>
</dependency>
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.10</artifactId>
    <version>1.5.1</version>
</dependency>
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-sql_2.10</artifactId>
    <version>1.5.1</version>
</dependency>

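Configure the Spark master URL and the Cassandra connection settings in application.yml. The configuration class below also reads cassandra.keyspace, so it must be set as well:

spark.master: spark://master:7077
cassandra.host: hfcb
cassandra.keyspace: hfcb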

Note that master in spark://master:7077 is a hostname, not an IP address. Map it to the Spark master's IP address in the local hosts file (for example /etc/hosts).
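To confirm the hosts-file mapping before starting the application, a minimal sketch like the following can help. It uses only the JDK, and the class SparkMasterCheck is hypothetical. It extracts the host and port from a spark.master value and tries to resolve the host name. It assumes the standalone master URL form spark://host:port and falls back to 7077, the standalone master's default port, when no port is given.

```java
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.URI;
import java.net.UnknownHostException;

public class SparkMasterCheck {

    static final int DEFAULT_MASTER_PORT = 7077;

    // Parses a standalone master URL such as spark://master:7077 without resolving the host.
    static InetSocketAddress parse(String masterUrl) {
        URI uri = URI.create(masterUrl);
        if (!"spark".equals(uri.getScheme()) || uri.getHost() == null) {
            throw new IllegalArgumentException("Expected spark://host:port, got " + masterUrl);
        }
        int port = uri.getPort() == -1 ? DEFAULT_MASTER_PORT : uri.getPort();
        return InetSocketAddress.createUnresolved(uri.getHost(), port);
    }

    public static void main(String[] args) {
        String masterUrl = args.length > 0 ? args[0] : "spark://master:7077";
        InetSocketAddress master = parse(masterUrl);
        try {
            // Uses the system resolver, which normally consults the hosts file.
            InetAddress address = InetAddress.getByName(master.getHostString());
            System.out.println(master.getHostString() + " -> " + address.getHostAddress()
                    + ", port " + master.getPort());
        } catch (UnknownHostException e) {
            System.out.println(master.getHostString() + " does not resolve; map it in the hosts file");
        }
    }
}
```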

Configure SparkContext and CassandraSQLContext

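import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.cassandra.CassandraSQLContext;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class SparkCassandraConfig {

    @Value("${spark.master}")
    String sparkMasterUrl;

    @Value("${cassandra.host}")
    String cassandraHost;

    @Value("${cassandra.keyspace}")
    String cassandraKeyspace;

    @Bean
    public JavaSparkContext javaSparkContext() {
        SparkConf conf = new SparkConf(true)
                .set("spark.cassandra.connection.host", cassandraHost)
                // Uncomment if Cassandra authentication is enabled:
                // .set("spark.cassandra.auth.username", "cassandra")
                // .set("spark.cassandra.auth.password", "cassandra")
                .set("spark.submit.deployMode", "client");
        return new JavaSparkContext(sparkMasterUrl, "SparkDemo", conf);
    }

    @Bean
    public CassandraSQLContext sqlContext() {
        // Inside a @Configuration class this call returns the singleton JavaSparkContext bean.
        CassandraSQLContext cassandraSQLContext = new CassandraSQLContext(javaSparkContext().sc());
        cassandraSQLContext.setKeyspace(cassandraKeyspace);
        return cassandraSQLContext;
    }
}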

A simple query

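import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.cassandra.CassandraSQLContext;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Repository;

@Repository
public class PersonRepository {

    @Autowired
    CassandraSQLContext cassandraSQLContext;

    public Long countPerson() {
        // Runs Spark SQL against the person table in the configured keyspace.
        DataFrame people = cassandraSQLContext.sql("select * from person order by id");
        return people.count();
    }
}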

Run it like any other Spring Boot application.

Source code: https://github.com/wiselyman/spring-spark-cassandra.git



