Spark: Bulk-Reading Redis Data with a Pipeline (Scala)

Recently, while processing data, I needed to join raw data with data stored in Redis, and I ran into some problems while reading from Redis. I am writing them down here in the hope that they help others. In my experiments, reading Redis one key at a time caused no trouble when the data volume was on the order of 100,000 records, but once it reached tens of millions, problems appeared, even when using Spark's mapPartitions. I therefore turned to Redis pipelining (if you know a better approach, suggestions are welcome). PS: This article targets Scala, because I could not find a Scala example of Redis pipelining online; I hope it serves as a reference for beginners.

This article first shows how to read Redis data key by key in Scala, and then introduces pipelining.

Method 1: Reading Redis Data Key by Key

This article mainly uses the redis.clients.jedis.Jedis library. If you build your Spark job with SBT, you can add the following to build.sbt:

name := "SparkRedisExp"

version := "1.0.0"

scalaVersion := "2.10.4"

libraryDependencies += "org.apache.spark" %% "spark-core" % "1.3.1"

libraryDependencies += "redis.clients" % "jedis" % "2.6.2"

resolvers += "Akka Repository" at "http://repo.akka.io/releases/"

The jedis-2.6.2.jar used here can be downloaded from GitHub: https://github.com/csuldw/workutils/tree/master/spark/deps. The details follow.

Import the Redis Library

First, import the Redis library; here we use redis.clients.jedis.Jedis.

import redis.clients.jedis.Jedis
Connect to Redis

Next, connect to Redis. The main settings are redisHost and redisPort; if the server requires a password, you also need to authenticate.

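val redisHost = "localhost"
val redisPort = 8080 // port of your Redis server (the Redis default is 6379)
val redisPassword = "your-password" // placeholder: replace with your own password
val redisClient = new Jedis(redisHost, redisPort)
redisClient.auth(redisPassword) // only needed if the server requires a password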
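The connection snippet above never releases the connection. One common way to make that automatic is a loan pattern: open the client, authenticate only when a password is configured, and close it in a finally block. The sketch below is an assumption, not part of the original article: `KvClient` is a hypothetical trait and `FakeClient` an in-memory stand-in so the code runs without a Redis server. Jedis does not implement `KvClient`, so with Jedis you would either write a small adapter or put the same try/finally directly around a `Jedis` instance (recent Jedis versions implement `java.io.Closeable`).

```scala
// Hypothetical minimal client interface; Jedis has methods with these names,
// but it does not implement this trait.
trait KvClient extends AutoCloseable {
  def auth(password: String): String
  def get(key: String): String
}

object ConnectionSketch {
  // Opens a client, authenticates only when a non-empty password is configured,
  // runs `use`, and closes the client even if `use` throws.
  def withClient[A](open: => KvClient, password: Option[String])(use: KvClient => A): A = {
    val client = open
    try {
      password.filter(_.nonEmpty).foreach(p => client.auth(p))
      use(client)
    } finally {
      client.close()
    }
  }
}

// Hypothetical in-memory stand-in so the sketch runs without a Redis server.
final class FakeClient(data: Map[String, String]) extends KvClient {
  var authedWith: Option[String] = None
  var closed = false
  def auth(password: String): String = { authedWith = Some(password); "OK" }
  def get(key: String): String = data.getOrElse(key, null)
  def close(): Unit = closed = true
}
```

Passing the password as an `Option` makes the "only if there is a password" rule from the text explicit instead of relying on a commented-out `auth` call.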
Read Redis data

Next, you can read Redis values directly with get:

val keys = Array("key1", "key2", "key3", "key4")
// One GET request (and one network round trip) per key
for (key <- keys) {
  println(redisClient.get(key))
}
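Each `get` call in the loop above is its own request/response round trip to the server, so N keys cost N round trips, and at tens of millions of keys that latency adds up, even with one connection per partition in mapPartitions. The sketch below only models that cost: `CountingClient` and `RowByRow` are hypothetical names for an in-memory stand-in that counts round trips, not Jedis classes.

```scala
// Hypothetical in-memory stand-in for a Redis client that counts network round trips.
final class CountingClient(data: Map[String, String]) {
  var roundTrips = 0

  // Like Jedis.get: one request, one reply, so one round trip per call.
  def get(key: String): String = {
    roundTrips += 1
    data.getOrElse(key, null)
  }
}

object RowByRow {
  // Reads keys one at a time, the same access pattern as the for loop above.
  def readAll(client: CountingClient, keys: Seq[String]): Map[String, String] =
    keys.map(k => k -> client.get(k)).toMap
}
```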

The method above does not use Redis pipelining; it is adequate when the data volume is small. The following describes how to bulk-read Redis data with a pipeline.

Method 2: Bulk-Reading Redis Data with a Pipeline

Compared with the first method, two more classes need to be imported: redis.clients.jedis.Pipeline and redis.clients.jedis.Response.

Import the Related Libraries

import redis.clients.jedis.Jedis
import redis.clients.jedis.Pipeline
import redis.clients.jedis.Response
Connect to Redis

This step is the same as above:

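val redisHost = "localhost"
val redisPort = 8080 // port of your Redis server (the Redis default is 6379)
val redisPassword = "your-password" // placeholder: replace with your own password
val redisClient = new Jedis(redisHost, redisPort)
redisClient.auth(redisPassword) // only needed if the server requires a password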
Use a Pipeline to Read Data (1): Simplified Version

First the code, then the explanation.

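var tempRedisRes = Map[String, Response[String]]()
val keys = Array("key1", "key2", "key3", "key4")
val pp = redisClient.pipelined()
for (key <- keys) {
  // Queue a GET; pp.get returns a Response placeholder, not the value
  tempRedisRes ++= Map(key -> pp.get(key))
}
pp.sync() // flush the queued commands and read all replies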
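The key point of this code is that `pp.get(key)` does not return the value; it returns a placeholder that `pp.sync()` fills in. The toy model below isolates that contract so it can run without Redis. `Deferred` and `ToyPipeline` are hypothetical classes written for illustration: they mimic the behaviour of Jedis's `Response` and `Pipeline` (including failing when a value is read before `sync()`, as Jedis's `Response.get()` does), but they are not the Jedis implementation, which may write commands to the socket before `sync()` and reads the replies in `sync()`.

```scala
import scala.collection.mutable.ArrayBuffer

// Toy model of a pipelined response: the value exists only after sync().
final class Deferred[T] {
  private var value: Option[T] = None

  def set(v: T): Unit = value = Some(v)

  def get: T = value.getOrElse(
    throw new IllegalStateException("call sync() before reading a pipelined response"))
}

// Toy model of a pipeline: get() only queues, sync() answers everything at once.
final class ToyPipeline(data: Map[String, String]) {
  private val queued = ArrayBuffer.empty[(String, Deferred[String])]
  var roundTrips = 0

  // Queue a GET: nothing is answered yet, the caller gets a placeholder back.
  def get(key: String): Deferred[String] = {
    val d = new Deferred[String]
    queued += (key -> d)
    d
  }

  // Answer every queued command in a single round trip and fill in all placeholders.
  def sync(): Unit = {
    roundTrips += 1
    queued.foreach { case (k, d) => d.set(data.getOrElse(k, null)) }
    queued.clear()
  }
}
```

With four keys, the loop from Method 1 costs four round trips, while this pattern costs one, which is the whole point of pipelining.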

Because the get method of the Pipeline returned by Jedis's pipelined() returns a Response[String] rather than the value itself, the code above defines a temporary variable tempRedisRes of type Map[String, Response[String]]: its keys are Strings and its values are the Response[String] objects returned by pp.get(key). When the for loop finishes, pp.sync() flushes the queued commands to the server and reads all the replies, filling in every Response; only then can each value be read with get(), for example tempRedisRes("key1").get. This is how Redis pipelining is used here.

Use a Pipeline to Read Data (2): Enhanced Version

To guard against occasional failures when talking to Redis, we allow a limited number of attempts so that the data is read correctly whenever possible. To do this, wrap the code above in a retry loop:

var tempRedisRes = Map[String, Response[String]]()
val keys = Array("key1", "key2", "key3", "key4")
var tryTimes = 2    // maximum number of attempts
var flag = false    // becomes true once an attempt succeeds
while (tryTimes > 0 && !flag) {
  try {
    val pp = redisClient.pipelined()
    for (key <- keys) {
      tempRedisRes ++= Map(key -> pp.get(key))
    }
    pp.sync()
    flag = true
  } catch {
    case e: Exception =>
      flag = false
      println("redis-timeout " + e)
      tryTimes = tryTimes - 1
  } finally {
    redisClient.disconnect()
  }
}
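The retry loop above can also be factored into a reusable helper. The sketch below is one possible shape, not the author's code: `Retry.withRetries` is a hypothetical name, it uses `scala.util.Try` (which catches non-fatal exceptions, whereas the loop above catches `Exception`), and it takes an `onFailure` callback where you could log the error and call `redisClient.disconnect()`. Building the key-to-Response map inside each attempt, as in the commented usage, means a failed attempt leaves no stale Response objects behind.

```scala
import scala.util.{Failure, Success, Try}

object Retry {
  // Runs `op` up to `maxAttempts` times and returns the first success.
  // `onFailure` is called after every failed attempt; if all attempts fail,
  // the last failure is returned.
  def withRetries[A](maxAttempts: Int)(onFailure: Throwable => Unit)(op: => A): Try[A] = {
    require(maxAttempts > 0, "maxAttempts must be positive")

    @annotation.tailrec
    def loop(attemptsLeft: Int): Try[A] = Try(op) match {
      case success @ Success(_) => success
      case failure @ Failure(e) =>
        onFailure(e)
        if (attemptsLeft > 1) loop(attemptsLeft - 1) else failure
    }

    loop(maxAttempts)
  }
}

// Possible use with Jedis (not executed here; assumes redisClient and keys from the article):
// val values = Retry.withRetries(2) { e =>
//   println("redis-timeout " + e)
//   redisClient.disconnect()
// } {
//   val pp = redisClient.pipelined()
//   val pending = keys.map(k => k -> pp.get(k)).toMap
//   pp.sync()
//   pending.map { case (k, r) => k -> r.get }
// }
```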

Again: pp.get() returns a Response[String], not the value itself; see the documentation of redis.clients.jedis.Pipeline for more information.
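After `sync()`, the map of Response objects still has to be turned into plain values before it can be joined with the raw data. A minimal sketch, assuming you want missing keys as `None`: Jedis returns null from `Response.get()` for a GET on a key that does not exist, and `Option(...)` turns that null into `None`. `ResolveSketch` is a hypothetical helper; it is generic in the response type `R` so it can run with plain functions instead of Jedis objects, and with Jedis you would pass `_.get()` as `read`.

```scala
object ResolveSketch {
  // Turns the key -> pending-response map built before sync() into key -> Option(value).
  // `read` extracts the value from one response (with Jedis: `_.get()`).
  def resolve[R](pending: Map[String, R])(read: R => String): Map[String, Option[String]] =
    pending.map { case (key, response) => key -> Option(read(response)) }

  // Keeps only the keys that actually exist in Redis, e.g. before a join.
  def found[R](pending: Map[String, R])(read: R => String): Map[String, String] =
    resolve(pending)(read).collect { case (key, Some(value)) => key -> value }
}
```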

That is all for this article. For more articles, visit http://www.csuldw.com.
