A Spark Streaming sample that accesses a Redis database

Source: Internet

Recently I have written a number of stream-processing programs on Spark Streaming. The program architecture is shown in the diagram below.

(architecture diagram not reproduced in this copy)

The program runs on Spark Streaming, and my goal was to make the Kafka and Redis parameters configurable at startup. The code below is in Java.

In the reference implementation I started from, the Redis server address was hard-coded, so every time the program moved to a new environment the code had to be changed and recompiled.

Working this out took some time, so I am writing it up here to share it and save others the effort.


As the diagram above shows, Spark is a distributed engine: a Redis pool created in the driver must be recreated on every worker. The reference article defines a Redis connection-pool management class in which the pool is a static member, created automatically when the JVM loads the class. That falls short of what I wanted, because the server address cannot be supplied as a startup parameter.

My approach: create the Redis management object in the driver, broadcast it, and then fetch the broadcast object on the workers. This makes the Redis address a runtime parameter, while the management object is still instantiated only once on each worker.

Driver

The driver specifies the serializer. Spark supports two serialization methods, Java and Kryo; Kryo is the more efficient of the two.

The documentation says that Kryo requires registering the classes to be serialized, but my program ran successfully without registering them.
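Registration is indeed optional by default; Kryo then writes full class names instead of compact IDs. For reference, a minimal sketch of what explicit registration would look like (this is a fragment that presupposes a SparkConf from spark-core; `RedisClient` here refers to the class defined later in this article):

```java
// Fragment only: requires spark-core on the classpath.
// Registering classes lets Kryo write a small numeric ID instead of the full class name.
sparkConf.registerKryoClasses(new Class<?>[]{ RedisClient.class });
// Optionally fail fast if an unregistered class is ever serialized:
sparkConf.set("spark.kryo.registrationRequired", "true");
```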

public static void main(String[] args) {
    if (args.length < 3) {
        System.err.println("Usage: kafka_spark_redis <brokers> <topics> <redisServer>\n" +
                "  <brokers>     Kafka broker list\n" +
                "  <topics>      list of topics to consume\n" +
                "  <redisServer> Redis server address\n");
        System.exit(1);
    }

    /* Parse parameters */
    String brokers = args[0];
    String topics = args[1];
    String redisServer = args[2];

    // Create the streaming context with two-second batches
    SparkConf sparkConf = new SparkConf().setAppName("kafka_spark_redis");
    // sparkConf.set("spark.serializer", "org.apache.spark.serializer.JavaSerializer"); // Java serialization is slower than Kryo
    sparkConf.set("spark.serializer", "org.apache.spark.serializer.KryoSerializer");
    // sparkConf.set("spark.kryo.registrator", "MyRegistrator");
    JavaStreamingContext jssc = new JavaStreamingContext(sparkConf, Durations.seconds(2));
    JavaSparkContext sc = jssc.sparkContext();

    HashSet<String> topicsSet = new HashSet<String>(Arrays.asList(topics.split(",")));
    HashMap<String, String> kafkaParams = new HashMap<String, String>();
    kafkaParams.put("metadata.broker.list", brokers);
    kafkaParams.put("group.id", "kakou-test");

    // Create the Redis connection-pool management object
    RedisClient redisClient = new RedisClient(redisServer);

    // Broadcast the Redis connection-pool management object
    final Broadcast<RedisClient> broadcastRedis = sc.broadcast(redisClient);

    // Create the stream processing object
    JavaPairInputDStream<String, String> messages = KafkaUtils.createDirectStream(
            jssc,
            String.class,        /* Kafka key class */
            String.class,        /* Kafka value class */
            StringDecoder.class, /* key decoder class */
            StringDecoder.class, /* value decoder class */
            kafkaParams,         /* Kafka parameters, e.g. the broker list */
            topicsSet            /* topic names to consume */
    );

    // Map each Kafka (key, value) record to its value
    JavaDStream<String> lines = messages.map(new Function<Tuple2<String, String>, String>() {
        @Override
        public String call(Tuple2<String, String> tuple2) {
            return tuple2._2();
        }
    });

    /* ... remainder omitted ... */
}
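Once the job is compiled into a jar, all three parameters are supplied on the command line. A hypothetical invocation (the class name, jar name, and host names are all assumptions, not taken from the article):

```shell
# Hypothetical launch; substitute your own class, jar, broker, topic and Redis hosts.
spark-submit --class KafkaSparkRedis --master yarn \
  kafka-spark-redis.jar \
  broker1:9092,broker2:9092 \
  topicA,topicB \
  redis-host
```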
RedisClient

RedisClient is a class you implement yourself, overriding the write/read serialization and deserialization methods. Note that if you use the Java serializer instead, you need to implement a different interface (java.io.Serializable or Externalizable).

The write serialization method is invoked when the driver broadcasts the object.

public class RedisClient implements KryoSerializable {
    public static JedisPool jedisPool;
    public String host;

    public RedisClient() {
        Runtime.getRuntime().addShutdownHook(new CleanWorkThread());
    }

    public RedisClient(String host) {
        this.host = host;
        Runtime.getRuntime().addShutdownHook(new CleanWorkThread());
        jedisPool = new JedisPool(new GenericObjectPoolConfig(), host);
    }

    static class CleanWorkThread extends Thread {
        @Override
        public void run() {
            System.out.println("Destroy jedis pool");
            if (null != jedisPool) {
                jedisPool.destroy();
                jedisPool = null;
            }
        }
    }

    public Jedis getResource() {
        return jedisPool.getResource();
    }

    public void returnResource(Jedis jedis) {
        jedisPool.returnResource(jedis);
    }

    public void write(Kryo kryo, Output output) {
        kryo.writeObject(output, host);
    }

    public void read(Kryo kryo, Input input) {
        host = kryo.readObject(input, String.class);
        this.jedisPool = new JedisPool(new GenericObjectPoolConfig(), host);
    }
}
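The rebuild-on-deserialization trick does not depend on Kryo. Here is a minimal self-contained sketch of the same idea using plain java.io serialization, with a counter standing in for the Jedis pool; every name in it is illustrative, not the article's actual classes:

```java
import java.io.*;

// Carries only `host` across the wire and rebuilds its "pool" on arrival,
// mirroring RedisClient.write/read above. The pool is a plain Object here.
class PoolHolder implements Serializable {
    static int poolsCreated = 0;          // counts pool constructions in this JVM
    String host;
    transient Object pool;                // never serialized, like the JedisPool

    PoolHolder(String host) {
        this.host = host;
        this.pool = newPool(host);
    }

    static Object newPool(String host) {
        poolsCreated++;
        return new Object();              // stands in for new JedisPool(..., host)
    }

    // Called on deserialization: restore fields, then rebuild the pool.
    private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException {
        in.defaultReadObject();           // restores `host`
        this.pool = newPool(host);        // recreate the transient resource
    }
}

public class Demo {
    public static void main(String[] args) throws Exception {
        PoolHolder driverSide = new PoolHolder("redis-host");   // "driver": 1 pool so far

        // Serialize once (what broadcasting does on the driver)...
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        new ObjectOutputStream(bos).writeObject(driverSide);

        // ...and deserialize on a "worker" (what reading the broadcast does there).
        ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bos.toByteArray()));
        PoolHolder workerSide = (PoolHolder) in.readObject();

        System.out.println(workerSide.host);            // prints: redis-host
        System.out.println(PoolHolder.poolsCreated);    // prints: 2 (driver + worker)
        System.out.println(workerSide.pool != null);    // prints: true
    }
}
```

Only the host string crosses the wire; the heavyweight pool is rebuilt locally on each side, which is exactly why broadcasting the management object makes the address a parameter without shipping connections.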
Worker

In foreachRDD on the worker, we fetch the broadcast variable. The first access on an executor triggers RedisClient's no-argument constructor and then its read deserialization method, and the trick is precisely that the Redis pool is created inside that deserialization method.

The task matches vehicle license plates against a blacklist and, on a successful match, saves the result to Redis.

// Note: the generic type parameters were lost in the original listing;
// <String, String> below is a reconstruction, not the author's exact types.
pairCar.foreachRDD(new Function2<JavaPairRDD<String, String>, Time, Void>() {
    public Void call(JavaPairRDD<String, String> rdd, Time time) {
        rdd.foreachPartition(new VoidFunction<Iterator<Tuple2<String, String>>>() {
            public void call(Iterator<Tuple2<String, String>> it) {
                String tmp1;
                String tmp2;
                Date now = new Date();
                RedisClient redisClient = broadcastRedis.getValue();
                Jedis jedis = redisClient.getResource();

                // ......

                redisClient.returnResource(jedis);
            }
        });
        return null;
    }
});
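The per-partition pattern above, borrow one connection for the whole partition, process every record, then return it, can be sketched without Spark or Jedis. All names below are illustrative stand-ins:

```java
import java.util.*;

public class PartitionSketch {
    // Stand-in for a connection pool: tracks how many connections are checked out.
    static class SimplePool {
        int borrowed = 0;
        String getResource()          { borrowed++; return "conn"; }
        void returnResource(String c) { borrowed--; }
    }

    // Mirrors the body of foreachPartition: one getResource/returnResource
    // pair per partition, however many records the iterator yields.
    static List<String> processPartition(Iterator<String> plates,
                                         Set<String> blacklist,
                                         SimplePool pool) {
        List<String> hits = new ArrayList<>();
        String conn = pool.getResource();
        try {
            while (plates.hasNext()) {
                String plate = plates.next();
                if (blacklist.contains(plate)) {
                    hits.add(plate);       // the real job would write to Redis here
                }
            }
        } finally {
            pool.returnResource(conn);     // returned exactly once per partition
        }
        return hits;
    }

    public static void main(String[] args) {
        SimplePool pool = new SimplePool();
        Set<String> blacklist = new HashSet<>(Arrays.asList("A123", "B777"));
        List<String> partition = Arrays.asList("A123", "C001", "B777");

        List<String> hits = processPartition(partition.iterator(), blacklist, pool);
        System.out.println(hits);          // prints: [A123, B777]
        System.out.println(pool.borrowed); // prints: 0 (connection returned)
    }
}
```

Acquiring the connection once per partition instead of once per record is what makes foreachPartition the right place for pool access.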
Conclusion

Spark encapsulates distributed computing well, but many scenarios still require understanding its working mechanism, and many problems and performance optimizations are closely tied to how Spark works.
