Recently I wrote a number of streaming computation handlers on Spark Streaming; the program architecture is shown in the diagram below.
The program runs on Spark Streaming, and my goal was to have the Kafka and Redis parameters supplied at startup.
It is implemented in Java.
Previously the Redis server address was hard-coded, so every time the program moved to a new environment the code had to be changed and recompiled.
Working this out took some time, so I am writing it up here to save effort for those who come after.
As the diagram above shows, Spark is a distributed engine: a Redis pool created in the driver must be recreated on each worker. A reference article I found defines a Redis connection pool management class in which the pool is a static member of the class, created automatically when the JVM loads the class. That falls short of what I need, because the Redis address is fixed at class-load time rather than passed in at startup.
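For contrast, the reference article's pattern looks roughly like this (a sketch; the class name and address are illustrative, not the article's actual code):

import redis.clients.jedis.Jedis;
import redis.clients.jedis.JedisPool;
import org.apache.commons.pool2.impl.GenericObjectPoolConfig;

// Sketch of the static-pool approach: the pool is built when the JVM
// loads the class, so the address cannot be a runtime parameter.
public class StaticRedisPool {
    private static final JedisPool POOL =
            new JedisPool(new GenericObjectPoolConfig(), "127.0.0.1");

    public static Jedis getResource() {
        return POOL.getResource();
    }
}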
My solution: create the Redis management object in the driver, broadcast it, and fetch the broadcast object on the workers. This makes the Redis address a runtime parameter, while the management object is still instantiated only once per worker.
Driver
The driver specifies the serialization method. Spark supports two serializers, Java and Kryo; Kryo is the more efficient of the two.
The documentation says that the Kryo serializer requires registering classes, but my program ran successfully without registering anything.
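For reference, if registration did turn out to be necessary, a registrator would look roughly like this (a sketch, matching the MyRegistrator name in the commented-out line of the driver code):

import com.esotericsoftware.kryo.Kryo;
import org.apache.spark.serializer.KryoRegistrator;

// Sketch of a Kryo registrator; only needed if class registration is enforced
public class MyRegistrator implements KryoRegistrator {
    @Override
    public void registerClasses(Kryo kryo) {
        kryo.register(RedisClient.class); // the broadcast payload defined later in this post
    }
}

The driver code follows: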
public static void main(String[] args) {
    if (args.length < 3) {
        System.err.println("Usage: kafka_spark_redis <brokers> <topics> <redisServer>\n" +
                "  <brokers>     Kafka broker list\n" +
                "  <topics>      list of topics to consume\n" +
                "  <redisServer> Redis server address\n\n");
        System.exit(1);
    }

    /* Parse parameters */
    String brokers = args[0];
    String topics = args[1];
    String redisServer = args[2];

    // Create the streaming context; data is processed in two-second batches
    SparkConf sparkConf = new SparkConf().setAppName("kafka_spark_redis");
    // sparkConf.set("spark.serializer", "org.apache.spark.serializer.JavaSerializer"); // Java serialization is slower than Kryo
    sparkConf.set("spark.serializer", "org.apache.spark.serializer.KryoSerializer");
    // sparkConf.set("spark.kryo.registrator", "MyRegistrator");
    JavaStreamingContext jssc = new JavaStreamingContext(sparkConf, Durations.seconds(2));
    JavaSparkContext sc = jssc.sparkContext();

    HashSet<String> topicsSet = new HashSet<String>(Arrays.asList(topics.split(",")));
    HashMap<String, String> kafkaParams = new HashMap<String, String>();
    kafkaParams.put("metadata.broker.list", brokers);
    kafkaParams.put("group.id", "kakou-test");

    // Redis connection pool management class
    RedisClient redisClient = new RedisClient(redisServer); // create the Redis connection pool manager

    // Broadcast the Redis connection pool management object
    final Broadcast<RedisClient> broadcastRedis = sc.broadcast(redisClient);

    // Create the stream processing object
    JavaPairInputDStream<String, String> messages = KafkaUtils.createDirectStream(
            jssc,
            String.class,        /* Kafka key class */
            String.class,        /* Kafka value class */
            StringDecoder.class, /* key decoder class */
            StringDecoder.class, /* value decoder class */
            kafkaParams,         /* Kafka parameters, e.g. the broker list */
            topicsSet            /* topics to consume */
    );

    // Extract the value from each Kafka key-value pair
    JavaDStream<String> lines = messages.map(new Function<Tuple2<String, String>, String>() {
        @Override
        public String call(Tuple2<String, String> tuple2) {
            // take the value
            return tuple2._2();
        }
    });

    /* much code omitted */
    ......
}
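With this in place, the Kafka and Redis addresses are ordinary command-line arguments. A launch might look like the following (the jar name, class name, and addresses are illustrative):

spark-submit --master yarn --class com.example.KafkaSparkRedis kafka-spark-redis.jar \
    kafkahost1:9092,kafkahost2:9092 topic1,topic2 redishost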
RedisClient
RedisClient is a class you implement yourself. It overrides the write/read pair of serialization and deserialization functions; note that if you use the Java serializer you need to implement a different interface instead.
The write serialization function is triggered when the driver broadcasts the object.
public class RedisClient implements KryoSerializable {
    public static JedisPool jedisPool;
    public String host;

    public RedisClient() {
        Runtime.getRuntime().addShutdownHook(new CleanWorkThread());
    }

    public RedisClient(String host) {
        this.host = host;
        Runtime.getRuntime().addShutdownHook(new CleanWorkThread());
        jedisPool = new JedisPool(new GenericObjectPoolConfig(), host);
    }

    static class CleanWorkThread extends Thread {
        @Override
        public void run() {
            System.out.println("Destroy jedis pool");
            if (null != jedisPool) {
                jedisPool.destroy();
                jedisPool = null;
            }
        }
    }

    public Jedis getResource() {
        return jedisPool.getResource();
    }

    public void returnResource(Jedis jedis) {
        jedisPool.returnResource(jedis);
    }

    // Called when the driver broadcasts the object: only the host is serialized
    public void write(Kryo kryo, Output output) {
        kryo.writeObject(output, host);
    }

    // Called when a worker deserializes the broadcast: recreate the pool locally
    public void read(Kryo kryo, Input input) {
        host = kryo.readObject(input, String.class);
        jedisPool = new JedisPool(new GenericObjectPoolConfig(), host);
    }
}
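As noted above, the Java serializer would require a different interface. For comparison, a minimal sketch of the same trick with java.io.Serializable (the class name is illustrative, and I assume the same Jedis/commons-pool versions as the code above):

import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.Serializable;
import redis.clients.jedis.JedisPool;
import org.apache.commons.pool2.impl.GenericObjectPoolConfig;

public class RedisClientJava implements Serializable {
    public static JedisPool jedisPool; // static fields are not serialized
    public String host;

    public RedisClientJava(String host) {
        this.host = host;
        jedisPool = new JedisPool(new GenericObjectPoolConfig(), host);
    }

    // Hook invoked during deserialization on the worker: recreate the pool there
    private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException {
        in.defaultReadObject();
        jedisPool = new JedisPool(new GenericObjectPoolConfig(), host);
    }
}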
Worker
On the worker side, fetch the broadcast variable inside foreachRDD. The first fetch triggers RedisClient's no-argument constructor and then its read deserialization function, which is why we create the Redis pool inside that deserialization function.
The job matches vehicle license plates against a blacklist and saves successful matches to Redis.
pairCar.foreachRDD(new Function2<JavaPairRDD<String, String>, Time, Void>() {
    @Override
    public Void call(JavaPairRDD<String, String> rdd, Time time) {
        rdd.foreachPartition(new VoidFunction<Iterator<Tuple2<String, String>>>() {
            @Override
            public void call(Iterator<Tuple2<String, String>> iterator) {
                String tmp1;
                String tmp2;
                Date now = new Date();
                // Fetching the broadcast variable deserializes RedisClient
                // on the worker (once) and creates the pool there
                RedisClient redisClient = broadcastRedis.getValue();
                Jedis jedis = redisClient.getResource();
                ......
                redisClient.returnResource(jedis);
            }
        });
        return null;
    }
});
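The elided section is the business logic. Purely to illustrate the pattern (the Redis key names and field layout here are invented, not the original code), the match-and-save step might look like:

// Hypothetical matching logic; "blacklist" and "hit:<plate>" are invented names
while (iterator.hasNext()) {
    Tuple2<String, String> record = iterator.next();
    String plate = record._1();
    if (jedis.sismember("blacklist", plate)) {
        // the plate is blacklisted: record the hit in Redis
        jedis.hset("hit:" + plate, "lastSeen", String.valueOf(now.getTime()));
    }
}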
Conclusion
Spark encapsulates the details of distributed computation, but many scenarios still require an understanding of its working mechanism; many problems, and most performance optimizations, are closely tied to how Spark works.