In general, when you call groupByKey on a Dataset<Row>, you will find that the last parameter of the method requires an Encoder. So how are these encoders defined?
General data types
static Encoder<byte[]> BINARY(): an encoder for arrays of bytes.
static Encoder<Boolean> BOOLEAN(): an encoder for nullable boolean type.
static Encoder<Byte> BYTE(): an encoder for nullable byte type.
static Encoder<java.sql.Date> DATE(): an encoder for nullable date type.
static Encoder<java.math.BigDecimal> DECIMAL(): an encoder for nullable decimal type.
static Encoder<Double> DOUBLE(): an encoder for nullable double type.
static Encoder<Float> FLOAT(): an encoder for nullable float type.
static Encoder<Integer> INT(): an encoder for nullable int type.
static Encoder<Long> LONG(): an encoder for nullable long type.
static Encoder<Short> SHORT(): an encoder for nullable short type.
static Encoder<String> STRING(): an encoder for nullable string type.
static Encoder<java.sql.Timestamp> TIMESTAMP(): an encoder for nullable timestamp type.
Example:
In Java, encoders are obtained by calling static methods on Encoders:

// Create a Dataset<String> using the built-in string encoder.
List<String> data = Arrays.asList("abc", "ABC", "XYZ");
Dataset<String> ds = context.createDataset(data, Encoders.STRING());
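To build intuition for what an Encoder does, here is a toy plain-Java sketch (this is not Spark code, and `SimpleEncoder` is a made-up interface for illustration): an encoder pairs a JVM type with a binary representation, so values can be serialized into Spark's internal format and deserialized back.

```java
import java.nio.charset.StandardCharsets;

// Toy analogy of Spark's Encoder: a two-way mapping between a JVM type and bytes.
public class EncoderSketch {
    public interface SimpleEncoder<T> {
        byte[] toBytes(T value);
        T fromBytes(byte[] bytes);
    }

    // Analogue of Encoders.STRING(): strings round-trip through UTF-8 bytes.
    public static final SimpleEncoder<String> STRING = new SimpleEncoder<String>() {
        public byte[] toBytes(String v) { return v.getBytes(StandardCharsets.UTF_8); }
        public String fromBytes(byte[] b) { return new String(b, StandardCharsets.UTF_8); }
    };

    public static void main(String[] args) {
        byte[] bytes = STRING.toBytes("abc");
        System.out.println(STRING.fromBytes(bytes)); // round-trips to "abc"
    }
}
```

The real Spark Encoder additionally carries schema information and generates code for Spark's off-heap binary row format, but the round-trip idea is the same.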
Class Type:
Or an encoder can be constructed from a Java Bean:

Encoder<MyClass> encoder = Encoders.bean(MyClass.class);
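Conceptually, a bean encoder derives the columns of the Dataset from the bean's getters. The following toy sketch (not Spark internals; `MyClass` here is a hypothetical bean standing in for the article's `MyClass`) shows how getter names can be mapped to column names via reflection:

```java
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.List;

// Toy sketch: derive column names from a Java Bean's getters,
// the way Encoders.bean conceptually discovers the schema.
public class BeanColumns {
    public static class MyClass {          // hypothetical bean for illustration
        private int id;
        private String name;
        public int getId() { return id; }
        public void setId(int id) { this.id = id; }
        public String getName() { return name; }
        public void setName(String name) { this.name = name; }
    }

    public static List<String> columns(Class<?> beanClass) {
        List<String> cols = new ArrayList<>();
        for (Method m : beanClass.getDeclaredMethods()) {
            String n = m.getName();
            // getXxx() with no arguments -> column "xxx"
            if (n.startsWith("get") && m.getParameterCount() == 0) {
                cols.add(Character.toLowerCase(n.charAt(3)) + n.substring(4));
            }
        }
        cols.sort(null); // reflection order is unspecified, so sort for stable output
        return cols;
    }

    public static void main(String[] args) {
        System.out.println(columns(MyClass.class)); // [id, name]
    }
}
```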
Tuple type:
A tuple of generic types:
Encoder<Tuple2<Integer, String>> encoder2 = Encoders.tuple(Encoders.INT(), Encoders.STRING());
List<Tuple2<Integer, String>> data2 = Arrays.asList(new scala.Tuple2<>(1, "a"));
Dataset<Tuple2<Integer, String>> ds2 = context.createDataset(data2, encoder2);
A tuple containing a bean class:
Encoder<Tuple2<String, MyClass>> encoder = Encoders.tuple(Encoders.STRING(), Encoders.bean(MyClass.class));
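The key idea behind Encoders.tuple is composition: given encoders for the component types, an encoder for the pair can be built from them. The following toy sketch (not Spark's implementation; `SimpleEncoder` and the length-prefixed layout are made up for illustration) shows one way such composition can work:

```java
// Toy sketch of encoder composition: encode each component,
// concatenate with a length prefix, and split again on decode.
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.AbstractMap.SimpleEntry;
import java.util.Map;

public class TupleEncoderSketch {
    public interface SimpleEncoder<T> {
        byte[] toBytes(T value);
        T fromBytes(byte[] bytes);
    }

    public static final SimpleEncoder<Integer> INT = new SimpleEncoder<Integer>() {
        public byte[] toBytes(Integer v) { return ByteBuffer.allocate(4).putInt(v).array(); }
        public Integer fromBytes(byte[] b) { return ByteBuffer.wrap(b).getInt(); }
    };

    public static final SimpleEncoder<String> STRING = new SimpleEncoder<String>() {
        public byte[] toBytes(String v) { return v.getBytes(StandardCharsets.UTF_8); }
        public String fromBytes(byte[] b) { return new String(b, StandardCharsets.UTF_8); }
    };

    // Compose two encoders: [4-byte length of first part][first part][second part]
    public static <A, B> SimpleEncoder<Map.Entry<A, B>> tuple(
            SimpleEncoder<A> ea, SimpleEncoder<B> eb) {
        return new SimpleEncoder<Map.Entry<A, B>>() {
            public byte[] toBytes(Map.Entry<A, B> t) {
                byte[] a = ea.toBytes(t.getKey());
                byte[] b = eb.toBytes(t.getValue());
                return ByteBuffer.allocate(4 + a.length + b.length)
                        .putInt(a.length).put(a).put(b).array();
            }
            public Map.Entry<A, B> fromBytes(byte[] bytes) {
                ByteBuffer buf = ByteBuffer.wrap(bytes);
                byte[] a = new byte[buf.getInt()];
                buf.get(a);
                byte[] b = new byte[buf.remaining()];
                buf.get(b);
                return new SimpleEntry<>(ea.fromBytes(a), eb.fromBytes(b));
            }
        };
    }

    public static void main(String[] args) {
        SimpleEncoder<Map.Entry<Integer, String>> enc = tuple(INT, STRING);
        Map.Entry<Integer, String> back = enc.fromBytes(enc.toBytes(new SimpleEntry<>(1, "a")));
        System.out.println(back.getKey() + "," + back.getValue()); // 1,a
    }
}
```

Spark's real tuple encoders work on its internal row format rather than raw byte concatenation, but the pattern of building a compound encoder from component encoders is the same.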
For Encoder, please refer to http://spark.apache.org/docs/latest/api/java/org/apache/spark/sql/Encoder.html
For Encoders, please refer to http://spark.apache.org/docs/latest/api/java/org/apache/spark/sql/Encoders.html
Series: ZK + Kafka + Spark Streaming cluster environment construction (24). Structured Streaming: Encoder.