"Winning the Cloud Computing and Big Data Era" Spark Asia-Pacific Research Institute 100-Session Public Lecture Series [Session 6 Interactive Q&A Highlights]
Q1: Can Spark Streaming join different data streams?
Yes, different data streams (DStreams) in Spark Streaming can be joined.
Spark Streaming is an extension of the core Spark API that enables high-throughput, fault-tolerant stream processing of live data streams. Data can be ingested from many sources like Kafka, Flume, Twitter, ZeroMQ or plain old TCP sockets and be processed using complex algorithms expressed with high-level functions like map, reduce, join and window.
join(otherStream, [numTasks]): When called on two DStreams of (K, V) and (K, W) pairs, return a new DStream of (K, (V, W)) pairs with all pairs of elements for each key.
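As a minimal sketch of the join operation described above: the two socket sources, host names, ports and the comma-separated record layout below are all hypothetical placeholders, not part of the original Q&A.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
// Needed for pair-DStream operations (join, reduceByKey, ...) on older Spark versions.
import org.apache.spark.streaming.StreamingContext._

object StreamJoinExample {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("StreamJoinExample")
    val ssc = new StreamingContext(conf, Seconds(10))

    // Two hypothetical text sources; hosts/ports are placeholders.
    val clicks = ssc.socketTextStream("host1", 9999)
      .map(line => (line.split(",")(0), line))   // (userId, clickRecord)
    val purchases = ssc.socketTextStream("host2", 9998)
      .map(line => (line.split(",")(0), line))   // (userId, purchaseRecord)

    // Joining two (K, V) / (K, W) DStreams yields (userId, (clickRecord, purchaseRecord)).
    val joined = clicks.join(purchases)
    joined.print()

    ssc.start()
    ssc.awaitTermination()
  }
}
```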
Q2: Are Flume and Spark Streaming suitable for cluster mode?
Flume and Spark Streaming were built for cluster deployment.
For input streams that receive data over the network (such as Kafka, Flume, sockets, etc.), the default persistence level is set to replicate the data to two nodes for fault-tolerance.
Using any input source that receives data through a network - For network-based data sources like Kafka and Flume, the received input data is replicated in memory between nodes of the cluster (default replication factor is 2).
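A minimal sketch of a Flume receiver running on a cluster, assuming the spark-streaming-flume external module is on the classpath; the receiver host and port are placeholders. The "_2" storage level explicitly replicates received blocks to two nodes, which is the fault-tolerance behaviour the quoted docs describe.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.flume.FlumeUtils

object FlumeClusterExample {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("FlumeClusterExample")
    val ssc = new StreamingContext(conf, Seconds(5))

    // Receive events pushed by a Flume Avro sink; host/port are placeholders.
    // MEMORY_AND_DISK_SER_2 replicates each received block to 2 nodes.
    val flumeStream = FlumeUtils.createStream(
      ssc, "receiver-host", 4141, StorageLevel.MEMORY_AND_DISK_SER_2)

    // Print the payload size of each event as a simple sanity check.
    flumeStream.map(event => event.event.getBody.remaining()).print()

    ssc.start()
    ssc.awaitTermination()
  }
}
```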
Q3: Does Spark have any drawbacks?
Spark's main drawback is its relatively heavy memory consumption.
In earlier versions, Spark's handling of resources was mostly coarse-grained, which made fine-grained control difficult;
after the FAIR mode was added, finer-grained control became possible.
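Interpreting the "Fair mode" above as Spark's fair job scheduler, here is a minimal sketch of how it is enabled; the pool name "interactive" and the trivial job are illustrative assumptions only.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object FairSchedulingExample {
  def main(args: Array[String]): Unit = {
    // Switch the scheduler from the default FIFO mode to FAIR mode,
    // so concurrent jobs share cluster resources more evenly.
    val conf = new SparkConf()
      .setAppName("FairSchedulingExample")
      .set("spark.scheduler.mode", "FAIR")
    val sc = new SparkContext(conf)

    // Optionally assign jobs from this thread to a named pool
    // (pool weights/minShares come from a fairscheduler.xml file).
    sc.setLocalProperty("spark.scheduler.pool", "interactive")

    // A trivial job just to have something scheduled.
    sc.parallelize(1 to 1000).map(_ * 2).count()
    sc.stop()
  }
}
```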
Q4: Is Spark Streaming used in production today?
Spark Streaming is very easy to use in production environments.
No separate deployment is needed: once Spark itself is installed, Spark Streaming is installed along with it.
In China, companies such as 皮皮網 are already using Spark Streaming in production.
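To illustrate the point that a plain Spark install is enough, here is a minimal, self-contained streaming word count that can be submitted with spark-submit against a stock Spark distribution; the localhost socket source on port 9999 is a placeholder assumption.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.StreamingContext._

object MinimalStreamingApp {
  def main(args: Array[String]): Unit = {
    // Spark Streaming ships inside the Spark distribution,
    // so no extra deployment step is required beyond installing Spark.
    val conf = new SparkConf().setAppName("MinimalStreamingApp")
    val ssc = new StreamingContext(conf, Seconds(1))

    // Hypothetical TCP text source; host and port are placeholders.
    val lines = ssc.socketTextStream("localhost", 9999)
    lines.flatMap(_.split(" "))
      .map(word => (word, 1))
      .reduceByKey(_ + _)
      .print()

    ssc.start()
    ssc.awaitTermination()
  }
}
```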