Narrow dependency
map, filter, union,
join (with co-partitioned inputs)
Each partition of the parent RDD is used by at most one partition of the child RDD.
Partitions are independent of one another, so they can be computed in parallel.
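Two kinds of narrow dependency:
OneToOneDependency: a child partition depends only on the parent partition with the same ID
RangeDependency: a range of child partitions maps one-to-one onto a range of parent partitions (as in union)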
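The two kinds of narrow dependency can be sketched as a toy model in plain Python. This is not Spark's code: the class names `OneToOne` and `Range` and the method `get_parents` are made up for illustration, and the union of a 2-partition RDD with a 3-partition RDD is an assumed example.

```python
from typing import List


class OneToOne:
    """Toy model: child partition i reads only parent partition i."""

    def get_parents(self, partition_id: int) -> List[int]:
        return [partition_id]


class Range:
    """Toy model: a contiguous run of child partitions maps one-to-one
    onto a contiguous run of parent partitions."""

    def __init__(self, in_start: int, out_start: int, length: int) -> None:
        self.in_start = in_start    # first partition of the range in the parent
        self.out_start = out_start  # first partition of the range in the child
        self.length = length        # number of partitions in the range

    def get_parents(self, partition_id: int) -> List[int]:
        # Child partitions outside the range have no parent in this RDD.
        if self.out_start <= partition_id < self.out_start + self.length:
            return [partition_id - self.out_start + self.in_start]
        return []


# Assumed example: union of a 2-partition RDD "a" and a 3-partition RDD "b"
# yields 5 child partitions; 0-1 come from "a", 2-4 come from "b".
dep_a = Range(in_start=0, out_start=0, length=2)
dep_b = Range(in_start=0, out_start=2, length=3)
```

In both cases a child partition has at most one parent partition per parent RDD, which is what makes the dependency narrow.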
Computation can be pipelined within a partition.
Operations can be merged, which greatly improves efficiency: the code may be written as several functions, but at execution time they are merged into a single function, which greatly reduces the memory or disk spent on intermediate results.
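The merging of several functions into one can be illustrated with a small plain-Python sketch. It only models the idea using lazy generators; the helper names `map_op`, `filter_op`, and `fuse` are hypothetical and are not part of Spark.

```python
from typing import Callable, Iterator


def map_op(f: Callable) -> Callable[[Iterator], Iterator]:
    # Lazily apply f to every element of a partition.
    return lambda it: (f(x) for x in it)


def filter_op(p: Callable) -> Callable[[Iterator], Iterator]:
    # Lazily keep only the elements that satisfy p.
    return lambda it: (x for x in it if p(x))


def fuse(*ops: Callable[[Iterator], Iterator]) -> Callable[[Iterator], Iterator]:
    """Chain per-partition operators into one function over an iterator."""
    def fused(it: Iterator) -> Iterator:
        for op in ops:
            it = op(it)
        return it
    return fused


# Written as three separate steps, executed as one pass per partition.
pipeline = fuse(
    map_op(lambda x: x * 2),
    filter_op(lambda x: x % 3 == 0),
    map_op(lambda x: x + 1),
)

partitions = [[1, 2, 3], [4, 5, 6]]
result = [list(pipeline(iter(p))) for p in partitions]
print(result)  # [[7], [13]]
```

Because the generators are lazy, each element flows through all three steps before the next one is read, so no intermediate list is built between steps. Each partition is also processed without looking at any other partition.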
Wide dependency
groupByKey, join with inputs not co-partitioned
Multiple child RDD partitions depend on the same parent RDD partition.
Put another way, one partition of the parent RDD is used by multiple partitions of the child RDD.
This produces a shuffle.
ShuffleDependency
Shuffle implementations: hash shuffle, sort shuffle
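Why a wide dependency forces a shuffle can be seen in a toy plain-Python model of hash partitioning followed by grouping by key. This is only a sketch of the idea, not Spark's shuffle implementation; the function name `hash_shuffle` and the sample records are assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def hash_shuffle(
    parent_partitions: List[List[Tuple[int, str]]],
    num_child_partitions: int,
) -> List[Dict[int, List[str]]]:
    """Route each (key, value) record to child partition hash(key) % n,
    grouping values by key inside each child partition."""
    children = [defaultdict(list) for _ in range(num_child_partitions)]
    for records in parent_partitions:
        for key, value in records:
            children[hash(key) % num_child_partitions][key].append(value)
    return [dict(c) for c in children]


# Integer keys are used so that hash(key) is stable between runs.
parents = [
    [(1, "a"), (2, "b")],
    [(1, "c"), (3, "d")],
    [(2, "e"), (4, "f")],
]
children = hash_shuffle(parents, 2)
print(children)  # [{2: ['b', 'e'], 4: ['f']}, {1: ['a', 'c'], 3: ['d']}]
```

The first parent partition holds keys 1 and 2, which land in different child partitions, so that one parent partition feeds both children. Every child partition may need records from every parent partition, which is why the data has to be moved.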
Spark RDD wide dependencies and narrow dependencies (video notes)