[Spark][Python] Continuation of the sortByKey example: what is the effect of an RDD's collect()?
In [20]: mydata004.collect()
Out[20]:
[[u'00001', u'sku933'],
 [u'00001', u'sku022'],
 [u'00001', u'sku912'],
 [u'00001', u'sku331'],
 [u'00002', u'sku010'],
 [u'00003', u'sku888'],
 [u'00004', u'sku411']]

In [22]: mydata004.count()
Out[22]: 7
In [23]: mydata005.count()
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-23-c1554a7ccdd7> in <module>()
----> 1 mydata005.count()

TypeError: count() takes exactly one argument (0 given)
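The traceback comes from what mydata005 actually is: a plain Python list, not an RDD. The built-in list.count(value) requires exactly one argument (the value whose occurrences to count), whereas RDD.count() takes none. A minimal sketch, using hypothetical sample values in place of the real mydata005:

```python
# Hypothetical mydata005-style data: a plain Python list, not an RDD,
# so .count() here resolves to list.count(value).
mydata005 = [[u'00001', u'sku933'],
             [u'00001', u'sku022'],
             [u'00002', u'sku010']]

# Calling it with no argument reproduces the error above:
try:
    mydata005.count()
except TypeError as e:
    print(e)  # e.g. "count() takes exactly one argument (0 given)"

# With an argument, list.count(value) counts occurrences of that value:
print(mydata005.count([u'00001', u'sku022']))  # 1
```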
In [24]: type(mydata005)
Out[24]: list

In [25]: type(mydata004)
Out[25]: pyspark.rdd.PipelinedRDD
By comparison, mydata005 is a plain Python list, while mydata004 is still an RDD. This shows that collect() returns a list: if you run <rdd>.collect() in an interactive environment, every element of the RDD is pulled back to the driver and displayed.
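The point above can be reproduced without a Spark cluster: what collect() handed back in this session is an ordinary Python list of seven key/value pairs (the values below mirror the transcript):

```python
# What mydata004.collect() returned above: an ordinary Python list.
collected = [[u'00001', u'sku933'],
             [u'00001', u'sku022'],
             [u'00001', u'sku912'],
             [u'00001', u'sku331'],
             [u'00002', u'sku010'],
             [u'00003', u'sku888'],
             [u'00004', u'sku411']]

print(type(collected) is list)  # True -- collect() materializes to a list
print(len(collected))           # 7, matching mydata004.count()

# Once collected, everything is local Python, e.g. finding distinct keys:
keys = sorted({pair[0] for pair in collected})
print(len(keys))                # 4 distinct keys
```

Because collect() pulls every element to the driver, it is fine for small results in an interactive shell but can exhaust driver memory on a large RDD; prefer take(n) when you only need a sample.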