[Spark] [Python] DataFrame Select Operation Example

This post continues the earlier one, [Spark][Python] Example of taking a limited number of records from a DataFrame.


In [4]: peopleDF.select("age")
Out[4]: DataFrame[age: bigint]
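
The peopleDF DataFrame carries over from the previous post. As a rough sketch of how it could be recreated (assuming the sqlContext that the PySpark shell provides and the people.json path that appears in the HadoopRDD log line further down):

# Sketch only: recreate peopleDF as in the earlier post; the HDFS path is
# taken from the input-split log line below and may differ on other setups.
peopleDF = sqlContext.read.json("hdfs://localhost:8020/user/training/people.json")

# select() is a transformation: it returns a new DataFrame containing only
# the age column, and no Spark job runs until an action such as take() is called.
agesDF = peopleDF.select("age")   # DataFrame[age: bigint]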

In [5]: myDF = people.select("age")
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-5-b5b723b62a49> in <module>()
----> 1 myDF = people.select("age")

NameError: name 'people' is not defined
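
The NameError simply means that no variable named people exists in this session; the DataFrame from the previous step is bound to the name peopleDF, which the corrected call below uses.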

In [6]: myDF = peopleDF.select("age")

In [7]: myDF.take(3)
17/10/05 05:13:02 INFO storage.MemoryStore: Block broadcast_5 stored as values in memory (estimated size 230.1 KB, free 871.7 KB)
17/10/05 05:13:02 INFO storage.MemoryStore: Block broadcast_5_piece0 stored as bytes in memory (estimated size 21.4 KB, free 893.1 KB)
17/10/05 05:13:02 INFO storage.BlockManagerInfo: Added broadcast_5_piece0 in memory on localhost:55073 (size: 21.4 KB, free: 208.7 MB)
17/10/05 05:13:02 INFO spark.SparkContext: Created broadcast 5 from take at <ipython-input-7-745486715568>:1
17/10/05 05:13:02 INFO storage.MemoryStore: Block broadcast_6 stored as values in memory (estimated size 251.1 KB, free 1144.2 KB)
17/10/05 05:13:02 INFO storage.MemoryStore: Block broadcast_6_piece0 stored as bytes in memory (estimated size 21.6 KB, free 1165.8 KB)
17/10/05 05:13:02 INFO storage.BlockManagerInfo: Added broadcast_6_piece0 in memory on localhost:55073 (size: 21.6 KB, free: 208.7 MB)
17/10/05 05:13:02 INFO spark.SparkContext: Created broadcast 6 from take at <ipython-input-7-745486715568>:1
17/10/05 05:13:03 INFO mapred.FileInputFormat: Total input paths to process : 1
17/10/05 05:13:03 INFO spark.SparkContext: Starting job: take at <ipython-input-7-745486715568>:1
17/10/05 05:13:03 INFO scheduler.DAGScheduler: Got job 2 (take at <ipython-input-7-745486715568>:1) with 1 output partitions
17/10/05 05:13:03 INFO scheduler.DAGScheduler: Final stage: ResultStage 2 (take at <ipython-input-7-745486715568>:1)
17/10/05 05:13:03 INFO scheduler.DAGScheduler: Parents of final stage: List()
17/10/05 05:13:03 INFO scheduler.DAGScheduler: Missing parents: List()
17/10/05 05:13:03 INFO scheduler.DAGScheduler: Submitting ResultStage 2 (MapPartitionsRDD[14] at take at <ipython-input-7-745486715568>:1), which has no missing parents
17/10/05 05:13:03 INFO storage.MemoryStore: Block broadcast_7 stored as values in memory (estimated size 4.3 KB, free 1170.2 KB)
17/10/05 05:13:03 INFO storage.MemoryStore: Block broadcast_7_piece0 stored as bytes in memory (estimated size 2.5 KB, free 1172.6 KB)
17/10/05 05:13:03 INFO storage.BlockManagerInfo: Added broadcast_7_piece0 in memory on localhost:55073 (size: 2.5 KB, free: 208.7 MB)
17/10/05 05:13:03 INFO spark.SparkContext: Created broadcast 7 from broadcast at DAGScheduler.scala:1006
17/10/05 05:13:03 INFO scheduler.DAGScheduler: Submitting 1 missing tasks from ResultStage 2 (MapPartitionsRDD[14] at take at <ipython-input-7-745486715568>:1)
17/10/05 05:13:03 INFO scheduler.TaskSchedulerImpl: Adding task set 2.0 with 1 tasks
17/10/05 05:13:03 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 2.0 (TID 2, localhost, partition 0,PROCESS_LOCAL, 2149 bytes)
17/10/05 05:13:03 INFO executor.Executor: Running task 0.0 in stage 2.0 (TID 2)
17/10/05 05:13:03 INFO rdd.HadoopRDD: Input split: hdfs://localhost:8020/user/training/people.json:0+179
17/10/05 05:13:03 INFO codegen.GenerateUnsafeProjection: Code generated in 113.719806 ms
17/10/05 05:13:03 INFO executor.Executor: Finished task 0.0 in stage 2.0 (TID 2). 2235 bytes result sent to driver
17/10/05 05:13:03 INFO scheduler.DAGScheduler: ResultStage 2 (take at <ipython-input-7-745486715568>:1) finished in 0.493 s
17/10/05 05:13:03 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 2.0 (TID 2) in 487 ms on localhost (1/1)
17/10/05 05:13:03 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 2.0, whose tasks have all completed, from pool
17/10/05 05:13:03 INFO scheduler.DAGScheduler: Job 2 finished: take at <ipython-input-7-745486715568>:1, took 0.737231 s
Out[7]: [Row(age=None), Row(age=30), Row(age=19)]

In [8]:
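
Unlike select(), take(3) is an action: it triggers the job traced in the log above and returns the first three rows to the driver as plain Row objects. The None in the first row comes from a record in people.json that has no age field. A small follow-up sketch (the null filter is an illustrative addition, not part of the original session):

# Pull the values out of the Row objects returned by take()
rows = myDF.take(3)                    # [Row(age=None), Row(age=30), Row(age=19)]
ages = [r.age for r in rows]           # [None, 30, 19]

# Optionally drop the null ages before taking rows
myDF.where(myDF["age"].isNotNull()).take(3)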
