[Spark] [Python] Example of taking a limited number of records from a DataFrame

Source: Internet
Author: User
Tags: deprecated, hdfs dfs

[Spark] [Python] Example of taking a limited number of records from a DataFrame:

from pyspark.sql import HiveContext   # Spark 1.x API; in the pyspark shell, sc is already defined

sqlContext = HiveContext(sc)

peopleDF = sqlContext.read.json("people.json")

peopleDF.limit(3).show()
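
On Spark 2.x and later the same thing is done through SparkSession instead of HiveContext. A minimal sketch under that assumption (the appName is just an illustrative value):

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("limit-example").getOrCreate()   # appName is arbitrary here

peopleDF = spark.read.json("people.json")   # schema is inferred from the JSON records
peopleDF.limit(3).show()                    # limit() returns a new DataFrame with at most 3 rows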

===

[email protected] ~]$ hdfs dfs -cat people.json
{"Name": "Alice", "Pcode": "94304"}
{"Name": "Brayden", "age": +, "Pcode": "94304"}
{"Name": "Carla", "age": +, "Pcoe": "10036"}
{"Name": "Diana", "Age": 46}
{"Name": "Etienne", "Pcode": "94104"}
[email protected] ~]$
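
Note the typo in Carla's record: the key is "pcoe" instead of "pcode". Since Spark infers the schema from all records, the DataFrame ends up with both a pcode and a pcoe column, which is why both show up in the output of show() below. A quick sketch to inspect the inferred schema (the commented output is what one would expect for this file):

peopleDF.printSchema()
# root
#  |-- age: long (nullable = true)
#  |-- name: string (nullable = true)
#  |-- pcode: string (nullable = true)
#  |-- pcoe: string (nullable = true)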


In [1]: sqlContext = HiveContext(sc)

In [2]: peopleDF = sqlContext.read.json("people.json")
17/10/05 05:03:11 INFO hive.HiveContext: Initializing execution hive, version 1.1.0
17/10/05 05:03:11 INFO client.ClientWrapper: Inspected Hadoop version: 2.6.0-cdh5.7.0
17/10/05 05:03:11 INFO client.ClientWrapper: Loaded org.apache.hadoop.hive.shims.Hadoop23Shims for Hadoop version 2.6.0-cdh5.7.0
17/10/05 05:03:14 INFO hive.metastore: Trying to connect to metastore with URI thrift://localhost.localdomain:9083
17/10/05 05:03:14 INFO hive.metastore: Opened a connection to metastore, current connections: 1
17/10/05 05:03:15 INFO hive.metastore: Connected to metastore.
17/10/05 05:03:16 INFO session.SessionState: Created HDFS directory: file:/tmp/spark-99a33db4-b69a-46a9-8032-f87d63299040/scratch/training
17/10/05 05:03:16 INFO session.SessionState: Created local directory: /tmp/4e1c5259-7ae8-482c-ae77-94d3a0c51f91_resources
17/10/05 05:03:16 INFO session.SessionState: Created HDFS directory: file:/tmp/spark-99a33db4-b69a-46a9-8032-f87d63299040/scratch/training/4e1c5259-7ae8-482c-ae77-94d3a0c51f91
17/10/05 05:03:16 INFO session.SessionState: Created local directory: /tmp/training/4e1c5259-7ae8-482c-ae77-94d3a0c51f91
17/10/05 05:03:16 INFO session.SessionState: Created HDFS directory: file:/tmp/spark-99a33db4-b69a-46a9-8032-f87d63299040/scratch/training/4e1c5259-7ae8-482c-ae77-94d3a0c51f91/_tmp_space.db
17/10/05 05:03:16 INFO session.SessionState: No Tez session required at this point. hive.execution.engine=mr.
17/10/05 05:03:16 INFO json.JSONRelation: Listing hdfs://localhost:8020/user/training/people.json on driver
17/10/05 05:03:19 INFO storage.MemoryStore: Block broadcast_0 stored as values in memory (estimated size 251.1 KB, free 251.1 KB)
17/10/05 05:03:20 INFO storage.MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 21.6 KB, free 272.7 KB)
17/10/05 05:03:20 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in memory on localhost:55073 (size: 21.6 KB, free: 208.8 MB)
17/10/05 05:03:20 INFO spark.SparkContext: Created broadcast 0 from json at NativeMethodAccessorImpl.java:-2
17/10/05 05:03:20 INFO mapred.FileInputFormat: Total input paths to process : 1
17/10/05 05:03:21 INFO spark.SparkContext: Starting job: json at NativeMethodAccessorImpl.java:-2
17/10/05 05:03:21 INFO scheduler.DAGScheduler: Got job 0 (json at NativeMethodAccessorImpl.java:-2) with 1 output partitions
17/10/05 05:03:21 INFO scheduler.DAGScheduler: Final stage: ResultStage 0 (json at NativeMethodAccessorImpl.java:-2)
17/10/05 05:03:21 INFO scheduler.DAGScheduler: Parents of final stage: List()
17/10/05 05:03:21 INFO scheduler.DAGScheduler: Missing parents: List()
17/10/05 05:03:21 INFO scheduler.DAGScheduler: Submitting ResultStage 0 (MapPartitionsRDD[3] at json at NativeMethodAccessorImpl.java:-2), which has no missing parents
17/10/05 05:03:21 INFO storage.MemoryStore: Block broadcast_1 stored as values in memory (estimated size 4.3 KB, free 277.1 KB)
17/10/05 05:03:21 INFO storage.MemoryStore: Block broadcast_1_piece0 stored as bytes in memory (estimated size 2.4 KB, free 279.5 KB)
17/10/05 05:03:21 INFO storage.BlockManagerInfo: Added broadcast_1_piece0 in memory on localhost:55073 (size: 2.4 KB, free: 208.8 MB)
17/10/05 05:03:21 INFO spark.SparkContext: Created broadcast 1 from broadcast at DAGScheduler.scala:1006
17/10/05 05:03:21 INFO scheduler.DAGScheduler: Submitting 1 missing tasks from ResultStage 0 (MapPartitionsRDD[3] at json at NativeMethodAccessorImpl.java:-2)
17/10/05 05:03:21 INFO scheduler.TaskSchedulerImpl: Adding task set 0.0 with 1 tasks
17/10/05 05:03:21 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, localhost, partition 0,PROCESS_LOCAL, 2149 bytes)
17/10/05 05:03:21 INFO executor.Executor: Running task 0.0 in stage 0.0 (TID 0)
17/10/05 05:03:21 INFO rdd.HadoopRDD: Input split: hdfs://localhost:8020/user/training/people.json:0+179
17/10/05 05:03:21 INFO Configuration.deprecation: mapred.tip.id is deprecated. Instead, use mapreduce.task.id
17/10/05 05:03:21 INFO Configuration.deprecation: mapred.task.id is deprecated. Instead, use mapreduce.task.attempt.id
17/10/05 05:03:21 INFO Configuration.deprecation: mapred.task.is.map is deprecated. Instead, use mapreduce.task.ismap
17/10/05 05:03:21 INFO Configuration.deprecation: mapred.task.partition is deprecated. Instead, use mapreduce.task.partition
17/10/05 05:03:21 INFO Configuration.deprecation: mapred.job.id is deprecated. Instead, use mapreduce.job.id
17/10/05 05:03:22 INFO executor.Executor: Finished task 0.0 in stage 0.0 (TID 0). 2354 bytes result sent to driver
17/10/05 05:03:22 INFO scheduler.DAGScheduler: ResultStage 0 (json at NativeMethodAccessorImpl.java:-2) finished in 0.931 s
17/10/05 05:03:22 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 0.0 (TID 0) in 850 ms on localhost (1/1)
17/10/05 05:03:22 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool
17/10/05 05:03:22 INFO scheduler.DAGScheduler: Job 0 finished: json at NativeMethodAccessorImpl.java:-2, took 1.388410 s
17/10/05 05:03:23 INFO hive.HiveContext: Default warehouse location is /user/hive/warehouse
17/10/05 05:03:23 INFO hive.HiveContext: Initializing metastore client version 1.1.0 using Spark classes.
17/10/05 05:03:23 INFO client.ClientWrapper: Inspected Hadoop version: 2.6.0-cdh5.7.0
17/10/05 05:03:23 INFO client.ClientWrapper: Loaded org.apache.hadoop.hive.shims.Hadoop23Shims for Hadoop version 2.6.0-cdh5.7.0
17/10/05 05:03:23 INFO spark.ContextCleaner: Cleaned accumulator 2
17/10/05 05:03:23 INFO storage.BlockManagerInfo: Removed broadcast_1_piece0 on localhost:55073 in memory (size: 2.4 KB, free: 208.8 MB)
17/10/05 05:03:25 INFO hive.metastore: Trying to connect to metastore with URI thrift://localhost.localdomain:9083
17/10/05 05:03:25 INFO hive.metastore: Opened a connection to metastore, current connections: 1
17/10/05 05:03:25 INFO hive.metastore: Connected to metastore.
17/10/05 05:03:25 INFO session.SessionState: Created local directory: /tmp/684b38e5-72f0-4712-81d4-4c439e093f5c_resources
17/10/05 05:03:25 INFO session.SessionState: Created HDFS directory: /tmp/hive/training/684b38e5-72f0-4712-81d4-4c439e093f5c
17/10/05 05:03:25 INFO session.SessionState: Created local directory: /tmp/training/684b38e5-72f0-4712-81d4-4c439e093f5c
17/10/05 05:03:25 INFO session.SessionState: Created HDFS directory: /tmp/hive/training/684b38e5-72f0-4712-81d4-4c439e093f5c/_tmp_space.db
17/10/05 05:03:25 INFO session.SessionState: No Tez session required at this point. hive.execution.engine=mr.

In [3]: peopleDF.limit(3).show()
17/10/05 05:04:09 INFO storage.MemoryStore: Block broadcast_2 stored as values in memory (estimated size 65.5 KB, free 338.2 KB)
17/10/05 05:04:10 INFO storage.MemoryStore: Block broadcast_2_piece0 stored as bytes in memory (estimated size 21.4 KB, free 359.6 KB)
17/10/05 05:04:10 INFO storage.BlockManagerInfo: Added broadcast_2_piece0 in memory on localhost:55073 (size: 21.4 KB, free: 208.8 MB)
17/10/05 05:04:10 INFO spark.SparkContext: Created broadcast 2 from showString at NativeMethodAccessorImpl.java:-2
17/10/05 05:04:10 INFO storage.MemoryStore: Block broadcast_3 stored as values in memory (estimated size 251.1 KB, free 610.7 KB)
17/10/05 05:04:11 INFO storage.MemoryStore: Block broadcast_3_piece0 stored as bytes in memory (estimated size 21.6 KB, free 632.4 KB)
17/10/05 05:04:11 INFO storage.BlockManagerInfo: Added broadcast_3_piece0 in memory on localhost:55073 (size: 21.6 KB, free: 208.7 MB)
17/10/05 05:04:11 INFO spark.SparkContext: Created broadcast 3 from showString at NativeMethodAccessorImpl.java:-2
17/10/05 05:04:12 INFO mapred.FileInputFormat: Total input paths to process : 1
17/10/05 05:04:12 INFO spark.SparkContext: Starting job: showString at NativeMethodAccessorImpl.java:-2
17/10/05 05:04:12 INFO scheduler.DAGScheduler: Got job 1 (showString at NativeMethodAccessorImpl.java:-2) with 1 output partitions
17/10/05 05:04:12 INFO scheduler.DAGScheduler: Final stage: ResultStage 1 (showString at NativeMethodAccessorImpl.java:-2)
17/10/05 05:04:12 INFO scheduler.DAGScheduler: Parents of final stage: List()
17/10/05 05:04:12 INFO scheduler.DAGScheduler: Missing parents: List()
17/10/05 05:04:12 INFO scheduler.DAGScheduler: Submitting ResultStage 1 (MapPartitionsRDD[9] at showString at NativeMethodAccessorImpl.java:-2), which has no missing parents
17/10/05 05:04:12 INFO storage.MemoryStore: Block broadcast_4 stored as values in memory (estimated size 5.9 KB, free 638.2 KB)
17/10/05 05:04:12 INFO storage.MemoryStore: Block broadcast_4_piece0 stored as bytes in memory (estimated size 3.3 KB, free 641.5 KB)
17/10/05 05:04:12 INFO storage.BlockManagerInfo: Added broadcast_4_piece0 in memory on localhost:55073 (size: 3.3 KB, free: 208.7 MB)
17/10/05 05:04:12 INFO spark.SparkContext: Created broadcast 4 from broadcast at DAGScheduler.scala:1006
17/10/05 05:04:12 INFO scheduler.DAGScheduler: Submitting 1 missing tasks from ResultStage 1 (MapPartitionsRDD[9] at showString at NativeMethodAccessorImpl.java:-2)
17/10/05 05:04:12 INFO scheduler.TaskSchedulerImpl: Adding task set 1.0 with 1 tasks
17/10/05 05:04:12 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 1.0 (TID 1, localhost, partition 0,PROCESS_LOCAL, 2149 bytes)
17/10/05 05:04:12 INFO executor.Executor: Running task 0.0 in stage 1.0 (TID 1)
17/10/05 05:04:12 INFO rdd.HadoopRDD: Input split: hdfs://localhost:8020/user/training/people.json:0+179
17/10/05 05:04:14 INFO codegen.GenerateUnsafeProjection: Code generated in 1563.240244 ms
17/10/05 05:04:14 INFO codegen.GenerateSafeProjection: Code generated in 182.529448 ms
17/10/05 05:04:15 INFO executor.Executor: Finished task 0.0 in stage 1.0 (TID 1). 2328 bytes result sent to driver
17/10/05 05:04:15 INFO scheduler.DAGScheduler: ResultStage 1 (showString at NativeMethodAccessorImpl.java:-2) finished in 2.549 s
17/10/05 05:04:15 INFO scheduler.DAGScheduler: Job 1 finished: showString at NativeMethodAccessorImpl.java:-2, took 2.852393 s
17/10/05 05:04:15 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 1.0 (TID 1) in 2547 ms on localhost (1/1)
17/10/05 05:04:15 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 1.0, whose tasks have all completed, from pool
+----+-------+-----+-----+
| age|   name|pcode| pcoe|
+----+-------+-----+-----+
|null|  Alice|94304| null|
|  30|Brayden|94304| null|
|  19|  Carla| null|10036|
+----+-------+-----+-----+


In [4]:
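
limit(3) produces a new DataFrame containing at most three rows, and show() then prints it. If the goal is only to look at or fetch a few records, these related calls (a sketch, not part of the original example) behave slightly differently:

peopleDF.show(3)                # prints the first 3 rows without building a new DataFrame
first_rows = peopleDF.take(3)   # returns up to 3 Row objects to the driver as a Python list
print(first_rows)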
