[Spark] [Python] Example: reading a MySQL table into a Spark DataFrame over JDBC:
mydf001 = sqlContext.read.format("jdbc").option("url", "jdbc:mysql://localhost/loudacre") \
    .option("dbtable", "accounts").option("user", "training").option("password", "training").load()
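The same read can also be written with the connection settings collected in a dictionary and passed through options(**...). The sketch below is only an illustration, not the original author's code: jdbc_options and load_table are hypothetical helper names, and load_table assumes a live sqlContext (such as the one the PySpark shell provides) with the MySQL JDBC driver on the classpath.

```python
def jdbc_options(host, database, table, user, password):
    """Build the option dictionary for Spark's JDBC data source (hypothetical helper)."""
    return {
        "url": "jdbc:mysql://{0}/{1}".format(host, database),
        "dbtable": table,
        "user": user,
        "password": password,
    }


def load_table(sql_context, opts):
    """Read the table described by opts into a DataFrame.

    Assumes sql_context is a live SQLContext/HiveContext; not called here.
    """
    return sql_context.read.format("jdbc").options(**opts).load()


# Same values as in the example above.
opts = jdbc_options("localhost", "loudacre", "accounts", "training", "training")
print(opts["url"])
```

In the PySpark shell this would be used as mydf001 = load_table(sqlContext, opts).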
In [10]: mydf001 = sqlContext.read.format("jdbc").option("url", "jdbc:mysql://localhost/loudacre") \
    ...: .option("dbtable", "accounts").option("user", "training").option("password", "training").load()
17/10/03 05:59:53 INFO hive.HiveContext: default warehouse location is /user/hive/warehouse
17/10/03 05:59:53 INFO hive.HiveContext: Initializing metastore client version 1.1.0 using Spark classes.
17/10/03 05:59:53 INFO client.ClientWrapper: Inspected Hadoop version: 2.6.0-cdh5.7.0
17/10/03 05:59:53 INFO client.ClientWrapper: Loaded org.apache.hadoop.hive.shims.Hadoop23Shims for Hadoop version 2.6.0-cdh5.7.0
17/10/03 05:59:56 INFO hive.metastore: Trying to connect to metastore with URI thrift://localhost.localdomain:9083
17/10/03 05:59:56 INFO hive.metastore: Opened a connection to metastore, current connections: 1
17/10/03 05:59:56 INFO hive.metastore: Connected to metastore.
17/10/03 05:59:56 INFO session.SessionState: Created local directory: /tmp/c2d22d09-7425-4bb3-94c3-39cb32267c7d_resources
17/10/03 05:59:56 INFO session.SessionState: Created HDFS directory: /tmp/hive/training/c2d22d09-7425-4bb3-94c3-39cb32267c7d
17/10/03 05:59:56 INFO session.SessionState: Created local directory: /tmp/training/c2d22d09-7425-4bb3-94c3-39cb32267c7d
17/10/03 05:59:56 INFO session.SessionState: Created HDFS directory: /tmp/hive/training/c2d22d09-7425-4bb3-94c3-39cb32267c7d/_tmp_space.db
17/10/03 05:59:56 INFO session.SessionState: No Tez session required at this point. hive.execution.engine=mr.
In [11]:
In [11]: type(mydf001)
Out[11]: pyspark.sql.dataframe.DataFrame
In [12]: mydf001.count()
17/10/03 06:00:29 INFO spark.SparkContext: Starting job: count at NativeMethodAccessorImpl.java:-2
17/10/03 06:00:29 INFO scheduler.DAGScheduler: Registering RDD 2 (count at NativeMethodAccessorImpl.java:-2)
17/10/03 06:00:29 INFO scheduler.DAGScheduler: Got job 0 (count at NativeMethodAccessorImpl.java:-2) with 1 output partitions
17/10/03 06:00:29 INFO scheduler.DAGScheduler: Final stage: ResultStage 1 (count at NativeMethodAccessorImpl.java:-2)
17/10/03 06:00:29 INFO scheduler.DAGScheduler: Parents of final stage: List(ShuffleMapStage 0)
17/10/03 06:00:29 INFO scheduler.DAGScheduler: Missing parents: List(ShuffleMapStage 0)
17/10/03 06:00:29 INFO scheduler.DAGScheduler: Submitting ShuffleMapStage 0 (MapPartitionsRDD[2] at count at NativeMethodAccessorImpl.java:-2), which has no missing parents
17/10/03 06:00:30 INFO storage.MemoryStore: Block broadcast_0 stored as values in memory (estimated size 11.0 KB, free 11.0 KB)
17/10/03 06:00:31 INFO storage.MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 5.2 KB, free 16.1 KB)
17/10/03 06:00:31 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in memory on localhost:36793 (size: 5.2 KB, free: 208.8 MB)
17/10/03 06:00:31 INFO spark.SparkContext: Created broadcast 0 from broadcast at DAGScheduler.scala:1006
17/10/03 06:00:31 INFO scheduler.DAGScheduler: Submitting 1 missing tasks from ShuffleMapStage 0 (MapPartitionsRDD[2] at count at NativeMethodAccessorImpl.java:-2)
17/10/03 06:00:31 INFO scheduler.TaskSchedulerImpl: Adding task set 0.0 with 1 tasks
17/10/03 06:00:31 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, localhost, partition 0,PROCESS_LOCAL, 1911 bytes)
17/10/03 06:00:31 INFO executor.Executor: Running task 0.0 in stage 0.0 (TID 0)
17/10/03 06:00:32 INFO codegen.GenerateMutableProjection: Code generated in 425.82589 ms
17/10/03 06:00:32 INFO codegen.GenerateUnsafeProjection: Code generated in 78.278589 ms
17/10/03 06:00:33 INFO codegen.GenerateMutableProjection: Code generated in 84.676206 ms
17/10/03 06:00:33 INFO codegen.GenerateUnsafeRowJoiner: Code generated in 60.144399 ms
17/10/03 06:00:33 INFO codegen.GenerateUnsafeProjection: Code generated in 95.977074 ms
17/10/03 06:00:34 INFO jdbc.JDBCRDD: closed connection
17/10/03 06:00:34 INFO executor.Executor: Finished task 0.0 in stage 0.0 (TID 0). 1334 bytes result sent to driver
17/10/03 06:00:34 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 0.0 (TID 0) in 3081 ms on localhost (1/1)
17/10/03 06:00:34 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool
17/10/03 06:00:34 INFO scheduler.DAGScheduler: ShuffleMapStage 0 (count at NativeMethodAccessorImpl.java:-2) finished in 3.163 s
17/10/03 06:00:34 INFO scheduler.DAGScheduler: looking for newly runnable stages
17/10/03 06:00:34 INFO scheduler.DAGScheduler: running: Set()
17/10/03 06:00:34 INFO scheduler.DAGScheduler: waiting: Set(ResultStage 1)
17/10/03 06:00:34 INFO scheduler.DAGScheduler: failed: Set()
17/10/03 06:00:34 INFO scheduler.DAGScheduler: Submitting ResultStage 1 (MapPartitionsRDD[5] at count at NativeMethodAccessorImpl.java:-2), which has no missing parents
17/10/03 06:00:34 INFO storage.MemoryStore: Block broadcast_1 stored as values in memory (estimated size 12.1 KB, free 28.3 KB)
17/10/03 06:00:34 INFO storage.MemoryStore: Block broadcast_1_piece0 stored as bytes in memory (estimated size 5.6 KB, free 33.9 KB)
17/10/03 06:00:34 INFO storage.BlockManagerInfo: Added broadcast_1_piece0 in memory on localhost:36793 (size: 5.6 KB, free: 208.8 MB)
17/10/03 06:00:34 INFO spark.SparkContext: Created broadcast 1 from broadcast at DAGScheduler.scala:1006
17/10/03 06:00:34 INFO scheduler.DAGScheduler: Submitting 1 missing tasks from ResultStage 1 (MapPartitionsRDD[5] at count at NativeMethodAccessorImpl.java:-2)
17/10/03 06:00:34 INFO scheduler.TaskSchedulerImpl: Adding task set 1.0 with 1 tasks
17/10/03 06:00:34 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 1.0 (TID 1, localhost, partition 0,NODE_LOCAL, 1999 bytes)
17/10/03 06:00:34 INFO executor.Executor: Running task 0.0 in stage 1.0 (TID 1)
17/10/03 06:00:34 INFO storage.ShuffleBlockFetcherIterator: Getting 1 non-empty blocks out of 1 blocks
17/10/03 06:00:34 INFO storage.ShuffleBlockFetcherIterator: Started 0 remote fetches in + ms
17/10/03 06:00:35 INFO codegen.GenerateMutableProjection: Code generated in 52.636353 ms
17/10/03 06:00:35 INFO codegen.GenerateMutableProjection: Code generated in 49.757505 ms
17/10/03 06:00:35 INFO executor.Executor: Finished task 0.0 in stage 1.0 (TID 1). 1666 bytes result sent to driver
17/10/03 06:00:35 INFO scheduler.DAGScheduler: ResultStage 1 (count at NativeMethodAccessorImpl.java:-2) finished in 0.795 s
17/10/03 06:00:35 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 1.0 (TID 1) in 789 ms on localhost (1/1)
17/10/03 06:00:35 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 1.0, whose tasks have all completed, from pool
17/10/03 06:00:35 INFO scheduler.DAGScheduler: Job 0 finished: count at NativeMethodAccessorImpl.java:-2, took 6.451521 s
Out[12]: 129761
In [13]:
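In the count() log, the stage that reads from MySQL (ShuffleMapStage 0) runs as a single task, so the whole table comes through one JDBC connection. Spark's JDBC source can split the read when it is given partitionColumn, lowerBound, upperBound and numPartitions. The sketch below only builds such an option set. It rests on assumptions: acct_num is a hypothetical numeric column, the bounds and partition count are made-up values, and partitioned_jdbc_options is a hypothetical helper. The bounds only set the partition stride; they do not filter rows.

```python
def partitioned_jdbc_options(base, column, lower, upper, num_partitions):
    """Return a copy of base with Spark's JDBC partitioning options added.

    Values are passed as strings, since JDBC options are string-valued.
    """
    opts = dict(base)  # copy so the caller's dictionary is left untouched
    opts.update({
        "partitionColumn": column,
        "lowerBound": str(lower),
        "upperBound": str(upper),
        "numPartitions": str(num_partitions),
    })
    return opts


# Connection settings from the example above.
base = {
    "url": "jdbc:mysql://localhost/loudacre",
    "dbtable": "accounts",
    "user": "training",
    "password": "training",
}

# "acct_num", the bounds and the partition count are assumptions for illustration.
opts = partitioned_jdbc_options(base, "acct_num", 1, 130000, 4)

# With a live sqlContext the read would then be:
#   mydf002 = sqlContext.read.format("jdbc").options(**opts).load()
print(sorted(opts.keys()))
```

With these options each partition is read by its own task and connection, so count() on the result would be expected to schedule more than one task for the JDBC stage.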