[Spark] [Python] Example of Spark accessing MySQL and generating a DataFrame

Source: Internet
Author: User


# Read the accounts table of the loudacre MySQL database into a DataFrame over JDBC
mydf001 = sqlContext.read.format("jdbc").option("url", "jdbc:mysql://localhost/loudacre") \
    .option("dbtable", "accounts").option("user", "training").option("password", "training").load()
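The same read can also be written with the connection settings collected in one dictionary and passed through `DataFrameReader.options(**opts)`. This is a minimal sketch, assuming the Spark 1.x `sqlContext` from the session below and the MySQL Connector/J driver class `com.mysql.jdbc.Driver`, whose JAR is typically supplied with `--jars` (and, on older Spark versions, also `--driver-class-path`). The helper names `build_jdbc_options` and `read_mysql_table` are hypothetical. The optional partitioning keys (`partitionColumn`, `lowerBound`, `upperBound`, `numPartitions`) are standard Spark JDBC options that must be given together; any column name you pass for them is your own choice, not something from the original session.

```python
def build_jdbc_options(host, database, table, user, password,
                       driver="com.mysql.jdbc.Driver", partition=None):
    """Return a dict of Spark JDBC reader options (hypothetical helper).

    partition, if given, is a tuple (column, lower_bound, upper_bound, num_partitions).
    Spark expects all four partitioning options together, so partial tuples are rejected.
    """
    opts = {
        "url": "jdbc:mysql://{0}/{1}".format(host, database),
        "dbtable": table,
        "user": user,
        "password": password,
        # Naming the driver class explicitly is often needed with older Spark versions.
        "driver": driver,
    }
    if partition is not None:
        if len(partition) != 4:
            raise ValueError("partition must be (column, lower, upper, num_partitions)")
        column, lower, upper, num = partition
        # Pass values as strings to be safe across Spark versions.
        opts.update({
            "partitionColumn": column,
            "lowerBound": str(lower),
            "upperBound": str(upper),
            "numPartitions": str(num),
        })
    return opts


def read_mysql_table(sql_context, **kwargs):
    """Load a MySQL table as a DataFrame through Spark's JDBC data source."""
    opts = build_jdbc_options(**kwargs)
    return sql_context.read.format("jdbc").options(**opts).load()


# Usage in the pyspark shell (requires a running Spark and MySQL):
# mydf001 = read_mysql_table(sqlContext, host="localhost", database="loudacre",
#                            table="accounts", user="training", password="training")
```

Without the partitioning keys, Spark reads the whole table through a single connection, which matches the single task seen in the `count()` log below.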

In [10]: mydf001 = sqlContext.read.format("jdbc").option("url", "jdbc:mysql://localhost/loudacre") \
    ...: .option("dbtable", "accounts").option("user", "training").option("password", "training").load()
17/10/03 05:59:53 INFO hive.HiveContext: default warehouse location is /user/hive/warehouse
17/10/03 05:59:53 INFO hive.HiveContext: Initializing metastore client version 1.1.0 using Spark classes.
17/10/03 05:59:53 INFO client.ClientWrapper: Inspected Hadoop version: 2.6.0-cdh5.7.0
17/10/03 05:59:53 INFO client.ClientWrapper: Loaded org.apache.hadoop.hive.shims.Hadoop23Shims for Hadoop version 2.6.0-cdh5.7.0
17/10/03 05:59:56 INFO hive.metastore: Trying to connect to metastore with URI thrift://localhost.localdomain:9083
17/10/03 05:59:56 INFO hive.metastore: Opened a connection to metastore, current connections: 1
17/10/03 05:59:56 INFO hive.metastore: Connected to metastore.
17/10/03 05:59:56 INFO session.SessionState: Created local directory: /tmp/c2d22d09-7425-4bb3-94c3-39cb32267c7d_resources
17/10/03 05:59:56 INFO session.SessionState: Created HDFS directory: /tmp/hive/training/c2d22d09-7425-4bb3-94c3-39cb32267c7d
17/10/03 05:59:56 INFO session.SessionState: Created local directory: /tmp/training/c2d22d09-7425-4bb3-94c3-39cb32267c7d
17/10/03 05:59:56 INFO session.SessionState: Created HDFS directory: /tmp/hive/training/c2d22d09-7425-4bb3-94c3-39cb32267c7d/_tmp_space.db
17/10/03 05:59:56 INFO session.SessionState: No Tez session required at this point. hive.execution.engine=mr.
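The output of `load()` above contains only start-up messages from the HiveContext behind `sqlContext` and no Spark job: `load()` resolves the table's schema but does not read its rows, which happens only when an action such as `count()` runs. As a rough analogy, not Spark's code, the sketch below uses Python's built-in `sqlite3` in place of MySQL; the class name `LazyJdbcTable` is hypothetical. It discovers column names with a `SELECT * ... WHERE 1=0` probe, the kind of empty query Spark's JDBC source issues to learn a table's schema, and defers the actual scan to `count()`.

```python
import sqlite3


class LazyJdbcTable(object):
    """Toy stand-in for a JDBC-backed DataFrame (hypothetical, for illustration).

    Construction only discovers the schema; rows are read when count() is called.
    """

    def __init__(self, conn, table):
        self.conn = conn
        self.table = table  # trusted table name; not safe for untrusted input
        # Schema probe: matches no rows, but the cursor still describes the columns.
        cur = conn.execute("SELECT * FROM {0} WHERE 1=0".format(table))
        self.columns = [d[0] for d in cur.description]
        self.rows_read = 0  # no rows have been fetched yet

    def count(self):
        """The "action": scan the table and count its rows."""
        n = 0
        for _ in self.conn.execute("SELECT * FROM {0}".format(self.table)):
            n += 1
        self.rows_read += n
        return n
```

After construction `rows_read` is still 0; only calling `count()` touches the data, just as the Spark job in this session starts only at `mydf001.count()`.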

In [11]: type(mydf001)
Out[11]: pyspark.sql.dataframe.DataFrame

In [12]: mydf001.count()
17/10/03 06:00:29 INFO spark.SparkContext: Starting job: count at NativeMethodAccessorImpl.java:-2
17/10/03 06:00:29 INFO scheduler.DAGScheduler: Registering RDD 2 (count at NativeMethodAccessorImpl.java:-2)
17/10/03 06:00:29 INFO scheduler.DAGScheduler: Got job 0 (count at NativeMethodAccessorImpl.java:-2) with 1 output partitions
17/10/03 06:00:29 INFO scheduler.DAGScheduler: Final stage: ResultStage 1 (count at NativeMethodAccessorImpl.java:-2)
17/10/03 06:00:29 INFO scheduler.DAGScheduler: Parents of final stage: List(ShuffleMapStage 0)
17/10/03 06:00:29 INFO scheduler.DAGScheduler: Missing parents: List(ShuffleMapStage 0)
17/10/03 06:00:29 INFO scheduler.DAGScheduler: Submitting ShuffleMapStage 0 (MapPartitionsRDD[2] at count at NativeMethodAccessorImpl.java:-2), which has no missing parents
17/10/03 06:00:30 INFO storage.MemoryStore: Block broadcast_0 stored as values in memory (estimated size 11.0 KB, free 11.0 KB)
17/10/03 06:00:31 INFO storage.MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 5.2 KB, free 16.1 KB)
17/10/03 06:00:31 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in memory on localhost:36793 (size: 5.2 KB, free: 208.8 MB)
17/10/03 06:00:31 INFO spark.SparkContext: Created broadcast 0 from broadcast at DAGScheduler.scala:1006
17/10/03 06:00:31 INFO scheduler.DAGScheduler: Submitting 1 missing tasks from ShuffleMapStage 0 (MapPartitionsRDD[2] at count at NativeMethodAccessorImpl.java:-2)
17/10/03 06:00:31 INFO scheduler.TaskSchedulerImpl: Adding task set 0.0 with 1 tasks
17/10/03 06:00:31 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, localhost, partition 0,PROCESS_LOCAL, 1911 bytes)
17/10/03 06:00:31 INFO executor.Executor: Running task 0.0 in stage 0.0 (TID 0)
17/10/03 06:00:32 INFO codegen.GenerateMutableProjection: Code generated in 425.82589 ms
17/10/03 06:00:32 INFO codegen.GenerateUnsafeProjection: Code generated in 78.278589 ms
17/10/03 06:00:33 INFO codegen.GenerateMutableProjection: Code generated in 84.676206 ms
17/10/03 06:00:33 INFO codegen.GenerateUnsafeRowJoiner: Code generated in 60.144399 ms
17/10/03 06:00:33 INFO codegen.GenerateUnsafeProjection: Code generated in 95.977074 ms
17/10/03 06:00:34 INFO jdbc.JDBCRDD: closed connection
17/10/03 06:00:34 INFO executor.Executor: Finished task 0.0 in stage 0.0 (TID 0). 1334 bytes result sent to driver
17/10/03 06:00:34 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 0.0 (TID 0) in 3081 ms on localhost (1/1)
17/10/03 06:00:34 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool
17/10/03 06:00:34 INFO scheduler.DAGScheduler: ShuffleMapStage 0 (count at NativeMethodAccessorImpl.java:-2) finished in 3.163 s
17/10/03 06:00:34 INFO scheduler.DAGScheduler: looking for newly runnable stages
17/10/03 06:00:34 INFO scheduler.DAGScheduler: running: Set()
17/10/03 06:00:34 INFO scheduler.DAGScheduler: waiting: Set(ResultStage 1)
17/10/03 06:00:34 INFO scheduler.DAGScheduler: failed: Set()
17/10/03 06:00:34 INFO scheduler.DAGScheduler: Submitting ResultStage 1 (MapPartitionsRDD[5] at count at NativeMethodAccessorImpl.java:-2), which has no missing parents
17/10/03 06:00:34 INFO storage.MemoryStore: Block broadcast_1 stored as values in memory (estimated size 12.1 KB, free 28.3 KB)
17/10/03 06:00:34 INFO storage.MemoryStore: Block broadcast_1_piece0 stored as bytes in memory (estimated size 5.6 KB, free 33.9 KB)
17/10/03 06:00:34 INFO storage.BlockManagerInfo: Added broadcast_1_piece0 in memory on localhost:36793 (size: 5.6 KB, free: 208.8 MB)
17/10/03 06:00:34 INFO spark.SparkContext: Created broadcast 1 from broadcast at DAGScheduler.scala:1006
17/10/03 06:00:34 INFO scheduler.DAGScheduler: Submitting 1 missing tasks from ResultStage 1 (MapPartitionsRDD[5] at count at NativeMethodAccessorImpl.java:-2)
17/10/03 06:00:34 INFO scheduler.TaskSchedulerImpl: Adding task set 1.0 with 1 tasks
17/10/03 06:00:34 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 1.0 (TID 1, localhost, partition 0,NODE_LOCAL, 1999 bytes)
17/10/03 06:00:34 INFO executor.Executor: Running task 0.0 in stage 1.0 (TID 1)
17/10/03 06:00:34 INFO storage.ShuffleBlockFetcherIterator: Getting 1 non-empty blocks out of 1 blocks
17/10/03 06:00:34 INFO storage.ShuffleBlockFetcherIterator: Started 0 remote fetches in + ms
17/10/03 06:00:35 INFO codegen.GenerateMutableProjection: Code generated in 52.636353 ms
17/10/03 06:00:35 INFO codegen.GenerateMutableProjection: Code generated in 49.757505 ms
17/10/03 06:00:35 INFO executor.Executor: Finished task 0.0 in stage 1.0 (TID 1). 1666 bytes result sent to driver
17/10/03 06:00:35 INFO scheduler.DAGScheduler: ResultStage 1 (count at NativeMethodAccessorImpl.java:-2) finished in 0.795 s
17/10/03 06:00:35 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 1.0 (TID 1) in 789 ms on localhost (1/1)
17/10/03 06:00:35 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 1.0, whose tasks have all completed, from pool
17/10/03 06:00:35 INFO scheduler.DAGScheduler: Job 0 finished: count at NativeMethodAccessorImpl.java:-2, took 6.451521 s
Out[12]: 129761

In [13]:
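The `count()` log shows a job with two stages: ShuffleMapStage 0, whose single task (TID 0) read the table through JDBC and produced a partial count, and ResultStage 1, which fetched that one shuffle block and combined it into the final 129761. There was only one task in stage 0 because no partitioning options were set, so the JDBC source read the whole table as a single partition. The plain-Python sketch below models this partial-then-final aggregation; it is a conceptual model, not Spark's implementation, and the function names are hypothetical.

```python
def partial_count(partition):
    """Stage 0 (map side): count the rows in one partition."""
    return sum(1 for _ in partition)


def final_count(partials):
    """Stage 1 (result side): add up the per-partition partial counts."""
    return sum(partials)


def two_stage_count(partitions):
    """One map task per partition, then a single combine over their outputs."""
    return final_count(partial_count(p) for p in partitions)


# A single partition, as in this session, means a single map task:
# two_stage_count([rows]) counts all rows in one pass.
```

With the four JDBC partitioning options set, stage 0 would run one task per partition, each contributing its own partial count to the same final sum.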
