# -*- coding: utf-8 -*-
from pyspark import SparkContext, SparkConf
from pyspark.sql import SQLContext
import numpy as np

appName = "jhl_spark_1"  # name of your application
master = "local"         # run in standalone local mode
conf = SparkConf().setAppName(appName).setMaster(master)  # configure the SparkContext
sc = SparkContext(conf=conf)
sqlContext = SQLContext(sc)

url = 'jdbc:oracle:thin:@127.0.0.1:1521:orcl'
tablename = 'V_JSJQZ'
properties = {"user": "xho", "password": "sys"}
df = sqlContext.read.jdbc(url=url, table=tablename, properties=properties)
# df = sqlContext.read.format("jdbc").option("url", url).option("dbtable", tablename).option("user", "xho").option("password", "sys").load()

# register as a temporary table so it can be used in SQL statements
df.registerTempTable("V_JSJQZ")
# SQL can run on any DataFrame that has been registered as a table
df2 = sqlContext.sql("select ZBLX, BS, JS, JG from V_JSJQZ t order by ZBLX, BS")

list_data = df2.toPandas()                # convert to a pandas DataFrame
list_data = list_data.dropna()            # cleaning operation: drop rows with null values
list_data = np.array(list_data).tolist()  # convert to a plain Python list

rddv1 = sc.parallelize(list_data)         # parallelize the data into an RDD
# key = ZBLX^BS, value = a list holding one [JS, JG] pair
rddv2 = rddv1.map(lambda x: (x[0] + '^' + x[1], [[float(x[2]), float(x[3])]]))
rddv3 = rddv2.reduceByKey(lambda a, b: a + b)  # concatenate the [JS, JG] pairs per key

sc.stop()
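Note that the JDBC read above assumes the Oracle JDBC driver (an ojdbc jar) is already on Spark's classpath; if it is not, the read fails with a "No suitable driver" style error. Below is a minimal sketch of wiring the driver in explicitly; the jar path is a placeholder and the exact driver class name may differ by ojdbc version, so treat both as assumptions rather than part of the original post.

from pyspark import SparkContext, SparkConf
from pyspark.sql import SQLContext

conf = (SparkConf()
        .setAppName("jhl_spark_1")
        .setMaster("local")
        # put the Oracle driver jar on both driver and executor classpaths
        .set("spark.driver.extraClassPath", "/path/to/ojdbc8.jar")      # assumed path
        .set("spark.executor.extraClassPath", "/path/to/ojdbc8.jar"))   # assumed path
sc = SparkContext(conf=conf)
sqlContext = SQLContext(sc)

df = (sqlContext.read.format("jdbc")
      .option("url", "jdbc:oracle:thin:@127.0.0.1:1521:orcl")
      .option("dbtable", "V_JSJQZ")
      .option("user", "xho")
      .option("password", "sys")
      # naming the driver class explicitly avoids driver-lookup errors
      .option("driver", "oracle.jdbc.driver.OracleDriver")
      .load())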
The pyspark package is located in the python folder under the Spark installation directory and needs to be copied into Anaconda's Lib\site-packages so that it can be imported.
This code does not configure any environment variables. If you do not want to configure environment variables on the local machine, you can look up how to configure the Spark environment from within Python (see the sketch below).
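A minimal sketch of pointing a plain Python session at a local Spark installation instead of copying pyspark into site-packages or setting system-wide environment variables; the installation path and the py4j zip name are placeholders and depend on your Spark version, so adjust them to your own setup.

import os
import sys

spark_home = "C:/spark-2.2.0-bin-hadoop2.7"  # assumed Spark install path
os.environ.setdefault("SPARK_HOME", spark_home)
sys.path.append(os.path.join(spark_home, "python"))  # the pyspark package itself
sys.path.append(os.path.join(spark_home, "python", "lib", "py4j-0.10.4-src.zip"))  # py4j bundled with Spark

from pyspark import SparkContext  # now importable without copying the package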
Learning notes: PySpark JDBC operations on an Oracle database.