Sun Qiqung Accompanies You to Learn Spark: Regular Expressions and Spark SQL


The program in this post reads a file from Hadoop HDFS, uses a regular expression to parse the data into a specified format, and then loads it into Spark SQL.

If you are not familiar with regular expressions, see the tutorial "Regular Expressions: a 30-Minute Getting Started Guide".
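Before diving in, here is a minimal sketch (not from the original post) of how regular expressions are used in Scala: calling .r on a string compiles it into a scala.util.matching.Regex, and findAllIn returns every match in the input.

```scala
object RegexDemo {
  def main(args: Array[String]): Unit = {
    // .r turns the string into a scala.util.matching.Regex
    val pattern = """\d+""".r
    // findAllIn yields an iterator over every match in the input
    val matches = pattern.findAllIn("id=42, uid=7").toList
    println(matches) // List(42, 7)
  }
}
```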

The contents of the file are:

CREATE TABLE IF NOT EXISTS `rs_user` (
  `id` mediumint(8) unsigned NOT NULL AUTO_INCREMENT,
  `uid` mediumint(8) unsigned DEFAULT NULL,
  `url` varchar(255) DEFAULT NULL,
  `title` varchar(1024) DEFAULT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=GBK AUTO_INCREMENT=59573;

INSERT INTO `rs_user` (`id`, `uid`, `url`, `title`) VALUES
(1, 269781, 'http://rs.xidian.edu.cn/forum.php?mod=viewthread&tid=721360', '[sports][other][2002 Asian Games badminton men''s singles final Tawfik vs Li Yu][RMVB][Mandarin]'),
(2, 256188, 'http://rs.xidian.edu.cn/forum.php?mod=viewthread&tid=721360', '[sports][other][2002 Asian Games badminton men''s singles final Tawfik vs Li Yu][RMVB][Mandarin]'),
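To see how the pattern used in the program below picks a row out of this dump, here is a sketch that applies it to one such line (the regex mirrors the one in the program; the sample line is abbreviated for readability):

```scala
object ExtractDemo {
  def main(args: Array[String]): Unit = {
    // Same shape as the pattern in the program: id, uid, quoted URL, quoted title
    val r = """\d+, \d+, 'http://[a-z/.?&=0-9]*', '[^']+'""".r
    val line = "(1, 269781, 'http://rs.xidian.edu.cn/forum.php?mod=viewthread&tid=721360', '[sports][RMVB]'),"
    // findAllIn strips the surrounding parentheses and trailing comma
    val matched = r.findAllIn(line).toList
    matched.foreach(println)
  }
}
```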

package com.spark.firstApp

import org.apache.spark.SparkContext
import org.apache.spark._
import org.apache.log4j.{Level, Logger}

object HelloSpark {
  case class Person(id: Int, uid: String, url: String, title: String)

  def main(args: Array[String]): Unit = {
    Logger.getLogger("org.apache.spark").setLevel(Level.WARN)
    Logger.getLogger("org.eclipse.jetty.server").setLevel(Level.OFF) // suppress verbose log output
    val conf = new SparkConf().setAppName("HelloSpark")
    val sc = new SparkContext(conf)
    val sqlContext = new org.apache.spark.sql.SQLContext(sc)
    import sqlContext.implicits._
    // Match one row of the dump: id, uid, quoted URL, quoted title
    val r = """\d*, \d*, 'http://[a-z/.?&=0-9]*', '[^']+'""".r
    val data = sc.textFile("/user/root/home/rs_user.sql").map(s => s.mkString).
      map(z => r.findAllIn(z).toList).filter(_.length > 0).map(_.head.split(",").toList)
    val people = data.map(p => Person(p(0).toInt, p(1), p(2), p(3))).toDF()
    people.registerTempTable("people")
    val teen = sqlContext.sql("SELECT title FROM people WHERE uid = '199988'")
    teen.map(t => "title: " + t).collect().foreach(println)
    sc.stop()
  }
}
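One caveat with the approach above: splitting the matched row on "," breaks if a title itself contains a comma. A hedged alternative (not in the original post) is to use capture groups, so each field is extracted directly from the regex match:

```scala
import scala.util.matching.Regex

object CaptureDemo {
  def main(args: Array[String]): Unit = {
    // Hypothetical variant: one capture group per field instead of split(",")
    val row: Regex = """(\d+), (\d+), '([^']*)', '([^']*)'""".r
    "1, 269781, 'http://example.com/x', 'a, comma-containing title'" match {
      // Regex pattern matching binds each group to a name
      case row(id, uid, url, title) => println(s"$id | $uid | $url | $title")
      case _                        => println("no match")
    }
  }
}
```

Because the title is captured as a single group, the embedded comma survives intact, which the split-based version would mangle.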

Submit the task:

[email protected]:/# spark-submit --master spark://192.168.0.10:7077 --class com.spark.firstApp.HelloSpark --executor-memory 100m /root/ideaprojects/firstsparkapp/out/artifacts/firstsparkappjar/firstsparkappjar.jar

Output Result:

Spark assembly has been built with Hive, including Datanucleus jars on classpath
15/04/15 21:53:56 INFO slf4j.Slf4jLogger: Slf4jLogger started
15/04/15 21:53:56 INFO Remoting: Starting remoting
15/04/15 21:53:57 INFO Remoting: Remoting started; listening on addresses: [akka.tcp://[email protected]:52584]
15/04/15 21:53:57 INFO server.Server: jetty-8.y.z-SNAPSHOT
15/04/15 21:53:57 INFO server.AbstractConnector: Started [email protected]:54183
15/04/15 21:54:03 INFO server.Server: jetty-8.y.z-SNAPSHOT
15/04/15 21:54:03 INFO server.AbstractConnector: Started [email protected]:4040
15/04/15 21:54:12 WARN util.SizeEstimator: Failed to check whether UseCompressedOops is set; assuming yes
15/04/15 21:54:21 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
15/04/15 21:54:21 WARN snappy.LoadSnappy: Snappy native library not loaded
15/04/15 21:54:21 INFO mapred.FileInputFormat: Total input paths to process : 1
title: ['[Other][Video][LOL][Smile Curl January 13 double-queue set][Smile Curl commentary][MP4]
title: ['[Other][Video][LOL][smz24 commentary: S5 blind monk Li Qing's full gank tour _ HD][smz24 commentary][MP4]


