Spark SQL Statistics which constellation of people most like to open room

Last Update:2016-12-16 Source: Internet

Author: User

Developer on Alibaba Coud: Build your first app with APIs, SDKs, and tutorials on the Alibaba Cloud. Read more ＞

Recently, it's interesting to see an example that has been specially reproduced.

Analyze data using Spark SQL

In this step, we use Spark SQL to group the 2000W data by constellation to see which constellation people prefer to open the room.
Of course, using pure spark can also do our analysis, because the actual spark SQL is ultimately done with Spark.
The actual test found that this data is not fully adhere to a schema, some of the data format is not correct, some data items are also wrong. In the code we want to eliminate that disturbing data.
Anyway, we use this data tester to play, and there is no strict requirements to organize which error data.

Look at the code first:

val sqlContext = new org.apache.spark.sql.SQLContext(sc) import sqlContext.createSchemaRDD case class Customer(name: String, gender: String, ctfId: String, birthday: String, address: String) val customer = sc.textFile( "/mnt/share/2000W/*.csv" ).map(_.split( "," )).filter(line => line.length > 7 ).map(p => Customer(p( 0 ), p( 5 ), p( 4 ), p( 6 ), p( 7 ))).distinct() customer.registerTempTable( "customer" ) def toInt(s: String):Int = { try { s.toInt } catch { case e:Exception => 9999 } } def myfun(birthday: String) : String = { var rt = "未知" if (birthday.length == 8 ) { val md = toInt(birthday.substring( 4 )) if (md >= 120 & md <= 219 ) rt = "水瓶座" else if (md >= 220 & md <= 320 ) rt = "双鱼座" else if (md >= 321 & md <= 420 ) rt = "白羊座" else if (md >= 421 & md <= 521 ) rt = "金牛座" else if (md >= 522 & md <= 621 ) rt = "双子座" else if (md >= 622 & md <= 722 ) rt = "巨蟹座" else if (md >= 723 & md <= 823 ) rt = "狮子座" else if (md >= 824 & md <= 923 ) rt = "处女座" else if (md >= 924 & md <= 1023 ) rt = "天秤座" else if (md >= 1024 & md <= 1122 ) rt = "天蝎座" else if (md >= 1123 & md <= 1222 ) rt = "射手座" else if ((md >= 1223 & md <= 1231 ) | (md >= 101 & md <= 119 )) rt = "摩蝎座" else rt = "未知" } rt } sqlContext.registerFunction( "constellation" , (x:String) => myfun(x)) var result = sqlContext.sql( "SELECT constellation(birthday), count(constellation(birthday)) FROM customer group by constellation(birthday)" ) result.collect().foreach(println)

In order to use Spark SQL, you need to introduce sqlContext.createSchemaRDD . A core object of Spark SQL is SchemaRDD . The above import can implicitly convert an RDD to Schemardd.
Then we define the Customer class, which is used to map the data of each row, we use only a little information, such as address, email and so on.
Next, read all the CSV files from the 2000W folder, create an RDD and register the customer.
Since none of the built-in functions can map birth together as constellations, we need to define a mapping function myfun and register it in Sparkcontext. This allows us to use this function in SQL statements. Similarly, the length function of the string is not currently supported, and you can add one of these functions. Since some dates are incorrect, all of them have added an "unknown" constellation in particular. There may be two kinds of error data, one is a date error, but this line is not formatted, and the other fields are mapped to the date of birth. We ignore them when we analyze them.

Then execute a grouped SQL statement. The SQL statement query result type is Schemardd, and it inherits all the actions of the RDD.
Finally, the results are printed out.

[双子座,1406018][双鱼座,1509839][摩蝎座,2404812][金牛座,1406225][水瓶座,1635358][巨蟹座,1498077][处女座,1666009][天秤座,1896544][白羊座,1409838][射手座,1614915][未知,160483][狮子座,1613529]

It seems that the scorpion people like to open a room, obviously more than the other constellations of people.

We can also analyze the ratio of men to women in the open room:

123	`......result = sqlContext.sql("SELECT gender, count(gender) FROM customer where gender = ‘F‘ or gender = ‘M‘ group by gender")result.collect().foreach(println)`

The results show that the number of men and women who open rooms is about 2:1

12	`[F,6475461][M,12763926]`

Spark SQL Statistics which constellation of people most like to open room

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

Get Started for Free

Sales Support

1 on 1 presale consultation

Chat Contact Sales
After-Sales Support

24/7 Technical Support 6 Free Tickets per Quarter Faster Response

Open a Ticket
Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.

Learn More

Spark SQL Statistics which constellation of people most like to open room

Contact Us

What's Trending

Top 10 Tags

Top 10 Keywords

A Free Trial That Lets You Build Big!

Sales Support

After-Sales Support

Spark SQL Statistics which constellation of people most like to open room

Contact Us

What's Trending

Top 10 Tags

Top 10 Keywords

Trending Topic

A Free Trial That Lets You Build Big!

Sales Support

After-Sales Support