Label: Spark SQL provides SQL query functionality on big data, playing a role in the ecosystem similar to Shark's; the two can be collectively referred to as SQL on Spark. Previously, Shark's query compilation and optimizer relied on Hive, which forced Shark to maintain a hiv
Spark SQL is a Spark module that processes structured data. It provides a programming abstraction called DataFrames and can also act as a distributed SQL query engine.
DataFrames
A DataFrame is a distributed collection of data organized into named columns. It is the equivalent of a table in a relational database or a
Built-in UDFs (user-defined functions)
SQL-like queries, which are converted into MapReduce jobs for execution.
HiveQL is not fully compatible with the SQL-92 standard: 1) it supports multi-row INSERT and creating tables through SELECT; 2) only basic indexing is supported; 3) transactions and materialized views are not supported; 4) only limited sub
() // Save the processed data to a MySQL database as a table using JDBC. Note that the property key must be "user", not "username": the system also defines a username, which would override yours.
val properties = new Properties()
properties.put("user", "root")
properties.put("password", "root")
df.write.mode(SaveMode.Overwrite).jdbc("jdbc:mysql://localhost:3306/test", "test", properties)
} }
IV. Load and save operations.
object SaveAndLoadTest { def main(args: Array[String]): Unit = { val conf = new
Compared with writing data analysis as MapReduce or Spark applications, using Hive SQL or Spark SQL can save us a lot of coding effort, while the various types of UDFs built into Hive SQL or Spark SQL i
returns the year of a date, timestamp, date string, or timestamp string. SQL code
E.g.:
SELECT year('2017-06-12') FROM test
E.g.: SELECT year('1970-06-12') FROM test
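As a rough illustration of what such a year function computes, here is a small plain-Java sketch built on java.time. It is not how Hive or Spark SQL implement the function. The class name YearOfSketch, the helper yearOf, and the assumption that inputs look like yyyy-MM-dd or yyyy-MM-dd HH:mm:ss are all hypothetical choices made for this example.

```java
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

// Hypothetical helper class; it only mirrors the behaviour described in the text.
final class YearOfSketch {
    // Assumed timestamp-string layout: "yyyy-MM-dd HH:mm:ss".
    private static final DateTimeFormatter TIMESTAMP =
            DateTimeFormatter.ofPattern("uuuu-MM-dd HH:mm:ss");

    // Returns the year of a date string or a timestamp string.
    static int yearOf(String value) {
        String s = value.trim();
        if (s.length() > 10) {
            // Longer than "yyyy-MM-dd": treat it as a timestamp string.
            return LocalDateTime.parse(s, TIMESTAMP).getYear();
        }
        // Otherwise parse it as an ISO date string.
        return LocalDate.parse(s).getYear();
    }

    public static void main(String[] args) {
        System.out.println(yearOf("2017-06-12"));
        System.out.println(yearOf("1970-06-12 08:30:00"));
    }
}
```

The length check is only a shortcut for telling the two assumed layouts apart; a real SQL engine accepts more input forms than this sketch does.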
Month Function
Syntax: month(arg)
The month function returns the month portion of a date, timestamp,
str) throws Exception { return str.length(); } }, DataTypes.IntegerType);
DataFrame df_group = sqlContext.sql("select date, s, zzq123(date) as zzq123 from Tmp_req"); // if no name is specified for the UDF column, a random name is used
df_group.show();
// 1. Register a complex user-defined aggregate function
sqlContext.udf().register("Zzq_agg", new Stringlen()); // Zzq_agg simulates a count-like aggregate function
DataFrame
, which is the exact month. Today is the 16th, which is past the 15th, so the result is the 1st of the next month.
ROUND(SYSDATE)
------------
01-JUN-16
==============================================================
SQL> select round(sysdate+22, 'month') from dual; ---- it is now May 16, 22 days after July 7, the 7th is not past the 15th, so the result is 16-07-01
ROUND(SYSDATE)
------------
01-JUL-16
==============================================================
SQL
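The month-rounding rule described above (days up to the 15th round down to the 1st of the same month, the 16th onward rounds up to the 1st of the next month) can be sketched in plain Java. This is only an illustration of the rule, not Oracle's implementation. The class RoundToMonthSketch and the method roundToMonth are hypothetical names, and the time-of-day part of a date is ignored.

```java
import java.time.LocalDate;

// Hypothetical helper class illustrating rounding a date to the nearest month.
final class RoundToMonthSketch {
    // Days 1-15 round down to the 1st of the same month;
    // days 16 and later round up to the 1st of the following month.
    static LocalDate roundToMonth(LocalDate date) {
        LocalDate firstOfMonth = date.withDayOfMonth(1);
        return date.getDayOfMonth() >= 16 ? firstOfMonth.plusMonths(1) : firstOfMonth;
    }

    public static void main(String[] args) {
        System.out.println(roundToMonth(LocalDate.of(2016, 5, 16))); // 2016-06-01
        System.out.println(roundToMonth(LocalDate.of(2016, 5, 15))); // 2016-05-01
    }
}
```

Rounding up in December rolls over into January of the following year, because plusMonths handles the year change.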
Aggregate Functions (Transact-SQL)
AVG: average
COUNT: count
MAX: maximum value
MIN: minimum value
SUM: sum
Find the average height
SELECT AVG(Shengao) FROM student
AS: adds a column alias
SELECT AVG(Shengao) AS [average height] FROM student
Find the count
SELECT COUNT(*) FROM student WHERE Xingbie = 'Male'
Find the maximum
SELECT MAX(Shengao) FROM student
To find
Spark SQL requires its "tables" to exist, whether they come from Hive or from temporary tables. If a table comes from Hive, its schema (column names, column types, and so on) was fixed when the table was created, so we can normally query its data directly from Spark SQL; and if the "table" comes from a "temporal
Improving Spark SQL performance
Selection of storage format: row-based or columnar storage? Columnar storage costs more when writing, but is much faster when querying.
Selection of compression format: consider the compression speed and whether the compressed files can be split; compression reduces storage space and improves data transfer s
Operating environment
Cluster environment: CDH5.3.0
The specific jar versions are as follows:
Spark version: 1.2.0-cdh5.3.0
Hive version: 0.13.1-cdh5.3.0
Hadoop version: 2.5.0-cdh5.3.0
Simple Java version of a Spark SQL sample
Spark SQL directly queries JSON-formatted data
Custom
products: a table of changes in the price of goods. orders: records each purchase of goods and its date.
Match orders and products with a non-equi join in Spark SQL to compute the price of the items in each order.
Slowly changing product price list
Wangzai milk had a price change.
scala> val products = sc.parallelize(Array( | ("旺仔牛奶", "2017-01-01", "2018-01-01", 4), | ("旺仔牛奶", "2018-01-02
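The idea behind the non-equi join, matching each order to the price range whose validity period contains the order date, can be sketched in plain Java without Spark. This is only an illustration of the join condition, not the article's Spark SQL code. The class and method names are hypothetical, the second price range (price 5, ending 2019-01-01) and both orders are made-up sample data, and only the first price range comes from the text.

```java
import java.time.LocalDate;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of a range (non-equi) join between orders and price ranges.
final class PriceRangeJoinSketch {
    static final class PriceRange {
        final String item; final LocalDate from; final LocalDate to; final int price;
        PriceRange(String item, String from, String to, int price) {
            this.item = item;
            this.from = LocalDate.parse(from);
            this.to = LocalDate.parse(to);
            this.price = price;
        }
    }

    static final class Order {
        final int id; final String item; final LocalDate date;
        Order(int id, String item, String date) {
            this.id = id;
            this.item = item;
            this.date = LocalDate.parse(date);
        }
    }

    // Nested-loop join: same item AND from <= order date <= to.
    static Map<Integer, Integer> priceOrders(List<Order> orders, List<PriceRange> ranges) {
        Map<Integer, Integer> priceByOrderId = new LinkedHashMap<>();
        for (Order o : orders) {
            for (PriceRange r : ranges) {
                boolean sameItem = o.item.equals(r.item);
                boolean inRange = !o.date.isBefore(r.from) && !o.date.isAfter(r.to);
                if (sameItem && inRange) {
                    priceByOrderId.put(o.id, r.price);
                }
            }
        }
        return priceByOrderId;
    }

    static Map<Integer, Integer> demo() {
        List<PriceRange> ranges = Arrays.asList(
                new PriceRange("Wangzai milk", "2017-01-01", "2018-01-01", 4),  // from the text
                new PriceRange("Wangzai milk", "2018-01-02", "2019-01-01", 5)); // hypothetical
        List<Order> orders = Arrays.asList(                                     // hypothetical orders
                new Order(1, "Wangzai milk", "2017-06-01"),
                new Order(2, "Wangzai milk", "2018-03-15"));
        return priceOrders(orders, ranges);
    }

    public static void main(String[] args) {
        System.out.println(demo()); // {1=4, 2=5}
    }
}
```

A nested loop is the simplest way to express a join on an inequality condition; it says nothing about how a distributed engine would plan or execute the same query.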
Pitfalls of date functions such as new Date() in JavaScript on Safari
Recently, while working on a mobile web page, debugging in Chrome on the PC went fine, but a strange problem appeared when I tested on an iPhone. After some debugging, the problem turned out to be with the date.
/** Spark SQL Source Analysis series article */
In the world of SQL, in addition to the commonly used built-in processing functions, an extensible interface for external custom functions is generally provided; this has become a de facto standard.
In the previous article on the core process of
CURDATE() or CURRENT_DATE() returns the current date.
CURTIME() or CURRENT_TIME() returns the current time.
DATE_ADD(date, INTERVAL int keyword) returns the date plus the interval int (int must be formatted according to the keyword), such as: SELECT DATE_ADD(CURRENT_DATE, INTERVAL 6 MONTH);
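For readers more used to Java than SQL, adding a month interval to a date corresponds roughly to LocalDate.plusMonths. The sketch below is only an analogy, not MySQL's implementation. The class DateAddSketch and the method addMonths are hypothetical names, and only the MONTH keyword is covered.

```java
import java.time.LocalDate;

// Hypothetical helper class mirroring DATE_ADD(date, INTERVAL n MONTH).
final class DateAddSketch {
    // Adds a number of months; if the target month is shorter,
    // the day is clamped to the last valid day of that month.
    static LocalDate addMonths(LocalDate date, int months) {
        return date.plusMonths(months);
    }

    public static void main(String[] args) {
        // Counterpart of: SELECT DATE_ADD(CURRENT_DATE, INTERVAL 6 MONTH);
        System.out.println(addMonths(LocalDate.now(), 6));
    }
}
```

For example, adding 6 months to 2018-08-31 gives 2019-02-28, because February 2019 has no 31st.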
Common SQL Server date comparison and date query statements
In SQL Server, you may need to obtain the current date and calculate some other dates. For example, your program may need to determine the first or last day of a month. Most of you probably know how to divide a