Implementing a SQL execution engine with Scala (Part 1)

Source: Internet
Author: User

Preface

In real-time computing, raw data is usually collected from a queue and held in memory as Java beans; once collected, it is typically written to a database for later ETL use. As a simple example, to count the login and registration events of a particular game and server, the Java bean for the raw data might look like this:

 public class Event {
     private String userName;
     private String game;
     private String server;
     private String event;
 }

When the data volume is too large, it is usually impractical to run even simple statistics on the stored data in real time, such as counting logins grouped by game and server. The corresponding SQL is roughly:

 select count(user_name) from event group by game, server where event = 'login'

With a SQL execution engine, SQL can be run in memory over each batch of collected data, so results are available in real time. In addition, because the SQL is supplied at run time, the program is more flexible.

For example, a batch of collected data can be converted into a List<Map<String, Object>>, and a given SQL statement can be executed against it through the engine to obtain the result (also a List<Map<String, Object>>). A demo:

----------------
[userName:user1,game:lol,server:s1,event:login]
[userName:user2,game:dota2,server:s2,event:register]
[userName:user3,game:lol,server:s2,event:login]
[userName:user4,game:dota2,server:s3,event:register]
[userName:user5,game:lol,server:s10,event:login]
[userName:user6,game:dota2,server:s1,event:login]
[userName:user7,game:lol,server:s1,event:login]
[userName:user8,game:lol,server:s1,event:login]
[userName:user9,game:lol,server:s1,event:login]
----------------
select count(*) as loginNum, game, server from event group by game, server where event = 'login'
----------------
[loginNum:1,game:lol,server:s2]
[loginNum:4,game:lol,server:s1]
[loginNum:1,game:lol,server:s10]
[loginNum:1,game:dota2,server:s1]
----------------
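To make the expected semantics of the demo concrete, the same result can be computed by hand with plain Scala collections. This is only a sketch of what the engine should return for this one query, not the engine itself; the `Row` alias, the helper names, and the exact key spelling (`userName`, `loginNum`) are assumptions made for illustration.

```scala
// Sketch only: Row, row() and loginCounts() are illustrative names, not the project's API.
object DemoQuery {
  type Row = Map[String, Any]

  private def row(user: String, game: String, server: String, event: String): Row =
    Map("userName" -> user, "game" -> game, "server" -> server, "event" -> event)

  // The nine input rows from the demo above.
  val events: List[Row] = List(
    row("user1", "lol", "s1", "login"),
    row("user2", "dota2", "s2", "register"),
    row("user3", "lol", "s2", "login"),
    row("user4", "dota2", "s3", "register"),
    row("user5", "lol", "s10", "login"),
    row("user6", "dota2", "s1", "login"),
    row("user7", "lol", "s1", "login"),
    row("user8", "lol", "s1", "login"),
    row("user9", "lol", "s1", "login")
  )

  // Hand-written equivalent of the demo query.
  def loginCounts(rows: List[Row]): List[Row] =
    rows
      .filter(_.get("event").contains("login"))   // where event = 'login'
      .groupBy(r => (r("game"), r("server")))     // group by game, server
      .map { case ((game, server), group) =>      // count(*) as loginNum
        Map[String, Any]("loginNum" -> group.size, "game" -> game, "server" -> server)
      }
      .toList
}
```

The order of the returned rows is not specified, since `groupBy` produces an unordered map; compare results as a set if order matters.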

Parsing

This SQL execution engine supports only a very small subset of SQL syntax, so I prefer to call it a SQL-like DSL (domain-specific language). There is a lot of discussion about DSLs; I recommend two books: Martin Fowler's Domain-Specific Languages and DSLs in Action.

Scala was chosen because of its built-in support for DSLs, which makes it easy to implement your own parser that parses a DSL script (here, a SQL statement) into the intermediate result you want. This intermediate result is usually called an AST (abstract syntax tree). A SQL statement of the form

 select {...} from {...} group by {...} where {...} order by {...} limit {...}

is converted into an AST whose root node is a SelectStmt. Note that in this DSL the where clause comes after group by, unlike in standard SQL.
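One possible shape for such an AST is sketched below. The names `SelectStmt`, `SqlProj`, `FieldIdent`, and `SqlGroupBy` appear in the parser code in this article; every other type, and all field names, are assumptions made for illustration and may differ from the actual project.

```scala
// Hypothetical AST sketch; only SelectStmt, SqlProj, FieldIdent and SqlGroupBy
// are names taken from the article. Everything else is assumed.
object SqlAst {
  sealed trait SqlProj
  case class FieldIdent(qualifier: Option[String], name: String) extends SqlProj
  case class CountAll(alias: Option[String]) extends SqlProj          // assumed: count(*) [as alias]

  case class SqlFrom(table: String)                                   // assumed
  case class SqlGroupBy(keys: Seq[SqlProj])
  case class SqlWhere(field: FieldIdent, literal: String)             // assumed: field = 'literal' only
  case class SqlOrderBy(keys: Seq[FieldIdent], ascending: Boolean)    // assumed
  case class SqlLimit(rows: Int)                                      // assumed

  // Argument order follows SelectStmt(p, f, w, g, o, l) in the parser entry point.
  case class SelectStmt(
    projections: Seq[SqlProj],
    from: SqlFrom,
    where: Option[SqlWhere],
    groupBy: Option[SqlGroupBy],
    orderBy: Option[SqlOrderBy],
    limit: Option[SqlLimit]
  )

  // The demo query, written out by hand as an AST value.
  val demoQuery: SelectStmt = SelectStmt(
    projections = Seq(CountAll(Some("loginNum")), FieldIdent(None, "game"), FieldIdent(None, "server")),
    from = SqlFrom("event"),
    where = Some(SqlWhere(FieldIdent(None, "event"), "login")),
    groupBy = Some(SqlGroupBy(Seq(FieldIdent(None, "game"), FieldIdent(None, "server")))),
    orderBy = None,
    limit = None
  )
}
```

Modelling the optional clauses as `Option` values mirrors the `opt(...)` combinators in the grammar: a clause that is absent from the SQL text simply becomes `None` in the tree.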

The parser's entry point is:

 def select: Parser[SelectStmt] =
   "select" ~> projectionStatements ~ fromStatements ~ opt(groupStatements) ~ opt(whereExpr) ~
     opt(orderByExpr) ~ opt(limit) ~ opt(";") ^^ {
     case p ~ f ~ g ~ w ~ o ~ l ~ end => SelectStmt(p, f, w, g, o, l)
   }

Here fromStatements, groupStatements, whereExpr, and the rest each have their own parser. Using the parser combinators Scala already provides (~>, ~, opt(), and so on), individual parsers can be combined into more complex ones, much like Lego bricks. Suppose you write a parser, parserA, that can parse only one particular kind of text, whose pattern we call patternA. Combining it as rep1sep(parserA, ",") yields a new parser for the pattern patternA[,patternA][,patternA][,patternA]...
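The rep1sep combination can be tried in isolation with a small sketch. Here `parserA` is a stand-in that parses an integer, and `ListParser` and `parseList` are hypothetical names. The example assumes the scala-parser-combinators module is on the classpath (it shipped with the standard library in older Scala versions and is a separate module in newer ones); the first line is a scala-cli directive and acts as a plain comment elsewhere.

```scala
//> using dep "org.scala-lang.modules::scala-parser-combinators:2.3.0"
import scala.util.parsing.combinator.RegexParsers

// Sketch only: ListParser, parserA and parseList are illustrative names.
object ListParser extends RegexParsers {
  // parserA: parses exactly one integer (this is "patternA").
  def parserA: Parser[Int] = """\d+""".r ^^ (_.toInt)

  // rep1sep(parserA, ","): one or more parserA matches separated by commas.
  def parserAs: Parser[List[Int]] = rep1sep(parserA, ",")

  // parseAll requires the whole input to be consumed.
  def parseList(input: String): Option[List[Int]] =
    parseAll(parserAs, input) match {
      case Success(result, _) => Some(result)
      case _                  => None
    }
}
```

Because rep1sep requires at least one element, empty input is rejected, and a trailing separator such as `1,` also fails since parseAll insists on consuming all of the input.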

Take the GROUP BY clause of a SQL statement. Ignoring HAVING, its format is roughly group by [tableName.]column1, [tableName.]column2, [tableName.]column3. The text [tableName.]column is the basic pattern, so you can write a parser for text in this format:

 def selectIdent: Parser[SqlProj] = {
   ident ~ opt("." ~> ident) ^^ {
     case table ~ Some(b: String) => FieldIdent(Option(table), b)
     case column ~ None => FieldIdent(None, column)
   }
 }

Here ident matches an identifier and opt() marks a part as optional, so this parser accepts text of either form: identifier.identifier or identifier. The parser for the group by clause is then obtained by combining it with rep1sep:

 def groupStatements: Parser[SqlGroupBy] = "group" ~> "by" ~> rep1sep(selectIdent, ",") ^^ {
   case keys => SqlGroupBy(keys)
 }
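Put together, the two group by parsers can be run as a self-contained sketch. The field names of `FieldIdent` and `SqlGroupBy`, the enclosing `GroupByParser` object, and the `parseGroupBy` helper are assumptions for illustration; the project's real definitions may differ. `JavaTokenParsers` is used here because it provides `ident`, and the example again assumes the scala-parser-combinators module is available.

```scala
//> using dep "org.scala-lang.modules::scala-parser-combinators:2.3.0"
import scala.util.parsing.combinator.JavaTokenParsers

// Sketch only: field names, GroupByParser and parseGroupBy are assumed.
object GroupByParser extends JavaTokenParsers {
  sealed trait SqlProj
  case class FieldIdent(qualifier: Option[String], name: String) extends SqlProj
  case class SqlGroupBy(keys: Seq[SqlProj])

  // identifier.identifier | identifier
  def selectIdent: Parser[SqlProj] =
    ident ~ opt("." ~> ident) ^^ {
      case table ~ Some(column) => FieldIdent(Some(table), column)
      case column ~ None        => FieldIdent(None, column)
    }

  // group by <selectIdent>[, <selectIdent>]...
  def groupStatements: Parser[SqlGroupBy] =
    "group" ~> "by" ~> rep1sep(selectIdent, ",") ^^ (keys => SqlGroupBy(keys))

  // Returns None when the text is not a complete group by clause.
  def parseGroupBy(sql: String): Option[SqlGroupBy] =
    parseAll(groupStatements, sql) match {
      case Success(result, _) => Some(result)
      case _                  => None
    }
}
```

For instance, `GroupByParser.parseGroupBy("group by game, e.server")` yields a `SqlGroupBy` with an unqualified `game` and a `server` qualified by `e`. Note that the keyword literals in this sketch are case-sensitive, so `GROUP BY` in upper case would not parse.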

The other parts of the SQL statement are parsed in much the same way; the entire project code is on GitHub. The next article covers how to execute the AST, once obtained, to get the desired result.
