Preface
In real-time computing, raw data is usually consumed from a queue and held in memory as Java beans. Once collected, the data is typically landed in a database for later ETL. As a simple example, suppose a game needs to track login and registration events per game and per server. The Java bean for the raw data might look like this:
    public class Event {
        private String userName;
        private String game;
        private String server;
        private String event;
    }
When the data volume is large, some real-time statistics are usually unavoidable, for example counting logins grouped by game and server. The corresponding SQL is roughly:
    select count(user_name), game, server from event group by game, server where event = 'login'
With a SQL execution engine that runs SQL directly over a batch of collected data in memory, such results can be computed in real time. And because the SQL is supplied as input at runtime, the program also becomes more flexible.
For example, a batch of collected data can be converted into a List<Map<String, Object>>. The SQL execution engine then runs a given SQL statement over it and returns the result in the same List<Map<String, Object>> form. A demo:
    ----------------
    [username:user1, game:lol, server:s1, event:login]
    [username:user2, game:dota2, server:s2, event:register]
    [username:user3, game:lol, server:s2, event:login]
    [username:user4, game:dota2, server:s3, event:register]
    [username:user5, game:lol, server:s10, event:login]
    [username:user6, game:dota2, server:s1, event:login]
    [username:user7, game:lol, server:s1, event:login]
    [username:user8, game:lol, server:s1, event:login]
    [username:user9, game:lol, server:s1, event:login]
    ----------------
    select count(*) as loginNum, game, server from event group by game, server where event = 'login'
    ----------------
    [loginNum:1, game:lol, server:s2]
    [loginNum:4, game:lol, server:s1]
    [loginNum:1, game:lol, server:s10]
    [loginNum:1, game:dota2, server:s1]
    ----------------
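To make the demo's semantics concrete, here is a plain-Scala sketch, with no SQL engine involved, of what that query computes. It assumes a hypothetical case class Event mirroring the Java bean and a hypothetical helper toRow that turns each record into a Map keyed by field name; the where filter, the group by and count(*) are then applied by hand. As in the demo output, the order of the result rows is not specified.

```scala
// Hypothetical Scala mirror of the Java Event bean.
final case class Event(userName: String, game: String, server: String, event: String)

// Hypothetical helper: one record -> one row keyed by field name.
// productElementNames is available on case classes from Scala 2.13 on.
def toRow(e: Event): Map[String, Any] =
  e.productElementNames.zip(e.productIterator).toMap

// The nine records from the demo input, as a List of Maps.
val rows: List[Map[String, Any]] = List(
  Event("user1", "lol", "s1", "login"),
  Event("user2", "dota2", "s2", "register"),
  Event("user3", "lol", "s2", "login"),
  Event("user4", "dota2", "s3", "register"),
  Event("user5", "lol", "s10", "login"),
  Event("user6", "dota2", "s1", "login"),
  Event("user7", "lol", "s1", "login"),
  Event("user8", "lol", "s1", "login"),
  Event("user9", "lol", "s1", "login")
).map(toRow)

// Hand-written equivalent of:
//   select count(*) as loginNum, game, server from event
//   group by game, server where event = 'login'
val result: List[Map[String, Any]] =
  rows
    .filter(row => row("event") == "login")        // where event = 'login'
    .groupBy(row => (row("game"), row("server")))  // group by game, server
    .toList
    .map { case ((game, server), group) =>
      Map("loginNum" -> group.size, "game" -> game, "server" -> server) // count(*)
    }

result.foreach(println)
```

An engine replaces these hard-coded filter and groupBy calls with steps derived from whatever SQL string it is given.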
Parsing
This SQL execution engine supports only a very small subset of SQL syntax, so I prefer to call it a SQL-like DSL (domain-specific language). For more on DSLs I recommend two books: Martin Fowler's Domain-Specific Languages and DSLs in Action.
I chose Scala because the language has good support for building DSLs: its parser combinators make it easy to write your own parser, which parses your DSL script (here, a SQL statement) into the intermediate result you want. That intermediate result is usually called an AST (abstract syntax tree). A SQL statement of the form
select {...} from {...} group by {...} where {...} order by {...} limit {...}
is converted into an AST whose root is a SelectStmt.
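One possible shape for such an AST, sketched as Scala case classes. SelectStmt, SqlProj, FieldIdent and SqlGroupBy are names used by the parsers in this article; CountStar, TableIdent, SqlExpr, Equals and SqlOrderBy are hypothetical stand-ins, and the field layout is an assumption rather than the project's actual definitions. The demo query is written out by hand as the tree a parser would be expected to build.

```scala
// SqlProj, FieldIdent, SqlGroupBy and SelectStmt are names used by the article's
// parsers; every other type here is a hypothetical stand-in.
sealed trait SqlProj                                                // one item of the select list
final case class FieldIdent(table: Option[String], column: String) extends SqlProj
final case class CountStar(alias: Option[String]) extends SqlProj  // hypothetical: count(*) [as alias]

final case class TableIdent(name: String)                          // hypothetical: from <table>

sealed trait SqlExpr                                               // hypothetical: where-clause expressions
final case class Equals(field: FieldIdent, value: String) extends SqlExpr

final case class SqlGroupBy(keys: List[SqlProj])
final case class SqlOrderBy(keys: List[SqlProj])                   // hypothetical

// Root node: optional clauses are Options, in the argument order used by the entry parser.
final case class SelectStmt(
    projections: List[SqlProj],
    from: TableIdent,
    where: Option[SqlExpr],
    groupBy: Option[SqlGroupBy],
    orderBy: Option[SqlOrderBy],
    limit: Option[Int]
)

// The demo query, built by hand as the tree a parser would be expected to produce.
val demo = SelectStmt(
  projections = List(CountStar(Some("loginNum")), FieldIdent(None, "game"), FieldIdent(None, "server")),
  from = TableIdent("event"),
  where = Some(Equals(FieldIdent(None, "event"), "login")),
  groupBy = Some(SqlGroupBy(List(FieldIdent(None, "game"), FieldIdent(None, "server")))),
  orderBy = None,
  limit = None
)

println(demo)
```

Modelling optional clauses as Option keeps "clause absent" distinct from "clause present but empty", which simplifies the later execution step.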
The parser's entry point is:

    def select: Parser[SelectStmt] =
      "select" ~> projectionStatements ~ fromStatements ~ opt(groupStatements) ~
        opt(whereExpr) ~ opt(orderByExpr) ~ opt(limit) ~ opt(";") ^^ {
          // the trailing opt(";") result is not needed, so it is matched with _
          case p ~ f ~ g ~ w ~ o ~ l ~ _ => SelectStmt(p, f, w, g, o, l)
        }
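The result shape of that entry point follows from how the combinators behave: ~ keeps both results and nests them left to right, ~> keeps only the right-hand result, opt() yields an Option, and ^^ maps the parsed value. A scaled-down, runnable version of the same pattern is sketched below. MiniSelect and MiniStmt are hypothetical names, and the code assumes the scala-parser-combinators module (a separate dependency since Scala 2.11), pulled in here with a scala-cli using directive.

```scala
//> using dep "org.scala-lang.modules::scala-parser-combinators:2.3.0"
import scala.util.parsing.combinator.JavaTokenParsers

// Hypothetical result type for the miniature grammar.
final case class MiniStmt(column: String, table: String, limit: Option[Int])

// Hypothetical miniature of the entry parser: select <col> from <table> [limit n] [;]
object MiniSelect extends JavaTokenParsers {
  def column: Parser[String] = ident

  // ~> discards the keyword and keeps only what follows it.
  def fromStatement: Parser[String] = "from" ~> ident
  def limit: Parser[Int] = "limit" ~> wholeNumber ^^ (_.toInt)

  // ~ keeps both sides, nested left to right: ((column ~ table) ~ limit) ~ semicolon,
  // which the case pattern below takes apart in the same order.
  def select: Parser[MiniStmt] =
    "select" ~> column ~ fromStatement ~ opt(limit) ~ opt(";") ^^ {
      case c ~ t ~ l ~ _ => MiniStmt(c, t, l)
    }
}

println(MiniSelect.parseAll(MiniSelect.select, "select game from event limit 10;"))
```

parseAll succeeds only if the whole input is consumed, so a statement with trailing garbage is rejected rather than silently truncated.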
fromStatements, groupStatements, whereExpr and the rest are each separate parsers. Using the parser combinators Scala already provides, such as ~ (run two parsers in sequence and keep both results), ~> (keep only the right-hand result) and opt() (make a parser optional), separate parsers can be combined into more complex ones, much like Lego bricks. Suppose you write a parser parserA that can parse only one particular kind of text, whose pattern we call patternA. Combining it as rep1sep(parserA, ",") gives a new parser that accepts pattern = patternA[,patternA][,patternA]...
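A minimal sketch of parserA, rep1sep and opt with the scala-parser-combinators module. Here parserA accepts a lowercase word, standing in for whatever patternA is; ListParser, parserA, listOfA and maybeA are hypothetical names.

```scala
//> using dep "org.scala-lang.modules::scala-parser-combinators:2.3.0"
import scala.util.parsing.combinator.RegexParsers

object ListParser extends RegexParsers {
  // parserA: parses one piece of text matching patternA (here, hypothetically, a lowercase word).
  def parserA: Parser[String] = """[a-z]+""".r

  // rep1sep(parserA, ",") accepts patternA[,patternA][,patternA]... (at least one),
  // returning the matched pieces as a List with the separators dropped.
  def listOfA: Parser[List[String]] = rep1sep(parserA, ",")

  // opt(parserA) accepts patternA or nothing: Some(piece) or None.
  def maybeA: Parser[Option[String]] = opt(parserA)
}

println(ListParser.parseAll(ListParser.listOfA, "a, bb,ccc"))
```

RegexParsers skips whitespace before each token by default, which is why "a, bb,ccc" parses the same as "a,bb,ccc".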
For example, ignoring HAVING, the GROUP BY clause of a SQL statement has roughly the format group by [tableName.]column1, [tableName.]column2, [tableName.]column3. Text of the form [tableName.]column can therefore serve as the basic pattern, and we can write a parser for text in this format:
    def selectIdent: Parser[SqlProj] =
      ident ~ opt("." ~> ident) ^^ {
        case table ~ Some(b: String) => FieldIdent(Option(table), b)
        case column ~ None => FieldIdent(None, column)
      }
Here ident matches an identifier and opt() marks a part that may or may not be present, so this parser accepts text of the form identifier.identifier | identifier. The parser for the whole group by clause is then obtained by combining it with rep1sep:
    def groupStatements: Parser[SqlGroupBy] =
      "group" ~> "by" ~> rep1sep(selectIdent, ",") ^^ {
        case keys => SqlGroupBy(keys)
      }
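Assembled into a runnable object, the two parsers above look roughly like this. The case classes are minimal hypothetical stand-ins for the project's AST types, GroupByParser is a hypothetical name, and the scala-parser-combinators module is assumed as a dependency.

```scala
//> using dep "org.scala-lang.modules::scala-parser-combinators:2.3.0"
import scala.util.parsing.combinator.JavaTokenParsers

// Minimal hypothetical stand-ins for the project's AST types.
sealed trait SqlProj
final case class FieldIdent(table: Option[String], column: String) extends SqlProj
final case class SqlGroupBy(keys: List[SqlProj])

object GroupByParser extends JavaTokenParsers {
  // [tableName.]column: an identifier, optionally followed by "." and a second identifier.
  def selectIdent: Parser[SqlProj] = ident ~ opt("." ~> ident) ^^ {
    case table ~ Some(column) => FieldIdent(Some(table), column)
    case column ~ None        => FieldIdent(None, column)
  }

  // group by key[, key]...: "group" and "by" are matched and discarded by ~>,
  // and rep1sep collects one or more selectIdent results separated by ",".
  def groupStatements: Parser[SqlGroupBy] =
    "group" ~> "by" ~> rep1sep(selectIdent, ",") ^^ { keys => SqlGroupBy(keys) }
}

println(GroupByParser.parseAll(GroupByParser.groupStatements, "group by t.game, server"))
```

String literals such as "group" are matched case-sensitively, so this sketch accepts only lowercase keywords.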
The other parts of the SQL statement are parsed in much the same way; the complete project code is on GitHub. The next article covers how to execute the AST to get the desired result.
Implementing a SQL Execution Engine with Scala (Part 1)