C++ Programming Lab Project 2: Making a simple SQL system with regular expressions

Last Update:2018-05-13 Source: Internet

Author: User

This article summarizes, as briefly as possible, how to build the framework of this SQL system.

I. Parsing statements with regular expressions

First, include the C++ regex library:

#include <regex>

It is recommended to learn the basic syntax of regular expressions from a beginner tutorial first.

Then, create a new expression. Assume that the statement you are parsing is CREATE TABLE (col1,col2,...) TO filename

regex r("CREATE TABLE \\((.+)\\) TO ([^ ]+)");

Note that a C++ string literal needs the backslash itself escaped, so regex tokens such as \w and \( must be written as \\w and \\(.
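As a side note, C++11 raw string literals are one way to avoid the doubled backslashes. The sketch below writes the CREATE TABLE pattern both ways; the function names are illustrative and not part of the original project.

```cpp
#include <regex>
#include <string>

// Pattern in an ordinary string literal: every regex backslash is doubled.
inline std::string escaped_pattern() {
    return "CREATE TABLE \\((.+)\\) TO ([^ ]+)";
}

// The same pattern as a C++11 raw string literal: no doubling needed.
inline std::string raw_pattern() {
    return R"(CREATE TABLE \((.+)\) TO ([^ ]+))";
}
```

Both functions return the same text, so either can be passed to the std::regex constructor.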

Then match a string object against it with regex_match.

string cmd = "CREATE TABLE (name,score) TO rec.txt";
smatch m;
if (regex_match(cmd, m, r)) {
    // ...
} else {
    cout << "not match!" << endl;
}

If the content of cmd matches r, the result is saved in m.

So how do you use the content of m? For example, you can print it (the range-based for loop with auto is also a C++11 feature):

for (auto x : m) cout << x << endl;
// Result:
// CREATE TABLE (name,score) TO rec.txt   // m.str(0): the entire match
// name,score                             // m.str(1): text captured by the first pair of parentheses
// rec.txt                                // m.str(2)
// With n pairs of parentheses, the captured texts are stored in positions 1 to n of m.
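Putting the steps so far together, a minimal sketch of a parsing helper might look like this. The CreateCommand type and the parse_create function are hypothetical names introduced for illustration, and the pattern assumes the statement form CREATE TABLE (col1,col2,...) TO filename.

```cpp
#include <regex>
#include <string>

// Result of parsing a CREATE TABLE statement (hypothetical helper type).
struct CreateCommand {
    bool ok = false;        // true if the whole statement matched
    std::string columns;    // raw column list, e.g. "name,score"
    std::string filename;   // target file, e.g. "rec.txt"
};

// Match the whole statement and copy out the two captured groups.
inline CreateCommand parse_create(const std::string &cmd) {
    static const std::regex r("CREATE TABLE \\((.+)\\) TO ([^ ]+)");
    CreateCommand out;
    std::smatch m;
    if (std::regex_match(cmd, m, r)) {
        out.ok = true;
        out.columns = m.str(1);   // first parenthesized group
        out.filename = m.str(2);  // second parenthesized group
    }
    return out;
}
```

Copying the groups into plain strings right away means the match object can be reused later without losing the file name.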

After saving m.str(1), how do you process it? What if it is a long string such as col1,col2,col3,...,col99? This is where regex_search comes in.

regex c("([^,]+)");
string cols = m.str(1);
while (regex_search(cols, m, c)) {
    // m is overwritten here, so make sure the file path from m has been saved first
    // save m.str(0)
    ...
    cols = m.suffix().str();
    // m.str(0): col1
    // m.suffix(): ,col2,col3,...,col99
}

regex_search looks for the first substring of the whole string that matches the regular expression. The part of the string before the match is saved in m.prefix(), and the part after it is saved in m.suffix(). Once you have extracted the matched substring, you can continue searching in the rest of the string.

Let me give you an example:

regex r("glim");
string s = "starlightglimmer";
smatch m;
regex_search(s, m, r);
// m.str(0) == "glim"
// m.prefix() == "starlight"
// m.suffix() == "mer"
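The search-then-continue-in-the-suffix loop can be wrapped in a small helper. This is a sketch only; split_columns is a name chosen here, not one from the original code.

```cpp
#include <regex>
#include <string>
#include <vector>

// Split "col1,col2,col3" into its parts by repeatedly searching the suffix.
inline std::vector<std::string> split_columns(std::string cols) {
    static const std::regex item("([^,]+)");
    std::vector<std::string> out;
    std::smatch m;
    while (std::regex_search(cols, m, item)) {
        out.push_back(m.str(1));
        // Copy the suffix first: m refers into cols, which is about to change.
        std::string rest = m.suffix().str();
        cols = rest;
    }
    return out;
}
```

For example, split_columns("name,score") yields the two strings "name" and "score", and an empty input yields an empty vector.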

With flexible use of regular expressions, it is therefore easy to parse all kinds of statements. Even when a statement has several optional clauses, such as SELECT * FROM table [WHERE col = name] [ORDER BY col DESC] [TO file], you can split it into four regular expressions, one for the required leading part and one for each of the three optional clauses, and then use regex_search to parse every variant of the command.
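One possible way to arrange the four expressions is sketched below. The SelectCommand type, the parse_select function, and the exact patterns are assumptions made for illustration; in particular, the sketch assumes single spaces between tokens and values without spaces.

```cpp
#include <regex>
#include <string>

// Parsed pieces of a SELECT statement (hypothetical helper type).
struct SelectCommand {
    bool ok = false;
    std::string columns, table;
    std::string where_col, where_val;  // empty if there is no WHERE clause
    std::string order_col;             // empty if there is no ORDER BY clause
    bool descending = false;
    std::string out_file;              // empty if there is no TO clause
};

inline SelectCommand parse_select(const std::string &cmd) {
    // One regex for the required head, one per optional clause.
    static const std::regex head("^SELECT ([^ ]+) FROM ([^ ]+)");
    static const std::regex where(" WHERE ([^ ]+) = ([^ ]+)");
    static const std::regex order(" ORDER BY ([^ ]+)( DESC)?");
    static const std::regex to(" TO ([^ ]+)");

    SelectCommand out;
    std::smatch m;
    if (!std::regex_search(cmd, m, head)) return out;
    out.ok = true;
    out.columns = m.str(1);
    out.table = m.str(2);

    // Look for the optional clauses in whatever follows the table name.
    const std::string rest = m.suffix().str();
    if (std::regex_search(rest, m, where)) {
        out.where_col = m.str(1);
        out.where_val = m.str(2);
    }
    if (std::regex_search(rest, m, order)) {
        out.order_col = m.str(1);
        out.descending = m[2].matched;  // true only if " DESC" was present
    }
    if (std::regex_search(rest, m, to)) out.out_file = m.str(1);
    return out;
}
```

Because each optional clause is searched for separately, a missing clause simply leaves its fields empty.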

II. Data structure and sorting

Note that this part is less general, because data structures can differ. First, a table is two-dimensional, so you can store it as a vector of vectors. Specifically, I do this:

struct column {
    vector<string> item;
    string name;
};

class Table {
public:
    vector<column> col;
    string TableName;
    Table(string);
};

Table is actually used as a struct, so its members are public.

The number of rows in a table can be obtained from any one of its columns, and you can require the table name to be the same as the file name, which saves a string variable.

The string parameter of the constructor is a convenience for setting TableName; you can also drop it and write Table as a struct.

In addition, you do not have to keep any table permanently in memory: after all, it is read from a file. Just make sure that the table you want to manipulate has been recorded by CREATE TABLE in some vector<string> tablelist.
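The two checks mentioned above, the row count and the lookup in tablelist, might be sketched as follows. The struct definitions mirror the layout described in the text, with Table written as a plain struct; row_count and table_exists are hypothetical helper names.

```cpp
#include <algorithm>
#include <string>
#include <vector>

struct column {
    std::vector<std::string> item;  // cell values, one per row
    std::string name;               // column header
};

struct Table {
    std::vector<column> col;
    std::string TableName;
};

// Row count taken from the first column; a table without columns has zero rows.
inline std::size_t row_count(const Table &t) {
    return t.col.empty() ? 0 : t.col.front().item.size();
}

// True if CREATE TABLE has already recorded this name in tablelist.
inline bool table_exists(const std::vector<std::string> &tablelist,
                         const std::string &name) {
    return std::find(tablelist.begin(), tablelist.end(), name) != tablelist.end();
}
```

The row count relies on every column holding the same number of items, which the sketch assumes rather than verifies.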

So, how do you operate on a row? Usually you can just use subscripts, but how do you handle sorting? Here is one idea:

For example, suppose these columns exist: name, score, note

To sort the entire table by score, you can use a data structure such as pair<int, string>.

// Table t(...)
// ...
string tarcol = "score";
vector<pair<int, string>> tar;
for (auto &c : t.col) {
    if (c.name == tarcol) {
        // pair each value with its row index
        for (int i = 0; i < c.item.size(); i++) {
            tar.push_back(make_pair(i, c.item[i]));
        }
    }
}

Then use the sort function that comes with <algorithm>, together with a custom cmp function:

bool cmp(const pair<int, string> &a, const pair<int, string> &b) {
    int res = a.second.compare(b.second);
    if (res < 0) return true;   // a sorts before b (ascending order)
    else return false;
}

Then you get a sorted sequence of pairs. The number in each pair is a row index.

Copy the current table row by row, in the order given by the indexes, into a temporary new table (for example, if the first index in the sequence is 2, row 2 of the current table becomes the first row of the new table).

Finally, copy the new table back to the source table and you're done.
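The whole procedure (build the index pairs, sort them, reorder into a temporary table, copy back) could be sketched as below. Note that this sketch swaps in std::stable_sort with a lambda in place of sort with a separate cmp function, and sort_by_column is a name chosen here. It assumes all columns have the same number of items.

```cpp
#include <algorithm>
#include <string>
#include <utility>
#include <vector>

struct column {
    std::vector<std::string> item;
    std::string name;
};

struct Table {
    std::vector<column> col;
    std::string TableName;
};

// Reorder every column of t so that rows are ascending by the named column.
// Returns false if no column has that name.
inline bool sort_by_column(Table &t, const std::string &tarcol) {
    std::vector<std::pair<int, std::string>> tar;
    bool found = false;
    for (const auto &c : t.col) {
        if (c.name == tarcol) {
            found = true;
            for (int i = 0; i < static_cast<int>(c.item.size()); i++)
                tar.push_back(std::make_pair(i, c.item[i]));
            break;
        }
    }
    if (!found) return false;

    // stable_sort keeps the original order of rows whose keys are equal.
    std::stable_sort(tar.begin(), tar.end(),
        [](const std::pair<int, std::string> &a,
           const std::pair<int, std::string> &b) {
            return a.second < b.second;
        });

    Table tmp = t;  // temporary table with the same shape
    for (std::size_t k = 0; k < t.col.size(); k++)
        for (std::size_t r = 0; r < tar.size(); r++)
            tmp.col[k].item[r] = t.col[k].item[tar[r].first];
    t = tmp;  // copy the reordered table back to the source table
    return true;
}
```

Since the keys are strings, the comparison is lexicographic: "100" sorts before "90". Numeric columns would need the values converted before comparing.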

III. Reading the file

You can use fstream: first read the header line with getline, then read the rest with stream extraction. Specifically:

#include <fstream>

int ReadTable(Table &t) {
    ifstream in;
    char a[300];
    string line;
    column tmpc;
    in.open((t.TableName + ".txt").c_str());
    if (!in.is_open()) {
        // printf("Doc is not exist...\n");
        return -1;
    }
    // the first line holds the column names
    in.getline(a, 299);
    line = a;
    smatch m;
    while (regex_search(line, m, dividespace)) {
        tmpc.name = m.str(1);
        t.col.push_back(tmpc);
        line = m.suffix().str();
    }
    // the remaining lines hold one value per column
    while (!in.eof()) {
        for (auto &c : t.col) {
            in >> line;
            c.item.push_back(line);
        }
    }
    for (auto &c : t.col) c.item.pop_back(); // delete the invalid extra row
    in.close();
    return 1;
}

Note that reading this way usually reads one row too many, so the extra row has to be removed afterwards. Also, the regular expression (dividespace) is left for the reader to write.
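An alternative that avoids the extra row is to test each extraction directly instead of looping on eof(). The sketch below is not the original function: read_table is a hypothetical name, it reads from any std::istream so it can be tried without a file, and the pattern "(\\S+)" is only one possible stand-in for dividespace.

```cpp
#include <istream>
#include <regex>
#include <sstream>
#include <string>
#include <vector>

struct column {
    std::vector<std::string> item;
    std::string name;
};

struct Table {
    std::vector<column> col;
    std::string TableName;
};

// Read a whitespace-separated table: the first line holds the column names,
// the remaining tokens fill the columns row by row.
inline bool read_table(std::istream &in, Table &t) {
    // Assumed stand-in for dividespace: one run of non-whitespace characters.
    static const std::regex dividespace("(\\S+)");
    std::string line;
    if (!std::getline(in, line)) return false;

    std::smatch m;
    while (std::regex_search(line, m, dividespace)) {
        column c;
        c.name = m.str(1);
        t.col.push_back(c);
        std::string rest = m.suffix().str();
        line = rest;
    }
    if (t.col.empty()) return false;

    // Checking each extraction stops cleanly at end of input,
    // so no invalid trailing row is ever stored.
    std::vector<std::string> row(t.col.size());
    while (true) {
        for (auto &cell : row)
            if (!(in >> cell)) return true;
        for (std::size_t k = 0; k < row.size(); k++)
            t.col[k].item.push_back(row[k]);
    }
}
```

An incomplete final row is silently dropped in this sketch; a stricter version might report it as an error. Passing an std::ifstream works the same way as the std::istringstream used for quick experiments.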

Output is naturally easy. This way, each command can read the table from the file, operate on it, save the table, and then exit.
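For completeness, a minimal sketch of the output side is shown here. It assumes a space-separated layout with the column names on the first line, which is an assumption based on how the file is read; write_table is a hypothetical name.

```cpp
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

struct column {
    std::vector<std::string> item;
    std::string name;
};

struct Table {
    std::vector<column> col;
    std::string TableName;
};

// Write the header line, then one line per row, cells separated by spaces.
inline void write_table(std::ostream &out, const Table &t) {
    if (t.col.empty()) return;
    for (std::size_t k = 0; k < t.col.size(); k++)
        out << (k ? " " : "") << t.col[k].name;
    out << '\n';
    const std::size_t rows = t.col.front().item.size();
    for (std::size_t r = 0; r < rows; r++) {
        for (std::size_t k = 0; k < t.col.size(); k++)
            out << (k ? " " : "") << t.col[k].item[r];
        out << '\n';
    }
}
```

The function takes any std::ostream, so the same code can write to an std::ofstream for saving or to std::cout for display.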

IV. Drawing the table's dividing lines

This is not a difficult problem: just use a vector<int> to record the length of the longest string in each column, and the lines are easy to draw.
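A rough sketch of that idea follows. The function names and the "+---+" line style are choices made here for illustration, and the widths are plain byte lengths.

```cpp
#include <algorithm>
#include <string>
#include <vector>

struct column {
    std::vector<std::string> item;
    std::string name;
};

struct Table {
    std::vector<column> col;
    std::string TableName;
};

// Length of the longest string in each column, header included.
inline std::vector<int> column_widths(const Table &t) {
    std::vector<int> w;
    for (const auto &c : t.col) {
        int longest = static_cast<int>(c.name.size());
        for (const auto &s : c.item)
            longest = std::max(longest, static_cast<int>(s.size()));
        w.push_back(longest);
    }
    return w;
}

// Build a dividing line with one padding space on each side of every cell.
inline std::string divider(const std::vector<int> &widths) {
    std::string line = "+";
    for (int w : widths) line += std::string(w + 2, '-') + "+";
    return line;
}
```

Printing the divider before the header, after the header, and after the last row gives a simple boxed table.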

But there is a pitfall: under Linux, a Chinese character takes up a length of three (three bytes in UTF-8) but is actually displayed only two columns wide. Therefore, the length calculation needs some adjustment.

Specifically, you can use a trick like this:

int callength(string s) {
    double len = 0;
    for (auto ch : s) {
        if (0 <= ch && ch <= 127) len += 1;   // ASCII: one column
        else len += 2.0 / 3;                  // each of the 3 bytes adds 2/3 of a column
    }
    return (int) len;
}
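A variant that stays in integer arithmetic is sketched below; it is not the original trick. It assumes the text is UTF-8 and that every multi-byte character is two columns wide, which fits Chinese characters but not, for example, accented Latin letters. The name display_width is made up for this example.

```cpp
#include <string>

// Approximate display width of a UTF-8 string: ASCII counts as 1 column,
// any multi-byte character counts as 2 (an assumption that fits CJK text).
inline int display_width(const std::string &s) {
    int width = 0;
    for (unsigned char ch : s) {
        if (ch < 0x80) width += 1;                 // single-byte ASCII
        else if ((ch & 0xC0) != 0x80) width += 2;  // lead byte of a multi-byte character
        // continuation bytes (10xxxxxx) add nothing
    }
    return width;
}
```

Counting only lead bytes avoids accumulating fractions, so the result does not depend on floating-point rounding.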

That's all. Thanks for reading.

This article is an English version of an article originally published in Chinese on aliyun.com and is provided for information purposes only.
