Early in my career I worked on the design and optimization of systems with Oracle as the main RDBMS, so even with single tables of more than 50 million rows, or tables of more than 3 million rows involved in complex statistical and risk-control calculations, performance was almost never a problem. Now that everything runs on MySQL, or on open source RDBMSs in general, and the system is gradually moving toward SaaS, I have to consider how statistical queries will perform once the data volume grows. Today I put together the more mainstream open source DW/DSS engines built on an RDBMS architecture; they are listed below.
As for why I take a SQL-based approach, the main consideration is development cost. Labor is one of the main costs of a software company, almost all developers are familiar with SQL, and the SQL community and ecosystem are rich. (Even the Hadoop community, apart from its earliest days, has gradually had to offer SQL-like interfaces such as Impala and Spark, which in truth are something of an amateur's trick. I guess the original goal may have been to build an open source DW engine, but it eventually moved toward commercialization. Apart from the Java and Linux communities, there seems to be no truly nonprofit community, because not that many experts are willing to volunteer.)
At the moment, the more mainstream open source DW/DSS engines (community editions) built on an RDBMS architecture are mainly the following:
InfiniDB
Infobright
PostgreSQL
MonetDB
MariaDB ColumnStore
Greenplum-db
Open source DW/DSS engine list for RDBMS architecture