Nothing recently. Do a database project, requires the implementation of a simple database, to meet a number of specific queries, here is the main introduction of our implementation process, the code in the Ithub, can be seen here. It is said that Python is running slowly, but because of the time and the workload, we have chosen to implement Python efficiently.
First, the basic requirements
1, Design storage mode
The measured data size is 1.5GB, and the largest table has 6,001,215 records. Minimize the number of I/O, reduce disk occupancy space.
2. Implement and optimize group By,order by
Group by aggregation and sequencing of large tables to improve query efficiency
3, implementation and optimization of top K
High efficiency to achieve top K
4. No requirement for insert
Second, the overall design
Because the request is mainly to optimize the query, insert deletion is not considered, can be used in the column-type database, play the advantages of the column-type database.
The following are the main frameworks (ignoring the process of building and inserting data), the blue arrows in the figure indicate the generation stream, and the orange arrow indicates the execution stream.
The whole process is in fact the first to build the index, from the bottom to execute the query, details in the detailed design of the story.