Problem Description
The existing data is as follows:
K1-1-A-X0001=/common/gom/r00/xml/gom0101.xml
K3-2-B-W4565=/common/gom/r00/xml/gom0404.xml
K4-1-B-K0090=/common/gom/r00/xml/gom0403.xml
K2-3-A-W0004=/common/gom/r00/xml/gom0103.xml
...
The first column is the ID (no duplicates) and the second column is the address. There are 100,000 records in total.
Design a data structure and algorithm to provide a query service. For example: given the input K4-1-B-K0090, return the corresponding address /common/gom/r00/xml/gom0403.xml.
Goal
Queries should be as fast as possible while consuming as little memory as possible.
Requirements
Do not use a relational database; use the file system for storage. Any programming language may be used.
Program and Code
I put all the code on GitHub:
https://github.com/longjingjun/Programming_Challenge_Query
Open the project with Eclipse.
Scenario 1
Store the data in a Java properties file and read it with the java.util.Properties class.
- FileBuilder.java: generates the 100,000 records.
- Finder.java: query implementation.
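The classes above are not reproduced here, but the approach can be sketched as follows (an assumed structure; the actual FileBuilder.java and Finder.java in the repository may differ): load the whole properties file into a `java.util.Properties`, which is backed by a hash table, so each query is a single hash lookup.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.Properties;

public class PropFinder {
    // Load the whole properties file once; queries are then O(1) hash lookups.
    static Properties load(Path file) throws IOException {
        Properties props = new Properties();
        try (InputStream in = Files.newInputStream(file)) {
            props.load(in);
        }
        return props;
    }

    // Return the address for an ID, or null if the ID is absent.
    static String find(Properties props, String id) {
        return props.getProperty(id);
    }

    public static void main(String[] args) throws IOException {
        // Build a tiny sample file (the real builder generates 100,000 lines).
        Path file = Files.createTempFile("gom", ".properties");
        Files.write(file, Arrays.asList(
                "K1-1-A-X0001=/common/gom/r00/xml/gom0101.xml",
                "K4-1-B-K0090=/common/gom/r00/xml/gom0403.xml"));
        System.out.println(find(load(file), "K4-1-B-K0090"));
        // prints /common/gom/r00/xml/gom0403.xml
    }
}
```

The trade-off measured later follows from this design: the initial load parses the whole file into memory, but every subsequent query is constant-time.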
Scenario 2
Store the data in MongoDB and query it through the MongoDB driver interface.
- MongoDBBuilder.java: inserts the 100,000 records into MongoDB.
- MongoDBFinder.java: query implementation.
Note: running this code requires a MongoDB installation. The installation package can be downloaded here:
http://www.mongodb.org/
Scenario 3
Use a random-access file: the first row stores the total number of records, followed by the records sorted in ascending order by key. Queries use binary search.
- RamdomFileBuilder.java: generates the 100,000 records in this layout and saves them to a file.
- RamdomFinder.java: query implementation.
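A sketch of this scheme (an assumed record layout; the repository's RamdomFileBuilder.java may pad differently): if every record is padded to a fixed width, record i starts at byte `HEADER + i * REC_LEN`, so the sorted file can be binary-searched with `seek()` without ever loading it into memory.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;

public class RandomFileFinder {
    static final int KEY_LEN = 12;                        // e.g. "K4-1-B-K0090"
    static final int VAL_LEN = 40;                        // path padded with spaces
    static final int REC_LEN = KEY_LEN + 1 + VAL_LEN + 1; // key '=' value '\n'

    // Binary search over fixed-width records sorted ascending by key.
    static String find(RandomAccessFile f, long count, long header, String id)
            throws IOException {
        byte[] rec = new byte[REC_LEN];
        long lo = 0, hi = count - 1;
        while (lo <= hi) {
            long mid = (lo + hi) >>> 1;
            f.seek(header + mid * REC_LEN);   // jump straight to record mid
            f.readFully(rec);
            String key = new String(rec, 0, KEY_LEN, StandardCharsets.US_ASCII);
            int cmp = key.compareTo(id);
            if (cmp == 0)
                return new String(rec, KEY_LEN + 1, VAL_LEN,
                                  StandardCharsets.US_ASCII).trim();
            if (cmp < 0) lo = mid + 1; else hi = mid - 1;
        }
        return null;                          // ID not present
    }

    public static void main(String[] args) throws IOException {
        File tmp = File.createTempFile("gom", ".dat");
        try (RandomAccessFile f = new RandomAccessFile(tmp, "rw")) {
            // First row: the total record count, padded to a fixed 11 bytes.
            f.writeBytes(String.format("%-10d\n", 2));
            // Records sorted ascending by key, each padded to REC_LEN bytes.
            f.writeBytes("K1-1-A-X0001=" + String.format("%-40s",
                    "/common/gom/r00/xml/gom0101.xml") + "\n");
            f.writeBytes("K4-1-B-K0090=" + String.format("%-40s",
                    "/common/gom/r00/xml/gom0403.xml") + "\n");
            System.out.println(find(f, 2, 11, "K4-1-B-K0090"));
            // prints /common/gom/r00/xml/gom0403.xml
        }
    }
}
```

Each lookup touches at most log2(n) records (about 17 seeks for 100,000 records), which is why this scheme is so fast while keeping almost nothing in memory.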
Scenario 4
Use a B-tree to implement the query. I have no code for this scenario.
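Since no code is given for this scenario, here is a hypothetical stand-in for illustration only: `java.util.TreeMap` is a red-black tree rather than a true B-tree (and is not the implementation that was benchmarked), but it offers the same O(log n) lookup characteristic of tree-based indexing.

```java
import java.util.TreeMap;

public class TreeFinder {
    public static void main(String[] args) {
        // A balanced search tree keyed by ID; lookups walk O(log n) nodes.
        TreeMap<String, String> index = new TreeMap<>();
        index.put("K1-1-A-X0001", "/common/gom/r00/xml/gom0101.xml");
        index.put("K4-1-B-K0090", "/common/gom/r00/xml/gom0403.xml");
        System.out.println(index.get("K4-1-B-K0090"));
        // prints /common/gom/r00/xml/gom0403.xml
    }
}
```

A real B-tree would store many keys per node to cut disk seeks, which matters once the index no longer fits in memory.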
Tests and conclusions
With 100,000 records, the test results for the first three approaches are as follows (all times in milliseconds):
| Implementation scenario | Test round | Run 1 | Run 2 | Run 3 | Run 4 | Run 5 | Run 6 | Run 7 | Run 8 |
|---|---|---|---|---|---|---|---|---|---|
| Normal prop-1 | 1 | 274 | 265 | 248 | 289 | 295 | 265 | 278 | 218 |
| Normal prop-1 | 2 | 273 | 275 | 270 | 273 | 265 | 286 | 280 | 210 |
| Normal prop-1 | 3 | 265 | 262 | 292 | 270 | 274 | 280 | 247 | 203 |
| Normal prop-2 | 1 | 133 | 136 | 135 | 132 | 133 | 139 | 156 | 138 |
| Normal prop-2 | 2 | 145 | 148 | 136 | 135 | 141 | 134 | 135 | 137 |
| Normal prop-2 | 3 | 134 | 136 | 136 | 139 | 129 | 142 | 143 | 139 |
| MongoDB | 1 | 122 | | | | | | | |
| MongoDB | 2 | 127 | | | | | | | |
| MongoDB | 3 | 115 | | | | | | | |
| Random file | 1 | 1 | | | | | | | |
| Random file | 2 | 2 | | | | | | | |
| Random file | 3 | 1 | | | | | | | |

Note on MongoDB: as the amount of data increases, query time grows.
Another round of testing was conducted specifically for the B-tree implementation and the random-file scheme, this time with 1 million records. The results are as follows:
1 million records (all times in milliseconds):

| Scheme | 1 | 4 | 7 | 10 |
|---|---|---|---|---|
| B-tree | 121 | 31 | 93 | 53 |
| B-tree | 125 | 29 | 105 | 52 |
| B-tree | 119 | 30 | 96 | 49 |
| Random file | 1 | 3 | 2 | 2 |
| Random file | 2 | 2 | 2 | 1 |
| Random file | 1 | 2 | 2 | 1 |
Based on the test results, the fastest solution is the random-access file. This scheme does have a limitation: every row must have a fixed size, so deciding how large to make each row is an important design problem.
Programming Challenges: Querying