I recently joined a company project that needs to serve large-scale queries on an online platform. The estimated total data volume is about 2-3 billion records, with database concurrency around 1,500 per second now and about 3,000 per second after one year; the peak per-second query concurrency after launch is expected to fall in that range. After a difficult choice between Redis and MongoDB, I decided on MongoDB, mainly for its parallel scalability and its Map/Reduce on GridFS.
In fact, I personally like Redis; its concurrent query capability and its speed, which beats memcached, are very exciting. However, its persistence and cluster scalability do not fit the business requirements, so I finally chose MongoDB.
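For context, the kind of server-side Map/Reduce that influenced the choice looks roughly like this in pymongo. This is only a minimal sketch against the test collection used below; the 'author_counts' output name is made up, and the exact map_reduce signature varies between pymongo versions.

from pymongo import Connection
from bson.code import Code

db = Connection('127.0.0.1', 27017)['hawaii']

# Map each document to (author, 1), then sum the counts per author on the server.
mapper = Code("function () { emit(this.author, 1); }")
reducer = Code("function (key, values) { return Array.sum(values); }")

# Results land in the hypothetical 'author_counts' collection.
result = db.userinfo.map_reduce(mapper, reducer, "author_counts")
for doc in result.find().limit(5):
    print doc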
Below are the code and results of the MongoDB test. The company uses CentOS, but since I am a FreeBSD supporter I ran the tests on both FreeBSD and CentOS.
The insert script was copied from the Internet; the query script I wrote myself.
Insert script

#!/usr/bin/env python
from pymongo import Connection
import time, datetime

connection = Connection('127.0.0.1', 27017)
db = connection['hawaii']

# Timing decorator
def func_time(func):
    def _wrapper(*args, **kwargs):
        start = time.time()
        func(*args, **kwargs)
        print func.__name__, 'run:', time.time() - start
    return _wrapper

@func_time
def insert(num):
    posts = db.userinfo
    for x in range(num):
        post = {"_id": str(x),
                "author": str(x) + "Mike",
                "text": "My first blog post!",
                "tags": ["mongodb", "python", "pymongo"],
                "date": datetime.datetime.utcnow()}
        posts.insert(post)

if __name__ == "__main__":
    # Loop 5 million times
    num = 5000000
    insert(num)
Query script

#!/usr/bin/env python
from pymongo import Connection
import time, datetime
import random

connection = Connection('127.0.0.1', 27017)
db = connection['hawaii']

# Timing decorator
def func_time(func):
    def _wrapper(*args, **kwargs):
        start = time.time()
        func(*args, **kwargs)
        print func.__name__, 'run:', time.time() - start
    return _wrapper

# @func_time
def randy():
    # Pick a random id in the inserted range
    rand = random.randint(1, 5000000)
    return rand

@func_time
def mread(num):
    find = db.userinfo
    for i in range(num):
        rand = randy()
        # Query on a random author value
        find.find({"author": str(rand) + "Mike"})

if __name__ == "__main__":
    # Loop 1 million times
    num = 1000000
    mread(num)
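The query script filters on the author field; whether an index existed on that field during the test is not recorded here. For reference, creating one with the same pymongo API is a one-liner (a sketch, not part of the test scripts above):

from pymongo import Connection

db = Connection('127.0.0.1', 27017)['hawaii']
# Build a secondary index on 'author' so equality queries avoid a full collection scan.
db.userinfo.ensure_index("author")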
Delete script

#!/usr/bin/env python
from pymongo import Connection
import time, datetime

connection = Connection('127.0.0.1', 27017)
db = connection['hawaii']

# Timing decorator
def func_time(func):
    def _wrapper(*args, **kwargs):
        start = time.time()
        func(*args, **kwargs)
        print func.__name__, 'run:', time.time() - start
    return _wrapper

@func_time
def remove():
    posts = db.userinfo
    print 'count before remove:', posts.count()
    posts.remove({})
    print 'count after remove:', posts.count()

if __name__ == "__main__":
    remove()
Results

|         | Insert 5 million | Random query 1 million | Delete 5 million | CPU usage |
| CentOS  | 394 s            | 28 s                   | 224 s            | 25-30%    |
| FreeBSD | 431 s            | 18 s                   | 278 s            | 20-22%    |
CentOS won on insert and delete; FreeBSD played to the strength of UFS2 and won on reads. Since the machine will serve as a query server, fast reads are the advantage that matters, but the decision is not mine to make, so in the end it will be CentOS anyway.
Throughout the test I kept mongostat running for monitoring; the concurrency figures were about the same on both systems. I also tested mixed insert-and-query load, with similar results: the combined operations added up to roughly 15,000-25,000 per second. Performance is still very good.
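mongostat gets its per-second rates from the server's opcounters; the same numbers can be pulled from Python if you prefer (a rough sketch of the idea, not the monitoring I actually ran):

from pymongo import Connection
import time

connection = Connection('127.0.0.1', 27017)

# Poll serverStatus once per second and print the per-second delta of each opcounter,
# roughly what mongostat shows in its insert/query/update/delete columns.
prev = connection.admin.command('serverStatus')['opcounters']
while True:
    time.sleep(1)
    cur = connection.admin.command('serverStatus')['opcounters']
    print ' '.join('%s:%d' % (k, cur[k] - prev[k]) for k in ('insert', 'query', 'update', 'delete'))
    prev = cur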
However, insert performance does drop noticeably as the data volume grows. On CentOS I tested inserting 50 million records: it took nearly two hours, more than 6,300 seconds, almost 50% slower per record than the 5 million-record run. Query speed, though, stayed about the same.
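One mitigation worth trying for the write path (not something I measured here) is batching documents into a single insert call, which pymongo accepts as a list:

from pymongo import Connection
import datetime

db = Connection('127.0.0.1', 27017)['hawaii']
posts = db.userinfo

# Hypothetical batched variant of the insert loop: one round trip per 1000 documents
# instead of one per document.
batch = []
for x in range(5000000):
    batch.append({"_id": str(x),
                  "author": str(x) + "Mike",
                  "text": "My first blog post!",
                  "tags": ["mongodb", "python", "pymongo"],
                  "date": datetime.datetime.utcnow()})
    if len(batch) == 1000:
        posts.insert(batch)
        batch = []
if batch:
    posts.insert(batch)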
These test results are offered as a reference for beginners.
That said, the comparison is not entirely fair: the FreeBSD machine has the weaker configuration.
CentOS: 16 GB of memory, two Xeon 5606 CPUs (8 cores), a Dell brand server.
FreeBSD: 8 GB of memory, one 4-core Xeon 5506, a no-name 1U server.
On identical hardware, I believe FreeBSD would perform better.