This article shows how to test for race conditions when several threads or processes access the same data in Python. Whenever multiple processes or threads read and write the same data, race conditions are a threat. The sections below walk through how to write a test that reproduces one.
Incrmnt
You work at a hot new startup named "Incrmnt". The company does one thing, and does it well.
You display a global counter and a plus sign. Users can click the plus sign to add one to the counter. It is simple and addictive. There is no doubt that this is the next big thing.
Investors are scrambling to get a seat on the board, but you have a big problem.
Race conditions
In your internal test, Abraham and Belinda were so excited that they each clicked the plus button 100 times. Your server log shows 200 requests, but the counter shows 173. Clearly, some increments were lost.
Putting the "Incrmnt has become a zombie" headlines out of your mind, you inspect the code (all the code used in this article can be found on GitHub).
# incrmnt.py
import db

def increment():
    count = db.get_count()
    new_count = count + 1
    db.set_count(new_count)
    return new_count
Your web server uses multiple processes to serve requests, so this function can be executing in several threads of execution at the same time. If the timing is unlucky, this happens:
# Thread 1 and Thread 2 executing concurrently in different processes.
# They are placed side by side here for demonstration purposes
# and spaced out vertically to show what code runs at each point in time.

# Thread 1                          # Thread 2
def increment():                    def increment():
    # get_count returns 0
    count = db.get_count()
                                        # get_count returns 0 again
                                        count = db.get_count()
    new_count = count + 1
    # set_count called with 1
    db.set_count(new_count)
                                        new_count = count + 1
                                        # set_count called with 1 again
                                        db.set_count(new_count)
So although increment was called twice, the counter only went up by 1.
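The lost update described above can be forced to happen every time, rather than only when the timing is unlucky. The following is a minimal sketch under some assumptions: it uses threads within one process instead of separate server processes, the module-level `get_count` and `set_count` functions are hypothetical in-memory stand-ins for the article's `db` module, and a `threading.Barrier` makes both threads finish reading before either one writes.

```python
import threading

# Hypothetical in-memory stand-in for the article's db module.
_count = 0

def get_count():
    return _count

def set_count(value):
    global _count
    _count = value

# Both threads wait here after reading, so both reads happen before any write.
both_have_read = threading.Barrier(2)

def racy_increment():
    count = get_count()
    both_have_read.wait()  # force the unlucky interleaving
    set_count(count + 1)

threads = [threading.Thread(target=racy_increment) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()

final_count = get_count()
print(final_count)  # 1, although the increment ran twice
```

Both threads read 0 and both write 1, which is exactly the interleaving from the side-by-side listing.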
You know you can change this code to make it thread-safe, but before you do, you want to write a test that proves the race exists.
Reproducing the race
Ideally, the test should reproduce the scenario above as closely as possible. The key elements of the race are:
- Both get_count calls must run before either set_count call, so that count has the same value in both threads.
- The exact timing of the set_count calls does not matter, as long as both happen after the get_count calls.
For simplicity, we will reproduce a nested version of this scenario, in which the whole of Thread 2 runs right after Thread 1's first get_count call:
# Thread 1                          # Thread 2
def increment():
    # get_count returns 0
    count = db.get_count()
                                    def increment():
                                        # get_count returns 0 again
                                        count = db.get_count()
                                        # set_count called with 1
                                        new_count = count + 1
                                        db.set_count(new_count)
    # set_count called with 1 again
    new_count = count + 1
    db.set_count(new_count)
before_after is a library that provides tools to help reproduce this situation. It can insert arbitrary code before or after a function call.
before_after relies on the mock library to patch functions. If you are not familiar with mock, I suggest reading its excellent documentation. A particularly important section of that documentation is Where To Patch.
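To make the idea of "inserting code after a function" more concrete, here is a rough sketch of how such a hook could be built on top of mock. This is not before_after's actual implementation: `call_after` and `FakeDb` are hypothetical names invented for this illustration, and the sketch patches an attribute on an object rather than a dotted import path. It also guards the hook so that it fires only once, since the hooked function itself calls get_count again.

```python
from contextlib import contextmanager
from unittest import mock

class FakeDb:
    """Hypothetical stand-in for the article's db module."""
    def __init__(self):
        self.count = 0

    def get_count(self):
        return self.count

    def set_count(self, value):
        self.count = value

db = FakeDb()

def increment():
    count = db.get_count()
    new_count = count + 1
    db.set_count(new_count)
    return new_count

@contextmanager
def call_after(obj, name, after_fn):
    """Patch obj.name so that after_fn runs once, right after the first call."""
    original = getattr(obj, name)
    fired = False

    def wrapper(*args, **kwargs):
        nonlocal fired
        result = original(*args, **kwargs)
        if not fired:
            fired = True  # set before calling, so nested calls do not recurse
            after_fn()
        return result

    with mock.patch.object(obj, name, wrapper):
        yield

# The "second thread" runs in full right after the first get_count call.
with call_after(db, "get_count", increment):
    increment()

lost_update_count = db.get_count()
print(lost_update_count)  # 1: one of the two increments was lost
```

Everything runs in a single thread, yet the order of reads and writes matches the nested scenario, so the result is deterministic.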
We want all of Thread 2 to run right after Thread 1 calls get_count, and then Thread 1 to resume.
The test code is as follows:
# test_incrmnt.py
import unittest

import before_after

import db
import incrmnt


class TestIncrmnt(unittest.TestCase):
    def setUp(self):
        db.reset_db()

    def test_increment_race(self):
        # after a call to get_count, call increment
        with before_after.after('incrmnt.db.get_count', incrmnt.increment):
            # start off the race with a call to increment
            incrmnt.increment()

        count = db.get_count()
        self.assertEqual(count, 2)
We use the before_after context manager to insert another increment call right after the first get_count call.
By default, before_after calls the after function only once. That is essential in this case, because otherwise the recursion would overflow the stack (increment calls get_count, which calls increment, which calls get_count ...).
This test fails because the count is 1 rather than 2. Now that we have a failing test that reproduces the race condition, let's fix it.
Preventing the race
We will use a simple locking mechanism to prevent the race. This is clearly not the ideal solution (a better one would be to use atomic updates in the data store), but it demonstrates well how before_after helps when testing multi-threaded applications.
Add a new function to incrmnt.py:
# incrmnt.py
def locking_increment():
    with db.get_lock():
        return increment()
This ensures that only one thread reads and writes the count at a time. If a thread tries to acquire the lock while another thread holds it, a CouldNotLock exception is raised.
Now we add a test for this:
# test_incrmnt.py
    def test_locking_increment_race(self):
        def erroring_locking_increment():
            # Trying to get a lock when the other thread has it will cause a
            # CouldNotLock exception - catch it here or the test will fail
            with self.assertRaises(db.CouldNotLock):
                incrmnt.locking_increment()

        with before_after.after('incrmnt.db.get_count', erroring_locking_increment):
            incrmnt.locking_increment()

        count = db.get_count()
        self.assertEqual(count, 1)
Now only one thread can increment the count at any given time.
Mitigating the race
There is still a problem. With the approach above, if two requests collide, one of them is not recorded. To mitigate this, we can retry the increment (a simple way is to use something like funcy's retry):
# incrmnt.py
from funcy import retry

def retrying_locking_increment():
    # Try up to 5 times, retrying whenever the lock could not be acquired
    @retry(tries=5, errors=db.CouldNotLock)
    def _increment():
        return locking_increment()

    return _increment()
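For readers who have not used a retry decorator before, the following hand-rolled sketch shows roughly what such a decorator does. It is not funcy's implementation: `retry_on` is a hypothetical name, `CouldNotLock` is redefined locally as a stand-in for db.CouldNotLock, and `flaky_increment` only simulates a lock that is busy for the first two attempts.

```python
import functools

class CouldNotLock(Exception):
    """Hypothetical stand-in for db.CouldNotLock."""

def retry_on(error, tries):
    """Call the wrapped function up to `tries` times, retrying on `error`."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            for attempt in range(tries):
                try:
                    return fn(*args, **kwargs)
                except error:
                    if attempt == tries - 1:
                        raise  # out of attempts: let the caller see the error
        return wrapper
    return decorator

attempts = []

@retry_on(CouldNotLock, tries=5)
def flaky_increment():
    attempts.append(1)
    if len(attempts) < 3:
        raise CouldNotLock()  # simulate the lock being held twice
    return 1

result = flaky_increment()
print(result, len(attempts))  # 1 3
```

In a real server it would usually make sense to pause briefly between attempts, so that the thread holding the lock has time to release it.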
When we need to scale beyond what this approach can handle, we can move the increment into the database as an atomic update or a transaction, taking that responsibility away from our application.
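As an illustration of what an atomic update could look like, here is a minimal sketch using SQLite from the standard library. The article does not say which database Incrmnt uses, so the `counter` table, its columns, and the `atomic_increment` function are all assumptions made for this example.

```python
import sqlite3

# Assumption: the counter lives in a SQL table; the schema is hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE counter (id INTEGER PRIMARY KEY, count INTEGER NOT NULL)")
conn.execute("INSERT INTO counter (id, count) VALUES (1, 0)")
conn.commit()

def atomic_increment(connection):
    # The read-modify-write happens inside one SQL statement, so there is
    # no window between "get" and "set" for another client to slip into.
    with connection:  # commits on success, rolls back on error
        connection.execute("UPDATE counter SET count = count + 1 WHERE id = 1")
    # Note: under concurrency this separate read may already include
    # increments made by other clients.
    row = connection.execute("SELECT count FROM counter WHERE id = 1").fetchone()
    return row[0]

for _ in range(200):
    atomic_increment(conn)

total = conn.execute("SELECT count FROM counter WHERE id = 1").fetchone()[0]
print(total)  # 200
```

Because the database performs the addition itself, the application never holds a stale copy of the count, and no application-level lock or retry is needed.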
Summary
Incrmnt is now free of race conditions, so people can happily click all day without worrying that their clicks will not be counted.
This is a simple example, but before_after can be used for more complex race conditions to make sure your functions handle every situation correctly. Being able to reproduce and test race conditions in a single-threaded environment is key, because it lets you be more confident that you are handling them correctly.