Concurrency problems: large-data-volume access and high-concurrency access


I ran into this problem today, read a lot about it online, and benefited a great deal. I am recording it here so I can review it later.


I encountered this problem at work before. Two users operate on the same record at the same time: user A queries a record, user B deletes that record, and then user A saves some values from the queried record into other tables. This bug plagued us for a long time; because user A's operation is particularly complicated and takes a long time to execute, the problem occurred quite often. Our solution was to re-check that the record still exists before saving. Handling it in the code logic helps a lot, but it treats the symptom rather than the root cause. After reading up on the topic, I think there are better solutions.


Part1:

Requests with high concurrency and large data volumes generally fall into several situations:

1. A large number of users simultaneously query and update different functional pages of the system.

2. A large number of users simultaneously query a large volume of data from the same table on the same page.

3. A large number of users simultaneously update the same table from the same page.

 

In the first case, the general solution is as follows:

I. Server-level processing

1. Adjust the IIS 7 application pool queue length

Change it from the default of 1000 to 65535.

IIS Manager > Application Pools > Advanced Settings

Queue Length: 65535

2. Adjust the IIS 7 appConcurrentRequestLimit setting

Change it from the default of 5000 to 100000:

C:\windows\system32\inetsrv\appcmd.exe set config /section:serverRuntime /appConcurrentRequestLimit:100000

You can verify the setting in %systemroot%\System32\inetsrv\config\applicationHost.config:

<serverRuntime appConcurrentRequestLimit="100000" />

3. Adjust the processModel > requestQueueLimit setting in machine.config

Change it from the default of 5000 to 100000:

<configuration>
  <system.web>
    <processModel requestQueueLimit="100000" />
  </system.web>
</configuration>

4. Modify the registry to raise the number of TCP/IP connections IIS 7 supports

Change it from the default of 5000 to 100000:

reg add HKLM\System\CurrentControlSet\Services\HTTP\Parameters /v MaxConnections /t REG_DWORD /d 100000

After completing the above four settings, the server can basically support 100,000 simultaneous requests. If the access volume rises beyond that, consider deploying the application and database across multiple servers by functional module to spread the load, and consider hardware or software load balancing. Hardware load balancing can be implemented directly in a smart switch; it has strong processing power and is independent of the operating system, but it is expensive, difficult to configure, and cannot distinguish between servers or application state, so it suits installations with many devices, heavy traffic, and simple applications. Software load balancing works at the system and application level, so it can distribute load according to the state of the system and the application, and it is much more cost-effective; common examples include LVS on Linux.

 

II. Database-level processing

When two users access a page at the same time, one user may update a record that another user has already deleted. Or a record's content may be modified by another user between the time a user loads the page and the time they click the Delete button. Therefore, we need to consider database locking.

Three concurrency control policies are available:

  • Do nothing - if concurrent users modify the same record, the last submitted result takes effect (the default behavior).
  • Optimistic concurrency - assume that conflicts occur only occasionally. When one does occur, simply inform the user that their changes cannot be saved because another user has modified the same record.
  • Pessimistic concurrency - assume that conflicts occur often and that users cannot tolerate being told their modifications were discarded because of someone else's concurrent changes. When a user starts editing a record, lock it, preventing other users from editing or deleting that record until the user finishes and submits their own changes.

When multiple users attempt to modify data at the same time, a control mechanism is needed to prevent one user's modifications from adversely affecting another's. The system that handles this situation is called "concurrency control".

Concurrency control types

Generally, there are three common ways to manage concurrency in a database:

  • Pessimistic concurrency control - the row is unavailable to other users from the time the record is retrieved until it is updated in the database.
  • Optimistic concurrency control - the row is unavailable to other users only while the data is actually being updated. The update compares the row in the database against the values originally read to determine whether anything has changed; attempting to update an already-modified record raises a concurrency conflict.
  • Last-in wins - the row is unavailable to other users only while the data is actually being updated, but the update is not compared with the initial record; the record is simply written, which may overwrite any changes other users have made since you last refreshed the record.
Pessimistic concurrency

Pessimistic concurrency is usually used for two reasons. First, in some situations there is heavy contention for the same records, and the cost of holding locks on the data is less than the cost of rolling back changes when concurrency conflicts occur.

Pessimistic concurrency is also very useful when a record must not change for the duration of a transaction. An inventory application is a good example. Suppose a company representative is checking inventory for a potential customer. You usually want to lock the record until an order is generated, which typically marks the item as "ordered" and removes it from available stock. If no order is generated, the lock is released so that other users checking the inventory get an accurate count of available stock.

However, pessimistic concurrency cannot be used in a disconnected architecture. Connections are open only long enough to read or update the data, so locks cannot be held for long periods. Moreover, an application that holds locks for long periods does not scale.
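The inventory scenario above can be sketched with an in-process lock standing in for the database's pessimistic record lock. This is a minimal illustration under assumed names and counts, not the article's actual system:

```python
import threading

# In-memory stand-in for the inventory table; a real system would instead hold
# a pessimistic (exclusive) lock on the database record.
inventory = {"widget": 2}
inventory_lock = threading.Lock()

def reserve_item(name):
    # Hold the lock from the availability check until the "order" is recorded,
    # so no other session can read or change the count in between.
    with inventory_lock:
        if inventory[name] > 0:
            inventory[name] -= 1  # mark one unit as ordered
            return True
        return False  # releasing the lock leaves an accurate count visible

print([reserve_item("widget") for _ in range(3)])  # [True, True, False]
```

The key point is that the check and the update happen under one lock, so no other session can slip in between them.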

Optimistic concurrency

In optimistic concurrency, locks are set and held only while the database is actually being accessed. These locks prevent other users from updating the record at that instant; apart from the exact moment of the update, the data is always available.

When an update is attempted, the initial version of the changed row is compared with the existing row in the database. If the two differ, the update fails with a concurrency error, and it is up to your business logic to reconcile the two versions of the row.

Last-in wins

With "last-in wins", the initial data is not checked; the update is simply written to the database. Obviously, the following can happen:

  • User A retrieves a record from the database.
  • User B retrieves the same record, modifies it, and writes the updated record back to the database.
  • User A modifies the "old" record and writes it back to the database.

In this scenario, user A never sees the changes made by user B. If you plan to use the "last-in wins" form of concurrency control, make sure this situation is acceptable.
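The three steps above can be sketched with a dictionary standing in for the database; the record's fields here are invented for illustration:

```python
# A minimal in-memory sketch of the "last-in wins" hazard described above.
# The "database" is a dict; users read a copy, edit it, and write it back.
db = {"record": {"name": "Widget", "price": 10}}

# User A reads the record.
a_copy = dict(db["record"])
# User B reads the same record, changes it, and writes it back.
b_copy = dict(db["record"])
b_copy["price"] = 12
db["record"] = b_copy
# User A now writes back an edit based on the stale copy.
a_copy["name"] = "Gadget"
db["record"] = a_copy

print(db["record"])  # {'name': 'Gadget', 'price': 10} -- B's price change is lost
```

Because user A's write is based on the values read before B's update, B's price change silently disappears.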

Concurrency control in ADO.NET and Visual Studio .NET

Because their data architecture is based on disconnected data, ADO.NET and Visual Studio .NET use optimistic concurrency. You therefore need to add business logic that resolves optimistic-concurrency conflicts.

If you choose optimistic concurrency, there are two common ways to determine whether a change has occurred: the version approach (an actual version number or a date-time stamp) and the saving-all-values approach.

The version-number approach

In the version-number approach, the record to be updated must have a column containing a date-time stamp or version number. When the record is read, the stamp or version number is saved on the client, and that value is then made part of the update.

One way to handle concurrency is to update only if the value in the WHERE clause matches the value on the record. The SQL for this approach is:

UPDATE Table1 SET Column1 = @newvalue1, Column2 = @newvalue2 WHERE DateTimeStamp = @origDateTimeStamp

Alternatively, the comparison can use the version number:

UPDATE Table1 SET Column1 = @newvalue1, Column2 = @newvalue2 WHERE RowVersion = @origRowVersionValue

If the date-time stamp or version number matches, the record in the data store has not changed and can safely be updated with the new values from the dataset. If it does not match, an error is returned. You can write code in Visual Studio .NET to perform this kind of concurrency check, and you must also write code that responds to any update conflicts. To keep the date-time stamp or version number accurate, set a trigger on the table that updates it whenever the row changes.
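A runnable sketch of the version-number check, using SQLite in place of SQL Server for illustration (the helper name and schema are assumptions, not from the article):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE Table1 (id INTEGER PRIMARY KEY, Column1 TEXT, RowVersion INTEGER)")
con.execute("INSERT INTO Table1 VALUES (1, 'original', 1)")

def try_update(con, row_id, new_value, orig_version):
    # The UPDATE succeeds only if nobody bumped RowVersion since we read the row.
    cur = con.execute(
        "UPDATE Table1 SET Column1 = ?, RowVersion = RowVersion + 1 "
        "WHERE id = ? AND RowVersion = ?",
        (new_value, row_id, orig_version))
    return cur.rowcount == 1  # 0 rows touched means a concurrency conflict

print(try_update(con, 1, "first writer", 1))   # True: version matched, row updated
print(try_update(con, 1, "second writer", 1))  # False: stale version, conflict
```

The second caller read version 1 before the first caller's update, so its write is rejected instead of silently overwriting the row.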

The saving-all-values approach

An alternative to a date-time stamp or version number is to retrieve a copy of all the fields when the record is read. The DataSet object in ADO.NET maintains two versions of each modified record: the initial version (as originally read from the data source) and the modified version (representing the user's update). When the application tries to write a record back to the data source, the initial values in the data row are compared with the record in the data source. If they match, the database record has not changed since it was read, and the changed values from the dataset are written to the database.

Each of the data adapter's four commands (DELETE, INSERT, SELECT, and UPDATE) has a collection of parameters, with parameters for both the initial values and the current (modified) values.
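The compare-all-values idea can be sketched with SQLite: the WHERE clause repeats every original column value, so the UPDATE matches only an unchanged row. The table, columns, and helper are invented for illustration:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE Customers (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
con.execute("INSERT INTO Customers VALUES (1, 'Ann', 'Boston')")

def update_with_original_values(con, row_id, orig, changed):
    # orig: the values as originally read; changed: the user's edits.
    cur = con.execute(
        "UPDATE Customers SET name = ?, city = ? "
        "WHERE id = ? AND name = ? AND city = ?",
        (changed["name"], changed["city"], row_id, orig["name"], orig["city"]))
    return cur.rowcount == 1  # 0 rows means someone changed the record first

orig = {"name": "Ann", "city": "Boston"}
print(update_with_original_values(con, 1, orig, {"name": "Ann", "city": "Denver"}))   # True
# A second update based on the same stale originals now fails:
print(update_with_original_values(con, 1, orig, {"name": "Anna", "city": "Boston"}))  # False
```

This is essentially what the data adapter's generated UPDATE command does with its initial-value parameters.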

 

 

For the second case:

Because it also involves a large number of concurrent requests, the measures from the first case apply. In addition, because large volumes of data are being retrieved, query efficiency must be considered:

1. Index the table according to the query conditions.

2. Optimize the query statements.

3. Cache the query results.
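Point 3 can be sketched as a small read-through cache. The TTL and key scheme are illustrative assumptions; a production system might use Redis or a similar external cache:

```python
import time

_cache = {}
TTL_SECONDS = 60  # assumed freshness window

def cached_query(key, run_query):
    entry = _cache.get(key)
    if entry and time.time() - entry[1] < TTL_SECONDS:
        return entry[0]                 # cache hit: skip the database
    result = run_query()                # cache miss: hit the database once
    _cache[key] = (result, time.time())
    return result

calls = []
def fake_query():
    # Stands in for an expensive SELECT against the large table.
    calls.append(1)
    return ["row1", "row2"]

print(cached_query("big_table:page1", fake_query))  # runs the query
print(cached_query("big_table:page1", fake_query))  # served from cache
print(len(calls))  # 1
```

Repeated identical queries from many concurrent users then hit the cache instead of the table.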

 

Solution to the third case:

The measures from the first case also apply here. In addition, consider the following for updating the same table:

1. Write the data to a cache first, and flush it to the database once a certain amount has accumulated.

2. Split the table (sharding and partitioning). For example, a table storing the personal information of everyone in the country holds a huge volume of data. If it is split into one table per province, each person's information is stored in the table for their province and queried and updated by province, which reduces both the concurrency and the data volume per table.
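The province-based split above amounts to a routing function from a record's province to a shard table. The naming scheme below is an illustrative assumption:

```python
# Route each person record to a per-province table name, so each query
# touches one smaller table instead of the single nationwide table.

def shard_table_for(province):
    # e.g. "Guangdong" -> "person_guangdong"
    return "person_" + province.lower()

def build_query(province, person_id):
    table = shard_table_for(province)
    # parameterized query against the shard for that province
    return f"SELECT * FROM {table} WHERE id = ?", (person_id,)

sql, params = build_query("Guangdong", 42)
print(sql)     # SELECT * FROM person_guangdong WHERE id = ?
print(params)  # (42,)
```

As long as every query carries the province, reads and writes spread across the shards instead of contending on one table.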


Part2:


How to handle massive concurrent data operations: file caching, database caching, SQL optimization, data distribution, horizontal and vertical partitioning of tables, and optimizing the code structure.

Summary

1. Why locking is needed for concurrent database operations. Without it, the following inconsistencies occur:

  • Lost update: users A and B read and modify the same data, and one user's modification destroys the other's. A ticket-booking system is a typical example.
  • Dirty read: user A modifies data, user B reads it, and then user A cancels the modification and restores the original value; the data user B holds is now inconsistent with the database.
  • Non-repeatable read: user A reads some data, user B reads and modifies the same data, and when user A reads it again the two values do not match.

The main mechanism for concurrency control is locking: a lock forbids certain operations for a period of time so that data does not become inconsistent.

2. Lock categories, from two perspectives:

1) From the database system's perspective, locks divide into exclusive locks, shared locks, and update locks. MS SQL Server uses the following resource lock modes:

  • Shared (S): used for operations that do not change or update data (read-only operations), such as SELECT statements.
  • Update (U): used on resources that can be updated. Prevents a common form of deadlock that occurs when multiple sessions are reading, locking, and potentially updating resources later.
  • Exclusive (X): used for data-modification operations such as INSERT, UPDATE, or DELETE. Ensures that multiple updates cannot be made to the same resource at the same time.
  • Intent: used to establish a lock hierarchy. The intent lock types are intent shared (IS), intent exclusive (IX), and shared with intent exclusive (SIX).
  • Schema: used when an operation depends on the table schema. The schema lock types are schema modification (Sch-M) and schema stability (Sch-S).
  • Bulk update (BU): used when bulk-copying data into a table with the TABLOCK hint specified.

Shared locks. A shared (S) lock allows concurrent transactions to read (SELECT) a resource. While a shared (S) lock exists on a resource, no other transaction can modify the data. The shared (S) lock on the resource is released as soon as the read completes, unless the transaction isolation level is set to repeatable read or higher, or a locking hint is used to retain the shared (S) lock for the duration of the transaction.

Update locks. An update (U) lock prevents a common form of deadlock. A typical update pattern is: a transaction reads a record, acquiring a shared (S) lock on the resource (page or row), then modifies the row, which requires converting the lock to an exclusive (X) lock. If two transactions acquire shared-mode locks on a resource and both then attempt to update the data, one transaction attempts to convert its lock to an exclusive (X) lock. The conversion must wait, because one transaction's exclusive lock is incompatible with the other's shared-mode lock; a lock wait occurs. The second transaction then also attempts to acquire an exclusive (X) lock for its update. Because both transactions are converting to exclusive (X) locks, and each is waiting for the other to release its shared-mode lock, a deadlock occurs. To avoid this potential deadlock, update (U) locks are used. Only one transaction at a time can hold an update (U) lock on a resource. If the transaction modifies the resource, the update (U) lock is converted to an exclusive (X) lock; otherwise it is converted to a shared lock.
Exclusive locks. An exclusive (X) lock prevents concurrent transactions from accessing the resource. No other transaction can read or modify data locked with an exclusive (X) lock.

Intent locks. An intent lock indicates that SQL Server wants to acquire a shared (S) or exclusive (X) lock on some lower-level resource in the hierarchy. For example, a shared intent lock at the table level says the transaction intends to place shared (S) locks on pages or rows within that table. Setting an intent lock at the table level prevents another transaction from acquiring an exclusive (X) lock on the table containing those pages. Intent locks improve performance because SQL Server only needs to examine intent locks at the table level to determine whether a transaction can safely lock the entire table, rather than examining the locks on every row or page in the table. The intent lock types are intent shared (IS), intent exclusive (IX), and shared with intent exclusive (SIX):

  • Intent shared (IS): indicates the transaction's intent to read some (not all) of the lower-level resources in the hierarchy by placing S locks on those individual resources.
  • Intent exclusive (IX): indicates the transaction's intent to modify some (not all) of the lower-level resources in the hierarchy by placing X locks on those individual resources. IX is a superset of IS.
  • Shared with intent exclusive (SIX): indicates the transaction's intent to read all of the lower-level resources in the hierarchy and to modify some (not all) of them. Concurrent IS locks on the top-level resource are allowed. For example, a SIX lock on a table places a SIX lock on the table (allowing concurrent IS locks) and IX locks on the pages currently being modified (with X locks on the modified rows).
Although a resource can hold only one SIX lock at a time, which prevents other transactions from updating it, other transactions can still read lower-level resources in the hierarchy by acquiring table-level IS locks.

In short: an exclusive lock allows only the program that placed it to operate on the locked object; no other operation on that object is accepted. SQL Server automatically uses exclusive locks when executing data-update commands, and an exclusive lock cannot be placed on an object while other locks exist on it. A shared lock allows other users to read the locked object but not to modify it; SQL Server applies a shared lock to the object when executing SELECT. An update lock is placed when SQL Server is preparing to update data: the object cannot be modified but can still be read, and once SQL Server decides to go ahead with the update, it automatically promotes the update lock to an exclusive lock. The update lock cannot be placed while other locks exist on the object.

2) From the programmer's perspective, there are optimistic locks and pessimistic locks. An optimistic lock relies entirely on the database to manage locking; a pessimistic lock means the programmer manages the locking of data or objects directly. MS SQL Server uses locks to implement pessimistic concurrency control among multiple users performing modifications in the database at the same time.

3. Lock granularity is the size of the locked target. With fine granularity, concurrency is high but overhead is large; with coarse granularity, concurrency is low but overhead is small. SQL Server supports lock granularities of row, page, key, key range, index, table, and database. Resource descriptions:

  • RID: used to lock a single row within a table.
  • Key: a row lock within an index; used to protect key ranges in serializable transactions.
  • Page: an 8 KB data page or index page.
  • Extent: a group of eight contiguous data or index pages.
  • Table: the entire table, including all data and indexes.
  • DB: the database.

4. Lock duration is the length of time the lock is needed to protect the requested resource. How long a shared lock protecting a read is held depends on the transaction isolation level. At the default isolation level of READ COMMITTED, a shared lock is held only while the page is being read; during a scan, the lock on a page is released only once the lock on the next page in the scan has been acquired. If the HOLDLOCK hint is specified, or the isolation level is set to REPEATABLE READ or SERIALIZABLE, the locks are not released until the transaction ends. Depending on the concurrency options set for a cursor, the cursor may acquire shared-mode scroll locks to protect fetches; scroll locks are released at the next fetch or when the cursor is closed, whichever comes first, but if HOLDLOCK is specified, scroll locks are held until the transaction ends. Exclusive locks protecting updates are always held until the transaction ends.

If a connection attempts to acquire a lock that conflicts with a lock held by another connection, the requesting connection blocks until the conflicting lock is released and the connection acquires the requested lock, or until the connection's timeout interval expires. There is no timeout interval by default, but some applications set one so that users do not wait on a lock in SQL Server indefinitely.

5. Handling deadlocks and blocking:

1) Set the deadlock priority. A deadlock occurs when multiple users apply for different locks and each waits on the others. SET DEADLOCK_PRIORITY controls how a session responds when it becomes involved in a deadlock.
More precisely, a deadlock occurs when two processes each hold locks on data and each waits for the other to release its locks; neither can release anything until the other gives way.

2) Set the lock timeout. @@LOCK_TIMEOUT returns the current lock timeout setting for the current session, in milliseconds. SET LOCK_TIMEOUT lets an application set the maximum time a statement will wait on a blocked resource; when a statement has waited longer than the LOCK_TIMEOUT setting, the blocked statement is cancelled automatically and error 1222 ("Lock request time out period exceeded") is returned. For example, to set the lock timeout to 1,800 ms:

SET LOCK_TIMEOUT 1800

3) Set the transaction isolation level.

4) Use table-level locking hints with SELECT, INSERT, UPDATE, and DELETE statements.

5) Configure index lock granularity: the sp_indexoption system stored procedure sets the lock granularity used on an index.

6. Viewing lock information:

1) Execute EXEC SP_LOCK to report lock information.

2) Press Ctrl+2 in Query Analyzer to see lock information.

7. Precautions for avoiding deadlocks:

1) When using transactions, keep the transaction's logic as short as possible and commit or roll back promptly.

2) Set the deadlock (lock) timeout to a reasonable range, for example 3 to 10 minutes, so that operations are abandoned automatically instead of leaving processes hanging.

3) Optimize the program to check for and avoid deadlocks.

4) Test all scripts and stored procedures carefully before release.

5) Have every stored procedure handle errors (via @@ERROR).

6) In general, do not change SQL Server's default transaction isolation level, and forcing locks is not a recommended way to solve problems.

How do we lock a row, a table, or a database?
1. Lock a row in a table:

SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED
SELECT * FROM table WITH (ROWLOCK) WHERE id = 1

2. Lock a table in the database:

SELECT * FROM table WITH (HOLDLOCK)

Lock statements on different databases:

Sybase: UPDATE Table SET col1 = col1 WHERE 1 = 0
MS SQL: SELECT col1 FROM Table (TABLOCKX) WHERE 1 = 0
Oracle: LOCK TABLE Table IN EXCLUSIVE MODE

 

Once locked, no one else can operate on the object until the locking user releases the lock with COMMIT or ROLLBACK. A few examples to deepen the impression. Suppose table1(A, B, C) contains:

A B C
a1 b1 c1
a2 b2 c2
a3 b3 c3

1) Open two new connections. What is an exclusive lock?
Execute the following in the first connection:

BEGIN TRAN
UPDATE table1 SET A = 'aa' WHERE B = 'b2'
WAITFOR DELAY '00:00:30'  -- wait 30 seconds
COMMIT TRAN

Execute the following in the second connection:

BEGIN TRAN
SELECT * FROM table1 WHERE B = 'b2'
COMMIT TRAN

 

If the two batches are executed at the same time, the SELECT must wait until the UPDATE finishes; that is, it waits 30 seconds. 2) What is a shared lock?
Execute the following in the first connection:

BEGIN TRAN
SELECT * FROM table1 WITH (HOLDLOCK)  -- HOLDLOCK holds the shared lock
WHERE B = 'b2'
WAITFOR DELAY '00:00:30'  -- wait 30 seconds
COMMIT TRAN

Execute the following in the second connection:

BEGIN TRAN
SELECT A, C FROM table1 WHERE B = 'b2'
UPDATE table1 SET A = 'aa' WHERE B = 'b2'
COMMIT TRAN

 

If the two batches are executed at the same time, the SELECT in the second connection runs immediately, but the UPDATE can execute only after the first transaction releases its shared lock so the lock can be converted to an exclusive lock; that is, it waits 30 seconds. 3) What is a deadlock?
Add table2(D, E):

D E
d1 e1
d2 e2

Execute the following in the first connection:

BEGIN TRAN
UPDATE table1 SET A = 'aa' WHERE B = 'b2'
WAITFOR DELAY '00:00:30'
UPDATE table2 SET D = 'd5' WHERE E = 'e1'
COMMIT TRAN

Execute the following in the second connection:

BEGIN TRAN
UPDATE table2 SET D = 'd5' WHERE E = 'e1'
WAITFOR DELAY '00:00:10'
UPDATE table1 SET A = 'aa' WHERE B = 'b2'
COMMIT TRAN

 

If both run at the same time, the system detects the deadlock and terminates one of the processes.

Additionally, SQL Server supports the following table-level locking hints:

  • HOLDLOCK: holds a shared lock until the entire transaction completes, instead of releasing the lock as soon as the locked object is no longer needed. Equivalent to the SERIALIZABLE isolation level.
  • NOLOCK: issues no shared locks and allows dirty reads. Equivalent to the READ UNCOMMITTED isolation level.
  • PAGLOCK: uses multiple page locks where a single table lock would otherwise be taken.
  • READPAST: lets SQL Server skip any locked rows and continue; applicable at the READ COMMITTED isolation level, it skips only RID locks, not page, extent, or table locks.
  • ROWLOCK: forces row locks.
  • TABLOCKX: forces an exclusive table lock, preventing any other transaction from using the table for the duration of the transaction.
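The deadlock example above forms because the two connections touch table1 and table2 in opposite orders. One of the precautions listed earlier, acquiring locks in a consistent order, can be sketched with in-process locks. Python threading locks stand in for the table locks here; this is an illustration of the principle, not SQL Server behavior:

```python
import threading

# Two locks standing in for table1 and table2. Every "transaction" acquires
# the locks it needs in one agreed global order, so a circular wait cannot form.
lock_table1 = threading.Lock()
lock_table2 = threading.Lock()
LOCK_ORDER = [lock_table1, lock_table2]

def run_transaction(needed, work):
    ordered = [lock for lock in LOCK_ORDER if lock in needed]
    for lock in ordered:                # acquire in global order...
        lock.acquire()
    try:
        return work()
    finally:
        for lock in reversed(ordered):  # ...release in reverse order
            lock.release()

results = []
# The second thread lists its locks in the opposite order, as in the deadlock
# example, but run_transaction reorders them, so both threads complete.
t1 = threading.Thread(target=run_transaction,
                      args=([lock_table1, lock_table2], lambda: results.append("t1")))
t2 = threading.Thread(target=run_transaction,
                      args=([lock_table2, lock_table1], lambda: results.append("t2")))
t1.start(); t2.start(); t1.join(); t2.join()
print(sorted(results))  # ['t1', 't2'] -- both finish; no deadlock
```

In SQL the equivalent discipline is simply writing every transaction so that it updates the shared tables in the same order.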







JAVA high-concurrency issues: big data and frequent I/O operations

We recommend using a cache. For the data volume you describe, a Redis-based cache can fully meet the requirements, and its access speed is a big plus. In addition, make sure any shared HashMap is a ConcurrentHashMap (or similar); decide whether the page should perform incremental queries or re-query all the data; and consider carefully whether to use Netty or Mina for receiving socket data. With concurrency demands this large, you can seriously consider a distributed cluster. It may well be just a target set by the leadership, though.

What books should I read to learn about handling large data volumes and high concurrency?

Learn to test first - not business-function testing, but system testing. My personal experience: 1. Test on a single machine first. 2. Use a tool to generate a large number of concurrent requests against the server until it slows down or comes close to crashing. 3. Find the system bottleneck and optimize it away, then repeat the test; you will find a new bottleneck and resolve it in turn. Repeat steps 1-3 until everything is roughly in balance. 4. When a single machine can no longer cope, consider load balancing and distributed solutions, and apply steps 1-3 again for analysis and testing.
