A Detailed Analysis of Multithreading Misuse, Based on a Real Application


I. Requirements and initial implementation
A simple Windows service: the client connects to the mail server, downloads each email (including attachments), saves it in .eml format, and then deletes it from the server. The pseudocode is roughly as follows:

public void Process()
{
    var recordCount = 1000; // maximum number of messages retrieved per batch
    while (true)
    {
        using (var client = new Pop3Client())
        {
            // 1. Connect and authenticate
            client.Connect(server, port, useSSL);
            client.Authenticate(userName, pwd);

            var messageCount = client.GetMessageCount(); // number of messages currently in the mailbox
            if (messageCount > recordCount)
            {
                messageCount = recordCount;
            }
            if (messageCount < 1)
            {
                break;
            }
            var listAllMsg = new List<Message>(messageCount); // temporarily holds the retrieved messages

            // 2. Retrieve messages into the list, at most recordCount per batch
            for (int i = 1; i <= messageCount; i++) // message numbers are 1-based: [1, messageCount]
            {
                listAllMsg.Add(client.GetMessage(i)); // retrieve the message into the list
            }

            // 3. Save each message locally in .eml format
            foreach (var message in listAllMsg)
            {
                var emlInfo = new System.IO.FileInfo(string.Format("{0}.eml", Guid.NewGuid().ToString("n")));
                message.SaveToFile(emlInfo); // save as a .eml file
            }

            // 4. Delete the messages
            int messageNumber = 1;
            foreach (var message in listAllMsg)
            {
                client.DeleteMessage(messageNumber); // DELE only marks the message; the server removes it when the connection is closed
                messageNumber++;
            }

            // 5. Disconnect; ending the session is what makes the server commit the deletions
            client.Disconnect();

            if (messageCount < recordCount)
            {
                break;
            }
        }
    }
}
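Step 4 relies on how POP3 defines DELE. Under RFC 1939, DELE only marks a message as deleted, the server removes marked messages when the client ends the session with QUIT, and message numbers do not change during a session, which is why deleting numbers 1..messageCount in order is correct. The toy model below is a minimal sketch of that behavior in plain C#; `mailbox`, `Dele`, and `Quit` are hypothetical names, not Mail.Net APIs.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Toy model of a POP3 maildrop (hypothetical names, illustration only).
// Message numbers are 1-based and stay fixed for the whole session.
var mailbox = new List<string> { "msg-A", "msg-B", "msg-C" };
var markedForDeletion = new HashSet<int>();

// DELE n: only marks message n; nothing is removed yet.
string Dele(int n)
{
    if (n < 1 || n > mailbox.Count || markedForDeletion.Contains(n))
        return "-ERR no such message";
    markedForDeletion.Add(n);
    return $"+OK message {n} deleted";
}

// QUIT: the server removes every marked message and ends the session.
void Quit()
{
    mailbox = mailbox.Where((message, index) => !markedForDeletion.Contains(index + 1)).ToList();
    markedForDeletion.Clear();
}

// Deleting 1..count in order is safe because numbers do not shift mid-session.
int count = mailbox.Count;
var replies = new List<string>();
for (int i = 1; i <= count; i++)
    replies.Add(Dele(i));

int beforeQuit = mailbox.Count; // still 3: DELE only marked the messages
Quit();
Console.WriteLine($"before QUIT: {beforeQuit}, after QUIT: {mailbox.Count}");
// prints: before QUIT: 3, after QUIT: 0
```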

During development, I used the open-source component Mail.Net (effectively a merger of OpenSMTP.NET and OpenPop) to receive email; its API is very easy to call. Once the code was written, it met the basic functional requirements. Following the principle of "stable first, then faster and more efficient," I moved on to performance optimization.

II. Performance Tuning and Bug Analysis
Setting aside for the moment whether the time-consuming operations here are CPU-bound or I/O-bound: whenever a collection has to be processed in a loop, it is hard to resist the urge to go multithreaded, asynchronous, and parallel. Go async wherever conditions allow; where they don't, create the conditions and go async anyway. Take full advantage of multithreading, put the server's processing power to full use, and confidently write plenty of multithreaded code. The business logic here is fairly simple and exception handling is easy to control (if something goes wrong there are compensating measures, and it can be fixed in later processing). In theory, the daily mail volume is small, so the job will not run long enough to become a CPU or memory hog, and a multithreaded asynchronous implementation of the service should be acceptable. On analysis, this is clearly a typical I/O-bound application that accesses the network frequently, so naturally the I/O should be handled asynchronously.

1. Receiving emails
As the Mail.Net sample code shows, fetching a message requires a 1-based index, and the indices must be ordered. If multiple requests are issued asynchronously, how should that index be passed in? I hesitated a little at this point: if we use synchronization constructs such as lock or Interlocked, we obviously give up some of the advantage of multithreading, and I suspected the result might be no faster than plain sequential, synchronous code.

Rather than speculate, let's write some code and see how it performs.

I wrote an asynchronous method that takes the integer index as its parameter, used Interlocked to track the changing count of emails still to be fetched, and, after each asynchronous call retrieved its message, added the Message to the listAllMsg list under a lock.
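A minimal sketch of one reading of this approach, assuming four concurrent workers and a hypothetical `FetchMessage` that stands in for `client.GetMessage(i)`: `Interlocked.Increment` hands each worker a distinct 1-based message number, and a lock guards the shared `listAllMsg`. Note what it protects and what it does not: the numbering and the list are safe, but the real `GetMessage` calls would still all go through one shared connection.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

int messageCount = 20;
int lastIssued = 0;                            // last message number handed out
var listAllMsg = new List<string>(messageCount);
var listLock = new object();

// Hypothetical stand-in for client.GetMessage(i); unlike the real call,
// it shares no connection, so it is safe to run concurrently.
string FetchMessage(int messageNumber) => $"message #{messageNumber}";

async Task WorkerAsync()
{
    while (true)
    {
        // Increment returns the new value, so every worker gets a distinct 1-based number.
        int i = Interlocked.Increment(ref lastIssued);
        if (i > messageCount) return;

        await Task.Yield();                    // hop to the thread pool, as a real async call might
        string message = FetchMessage(i);

        lock (listLock)                        // List<T> is not safe for concurrent Add
        {
            listAllMsg.Add(message);
        }
    }
}

await Task.WhenAll(Enumerable.Range(0, 4).Select(_ => WorkerAsync()));
Console.WriteLine($"fetched {listAllMsg.Count} messages");
```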

There were only a few test emails on the mail server, so I tested fetching one or two. Good: the emails were retrieved successfully, and the first round of tuning paid off.

2. Saving the emails
The next optimization: the loop that saves each message as .eml was changed to use multiple threads, so the message.SaveToFile calls run in parallel. Testing with one or two emails showed no noticeable rise in CPU usage, and saving seemed slightly faster, a small improvement.
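The parallel save can be sketched with `Parallel.ForEach` and `File.WriteAllBytes`, the same call `SaveToFile` makes internally. This is an illustration under assumptions: the raw byte arrays are made-up sample data standing in for each message's raw content, and the output goes to a temporary directory.

```csharp
using System;
using System.IO;
using System.Linq;
using System.Text;
using System.Threading.Tasks;

// Hypothetical raw messages; in the service these bytes would come from each Message.
var rawMessages = Enumerable.Range(1, 10)
    .Select(i => Encoding.ASCII.GetBytes($"Subject: test {i}\r\n\r\nbody {i}\r\n"))
    .ToList();

string outputDir = Path.Combine(Path.GetTempPath(), "eml-" + Guid.NewGuid().ToString("n"));
Directory.CreateDirectory(outputDir);

// Each iteration writes its own uniquely named file and touches no shared connection,
// which is why this step, unlike GetMessage/DeleteMessage, can safely run in parallel.
Parallel.ForEach(rawMessages, raw =>
{
    string path = Path.Combine(outputDir, Guid.NewGuid().ToString("n") + ".eml");
    File.WriteAllBytes(path, raw);   // the same call Message.SaveToFile makes
});

int fileCount = Directory.GetFiles(outputDir, "*.eml").Length;
Console.WriteLine($"{fileCount} files written to {outputDir}");
```

Whether this is actually faster depends on the disk; with a handful of small files, the gain is likely to be as modest as observed above.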

3. Deleting the emails
Another optimization: just as with saving, the loop that deletes emails was rewritten so that the deletions also run in parallel on multiple threads. Good, good, very good. At this point Thread, ThreadPool, CCR, TPL, EAP, and APM were all swirling in my head: throw in everything you know, pick whichever is most efficient, and it all looks very technical. Wow, haha.

I quickly wrote an asynchronous deletion method and started testing. With only a few emails, say two or three, it worked normally and seemed quite fast.

Now I am ready to celebrate.

4. Analyzing the cause of the bug
Judging from steps 1, 2, and 3 in isolation, every thread seemed able to run independently, with no communication or shared data between threads; with asynchronous multithreading, fetching, saving, and deleting were all fast, and mail processing seemed about to reach its optimal state. Then came the integration test that fetched, saved, and deleted end to end. After letting it run for a while, I checked the logs, and tragedy struck:

With more test emails, say 20 or 30, the log showed PopServerException errors containing what looked like garbled text, and the garbled text seemed different every time. Going back to two or three messages, it sometimes worked and sometimes threw a PopServerException with garbled text. The stack trace pointed to the email deletion code.

Damn. What was going on? Was it the mail server? Why was it always a PopServerException?

Was something wrong with the asynchronous deletion method? For asynchronous deletion the index starts at 1, so was it an index problem? I still wasn't sure.

Can you spot why the multithreaded delete operation throws? Already know? Fine; the rest of this article will tell you nothing new.

Let's talk about my troubleshooting process.

I first suspected the mail deletion method, but after reading the logs it still looked reliable. Next I guessed that the email encoding went wrong during deletion, but that seemed unlikely: with the synchronous code, the same emails went through fetching, saving, and deleting without throwing. Still uneasy, I used the synchronous code to test several individual emails several times each, including ones with attachments, HTML, and plain text.

After thinking it over, I opened the Mail.Net source code and traced the DeleteMessage method down to the SendCommand method of the Pop3Client class. DeleteMessage looks like this:

public void DeleteMessage(int messageNumber)
{
    AssertDisposed();

    ValidateMessageNumber(messageNumber);

    if (State != ConnectionState.Transaction)
        throw new InvalidUseException("You cannot delete any messages without authenticating yourself towards the server first");

    SendCommand("DELE " + messageNumber);
}

The last line calls SendCommand to send a DELE command. Following it in to see how SendCommand is implemented:

private void SendCommand(string command)
{
    // Convert the command with CRLF afterwards as per RFC to a byte array which we can write
    byte[] commandBytes = Encoding.ASCII.GetBytes(command + "\r\n");

    // Write the command to the server
    OutputStream.Write(commandBytes, 0, commandBytes.Length);
    OutputStream.Flush(); // Flush the content as we now wait for a response

    // Read the response from the server. The response should be in ASCII
    LastServerResponse = StreamUtility.ReadLineAsAscii(InputStream);

    IsOkResponse(LastServerResponse);
}

Note the InputStream and OutputStream properties. They are defined as follows (private auto-properties, a rather unusual way to write them):

/// <summary>
/// This is the stream used to read off the server response to a command
/// </summary>
private Stream InputStream { get; set; }

/// <summary>
/// This is the stream used to write commands to the server
/// </summary>
private Stream OutputStream { get; set; }

They are assigned by the Pop3Client class's public void Connect(Stream inputStream, Stream outputStream) method, which in turn is called by the following Connect overload:

/// <summary>
/// Connects to a remote POP3 server
/// </summary>
/// <param name="hostname">The <paramref name="hostname"/> of the POP3 server</param>
/// <param name="port">The port of the POP3 server</param>
/// <param name="useSsl">True if SSL should be used. False if plain TCP should be used.</param>
/// <param name="receiveTimeout">Timeout in milliseconds before a socket should time out from reading. Set to 0 or -1 to specify infinite timeout.</param>
/// <param name="sendTimeout">Timeout in milliseconds before a socket should time out from sending. Set to 0 or -1 to specify infinite timeout.</param>
/// <param name="certificateValidator">If you want to validate the certificate in a SSL connection, pass a reference to your validator. Supply <see langword="null"/> if default should be used.</param>
/// <exception cref="PopServerNotAvailableException">If the server did not send an OK message when a connection was established</exception>
/// <exception cref="PopServerNotFoundException">If it was not possible to connect to the server</exception>
/// <exception cref="ArgumentNullException">If <paramref name="hostname"/> is <see langword="null"/></exception>
/// <exception cref="ArgumentOutOfRangeException">If port is not in the range [<see cref="IPEndPoint.MinPort"/>, <see cref="IPEndPoint.MaxPort"/>] or if any of the timeouts is less than -1.</exception>
public void Connect(string hostname, int port, bool useSsl, int receiveTimeout, int sendTimeout, RemoteCertificateValidationCallback certificateValidator)
{
    AssertDisposed();

    if (hostname == null)
        throw new ArgumentNullException("hostname");

    if (hostname.Length == 0)
        throw new ArgumentException("hostname cannot be empty", "hostname");

    if (port > IPEndPoint.MaxPort || port < IPEndPoint.MinPort)
        throw new ArgumentOutOfRangeException("port");

    if (receiveTimeout < -1)
        throw new ArgumentOutOfRangeException("receiveTimeout");

    if (sendTimeout < -1)
        throw new ArgumentOutOfRangeException("sendTimeout");

    if (State != ConnectionState.Disconnected)
        throw new InvalidUseException("You cannot ask to connect to a POP3 server, when we are already connected to one. Disconnect first.");

    TcpClient clientSocket = new TcpClient();
    clientSocket.ReceiveTimeout = receiveTimeout;
    clientSocket.SendTimeout = sendTimeout;

    try
    {
        clientSocket.Connect(hostname, port);
    }
    catch (SocketException e)
    {
        // Close the socket - we are not connected, so no need to close stream underneath
        clientSocket.Close();

        DefaultLogger.Log.LogError("Connect(): " + e.Message);
        throw new PopServerNotFoundException("Server not found", e);
    }

    Stream stream;
    if (useSsl)
    {
        // If we want to use SSL, open a new SSLStream on top of the open TCP stream.
        // We also want to close the TCP stream when the SSL stream is closed
        // If a validator was passed to us, use it.
        SslStream sslStream;
        if (certificateValidator == null)
        {
            sslStream = new SslStream(clientSocket.GetStream(), false);
        }
        else
        {
            sslStream = new SslStream(clientSocket.GetStream(), false, certificateValidator);
        }
        sslStream.ReadTimeout = receiveTimeout;
        sslStream.WriteTimeout = sendTimeout;

        // Authenticate the server
        sslStream.AuthenticateAsClient(hostname);

        stream = sslStream;
    }
    else
    {
        // If we do not want to use SSL, use plain TCP
        stream = clientSocket.GetStream();
    }

    // Now do the connect with the same stream being used to read and write
    Connect(stream, stream); // initializes the InputStream and OutputStream properties
}

Then the TcpClient object caught my eye. Of course: this is socket programming that implements the POP3 protocol commands on top of a socket. It has to open a TCP connection (three-way handshake and all) and then send commands over it to operate on the server... Suddenly it all clicked.

A TCP connection is a session, and every command (fetching, deleting, and so on) is exchanged with the mail server over that one connection. When multiple threads send commands on the same session, such as fetching (TOP or RETR) and deleting (DELE), those operations are not thread-safe: commands and responses on the OutputStream and InputStream no longer line up, and the threads end up reading each other's data. That is most likely where the garbled text in the logs came from. With thread safety in mind, I realized that fetching emails must have the same problem. To confirm this, I checked the source of the GetMessage method:

public Message GetMessage(int messageNumber)
{
    AssertDisposed();

    ValidateMessageNumber(messageNumber);

    if (State != ConnectionState.Transaction)
        throw new InvalidUseException("Cannot fetch a message, when the user has not been authenticated yet");

    byte[] messageContent = GetMessageAsBytes(messageNumber);

    return new Message(messageContent);
}

Internally, GetMessageAsBytes also goes through SendCommand:

if (askOnlyForHeaders)
{
    // 0 is the number of lines of the message body to fetch, therefore it is set to zero to fetch only headers
    SendCommand("TOP " + messageNumber + " 0");
}
else
{
    // Ask for the full message
    SendCommand("RETR " + messageNumber);
}

Following the trail further, the garbled text in the exceptions came from LastServerResponse ("This is the last response the server sent back when a command was issued to it"). The IsOkResponse method throws a PopServerException whenever that response does not start with "+OK":

/// <summary>
/// Tests a string to see if it is a "+OK" string.<br/>
/// An "+OK" string should be returned by a compliant POP3
/// Server if the request could be served.<br/>
/// <br/>
/// The method does only check if it starts with "+OK".
/// </summary>
/// <param name="response">The string to examine</param>
/// <exception cref="PopServerException">Thrown if server did not respond with "+OK" message</exception>
private static void IsOkResponse(string response)
{
    if (response == null)
        throw new PopServerException("The stream used to retrieve responses from was closed");

    if (response.StartsWith("+OK", StringComparison.OrdinalIgnoreCase))
        return;

    throw new PopServerException("The server did not respond with a +OK response. The response was: \"" + response + "\"");
}
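To see how a line of message text can end up as the "response" in this exception, here is a scripted replay of one possible interleaving, assuming a fetch thread and a delete thread share one connection. It is a simulation, not Mail.Net code: no real threads or server are involved, a `MemoryStream` stands in for the socket, and the sample message is made up. Per RFC 1939, a RETR reply is multi-line and ends with a line containing a single ".".

```csharp
using System;
using System.IO;
using System.Text;

// Scripted replay (no real threads or server): the bytes the server writes on the ONE
// shared connection, in order. First the multi-line reply to "RETR 1", terminated by ".",
// then the one-line reply to "DELE 2".
string wire =
    "+OK message follows\r\n" +
    "Subject: hello\r\n" +
    "\r\n" +
    "Hi there\r\n" +
    ".\r\n" +
    "+OK message 2 deleted\r\n";
var reader = new StreamReader(new MemoryStream(Encoding.ASCII.GetBytes(wire)), Encoding.ASCII);

// Same rule as IsOkResponse: a valid reply starts with "+OK".
bool IsOk(string line) => line != null && line.StartsWith("+OK", StringComparison.OrdinalIgnoreCase);

// One possible interleaving: the fetch thread sent RETR 1, the delete thread sent DELE 2,
// and both then read "their" reply line from the same input stream.
string fetchStatus = reader.ReadLine();  // fetch thread: "+OK message follows"
string deleteReply = reader.ReadLine();  // delete thread: "Subject: hello", a line of message 1

Console.WriteLine($"fetch thread read:  {fetchStatus} (ok: {IsOk(fetchStatus)})");
Console.WriteLine($"delete thread read: {deleteReply} (ok: {IsOk(deleteReply)})");
// The delete thread's check fails on message text, which surfaces as a PopServerException
// quoting a "garbled" response; meanwhile message 1 has lost a header line, and the real
// DELE reply is left unread for whichever command reads next.
```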

So the biggest trap turned out to be that Pop3Client is NOT thread-safe. I had finally found the cause, hahaha. In that moment I was as thrilled as if I had just seen my dream girl, and almost forgot that I had written the buggy code myself.
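One way to make a shared session correct is to hold a lock around each complete command/response exchange. The sketch below models one session as a first-in, first-out queue of replies; `Send`, `Receive`, and this `SendCommand` are hypothetical and are not Mail.Net's private method. It checks that 200 parallel callers each receive their own reply.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading.Tasks;

// Toy model of ONE shared session (hypothetical, not Mail.Net code): the "server" answers
// commands strictly in arrival order, and whoever reads next gets the oldest unread reply.
var pendingReplies = new Queue<string>();
var sessionLock = new object();

void Send(string command) => pendingReplies.Enqueue("+OK " + command);
string Receive() => pendingReplies.Dequeue();

// Correct only because the whole send-then-receive exchange runs under one lock.
// Without it, thread B can dequeue the reply meant for thread A (and Queue<T>
// itself is not safe for concurrent use).
string SendCommand(string command)
{
    lock (sessionLock)
    {
        Send(command);
        return Receive();
    }
}

var mismatches = new ConcurrentBag<int>();
Parallel.For(1, 201, n =>
{
    string reply = SendCommand("DELE " + n);
    if (reply != "+OK DELE " + n) mismatches.Add(n);
});

Console.WriteLine($"mismatched replies: {mismatches.Count}");
// prints: mismatched replies: 0
```

The catch is that the lock serializes every exchange on the connection, so parallel GetMessage or DeleteMessage calls guarded this way cannot beat the original sequential loop; they only add overhead. That matches the suspicion about lock and Interlocked in step 1.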

After a moment I calmed down, reflected on my own elementary mistake, and felt faint. How could I forget about TCP and thread safety? Ugh, so tiring. I feel like I don't know how to use class libraries anymore.

By the way, saving to .eml worked fine: the Message object's SaveToFile method does not communicate with the mail server, so saving asynchronously threw no exceptions (there is no mismatch in the RawMessage byte array). Its source code is as follows:

/// <summary>
/// Save this <see cref="Message"/> to a file.<br/>
/// <br/>
/// Can be loaded at a later time using the <see cref="LoadFromFile"/> method.
/// </summary>
/// <param name="file">The File location to save the <see cref="Message"/> to. Existent files will be overwritten.</param>
/// <exception cref="ArgumentNullException">If <paramref name="file"/> is <see langword="null"/></exception>
/// <exception>Other exceptions relevant to file saving might be thrown as well</exception>
public void SaveToFile(FileInfo file)
{
    if (file == null)
        throw new ArgumentNullException("file");

    File.WriteAllBytes(file.FullName, RawMessage);
}

Looking back at how this bug came about: I was not sensitive or vigilant enough about TCP and thread safety, I reached for performance tuning as soon as I saw a for loop, and the test data was insufficient, so I stepped on a landmine without noticing. In the final analysis, the bug came from choosing an unsuitable scenario for asynchrony because thread safety was not considered carefully. There are many other examples of this kind of misuse; a typical one is misusing database connections. I once read an article about misusing database connection objects, and I also wrote a summary article of my own on whether database connections need to be closed, so that case left a deep impression on me. It is worth stressing again that multiple threads generally should not share a single connection, such as one Pop3Client or one SqlConnection, to access the network, especially when communication with the server is intensive; even with multithreading, performance will not necessarily improve.
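Putting that conclusion into code, one possible restructuring keeps everything that touches the shared connection sequential and parallelizes only the local file writes. This is a sketch, not the author's fix: `ProcessBatch` and its delegate parameters are hypothetical stand-ins for fetching a message's raw bytes via `client.GetMessage(i)` and for `client.DeleteMessage(i)`, so it runs without a mail server.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;
using System.Threading.Tasks;

// One batch with network work kept sequential on the single shared session.
// getMessage and deleteMessage are hypothetical stand-ins for the Pop3Client calls.
int ProcessBatch(int messageCount, Func<int, byte[]> getMessage, Action<int> deleteMessage, string outputDir)
{
    // 1. Shared connection: one command at a time, in order.
    var rawMessages = new List<byte[]>(messageCount);
    for (int i = 1; i <= messageCount; i++)
        rawMessages.Add(getMessage(i));

    // 2. Local disk only: independent files, so running these in parallel is safe.
    //    If any save throws, Parallel.ForEach rethrows and step 3 never runs.
    Parallel.ForEach(rawMessages, raw =>
        File.WriteAllBytes(Path.Combine(outputDir, Guid.NewGuid().ToString("n") + ".eml"), raw));

    // 3. Shared connection again: mark each message for deletion, sequentially.
    for (int i = 1; i <= messageCount; i++)
        deleteMessage(i);

    return rawMessages.Count;
}

// Usage with in-memory fakes instead of a real POP3 server.
var deleted = new List<int>();
string dir = Directory.CreateDirectory(
    Path.Combine(Path.GetTempPath(), "batch-" + Guid.NewGuid().ToString("n"))).FullName;

int saved = ProcessBatch(
    messageCount: 5,
    getMessage: n => Encoding.ASCII.GetBytes($"Subject: {n}\r\n\r\nbody\r\n"),
    deleteMessage: n => deleted.Add(n),
    outputDir: dir);

string deletedList = string.Join(", ", deleted);
Console.WriteLine($"saved {saved}, deleted [{deletedList}]");
// prints: saved 5, deleted [1, 2, 3, 4, 5]
```

Marking messages for deletion only after every save has succeeded also means a failed save leaves the messages on the server rather than losing them.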

The .NET client libraries we use every day, such as those for FastDFS, Memcached, RabbitMQ, Redis, MongoDB, and ZooKeeper, all access the network, communicate with a server, and parse a protocol. Having read the source of several of these clients, I recall that the FastDFS, Memcached, and Redis clients all have a pool implementation, so they avoid this thread-safety risk. From personal experience, you should always treat such libraries with respect. Your language and class library may be very friendly, with API documentation that is easy to understand and calls that are easy to write, but using them well is harder than it looks. It is best to read through the source code to understand the general implementation; if you are unfamiliar with the internals, it is easy to fall into a trap without realizing it. When we refactor or optimize with multithreading, we must not ignore a fundamental question: we need to recognize which scenarios suit asynchronous processing, just as we need to know which scenarios suit caching. I would even argue that understanding this matters more than writing the code.

In addition, be careful when refactoring or tuning, and make sure the test data is fully prepared. This has been proven many times in my own work and has left a deep impression on me. Many business systems run fine with small data volumes but hit all kinds of inexplicable problems under high concurrency. In the case described here, with only one or two emails with small bodies and attachments on the mail server, asynchronous fetching and deleting ran normally without any exception logs; only with more data did the exception logs appear, which led to troubleshooting, debugging, reading the source, more troubleshooting... and, eventually, this article.
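The pool idea can be sketched minimally: each worker rents a connection for its exclusive use and returns it afterwards, so no two threads ever interleave commands on one session. This is an illustrative assumption, not the implementation of any of the clients named above; connections are modeled as integer ids, and a detector counts any moment when two workers hold the same one.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

// Minimal pool sketch (hypothetical): connections are modeled as integer ids; a real pool
// would hold client objects, cap its size, and validate or dispose broken connections.
var idle = new ConcurrentBag<int>();
int created = 0;

int Rent() => idle.TryTake(out int id) ? id : Interlocked.Increment(ref created);
void ReturnToPool(int id) => idle.Add(id);

// Detector: records any moment when two workers hold the same connection at once.
var inUse = new ConcurrentDictionary<int, bool>();
int conflicts = 0;

Parallel.For(0, 100, iteration =>
{
    int connection = Rent();          // exclusive to this worker until returned
    try
    {
        bool exclusive = inUse.TryAdd(connection, true);
        if (!exclusive) Interlocked.Increment(ref conflicts);
        Thread.SpinWait(1000);        // stand-in for one command/response exchange
        if (exclusive) inUse.TryRemove(connection, out _);
    }
    finally
    {
        ReturnToPool(connection);
    }
});

Console.WriteLine($"connections created: {created}, conflicts: {conflicts}");
```

Without a size cap, this pool grows to roughly the peak number of concurrent workers; production pools typically also limit their size and replace broken connections.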
