How to compare strings effectively in Go

When optimizing software, string comparison is probably not the first thing that comes to mind. Optimization usually means splitting loops across goroutines, finding a faster hashing algorithm, or something that sounds more scientific, and making such changes gives a sense of accomplishment. However, string comparison is often the biggest bottleneck in a pipeline. The following snippet is commonly used, but it is the worst solution (see the benchmarks below) and has caused real problems.

```go
strings.ToLower(name) == strings.ToLower(othername)
```

This looks straightforward: convert both strings to lowercase, then compare. To understand why it is a poor solution, you need to know how a string is represented and how `ToLower` works. But first, consider the main use cases for string comparison. Comparing with the plain `==` operator gives the fastest and most optimized result. APIs and similar software, however, usually have to take letter case into account, and this is when we drop in `ToLower` and call it feature-complete.

In Go, a string is an *immutable* sequence of bytes, usually holding UTF-8 text that is read as a series of runes. Rune is Go's term for a Unicode code point. You can read more about strings, bytes, runes, and characters on the [Go blog](https://blog.golang.org/strings). `ToLower` is a standard library function that loops over every rune in a string, converts it to lowercase, and returns a new string. So the code above traverses each string in full before comparing anything, and its cost grows with the length of the strings. The pseudo-code below roughly shows the complexity of the snippet above.

Note: because strings are immutable, `strings.ToLower` allocates new memory for both strings. This adds to the time cost, but it is not our concern right now. To keep the presentation simple, the pseudo-code below treats strings as mutable.

```go
// Pseudo code
func compareInsensitive(a, b string) bool {
    // loop over string a and convert every rune to lowercase
    for i := 0; i < len(a); i++ {
        a[i] = unicode.ToLower(a[i])
    }
    // loop over string b and convert every rune to lowercase
    for i := 0; i < len(b); i++ {
        b[i] = unicode.ToLower(b[i])
    }
    // loop over both a and b and return false if there is a mismatch
    for i := 0; i < len(a); i++ {
        if a[i] != b[i] {
            return false
        }
    }
    return true
}
```

The time complexity is O(n), where `n` is `len(a) + len(b) + len(a)`. Consider this example:

```go
compareInsensitive("Fizzbuzz", "Buzzfizz")
```

This means we loop up to 24 times to discover that two completely different strings do not match. That is very inefficient, because comparing `unicode.ToLower(a[0])` and `unicode.ToLower(b[0])` (pseudo-code) is already enough to tell these strings apart. So let's take that into account.

To optimize, we can remove the first two loops in `compareInsensitive` and compare the characters at each position directly. If two runes are not equal, we convert them to lowercase and compare again. If they are still not equal, we stop the loop and treat the strings as a mismatch. If they are equal, we move on to the next rune until we reach the end or find a mismatch. Here is the rewritten code:

```go
// Pseudo code
func compareInsensitive(a, b string) bool {
    // a quick optimization. If the two strings have a different
    // length then they certainly are not the same
    if len(a) != len(b) {
        return false
    }

    for i := 0; i < len(a); i++ {
        // if the characters already match then we don't need to
        // alter their case. We can continue to the next rune
        if a[i] == b[i] {
            continue
        }
        if unicode.ToLower(a[i]) != unicode.ToLower(b[i]) {
            // the lowercase characters do not match so these
            // are considered a mismatch, break and return false
            return false
        }
    }
    // The string length has been traversed without a mismatch
    // therefore the two match
    return true
}
```

The new function is much more efficient. The upper bound is the length of one string rather than the combined length of both. How does the comparison from the example look now? The loop runs at most 8 times, and if the first characters differ it runs only once. For this example, that cuts the number of loop iterations roughly 20-fold.

Fortunately, the `strings` package already has such a function. It is called `strings.EqualFold`.

## Performance test

```
// When both strings are equal
BenchmarkEqualFoldBothEqual-8                   20000000     124 ns/op
BenchmarkToLowerBothEqual-8                     10000000     339 ns/op

// When both strings are equal until the last rune
BenchmarkEqualFoldLastRuneNotEqual-8            20000000     129 ns/op
BenchmarkToLowerLastRuneNotEqual-8              10000000     346 ns/op

// When both strings are distinct
BenchmarkEqualFoldFirstRuneNotEqual-8          300000000    11.2 ns/op
BenchmarkToLowerFirstRuneNotEqual-8             10000000     333 ns/op

// When both strings have a different case at rune 0
BenchmarkEqualFoldFirstRuneDifferentCase-8      20000000         ns/op
BenchmarkToLowerFirstRuneDifferentCase-8        10000000     433 ns/op

// When both strings have a different case in the middle
BenchmarkEqualFoldMiddleRuneDifferentCase-8     20000000     123 ns/op
BenchmarkToLowerMiddleRuneDifferentCase-8       10000000     428 ns/op
```

When the first characters of the two strings differ, the difference is staggering (about 30x). Because there is no need to loop over both strings before comparing them, the function returns false on the first iteration. In every case, `EqualFold` clearly outperforms the comparison we started with.

## Is that important?

You may think that 400 nanoseconds is not important, and in most cases you may be right. However, some micro-optimizations are as simple as any other approach. In this case, the optimized version is even simpler than the original.
Qualified engineers make these simple micro-optimizations part of their daily routine. They don't wait until something becomes a problem to optimize it; they write efficient software from the start. Even the best engineers cannot write fully optimized software from scratch, because it is impossible to imagine every edge case and optimize for it, and once the software is in users' hands we cannot predict how they will use it. Still, building these simple habits into your daily work can extend the life of your software and prevent unnecessary bottlenecks in the future. Even if the bottleneck never materializes, the effort is not wasted.

Via: https://www.digitalocean.com/community/questions/how-to-efficiently-compare-strings-in-go

Author: blockloop, Translator: tyler2018, Proofreader: polaris1119

This article was originally compiled by GCTT and published by the Go Language Chinese Network.
The translation is published only for learning and communication, under the terms of the CC-BY-NC-SA license. If this work infringes on your interests, please contact us promptly.
Reprinting under the CC-BY-NC-SA license is welcome; please keep the links to the original and the translation, along with the author and translator information.
