Use VBA in Excel to filter data

1. Problem Origin

Before you had even got out of bed this morning, a phone call woke you up: a former middle-school classmate had a question about Excel. As a so-called Excel expert, you are used to this kind of harassment. The problem is roughly this: in a large Excel file some rows are repeated, that is, two rows are exactly the same, while other rows are not repeated. The task is to find all the rows that are repeated (or all the rows that are not). After a moment's thought, you figure you can find them with VLOOKUP and then sort. You want to try it yourself before telling him how, so you tell him to call you back in 20 minutes.

2. Solution

First, open Excel and enter some test data, which is like this:

There are two "Zhang San" rows and two "Li Si" rows; every other name occurs only once, and these need to be separated out. First enter 1 in column B and fill it down. In column C enter =VLOOKUP(A1,$A$1:$B$7,2,FALSE) [①] and fill it down as well. If the name is found, the formula returns 1; if not, the lookup fails. The result is that every cell in column C shows 1, because each name finds itself. The row being looked up must therefore be excluded from the search range.
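To see why the formula flags every row, here is a minimal Python model of an exact-match VLOOKUP. The helper `lookup_exact`, which returns `None` where Excel would show #N/A, is hypothetical, and so are the names other than Zhang San and Li Si; it is a sketch of the idea, not Excel's implementation.

```python
def lookup_exact(table, key, col):
    # Hypothetical model of VLOOKUP(key, table, col, FALSE): scan the first
    # column top to bottom and return column `col` (1-based) of the first
    # exact match, or None where Excel would return #N/A.
    for row in table:
        if row[0] == key:
            return row[col - 1]
    return None


# Hypothetical test data: column A holds names, column B holds the constant 1.
table = [("Zhang San", 1), ("Li Si", 1), ("Wang Wu", 1),
         ("Zhang San", 1), ("Li Si", 1), ("Zhao Liu", 1)]

# Searching the whole table: every row finds at least itself.
with_self = [lookup_exact(table, name, 2) for name, _ in table]

# Searching every row except the current one: only real duplicates match.
without_self = [lookup_exact(table[:i] + table[i + 1:], name, 2)
                for i, (name, _) in enumerate(table)]

print(with_self)     # [1, 1, 1, 1, 1, 1]
print(without_self)  # [1, 1, None, 1, 1, None]
```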

You try a few other functions, such as MATCH and INDEX, without success. So you fall back on the IF function. You start with row 4 and plan to deal with the parameters and reference ranges later; maybe Excel is smart enough to fill in the ranges you need when you copy the formula.

You enter the following IF formula:

=IF(OR(VLOOKUP(A4,A1:B3,2,FALSE),VLOOKUP(A4,A5:B7,2,FALSE)),1,0)

It's really complicated. Excel ought to open a small window where you could enter this kind of logic as code. IF functions can only be nested seven levels deep; you really don't know what Microsoft's engineers were thinking [2]. Muttering, you press Enter, and the result is #N/A, "value not available". You know that when VLOOKUP cannot find the value it is looking for, it returns the error value #N/A, and once an expression contains that value, every calculation built on it returns #N/A as well.
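The propagation can be modeled in a few lines of Python. This is a simplified sketch under the assumption that an error passed to OR, or used as IF's condition, is returned unchanged; the names `NA`, `excel_or`, `excel_if` and `found` are hypothetical. The last line shows one possible workaround that the text did not try: turn each lookup into a true/false "found" test first, as NOT(ISNA(...)) would in Excel, so that no error ever reaches OR.

```python
class NA:
    """Hypothetical stand-in for Excel's #N/A error value."""

    def __repr__(self):
        return "#N/A"


def excel_or(*args):
    # Simplified model: the first error argument is returned unchanged.
    for arg in args:
        if isinstance(arg, NA):
            return arg
    return any(bool(arg) for arg in args)


def excel_if(condition, if_true, if_false):
    # An error in the condition propagates instead of selecting a branch.
    if isinstance(condition, NA):
        return condition
    return if_true if condition else if_false


def found(lookup_result):
    # Like NOT(ISNA(...)): converts a lookup result into True/False.
    return not isinstance(lookup_result, NA)


# Hypothetical row 4: the name is found above but not below.
above = 1
below = NA()

print(excel_if(excel_or(above, below), 1, 0))                # #N/A
print(excel_if(excel_or(found(above), found(below)), 1, 0))  # 1
```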

Choosing Error Checking from the Tools menu and clicking Show Calculation Steps confirms your guess: the #N/A returned by the second VLOOKUP is passed all the way to the final result.

Just then your classmate calls. You tell him you need to write a small program, having decided to solve the problem with a little VBA.

3. VBA program

Open the VBA editor and insert a module. After some thought, you write the following code:

Sub SelectDouble()
    Dim i As Long, j As Long
    For i = 1 To 7
        For j = 1 To 7
            ' Different rows
            If i <> j Then
                If Range("A" & i).Value = Range("A" & j).Value Then
                    Range("E" & i).Value = 1
                End If
            End If
        Next j
    Next i
End Sub

You click Run. Duplicate rows are marked with 1 and unique rows stay empty, so you can then sort on that column. Satisfied, you even add a line of comment. You call your classmate and tell him you have a solution. He calls you back, and you read the program to him over the phone, telling him where to change it. Heaven knows what language he learned at school, but it wasn't BASIC: you even have to explain what Dim means. After a lot of back and forth, he finally types the code into the computer at the other end of the line. As a telecom employee he can stay on the phone 24 hours a day; only the phone bill is to be pitied. You sigh: it's time to wash your face and brush your teeth.
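For readers without Excel at hand, the same nested-loop idea can be sketched in Python. `flag_duplicates_nested` is a hypothetical name, it writes 0 where the VBA leaves a cell empty, and the list of names is made-up test data of the same size as the seven-row example.

```python
def flag_duplicates_nested(values):
    """Compare every row with every other row; flag 1 when an equal value
    exists somewhere else, mirroring the two nested For loops."""
    n = len(values)
    flags = [0] * n
    for i in range(n):
        for j in range(n):
            # Different rows only, as in the VBA's i <> j test.
            if i != j and values[i] == values[j]:
                flags[i] = 1
    return flags


names = ["Zhang San", "Li Si", "Wang Wu", "Zhang San", "Li Si", "Zhao Liu", "Sun Qi"]
print(flag_duplicates_nested(names))  # [1, 1, 0, 1, 1, 0, 0]
```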

4. Efficiency

After washing your face and brushing your teeth, you make a cup of coffee, go back to the computer and call him again. You expect the good news that he is done; instead you hear that Excel has frozen. You suppose the program is either still running or stuck in an endless loop. You ask him how much data there is and learn that there are more than 9,000 records. Good thing you asked.

You check the code and find no endless loop; perhaps something went wrong when he typed it. You change both loops to run from 1 to 10000, pick up the cup, take a big gulp of coffee, and wait for the result. A few minutes later it still hasn't finished. Finding this odd, you press Ctrl+Break to pause the program and hover the mouse over the variable I: it shows only 24. Damn. Now you know that reading cells through Range is too slow. Oh well: you call your classmate and tell him the calculation may take several hours. You drink another cup of coffee and tell yourself it is still faster than filtering by hand.

But there are fewer than 10,000 records! Excel's built-in functions such as VLOOKUP get through that many in the blink of an eye.

4.1. Use an array

Arrays are much faster than Range. You modify the program to use two arrays. First all the data is read into the first array, and the comparisons are done on that array; when a value is repeated, the corresponding element of the second array is set to 1. After the calculation, the results are written back from the second array. The code is as follows:

Sub SelectDouble2()
    Dim i As Long, j As Long
    Dim Max As Long
    Dim A() As String, B() As Long

    Max = 10000
    ReDim A(Max) As String
    ReDim B(Max) As Long

    ' Read column A into array A once
    For i = 1 To Max
        A(i) = Range("A" & i).Value
    Next i

    ' Compare every pair of different rows in memory
    For i = 1 To Max
        For j = 1 To Max
            ' Different rows
            If i <> j Then
                If A(i) = A(j) Then
                    B(i) = 1
                End If
            End If
        Next j
    Next i

    ' Write the flags back to column F
    For i = 1 To Max
        Range("F" & i).Value = B(i)
    Next i
End Sub

After you run it, 10,000 records take less than five minutes. You are pleased with the speed-up of several orders of magnitude. You also remembered to use a Max variable, so fewer changes to the code are needed when it is reused.
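A quick count shows why even the array version still needs minutes: both SelectDouble and SelectDouble2 run the inner comparison for every pair of rows, so the work grows with the square of the row count, and the arrays only make each comparison cheaper. The small Python helpers below (hypothetical names) make the arithmetic explicit; for comparison, the second one counts the work of a single pass that only compares each row with the next.

```python
def inner_comparisons(n):
    # The two loops run n * n times; the n cases with i = j skip the
    # value comparison because of the i <> j test.
    return n * n - n


def adjacent_comparisons(n):
    # A single pass that compares each row only with the next one.
    return n - 1


print(inner_comparisons(7))          # 42
print(inner_comparisons(10_000))     # 99990000
print(adjacent_comparisons(10_000))  # 9999
```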

4.2. Use built-in functions

Then you think of VLOOKUP again. Why does VLOOKUP run so fast? Because it is compiled code, not written in VBA [③]. So why not use it? In VBA you can call Excel's built-in worksheet functions as Application.FunctionName. The modified code is as follows:

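Sub SelectDouble3()
    Dim i As Long, A As Variant, B As Variant
    ' Rows 1 and 10000 are skipped so both lookup ranges stay valid;
    ' those two rows have to be checked by hand
    For i = 2 To 9999
        ' Look for the value in the rows above the current row
        A = Application.VLookup(Range("A" & i), Range("A1:B" & (i - 1)), 2, False)
        ' Look for the value in the rows below the current row
        B = Application.VLookup(Range("A" & i), Range("A" & (i + 1) & ":B10000"), 2, False)
        ' Not found above and not found below: the row is unique
        If IsError(A) And IsError(B) Then
            Range("G" & i).Value = 0
        End If
    Next i
End Sub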

The code is short, but a little convoluted and annoying. The loop runs from 2 to 9999 so that the ranges passed to VLOOKUP never become invalid; the first and last rows have to be checked by hand. IsError tests the return values: if both lookups fail, the row has no duplicate and is flagged 0. The speed is similar to the previous version; at least you can't feel any difference.
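The look-above, look-below test can be sketched in Python, with `list.index` raising `ValueError` in the role that #N/A and IsError play in the VBA. The function name `is_unique` and the sample names are hypothetical. One difference worth noting: an empty slice is harmless in Python, whereas the VBA has to avoid building an invalid range such as A1:B0, which is why its loop starts at row 2.

```python
def is_unique(values, i):
    """True when values[i] occurs neither above nor below position i."""

    def found(part):
        try:
            part.index(values[i])  # raises ValueError when absent, like #N/A
            return True
        except ValueError:
            return False

    # Both lookups failing means the row has no duplicate.
    return not found(values[:i]) and not found(values[i + 1:])


names = ["Zhang San", "Li Si", "Wang Wu", "Zhang San", "Li Si", "Zhao Liu", "Sun Qi"]
unique_rows = [i for i in range(len(names)) if is_unique(names, i)]
print(unique_rows)  # [2, 5, 6]
```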

4.3. Keep hacking

At this point you are still not satisfied: with this much data the array version takes too much memory, and the VLOOKUP version feels ugly [4]. For some reason you think of binary search, which needs the data to be sorted before searching. So you sort the data in Excel. The remaining problem is the double loop, whose complexity is N × N. But once the data is sorted, you only need to check whether the current value equals the next one. If it does, mark both the current and the next row, add 2 to the loop variable and skip the next row; if not, add 1 to the loop variable and keep comparing. The code is as follows:

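Sub SelectDouble4()
    Dim i As Long, Max As Long
    Max = 10000
    i = 1
    ' Column A must already be sorted; a value that appears three or
    ' more times is only partly marked by this pairwise scan
    Do
        If Range("A" & i).Value = Range("A" & (i + 1)).Value Then
            ' Mark the current and the next row, then skip past the pair
            Range("I" & i).Value = 1
            Range("I" & (i + 1)).Value = 1
            i = i + 2
        Else
            i = i + 1
        End If
    Loop While i < Max
End Sub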

The complexity of this program is only N (not counting the sort). It is of course the fastest of all the programs you have written today, and it uses the least memory. You grin slyly, very pleased with yourself.
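Here is the same sorted, single-pass scan sketched in Python (the function name and the names in the list are hypothetical). The last line illustrates a limitation of the skip-by-two rule: when a value appears three times, the third copy is compared with the row after it and is left unmarked.

```python
def flag_sorted_pairs(values):
    """Pairwise scan over already-sorted values: on a match flag both rows
    and skip two positions, otherwise move on by one."""
    flags = [0] * len(values)
    i = 0
    while i < len(values) - 1:
        if values[i] == values[i + 1]:
            flags[i] = flags[i + 1] = 1
            i += 2
        else:
            i += 1
    return flags


data = sorted(["Zhang San", "Li Si", "Wang Wu", "Zhang San", "Li Si", "Zhao Liu", "Sun Qi"])
print(data)                     # ['Li Si', 'Li Si', 'Sun Qi', 'Wang Wu', 'Zhang San', 'Zhang San', 'Zhao Liu']
print(flag_sorted_pairs(data))  # [1, 1, 0, 0, 1, 1, 0]
print(flag_sorted_pairs(["a", "a", "a"]))  # [1, 1, 0]
```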

5. Summary

You open your log and start writing down how you solved today's problem.

You reflect: if you only replace Range with something else, the speed will not improve substantially. The real improvement comes from two things. First, sorting is the key: fast lookup methods such as binary search rely on sorted data. It is the same reason databases use indexes, and whether a column is indexed has a huge effect on search speed. Second, the scan never backtracks: rows that have already been examined are skipped. The idea is the same as in string-matching algorithms such as KMP [⑤]. And if a value occurs more than twice rather than exactly twice, you can add an inner loop that walks through the whole run of equal values; the same pass also tells unique values apart from repeated ones. You suddenly feel full of confidence and forget that you have been unemployed for half a year.
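The two ideas in the summary can be sketched with Python's standard library: `itertools.groupby` walks each run of equal values in sorted data exactly once, so runs longer than two are marked completely, and `bisect` performs a binary search on sorted data, here used to count the copies of one value. These are illustrative substitutes, not the author's VBA; the function names are hypothetical.

```python
from bisect import bisect_left, bisect_right
from itertools import groupby


def flag_sorted_runs(values):
    """Single forward pass over sorted values: every member of a run of
    two or more equal values gets flag 1, singletons get 0."""
    flags = []
    for _, run in groupby(values):
        length = len(list(run))
        flags.extend([1 if length > 1 else 0] * length)
    return flags


def occurrences(sorted_values, key):
    """Binary search: number of copies of key in a sorted list."""
    return bisect_right(sorted_values, key) - bisect_left(sorted_values, key)


data = sorted(["a", "b", "a", "c", "a"])  # ['a', 'a', 'a', 'b', 'c']
print(flag_sorted_runs(data))  # [1, 1, 1, 0, 0]
print(occurrences(data, "a"))  # 3
print(occurrences(data, "z"))  # 0
```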


[①] VLOOKUP searches for a value in the first column of a table or array and returns the value in the specified column of the same row. When the values you are comparing are in the first column of the data table, use VLOOKUP instead of HLOOKUP. For details, see Excel Help.

[2] As a programmer, you have always thought that IF formulas and the like are a waste of time. Who can follow seven nested IFs? But the function itself is simple, and you didn't want to scare your classmate by telling him to write a program to solve the problem.

[③] Heaven knows what language Microsoft wrote it in: maybe C, maybe C++, but certainly not BASIC or C#, since C# did not exist yet when it was written.

[4] Or maybe you just didn't write it well.

[⑤] Although you did not formally study computer science, you have learned data structures and algorithms.
