VBScript code to remove all duplicate rows from a text file

Source: Internet
Author: User

Question:
Hello, Scripting Guy! How do I remove all duplicate rows from a text file?

--SW

Answer:
Hey, SW. You know, being a Scripting Guy means setting out on a never-ending quest to find the ultimate solution to a given problem. (Or at least that's what we tell our manager when he asks why we never seem to actually accomplish anything: "Boss, never-ending quests take time!") That's why we were glad to see your question. Not long ago we answered a similar question about removing duplicate names from a text file. The solution we came up with was simple and effective, but we weren't sure it was the best one. Now, thanks to your question, we can take another crack at the problem. We'll leave it to you to decide whether this solution is better, faster, or more convenient than the one we provided previously.

First, let's assume you have a text file in which each line represents a separate record. It's unlikely, but perhaps your file looks something like this:

This is one of the lines in the text file.
This is one of the lines in the text file.
This is another the text file.
This is one of the lines in the text file.
This is yet another the text file.
This is another the text file.
This is another the text file.
This is one of the lines in the text file.

You need a script that can drop all duplicate rows and provide output similar to the following:

This is one of the lines in the text file.
This is another the text file.
This is yet another the text file.

SW, you have found the right place:

Const adOpenStatic = 3
Const adLockOptimistic = 3
Const adCmdText = &H0001

Set objConnection = CreateObject("ADODB.Connection")
Set objRecordSet = CreateObject("ADODB.Recordset")

' Folder that holds the text file, and the file name (used as the table name)
strPathToTextFile = "C:\Scripts\"
strFile = "Test.txt"

' Open the folder as a database; HDR=NO means the first line is data, not column names
objConnection.Open "Provider=Microsoft.Jet.OLEDB.4.0;" & _
    "Data Source=" & strPathToTextFile & ";" & _
    "Extended Properties=""text;HDR=NO;FMT=Delimited"""

' Return each distinct line only once
objRecordSet.Open "SELECT DISTINCT * FROM " & strFile, _
    objConnection, adOpenStatic, adLockOptimistic, adCmdText

' Echo the first field of each unique record
Do Until objRecordSet.EOF
    WScript.Echo objRecordSet.Fields.Item(0).Value
    objRecordSet.MoveNext
Loop

We find this script interesting because it uses ActiveX Data Objects (ADO) and treats the text file as a database. We won't spend much time explaining how to treat a text file as a database; if you'd like to know more, our Scripting Clinic column covers the topic in depth. For now, all we'll say is that we're going to work with the text file C:\Scripts\Test.txt, which we indicate by assigning values to the variables strPathToTextFile and strFile:

strPathToTextFile = "C:\Scripts\"
strFile = "Test.txt"

So how does that help us remove duplicate lines? Well, there's a type of database query known as Select DISTINCT; with Select DISTINCT you select only the distinct (that is, unique) records in a table. Suppose you have a simple database with the following records:


Red
Red
Blue
Red

If you use the Select DISTINCT query, you get a recordset that includes only unique records:

Red
Blue

No doubt you're thinking, "Wow! Returning only the unique records is pretty much the same as deleting the duplicate records." We have to admit that it is. Hey, wait a second: you're absolutely right. Our text file is set up like a database table, and each line in the text file is a record consisting of a single field. If we run a Select DISTINCT query against this text file, we'll get back only the unique lines. In fact, we'll get back a recordset like the one shown below:

This is one of the lines in the text file.
This is another the text file.
This is yet another the text file.

That's exactly the information we want to get back. Good thing you pointed that out for us!
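For comparison, here is a minimal sketch of the same "keep each line only once" idea without ADO. This is a swapped-in technique rather than the script discussed above: it reads the file with the FileSystemObject and tracks lines it has already seen in a Scripting.Dictionary. The variable names objSeen and strLine are ours, and the path is the C:\Scripts\Test.txt used throughout this article. One difference worth noting: this version echoes lines in the order they first appear, whereas a Select DISTINCT query without an ORDER BY clause makes no promise about order.

```vbscript
Const ForReading = 1

Set objFSO = CreateObject("Scripting.FileSystemObject")
Set objSeen = CreateObject("Scripting.Dictionary")   ' hypothetical name: lines seen so far

Set objFile = objFSO.OpenTextFile("C:\Scripts\Test.txt", ForReading)

Do Until objFile.AtEndOfStream
    strLine = objFile.ReadLine
    ' Dictionary keys are compared case-sensitively by default
    If Not objSeen.Exists(strLine) Then
        objSeen.Add strLine, True
        WScript.Echo strLine
    End If
Loop

objFile.Close
```

Whether this is better than the ADO approach depends on what you need: the Dictionary version compares whole lines exactly as written, while the ADO version hands the parsing and comparison over to the database provider.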

After retrieving the recordset, we then use the following code to echo the unique lines back to the screen:

Do Until objRecordSet.EOF
    WScript.Echo objRecordSet.Fields.Item(0).Value
    objRecordSet.MoveNext
Loop

If you prefer, you can then use the FileSystemObject to open the text file and replace the existing contents with just the unique lines, which amounts to the same thing as removing all the duplicate lines from the file. (It would be nice if we could do this with some sort of Update query, but ADO can't update or delete existing lines when working with text files.)
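A minimal sketch of that FileSystemObject step might look like the following. It rests on a few assumptions of ours: the unique lines are gathered into a string (strContents, a hypothetical name) before the file is reopened, the recordset and connection are closed before writing, and no line in the file contains a comma. That last point matters because FMT=Delimited would split such a line into several fields, and Fields.Item(0) would hold only the first. It is worth trying this on a copy of the file first, since the original contents are overwritten.

```vbscript
Const adOpenStatic = 3
Const adLockOptimistic = 3
Const adCmdText = &H0001
Const ForWriting = 2

strPathToTextFile = "C:\Scripts\"
strFile = "Test.txt"

Set objConnection = CreateObject("ADODB.Connection")
Set objRecordSet = CreateObject("ADODB.Recordset")

objConnection.Open "Provider=Microsoft.Jet.OLEDB.4.0;" & _
    "Data Source=" & strPathToTextFile & ";" & _
    "Extended Properties=""text;HDR=NO;FMT=Delimited"""

objRecordSet.Open "SELECT DISTINCT * FROM " & strFile, _
    objConnection, adOpenStatic, adLockOptimistic, adCmdText

' Gather the unique lines in memory (strContents is a hypothetical name)
strContents = ""
Do Until objRecordSet.EOF
    strContents = strContents & objRecordSet.Fields.Item(0).Value & vbCrLf
    objRecordSet.MoveNext
Loop

' Release the file before overwriting it
objRecordSet.Close
objConnection.Close

' Opening with ForWriting replaces the existing contents
Set objFSO = CreateObject("Scripting.FileSystemObject")
Set objFile = objFSO.OpenTextFile(strPathToTextFile & strFile, ForWriting)
objFile.Write strContents
objFile.Close
```

Note that the Jet 4.0 provider named in the connection string is available only to 32-bit processes, so on 64-bit Windows the script would need to run under the 32-bit script host.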

So is this the final word on removing duplicates from a text file, whether they are names or entire lines? Alas, who knows: after all, never-ending quests take time! (Actually, we've found that they only take about 2-3 days. After that we start to get bored and move on to something else.)
