PHP array code for removing duplicates from millions of records

Source: Internet
Author: User
In day-to-day work, I am often asked to send site messages, text messages, and emails to website members. The user list is usually provided by colleagues and inevitably contains duplicates. To avoid sending the same message twice, I deduplicate the list before sending anything. The following uses a uid list to show how I remove duplicates with PHP arrays. Suppose you receive a uid list with more than one million lines in the following format:

The code is as follows:


10001000
10001001
10001002
......
10001000
......
10001111


The approach relies on a property of PHP arrays, so let's look at how PHP defines an array: an array in PHP is actually an ordered map. A map is a type that associates values with keys. This type is optimized for several different uses, so it can be treated as a real array, a list (vector), a hash table (an implementation of a map), a dictionary, a collection, a stack, a queue, and probably more. The value of an array element can itself be another array, so trees and multidimensional arrays are also possible.
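As a quick illustration of this definition, the sketch below uses one PHP array as a list, as a map, and as a set. The variable names are made up for the example. It is only meant to show the behaviour the rest of the article relies on: writing to an existing key replaces the value instead of adding a second entry.

```php
<?php
// Used as a list: keys are assigned automatically as 0, 1, 2, ...
$list = array('a', 'b', 'c');

// Used as a map: each key is associated with a value
$map = array('uid' => 10001000, 'name' => 'example');

// Used as a set: only the keys matter, the value 1 is a placeholder
$set = array();
$set['10001000'] = 1;
$set['10001001'] = 1;
$set['10001000'] = 1; // same key again: overwrites, no new entry

echo count($set), "\n";                    // 2
echo implode(',', array_keys($set)), "\n"; // 10001000,10001001
```

Note that PHP stores a key such as '10001000' as the integer 10001000, while a key with a leading zero such as '007' stays a string, so uids of either shape remain distinct keys.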

In a PHP array, the key (also called the index) is unique. We can use this property to remove duplicates. The sample code is as follows:

The code is as follows:


<?php
// Define an array to store the results after deduplication
$result = array();
// Open the uid list file
$fp = fopen('test.txt', 'r');
if ($fp === false)
{
    exit("Cannot open test.txt\n");
}

// fgets() returns false at end of file, which ends the loop
while (($line = fgets($fp)) !== false)
{
    // trim() strips spaces, tabs, "\r" and "\n" in one call
    $uid = trim($line);

    if ($uid === '')
    {
        continue;
    }
    // Use uid as the key and record it only if it has not been seen yet
    if (!isset($result[$uid]))
    {
        $result[$uid] = 1;
    }
}

fclose($fp);

// Save the result to a file, one uid per line
$content = '';
foreach ($result as $k => $v)
{
    $content .= $k . "\n";
}
$fp = fopen('result.txt', 'w');
fwrite($fp, $content);
fclose($fp);
?>


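With just over 20 lines of code, you can deduplicate more than one million records. The file is read in a single pass and each key lookup is a hash-table operation, so the approach is both efficient and practical. You can use the same method to remove duplicate phone numbers or email addresses.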
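To reuse the idea for phone numbers or email addresses, the loop can be wrapped in a small function. The sketch below is one possible way to do that and is not part of the original script: the function name `dedupe_lines` and the optional `$normalize` callback are hypothetical names introduced for illustration. Lowercasing email addresses before comparing them is also an assumption about what should count as a duplicate. The type declarations assume PHP 7.1 or later.

```php
<?php
/**
 * Return the distinct non-empty lines of $lines, in first-seen order.
 * $normalize (hypothetical hook) maps a line to the key used for comparison.
 */
function dedupe_lines(iterable $lines, ?callable $normalize = null): array
{
    $seen = array();
    foreach ($lines as $line) {
        $value = trim($line);
        if ($value === '') {
            continue; // skip blank lines
        }
        $key = $normalize ? $normalize($value) : $value;
        if (!isset($seen[$key])) {
            $seen[$key] = $value; // keep the first spelling encountered
        }
    }
    return array_values($seen);
}

$emails = array("Bob@example.com\n", "alice@example.com\n", "bob@example.com\r\n", "\n");
$unique = dedupe_lines($emails, 'strtolower');
print_r($unique); // keeps Bob@example.com and alice@example.com
```

Because the function accepts any iterable, passing a `SplFileObject` opened on the list file should let it stream a large file line by line instead of loading it into an array first.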

This method can also be used to deduplicate across two files. If you have two uid list files in the same format as the uid list above, the example program is as follows:

The code is as follows:


<?php
// Define an array to store the results after deduplication
$result = array();
// Read the first uid list file and put it in $result
$fp = fopen('test_1.txt', 'r');
if ($fp === false)
{
    exit("Cannot open test_1.txt\n");
}
while (($line = fgets($fp)) !== false)
{
    // trim() strips spaces, tabs, "\r" and "\n" in one call
    $uid = trim($line);
    if ($uid === '')
    {
        continue;
    }
    // Write $result with uid as the key. If there is a duplicate, it will overwrite
    $result[$uid] = 1;
}
fclose($fp);
// Read the second uid list file and perform deduplication
$fp = fopen('test_2.txt', 'r');
if ($fp === false)
{
    exit("Cannot open test_2.txt\n");
}
while (($line = fgets($fp)) !== false)
{
    $uid = trim($line);
    if ($uid === '')
    {
        continue;
    }
    // Use uid as the key and record it only if it has not been seen yet
    if (!isset($result[$uid]))
    {
        $result[$uid] = 1;
    }
}
fclose($fp);
// The keys of $result are now the deduplicated uids, which can be written to a file. That code is omitted.
?>
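Because the two loops above differ only in the file name, they can be folded into one helper. The following is a minimal sketch under that assumption: `collect_uids` is a hypothetical name, and the final `file_put_contents` call stands in for the output step that the original omits. The demo writes two small files to the system temporary directory so that it can run on its own.

```php
<?php
/**
 * Add every distinct non-empty line of $path to $result (the keys are the uids).
 * Hypothetical helper, not part of the original script.
 */
function collect_uids(string $path, array &$result): void
{
    $fp = fopen($path, 'r');
    if ($fp === false) {
        throw new RuntimeException("Cannot open $path");
    }
    while (($line = fgets($fp)) !== false) {
        $uid = trim($line);
        if ($uid !== '') {
            $result[$uid] = 1; // a repeated uid overwrites the same key
        }
    }
    fclose($fp);
}

// Demo input: two small files that share the uid 10001001
$dir = sys_get_temp_dir();
file_put_contents("$dir/test_1.txt", "10001000\n10001001\n");
file_put_contents("$dir/test_2.txt", "10001001\n10001002\n\n");

$result = array();
foreach (array("$dir/test_1.txt", "$dir/test_2.txt") as $path) {
    collect_uids($path, $result);
}

// Write one uid per line
file_put_contents("$dir/result.txt", implode("\n", array_keys($result)) . "\n");
echo file_get_contents("$dir/result.txt");
```

The same loop works for any number of input files, since each file only adds keys that are not already present.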


If you think about it, you will find that this property of arrays can solve many other problems in everyday work.
