I have a batch of address-book data: contact lists for more than 10,000 people. For every pair of people, I need to find the overlapping part of their two address books (that is, which contacts the two of them share), comparing all users against each other.
For example, with five people A, B, C, D and E, find the number of duplicated contacts for each of the pairs AB, AC, AD, AE, BC, BD, BE, CD, CE and DE.
Two address books are considered to overlap wherever they contain the same phone number.
The data table has more than 10,000 rows, one per person.
The list field stores that person's address book as JSON.
A single address book contains 100 to 1,000 entries.
What I do now is load everyone's address book and compare the first person against all the others (a foreach nested inside a foreach), then the second person against the rest, and so on.
Script Code
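The script itself is elided here, but a hypothetical sketch of such a nested-foreach pairwise comparison (not the author's actual code; it assumes each address book has already been decoded from JSON into an array of phone-number strings) would look roughly like this:

```php
<?php
// Hypothetical data: person id => array of phone numbers (decoded from the JSON field).
$books = array(
    'A' => array('111', '222', '333'),
    'B' => array('222', '444'),
    'C' => array('333', '444', '555'),
);

$ids = array_keys($books);
$n = count($ids);
$overlaps = array();

// Compare every pair exactly once: O(n^2) pairs, each requiring an intersection
// of two arrays of up to 1,000 entries.
for ($i = 0; $i < $n; $i++) {
    for ($j = $i + 1; $j < $n; $j++) {
        $shared = array_intersect($books[$ids[$i]], $books[$ids[$j]]);
        if (count($shared) > 0) {
            $overlaps[$ids[$i] . '-' . $ids[$j]] = count($shared);
        }
    }
}

print_r($overlaps);
```

With 10,000 people this is roughly 50 million pair comparisons, each intersecting arrays of hundreds of entries, which is consistent with a runtime measured in tens of hours.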
The script ran for more than 20 hours and had only gotten about halfway through. Memory and CPU usage were also high; the script is simply too inefficient.
Is there a better way to find the duplicates in this batch of data, or a way to optimize the script?
Thank you!
Reply content:
$data = array(
    array('id' => 1, 'name' => 1),
    array('id' => 2, 'name' => 2),
    array('id' => 3, 'name' => 3),
    array('id' => 1, 'name' => 2),
);
$ret = array();
// Traverse the data once, using the potentially duplicated value ('id' here)
// as the key of $ret: increment the count if the key already exists,
// otherwise set it to 1.
foreach ($data as $k => $v) {
    $_id = $v['id'];
    $_name = $v['name'];
    if (array_key_exists($_id, $ret)) {
        $ret[$_id]++;
    } else {
        $ret[$_id] = 1;
    }
}
// Traverse the result.
foreach ($ret as $k => $v) {
    echo "{$k} appears {$v} times\n";
}
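That counting idea scales to the original problem if you invert the data first: instead of comparing every pair of address books, build a map from each phone number to the people whose books contain it, then bump a counter for every pair of owners that shares a number. Each phone number is visited once, so the total work is proportional to the number of entries plus the number of actually-overlapping pairs, rather than to all 50 million person pairs. A sketch under the same assumption as above (person id => array of phone-number strings):

```php
<?php
// Hypothetical data: person id => array of phone numbers (decoded from the JSON field).
$books = array(
    'A' => array('111', '222', '333'),
    'B' => array('222', '444'),
    'C' => array('333', '444', '555'),
);

// Step 1: build an inverted index, phone number => list of owners.
$owners = array();
foreach ($books as $person => $phones) {
    foreach (array_unique($phones) as $phone) {
        $owners[$phone][] = $person;
    }
}

// Step 2: for every number owned by two or more people, count each owner pair once.
$pairCounts = array();
foreach ($owners as $phone => $people) {
    $m = count($people);
    for ($i = 0; $i < $m; $i++) {
        for ($j = $i + 1; $j < $m; $j++) {
            $key = $people[$i] . '-' . $people[$j];
            if (isset($pairCounts[$key])) {
                $pairCounts[$key]++;
            } else {
                $pairCounts[$key] = 1;
            }
        }
    }
}

print_r($pairCounts);
```

In practice most phone numbers appear in only a handful of address books, so the inner pair loop stays small; only numbers shared by very many people would make it expensive.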