A text file with 1 million records: find the 10 entries that are duplicated most often.
Sample text:
098
123
234
789
......
234
678
654
123
Looking for ideas.
------Solution--------------------
Import the data into a database table, then let SQL do the counting. Not sure whether it is feasible, but you can try it.
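A minimal sketch of that idea using SQLite through PDO; the database file name, table name, and batched inserts are all assumptions, not something the poster specified:

$pdo = new PDO('sqlite:counts.db');                // hypothetical scratch database
$pdo->exec('CREATE TABLE IF NOT EXISTS t (v TEXT)');
$stmt = $pdo->prepare('INSERT INTO t (v) VALUES (?)');
$pdo->beginTransaction();                          // one transaction keeps 1M inserts fast
$fp = fopen('file', 'r');
while (($line = fgets($fp)) !== false) {
    $stmt->execute(array(trim($line)));            // load every record into the table
}
fclose($fp);
$pdo->commit();
// let SQL do the statistics: group, sort by count, keep the top 10
$top = $pdo->query('SELECT v, COUNT(*) AS c FROM t GROUP BY v ORDER BY c DESC LIMIT 10');
print_r($top->fetchAll(PDO::FETCH_ASSOC));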
------Solution--------------------
explode            // split the text into an array of lines
array_count_values // count the repetitions of each value
arsort             // sort by count, take the top results
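Putting those three steps together, a rough sketch (the file name 'file' is an assumption):

$text   = file_get_contents('file');               // read the whole file
$lines  = explode("\n", trim($text));              // split into an array of lines
$counts = array_count_values($lines);              // count repetitions of each line
arsort($counts);                                   // sort by count, descending
print_r(array_slice($counts, 0, 10, true));        // top 10; true keeps the line values as keys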
------Solution--------------------
You could process the text in blocks and accumulate the counts as you go; reading the whole thing at once would probably eat too much memory ...
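A sketch of that block-wise idea: read a fixed number of lines at a time, count within the block, and merge into running totals. The file name and the block size of 100000 are assumptions; note this only keeps one block of raw lines in memory at a time, though the count table itself still holds every distinct value.

$fp = fopen('file', 'r');
$totals = array();
while (!feof($fp)) {
    $chunk = array();
    for ($i = 0; $i < 100000 && ($line = fgets($fp)) !== false; $i++) {
        $chunk[] = trim($line);                    // collect one block of lines
    }
    foreach (array_count_values($chunk) as $v => $c) {
        $totals[$v] = isset($totals[$v]) ? $totals[$v] + $c : $c;  // merge block counts
    }
}
fclose($fp);
arsort($totals);                                   // sort running totals by count
print_r(array_slice($totals, 0, 10, true));        // top 10 most duplicated entries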
------Solution--------------------
$fp = fopen('file', 'r');
$res = array();                                   // line => occurrence count
while (($buf = fgets($fp)) !== false) {
    $res[$buf] = isset($res[$buf]) ? $res[$buf] + 1 : 1;
}
fclose($fp);
arsort($res);                                     // sort by count, descending
$res = array_keys(array_slice($res, 0, 10));      // the 10 most frequent lines
print_r($res);
At only 1 million records, memory is not really a constraint, so in practice this is no different from the algorithm below:
$a = file('file');                                // read the whole file into an array of lines
$res = array_count_values($a);                    // count repetitions of each line
arsort($res);                                     // sort by count, descending
$res = array_keys(array_slice($res, 0, 10));      // the 10 most frequent lines
print_r($res);
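Note that file() pulls all 1 million lines into memory at once before counting, while the fgets() loop above only ever holds the count table; at this data size either works, but the line-by-line version scales better to larger files.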