Beginner question: I need to count how often each word occurs in a 2 GB file, but no matter how I adjust memory_limit I still get "Allowed memory size of xxxx bytes exhausted". A lighter test that only counts the total number of words or characters can produce a result, but the full word-frequency count cannot. How can this be optimized?
Ini_set ("Memory_limit", "1"), function Calcwordfrequence ($sFilePatch) {$aWordsInFile = array (); $aOneLineWords = array (); $sOneLineWords = ""; $fp = fopen ($sFilePatch, "R"), while (!feof ($fp)) {$sOneLineWords = Fgets ($fp); $aOneLineWords = str _word_count ($sOneLineWords, 1), foreach ($aOneLineWords as $v) {Array_push ($aWordsInFile, $v);}} Fclose ($fp); $aRes = Array_count_values ($aWordsInFile); Arsort ($aRes); return $aRes;} Echo calcwordfrequence ("2013.mp4");
Reply to discussion (solution)
This can't be solved as written: just opening a 2 GB file this way will consume nearly all the memory of an ordinary machine. Do a distributed design at the storage level instead.
Is there a way, in code, to split the file into several parts, compute partial statistics, and then output the words with the highest frequency?
Use the split command to cut the file into smaller files and count them one at a time (see the merging sketch below).
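For illustration, a minimal sketch of the merge step, assuming the big file has already been cut into smaller pieces (the chunk_* file names are hypothetical) and reusing the poster's calcWordFrequence() on each small piece:

function mergeChunkCounts(array $aChunkFiles) {
    $aTotal = array();
    foreach ($aChunkFiles as $sChunkFile) {
        // Count one small chunk at a time; only that chunk's words are held in memory.
        $aChunkCounts = calcWordFrequence($sChunkFile);
        foreach ($aChunkCounts as $sWord => $iCount) {
            if (isset($aTotal[$sWord])) {
                $aTotal[$sWord] += $iCount;
            } else {
                $aTotal[$sWord] = $iCount;
            }
        }
    }
    arsort($aTotal);   // most frequent words first
    return $aTotal;
}

// Example usage: print_r(mergeChunkCounts(glob("chunk_*")));

The merged array still holds one entry per distinct word, but that is far smaller than holding every occurrence of every word.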
Only text files have the concept of a line.
The 2013.mp4 you're testing is obviously not a text file.
If the file contains no \n at all, or the \n characters are very far apart, then your $sOneLineWords = fgets($fp); will read a huge chunk in one call and drain the memory.
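To make that concrete, a small sketch (not the poster's code) showing how passing a length to fgets() bounds what a single call can read, even when the file has no newlines; note that a word can then be split across two reads, which this sketch does not handle:

$fp = fopen("2013.mp4", "rb");    // the file from the question
while (!feof($fp)) {
    $sChunk = fgets($fp, 8192);   // reads at most 8191 bytes per call
    if ($sChunk === false) {
        break;
    }
    // ... process $sChunk here ...
}
fclose($fp);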
If it is a text file such as a log, you can use PHP's SplFileObject class, which is well suited to working with large files; I previously used it to analyze Nginx access logs of more than 5 GB.
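A minimal sketch of that approach, assuming a line-oriented text file such as an access log (the file name below is illustrative): SplFileObject iterates one line at a time, and the count for each distinct word is incremented as it is seen instead of collecting every occurrence first, so memory grows with the vocabulary rather than the file size:

function calcWordFrequenceStreaming($sFilePath) {
    $aCounts = array();
    $oFile = new SplFileObject($sFilePath, "r");
    $oFile->setFlags(SplFileObject::DROP_NEW_LINE);
    foreach ($oFile as $sLine) {               // one line at a time
        foreach (str_word_count((string)$sLine, 1) as $sWord) {
            if (isset($aCounts[$sWord])) {
                $aCounts[$sWord]++;
            } else {
                $aCounts[$sWord] = 1;
            }
        }
    }
    arsort($aCounts);
    return $aCounts;
}

// Example usage: print_r(array_slice(calcWordFrequenceStreaming("access.log"), 0, 10, true));

The same change, counting per word instead of pushing every word into $aWordsInFile, would also cut the memory use of the poster's original fgets() loop.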