When the available memory is smaller than the file, how can you quickly locate a specific line in a large file?
For example, a file:
ABC 56
DEF 100
RET 300
...
The file has two columns. The first column holds unique, non-repeating keys; the second column is a count (a number).
If the file is 2 GB or larger and only 1 GB of memory is available, how can we quickly locate the line "ABC 56"?
Please suggest a clear solution.
Replies (solutions):
What do you mean?
If you just want to find a line interactively, open the file with vi or more;
then type /ABC and press Enter.
fopen the file, then read it with fscanf, one line at a time. Memory will not be a limiting factor.
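A minimal sketch of the fopen/fscanf approach above (the function name and 255-character key limit are illustrative choices, not from the post). Only one record is held in memory at a time, so the file size does not matter:

```c
#include <stdio.h>
#include <string.h>

/* Scan the file record by record and return the count stored after
 * `key`, or -1 if the key is not found or the file cannot be opened. */
long find_count(const char *path, const char *key)
{
    FILE *fp = fopen(path, "r");
    if (!fp)
        return -1;

    char line_key[256];
    long count;
    long result = -1;
    /* Each line holds "KEY NUMBER"; fscanf reads one pair per call. */
    while (fscanf(fp, "%255s %ld", line_key, &count) == 2) {
        if (strcmp(line_key, key) == 0) {
            result = count;
            break;
        }
    }
    fclose(fp);
    return result;
}
```

This is O(n) in the file size for every lookup, which is why the later replies look for something faster.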
Does anyone have a better idea?
Reading line by line is too inefficient.
Is there a faster way?
My idea is to build a hash table of the keys, using a hash function and collision handling to locate entries quickly.
Any comments?
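A minimal sketch of the in-memory hash table proposed above, mapping key to count with chaining for collisions. The bucket count, the djb2 hash, and the 64-byte key limit are illustrative assumptions, not from the post:

```c
#include <stdlib.h>
#include <string.h>

#define NBUCKETS 1024          /* illustrative size; tune for real data */

struct entry {
    char key[64];
    long count;
    struct entry *next;        /* collision chain */
};

static struct entry *buckets[NBUCKETS];

static unsigned long hash_str(const char *s)
{
    unsigned long h = 5381;    /* djb2 */
    while (*s)
        h = h * 33 + (unsigned char)*s++;
    return h % NBUCKETS;
}

void table_put(const char *key, long count)
{
    struct entry *e = malloc(sizeof *e);
    strncpy(e->key, key, sizeof e->key - 1);
    e->key[sizeof e->key - 1] = '\0';
    e->count = count;
    unsigned long h = hash_str(key);
    e->next = buckets[h];      /* colliding keys chain in a list */
    buckets[h] = e;
}

long table_get(const char *key)
{
    for (struct entry *e = buckets[hash_str(key)]; e; e = e->next)
        if (strcmp(e->key, key) == 0)
            return e->count;
    return -1;
}
```

After one pass over the file filling the table, each lookup is O(1) on average; whether the whole table fits in 1 GB depends on the number of keys.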
To build a hash table, don't you first have to read and hash the entire file anyway?
This can be handled with existing tools; a hand-written algorithm may not be necessary.
For example, with awk:
awk '/ABC\t56/{print NR}' file
prints the row number of the matching line (assuming the two columns are tab-separated).
It would help if the original poster stated the exact requirement. If only the row number is needed, there are many solutions;
for other requirements, the awk approach may not be the best fit.
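The regex above assumes a literal tab between the columns; comparing awk's fields directly is more robust to whitespace. A short sketch (sample.txt is a made-up file name matching the layout in the question):

```shell
# Build a small sample file with the layout from the question.
printf 'ABC 56\nDEF 100\nRET 300\n' > sample.txt

# Row number of the matching line, via field comparison instead of a regex.
awk '$1 == "ABC" && $2 == 56 {print NR}' sample.txt   # prints 1

# Just the number stored after a given key.
awk '$1 == "DEF" {print $2}' sample.txt               # prints 100
```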
Why not read a block first and then hash it? Reading one line at a time is too slow; you can read a block at a time.
Yes, reading in blocks meets your requirement. Refer to:
http://www.fantxi.com/blog/archives/php-read-large-file/
http://sjolzy.cn/php-large-file-read-operation.html
To clarify the requirement: I want to quickly look up the number stored after a given key, e.g. the number after ABC, or the number after DEF...
How do I read a block at a time? Can you give an example?
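A sketch of block reading with fread, as suggested above. The tricky part is that a line can straddle two blocks, so the unfinished tail of each block is carried over to the front of the buffer before the next read. The block size, function name, and key limit are illustrative assumptions:

```c
#include <stdio.h>
#include <string.h>

#define BLOCK_SIZE (1 << 20)   /* 1 MB per fread call; an arbitrary choice */

/* Return the count stored after `key`, reading the file in large blocks
 * instead of line by line.  Returns -1 if the key is not found. */
long find_count_blocked(const char *path, const char *key)
{
    static char buf[BLOCK_SIZE + 1];
    FILE *fp = fopen(path, "r");
    if (!fp)
        return -1;

    size_t keep = 0;            /* bytes carried over from the last block */
    long result = -1;
    size_t n;
    while (result == -1 &&
           (n = fread(buf + keep, 1, BLOCK_SIZE - keep, fp)) > 0) {
        buf[keep + n] = '\0';
        char *line = buf;
        char *nl;
        /* Walk the complete lines inside this block. */
        while ((nl = strchr(line, '\n')) != NULL) {
            *nl = '\0';
            char k[256];
            long c;
            if (sscanf(line, "%255s %ld", k, &c) == 2 &&
                strcmp(k, key) == 0) {
                result = c;
                break;
            }
            line = nl + 1;
        }
        keep = strlen(line);    /* unfinished last line, moved to front */
        memmove(buf, line, keep);
        buf[keep] = '\0';
    }
    /* Handle a final line that has no trailing newline. */
    if (result == -1 && keep > 0) {
        char k[256];
        long c;
        if (sscanf(buf, "%255s %ld", k, &c) == 2 && strcmp(k, key) == 0)
            result = c;
    }
    fclose(fp);
    return result;
}
```

This does the same work as the line-by-line version but with far fewer I/O calls; for repeated lookups you would still want to feed each parsed key/count pair into a hash table as discussed above.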