How to quickly locate a row in a large file when memory is smaller than the file

Source: Internet
Author: User

For example, a file
ABC 56
DEF 100
RET 300
...

The file has two columns; the first column is unique (non-repeating), and the second column is a count (a number).

If the file is 2 GB or larger and only 1 GB of memory is available, how can we quickly locate the line "ABC 56"?

Please suggest a concrete solution.


Reply to discussion (solution)

What exactly do you mean?
If you just want to find a line interactively, open the file with vi or more;
then type /ABC and press Enter.

Use fopen, then fscanf (or fgets).
Read one row at a time; memory will not be a limiting factor.

Does anyone have a better idea?
Reading row by row is too inefficient.
Is there a faster way?
My idea is to build a hash table, hashing each key and resolving collisions, so that lookups are fast.
Do you have any good comments?

If you create a hash table, don't you have to read and hash the entire file content first anyway?

This can be handled by existing tools; you may not need to write an algorithm yourself.
For example, with awk:
awk '/ABC\t56/ {print NR}' file
prints the number of the matching row.

I suggest the original poster state the specific requirement. If only the row number is needed, there are many solutions;
but if there are other requirements, the awk approach may not be the best one.

Reading one row at a time is too slow; you can read a whole block at a time instead.



Yes. Reading in blocks meets your requirements.

References:
http://www.fantxi.com/blog/archives/php-read-large-file/
http://sjolzy.cn/php-large-file-read-operation.html

The requirement is fast lookup: for example, given ABC, I want the number after it; given DEF, the number after it; and so on.


How do I read a block of the file at a time? Can you give an example?
