Quickly locating a row in a large file when memory is smaller than the file

Source: Internet
Author: User
Large Memory files

For example, suppose there is a file like this:
ABC 56
DEF 100
RET 300
...

The file has two columns. The values in the first column are unique, and the second column is a count (a number).

If the file is 2 GB or larger and only 1 GB of memory is available, how can I quickly locate the "ABC 56" line?

Could the experts here suggest a clear solution?


Reply to discussion (solution)

I don't quite understand what you mean.
If you just want to open the file and quickly find a line, you can open it with vi or more,
then type /ABC and press Enter.

Use fopen, then fscanf.
Just read one line at a time. Memory is not a limiting factor that way.
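
The line-at-a-time approach suggested above can be sketched in Python roughly as follows. This is only an illustration, not the poster's code: the function name find_count is made up here, and it assumes the two columns are separated by whitespace.

```python
def find_count(path, key):
    """Return the count stored next to `key`, or None if the key is absent."""
    # Iterating over the file object reads one line at a time through a
    # buffer, so memory use stays small no matter how large the file is.
    with open(path, "r", encoding="utf-8") as f:
        for line in f:
            parts = line.split()
            if len(parts) == 2 and parts[0] == key:
                return int(parts[1])  # first column is unique, so stop here
    return None
```

A call such as find_count("file", "ABC") stops at the first match, so a key near the start of the file is found quickly, while a key near the end still costs a full pass.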

Does anyone know?
Reading line by line will not be efficient enough.
Is there a faster way?
My idea is to build a hash table and use hash collisions to deduplicate entries according to the hashing algorithm.
I wonder if anyone has a better idea.
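
One possible reading of the hash idea, sketched in Python under some assumptions: instead of holding a hash table in memory, hash each key to one of N small bucket files in a single pass, so that a later lookup only scans one bucket. The names bucket_of, build_buckets and lookup, the bucket file naming, and the default of 64 buckets are all hypothetical choices for this sketch, not something from the thread.

```python
import os
import zlib

def bucket_of(key, n_buckets):
    # crc32 is stable across runs, unlike Python's built-in hash() for str.
    return zlib.crc32(key.encode("utf-8")) % n_buckets

def build_buckets(src_path, bucket_dir, n_buckets=64):
    """One pass over the big file: write each row to the bucket file for its key."""
    os.makedirs(bucket_dir, exist_ok=True)
    outs = [
        open(os.path.join(bucket_dir, f"bucket_{i}.txt"), "w", encoding="utf-8")
        for i in range(n_buckets)
    ]
    try:
        with open(src_path, "r", encoding="utf-8") as src:
            for line in src:
                parts = line.split()
                if len(parts) != 2:
                    continue  # skip blank or malformed lines
                out = outs[bucket_of(parts[0], n_buckets)]
                out.write(f"{parts[0]} {parts[1]}\n")
    finally:
        for out in outs:
            out.close()

def lookup(bucket_dir, key, n_buckets=64):
    """Scan only the bucket that `key` hashes to; return its count or None."""
    name = f"bucket_{bucket_of(key, n_buckets)}.txt"
    with open(os.path.join(bucket_dir, name), "r", encoding="utf-8") as f:
        for line in f:
            k, v = line.split()
            if k == key:
                return int(v)
    return None
```

The first pass still has to read every line once, which is the objection raised in the replies. The gain only shows up when many lookups are made afterwards, because each one then reads roughly 1/N of the data. lookup must be called with the same n_buckets that build_buckets used.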

If you build a hash table, don't you have to hash the contents of the file first?

You can also use other tools; you don't necessarily have to write your own algorithm.
For example, awk:
awk '$1 == "ABC" && $2 == 56 {print NR}' file
This prints the line number of the matching row.

I suggest the original poster describe the specific requirement. If you only need the line number, there are many ways to do it.
But if there are other requirements, awk is not necessarily the best choice.

Don't you still have to read each line first in order to hash it?

Reading line by line is too slow.


Yes, reading the file in blocks is the better approach.

The original poster can refer to:
http://www.fantxi.com/blog/archives/php-read-large-file/

http://sjolzy.cn/php-large-file-read-operation.html

What I need is a fast lookup. For example, I want to know the number after ABC, or the number after DEF ...


How do I read the file one block at a time? Can you give me an example?
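
A minimal sketch of block-wise reading in Python, in answer to the question above. It assumes newline-terminated rows with whitespace-separated columns; the function name find_count_blockwise and the 1 MiB default block size are arbitrary choices made for this example. The part that is easy to get wrong is that a block can end in the middle of a line, so the unfinished piece has to be carried over to the next block.

```python
def find_count_blockwise(path, key, block_size=1 << 20):
    """Scan the file in fixed-size blocks and return the count for `key`, or None."""
    target = key.encode("utf-8")
    tail = b""  # incomplete last line carried over from the previous block
    with open(path, "rb") as f:
        while True:
            block = f.read(block_size)
            if not block:
                break
            lines = (tail + block).split(b"\n")
            tail = lines.pop()  # may be a partial line; finish it next round
            for line in lines:
                parts = line.split()
                if len(parts) == 2 and parts[0] == target:
                    return int(parts[1])
    # The file may not end with a newline, so check what is left over.
    parts = tail.split()
    if len(parts) == 2 and parts[0] == target:
        return int(parts[1])
    return None
```

Only one block plus one partial line is held in memory at a time, so the 1 GB limit is not an issue. Whether this is noticeably faster than plain line-by-line reading depends on the runtime, since many languages already buffer line reads internally.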
