A few days ago I wrote a function that loads a file into memory for searching (inserting it into a database is a bit slow, so I didn't want to do that).
The data is line-based, with fields separated by '|', and it gets indexed by column:
For example, we index the following data:
1 | 2 | 3
# Comment
a | b | c
You can index by the first column alone:
>>> index_file(('1 | 2 | 3', 'a | b | c', '# a comment line'), vertical_sep, '/', 0)
{'1': ['1', '2', '3'], 'a': ['a', 'b', 'c']}
Or use the first and third columns together as the index:
>>> index_file(('1 | 2 | 3', 'a | b | c', '# a comment line'), vertical_sep, '/', 0, 2)
{'1/3': ['1', '2', '3'], 'a/c': ['a', 'b', 'c']}
The only drawback is that the index and the data are bound together: if you need several indexes over the same file, you end up with one copy of the data per index. That is easy to change, though.
The source code is on GitHub: https://github.com/lbaby/javalearn/blob/master/python3/idx.py
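For reference, here is a minimal sketch of how such an index_file could work. This is my own reconstruction matching the examples above, not the code in the repository; in particular, vertical_sep is assumed to be the '|' field separator:

```python
def index_file(lines, sep='|', key_join='/', *key_cols):
    """Index delimiter-separated lines by one or more key columns.

    Hypothetical reconstruction: key_cols are the column numbers whose
    values, joined with key_join, become the dict key; the value is the
    full list of fields.  Lines starting with '#' are comments.
    """
    if not key_cols:
        key_cols = (0,)
    index = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith('#'):
            continue  # skip blank lines and comments
        fields = [f.strip() for f in line.split(sep)]
        key = key_join.join(fields[c] for c in key_cols)
        index[key] = fields
    return index

vertical_sep = '|'  # assumed: the separator constant used in the examples

# Index by the first column only:
print(index_file(('1 | 2 | 3', 'a | b | c', '# a comment line'),
                 vertical_sep, '/', 0))
# → {'1': ['1', '2', '3'], 'a': ['a', 'b', 'c']}

# Index by the first and third columns:
print(index_file(('1 | 2 | 3', 'a | b | c', '# a comment line'),
                 vertical_sep, '/', 0, 2))
# → {'1/3': ['1', '2', '3'], 'a/c': ['a', 'b', 'c']}
```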
Writing this function also turned up a surprise: when a tuple is used as a dict key and membership is tested with the `in` operator against `dict.keys()`, the lookup is actually linear (in Python 2, `keys()` returns a list). This Stack Overflow question explains it:
http://stackoverflow.com/questions/10205969/why-in-operator-with-tuple-as-a-key-in-python-so-slow
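The fix is simply to test membership against the dict itself instead of its keys() (a small illustration of my own, not code from the repository):

```python
d = {('a', 'c'): ['a', 'b', 'c'], ('1', '3'): ['1', '2', '3']}

# `key in d` is a hash lookup: O(1) on average, tuple keys included.
found = ('a', 'c') in d

# In Python 2, d.keys() returned a *list*, so `key in d.keys()` scanned
# it linearly.  In Python 3, keys() is a set-like view and membership is
# O(1) again -- but `key in d` remains the idiomatic spelling.
also_found = ('a', 'c') in d.keys()

print(found, also_found)  # → True True
```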
From the wandering thoughts