Binary trees and hash tables are both fundamental data structures, but how do we choose between them? How do they differ, and what are the advantages and disadvantages of each?
This question cannot be answered in a sentence or two, because the right choice depends on the circumstances. Let's start by reviewing the two data structures:
A hash table uses a hash function to map each input key to a slot in the table. Suppose we have a hash table of size 100 and want to store input values from 0 to 99 in it. In theory, the time complexity of insert and find operations on a hash table is O(1).
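As a rough illustration of the 100-slot example above, here is a minimal sketch. It assumes a simple modulo hash function and integer keys; the names `TABLE_SIZE`, `hash_slot`, `insert`, and `find` are made up for this example and do not come from any library.

```python
TABLE_SIZE = 100  # assumed table size, matching the example in the text


def hash_slot(key: int) -> int:
    """Map an integer key to a slot index (a modulo hash is an assumption)."""
    return key % TABLE_SIZE


# One entry per slot; None marks an empty slot.
table = [None] * TABLE_SIZE


def insert(key: int) -> None:
    # Computing the slot and writing to it are both constant-time steps.
    table[hash_slot(key)] = key


def find(key: int) -> bool:
    # A lookup inspects exactly one slot, however many keys are stored.
    return table[hash_slot(key)] == key


for k in range(100):
    insert(k)
```

With keys 0 to 99 every key gets its own slot, so no two keys compete for the same position. This sketch deliberately ignores collisions, which is why it only works for this idealized input.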
A binary tree (here, a binary search tree) inserts and stores data according to the rule that everything in the right subtree is greater than the root node and everything in the left subtree is smaller. If the tree is balanced, the time complexity of insert and find for each element is O(log n), where n is the number of nodes in the tree and log n is roughly the depth of the tree. Of course, keeping the tree balanced requires a more complex data structure (a red-black tree, for example).
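The ordering rule and the effect of balance can be sketched as follows. This is a plain, unbalanced binary search tree, not a red-black tree; `Node`, `bst_insert`, `bst_find`, and `height` are illustrative names chosen for this example.

```python
from typing import Optional


class Node:
    def __init__(self, key: int) -> None:
        self.key = key
        self.left: Optional["Node"] = None
        self.right: Optional["Node"] = None


def bst_insert(root: Optional[Node], key: int) -> Node:
    """Insert key into an unbalanced BST and return the root."""
    if root is None:
        return Node(key)
    node = root
    while True:
        if key < node.key:          # smaller keys go to the left subtree
            if node.left is None:
                node.left = Node(key)
                break
            node = node.left
        elif key > node.key:        # larger keys go to the right subtree
            if node.right is None:
                node.right = Node(key)
                break
            node = node.right
        else:
            break                   # key already present; ignore duplicates
    return root


def bst_find(root: Optional[Node], key: int) -> bool:
    """Walk down from the root, going left or right at each node."""
    node = root
    while node is not None:
        if key == node.key:
            return True
        node = node.left if key < node.key else node.right
    return False


def height(root: Optional[Node]) -> int:
    """Number of levels in the tree (0 for an empty tree)."""
    if root is None:
        return 0
    return 1 + max(height(root.left), height(root.right))


balanced = None
for k in [4, 2, 6, 1, 3, 5, 7]:     # insertion order that happens to balance
    balanced = bst_insert(balanced, k)

skewed = None
for k in [1, 2, 3, 4, 5, 6, 7]:     # sorted input degenerates into a chain
    skewed = bst_insert(skewed, k)
```

Both trees hold the same seven keys, but the first has 3 levels while the second has 7, so a lookup in the second can visit every node. Self-balancing trees exist to prevent the second shape.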
From the above it might seem that the hash table is better than the binary tree, but that is not always the case. The hash table has several notable drawbacks:
As more values are inserted, collisions in the hash table become more likely. There are two common ways to resolve collisions. The first is separate chaining, which builds a singly linked list behind each slot that has a collision; in the worst case insert, find, and delete take O(n), and the table needs extra space for the lists. The second is open addressing (linear probing, for example), which needs no extra space, but in the worst case (for example, when all input data maps to the same index) the time complexity also reaches O(n).
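To make the worst case concrete, here is a small sketch of both strategies: one list per slot (commonly called separate chaining) and open addressing with linear probing. It assumes integer keys and a modulo hash, and it uses Python lists in place of hand-written linked lists; `ChainedHashTable` and `probe_insert` are hypothetical names for this example.

```python
class ChainedHashTable:
    """Collision handling with one chain (here a Python list) per slot."""

    def __init__(self, size: int = 8) -> None:
        self.size = size
        self.buckets = [[] for _ in range(size)]

    def _slot(self, key: int) -> int:
        return key % self.size

    def insert(self, key: int) -> None:
        bucket = self.buckets[self._slot(key)]
        if key not in bucket:       # scanning the chain is linear in its length
            bucket.append(key)

    def find(self, key: int) -> bool:
        return key in self.buckets[self._slot(key)]

    def longest_chain(self) -> int:
        return max(len(bucket) for bucket in self.buckets)


def probe_insert(slots: list, key: int) -> int:
    """Open addressing with linear probing.

    Stores key in the first free slot at or after its home index and
    returns the number of slots examined.
    """
    size = len(slots)
    index = key % size
    for steps in range(1, size + 1):
        if slots[index] is None or slots[index] == key:
            slots[index] = key
            return steps
        index = (index + 1) % size  # try the next slot, wrapping around
    raise RuntimeError("table is full")


# Worst case for chaining: every key is a multiple of 8, so all land in slot 0.
chained = ChainedHashTable(size=8)
for k in range(0, 80, 8):
    chained.insert(k)

# Worst case for probing: every key is a multiple of 16, so all start at index 0.
slots = [None] * 16
probes = [probe_insert(slots, k) for k in range(0, 160, 16)]
```

In the chained table all ten keys end up in a single chain, and in the probed table each new key has to step past every key inserted before it. Either way, the cost of an operation grows with the number of stored keys instead of staying constant.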
Therefore, it is best to estimate the size of the input data before creating a hash table. Otherwise, resizing the hash table can be very time consuming. For example, suppose the length of your hash table is 100 and a 101st value needs to be inserted. Not only may the table have to grow, say to 150, but every value already stored must be rehashed into the expanded table.
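The cost of growing from 100 to 150 slots can be sketched like this, assuming a chained table with a modulo hash. The `resize` helper and the choice of keys 100 to 199 are assumptions made for illustration only.

```python
def resize(buckets: list, new_size: int) -> list:
    """Build a larger bucket array and rehash every stored key into it."""
    new_buckets = [[] for _ in range(new_size)]
    for bucket in buckets:
        for key in bucket:
            # The slot depends on the table size, so it must be recomputed.
            new_buckets[key % new_size].append(key)
    return new_buckets


# Fill a 100-slot table with 100 keys (one per slot).
keys = list(range(100, 200))
old = [[] for _ in range(100)]
for k in keys:
    old[k % 100].append(k)

# Inserting a 101st key triggers growth to 150 slots.
new = resize(old, 150)
new[200 % 150].append(200)

# Count how many existing keys ended up in a different slot index.
moved = sum(1 for k in keys if k % 100 != k % 150)
```

For this particular set of keys every one of the 100 existing entries changes slot, so the single insertion that triggers the resize does work proportional to the whole table.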
The elements in the hash table are not sorted. However, in some cases, we want the data to be stored in order.
Now consider the binary tree:
A binary tree has no collisions, which means that, as long as the tree is kept balanced, insertion and lookup are always O(log n).
The space a binary tree occupies grows in proportion to the input data, so there is no need to pre-allocate a fixed amount of space for it. Therefore, you do not need to know the size of the input data in advance.
All the elements are sorted in the tree.
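The ordering property can be seen with an in-order traversal, which visits the left subtree, then the node, then the right subtree. The sketch below uses a plain unbalanced binary search tree; `Node`, `insert`, and `in_order` are illustrative names, and the sample keys are arbitrary.

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None


def insert(root, key):
    """Recursively insert key and return the root of the subtree."""
    if root is None:
        return Node(key)
    if key < root.key:
        root.left = insert(root.left, key)
    elif key > root.key:
        root.right = insert(root.right, key)
    return root                     # duplicates are ignored


def in_order(root):
    """Yield keys in ascending order: left subtree, node, right subtree."""
    if root is not None:
        yield from in_order(root.left)
        yield root.key
        yield from in_order(root.right)


keys = [50, 30, 70, 20, 40, 60, 80]  # inserted in no particular order
root = None
for k in keys:
    root = insert(root, k)

sorted_keys = list(in_order(root))
```

No separate sorting step is needed: the traversal produces the keys in ascending order because of how they were placed during insertion. A hash table would need its contents copied out and sorted to get the same result.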
Summary
If you know the size of the input data beforehand, have enough space for the hash table, and do not need the data sorted, the hash table is usually the better choice, because insert, find, and delete take only constant time on average.
On the other hand, if data is added continuously and you do not know its size beforehand, the binary tree is a reasonable compromise.
Reference:
Hash table vs Binary search Tree
Comparison and selection of advantages and disadvantages of binary tree and hash table