This article does not go very deep, but it contains plenty of examples, which makes it a good basic tutorial for learning Python's advanced data structures.
The concept of data structures
A data structure is easy to understand: it is a structure used to organize data together. In other words, a data structure is what is used to store a collection of related data. Python has four built-in data structures: list, tuple, dictionary, and set. Most applications do not need anything else, but when they do, a number of advanced data structures are available in modules such as collections, array, heapq, bisect, weakref, copy, and pprint. This article describes how to use them and how they can help our applications.
The four built-in data structures are simple to use and there are many resources about them on the web, so this article does not discuss them.
1. collections
The collections module contains useful tools beyond the built-in types, such as Counter, defaultdict, OrderedDict, deque, and namedtuple. Of these, Counter, deque, and defaultdict are the most commonly used.
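Of the classes just listed, namedtuple and OrderedDict get no subsection of their own, so here is a brief sketch of what they do. It is only an illustration: the Point type and the fruit keys are made-up names, not part of the original examples.

```python
from collections import OrderedDict, namedtuple

# namedtuple: a tuple subclass whose fields can also be read by name.
Point = namedtuple("Point", ["x", "y"])  # "Point" is a hypothetical record type
p = Point(x=1, y=2)
total = p.x + p.y      # access by field name
first, second = p      # still unpacks like a plain tuple

# OrderedDict: remembers the order in which keys were inserted.
od = OrderedDict()
od["banana"] = 3
od["apple"] = 1
od["cherry"] = 2
keys_in_order = list(od.keys())

print(total)
print(keys_in_order)
```

A namedtuple is handy when a plain tuple would leave readers guessing what index 0 or 1 means, while OrderedDict is useful whenever the insertion order of keys must be preserved.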
1.1 Counter()
If you want to count how many times each item occurs in a sequence, Counter is the tool for the job. Let's see how to count the occurrences of each item in a list:
from collections import Counter
li = ["Dog", "Cat", "Mouse", 42, "Dog", 42, "Cat", "Dog"]
a = Counter(li)
print(a)  # Counter({'Dog': 3, 42: 2, 'Cat': 2, 'Mouse': 1})
To count the number of distinct items in a list, you can do this:
from collections import Counter
li = ["Dog", "Cat", "Mouse", 42, "Dog", 42, "Cat", "Dog"]
a = Counter(li)
print(a)  # Counter({'Dog': 3, 42: 2, 'Cat': 2, 'Mouse': 1})
print(len(set(li)))  # 4
If you need to group the results, you can do this:
from collections import Counter
li = ["Dog", "Cat", "Mouse", "Dog", "Cat", "Dog"]
a = Counter(li)
print(a)  # Counter({'Dog': 3, 'Cat': 2, 'Mouse': 1})
print("{0} : {1}".format(list(a.values()), list(a.keys())))  # [1, 3, 2] : ['Mouse', 'Dog', 'Cat'] (key order may vary)
print(a.most_common(3))  # [('Dog', 3), ('Cat', 2), ('Mouse', 1)]
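Counters can also be updated and combined after they are created. The sketch below is an illustration only: the two input lists and the variable names are made up. It shows update(), the + operator, and the fact that a missing key reports a count of 0.

```python
from collections import Counter

morning = Counter(["Dog", "Cat", "Dog", "Dog"])
evening = Counter(["Cat", "Mouse"])

# update() adds the new counts in place instead of replacing the old ones.
running = Counter(morning)
running.update(evening)

# "+" builds a new Counter holding the summed counts.
combined = morning + evening

# A missing key reports 0 instead of raising KeyError, and is not stored.
unseen = combined["Horse"]

print(combined.most_common(1))  # [('Dog', 3)]
```

Because a lookup of an unknown key returns 0, code that tallies items never needs to check whether a key already exists.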
The following code fragment counts how often each word occurs in a string and prints the counts, so the most frequent words can be read off directly.
import re
from collections import Counter

string = """Lorem ipsum dolor sit amet, consectetur adipiscing elit.
Nunc ut elit id mi ultricies adipiscing. Nulla facilisi. Praesent pulvinar,
sapien vel feugiat vestibulum, nulla dui pretium orci, non ultricies elit lacus
quis ante. Lorem ipsum dolor sit amet, consectetur adipiscing elit. Aliquam
pretium ullamcorper urna quis iaculis. Etiam ac massa sed turpis tempor luctus.
Curabitur sed nibh eu elit mollis congue. Praesent ipsum diam, consectetur vitae
ornare a, aliquam a nunc. In id magna pellentesque tellus posuere adipiscing.
Sed non mi metus, at lacinia augue. Sed magna nisi, ornare in mollis in, mollis
sed nunc. Etiam at justo in leo congue mollis. Nullam in neque eget metus
hendrerit scelerisque eu non enim. Ut malesuada lacus eu nulla bibendum id
euismod urna sodales."""

words = re.findall(r'\w+', string)            # find the words in the document
lower_words = [word.lower() for word in words]  # lower-case all the words
word_counts = Counter(lower_words)            # count how many times each word appears
print(word_counts)
# Counter({'elit': 5, 'sed': 5, 'in': 5, 'adipiscing': 4, 'mollis': 4, 'eu': 3,
# 'id': 3, 'nunc': 3, 'consectetur': 3, 'non': 3, 'ipsum': 3, 'nulla': 3, 'pretium': 2,
# 'lacus': 2, 'ornare': 2, 'at': 2, 'praesent': 2, 'quis': 2, 'sit': 2, 'congue': 2, 'amet': 2,
# 'etiam': 2, 'urna': 2, 'a': 2, 'magna': 2, 'lorem': 2, 'aliquam': 2, 'ut': 2, 'ultricies': 2, 'mi': 2,
# 'dolor': 2, 'metus': 2, 'ac': 1, 'bibendum': 1, 'posuere': 1, 'enim': 1, 'ante': 1, 'sodales': 1, 'tellus': 1,
# 'vitae': 1, 'dui': 1, 'diam': 1, 'pellentesque': 1, 'massa': 1, 'vel': 1, 'nullam': 1, 'feugiat': 1, 'luctus': 1,
# 'pulvinar': 1, 'iaculis': 1, 'hendrerit': 1, 'orci': 1, 'turpis': 1, 'nibh': 1, 'scelerisque': 1, 'ullamcorper': 1,
# 'eget': 1, 'neque': 1, 'euismod': 1, 'curabitur': 1, 'leo': 1, 'sapien': 1, 'facilisi': 1, 'vestibulum': 1, 'nisi': 1,
# 'justo': 1, 'augue': 1, 'tempor': 1, 'lacinia': 1, 'malesuada': 1})
1.2 deque
A deque is a double-ended queue, a generalization of the queue in which elements can be added or removed at both ends. It is therefore sometimes called a head-tail linked list, although that name also refers to another, special data structure implementation.
A deque supports thread-safe, memory-efficient append and pop operations at either end, with approximately O(1) time complexity. A list supports similar operations, but it is optimized for fixed-length use: pop(0) and insert(0, v) change both the length of the list and the position of every element, so their cost is O(n).
Let's look at a comparison:
import time
from collections import deque

num = 100000

def append(c):
    for i in range(num):
        c.append(i)

def appendleft(c):
    if isinstance(c, deque):
        for i in range(num):
            c.appendleft(i)
    else:
        for i in range(num):
            c.insert(0, i)

def pop(c):
    for i in range(num):
        c.pop()

def popleft(c):
    if isinstance(c, deque):
        for i in range(num):
            c.popleft()
    else:
        for i in range(num):
            c.pop(0)

for container in [deque, list]:
    for operation in [append, appendleft, pop, popleft]:
        c = container(range(num))
        start = time.time()
        operation(c)
        elapsed = time.time() - start
        print("Completed {0}/{1} in {2} seconds: {3} ops/sec".format(
            container.__name__, operation.__name__, elapsed, num / elapsed))

# Completed deque/append in 0.0250000953674 seconds: 3999984.74127 ops/sec
# Completed deque/appendleft in 0.0199999809265 seconds: 5000004.76838 ops/sec
# Completed deque/pop in 0.0209999084473 seconds: 4761925.52225 ops/sec
# Completed deque/popleft in 0.0199999809265 seconds: 5000004.76838 ops/sec
# Completed list/append in 0.0220000743866 seconds: 4545439.17637 ops/sec
# Completed list/appendleft in 21.3209998608 seconds: 4690.21155917 ops/sec
# Completed list/pop in 0.0240001678467 seconds: 4166637.52682 ops/sec
# Completed list/popleft in 4.01799988747 seconds: 24888.0046791 ops/sec
Another example performs some basic queue operations:
from collections import deque
q = deque(range(5))
q.append(5)
q.appendleft(6)
print(q)
print(q.pop())
print(q.popleft())
print(q.rotate(3))
print(q)
print(q.rotate(-1))
print(q)
# deque([6, 0, 1, 2, 3, 4, 5])
# 5
# 6
# None
# deque([2, 3, 4, 0, 1])
# None
# deque([3, 4, 0, 1, 2])
Translator's note: rotate is the rotation operation of the queue. A right rotation (positive argument) moves elements from the right end to the left end, while a left rotation (negative argument) does the opposite.
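The direction of rotate() is easy to mix up, so here is a small sketch that shows both directions on the same deque. The values and variable names are chosen only for illustration.

```python
from collections import deque

d = deque([1, 2, 3, 4, 5])

d.rotate(2)            # right rotation: the last two items move to the front
right = list(d)        # [4, 5, 1, 2, 3]

d.rotate(-2)           # left rotation by the same amount undoes it
restored = list(d)     # [1, 2, 3, 4, 5]

d.rotate(-1)           # left rotation: the first item moves to the back
left = list(d)         # [2, 3, 4, 5, 1]

print(right)
print(restored)
print(left)
```

Note that rotate() works in place and returns None, which is why printing its return value in the example above shows None.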
1.3 defaultdict
This type behaves exactly like a normal dictionary except when a nonexistent key is accessed. In that case its default_factory is called to provide a default value, and the key is stored with that value. The remaining arguments are the same as for the normal dict() constructor, and a defaultdict instance supports the same operations as the built-in dict.
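The behavior on a missing key can be seen in a short sketch. It assumes int as the default_factory (int() returns 0); the sample string and variable names are made up for illustration.

```python
from collections import defaultdict

counts = defaultdict(int)      # int() returns 0, so missing keys start at 0
for ch in "mississippi":
    counts[ch] += 1            # no need to check whether the key exists
letter_counts = dict(counts)

# Merely reading a missing key calls default_factory and stores the result.
had_z_before = "z" in counts
value = counts["z"]
has_z_after = "z" in counts

# A plain dict raises KeyError in the same situation.
plain = {}
try:
    plain["x"] += 1
    plain_failed = False
except KeyError:
    plain_failed = True

print(letter_counts)
```

The side effect shown in the middle step is worth remembering: looking up a key in a defaultdict is enough to create it.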
A defaultdict is useful when you want to keep track of data. For example, suppose you want to track the positions of the words in a string. You can do it like this:
from collections import defaultdict
s = "the quick brown fox jumps over the lazy dog"
words = s.split()
location = defaultdict(list)
for m, n in enumerate(words):
    location[n].append(m)
print(location)
# defaultdict(<type 'list'>, {'brown': [2], 'lazy': [7], 'over': [5], 'fox': [3],
# 'dog': [8], 'quick': [1], 'the': [0, 6], 'jumps': [4]})
Whether to use a list or a set with defaultdict depends on your purpose. A list preserves the order in which you insert elements, while a set does not care about insertion order but helps eliminate duplicate elements.
from collections import defaultdict
s = "the quick brown fox jumps over the lazy dog"
words = s.split()
location = defaultdict(set)
for m, n in enumerate(words):
    location[n].add(m)
print(location)
# defaultdict(<type 'set'>, {'brown': set([2]), 'lazy': set([7]),
# 'over': set([5]), 'fox': set([3]), 'dog': set([8]), 'quick': set([1]),
# 'the': set([0, 6]), 'jumps': set([4])})
Another way to create a multidict:
s = "the quick brown fox jumps over the lazy dog"
d = {}
words = s.split()
for key, value in enumerate(words):
    d.setdefault(key, []).append(value)
print(d)
# {0: ['the'], 1: ['quick'], 2: ['brown'], 3: ['fox'], 4: ['jumps'], 5: ['over'], 6: ['the'], 7: ['lazy'], 8: ['dog']}
A more complex example:
class Example(dict):
    def __getitem__(self, item):
        try:
            return dict.__getitem__(self, item)
        except KeyError:
            # create a nested Example on first access to a missing key
            value = self[item] = type(self)()
            return value

a = Example()
a[1][2][3] = 4
a[1][3][3] = 5
a[1][2]['test'] = 6
print(a)  # {1: {2: {'test': 6, 3: 4}, 3: {3: 5}}}
2. array
The array module defines a new object type that looks much like a list, except that it is constrained to hold elements of a single type. The element type is fixed when the array is created.
If your program needs to optimize memory use and you are sure that all the data you want to store is of the same type, the array module is a good fit. For example, storing 10 million integers takes at least 160MB in a list but only 40MB in an array. Although an array saves space, hardly any basic operation on an array is faster than the same operation on a list.
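The 40MB figure corresponds to 10 million elements at 4 bytes each. The sketch below checks this on a smaller scale, assuming typecode "i" (a C signed int, commonly 4 bytes, though the size is platform dependent). It also shows the single-type constraint in action. The variable names are illustrative.

```python
import array

n = 100000
numbers = array.array("i", range(n))

# Raw storage for the elements: item size in bytes times element count.
payload_bytes = numbers.itemsize * len(numbers)

# An array of typecode "i" refuses values of another type.
try:
    numbers.append(1.5)
    accepted_float = True
except TypeError:
    accepted_float = False

print("{0} items of {1} bytes = {2} bytes".format(
    len(numbers), numbers.itemsize, payload_bytes))
```

A list, by contrast, stores a pointer per element plus a separate integer object for each value, which is where the much larger figure for lists comes from.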
When you do calculations with arrays, pay special attention to operations that create lists. For example, a list comprehension converts the entire array into a list, which inflates the storage space. A viable alternative is to use a generator expression to create the new array. Look at the code:
import array
a = array.array("i", [1, 2, 3, 4, 5])
b = array.array(a.typecode, (2 * x for x in a))
Because the point of using an array is to save space, in-place operations are preferable. A more efficient approach is to use enumerate:
import array
a = array.array("i", [1, 2, 3, 4, 5])
for i, x in enumerate(a):
    a[i] = 2 * x
For a larger array, this in-place modification is at least 15% faster than creating a new array with a generator.
So when should you use an array? When you have weighed the computational cost and need an array of same-typed elements, like an array in C.
import array
from timeit import Timer

def arraytest():
    a = array.array("i", [1, 2, 3, 4, 5])
    b = array.array(a.typecode, (2 * x for x in a))

def enumeratetest():
    a = array.array("i", [1, 2, 3, 4, 5])
    for i, x in enumerate(a):
        a[i] = 2 * x

if __name__ == '__main__':
    m = Timer("arraytest()", "from __main__ import arraytest")
    n = Timer("enumeratetest()", "from __main__ import enumeratetest")
    print(m.timeit())  # 5.22479210582
    print(n.timeit())  # 4.34367196717
3. heapq
The heapq module implements a priority queue using a heap. A heap is simply a list whose elements are arranged according to the heap rules.
A heap is a tree-shaped data structure in which there is an ordering relationship between each parent node and its child nodes. A binary heap can be represented by a list or array organized so that the children of element N are at positions 2*N+1 and 2*N+2 (indices start at 0). In short, all functions in this module assume that the sequence satisfies this heap ordering, so the first element of the sequence (seq[0]) is the smallest, and the rest of the sequence forms a binary tree in which the children of seq[i] are seq[2*i+1] and seq[2*i+2]. When you modify the sequence, the module's functions always ensure that each child is greater than or equal to its parent.
import heapq
heap = []
for value in [20, 10, 30, 50, 40]:  # example values
    heapq.heappush(heap, value)
while heap:
    print(heapq.heappop(heap))
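The ordering rule described above (every child at index 2*i+1 or 2*i+2 is greater than or equal to its parent at index i) can be checked directly. The helper satisfies_heap_invariant below is a hypothetical function written for this illustration; it is not part of heapq.

```python
import heapq
import random

def satisfies_heap_invariant(seq):
    """Return True if every child is >= its parent (min-heap ordering)."""
    n = len(seq)
    for i in range(n):
        for child in (2 * i + 1, 2 * i + 2):
            if child < n and seq[child] < seq[i]:
                return False
    return True

data = [random.randint(0, 100) for _ in range(20)]
heapq.heapify(data)            # rearrange the list in place into heap order
ok_after_heapify = satisfies_heap_invariant(data)
smallest_is_first = data[0] == min(data)

heapq.heappush(data, -1)       # pushing keeps the invariant intact
ok_after_push = satisfies_heap_invariant(data)

print(ok_after_heapify, ok_after_push)
```

Note that a heap is not a fully sorted list: only the parent-child relation is guaranteed, which is enough to keep the smallest element at index 0.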
The heapq module has two functions, nlargest() and nsmallest(), which do what their names suggest. Let's look at how they are used.
import heapq
nums = [1, 8, 2, 23, 7, -4, 18, 23, 42, 37, 2]
print(heapq.nlargest(3, nums))   # prints [42, 37, 23]
print(heapq.nsmallest(3, nums))  # prints [-4, 1, 2]
Both functions also accept a key parameter, so they can be used with more complex data structures, for example:
import heapq

portfolio = [
    {'name': 'IBM', 'shares': 100, 'price': 91.1},
    {'name': 'AAPL', 'shares': 50, 'price': 543.22},
    {'name': 'FB', 'shares': 200, 'price': 21.09},
    {'name': 'HPQ', 'shares': 35, 'price': 31.75},
    {'name': 'YHOO', 'shares': 45, 'price': 16.35},
    {'name': 'ACME', 'shares': 75, 'price': 115.65}
]
cheap = heapq.nsmallest(3, portfolio, key=lambda s: s['price'])
expensive = heapq.nlargest(3, portfolio, key=lambda s: s['price'])
print(cheap)
# [{'price': 16.35, 'name': 'YHOO', 'shares': 45},
# {'price': 21.09, 'name': 'FB', 'shares': 200}, {'price': 31.75, 'name': 'HPQ', 'shares': 35}]
print(expensive)
# [{'price': 543.22, 'name': 'AAPL', 'shares': 50}, {'price': 115.65, 'name': 'ACME',
# 'shares': 75}, {'price': 91.1, 'name': 'IBM', 'shares': 100}]
Let's see how to implement a queue that orders its items by a given priority, so that each pop operation returns the element with the highest priority.
import heapq

class Item:
    def __init__(self, name):
        self.name = name

    def __repr__(self):
        return 'Item({!r})'.format(self.name)

class PriorityQueue:
    def __init__(self):
        self._queue = []
        self._index = 0

    def push(self, item, priority):
        # negate the priority so the highest priority is popped first;
        # the index keeps items with equal priority in insertion order
        heapq.heappush(self._queue, (-priority, self._index, item))
        self._index += 1

    def pop(self):
        return heapq.heappop(self._queue)[-1]

q = PriorityQueue()
q.push(Item('foo'), 1)
q.push(Item('bar'), 5)
q.push(Item('spam'), 4)
q.push(Item('grok'), 1)
print(q.pop())  # Item('bar')
print(q.pop())  # Item('spam')
print(q.pop())  # Item('foo')
print(q.pop())  # Item('grok')
4. bisect
The bisect module provides support for keeping a list in sorted order. It uses a bisection algorithm to do most of its work, and it keeps the list sorted as elements are inserted. In some cases this is more efficient than sorting the list repeatedly, and for a large list, maintaining order at every step is more efficient than sorting it afterwards.
Let's say you have a collection of ranges:
a = [(0, 100), (150, 220), (500, 1000)]
If I want to add the range (250, 400), I might do this:
import bisect
a = [(0, 100), (150, 220), (500, 1000)]
bisect.insort_right(a, (250, 400))
print(a)  # [(0, 100), (150, 220), (250, 400), (500, 1000)]
We can use the bisect() function to find the insertion point:
import bisect
a = [(0, 100), (150, 220), (500, 1000)]
bisect.insort_right(a, (250, 400))
bisect.insort_right(a, (399, 450))
print(a)  # [(0, 100), (150, 220), (250, 400), (399, 450), (500, 1000)]
print(bisect.bisect(a, (550, 1200)))  # 5
bisect(sequence, item) => index returns the insertion point where the element should go, but does not modify the sequence.
import bisect
a = [(0, 100), (150, 220), (500, 1000)]
bisect.insort_right(a, (250, 400))
bisect.insort_right(a, (399, 450))
print(a)  # [(0, 100), (150, 220), (250, 400), (399, 450), (500, 1000)]
print(bisect.bisect(a, (550, 1200)))  # 5
bisect.insort_right(a, (550, 1200))
print(a)  # [(0, 100), (150, 220), (250, 400), (399, 450), (500, 1000), (550, 1200)]
The new element is inserted at index 5.
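When the list already contains the value being looked up, the choice between the left and right variants matters. The following sketch uses made-up scores to show the difference between bisect_left and bisect_right, and that insort keeps the list sorted.

```python
import bisect

scores = [10, 20, 20, 30]

# bisect_left returns the position before any existing equal items,
# bisect_right (also available as bisect.bisect) the position after them.
left = bisect.bisect_left(scores, 20)
right = bisect.bisect_right(scores, 20)

# Finding an insertion point does not change the list...
unchanged = list(scores)

# ...while insort inserts the item at the right place to keep the order.
bisect.insort(scores, 25)

print(left, right)
print(scores)
```

Here left is 1 and right is 3, and after the insertion the list reads [10, 20, 20, 25, 30].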
5. weakref
The weakref module lets us create references to Python objects that do not prevent those objects from being destroyed. This section covers the basic usage of weak references and introduces a proxy class.
Before we begin, we need to understand what a strong reference is. A strong reference is a pointer that affects an object's reference count, its lifetime, and the time of its destruction. As you can see below, you create a strong reference whenever you assign an object to a variable:
>>> a = [1, 2, 3]
>>> b = a
In this case, the list has two strong references, a and b. The list will not be destroyed until both of these references are released.
class Foo(object):
    def __init__(self):
        self.obj = None
        print('created')

    def __del__(self):
        print('destroyed')

    def show(self):
        print(self.obj)

    def store(self, obj):
        self.obj = obj

a = Foo()  # created
b = a
del a
del b  # destroyed
A weak reference is a reference that has no effect on the object's reference count. Weak references to an object do not affect its destruction. This means that if only weak references to an object remain, the object will be destroyed.
You can use the weakref.ref function to create a weak reference to an object. The function takes a strong reference as its first argument and returns a weak reference.
>>> import weakref
>>> a = Foo()
created
>>> b = weakref.ref(a)
>>> b
<weakref at 0x...; to 'Foo' at 0x...>
A temporary strong reference can be created from the weak reference by calling it, which is b() in the following example:
>>> a == b()
True
>>> b().show()
None
Note that when we delete the strong reference, the object is destroyed immediately.
>>> del a
destroyed
If you try to use the object through the weak reference after it has been destroyed, None is returned:
>>> b() is None
True
weakref.proxy offers a more transparent alternative to weakref.ref. It also takes a strong reference as its first argument and returns a weak reference. A proxy behaves more like a strong reference, but raises an exception when the object no longer exists.
>>> a = Foo()
created
>>> b = weakref.proxy(a)
>>> b.store('fish')
>>> b.show()
fish
>>> del a
destroyed
>>> b.show()
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
ReferenceError: weakly-referenced object no longer exists
Complete example:
Reference counts are used by Python's garbage collection: when an object's reference count drops to 0, the object is reclaimed.
It is a good idea to use weak references for expensive objects or to avoid circular references (although the garbage collector usually takes care of those).
import weakref
import gc

class MyObject(object):
    def my_method(self):
        print('my_method was called!')

obj = MyObject()
r = weakref.ref(obj)
gc.collect()
assert r() is obj  # r() allows you to access the object referenced: it's there.
obj = 1  # Let's change what obj references to
gc.collect()
assert r() is None  # There is no object left: it was gc'ed.
Tip: Only class instances, functions, methods, sets, frozen sets, files, generators, type objects, and certain object types defined in library modules (for example, sockets, arrays, and regular expression patterns) support weak references. Built-in functions and most built-in types such as lists, dictionaries, strings, and numbers do not.
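The restriction in the tip is easy to observe, and a common way around it is to subclass the built-in type. In the sketch below, TrackedList is a hypothetical subclass introduced only for this illustration.

```python
import weakref

class TrackedList(list):
    """A trivial list subclass; instances of subclasses support weak references."""
    pass

# A plain list cannot be the target of a weak reference.
try:
    weakref.ref([1, 2, 3])
    plain_list_ok = True
except TypeError:
    plain_list_ok = False

# An instance of the subclass can.
items = TrackedList([1, 2, 3])
r = weakref.ref(items)
subclass_ok = r() is items

print(plain_list_ok, subclass_ok)
```

The subclass behaves like an ordinary list in every other respect, so it can be used as a drop-in replacement wherever weak references are needed.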
6. copy
The copy module provides functions for duplicating objects through shallow or deep copying.
The difference between shallow and deep copying only matters for compound objects, that is, objects that contain other objects, such as lists or class instances.
A shallow copy creates a new compound object and inserts into it references to the objects found in the original.
A deep copy creates a new compound object and recursively copies the objects found in the original, inserting the copies into the new object.
Ordinary assignment simply points the new variable at the source object.
import copy

a = [1, 2, 3]
b = [4, 5]
c = [a, b]

# Normal assignment
d = c
print(id(c) == id(d))        # True - d is the same object as c
print(id(c[0]) == id(d[0]))  # True - d[0] is the same object as c[0]

# Shallow copy
d = copy.copy(c)
print(id(c) == id(d))        # False - d is now a new object
print(id(c[0]) == id(d[0]))  # True - d[0] is the same object as c[0]

# Deep copy
d = copy.deepcopy(c)
print(id(c) == id(d))        # False - d is now a new object
print(id(c[0]) == id(d[0]))  # False - d[0] is now a new object
A shallow copy (copy()) creates a new container that holds references to the objects in the original.
A deep copy (deepcopy()) creates a new container that holds references to newly copied objects.
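The practical consequence of the difference shows up as soon as a nested object is modified. The following sketch uses made-up nested lists to illustrate it.

```python
import copy

original = [[1, 2], [3, 4]]
shallow = copy.copy(original)
deep = copy.deepcopy(original)

# Mutating a nested list is visible through the shallow copy,
# because both outer lists refer to the same inner list.
original[0].append(99)

# Changing the outer list itself affects neither copy,
# because each copy has its own outer container.
original.append([5, 6])

print(shallow)  # [[1, 2, 99], [3, 4]]
print(deep)     # [[1, 2], [3, 4]]
```

A shallow copy is cheaper and is enough when the nested objects are never modified; a deep copy is the safe choice when the copy must be fully independent of the original.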
A more complex example:
Suppose I have two classes, Manager and Graph. Each Graph holds a reference to its Manager, and each Manager has a collection of Graphs that it manages. We now have two tasks:
1) Copy a Graph instance with deepcopy, while keeping its manager pointing to the manager of the original Graph.
2) Copy a Manager, creating a completely new Manager that also copies all of the original Graphs.
import weakref, copy

class Graph(object):
    def __init__(self, manager=None):
        self.manager = None if manager is None else weakref.ref(manager)

    def __deepcopy__(self, memodict):
        manager = self.manager()
        # use the copied manager if one is being created, else the original
        return Graph(memodict.get(id(manager), manager))

class Manager(object):
    def __init__(self, graphs=None):
        self.graphs = graphs if graphs is not None else []
        for g in self.graphs:
            g.manager = weakref.ref(self)

a = Manager([Graph(), Graph()])
b = copy.deepcopy(a)

if all(g.manager() is b for g in b.graphs):
    print(True)  # True

if copy.deepcopy(a.graphs[0]).manager() is a:
    print(True)  # True
7. pprint
The pprint module provides a more elegant way to print data structures. If you need to print a complex, deeply nested dictionary or JSON object, pprint gives more readable output.
Suppose you need to print a matrix. With the ordinary print you get a plain one-line list, but with pprint you can print a nicely laid-out matrix structure.
import pprint
matrix = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
a = pprint.PrettyPrinter(width=20)
a.pprint(matrix)
# [[1, 2, 3],
#  [4, 5, 6],
#  [7, 8, 9]]
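If the formatted text is needed as a string, for logging for instance, the module-level pformat() function can be used instead of a PrettyPrinter object. The sketch below also shows the depth argument, which abbreviates deeper levels. The nested dictionary is a made-up example.

```python
import pprint

matrix = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

# pformat() returns the formatted text instead of printing it.
text = pprint.pformat(matrix, width=20)
print(text)

# depth limits how many nesting levels are shown; deeper levels
# are replaced by an ellipsis marker.
nested = {'a': {'b': {'c': 1}}}
shallow_view = pprint.pformat(nested, depth=1)
print(shallow_view)
```

The string returned for the matrix has the same three-line layout as the PrettyPrinter output above.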
Additional knowledge
Some basic data structures
1. Singly linked list
class Node:
    def __init__(self):
        self.data = None
        self.nextNode = None

    def set_and_return_next(self):
        self.nextNode = Node()
        return self.nextNode

    def getNext(self):
        return self.nextNode

    def getData(self):
        return self.data

    def setData(self, d):
        self.data = d

class LinkedList:
    def buildList(self, array):
        self.head = Node()
        self.head.setData(array[0])
        self.temp = self.head
        for i in array[1:]:
            self.temp = self.temp.set_and_return_next()
            self.temp.setData(i)
        self.tail = self.temp
        return self.head

    def printList(self):
        tempNode = self.head
        while tempNode != self.tail:
            print(tempNode.getData())
            tempNode = tempNode.getNext()
        print(self.tail.getData())

myArray = [3, 5, 4, 6, 2, 6, 7, 8, 9, 10, 21]
myList = LinkedList()
myList.buildList(myArray)
myList.printList()
2. Prim's algorithm implemented in Python
Translator's note: Prim's algorithm is a graph-theory algorithm that finds a minimum spanning tree in a weighted connected graph.
from collections import defaultdict
from heapq import heapify, heappop, heappush

def prim(nodes, edges):
    # adjacency list: node -> list of (cost, from, to)
    conn = defaultdict(list)
    for n1, n2, c in edges:
        conn[n1].append((c, n1, n2))
        conn[n2].append((c, n2, n1))

    mst = []
    used = set([nodes[0]])
    usable_edges = conn[nodes[0]][:]
    heapify(usable_edges)

    while usable_edges:
        cost, n1, n2 = heappop(usable_edges)
        if n2 not in used:
            used.add(n2)
            mst.append((n1, n2, cost))

            for e in conn[n2]:
                if e[2] not in used:
                    heappush(usable_edges, e)
    return mst

# test
nodes = list("ABCDEFG")
edges = [("A", "B", 7), ("A", "D", 5),
         ("B", "C", 8), ("B", "D", 9), ("B", "E", 7),
         ("C", "E", 5),
         ("D", "E", 15), ("D", "F", 6),
         ("E", "F", 8), ("E", "G", 9),
         ("F", "G", 11)]
print("prim: {0}".format(prim(nodes, edges)))