Collections of data in Python may be held in many ways. But depending on what we want to do with a collection, some approaches are better than others.
In this project we shall look briefly at simple lists, linked lists and finally binary trees. In each case we want to be able to do the following operations efficiently.
It usually helps if the collection is sorted so that searching can be optimized. In general, searching for items is much more frequent that inserting or removing items.
To keep things simple we will assume that our collections are sorted. That means we must be able to compare items with the operators less than, equal to, and greater than.
Additionally, to keep things simple we will use examples where
The easiest way to make a collection a list is to simply use the built-in Python list. This collection integers is sorted.
>>> collection = [3,6,9,17]
>>> collection
[3, 6, 9, 17]
We can insert a new value with the append method
>>> collection.append(25)
>>> collection
[3, 6, 9, 17, 25]
We can check to see if a given item is in the collection.
>>> 9 in collection
True
>>> 8 in collection
False
And if Python did not have the in operator, we could create a function like the following. Notice that this function is recursive. It could also be written with a while or for loop. But this version can serve as an introduction to more complex structures later.
>>> def contains(collection, value) :
... if not collection : return False # collection empty
... elif value == collection[0] : return True # value is in the first slot
... else : return contains(collection[1:], value) # value is somewhere further?
...
>>> contains(collection, 17)
True
>>> contains(collection, 10)
False
Or, if we have know the list is sorted we can stop early
>>> def contains(collection, value) :
... if not collection : return False
... elif value == collection[0] : return True
... elif value < collection[0] : return False # only if sorted
... else : return contains(collection[1:], value)
To remove a value we can splice the list around it. Here we shall use a for loop to find the split point.
>>> def remove(collection, value) :
... for i in range(len(collection)) :
... if value == collection[i] :
... return collection[0:i]+collection[i+1:] # remove and rejoin
Our linked lists will be built with nodes containing both a value and a pointer to the next node.
We can make our linked list using a simple 2 element Python list for each node. The first element of a node is its value and second element is the remaining nodes. The last node in the linked list will have the value None in the second element. Here is an example.
>>> from lists import sampleList, showList, contains
>>> print sampleList
[10, [20, [30, [40, None]]]]
>>> print showList(sampleList)
10 --> 20 --> 30 --> 40
>>> print contains(sampleList, 25)
False
Here is a function contains() to find if a value is in a linked list.
15 def contains(l, value) :
16 if l == None : return False
17 elif l[0] == value : return True
18 else : return contains(l[1],value)
>>> print contains(sampleList, 30)
True
The traditional way to insert a node into a sorted list is to manipulate the pointers. There are 3 possible situations. Will the new node go at the beginning of the list, at the end of the list or somewhere in the middle? Here is some code from lists.py
20 def insert1 (nod0, value) :
21 if nod0 == None : return [value, None]
22 elif nod0[0] > value : return [value, nod0[1]] # insert at begining
23 else :
24 nod = nod0
25 while nod[1] :
26 if nod[1][0] >= value : # next node in the chain?
27 ins = (value, nod[1]) # new node built
28 nod[1] = ins # squeeze it in
29 break
30 else : nod = nod[1]
31 if nod[1] == None : nod[1] = (value, None) # becomes the tail
32 return nod0
Using a more functional approach makes the code much tighter with just a single line for each of the three conditions. It is a bit trickier though.
It is important that you understand what is happening in line 37. It replaces lines 23-32 in the first version. In successive calls from line 37 the list is traversed until the insertion point is found. Then a new sub-list is made headed by the new value and followed by the rest of the original list. And finally, all the nodes that preceded the insertion point copied back to the beginning. What we then have is a new list that share a tail of nodes with the original. Take your time.
34 def insert2(l, value) :
35 if l == None : return [value, None]
36 elif l[0] > value : return [value, l]
37 else : return [l[0], insert2(l[1], value)]
If we were to print the successive return values they would look like this:
Input value to either insert or remove: 35
[35, [40, None]]
[30, [35, [40, None]]]
[20, [30, [35, [40, None]]]]
[15, [20, [30, [35, [40, None]]]]]
[10, [15, [20, [30, [35, [40, None]]]]]]
10 --> 15 --> 20 --> 30 --> 35 --> 40
Deleting a node can also be done two ways.
The first approach in deleting node N is to modify the previous node’s pointer to skip over N. In lines 40 and 41 of delete1() the special cases (first and last) nodes are handled.
39 def delete1 (nod0, value) :
40 if nod0 == None : return None
41 elif nod0[0] == value : return nod0[1] # if first node, just skip
42 else :
43 nod = nod0
44 while nod[1] :
45 if nod[1][0] == value : # next node in the chain?
46 nod[1] = nod[1][1] # skip the next
47 break
48 else : nod = nod[1]
49 return nod0
delete2() is a simpler approac. It basically builds a new list with the remaining nodes still in the proper order. If we know the the values are unique and sorted then the resulting new linked list can actually just merge with the original past the point of deletion. This is very similar to insert2() above.
51 def delete2(l, value) :
52 if l == None : return None
53 elif l[0] != value : return [l[0], delete2(l[1],value)]
54 else : return delete2(l[1], value)
The function showList() prints the list with the node values are seperated by "-->", as in the example above.
07 def showList(l, accum="") :
08 if l == None : return accum
09 else :
10 if accum : sep = " --> "
11 else : sep = ""
12 accum += sep + str(l[0])
13 return showList(l[1], accum)
The module lists.py can also be run stand-alone to experiment with the recursive versions of insert and delete
$ python lists.py
10 --> 20 --> 30 --> 40
Input value to either insert or remove: 25
10 --> 20 --> 25 --> 30 --> 40
Input value to either insert or remove: 25
10 --> 20 --> 30 --> 40
Input value to either insert or remove: 10
20 --> 30 --> 40
With the functional versions of insert(), contains() and delete(), we no longer must modify existing nodes. The 2 element lists can be replaced with 2 element tuples. This is a good idea for two reasons. One, tuples use much less memory than lists and two, not being able to modify nodes is a very good thing. Making modifications to list structures can lead to subtle bugs, especially if lists are shared by concurrent threads.
The python programs lists.py and tuples.py can be viewed seperately. They are quite similar.
The zip file. downloads the entire package
Binary trees are basically two dimensional linked lists. Each node has a value and pointers to two sub-trees, one to the left and one the right. Both sub-trees may either be the value None or be the root node to another binary tree. The left subtree contains only nodes whose values are less than the value of the parent node. The right subtree contains only nodes whose values are greater. As this is true for every node, the whole network of nodes is sorted.
As we did above, we will use the node’s value to identify it. So "node 70" is the node containing the value of 70. Of course, just like in linked lists, values in the real world don't have to be just integers. They can take any form.
A binary tree is a recursive data structure. Here is an example. Notice that in every left subtree values are less than any nodes above and those on the right are greater.
You can see how nodes 20 and 70 have no right hand subtree and how nodes 10, 40, 60 and 90 have no subtrees at all.
The tree is balanced. Although there are nine nodes in the tree each node can be reached from the root in three steps or less.
The python representation of this tree is the following.
(50, (30, (20, (10, None, None), None), (40, None, None)),
(80, (70, (60, None, None), None), (90, None, None)))
As with linked lists, we want to be able to do the following:
The code for finding a value is simple. Function contains() starts at the root node and traverses the tree with each step either to the left or right. If it lands on the node with the target value, it returns True. Reaching the end returns False. The final value is passed back through the series of returns and finally pops out at the top.
34 def contains(tree, target) :
35 if tree == None : return False
36 (value,left,right) = tree
37 if value == target : return True
38 elif value > target : return contains(left,target)
39 else : return contains(right,target)
40
And here it is in action…
>>> from bintree import contains, mytree
>>> print(mytree)
(50, (30, (20, (10, None, None), None), (40, None, None)),(80, (70, (60, None, None), None), (90, None, None)))
>>> contains(mytree, 60)
True
>>> contains(mytree, 65)
False
Inserting a new node somewhere in the tree requires first finding the right place for the insertion. Then we build a linked list of new nodes taking us back to a new root and keeping as much of the original as we can. With each return using lines 25,27,28,30,32 we create a new node linked to those below it.
24 def insert(tree, newValue) :
25 if not tree : return (newValue,None,None)
26 (value,left,right) = tree
27 if value == None : return (newValue,None,None)
28 if value == newValue : return tree # already there
29 elif newValue < value :
30 return (value, insert(left,newValue), right)
31 else :
32 return (value, left, insert(right,newValue))
33
Here is a diagram of the results of adding a value of 65. The nodes in yellow are from the original tree and you could check this by looking at their id's. The red nodes are new and lead back to the top to the tree. The original tree continues to exist. The new tree has a new root but merges with the original where it can.
So, we see that the original tree is still intact and referenced by oldtree
>>> mytree
(50, (30, (20, (10, None, None), None), (40, None, None)),
(80, (70, (60, None, None), None), (90, None, None)))
>>>
>>> oldtree = mytree
>>> from bintree import insert
>>>
>>>
(50, (30, (20, (10, None, None), None), (40, None, None)),
(80, (70, (60, None, (65, None, None)), None), (90, None, None)))
>>> oldtree
(50, (30, (20, (10, None, None), None), (40, None, None)),
(80, (70, (60, None, None), None), (90, None, None)))
Deleting a node is somewhat more complex. The following shows the effect of deleteing node 40.
Basically, when the root of a subtree is deleted, one of other nodes must take its place. The new root node must maintain the conditions. The minimum node in the right subtree works fine since its value is still larger than anything in the left subtree and smaller than any other node on the right. An even simpler process can be used if either subtree is empty.
If we have found the node to delete (line 44) and if both left and right branches are present then we find the minimum value on the right side using the function minval. A new node is created (line 48) and a new branch built (in red) back to the top.
If either the left or right branch is empty then the other branch can simply take the place of the node deleted (lines 49,50). Lines 51 through 54 traverse down the tree to find the node to delete. Here is the code.
41
42
43 (value, left, right) = tree
44 if target == value : # delete this node
45 if left and right :
46 swp = minval(right)
47 #print("Swap val", swp)
48 return (swp, left, delete(right,swp))
49 if left == None : return right
50 if right == None : return left
51 elif target < value :
52 return (value, delete(left,target), right)
53 elif target > value :
54 return (value, left, delete(right,target))
57 def minval(tree) :
58 (val,left,right) = tree
59 if left == None : return val
60 else : return minval(left)
61
An optimumly balanced tree can be very time efficient. In a balanced tree with 1024 nodes every node is within 10 steps of the root (210 is 1024). Deletes and Inserts make new trees by only building new paths of length 10 of 11. But this efficiency can break down if the sequence of inserts is not random.
In the next picture 2 nodes have been inserted. They leave the tree somewhat out of balance. Instead of all nodes being within 3 step from the root, node 62 is five steps away
After rebalancing, all nodes are again within 3 steps from the root.
Below are the functions to make this happen. The idea is to convert the tree into a sorted Python list. This is done in values by simply traversing the tree left to right recursively. When the list is complete the optimum root for the binary tree is in the middle of the list. Take values to the left of our root and create a sub-tree for its left link. The values to the right create a sub-tree for its right link. Do this recursively and a balanced tree simply unfolds.
The function values builds the sorted Python list. Sorting happens automatically by the way the tree is traversed. Function balance2 is also recursive and a sub-root node is inserted at each iteration.
07
08 def values(tree) :
09 if not tree : return []
10 (value,left,right) = tree
11 return values(left) + [value] + values(right)
12
13 def balance(tree) :
14 return balance2(None, values(tree))
15
16 def balance2(btree, sorted) :
17 if not sorted : return btree
18 split = len(sorted)/2
19 btree = insert (btree,sorted[split]) # midpoint
20 btree = balance2(btree,sorted[0:split]) # left side
21 btree = balance2(btree,sorted[split+1:]) # right side
22 return btree
23
If the number of nodes is a power of 2 then the tree will be perfectly balanced. Otherwise there will be a partially filled in row at the bottom from right to left.
The program test.py uses bintree.py and display.py to generate graphics similar to those above (created by an earlier pygame version).
display.py, in turn, uses John Zelle's graphics.py to create windows. You need to have Tkinter installed but you no longer need pygame. The python modules in the project and graphics.py are compatible with both Python 2.7 and Python 3. You may use either.
$ python test.py (or python3 test.py)
This should hould paint a window with the tree above, 3 buttons at the bottom and text entry for a number. The Add/Del button will either insert the number or delete it. The Balance button will rebalance the tree, and the Exit button is pretty obvious.
Below is the code for the display module.
The window is fairly large (600 pixels). There are global lists for buttons and for items temporarily drawn. A single entry box lets you input a number for a node to either add or delelte.
1 # display.py
2 #
3 import sarg, random, math
4 from button import Button
5 from graphics import *
6 from sys import maxsize
7
8 pixels = sarg.Int("pixels", 600)
9
10 win = GraphWin("Binary Trees",pixels,pixels)
11 drawn = []
12 buttons = []
13 entry = Entry(Point(300, 540), 2)
The static items in the window are the 3 buttons near the bottom and the text entry box.
14
15 def drawStatic() :
16 buttons.append(Button(Point(150,570), "Balance"))
17 buttons.append(Button(Point(300,570), "Add/Del"))
18 buttons.append(Button(Point(450,570), "Exit"))
19 for but in buttons : but.draw(win)
20 entry.draw(win)
21 entry.setText("45")
22
The wait-for-user-input function. You may add a number to the entry box. Clicking the mouse within the rectanlge of a button will return the label of the button (line 29) or the value of the text entry. (line 28). Other clicks are ignored.
23 def waitForButton() :
24 pnt = win.getMouse()
25 val = entry.getText()
26 for but in buttons :
27 if but.inside(pnt) :
28 if but.text == \"Add/Del\": return val
29 else : return but.text
30 return waitForButton() \# keep trying
31
The functions below are pretty much boiler-plate and simple if you are familiar with the Zelle graphics module. (link needed)
32 def drawTree(tree, pos, width, stepdown, ids=None) :
33 for obj in drawn : obj.undraw()
34 drawNode(tree, pos, width, stepdown, ids)
35
36 def displayOff() :
37 win.close()
38
39 def drawText(color, x,y, text) :
40 txt = Text(Point(x,y), str(text))
41 txt.setFill(color)
42 txt.draw(win)
43 drawn.append(txt)
44
45 def drawLine(color, x,y, next_xy, width) :
46 x2,y2 = next_xy
47 p1 = Point(x,y); p2 = Point(x2,y2)
48 line = Line(p1,p2)
49 line.setFill(color)
50 line.draw(win)
51 drawn.append(line)
52
drawNode() is more interesting. It recursively draws the subtrees, top to down, left and right. Lines connecting nodes and the text of a node are drawn yellow if part of a previous tree (line 66) or red if newly created (67). ids is a list of Python ids to tag the previous nodes
53 def drawNode(node, pos, width, stepdown, ids) :
54 if node == None : return
55 x,y = pos; value,left,right = node
56 halfwidth = width/2
57 nextR = (x+halfwidth,y+stepdown)
58 if left :
59 nextL = (x-halfwidth,y+stepdown)
60 drawLine("white", x,y, nextL, width)
61 drawNode(left , nextL, halfwidth, stepdown, ids)
62 if right :
63 nextR = (x+halfwidth,y+stepdown)
64 drawLine("white", x,y, nextR, width)
65 drawNode(right, nextR, halfwidth, stepdown, ids)
66 if ids and id(node) in ids : color = "yellow"
67 else : color = "red"
68 drawText(color, x,y, str(value))
.
Download and unzip bintree.zip
These basic algorithms for manipulating binary trees are fairly straight forward and quite fast and efficient. They find uses in many places.
Finally, when possible, recursive data structures should always be handled with recursive algorithms. Modern languages, like Python, make this fairly straight-forward. And, although there is a learning curve, it is definitely worth the effort.
If you have comments or suggestions You can email me at mail me
Copyright © 2019-2021 Chris Meyers and Fred Obermann
* * *