Assumption: keys are linearly ordered.
Let and be nodes in the tree.
- If node is in the left subtree of , then
- If node is in the right subtree of , then
Every subtree of a BST is a BST.
in_order(x): if x != null: in_order(x.left) print(x.key) in_order(x.right)
- preorder: root, left, right
- inorder: left, root, right
- postorder: left, right, root
Running time: where is the tree size
tree_search(x, k): if x == null or k == x.key return x if k < x.key: return tree_search(x.left, k) return tree_serach(x.right, k)
, where is the height of the tree. is
tree_search(x, k): while x != null and k != x.key: if k < x.key: x = x.left else: x = x.right return x
x != null
tree_successor(x): if x.right != null: return tree_min(x) y = x.p while y != null and x == y.right: x = y y = y.p return y
Red-Black Trees §
- every node is either red or black
- the root is black
- every leaf (NIL) is black
- the children of a red node must be black
- for every node , all simple paths from to descendent leaves have the same number of black nodes.
properties 4 and 5 together ensure that the tree is “roughly balanced”.
AVL Trees §
Binary search trees that are balanced.
For every node with a left subtree and a right subtree , . The heights of subtrees and differ by at most 1.
Assumption: each node is stored on a separate disk page.
The branching factor of B trees can be large.
A node with keys has children.
A rooted tree:
- every node
- number of keys stored
- the keys
x.key <= x.key <= ... <= x.key[x.n](ordered)
- boolean value
- number of keys stored
- every internal node
x.n + 1pointers
x.c[x.n + 1]to its children
- the keys separate the ranges of keys for subtrees
- all leaves have the same depth, which is equal to the height
hof the tree
- a b-tree has a parameter called the minimum degree
- each node (except root) has at least keys. each internal node (except root) has at least children.
- each node has at msot keys
Consider a non-empty B-tree with minimum degree . The height of is .
# assumption: x.node has already been read from disk k.key b_tree_search(x, k): i = 1 while i <= x.n and k > x.key[i]: i++ if i <= x.n and k == x.key[i]: return (x, i) else if x.leaf: return null else: disk_read(x, c[i]) return b_tree_search(x.c[i], k)
reads from disk and running time.
Hash Tables §
Data structure that stores keys with additional satellite data
The universe is small §
Direct address table: array
- If key
kis not present,
T[k] = NIL
- If key
kis present and has value
T[k] = val
The universe is large; the set of present keys is small §
Search: time in the average case
T[0..m-1]where typically m \llt |U|
- hash function
- key hashes to the slot
values stored in
T[h(k)]. collisions unavoidable since .
chaining: elements that hash to the same value are placed into a linked list.
insert(T, x): insert
xat the head of list
search(T, k): search for
kin the list
delete(T, x): delete
xfrom the list
worst case for search: all keys hash to the same slot. searching will be .
[Assumption: simple, uniform hashing]
We insert keys into a hash table with slots
length of the list
Greedy Algorithms §
Interval Partitioning §
The depth of a set of intervals is the maximum number of intervals that contain any given time.
Number of classrooms needed ≥ Depth
- Consider intervals in increasing order of start time.
- Assign any compatible classroom.
Proof strategy: Find a structural bound asserting that every possible solution must have been at least a certain value, then show that the algorithm always achieves this bound.
Scheduling to Minimize Lateness §
- Satisfy all requestts
- Allow certain requests to run late
- Minimize the max lateness
- Single resource processing one job at a time
- Job requires time and is due at
- If starts at , it finishes at
- Schedule to minimize max lateness
“Earliest deadline first” is optimal.
greedy-is-optimal proof techniques:
- greedy always stays ahead
- exchange argument
Minimum Spanning Tree §
Spanning tree: Given a connected, undirected graph with real edge weights , a spanning tree contains a subset of that connects all vertices and whose total weight . is undirected (no root) and is, by definition, acyclic.
When has nodes, has edges
Minimum spanning tree (MST): a spanning tree of with lowest total weight
Properties of MSTs:
- every connected, undirected graph has at least one MST
- having distinct edge costs is a sufficient (but not necessary) condition for to have a unique MST
- between every pair of nodes in a spanning tree, there is exactly one path
A cut or of is a partition of the nodes into the two non-empty sets and
Cut property: For a cut in , the MST contains the min-cost crossing edge
Cycle property: For a cycle in the MST does not contain the max-cost edge in
Kruskal’s algorithm §
- Sort the edges in ascending order of cost
- Iterate over the edges in order
- if adding edge to creates a cycle, discard (cycle property)
- Else, insert into
We need to be able to efficiently check if adding an edge will form a cycle. We need to maintain a set for each connected component.
As an edge is considered:
- find the CCs containing and
- if they’re different there’s no path between and
- add edge
- merge the components
- if they’re the same there’s already a path between and . don’t add.
make_set: construct a singleton set
find(u): find the set containing
union(a, b): merge the two subsets
binto a single set
Best implemented with disjoint set forests.
- union by rank
- path compression
Kruskal takes for sorting and for find operations. grows very slowly (effectively a small constant).
Dynamic Programming §
Edit Distance §
Finally, for :
Optimal BST §
Randomized Algorithms §
Find the -th largest element in a given array.
def quicksort(S): if |S| <= 3: sort S return S choose a splitter x uniformly at random for each element S_i of S: if S_i <= x: put S_i in S_ if S_i > x: put S_i in S__ quicksort(S_) quicksort(S__) return S_ + S__
Maximum Flow §
A flow graph is a directed, weighted graph in which each edge has a capacity . source , sink .
- every vertex lies on some path from to
- no parallel edges
A flow is a real-valued function :
- [capacity] for each edge ,
- [conservation] for each ,
The value of flow is
Minimum cut: partition which minimizes the capacity across the two parts
Max flow = Min cut
For a flow and an - cut , the net-flow across the cut is equal to the net amount leaving .
Flow value lemma: the net flow across the cut is equal to the value of the flow.
The flow is bounded by cut capacity. For a flow and a cut , the value of the flow is at most the capacity of the cut.