Reasoning About Algorithms
COMP 382
BSTs §
Assumption: keys are linearly ordered.
Let and be nodes in the tree.
- If node is in the left subtree of , then
- If node is in the right subtree of , then
Every subtree of a BST is a BST.
Traversal §
in_order(x):
if x != null:
in_order(x.left)
print(x.key)
in_order(x.right)
- preorder: root, left, right
- inorder: left, root, right
- postorder: left, right, root
Running time: where is the tree size
Search §
tree_search(x, k):
if x == null or k == x.key
return x
if k < x.key:
return tree_search(x.left, k)
return tree_serach(x.right, k)
, where is the height of the tree. is
tree_search(x, k):
while x != null and k != x.key:
if k < x.key:
x = x.left
else:
x = x.right
return x
Successor §
Pre-condition: x != null
tree_successor(x):
if x.right != null:
return tree_min(x)
y = x.p
while y != null and x == y.right:
x = y
y = y.p
return y
Red-Black Trees §
- every node is either red or black
- the root is black
- every leaf (NIL) is black
- the children of a red node must be black
- for every node , all simple paths from to descendent leaves have the same number of black nodes.
properties 4 and 5 together ensure that the tree is “roughly balanced”.
AVL Trees §
Binary search trees that are balanced.
For every node with a left subtree and a right subtree , . The heights of subtrees and differ by at most 1.
Balance factor
B-Trees §
Assumption: each node is stored on a separate disk page.
The branching factor of B trees can be large.
A node with keys has children.
A rooted tree:
- every node
x
has attributes- number of keys stored
x.n
- the keys
x.key[1] <= x.key[2] <= ... <= x.key[x.n]
(ordered) - boolean value
x.leaf
- number of keys stored
- every internal node
x
hasx.n + 1
pointersx.c[1]
,x.c[2]
, …,x.c[x.n + 1]
to its children - the keys separate the ranges of keys for subtrees
- all leaves have the same depth, which is equal to the height
h
of the tree - a b-tree has a parameter called the minimum degree
- each node (except root) has at least keys. each internal node (except root) has at least children.
- each node has at msot keys
Consider a non-empty B-tree with minimum degree . The height of is .
# assumption: x.node has already been read from disk k.key
b_tree_search(x, k):
i = 1
while i <= x.n and k > x.key[i]:
i++
if i <= x.n and k == x.key[i]:
return (x, i)
else if x.leaf:
return null
else:
disk_read(x, c[i])
return b_tree_search(x.c[i], k)
reads from disk and running time.
Hash Tables §
Data structure that stores keys with additional satellite data
The universe is small §
Direct address table: array T[0..m-1]
.
- If key
k
is not present,T[k] = NIL
- If key
k
is present and has valueval
,T[k] = val
The universe is large; the set of present keys is small §
Memory:
Search: time in the average case
Components:
- table
T[0..m-1]
where typically m \llt |U| - hash function
- key hashes to the slot
values stored in T[h(k)]
. collisions unavoidable since .
chaining: elements that hash to the same value are placed into a linked list.
insert(T, x)
: insertx
at the head of listT[h(x.key)]
search(T, k)
: search fork
in the listT[h(k)]
delete(T, x)
: deletex
from the listT[h(x.key)]
worst case for search: all keys hash to the same slot. searching will be .
[Assumption: simple, uniform hashing]
We insert keys into a hash table with slots
length of the list T[j]
load factor
Greedy Algorithms §
Interval Partitioning §
The depth of a set of intervals is the maximum number of intervals that contain any given time.
Number of classrooms needed ≥ Depth
- Consider intervals in increasing order of start time.
- Assign any compatible classroom.
Proof strategy: Find a structural bound asserting that every possible solution must have been at least a certain value, then show that the algorithm always achieves this bound.
Scheduling to Minimize Lateness §
- Satisfy all requestts
- Allow certain requests to run late
- Minimize the max lateness
Formally:
- Single resource processing one job at a time
- Job requires time and is due at
- If starts at , it finishes at
- Lateness:
- Schedule to minimize max lateness
“Earliest deadline first” is optimal.
greedy-is-optimal proof techniques:
- greedy always stays ahead
- structural
- exchange argument
Minimum Spanning Tree §
Spanning tree: Given a connected, undirected graph with real edge weights , a spanning tree contains a subset of that connects all vertices and whose total weight . is undirected (no root) and is, by definition, acyclic.
When has nodes, has edges
Minimum spanning tree (MST): a spanning tree of with lowest total weight
Properties of MSTs:
- every connected, undirected graph has at least one MST
- having distinct edge costs is a sufficient (but not necessary) condition for to have a unique MST
- between every pair of nodes in a spanning tree, there is exactly one path
A cut or of is a partition of the nodes into the two non-empty sets and
Cut property: For a cut in , the MST contains the min-cost crossing edge
Cycle property: For a cycle in the MST does not contain the max-cost edge in
Kruskal’s algorithm §
- Sort the edges in ascending order of cost
- Iterate over the edges in order
- if adding edge to creates a cycle, discard (cycle property)
- Else, insert into
We need to be able to efficiently check if adding an edge will form a cycle. We need to maintain a set for each connected component.
As an edge is considered:
- find the CCs containing and
- if they’re different there’s no path between and
- add edge
- merge the components
- if they’re the same there’s already a path between and . don’t add.
Union-Find.
make_set
: construct a singleton setfind(u)
: find the set containingu
union(a, b)
: merge the two subsetsa
andb
into a single set
Best implemented with disjoint set forests.
- union by rank
- path compression
Kruskal takes for sorting and for find operations. grows very slowly (effectively a small constant).
Dynamic Programming §
Edit Distance §
LCS §
Finally, for :
Optimal BST §
Randomized Algorithms §
Selection §
Find the -th largest element in a given array.
Quicksort §
def quicksort(S):
if |S| <= 3:
sort S
return S
choose a splitter x uniformly at random
for each element S_i of S:
if S_i <= x: put S_i in S_
if S_i > x: put S_i in S__
quicksort(S_)
quicksort(S__)
return S_ + S__
Maximum Flow §
A flow graph is a directed, weighted graph in which each edge has a capacity . source , sink .
- every vertex lies on some path from to
- no parallel edges
A flow is a real-valued function :
- [capacity] for each edge ,
- [conservation] for each ,
The value of flow is
Minimum cut: partition which minimizes the capacity across the two parts
Max flow = Min cut
For a flow and an - cut , the net-flow across the cut is equal to the net amount leaving .
Flow value lemma: the net flow across the cut is equal to the value of the flow.
The flow is bounded by cut capacity. For a flow and a cut , the value of the flow is at most the capacity of the cut.
Computational Complexity §
A problem is a decision problem iff for any input its output is Yes
or No
.
A decision problem defines a language: an input iff the output for is Yes
.
- reduction by simple equivalence
- reduction by encoding
SAT: Given a CNF formula with clauses over a set of variables, determine if there exists a satisfying truth assignment.
3-SAT: Given a CNF formula with clauses each of length 3 over a set of variables, determine if there exists a satisfying truth assignment.
Transitivity:
Turing Machine §
- Init state
- Tape symbols
- Transition functions. Based on current state and symbol,
- Change state (optional)
- Change symbol (optional)
- Move tape head by up to square (optional)