Reasoning About Algorithms
COMP 382
BSTs §
Assumption: keys are linearly ordered.
Let and be nodes in the tree.
- If node is in the left subtree of , then
 - If node is in the right subtree of , then
 
Every subtree of a BST is a BST.
Traversal §
in_order(x):
 if x != null:
  in_order(x.left)
  print(x.key)
  in_order(x.right)
- preorder: root, left, right
 - inorder: left, root, right
 - postorder: left, right, root
 
Running time: where is the tree size
Search §
tree_search(x, k):
 if x == null or k == x.key
  return x
 if k < x.key:
  return tree_search(x.left, k)
 return tree_serach(x.right, k)
, where is the height of the tree. is
tree_search(x, k):
 while x != null and k != x.key:
  if k < x.key:
   x = x.left
  else:
   x = x.right
 return x
Successor §
Pre-condition: x != null
tree_successor(x):
 if x.right != null:
  return tree_min(x)
 y = x.p
 while y != null and x == y.right:
  x = y
  y = y.p
 return y
Red-Black Trees §
- every node is either red or black
 - the root is black
 - every leaf (NIL) is black
 - the children of a red node must be black
 - for every node , all simple paths from to descendent leaves have the same number of black nodes.
 
properties 4 and 5 together ensure that the tree is “roughly balanced”.
AVL Trees §
Binary search trees that are balanced.
For every node with a left subtree and a right subtree , . The heights of subtrees and differ by at most 1.
Balance factor
B-Trees §
Assumption: each node is stored on a separate disk page.
The branching factor of B trees can be large.
A node with keys has children.
A rooted tree:
- every node 
xhas attributes- number of keys stored 
x.n - the keys 
x.key[1] <= x.key[2] <= ... <= x.key[x.n](ordered) - boolean value 
x.leaf 
 - number of keys stored 
 - every internal node 
xhasx.n + 1pointersx.c[1],x.c[2], …,x.c[x.n + 1]to its children - the keys separate the ranges of keys for subtrees
 - all leaves have the same depth, which is equal to the height 
hof the tree - a b-tree has a parameter  called the minimum degree
- each node (except root) has at least keys. each internal node (except root) has at least children.
 - each node has at msot keys
 
 
Consider a non-empty B-tree with minimum degree . The height of is .
# assumption: x.node has already been read from disk k.key
b_tree_search(x, k):
 i = 1
 while i <= x.n and k > x.key[i]:
  i++
 if i <= x.n and k == x.key[i]:
  return (x, i)
 else if x.leaf:
  return null
 else:
  disk_read(x, c[i])
  return b_tree_search(x.c[i], k)
reads from disk and running time.
Hash Tables §
Data structure that stores keys with additional satellite data
The universe is small §
Direct address table: array T[0..m-1].
- If key 
kis not present,T[k] = NIL - If key 
kis present and has valueval,T[k] = val 
The universe is large; the set of present keys is small §
Memory:
Search: time in the average case
Components:
- table 
T[0..m-1]where typically m \llt |U| - hash function 
- key hashes to the slot
 
 
values stored in T[h(k)]. collisions unavoidable since .
chaining: elements that hash to the same value are placed into a linked list.
insert(T, x): insertxat the head of listT[h(x.key)]search(T, k): search forkin the listT[h(k)]delete(T, x): deletexfrom the listT[h(x.key)]
worst case for search: all keys hash to the same slot. searching will be .
[Assumption: simple, uniform hashing]
We insert keys into a hash table with slots
 length of the list T[j]
load factor
Greedy Algorithms §
Interval Partitioning §
The depth of a set of intervals is the maximum number of intervals that contain any given time.
Number of classrooms needed ≥ Depth
- Consider intervals in increasing order of start time.
 - Assign any compatible classroom.
 
Proof strategy: Find a structural bound asserting that every possible solution must have been at least a certain value, then show that the algorithm always achieves this bound.
Scheduling to Minimize Lateness §
- Satisfy all requestts
 - Allow certain requests to run late
 - Minimize the max lateness
 
Formally:
- Single resource processing one job at a time
 - Job requires time and is due at
 - If starts at , it finishes at
 - Lateness:
 - Schedule to minimize max lateness
 
“Earliest deadline first” is optimal.
greedy-is-optimal proof techniques:
- greedy always stays ahead
 - structural
 - exchange argument
 
Minimum Spanning Tree §
Spanning tree: Given a connected, undirected graph with real edge weights , a spanning tree contains a subset of that connects all vertices and whose total weight . is undirected (no root) and is, by definition, acyclic.
When has nodes, has edges
Minimum spanning tree (MST): a spanning tree of with lowest total weight
Properties of MSTs:
- every connected, undirected graph has at least one MST
 - having distinct edge costs is a sufficient (but not necessary) condition for to have a unique MST
 - between every pair of nodes in a spanning tree, there is exactly one path
 
A cut or of is a partition of the nodes into the two non-empty sets and
Cut property: For a cut in , the MST contains the min-cost crossing edge
Cycle property: For a cycle in the MST does not contain the max-cost edge in
Kruskal’s algorithm §
- Sort the edges in ascending order of cost
 - Iterate over the edges in order
- if adding edge to creates a cycle, discard (cycle property)
 - Else, insert into
 
 
We need to be able to efficiently check if adding an edge will form a cycle. We need to maintain a set for each connected component.
As an edge is considered:
- find the CCs containing and
 - if they’re different there’s no path between  and 
- add edge
 - merge the components
 
 - if they’re the same there’s already a path between and . don’t add.
 
Union-Find.
make_set: construct a singleton setfind(u): find the set containinguunion(a, b): merge the two subsetsaandbinto a single set
Best implemented with disjoint set forests.
- union by rank
 - path compression
 
Kruskal takes for sorting and for find operations. grows very slowly (effectively a small constant).
Dynamic Programming §
Edit Distance §
LCS §
Finally, for :
Optimal BST §
Randomized Algorithms §
Selection §
Find the -th largest element in a given array.
Quicksort §
def quicksort(S):
 if |S| <= 3:
  sort S
  return S
 choose a splitter x uniformly at random
 for each element S_i of S:
  if S_i <= x: put S_i in S_
  if S_i > x: put S_i in S__
 quicksort(S_)
 quicksort(S__)
 return S_ + S__
Maximum Flow §
A flow graph is a directed, weighted graph in which each edge has a capacity . source , sink .
- every vertex lies on some path from to
 - no parallel edges
 
A flow is a real-valued function :
- [capacity] for each edge ,
 - [conservation] for each ,
 
The value of flow is
Minimum cut: partition which minimizes the capacity across the two parts
Max flow = Min cut
For a flow and an - cut , the net-flow across the cut is equal to the net amount leaving .
Flow value lemma: the net flow across the cut is equal to the value of the flow.
The flow is bounded by cut capacity. For a flow and a cut , the value of the flow is at most the capacity of the cut.
Computational Complexity §
A problem is a decision problem iff for any input its output is Yes or No.
A decision problem  defines a language: an input  iff the output for  is Yes.
- reduction by simple equivalence
 - reduction by encoding
 
SAT: Given a CNF formula with clauses over a set of variables, determine if there exists a satisfying truth assignment.
3-SAT: Given a CNF formula with clauses each of length 3 over a set of variables, determine if there exists a satisfying truth assignment.
Transitivity:
Turing Machine §
- Init state
 - Tape symbols
 - Transition functions. Based on current state and symbol,
- Change state (optional)
 - Change symbol (optional)
 - Move tape head by up to square (optional)