Shreyas’ Notes

Reasoning About Algorithms

COMP 382

fall, sophomore year

BSTs §

Assumption: keys are linearly ordered.

Let xx and yy be nodes in the tree.

Every subtree of a BST is a BST.

Traversal §

in_order(x):
 if x != null:
  in_order(x.left)
  print(x.key)
  in_order(x.right)

Running time: Θ(n)\Theta(n) where nn is the tree size

tree_search(x, k):
 if x == null or k == x.key
  return x

 if k < x.key:
  return tree_search(x.left, k)

 return tree_serach(x.right, k)

O(h)O(h), where hh is the height of the tree. hh is

tree_search(x, k):
 while x != null and k != x.key:
  if k < x.key:
   x = x.left
  else:
   x = x.right

 return x

Successor §

Pre-condition: x != null

tree_successor(x):
 if x.right != null:
  return tree_min(x)

 y = x.p
 while y != null and x == y.right:
  x = y
  y = y.p
 return y

Red-Black Trees §

  1. every node is either red or black
  2. the root is black
  3. every leaf (NIL) is black
  4. the children of a red node must be black
  5. for every node uu, all simple paths from uu to descendent leaves have the same number of black nodes.

properties 4 and 5 together ensure that the tree is “roughly balanced”.

AVL Trees §

Binary search trees that are balanced.

For every node xx with a left subtree ll and a right subtree rr, height(r)height(l)1|\textrm{height}(r) - \textrm{height}(l)| \leq 1. The heights of subtrees ll and rr differ by at most 1.

Balance factor balance(x)=height(x.r)height(x.l)\textrm{balance}(x) = \textrm{height}(x.r) - \textrm{height}(x.l)

B-Trees §

Assumption: each node is stored on a separate disk page.

The branching factor of B trees can be large.

A node with nn keys has n+1n + 1 children.

A rooted tree:

Consider a non-empty B-tree TT with minimum degree tt. The height of TT is hh.

# assumption: x.node has already been read from disk k.key
b_tree_search(x, k):
 i = 1
 while i <= x.n and k > x.key[i]:
  i++

 if i <= x.n and k == x.key[i]:
  return (x, i)
 else if x.leaf:
  return null
 else:
  disk_read(x, c[i])
  return b_tree_search(x.c[i], k)

O(h)O(h) reads from disk and O(th)O(t \cdot h) running time.

Hash Tables §

Data structure that stores keys with additional satellite data

The universe is small §

Direct address table: array T[0..m-1].

The universe is large; the set kk of present keys is small §

Memory: Θ(k)\Theta(|k|)

Search: O(1)O(1) time in the average case

Components:

values stored in T[h(k)]. collisions unavoidable since m<Um < |U|.

chaining: elements that hash to the same value are placed into a linked list.

worst case for search: all nn keys hash to the same slot. searching will be Θ(n)\Theta(n).

[Assumption: simple, uniform hashing]

We insert keys k1,k2,,knk_1, k_2, \cdots, k_n into a hash table with mm slots

P(h(i)=j)=1mP(h(i) = j) = \frac{1}{m}

njn_j length of the list T[j]

load factor

Greedy Algorithms §

Interval Partitioning §

The depth of a set of intervals is the maximum number of intervals that contain any given time.

Number of classrooms needed ≥ Depth

Proof strategy: Find a structural bound asserting that every possible solution must have been at least a certain value, then show that the algorithm always achieves this bound.

Scheduling to Minimize Lateness §

Formally:

“Earliest deadline first” is optimal.

greedy-is-optimal proof techniques:

Minimum Spanning Tree §

Spanning tree: Given a connected, undirected graph G=(V,E)G = (V, E) with real edge weights cec_e, a spanning tree TT contains a subset of EE that connects all vertices and whose total weight w(T)=Σu,vTw(u,v)w(T) = \Sigma_{u, v \in T} w(u, v). TT is undirected (no root) and is, by definition, acyclic.

When GG has nn nodes, TT has n1n - 1 edges

Minimum spanning tree (MST): a spanning tree of GG with lowest total weight ww

Properties of MSTs:

A cut (S,VS)(S, V - S) or SS of GG is a partition of the nodes VV into the two non-empty sets SS and VSV - S

Cut property: For a cut SS in GG, the MST contains the min-cost crossing edge ee

Cycle property: For a cycle CC in GG the MST does not contain the max-cost edge in CC

Kruskal’s algorithm §

We need to be able to efficiently check if adding an edge will form a cycle. We need to maintain a set for each connected component.

As an edge e=(u,v)e = (u, v) is considered:

Union-Find.

Best implemented with disjoint set forests.

Kruskal takes O(mlogm)O(m \log m) for sorting and O(mα(m,n)O(m \alpha(m, n) for find operations. α(m,n)\alpha(m, n) grows very slowly (effectively a small constant).

Dynamic Programming §

Edit Distance §

Dist(Xx,Yy)=min(Dist(X,Y)+cost(xy)Dist(Xx,Y)+1Dist(X,Yy)+1)Dist(Xx, Yy) = \min\begin{pmatrix}Dist(X, Y) + cost\binom{x}{y} \\ Dist(Xx, Y) + 1 \\ Dist(X, Yy) + 1\end{pmatrix}

LCS §

Finally, for xyx \neq y:

LCSLength(Xx,Yy)=max(LCSLength(Xx,Y)LCSLength(X,Yy))LCSLength(Xx, Yy) = \max\begin{pmatrix}LCSLength(Xx, Y) \\ LCSLength(X, Yy)\end{pmatrix}

Optimal BST §

pi+qi=1\sum p_i + \sum q_i = 1

E[search cost]=pi(depthT(ki)+1)+qi(depthT(di)+1)E[\textrm{search cost}] = \sum p_i \cdot ({depth}_T(k_i) + 1) + \sum q_i \cdot ({depth}_T(d_i) + 1)

Randomized Algorithms §

Selection §

Find the kk-th largest element in a given array.

T(n)=(n1)+1ni=1n1T(i)T(n) = (n - 1) + \frac{1}{n} \sum_{i = 1}^{n - 1} T(i)

Quicksort §

def quicksort(S):
 if |S| <= 3:
  sort S
  return S

 choose a splitter x uniformly at random
 for each element S_i of S:
  if S_i <= x: put S_i in S_
  if S_i > x: put S_i in S__

 quicksort(S_)
 quicksort(S__)
 return S_ + S__

T(n)=(n1)+1ni=1n[T(i1)+T(ni)]T(n) = (n - 1) + \frac{1}{n} \sum_{i = 1}^n [T(i - 1) + T(n - i)]

Maximum Flow §

A flow graph G=(V,E)G = (V, E) is a directed, weighted graph in which each edge has a capacity c(e)0c(e) \geq 0. source ss, sink tt.

A flow is a real-valued function f:V×VRf: V \times V \rightarrow \mathbb{R}:

The value of flow ff is e out of sf(e)\sum_{e \textrm{ out of }s} f(e)

Minimum cut: partition which minimizes the capacity across the two parts

Max flow = Min cut

For a flow ff and an ss-tt cut (A,B)(A, B), the net-flow across the cut is equal to the net amount leaving AA.

e out of Af(e)=e into Af(e)\sum_{e \textrm{ out of }A} f(e) = \sum_{e \textrm{ into }A} f(e)

Flow value lemma: the net flow across the cut is equal to the value of the flow.

e out of Af(e)=e into Af(e)=v(f)\sum_{e \textrm{ out of }A} f(e) = \sum_{e \textrm{ into }A} f(e) = v(f)

The flow is bounded by cut capacity. For a flow ff and a cut (A,B)(A, B), the value of the flow is at most the capacity of the cut.

v(f)cap(A,B)v(f) \leq cap(A, B)

Computational Complexity §

A problem is a decision problem iff for any input its output is Yes or No.

A decision problem PP defines a language: an input xL(P)x \in L(P) iff the output for xx is Yes.

SAT: Given a CNF formula ϕ\phi with kk clauses over a set XX of nn variables, determine if there exists a satisfying truth assignment.

3-SAT: Given a CNF formula ϕ\phi with kk clauses each of length 3 over a set of nn variables, determine if there exists a satisfying truth assignment.

Transitivity: XYYZ    XZX \preceq Y \cap Y \preceq Z \implies X \preceq Z

Turing Machine §