Shreyas’ Notes

Reasoning About Algorithms

COMP 382

fall, sophomore year

BSTs §

Assumption: keys are linearly ordered.

Let xx and yy be nodes in the tree.

Every subtree of a BST is a BST.

Traversal §

in_order(x):
	if x != null:
		in_order(x.left)
		print(x.key)
		in_order(x.right)

Running time: Θ(n)\Theta(n) where nn is the tree size

tree_search(x, k):
	if x == null or k == x.key
		return x

	if k < x.key:
		return tree_search(x.left, k)

	return tree_serach(x.right, k)

O(h)O(h), where hh is the height of the tree. hh is

tree_search(x, k):
	while x != null and k != x.key:
		if k < x.key:
			x = x.left
		else:
			x = x.right

	return x

Successor §

Pre-condition: x != null

tree_successor(x):
	if x.right != null:
		return tree_min(x)

	y = x.p
	while y != null and x == y.right:
		x = y
		y = y.p
	return y

Red-Black Trees §

  1. every node is either red or black
  2. the root is black
  3. every leaf (NIL) is black
  4. the children of a red node must be black
  5. for every node uu, all simple paths from uu to descendent leaves have the same number of black nodes.

properties 4 and 5 together ensure that the tree is “roughly balanced”.

AVL Trees §

Binary search trees that are balanced.

For every node xx with a left subtree ll and a right subtree rr, height(r)height(l)1|\textrm{height}(r) - \textrm{height}(l)| \leq 1. The heights of subtrees ll and rr differ by at most 1.

Balance factor balance(x)=height(x.r)height(x.l)\textrm{balance}(x) = \textrm{height}(x.r) - \textrm{height}(x.l)

B-Trees §

Assumption: each node is stored on a separate disk page.

The branching factor of B trees can be large.

A node with nn keys has n+1n + 1 children.

A rooted tree:

Consider a non-empty B-tree TT with minimum degree tt. The height of TT is hh.

# assumption: x.node has already been read from disk k.key
b_tree_search(x, k):
	i = 1
	while i <= x.n and k > x.key[i]:
		i++

	if i <= x.n and k == x.key[i]:
		return (x, i)
	else if x.leaf:
		return null
	else:
		disk_read(x, c[i])
		return b_tree_search(x.c[i], k)

O(h)O(h) reads from disk and O(th)O(t \cdot h) running time.

Hash Tables §

Data structure that stores keys with additional satellite data

The universe is small §

Direct address table: array T[0..m-1].

The universe is large; the set kk of present keys is small §

Memory: Θ(k)\Theta(|k|)

Search: O(1)O(1) time in the average case

Components:

values stored in T[h(k)]. collisions unavoidable since m<Um < |U|.

chaining: elements that hash to the same value are placed into a linked list.

worst case for search: all nn keys hash to the same slot. searching will be Θ(n)\Theta(n).

[Assumption: simple, uniform hashing]

We insert keys k1,k2,,knk_1, k_2, \cdots, k_n into a hash table with mm slots

P(h(i)=j)=1mP(h(i) = j) = \frac{1}{m}

njn_j length of the list T[j]

load factor