Shreyas’ Notes

# COMP 182

## Logic and Proofs §

### Propositions §

please don’t confuse AND/conjunction/$\wedge$ with OR/disjunction/$\vee$.

$p$ $q$ $p \wedge q$ $p \vee q$ $p \oplus q$ $p \rightarrow q$ $q \rightarrow p$ $p \leftrightarrow q$ $\neg p$ $\neg q$
F F F F F T T T T T
F T F T T T F F T F
T F F T T F T F F T
T T T T F T T T F F

Tautology: always true

Satisfiable: not a contradiction. not always false. not impossible.

SAT: propositional satisfiability. NP-complete.

### Propositional equivalence §

$p$ and $q$ are logically equivalent if $p \leftrightarrow q$ ($p = q$) is a tautology.

De Morgan’s laws:

• $\neg (p \wedge q) \equiv \neg p \vee \neg q$

• $\neg (p \vee q) \equiv \neg p \wedge \neg q$

• double negation

• $\neg (\neg p) \equiv p$
• commutative laws

• $p \wedge q \equiv q \wedge p$
• $p \vee q \equiv q \vee p$
• distributive laws

• implication laws

• $p \rightarrow q \equiv \neg p \vee q$
• $p \rightarrow q \equiv \neg q \rightarrow \neg p$
• $p\leftrightarrow q \equiv (p \rightarrow q) \wedge (q \rightarrow p)$

prove equivalences using:

• truth tables
• a series of other equivalences

### Predicates and Quantifiers §

variables.

e.g. $x \in \{1, 2, 3\}$, $P(x) := x \in \{1, 2, 3\}$

quantifiers:

• universal quantification

$\forall x P(x)$. for all $x$ in the presumed domain, $P(x)$ is true

• existential quantification

$\exists x P(x)$. there exists an $x$ such that $P(x)$ is true.

demorgan’s laws:

• $\neg \forall P(x) \equiv \exists x \neg P(x)$

• $\neg \exists P(x) \equiv \forall x \neg P(x)$

• $\forall x \forall y$

• $\exists x \exists y$

• $\forall x \exists y$

• $\exists x \forall y$

$\forall x \forall y \equiv \forall y \forall x$, $\exists x \exists y \equiv \exists y \exists x$

$\forall x \exists y \not \equiv \exists y \forall x$

negating nested quantifiers:

1. s/$\exists$/$\forall$/g
2. s/$\forall$/$\exists$/g
3. negate the proposition

## Proofs and Proof Techniques §

proofs: valid arguments that establish the truth of statements

### Proof techniques §

• direct proofs
• contrapositive proofs
• existence proofs
• counterexamples

## Sets §

unordered collection of unique items

• roster method
• set comprehension

cardinality of a finite set: number of elements in the set

infinite sets: countable, uncountable

$\forall x(x \in A \rightarrow x \in B) \rightarrow A \subseteq B$

$\mathscr{P}(A) = 2^A = \{x : x \subseteq A\}$

$\mathscr{P}(\varnothing) = \{\varnothing\}$

Proper subsets ($\subset$): subsets excluding the set itself

### Set operations §

• union — $A \cup B$
• intersection — $A \cap B$
• disjoint: $A \cap B = \varnothing$
• difference — $A \setminus B = A - B$
• complement — $\overline{A} = U \setminus A$
• cartesian product — $A \times B = \{(x, y) : x \in A \wedge y \in B\}$
• $A \times \varnothing = \varnothing$
• $|A \times B| = |A| \times |B|$

### Set identities §

• equality

$A \subseteq B \wedge B \subseteq A$ implies $A = B$

• de morgan’s laws

• $\overline{A \cap B} = \overline{A} \cup \overline{B}$
• $\overline{A \cup B} = \overline{A} \cap \overline{B}$
• complementation law — $\overline{\overline{A}} = A$

• commutative

• associative

• distributive

## Functions §

non-empty sets $D$ (domain) and $R$ (codomain)

function: assignment — from every element in $D$ to exactly one element in $R$

range (elements actually used as values) $\subseteq$ codomain

$f: A \rightarrow B$ and $f: C \rightarrow D$ are equal functions when:

• $A = C$
• $B = D$
• $\forall a \in A [f(a) = g(a)]$

when $f: A \rightarrow \mathbb{R}$ and $g: A \rightarrow \mathbb{R}$

• $f + g = (f + g)(a) = f(a) + g(a)$

• $fg = (f \cdot g)(a) = f(a) \cdot g(a)$

### Injections, surjections, bijections §

• Injection aka “one-to-one”

$f: A \rightarrow B$: $\forall a, b \in A \left(f(a) = f(b) \rightarrow a = b\right)$

$|B| \geq |A|$

hoz. line test

• Surjection aka “onto”

$\forall b \in B (\exists a \in A (f(a) = b))$

$|A| \geq |B|$

• Bijection

both one-to-one and onto

There exists a bijection $f: A \rightarrow B$ iff $|A| = |B|$

## Relations and their properties §

a binary relation from $A$ to $B$ is a subset of $A \times B$.

$(a, b) \in R$, $aRb$

$(a, b) \not \in R$, $a \not R b$

• reflexivity — $aRa$
• symmetric — $aRb \implies bRa$
• antisymmetric — $aRb \wedge bRa \implies a = b$
• transitivity — $aRb \wedge bRc \implies aRc$

composing — $aRb \wedge bSc \implies a(R \circ S)c$

### Representing relations §

• $|A| \times |B|$ adjacency matrix $M$ where:

$m_{i,j} = \begin{cases}1 & a_i R b_j \\ 0 & a_i \not R b_j\end{cases}$

• di(rected)graphs

### Relation closures §

"$S$ is the closure of $R$ wrt $P$" — relation $S$ is the smallest set that satisfies property $P$ such that $R \subseteq S$

[$R$ is a relation on set $A$]:

• reflexive closure

$S = R \cup \{(a, a) : a \in A\}$

• symmetric closure

$S = R \cup \{(a,b) : (b,a) \in R\}$

• transitive closure

### Equivalence relations §

• reflexive

• symmetric

• transitive

beware of “or”/“common” relations wrt transitivity

two elements are equivalent ($a \sim b$) if they’re related by an equivalent relation.

$R$ is an equivalence relation on $A$, $a \in A$. equivalence class of $a$ ($[a]$) is $\{b \in A: (a,b) \in R\}$

partition of $A$.

• $A_1, A_2, \cdots, A_n$ are non-empty sets
• for any $a, b \leq n$, $A_a \cap A_b = \varnothing$
• $\bigcup_{i=1}^{n} A_i = A$

bipartition: $n = 2$.

fundamental theorem of equivalence relations: the set of all equivalence classes of a relation $R$ on $A$ form a partition of $A$.

### Partial Orders §

• reflexive
• antisymmetric
• transitive

set $A$ + partial order $R$ = partial-ordered set aka “poset” $(A, R)$

$a$ and $b$ are elements of a poset $(A, \preceq)$$a$ and $b$ are comparable if either $a \preceq b$ or $b \preceq a$. else, they are incomparable.

if $(A, \preceq)$ is a poset where every two elements of $A$ are comparable, $A$ is a totally ordered set and $\preceq$ is a total order.

Hasse diagrams are graphical reps of partial orders. excludes reflexivity and arrows that can be obtained by transitivty.

$a$ is a maximal element if there’s no $c \in A, c \neq a$ such that $a \preceq c$

$b$ is a minimal element if there’s no $c \in A, c \neq b$ such that $c \preceq b$

minimal and maximal elements do not have to exist or be unique.

$a$ is the greatest element if $\forall c \in A, c \preceq a$.

$b$ is the least element if $\forall c \in A, b \preceq c$.

greatest and least elements do not have to exist, but they must be unique.

a totally ordered set $(A, \preceq)$ is a well-ordered set if every non-empty subset of $A$ has a least element

## Graphs §

### Undirected graphs §

Pair $(V, E)$ where:

• $V$ is a non-empty set of nodes
• $E \subseteq \{A : A \in \mathscr{P}(V) \wedge |A| = 2\}$

### Directed graphs §

aka digraphs

Pair $(V, E)$ where:

• $V$ is a non-empty set of nodes
• $E \subseteq (V \times V) - \{(u, u) : u \in V\}$

### Jargon §

• neighbours

• degree

• in-degree
• out-degree

• subgraph

subset of nodes and subset of edges. $V' \subseteq V \wedge E' \subseteq E$

• induced subgraph

subset of edges and all edges that connect them. $V' \subseteq V \wedge E' = \{\{u, v\} \in E : u, v \in V'\}$

### Special graphs §

• complete graph: graph with an edge between every two nodes

• cycle

• bipartite graph

$V$ can be bipartitioned into $V_1$ and $V_2$ such that $E \subseteq (V_1 \times V_2 \cup V_2 \times V_1)$

bipartite theorem: a graph is bipartite iff there exists a function $f : V \rightarrow \{1, 2\}$ such that $\forall {u, v} \in E, f(u) \neq f(v)$

colour neighbours of $1$-coloured nodes with $2$ and vice-versa

• complete bipartite graph
• forest: acyclic

• tree: connected and acyclic.

### Connectivity §

• path

• simple — path that doesn’t contain the same node more than once
• circuit — path that begins and ends at the same node
• cycle — simple circuit
• connected

• connected component
• strongly connected
• weakly connected
• strongly connected component

### Graph Isomorphism §

• graph isomorphism

equivalence relation

• subgraph isomorphism

partial order

Q: subgraph isomorphism vs subgraph

• every graph is isomorphic to itself
• $(g_1, g_2) \in R \rightarrow (g_2, g_1) \in R$
• $(g_1, g_2) \in R \wedge (g_2, g_3) \in R \rightarrow (g_1, g_3) \in R$

### Representing graphs §

$n :=$ number of nodes

$m :=$ number of edges

input size $(n, m)$

❤ sparse

input size $n$

❤ dense

## Sequences and Summations §

sequence: a function from a subset of the set of integers to a set $S$. $a_n$, $\{a_n\}$

• geometric progression $a, ar, ar^2, \cdots$

initial term $a$, common ratio $r$

• arithmetic progression $a, a + d, a + 2d, \cdots$

initial term $a$, common difference $d$

recurrence relation for a sequence is an equation that expresses $a_n$ in terms of one or more of the previous terms of the sequence

summation:

• $\sum_{n = 0}^{\infty} a_n$
• $\sum_{n = i}^{n=j} a_n$
• $\sum_{n \in S} a_n$

series: sum of a sequence from $n = 0$ or $n = 1$ as necessary

• telescoping sums
• perturbation method

## Algorithm Efficiency §

finite sequence of precise instructions for performing a computation

• a name
• input/output
• unambiguous steps
• finite {time,number of steps}
• correct output values for each set of input values

### Big O notation §

$f(x) = \mathcal{O}\left(g(x)\right)$ if there exist positive constants $C$ and $k$ such that $f(x) \leq Cg(x)$ for all $x \geq k$

• $f(x) = a_n x^n + a_{n - 1} x^{n - 1} + \cdots +a_1 x + a_0 = \mathcal{O}(x^n)$
• $f_1(x) = \mathcal{O}(g(x))$, $f_2(x) = \mathcal{O}(h(x))$. $(f_1 + f_2)(x) = \mathcal{O}\left(\max\left(g(x), h(x)\right)\right)$
• $f_1(x) = \mathcal{O}(g(x))$, $f_2(x) = \mathcal{O}(h(x))$. $(f_1 \cdot f_2)(x) = \mathcal{O}(g(x) \cdot h(x))$

### Big Omega notation §

$f(x) = \Omega\left(g(x)\right)$ if there exist positive constants $C$ and $k$ such that $f(x) \geq Cg(x)$ for all $x \geq k$

$f(x) = \Omega(g(x)) \Leftrightarrow g(x) = \mathcal{O}(f(x))$

### Big Theta notation §

… aka “order of”

$f(x) = \Theta\left(g(x)\right)$ if there exist positive constants $C_1$, $C_2$, and $k$ such that $C_1 g(x) \leq f(x) \leq C_2 g(x)$ for all $x \geq k$

$f(x) = \Theta(g(x))$ iff $f(x) = \mathcal{O}(g(x))$ and $f(x) = \Omega(g(x))$.

### Complexity of algorithms §

efficiency:

• time
• space

for a given input size $n$:

• worst case — upper bound

• best case — usually useless

• average case — informative, but requires making assumptions about the input

• $\Theta(1)$

• $\Theta(\log n)$

• $\Theta(n)$

• $\Theta(n \log n)$

• $\Theta(n^a)$

• $\Theta(a^n)$, $n > 1$

• $\Theta(n!)$

if there exists a polynomial-time worst-case algorithm for a problem, the problem is tractable. else, it’s intractable.

• tractable: class P
• intractable
• solution can be verified in polynomial time: class NP
• class NP-complete

halting problem:

• does a given computer program terminate for a given input
• proven unsolvable

post correspondence problem

## Asymptoticities of General Functions §

$f(x) = \mathcal{O}(g(x))$ iff $\exists k, C$ such that $|f(x)| \leq C \cdot |g(x)|$ for all $x \geq k$.

e.g. $-x = \mathcal{O}(-x^2)$

### Equality and Equivalence §

$f(x) = \mathcal{O(g)}$” is notation abuse.

$\mathcal{O}(g) = \{h : \exists C, k, \forall x \geq k, h(x) \leq C \cdot g(x)\}$

$f(x) \in \mathcal{O(g)}$

Asymptotic equivalence ($\Theta$) is an equivalence relation.

$f(x) = \mathcal{O}(g(x)) \leftrightarrow g(x) = \Omega(f(x))$

$f(x) = \Theta(g(x)) \leftrightarrow [f(x) = \mathcal{O}(g(x)) \wedge f(x) = \Omega(g(x))]$

Let $f(x) = a_n x^n + a_{n - 1} x^{n - 1} + \cdots + a_1 x + a_0$ where $a_0, a_1, \ldots, a_n \in \mathbb{R}$. Then, $f(x) = \mathcal{O}(x^n)$.

Let $f_1 = \mathcal{O}(g)$ and $f_2 = \mathcal{O}(h)$. Then, $(f_1 + f_2) = \mathcal{O}(\max(g, h))$.

Let $f_1 = \mathcal{O}(g)$ and $f_2 = \mathcal{O}(h)$. Then, $(f_1 \cdot f_2) = \mathcal{O}(g \cdot h)$.

### Limit-based definitions §

If $f = \mathcal{O}(g)$, $\lim_{x \rightarrow \infty} \frac{f}{g} < \infty$.

If $f = \Omega(g)$, $\lim_{x \rightarrow \infty} \frac{f}{g} > 0$.

If $f = \Theta(g)$, $0 < \lim_{x \rightarrow \infty} \frac{f}{g} < \infty$.

### Little ones §

$f = o(g)$ iff $\lim_{x \rightarrow \infty} \frac{f}{g} = 0$.

$f = \omega(g)$ iff $\lim_{x \rightarrow \infty} \frac{f}{g} = \infty$.

### Asymptotic analogies to relational operators §

Relational operator Analogous asymptotic operator
$\leq$ $\mathcal{O}$
$\geq$ $\Omega$
$=$ $\Theta$
$<$ $o$
$>$ $\omega$

### Divide-Concur and Recurrences §

• A problem of size $n$ is divided into $b$ smaller instances (ideally of equal size) of the same problem.
• Some of the smaller instances are solved.
• The solutions to the smaller instances are combined, if necessary.

#### Merge sort §

• split each array into two subarrays
• arrays of size one are trivially sorted
• work backwards and successively merge sorted subarrays

$\mathrm{MergeSort}$:

if n > 1
copy L[0 .. n/2 - 1] to A
copy L[n/2 .. n] to B
MergeSort(A)
MergeSort(B)
Merge(A, B, L)


$\mathrm{Merge}$:

i = 0, j = 0, k = 0

while i < p and j < q
if A[i] <= B[j]
L[k] = A[i]
i++
else
L[k] = B[j]
j++
k++

if i = p
copy B[j .. q-1] to L[k .. p+q-1]
else
copy A[i .. p-1] to L[k .. p+q-1]


Running time $T(n) = 2 T\left(\frac{n}{2}\right) + \mathcal{O}(n)$.

Master theorem: Let $f$ be an increasing function such that $f(n) = a f\left(\frac{n}{b}\right) + cn^d$. Then:

$f(n) = \begin{cases} \mathcal{O}\left(n^d\right) & a < b^d \\ \mathcal{O}\left(n^d \log n\right) & a = b^d \\ \mathcal{O}\left(n^{\log_b a}\right) & a > b^d \end{cases}$

From the master theorem $T(n) = \mathcal{O}(n \log n)$.

$T(n) = T\left(\frac{n}{2}\right) + \mathcal{O}(1) = \mathcal{O}(\log n)$

## Mathematical Induction §

$[P(1) \wedge \forall k (P(k) \rightarrow P(k + 1))] \rightarrow \forall n P(n)$

Important fact: The set of positive integers is well-ordered.

1. Basis step: Show that $P(1)$

2. Inductive step: Show that $\forall k P(k) \rightarrow P(k + 1)$

The assumption that $P(k)$ is called the inductive hypothesis

### Strong Mathematical Induction §

1. Basis step: Show that $P(1)$

2. Inductive step: Show that $\forall k [P(1) \wedge P(2) \wedge \cdots \wedge P(k)] \rightarrow P(k + 1)$

### Recursive definitions §

$\Sigma = \{0, 1\} \implies \Sigma^* = \{\epsilon, 0, 1, 00, 01, 10, 11, 000, \ldots\}$

$\epsilon$: empty string

Exclusion rule

## Weighted graphs and greedy algorithms §

### Optimization problems and greedy algorithms §

Optimization problem: maximing or minimizing something.

Greedy algorithm:

• piece-by-piece
• choose the locally optimal (valid) piece as the next piece

don’t work for many problems.

### Minimum Spanning Tree §

input: weighted graph

output: tree such that the sum of the edges in the tree — $\sum_{e \in T} weight_e$ — is minimum

Prim’s algorithm:

• $V_T = \{x\}$
• $E_T = \varnothing$
• for $i$ from 1 to $|V| - 1$
• find the lightest edge $e = \{u, v\} \in E$ such that $u \in V_T$ and $v \in V - V_T$.
• $V_T = V_T \cup \{v\}$
• $E_T = E_T \cup \{e\}$
• return $(V_T, E_T)$

### Single Source Shortest Paths problem §

input: weighted graph $g = (V, E, w)$ such that $w(e) \geq 0 \forall e \in E$; a source node $i \in V$.

output: distance from $i$ to each node $j \in V$, $d_J$; a shortest path from $i$ to $j$, $p_j$.

Dijkstra’s algorithm:

• $x = \varnothing$
• for $j \in V$
• $d_j = \infty$; $p_j = null$; $x = x \cup \{j\};$
• $d_i = 0$
• while $X \neq \varnothing$
• let $k$ be a node with minimum $d_k$
• if $d_k = 0$: break
• $X \setminus \{k\}$
• for each neighbour $h$ of $k$ in $X$
• if $d_k + w((k, h)) < d_h$
• $d_h = d_k + w((k, h))$
• $p_h = k$
• return $d, p$

$\mathcal{O}(m \log n)$ assuming that the graph is represented as an adj. list and that $X$ is represented as a min-heap.

## Counting §

### Pigeonhole principle §

$n$ holes, $n + 1$ pigeons; at least one of the pigeonholes has two pigeons.

$f: A \rightarrow B$. $|B| < |A| \implies \exists a_1, a_2 \in A : a_1 \neq a_2 \wedge f(a_1) = f(a_2)$.

$f: X \rightarrow Y$. $|Y| < |X| \implies \exists y \in Y : |\{x : f(x) = y\}| \geq \lceil\frac{|X|}{|Y|}\rceil$

### Product and division rules §

Product rule: if $A = A_1 \times A_2 \times \cdots \times A_n$, then $|A| = |A_1| \times |A_2| \times \cdots \times |A_n|$

Division rule.

### Sum and subtraction rules §

#### Sum rule §

$A = A_1 \cup A_2 \cup \cdots \cup A_n$ where $A_1, A_2, \ldots$ are pairwise disjoint. $|A| = \sum |A_i|$.

#### Difference rule §

$A = A_1 \cup A_2$. $|A| = |A_1| + |A_2| - |A \cap B|$.

### Permutations and Combinations §

#### Without repetition §

Permutations: Order matters.

$0 \leq r \leq n$. $P(n, r) = \frac{n!}{(n - r)!}$

Combinations: Order doesn’t matter.

$0 \leq r \leq n$. $C(n, r) = ^nC_r = \binom{n}{r} = \frac{n!}{r!(n - r)!}$

$C(n, r) = \frac{P(n, r)}{r!}$

#### With repetition §

the number of $k$-permutations of $n$ items with repetition: $n^k$

the number of $k$-combinations of $n$ items with repetition: $\binom{n + k - 1}{k} = \binom{n + k - 1}{n - 1}$

(aka. the number of ways to choose a subset of size $k$ from a set of size $n$)

### Inclusion-Exclusion principle §

$\left|\bigcup_{i = 1}^n A_i\right| = \sum_{i = 1}^n |A_i| - \sum_{1 \leq i \leq j \leq n} |A_i \cap A_j| + \sum_{1 \leq i \leq j \leq k \leq n} |A_i \cap A_j \cap A_k| - \cdots + (-1)^n |A_1 \cap A_2 \cap \cdots \cap A_n|$

$\left|\bigcup_{i=1}^n A_i\right| = \sum_{k=1}^n (-1)^{k+1} \left( \sum_{1 \leqslant i_1 < \cdots < i_k \leqslant n} | A_{i_1} \cap \cdots \cap A_{i_k} | \right)$

$f: A \rightarrow B$; $|A| = m$, $|B| = n$

• total number of functions: $n^m$
• total number of 1-1 functions: $n(n - 1)(n - 2) \cdots (n - m + 1)$
• total number of onto functions: $n^m - \binom{n}{1} (n - 1)^m + \binom{n}{2} (n - 2)^m - \cdots + (-1)^{n - 1} \binom{n}{n - 1} 1^m$

### Binomial coefficients §

$\left(x + y\right)^n = \sum_{i = 0}^n \binom{n}{i} x^{n - i} y^i$

### Combinatorial Proofs §

• use counting arguments to show LS and RS count the same objects in different ways
• show that there is a bijection between the sets counted on LS and RS

pascal’s identity: $\binom{n}{k} = \binom{n - 1}{k - 1} + \binom{n - 1}{k}$

vandermonde’s identity: $\binom{m + n}{k} = \sum_{l = 0}^{k} \binom{m}{k - l} \binom{n}{l}$

## Disscrete Probability §

• Average case analysis
• Randomized algorithms
• Machine learning
• Stochastic processes

• experiment
• sample space
• event

### Assigning probability §

Given a sample space $S$, a probability distribution is a function $p : S \rightarrow [0, 1]$ such that $\sum_{s \in S} p(s) = 1$.

• fair
• biased

uniform distribution: $\forall s \in S, p(s) = \frac{1}{|S|}$

### Conditional Probability and Independence §

$P(E|F) = \frac{P(E \cap F)}{P(F)}$ ($P(F) \neq 0$)

If $P(E|F) = P(E)$, $E$ is independent of $F$

$P(E \cap F) = P(E) \cdot P(F) \implies$ $E$ is independent of $F$

$P(E|F) = \frac{P(F|E) P(E)}{P(F)}$ where $P(F) \neq 0$.

### Bayes theorem §

$P(D|T) = \frac{P(T|D) P(D)}{P(T|D) P(D) + P(T|D') P(D')}$

If $F_1, F_2, \ldots, F_n$ is a partition of $S$,

$P(F_i | E) = \frac{P(E|F_i)P(F_i)}{\sum_{i = 1}^n [P(E|F_i) P(F_i)]}$

Bayesian inference: $P(M|D) \propto P(D|M) P(M)$

### Random variables §

A random variable is a function from the sample space to the set $\mathbb{R}$.

### Expected values §

The expectation of a rv $X$ is its mean.

$\mathbb{E}(X) = \sum_{s \in S} p(s) \cdot X(s) = \sum_x p(x) \cdot x$

Linearity of expectations

If $X_1, X_2, \ldots, X_n$ are rv’s on $S$, then $\mathbb{E}(X_1 + X_2 + \ldots + X_n) = \mathbb{E}(X_1) + \mathbb{E}(X_2) + \cdots + \mathbb{E}(X_n)$

$\mathbb{E}(aX + b) = a\mathbb{E}(X) + b$

For independent random variables $X$ and $Y$, $\mathbb{E}(XY) = \mathbb{E}(X)\mathbb{E}(Y)$.

### Variance §

$var(X) = \sum_x \left(x - \mathbb{E}\right)^2 - p(x)$

$var(X) = \mathbb{E}[\left(X - \mathbb{E}\right)^2]$

$var(X) = \mathbb{E}(X^2) - \left[\mathbb{E}(X)\right]^2$

## Discrete Probability §

### Tail Bounds §

Markov’s inequality: $P(x \geq a) \leq \frac{\mathbb{E}(x)}{a}$

$P(x \geq a \mathbb{E}(x)) \leq \frac{1}{a}$

Chebyshev’s inequality: $P(|X - \mathbb{E}(x)| \geq a) \leq \frac{V(x)}{a^2}$

$P(|x - \mathbb{E}(x)| \geq a \sigma) \leq \frac{1}{a^2}$

### Families of Distributions §

#### Bernoulli Distribution §

$X = \{0, 1\}$

$\mathbb{E}(X) = \sum_x x p(x) = 0 \cdot (1 - p) + 1 \cdot p = p$

$V(x) = \sum_x {(x - p)}^2 p(x) = p(1 - p)$

$P(x) = \begin{cases}p & x = 1 \\ 1 - p & x = 0\end{cases}$

#### Binomial Distribution §

Run $n$ independent Bernoulli trials and count the number of successes in the sequence.

$p(X = k) = \binom{n}{k} p^k {(1 - p)}^{n - k}$

$\mathbb{E}(X) = np$

$V(X) = np(1 - p)$

#### Geometric Distribution §

$p(X = k) = {(1 - p)}^{k - 1} p$

$\mathbb{E}(X) = \frac{1}{p}$

$V(X) = \frac{1 - p}{p^2}$

#### Negative Binomial Distribution §

$l$ trials until we see $k$ successes

$p(X = l) = \binom{l - 1}{k - 1} {(1 - p)}^{l - k} p^k$

### Markov Chains and Hidden Markov Models (HMMs) §

• initial distribution ($\pi$)
• emission probability ($E_l(X_i)$)
• transition matrix ($A_{l', l}$)

## Dynamic Programming §

### Binomial Coefficients §

Use pascal’s identity and a 2D array similar to a tilted pascal’s triangle.

### Longest Increasing Subsequence §

Input: an array of integers

Output: longest possible sequence of indices such that the values indexed by the indices are strictly increasing

### Viterbi’s algorithm §

$v[l, i] = \max_{l' \in Q} (v[l', i - 1] \cdot A_{l', l}) \times E_l(x_i)$

for i in 1..L-1:
for l in Q: # Q is the set of states
v[l][i] = (max of (v[l_][i - 1] * A[l_][l]) over l_ in Q) * E[l][x_i]


## Recurrences §

### Modelling with recurrence relations §

examples:

• fibonacci
• tower of hanoi
• counting strings
• counting partitions

### Linear Homogeneous recurrences §

a linear homogeneous recurrence relation of degree $k$ with constant coefficients is a recurrence relation of the form $a_n = c_1 a_{n - 1} + c_2 a_{n - 2} + \cdots + c_k a_{n - k}$ where $c_1, c_2, \ldots, c_k \in \mathbb{R}$ and $c_k \neq 0$.

## Infinite Sets and Their Cardinalities §

### Infinity and Hilbert’s Grand Hotel §

Hotel with infinite number of (numbered) rooms.

1. Hotel is full. A new guest shows up.

Move every guest in room $i$ to room $i + 1$, assign room 1 to the new guest.

Bijection $f : \{1, 2, 3, \ldots\} \rightarrow \{2, 3, 4, \ldots\}$.

If $X$ is finite, there does not exist a bijection f : X \righarrow Y where $Y \subset X$

2. Two-floor hotel

Bijection from $\{(1, i), (2, i) : i \in \mathbb{N}\} \rightarrow \{1, 2, 3, \ldots\}$

3. More floors

4. Real-numbered hotel to natural-numbered hotel?

A bijection from $\mathbb{R} \rightarrow \mathbb{N}$ does not exist.

### Infinity, Injections, and Bijections §

Set $X$ is infinite if there exists a bijection $f : X \rightarrow Y$ where $Y \subset X$.

If there exists a bijection from $X \rightarrow Y$, then $X$ and $Y$ have the same cardinality.

If there exists a 1-1 function $f : X \rightarrow Y$, then $|X| \leq |Y|$.

There exists a bijection $f : X \rightarrow Y$ iff there exists a bijection $g : Y \rightarrow X$.

Shröder-Bernstein Theorem: if $A$ and $B$ are two sets such that $|A| \leq |B|$ and $|B| \leq |A|$, then $|A| = |B|$. Show a one-to-one function $f : A \rightarrow B$ and a one-to-one function $g : B \rightarrow A$.

### An infinity of infinities §

Every finite set is countable.

A infinite set $X$ is countably infinite if there exists a bijection $f : \mathbb{N} \rightarrow X$.

A infinite set $X$ is uncountable if there doesn’t exist a bijection $f : \mathbb{N} \rightarrow X$. Proof: Cantor’s diagonal argument.

• $\aleph_0 = |\mathbb{N}|$ — smallest possible cardinality of an infinite set
• $\aleph_1 = |2^{\mathbb{N}}| = 2^{\aleph_0}$
• $\aleph_2 = |2^{2^{\mathbb{N}}}| = 2^{\aleph_1}$

Continuum hypothesis: $|A| < |2^A|$. $\aleph_0 < \aleph_1 < \cdots$

“Zermelo-Fraenkel axioms of set theory”