Algorithmic Thinking
COMP 182
spring, freshman year
Logic and Proofs
Propositions
please don’t confuse AND (conjunction, ∧) with OR (disjunction, ∨).
| p | q | p ∧ q | p ∨ q | p ⊕ q | p → q | q → p | p ↔ q | ¬p | ¬q |
|---|---|-------|-------|-------|-------|-------|-------|----|----|
| F | F | F | F | F | T | T | T | T | T |
| F | T | F | T | T | T | F | F | T | F |
| T | F | F | T | T | F | T | F | F | T |
| T | T | T | T | F | T | T | T | F | F |
Tautology : always true
Contradiction : always false
Satisfiable : not a contradiction. not always false. not impossible.
SAT : propositional satisfiability. NP-complete.
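a brute-force check of these definitions is easy to sketch (illustrative Python, not course code; the helper names are made up):

```python
from itertools import product

def is_satisfiable(prop, n_vars):
    """Brute force: does ANY truth assignment make prop true?"""
    return any(prop(*vals) for vals in product([False, True], repeat=n_vars))

def is_tautology(prop, n_vars):
    """True when EVERY assignment satisfies prop."""
    return all(prop(*vals) for vals in product([False, True], repeat=n_vars))

# p ∨ ¬p is a tautology; p ∧ ¬p is a contradiction (unsatisfiable).
assert is_tautology(lambda p: p or not p, 1)
assert not is_satisfiable(lambda p: p and not p, 1)
```

brute force is exponential in the number of variables, which is exactly why SAT being NP-complete matters.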
Propositional equivalence
p and q are logically equivalent if p ↔ q (written p ≡ q) is a tautology.
De Morgan’s laws:
prove equivalences using:
truth tables
a series of other equivalences
Predicates and Quantifiers
variables.
e.g. x ∈ {1, 2, 3}, P(x) := x ∈ {1, 2, 3}
quantifiers:
universal quantification
∀x P(x). for all x in the presumed domain, P(x) is true
existential quantification
∃x P(x). there exists an x such that P(x) is true.
demorgan’s laws:
¬∀x P(x) ≡ ∃x ¬P(x)
¬∃x P(x) ≡ ∀x ¬P(x)
∀x ∀y
∃x ∃y
∀x ∃y
∃x ∀y
∀x ∀y ≡ ∀y ∀x, ∃x ∃y ≡ ∃y ∃x
∀x ∃y ≢ ∃y ∀x
negating nested quantifiers:
s/∃/∀/g
s/∀/∃/g
negate the proposition
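over a finite domain, the quantifier versions of demorgan’s laws can be checked mechanically (illustrative sketch; the predicate is my own example):

```python
# ¬∀x P(x) ≡ ∃x ¬P(x)  and  ¬∃x P(x) ≡ ∀x ¬P(x), checked on a finite domain.
domain = range(10)
P = lambda x: x % 2 == 0  # example predicate: "x is even"

assert (not all(P(x) for x in domain)) == any(not P(x) for x in domain)
assert (not any(P(x) for x in domain)) == all(not P(x) for x in domain)
```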
Proofs and Proof Techniques
proofs: valid arguments that establish the truth of statements
Rules of Inference
Proof techniques
direct proofs
contrapositive proofs
proof by contradiction
existence proofs
counterexamples
Sets
unordered collection of unique items
roster method
set comprehension
cardinality of a finite set: number of elements in the set
infinite sets: countable, uncountable
∀x (x ∈ A → x ∈ B) → A ⊆ B
𝒫(A) = 2^A = {x : x ⊆ A}
𝒫(∅) = {∅}
Proper subsets (⊂): subsets excluding the set itself
Set operations
union — A ∪ B
intersection — A ∩ B
disjoint: A ∩ B = ∅
difference — A ∖ B = A − B
complement — A̅ = U ∖ A
cartesian product — A × B = {(x, y) : x ∈ A ∧ y ∈ B}
A × ∅ = ∅
|A × B| = |A| × |B|
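Python sets mirror these operations directly (a quick illustrative sketch; the universe U is my own choice):

```python
from itertools import product

A, B = {1, 2, 3}, {3, 4}
U = set(range(10))  # assumed universe for the complement

assert A | B == {1, 2, 3, 4}            # union
assert A & B == {3}                     # intersection
assert A - B == {1, 2}                  # difference A \ B
assert U - A == {0, 4, 5, 6, 7, 8, 9}   # complement of A wrt U
cart = set(product(A, B))               # cartesian product
assert len(cart) == len(A) * len(B)     # |A × B| = |A| × |B|
```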
Set identities
Functions
non-empty sets D (domain) and R (codomain)
function : assignment — from every element in D to exactly one element in R
range (elements actually used as values) ⊆ codomain
f: A → B and g: C → D are equal functions when:
A = C
B = D
∀a ∈ A [f(a) = g(a)]
when f: A → ℝ and g: A → ℝ, f + g and f · g are defined pointwise
Injections, surjections, bijections
Injection aka “one-to-one”
f: A → B: ∀a, b ∈ A (f(a) = f(b) → a = b)
|B| ≥ |A|
horizontal line test
Surjection aka “onto”
∀b ∈ B (∃a ∈ A (f(a) = b))
|A| ≥ |B|
Bijection
both one-to-one and onto
There exists a bijection f: A → B iff |A| = |B|
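for finite domains these definitions can be tested by enumeration (illustrative Python; the helper names are my own):

```python
def is_injective(f, A):
    """No two domain elements map to the same value."""
    image = [f(a) for a in A]
    return len(set(image)) == len(image)

def is_surjective(f, A, B):
    """Every element of the codomain is hit."""
    return {f(a) for a in A} == set(B)

def is_bijective(f, A, B):
    return is_injective(f, A) and is_surjective(f, A, B)

A = [0, 1, 2]
assert is_bijective(lambda x: 2 - x, A, A)   # x ↦ 2 − x is a bijection on A
assert not is_injective(lambda x: x % 2, A)  # 0 and 2 collide
```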
Inverse and composition
Relations and their properties
a binary relation from A to B is a subset of A × B.
(a, b) ∈ R is written aRb
(a, b) ∉ R is written a ̸R b
reflexivity — aRa
symmetric — aRb ⟹ bRa
antisymmetric — aRb ∧ bRa ⟹ a = b
transitivity — aRb ∧ bRc ⟹ aRc
composing — aRb ∧ bSc ⟹ a(R ∘ S)c
Representing relations
|A| × |B| adjacency matrix M where:
m_{i,j} = 1 if a_i R b_j, and 0 otherwise
di(rected)graphs
Relation closures
"S is the closure of R wrt P" — relation S is the smallest set that satisfies property P such that R ⊆ S
[R is a relation on set A]:
reflexive closure
S = R ∪ {(a, a) : a ∈ A}
symmetric closure
S = R ∪ {(a, b) : (b, a) ∈ R}
transitive closure
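the three closures as Python set operations — a minimal sketch (relations as sets of pairs; the fixed-point loop for transitivity is my own formulation):

```python
def reflexive_closure(R, A):
    return R | {(a, a) for a in A}

def symmetric_closure(R):
    return R | {(b, a) for (a, b) in R}

def transitive_closure(R):
    """Repeatedly add (a, c) whenever (a, b) and (b, c) are present."""
    S = set(R)
    while True:
        new = {(a, c) for (a, b) in S for (b2, c) in S if b == b2}
        if new <= S:
            return S
        S |= new

R = {(1, 2), (2, 3)}
assert transitive_closure(R) == {(1, 2), (2, 3), (1, 3)}
assert reflexive_closure(R, {1, 2, 3}) == R | {(1, 1), (2, 2), (3, 3)}
```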
Equivalence relations
two elements are equivalent (a ∼ b) if they’re related by an equivalence relation.
R is an equivalence relation on A, a ∈ A. the equivalence class of a ([a]) is {b ∈ A : (a, b) ∈ R}
partition of A:
A_1, A_2, ⋯, A_n are non-empty sets
for any distinct a, b ≤ n, A_a ∩ A_b = ∅
⋃_{i=1}^{n} A_i = A
bipartition: n = 2.
fundamental theorem of equivalence relations : the set of all equivalence classes of an equivalence relation R on A forms a partition of A.
Partial Orders
reflexive
antisymmetric
transitive
set A + partial order R = partially ordered set aka “poset” (A, R)
a and b are elements of a poset (A, ⪯). a and b are comparable if either a ⪯ b or b ⪯ a. else, they are incomparable .
if (A, ⪯) is a poset where every two elements of A are comparable, A is a totally ordered set and ⪯ is a total order .
Hasse diagrams are graphical reps of partial orders. they exclude reflexive loops and arrows that can be obtained by transitivity.
a is a maximal element if there’s no c ∈ A, c ≠ a such that a ⪯ c
b is a minimal element if there’s no c ∈ A, c ≠ b such that c ⪯ b
minimal and maximal elements do not have to exist or be unique.
a is the greatest element if ∀c ∈ A, c ⪯ a.
b is the least element if ∀c ∈ A, b ⪯ c.
greatest and least elements do not have to exist, but they must be unique.
a totally ordered set (A, ⪯) is a well-ordered set if every non-empty subset of A has a least element
Matrices
Graphs
Undirected graphs
Pair (V, E) where:
V is a non-empty set of nodes
E ⊆ {A : A ∈ 𝒫(V) ∧ |A| = 2}
Directed graphs
aka digraphs
Pair (V, E) where:
V is a non-empty set of nodes
E ⊆ (V × V) − {(u, u) : u ∈ V}
Jargon
Special graphs
complete graph : graph with an edge between every two nodes
cycle
bipartite graph
V can be bipartitioned into V_1 and V_2 such that E ⊆ (V_1 × V_2) ∪ (V_2 × V_1)
bipartite theorem : a graph is bipartite iff there exists a function f: V → {1, 2} such that ∀{u, v} ∈ E, f(u) ≠ f(v)
colour neighbours of 1-coloured nodes with 2 and vice-versa
forest : acyclic
tree : connected and acyclic.
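the 2-colouring idea above is exactly a BFS bipartiteness test — a sketch (adjacency-list representation is my assumption):

```python
from collections import deque

def two_colour(adj):
    """BFS 2-colouring; returns a colour map if bipartite, else None."""
    colour = {}
    for start in adj:
        if start in colour:
            continue
        colour[start] = 1
        q = deque([start])
        while q:
            u = q.popleft()
            for v in adj[u]:
                if v not in colour:
                    colour[v] = 3 - colour[u]  # swap 1 <-> 2
                    q.append(v)
                elif colour[v] == colour[u]:
                    return None  # odd cycle: not bipartite

    return colour

square = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}  # 4-cycle: bipartite
triangle = {0: [1, 2], 1: [0, 2], 2: [0, 1]}           # 3-cycle: not
assert two_colour(square) is not None
assert two_colour(triangle) is None
```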
Connectivity
path
simple — path that doesn’t contain the same node more than once
circuit — path that begins and ends at the same node
cycle — simple circuit
connected
connected component
strongly connected
weakly connected
strongly connected component
Graph Isomorphism
graph isomorphism
equivalence relation
subgraph isomorphism
partial order
Q: subgraph isomorphism vs subgraph
every graph is isomorphic to itself
(g_1, g_2) ∈ R → (g_2, g_1) ∈ R
(g_1, g_2) ∈ R ∧ (g_2, g_3) ∈ R → (g_1, g_3) ∈ R
Representing graphs
n := number of nodes
m := number of edges
Sequences and Summations
sequence : a function from a subset of the set of integers to a set S. written a_n, {a_n}
geometric progression a, ar, ar², ⋯
initial term a, common ratio r
arithmetic progression a, a + d, a + 2d, ⋯
initial term a, common difference d
recurrence relation for a sequence is an equation that expresses a_n in terms of one or more of the previous terms of the sequence
summation :
∑_{n=0}^{∞} a_n
∑_{n=i}^{j} a_n
∑_{n ∈ S} a_n
series : sum of a sequence from n = 0 or n = 1 as necessary
telescoping sums
perturbation method
Algorithm Efficiency
finite sequence of precise instructions for performing a computation
a name
input/output
unambiguous steps
finite {time,number of steps}
correct output values for each set of input values
Big O notation
f(x) = O(g(x)) if there exist positive constants C and k such that f(x) ≤ C·g(x) for all x ≥ k
f(x) = a_n x^n + a_{n−1} x^{n−1} + ⋯ + a_1 x + a_0 = O(x^n)
f_1(x) = O(g(x)), f_2(x) = O(h(x)). (f_1 + f_2)(x) = O(max(g(x), h(x)))
f_1(x) = O(g(x)), f_2(x) = O(h(x)). (f_1 · f_2)(x) = O(g(x) · h(x))
Big Omega notation
f(x) = Ω(g(x)) if there exist positive constants C and k such that f(x) ≥ C·g(x) for all x ≥ k
f(x) = Ω(g(x)) ⇔ g(x) = O(f(x))
Big Theta notation
… aka “order of”
f(x) = Θ(g(x)) if there exist positive constants C_1, C_2, and k such that C_1·g(x) ≤ f(x) ≤ C_2·g(x) for all x ≥ k
f(x) = Θ(g(x)) iff f(x) = O(g(x)) and f(x) = Ω(g(x)).
Complexity of algorithms
efficiency:
ignore multiplicative and additive constants
for a given input size n n n :
worst case — upper bound
best case — usually useless
average case — informative, but requires making assumptions about the input
Θ(1)
Θ(log n)
Θ(n)
Θ(n log n)
Θ(n^a)
Θ(a^n), a > 1
Θ(n!)
if there exists a polynomial-time worst-case algorithm for a problem, the problem is tractable . else, it’s intractable .
tractable: class P
intractable
solution can be verified in polynomial time: class NP
class NP-complete
halting problem :
does a given computer program terminate for a given input
proven unsolvable
post correspondence problem
Asymptoticities of General Functions
f(x) = O(g(x)) iff ∃k, C such that |f(x)| ≤ C·|g(x)| for all x ≥ k.
e.g. −x = O(−x²)
Equality and Equivalence
“f(x) = O(g)” is notation abuse.
O(g) = {h : ∃C, k, ∀x ≥ k, h(x) ≤ C·g(x)}
f(x) ∈ O(g)
Asymptotic equivalence (Θ) is an equivalence relation.
Helpful results
f(x) = O(g(x)) ↔ g(x) = Ω(f(x))
f(x) = Θ(g(x)) ↔ [f(x) = O(g(x)) ∧ f(x) = Ω(g(x))]
Let f(x) = a_n x^n + a_{n−1} x^{n−1} + ⋯ + a_1 x + a_0 where a_0, a_1, …, a_n ∈ ℝ. Then, f(x) = O(x^n).
Let f_1 = O(g) and f_2 = O(h). Then, (f_1 + f_2) = O(max(g, h)).
Let f_1 = O(g) and f_2 = O(h). Then, (f_1 · f_2) = O(g · h).
Limit-based definitions
If f = O(g), lim_{x→∞} f/g < ∞.
If f = Ω(g), lim_{x→∞} f/g > 0.
If f = Θ(g), 0 < lim_{x→∞} f/g < ∞.
Little ones
f = o(g) iff lim_{x→∞} f/g = 0.
f = ω(g) iff lim_{x→∞} f/g = ∞.
Asymptotic analogies to relational operators
| Relational operator | Analogous asymptotic operator |
|---|---|
| ≤ | O |
| ≥ | Ω |
| = | Θ |
| < | o |
| > | ω |
BFS
[see also: COMP 140’s BFS section ]
[see also: COMP 140’s Queue section ]
Divide-and-Conquer and Recurrences
A problem of size n is divided into b smaller instances (ideally of equal size) of the same problem.
Some of the smaller instances are solved.
The solutions to the smaller instances are combined, if necessary .
Merge sort
split each array into two subarrays
arrays of size one are trivially sorted
work backwards and successively merge sorted subarrays
MergeSort(L):
    if n > 1
        copy L[0..n/2 - 1] to A
        copy L[n/2..n - 1] to B
        MergeSort(A)
        MergeSort(B)
        Merge(A, B, L)
Merge(A, B, L):  — where p = |A|, q = |B|
    i = 0, j = 0, k = 0
    while i < p and j < q
        if A[i] <= B[j]
            L[k] = A[i]
            i++
        else
            L[k] = B[j]
            j++
        k++
    if i = p
        copy B[j..q-1] to L[k..p+q-1]
    else
        copy A[i..p-1] to L[k..p+q-1]
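the pseudocode translates almost directly to Python (a sketch; Python slicing stands in for the explicit copies):

```python
def merge_sort(L):
    """Sort L in place: split, recurse on each half, merge."""
    n = len(L)
    if n > 1:
        A, B = L[:n // 2], L[n // 2:]
        merge_sort(A)
        merge_sort(B)
        merge(A, B, L)

def merge(A, B, L):
    """Merge sorted lists A and B back into L."""
    i = j = k = 0
    while i < len(A) and j < len(B):
        if A[i] <= B[j]:
            L[k] = A[i]; i += 1
        else:
            L[k] = B[j]; j += 1
        k += 1
    # copy whichever side still has elements
    L[k:] = A[i:] if i < len(A) else B[j:]

xs = [5, 2, 4, 7, 1, 3, 2, 6]
merge_sort(xs)
assert xs == [1, 2, 2, 3, 4, 5, 6, 7]
```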
Running time T(n) = 2·T(n/2) + O(n).
Master theorem : Let f be an increasing function such that f(n) = a·f(n/b) + c·n^d. Then:
f(n) = O(n^d) if a < b^d
f(n) = O(n^d log n) if a = b^d
f(n) = O(n^{log_b a}) if a > b^d
From the master theorem, T(n) = O(n log n).
Binary search
T(n) = T(n/2) + O(1) = O(log n)
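a minimal iterative sketch of binary search (the course’s variant may differ; halving the range each step gives the T(n/2) recurrence above):

```python
def binary_search(L, target):
    """Return an index of target in sorted L, or -1 if absent."""
    lo, hi = 0, len(L) - 1
    while lo <= hi:
        mid = (lo + hi) // 2  # halve the search range each iteration
        if L[mid] == target:
            return mid
        if L[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return -1

assert binary_search([1, 3, 5, 7, 9], 7) == 3
assert binary_search([1, 3, 5, 7, 9], 4) == -1
```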
Mathematical Induction
[P(1) ∧ ∀k (P(k) → P(k + 1))] → ∀n P(n)
Important fact: The set of positive integers is well-ordered .
Basis step : Show that P(1) holds
Inductive step : Show that ∀k (P(k) → P(k + 1))
The assumption that P(k) holds is called the inductive hypothesis
Strong Mathematical Induction
Basis step : Show that P(1) holds
Inductive step : Show that ∀k ([P(1) ∧ P(2) ∧ ⋯ ∧ P(k)] → P(k + 1))
Recursive definitions
Σ = {0, 1} ⟹ Σ* = {ε, 0, 1, 00, 01, 10, 11, 000, …}
ε: empty string
Exclusion rule
Weighted graphs and greedy algorithms
Optimization problems and greedy algorithms
Optimization problem : maximizing or minimizing something.
Greedy algorithm :
piece-by-piece
choose the locally optimal (valid) piece as the next piece
don’t work for many problems.
Minimum Spanning Tree
input: weighted graph
output: spanning tree T such that the sum of its edge weights — ∑_{e ∈ T} weight_e — is minimum
Prim’s algorithm :
V_T = {x}  (x: an arbitrary start node)
E_T = ∅
for i from 1 to |V| - 1
    find the lightest edge e = {u, v} ∈ E such that u ∈ V_T and v ∈ V − V_T
    V_T = V_T ∪ {v}
    E_T = E_T ∪ {e}
return (V_T, E_T)
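a compact Python sketch of the loop above (my choice of representation: edges as frozensets mapped to weights):

```python
def prim(V, edges):
    """edges: dict mapping frozenset({u, v}) -> weight. Returns (V_T, E_T)."""
    x = next(iter(V))          # arbitrary start node
    V_T, E_T = {x}, set()
    for _ in range(len(V) - 1):
        # lightest edge with exactly one endpoint inside the current tree
        e = min((e for e in edges if len(e & V_T) == 1), key=edges.get)
        (v,) = e - V_T         # the endpoint outside the tree
        V_T.add(v)
        E_T.add(e)
    return V_T, E_T

E = {frozenset({'a', 'b'}): 1, frozenset({'b', 'c'}): 2,
     frozenset({'a', 'c'}): 4}
V_T, E_T = prim({'a', 'b', 'c'}, E)
assert V_T == {'a', 'b', 'c'}
assert sum(E[e] for e in E_T) == 3  # MST weight: edges ab (1) + bc (2)
```

this scan-all-edges version is O(|V|·|E|); a heap-based version does better.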
Single Source Shortest Paths problem
input: weighted graph g = (V, E, w) such that w(e) ≥ 0 ∀e ∈ E; a source node i ∈ V.
output: the distance d_j from i to each node j ∈ V; a shortest path p_j from i to j.
Dijkstra’s algorithm :
X = ∅
for j ∈ V
    d_j = ∞; p_j = null; X = X ∪ {j}
d_i = 0
while X ≠ ∅
    let k ∈ X be a node with minimum d_k
    if d_k = ∞: break
    X = X ∖ {k}
    for each neighbour h of k in X
        if d_k + w((k, h)) < d_h
            d_h = d_k + w((k, h))
            p_h = k
return d, p
O(m log n) assuming that the graph is represented as an adjacency list and that X is represented as a min-heap.
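a heap-based sketch using Python’s `heapq` (stale-entry skipping replaces the explicit set X; adjacency-list input is assumed):

```python
import heapq

def dijkstra(adj, source):
    """adj: {node: [(neighbour, weight), ...]}. Returns (dist, pred)."""
    dist = {j: float('inf') for j in adj}
    pred = {j: None for j in adj}
    dist[source] = 0
    heap = [(0, source)]
    done = set()
    while heap:
        d_k, k = heapq.heappop(heap)
        if k in done:
            continue  # stale heap entry; already finalized
        done.add(k)
        for h, w in adj[k]:
            if d_k + w < dist[h]:  # relax edge (k, h)
                dist[h] = d_k + w
                pred[h] = k
                heapq.heappush(heap, (dist[h], h))
    return dist, pred

adj = {'i': [('a', 1), ('b', 4)], 'a': [('b', 2)], 'b': []}
dist, pred = dijkstra(adj, 'i')
assert dist['b'] == 3 and pred['b'] == 'a'  # i -> a -> b beats i -> b
```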
Counting
Pigeonhole principle
n holes, n + 1 pigeons; at least one of the pigeonholes has two pigeons.
f: A → B. |B| < |A| ⟹ ∃a_1, a_2 ∈ A : a_1 ≠ a_2 ∧ f(a_1) = f(a_2).
f: X → Y. |Y| < |X| ⟹ ∃y ∈ Y : |{x : f(x) = y}| ≥ ⌈|X| / |Y|⌉
Product and division rules
Product rule: if A = A_1 × A_2 × ⋯ × A_n, then |A| = |A_1| × |A_2| × ⋯ × |A_n|
Division rule.
Sum and subtraction rules
Sum rule
A = A_1 ∪ A_2 ∪ ⋯ ∪ A_n where A_1, A_2, … are pairwise disjoint. |A| = ∑|A_i|.
Difference rule
A = A_1 ∪ A_2. |A| = |A_1| + |A_2| − |A_1 ∩ A_2|.
Permutations and Combinations
Without repetition
Permutations : Order matters.
0 ≤ r ≤ n. P(n, r) = n! / (n − r)!
Combinations : Order doesn’t matter.
0 ≤ r ≤ n. C(n, r) = ⁿC_r = (n choose r) = n! / (r!(n − r)!)
C(n, r) = P(n, r) / r!
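Python’s `math` module has both directly, which makes the identities easy to sanity-check:

```python
import math

n, r = 5, 2
# P(n, r) = n! / (n - r)!
assert math.perm(n, r) == math.factorial(n) // math.factorial(n - r) == 20
# C(n, r) = P(n, r) / r!
assert math.comb(n, r) == math.perm(n, r) // math.factorial(r) == 10
```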
With repetition
the number of k-permutations of n items with repetition: n^k
the number of k-combinations of n items with repetition: (n + k − 1 choose k) = (n + k − 1 choose n − 1)
(aka the number of ways to choose a multiset of size k from a set of size n)
Inclusion-Exclusion principle
|⋃_{i=1}^n A_i| = ∑_{i=1}^n |A_i| − ∑_{1 ≤ i < j ≤ n} |A_i ∩ A_j| + ∑_{1 ≤ i < j < k ≤ n} |A_i ∩ A_j ∩ A_k| − ⋯ + (−1)^{n+1} |A_1 ∩ A_2 ∩ ⋯ ∩ A_n|
|⋃_{i=1}^n A_i| = ∑_{k=1}^n (−1)^{k+1} (∑_{1 ⩽ i_1 < ⋯ < i_k ⩽ n} |A_{i_1} ∩ ⋯ ∩ A_{i_k}|)
f: A → B; |A| = m, |B| = n
total number of functions: n^m
total number of 1-1 functions: n(n − 1)(n − 2) ⋯ (n − m + 1)
total number of onto functions: n^m − (n choose 1)(n − 1)^m + (n choose 2)(n − 2)^m − ⋯ + (−1)^{n−1} (n choose n−1) · 1^m
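the onto-function formula can be checked against brute-force enumeration (illustrative sketch; the helper names are my own):

```python
from itertools import product
from math import comb

def onto_count(m, n):
    """Surjections from an m-set onto an n-set, via inclusion-exclusion."""
    return sum((-1) ** i * comb(n, i) * (n - i) ** m for i in range(n))

def onto_brute(m, n):
    """Enumerate all n^m functions; keep the surjective ones."""
    return sum(1 for f in product(range(n), repeat=m)
               if set(f) == set(range(n)))

assert onto_count(3, 2) == onto_brute(3, 2) == 6    # 2^3 - 2
assert onto_count(4, 3) == onto_brute(4, 3) == 36   # 81 - 48 + 3
```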
Binomial coefficients
(x + y)^n = ∑_{i=0}^n (n choose i) x^{n−i} y^i
Combinatorial Proofs
use counting arguments to show LS and RS count the same objects in different ways
show that there is a bijection between the sets counted on LS and RS
pascal’s identity : (n choose k) = (n−1 choose k−1) + (n−1 choose k)
vandermonde’s identity : (m+n choose k) = ∑_{l=0}^{k} (m choose k−l)(n choose l)
Discrete Probability
Average case analysis
Randomized algorithms
Machine learning
Stochastic processes
Experiment, Sample Space, and Event
experiment
sample space
event
Assigning probability
Given a sample space S, a probability distribution is a function p: S → [0, 1] such that ∑_{s ∈ S} p(s) = 1.
uniform distribution : ∀s ∈ S, p(s) = 1/|S|
Conditional Probability and Independence
P(E|F) = P(E ∩ F) / P(F)  (P(F) ≠ 0)
If P(E|F) = P(E), E is independent of F
P(E ∩ F) = P(E) · P(F) ⟹ E is independent of F
P(E|F) = P(F|E) P(E) / P(F) where P(F) ≠ 0.
Bayes theorem
P(D|T) = P(T|D) P(D) / [P(T|D) P(D) + P(T|D′) P(D′)]
If F_1, F_2, …, F_n is a partition of S,
P(F_i|E) = P(E|F_i) P(F_i) / ∑_{j=1}^n [P(E|F_j) P(F_j)]
Bayesian inference : P(M|D) ∝ P(D|M) P(M)
Random variables
A random variable is a function from the sample space to the set ℝ.
Expected values
The expectation of a rv X is its mean.
E(X) = ∑_{s ∈ S} p(s) · X(s) = ∑_x p(x) · x
Linearity of expectations
If X_1, X_2, …, X_n are rv’s on S, then E(X_1 + X_2 + ⋯ + X_n) = E(X_1) + E(X_2) + ⋯ + E(X_n)
E(aX + b) = a·E(X) + b
For independent random variables X and Y, E(XY) = E(X) E(Y).
Average case running time of linear search
Variance
var(X) = ∑_x (x − E(X))² · p(x)
var(X) = E[(X − E(X))²]
var(X) = E(X²) − [E(X)]²
Discrete Probability Tail Bounds
Markov’s inequality : P(X ≥ a) ≤ E(X)/a
P(X ≥ a·E(X)) ≤ 1/a
Chebyshev’s inequality : P(|X − E(X)| ≥ a) ≤ V(X)/a²
P(|X − E(X)| ≥ aσ) ≤ 1/a²
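both bounds can be sanity-checked exactly on a small example (a fair six-sided die is my own choice here):

```python
# Fair six-sided die: E(X) = 3.5, V(X) = 35/12.
vals = range(1, 7)
E = sum(vals) / 6
V = sum((x - E) ** 2 for x in vals) / 6

# Markov: P(X >= 5) = 2/6, which must be <= E(X)/5.
assert 2 / 6 <= E / 5
# Chebyshev: P(|X - E| >= 2) = P(X in {1, 6}) = 2/6, which must be <= V/4.
assert 2 / 6 <= V / 4
```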
Families of Distributions
Bernoulli Distribution
X ∈ {0, 1}
E(X) = ∑_x x·p(x) = 0·(1 − p) + 1·p = p
V(X) = ∑_x (x − p)²·p(x) = p(1 − p)
p(x) = p if x = 1, and 1 − p if x = 0
Binomial Distribution
Run $n$ independent Bernoulli trials and count the number of successes in the sequence.
$p(X = k) = \binom{n}{k} p^k (1 - p)^{n - k}$
$\mathbb{E}(X) = np$
$V(X) = np(1 - p)$
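A sanity check of the pmf and moments, using hypothetical parameters $n = 10$, $p = 0.3$:

```python
from math import comb

# Binomial(10, 0.3): pmf should sum to 1, with mean np = 3, variance np(1-p) = 2.1.
n, p = 10, 0.3
pmf = {k: comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(n + 1)}

total = sum(pmf.values())
mean = sum(k * q for k, q in pmf.items())
var = sum((k - mean) ** 2 * q for k, q in pmf.items())
print(round(total, 6), round(mean, 6), round(var, 6))   # 1.0 3.0 2.1
```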
Geometric Distribution
Run independent Bernoulli trials until the first success; $X$ counts the number of trials.
$p(X = k) = (1 - p)^{k - 1} p$
$\mathbb{E}(X) = \frac{1}{p}$
$V(X) = \frac{1 - p}{p^2}$
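The mean and variance can be checked by summing a long truncated series; $p = 0.25$ below is a hypothetical choice, giving $\mathbb{E}(X) = 4$ and $V(X) = 12$:

```python
# Geometric(p = 0.25): pmf terms (1-p)^(k-1) * p for k = 1, 2, ...
p = 0.25
terms = [(1 - p) ** (k - 1) * p for k in range(1, 2001)]  # tail is negligible

mean = sum(k * t for k, t in enumerate(terms, start=1))             # 1/p
var = sum(k * k * t for k, t in enumerate(terms, start=1)) - mean ** 2
print(round(mean, 6), round(var, 6))   # 4.0 12.0
```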
Negative Binomial Distribution
$X$ is the number of trials $l$ until we see $k$ successes.
$p(X = l) = \binom{l - 1}{k - 1} (1 - p)^{l - k} p^k$
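A numerical sketch with hypothetical $p = 0.4$, $k = 3$: the pmf should sum to 1, and the mean number of trials should be $k/p = 7.5$ (a standard fact, not stated above):

```python
from math import comb

p, k = 0.4, 3

def pmf(l):
    # P(X = l): the k-th success occurs exactly on trial l.
    return comb(l - 1, k - 1) * (1 - p) ** (l - k) * p ** k

total = sum(pmf(l) for l in range(k, 500))      # truncated tail is negligible
mean = sum(l * pmf(l) for l in range(k, 500))
print(round(total, 6), round(mean, 6))   # 1.0 7.5
```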
Poisson Distribution
Number of events in a fixed interval when events occur independently at average rate $\lambda$.
$p(X = k) = e^{-\lambda} \frac{\lambda^k}{k!}$
$\mathbb{E}(X) = \lambda$, $V(X) = \lambda$
Markov Chains and Hidden Markov Models (HMMs)
initial distribution ($\pi$)
emission probability ($E_l(X_i)$)
transition matrix ($A_{l', l}$)
Parameter Estimation For Markov Chains
Dynamic Programming
Binomial Coefficients
Use Pascal’s identity and a 2D array, filled like a tilted Pascal’s triangle.
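A minimal sketch of that table-filling, using Pascal’s identity $\binom{n}{k} = \binom{n-1}{k-1} + \binom{n-1}{k}$:

```python
def binomial(n, k):
    """C(n, k) by filling a 2D table row by row with Pascal's identity."""
    C = [[0] * (k + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        C[i][0] = 1                              # C(i, 0) = 1
        for j in range(1, min(i, k) + 1):
            C[i][j] = C[i - 1][j - 1] + C[i - 1][j]
    return C[n][k]

print(binomial(5, 2))    # 10
print(binomial(10, 3))   # 120
```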
Longest Increasing Subsequence
Input: an array of integers
Output: the longest possible sequence of indices at which the values are strictly increasing
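A sketch of the standard $O(n^2)$ DP: let $L[i]$ be the length of the longest strictly increasing subsequence ending at index $i$, and keep parent links to recover the indices.

```python
def longest_increasing_subsequence(a):
    """Return the index sequence of one longest strictly increasing subsequence."""
    n = len(a)
    L = [1] * n          # L[i]: LIS length ending at i
    parent = [-1] * n    # parent[i]: previous index on that LIS
    for i in range(n):
        for j in range(i):
            if a[j] < a[i] and L[j] + 1 > L[i]:
                L[i], parent[i] = L[j] + 1, j
    # Walk back from the best endpoint to recover the indices.
    i = max(range(n), key=L.__getitem__)
    idx = []
    while i != -1:
        idx.append(i)
        i = parent[i]
    return idx[::-1]

print(longest_increasing_subsequence([3, 1, 4, 1, 5, 9, 2, 6]))  # [0, 2, 4, 5]
```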
Weighted Compatible Intervals
All pairs shortest paths
Longest common subsequence
RNA Secondary Structure Prediction
Knapsack Problem
Viterbi’s algorithm
$v[l, i] = \max_{l' \in Q} \left(v[l', i - 1] \cdot A_{l', l}\right) \cdot E_l(x_i)$
for i in 1..L-1:
    for l in Q:
        v[l][i] = (max of (v[l_][i - 1] * A[l_][l]) over l_ in Q) * E[l][x_i]
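The pseudocode above can be fleshed out into a runnable sketch; the two-state weather HMM at the bottom is a hypothetical example, not from the course.

```python
def viterbi(obs, states, pi, A, E):
    """Most probable state path for an HMM: pi = initial distribution,
    A[l_][l] = transition probability, E[l][x] = emission probability."""
    # v[i][l]: probability of the best path ending in state l at position i.
    v = [{l: pi[l] * E[l][obs[0]] for l in states}]
    back = [{}]
    for i in range(1, len(obs)):
        v.append({})
        back.append({})
        for l in states:
            # max over l_ of v[l_][i-1] * A[l_][l], then multiply by E_l(x_i).
            best = max(states, key=lambda l_: v[i - 1][l_] * A[l_][l])
            back[i][l] = best
            v[i][l] = v[i - 1][best] * A[best][l] * E[l][obs[i]]
    # Trace back from the most probable final state.
    last = max(states, key=lambda l: v[-1][l])
    path = [last]
    for i in range(len(obs) - 1, 0, -1):
        path.append(back[i][path[-1]])
    return path[::-1]

# Hypothetical two-state HMM over observed symbols "1", "2", "3".
states = ["H", "C"]
pi = {"H": 0.6, "C": 0.4}
A = {"H": {"H": 0.7, "C": 0.3}, "C": {"H": 0.4, "C": 0.6}}
E = {"H": {"1": 0.1, "2": 0.4, "3": 0.5}, "C": {"1": 0.5, "2": 0.4, "3": 0.1}}
print(viterbi(["3", "1", "3"], states, pi, A, E))   # ['H', 'C', 'H']
```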
Recurrences
Modelling with recurrence relations
examples:
Fibonacci
Tower of Hanoi
counting strings
counting partitions
Linear Homogeneous recurrences
A linear homogeneous recurrence relation of degree $k$ with constant coefficients is a recurrence relation of the form $a_n = c_1 a_{n-1} + c_2 a_{n-2} + \cdots + c_k a_{n-k}$ where $c_1, c_2, \ldots, c_k \in \mathbb{R}$ and $c_k \neq 0$.
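Such recurrences are solved via the characteristic equation $r^k = c_1 r^{k-1} + \cdots + c_k$. A concrete check with hypothetical coefficients: $a_n = a_{n-1} + 2a_{n-2}$ has characteristic equation $r^2 = r + 2$ with roots $2$ and $-1$; with $a_0 = 2$, $a_1 = 1$ the closed form is $a_n = 2^n + (-1)^n$.

```python
def a_rec(n, memo={0: 2, 1: 1}):
    """a_n = a_{n-1} + 2 a_{n-2}, computed directly from the recurrence."""
    if n not in memo:
        memo[n] = a_rec(n - 1) + 2 * a_rec(n - 2)
    return memo[n]

def a_closed(n):
    """Closed form from the characteristic roots 2 and -1."""
    return 2 ** n + (-1) ** n

print(all(a_rec(n) == a_closed(n) for n in range(20)))   # True
```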
Infinite Sets and Their Cardinalities
Infinity and Hilbert’s Grand Hotel
Hotel with infinite number of (numbered) rooms.
Hotel is full. A new guest shows up.
Move every guest in room $i$ to room $i + 1$, and assign room 1 to the new guest.
Bijection $f : \{1, 2, 3, \ldots\} \rightarrow \{2, 3, 4, \ldots\}$.
If $X$ is finite, there does not exist a bijection $f : X \rightarrow Y$ where $Y \subset X$.
Two-floor hotel
Bijection from $\{(1, i), (2, i) : i \in \mathbb{N}\} \rightarrow \{1, 2, 3, \ldots\}$
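One concrete such bijection interleaves the two floors onto odd and even rooms: guest $i$ on floor 1 goes to room $2i - 1$, guest $i$ on floor 2 goes to room $2i$. A small sketch:

```python
def room(floor, i):
    """Floor-1 guests take the odd rooms, floor-2 guests the even rooms."""
    return 2 * i - 1 if floor == 1 else 2 * i

# First five guests on each floor fill rooms 1..10 with no collisions.
rooms = sorted(room(f, i) for f in (1, 2) for i in range(1, 6))
print(rooms)   # [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
```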
More floors
Real-numbered hotel to natural-numbered hotel?
A bijection from $\mathbb{R} \rightarrow \mathbb{N}$ does not exist.
The Infinitesimally Small
Zeno’s dichotomy paradox.
Infinity, Injections, and Bijections
Set $X$ is infinite if there exists a bijection $f : X \rightarrow Y$ where $Y \subset X$.
If there exists a bijection from $X \rightarrow Y$, then $X$ and $Y$ have the same cardinality.
If there exists a 1-1 function $f : X \rightarrow Y$, then $|X| \leq |Y|$.
There exists a bijection $f : X \rightarrow Y$ iff there exists a bijection $g : Y \rightarrow X$.
Schröder-Bernstein theorem: if $A$ and $B$ are two sets such that $|A| \leq |B|$ and $|B| \leq |A|$, then $|A| = |B|$. To apply it, exhibit a one-to-one function $f : A \rightarrow B$ and a one-to-one function $g : B \rightarrow A$.
An infinity of infinities
Every finite set is countable.
An infinite set $X$ is countably infinite if there exists a bijection $f : \mathbb{N} \rightarrow X$.
An infinite set $X$ is uncountable if there does not exist a bijection $f : \mathbb{N} \rightarrow X$. Proof technique: Cantor’s diagonal argument.
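The diagonal argument in miniature: given any list of binary sequences (truncated here to finitely many digits), flipping the diagonal yields a sequence that differs from the $k$-th entry in position $k$, so it cannot appear anywhere in the list.

```python
def diagonal_flip(rows):
    """Flip the k-th digit of the k-th sequence, for each k."""
    return [1 - rows[k][k] for k in range(len(rows))]

# A hypothetical enumeration attempt of binary sequences.
rows = [[0, 1, 0, 1], [1, 1, 0, 0], [0, 0, 0, 1], [1, 0, 1, 0]]
d = diagonal_flip(rows)
print(d)                                            # [1, 0, 1, 1]
print(all(d[k] != rows[k][k] for k in range(4)))    # True: d is not in the list
```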
$\aleph_0 = |\mathbb{N}|$, the smallest possible cardinality of an infinite set
$\aleph_1 = |2^{\mathbb{N}}| = 2^{\aleph_0}$
$\aleph_2 = |2^{2^{\mathbb{N}}}| = 2^{\aleph_1}$
Cantor’s theorem: $|A| < |2^A|$, so $\aleph_0 < \aleph_1 < \cdots$. The continuum hypothesis asserts there is no cardinality strictly between $|\mathbb{N}| = \aleph_0$ and $|\mathbb{R}| = 2^{\aleph_0}$.
“Zermelo-Fraenkel axioms of set theory”