Shreyas’ Notes

# COMP 481

## Finite State Automata §

finite state machines.

the language of a fsa $M$ is the set of strings that are accepted by $M$

$L(M) = \{x \in \Sigma^* | M \textrm{ accepts } x\}$

DFSAs can’t count.

### Non-Deterministic Finite State Automata §

no more powerful than DFSA. can always be converted to an equivalent DFSA.

• same symbol may transition from one state to several different states
• $\epsilon$-transitions are permitted

accept if at least one path accepts

$M = \{Q, \Sigma, \delta, q_0, F\}$

• $\Sigma$: symbols
• $Q$: states
• $q_0$: initial state
• $F \subseteq Q$: accepting states
• $\sigma : Q \times \Sigma \rightarrow 2^Q$
• transitions need not be unique

simulate NDFSA by BFS’ing

### Minimal Machines §

DFSA that accepts a language $L$ with the fewest possible states. minimal DFSAs are unique.

“states are equivalence classes”

if $M$ is a DFSA that accepts $L$, then the number of states in $M$ is greater than or equal to the number of equivalence classes of $L$.

Theorem: if the number of equivalence classes of $L$ is finite, then there exists a DFSA $M$ that accepts $L$, such that the number of states of $M$ equals the number of equivalence classes of $L$. Therefore, $M$ is minimal.

Theorem: the minimal DFSA for a language $L$ is unique (up to state names).

Myhill-Nerode Theorem: There exists a DFSA $M$ that accepts a language $L$ if and only if the number of equivalence classes of $L$ is finite.

## Regular Languages §

Kleene closure:

• $\epsilon \in L^*$
• $\omega \in L^*$ if $\omega \in L$
• $\omega \omega \in L^*$ if $\omega \in L^*$ and $v \in L^*$

Alternatively, $L^* = \{x_1 x_2 \dots x_n \mid x_k \in L, n \geq 0\}$

Regular expressions:

• $\varnothing$, $\epsilon$, $a$ (where $a \in \Sigma$)
• $(r)$, $r \cup s$, $rs$, $r^*$ where $r, s$ are regular expressions on $\Sigma$

Properties:

• $\varnothing^* = \epsilon = \epsilon^*$
• $r\varnothing = \varnothing = \varnothing r$
• $r\epsilon = r = \epsilon r$
• $rs \neq sr$
• $r \cup (st) \neq (r \cup s)(r \cup t)$
• $\left(r \cup s\right)^* \neq r^* \cup s^*$
• $(rs)^* \neq r^* s^*$

Theorem: A language $L$ can be recognized by a DFA if and only if $L$ can be represented by a regular expression.

Theorem: Regular languages are closed under $\cdot$, $^*$, $\cup$, $\cap$, $^C$

## Non-Regular Languages §

Lemma: $L = \{0^n 1^n\}$ is not regular

## Regular Grammars §

$G = \{N, \Sigma, R, S\}$

• $N$: non-terminal symbols
• $\Sigma$: terminal symbols
• $R$: rules
• $S$: start symbol

rules for regular grammars:

• $T \rightarrow a$
• $T_1 \rightarrow a T_2$

non-terminals: states. terminals: transitions.

Theorem: the regular grammars define exactly the regular languages.

def regular_grammar_to_fsa(G):
for n in non_terminals:
create state for n
let S correspond to the start state

create accepting state #
for rule A → wB:
add transition from state A to state B
for rule A → w:
add transition from state A to state #
for rule A → ϵ:
mark A as an accepting state

• a countable union of finite sets is countable
• a countable union of countable sets is countable (cantor’s diagonalization)

Theorem: if $\Sigma$ is a finite alphabet, then $\Sigma^*$ (strings of finite length) is countable.

## Context-Free Grammars §

ambiguity: more than one parse tree for the same string

• some languages are inherently ambiguous
• no algorithm for detecting ambiguity in all cases
• no algorithm for removing ambiguity in all cases

heuristics for removing ambiguity:

• eliminate $\varepsilon$ rules
• eliminate symmetric recursive rules
• eliminate ambiguous attachments

chomsky normal form: rules have one of the following forms:

• $T \rightarrow \alpha$ where $\alpha$ is a terminal
• $T \rightarrow T_1 T_2$ where $T_1$ and $T_2$ are non-terminals

converting to chomsky normal form:

1. remove $\varepsilon$ rules
2. remove unit rules (e.g. $A \rightarrow B$, where $B$ is not a terminal)
3. remove mixed rules (e.g. $A \rightarrow aB$ or $A \rightarrow BaC$)
4. remove long rules (e.g. $A \rightarrow B_1 \cdots B_n$ where $n > 2$)

note: chomsky normal form may be ambiguous

## Push-down Automata §

Deterministic PDAs are not equivalent to non-deterministic PDAs.

• for every cfg there exists an equivalent pda
• for every pda there exists an equivalent cfg
• pdas and cfgs describe the same languages

restricted normal form:

• $M$ has a new start state $S^\#$
• $M$ has a single accepting state $A^\#$
• every transition of $M$ (except from $S^\#$) pops exactly on symbol off the stack