# Automata, Formal Languages, and Computability

# COMP 481

## Finite State Automata §

finite state machines.

the language of a fsa $M$ is the set of strings that are accepted by $M$

$L(M) = \{x \in \Sigma^* | M \textrm{ accepts } x\}$

DFSAs can’t count.

### Deterministic Finite State Automata §

### Non-Deterministic Finite State Automata §

no more powerful than DFSA. can always be converted to an equivalent DFSA.

- same symbol may transition from one state to several different states
- $\epsilon$-transitions are permitted

accept if at least one path accepts

$M = \{Q, \Sigma, \delta, q_0, F\}$

- $\Sigma$: symbols
- $Q$: states
- $q_0$: initial state
- $F \subseteq Q$: accepting states
- $\sigma : Q \times \Sigma \rightarrow 2^Q$
- transitions need not be unique

simulate NDFSA by BFS’ing

### Minimal Machines §

DFSA that accepts a language $L$ with the fewest possible states. minimal DFSAs are unique.

“states are equivalence classes”

if $M$ is a DFSA that accepts $L$, then the number of states in $M$ is greater than or equal to the number of equivalence classes of $L$.

**Theorem**: if the number of equivalence classes of $L$ is finite, then there exists a DFSA $M$ that accepts $L$, such that the number of states of $M$ equals the number of equivalence classes of $L$. Therefore, $M$ is minimal.

**Theorem**: the minimal DFSA for a language $L$ is unique (up to state names).

**Myhill-Nerode Theorem**: There exists a DFSA $M$ that accepts a language $L$ if and only if the number of equivalence classes of $L$ is finite.

## Regular Languages §

Kleene closure:

- $\epsilon \in L^*$
- $\omega \in L^*$ if $\omega \in L$
- $\omega \omega \in L^*$ if $\omega \in L^*$ and $v \in L^*$

Alternatively, $L^* = \{x_1 x_2 \dots x_n \mid x_k \in L, n \geq 0\}$

Regular expressions:

- $\varnothing$, $\epsilon$, $a$ (where $a \in \Sigma$)
- $(r)$, $r \cup s$, $rs$, $r^*$ where $r, s$ are regular expressions on $\Sigma$

Properties:

- $\varnothing^* = \epsilon = \epsilon^*$
- $r\varnothing = \varnothing = \varnothing r$
- $r\epsilon = r = \epsilon r$
- $rs \neq sr$
- $r \cup (st) \neq (r \cup s)(r \cup t)$
- $\left(r \cup s\right)^* \neq r^* \cup s^*$
- $(rs)^* \neq r^* s^*$

**Theorem**: A language $L$ can be recognized by a DFA if and only if $L$ can be represented by a regular expression.

**Theorem**: Regular languages are closed under $\cdot$, $^*$, $\cup$, $\cap$, $^C$

## Non-Regular Languages §

**Lemma**: $L = \{0^n 1^n\}$ is not regular

## Regular Grammars §

$G = \{N, \Sigma, R, S\}$

- $N$: non-terminal symbols
- $\Sigma$: terminal symbols
- $R$: rules
- $S$: start symbol

rules for regular grammars:

- $T \rightarrow a$
- $T_1 \rightarrow a T_2$

non-terminals: states. terminals: transitions.

**Theorem**: the regular grammars define exactly the regular languages.

```
def regular_grammar_to_fsa(G):
for n in non_terminals:
create state for n
let S correspond to the start state
create accepting state #
for rule A → wB:
add transition from state A to state B
for rule A → w:
add transition from state A to state #
for rule A → ϵ:
mark A as an accepting state
```

- a countable union of finite sets is countable
- a countable union of countable sets is countable (cantor’s diagonalization)

**Theorem**: if $\Sigma$ is a finite alphabet, then $\Sigma^*$ (strings of finite length) is countable.

## Context-Free Grammars §

**ambiguity**: more than one parse tree for the same string

- some languages are inherently ambiguous
- no algorithm for detecting ambiguity in all cases
- no algorithm for removing ambiguity in all cases

heuristics for removing ambiguity:

- eliminate $\varepsilon$ rules
- eliminate symmetric recursive rules
- eliminate ambiguous attachments

**chomsky normal form**: rules have one of the following forms:

- $T \rightarrow \alpha$ where $\alpha$ is a terminal
- $T \rightarrow T_1 T_2$ where $T_1$ and $T_2$ are non-terminals

converting to chomsky normal form:

- remove $\varepsilon$ rules
- remove unit rules (e.g. $A \rightarrow B$, where $B$ is not a terminal)
- remove mixed rules (e.g. $A \rightarrow aB$ or $A \rightarrow BaC$)
- remove long rules (e.g. $A \rightarrow B_1 \cdots B_n$ where $n > 2$)

note: chomsky normal form may be ambiguous

## Push-down Automata §

Deterministic PDAs are **not** equivalent to non-deterministic PDAs.

- for every cfg there exists an equivalent pda
- for every pda there exists an equivalent cfg
- pdas and cfgs describe the same languages

**restricted normal form**:

- $M$ has a new start state $S^\#$
- $M$ has a single accepting state $A^\#$
- every transition of $M$ (except from $S^\#$) pops exactly on symbol off the stack