Computational Thinking

COMP 140

fall, freshman year

Module 1 §

Identify the objective.
Determine the inputs.
Break the problem into subproblems.
Solve the subproblems.
Use the solution to the subproblems to solve the original problem.
Write your solutions in Python.

Intersection of two lines §

$x_i = \frac{(s_1x_1)-(s_2 x_2)+(y_2 - y_1)}{s_1 - s_2}$

$y_i = s_1 (x_i - x_1) + y_1$

Module 2 §

Modular arithmetic §

$a^{-1}$ in a prime finite field $= a^{p - 2} \mod p$ when $p$ is prime

Projective Geometry §

Properties:

A point: $(x,y,z)$ except $(0,0,0)$
A line: $[a,b,c]$ except $(0,0,0)$
$k \neq 0 \and (x,y,z)=(kx,ky,kz) \implies$ They’re the same point. $k \neq 0 \and [a,b,c]=[ka,kb,kc] \implies$ They’re the same line.
$(x,y,z)$ is on $[a,b,c]$ iff $ax + by + cz = 0$

Finite field §

$\mathbb{Z}_m$ where $m$ is a prime number. All operations are performed $\mod m$ .

Division in a finite field §

$c = a \times b^{-1} \mod m$

If the modulo $m$ is prime, $b^{-1}= b^{m - 2} \mod m$

Cross product §

cross product of two points: the line that connects two points

$p \times q = \left[p_y q_z - p_z q_y, p_z q_x - p_x q_z, p_x q_y - p_y q_x\right]$

Module 3 §

`random` §

random.random(): $[0, 1)$
random.randrange([start=0], end) — exclusive
random.choice(iterable)
random.shuffle(list)

Dictionaries §

not ordered.

for key in dictionary:
   pass

for key in dictionary.keys():
    pass

for key, value in dictionary.items():
    pass

for value in dictionary.values():
    pass

#import collections
from collections import defaultdict

def fn():
    """
    Returns a value that should be used as the
    default value of keys of the defaultdict

    Inputs: none.
    """

    return 0

counts = defaultdict(fn)

Markov chains §

states. transitions with probabilities. discrete time.

Markov property: doesn’t matter how you go to any state. probabilities of the next transition would remain the same regardless.

Empirically calculate transition probabilities.

Predicc.

Markov chains with memory §

$n$ th order markov models. $n$ states of history.

$n=0$ : no memory

$n=1$ : one state of memory

$\dots$

encoding, in a single state, the history

first_order = {
    ('A',): { 'A': 0.2, 'B': 0.5, 'C': 0.3 },
    ('B',): { 'A': 0.1, 'B': 0, 'C': 0.9 },
    ('C',): { 'A': 0.4, 'B': 0.5, 'C': 0.1 }
}

second_order = {
    ('A', 'A'): { 'A': 0.4, 'B': 0.6 },
    ('A', 'B'): { 'A': 0.2, 'B': 0.8},
    ('B', 'A'): { 'A': 1, 'B': 0 },
    ('B', 'B'): { 'A': 0.3, 'B': 0.7 }
}

Experimental method §

Data:

train dataset
test dataset

randomness.

mean squared error: $\frac{1}{n} \sum^n_{i=1}\left(P_i - A_i\right)^2$ where $P$ is the vector of predictions, $A$ is the vector of actual values

`simpleplot` §

Data format:

# doesn't allow repeated x coordinates
wages = {2000: 200, 2001: 200, 2002: 250, 2003: 300}

# allows repeated x coordinates
wages = [(2000, 200), (2001, 200), (2002: 250), (2003: 300)]

`plot_lines` §

plot_lines(title, width, height, x_label, y_label, datasets, show_points, [dataset_labels])

datasets must be a list

`plot_bars` §

plot_bars(title, width, height, x_label, y_label, datasets, [dataset_labels])

List comprehensions §

[x ** 2 for x in range(n)]

`zip` and `enumerate` §

zip take iterables and returns an iterator over tuples of their $i$ th values.

for a_i, b_i in zip(a, b):

zip([1, 2, 3], [1, 2, 3])
# [(1, 1), (2, 2), (3, 3)]

zip([1, 2, 3], ['a', 'b', 'c'], ['i', 'ii', 'iii'])
# [(1, 'a', 'i'), (2, 'b', 'ii'), (3, 'c', 'iii')]

zip([1, 2, 3, 4], [1, 2, 3])
# [(1, 1), (2, 2), (3, 3)]

enumerate([5, 4, 3])
# [(0, 5), (1, 4), (2, 3)]

enumerate([5, 4, 3], 2)
# [(2, 5), (3, 4), (4, 3)]

Module 4 §

OOP §

class Foo:
    """
    """

    def __init__(self, a, b):
        self.x = a
        self.y = b
        # return nothing

    def fn(self):
        pass

    def bar(self):
        pass

f = Foo(1, 2)
f.fn()
f.x
f.y

Graphs §

$G = (V, E)$

$V$ : non-empty set of vertices

$E$ : set of pairs of vertices (aka. edges)

degree of a vertex is the number of edges connected to it.

deleting some vertices and/or edges leaves a subgraph

$G' = (V',E')$ is a subgraph of $G$ iff:

$V' \subset V$ and $V'$ is non-empty
$E' \subset E$ that only contains edges connecting vertices in $V'$ (not necessarily all of those edges)

Types §

directed graphs: edges have direction: edges are ordered pairs of vertices.

indegree of a vertex: # of edges connecting to the vertex
outdegree of a vertex: # of edges connecting from the vertex

weighted graphs: edges have weights (numbers) associated. weight usually represents “cost”, but may also represent importance etc.

nodes and edges may have attributes. node/edge metadata.

Properties §

connectivity: are all nodes connected?

how many isolated components? how large are those components.

trickier def when dealing with directed graphs.
path: sequences of edges connecting a set of vertices between a starting vertex and an ending vertex.

length: number of edges.

must follow the direction (if applicable).

may have 0 edges (path from $x$ to $x$ )

may repeat edges.
distance: length of the shortest path(s)

distance between unconnected nodes is $\infty$
diameter: longest shortest path between any two nodes

Representation §

$V = \{1, 2, 3, 4\}$

$E = \{(1, 2), (2, 4), (4, 1)\}$

For undirected graphs you could just rep two edges: $(x, y)$ and $(y, x)$

Adjacency matrix §

square matrix. $i$ th row and $i$ th column corresponding to $i$ th vertex.

$a_{ij} = \begin{cases} 1 & (i,j) \in E\\ 0 & (i,j) \not\in E \end{cases}$

elements could also be weights. if undirected, use as upper/lower matrix perhaps.

fine if the number of vertices is small compared to the number of edges. else, space inefficient.

Adjacency list §

map.

keys: nodes
- values: lists of nodes the key is connected to

graph = {
    'A': set(['B']), # *from* A *to* B
    'B': set(['A', 'C']), # *from* B *to* A and C
    'C': set([]) # *from* C *to* nothing
}

directed by default.

Sets §

unordered
no dupes

s = set([3, 4, 5, 1, 9])

s.add(element)
s.remove(element)

KeyError when element not in s
s.discard(element)

doesn’t error when element not in s
s.pop()
s.union(b)

doesn’t mutate. returns a new set.
s.intersection(b)
s.difference(b)
s.symmetric_difference(b)
s.intersection_update(b)

mutates.

Stacks and Queues §

Stacks §

Last-in first-out

PUSH
POP

Queues §

First-in first-out

PUSH
POP

BFS §

“breadth-first”

node_queue = Queue()
    distance = {}
    parent = {}

    for node in graph.get_nodes():
        distance[node] = None # None: the node cannot be reached
        parent[node] = None

    node_queue.push(start_node)
    distance[start_node] = 0

    while not node_queue.empty():
        current_node = node_queue.pop()

        for neighbor_node in graph.get_neighbors(current_node):
            if distance[neighbor_node] == None:
                distance[neighbor_node] = distance[current_node] + 1
                parent[neighbor_node] = current_node
                node_queue.push(neighbor_node)

    return distance, parent

Module 5 §

QR Codes §

in: message

encode message as binary (ascii-like)
Reed-Solomon ECC
Order codewords (interleave)
Place data on a QR grid
Mask data
Add format and version info

out: qr code

Error detection and correction §

Parity bit: additional bit.

$P = \begin{cases}1 & \textrm{odd number of ones} \\ 0 & \textrm{even number of ones} \end{cases}$

can’t detect where or what the errors are.

Hamming codes

$H_0 = P(D_0, D_1, D_3)$

$H_1 = P(D_0, D_2, D_3)$

$H_2 = P(D_1, D_2, D_3)$

Hamming distance: “how many bits are different?”

Hamming codes can:

detect and correct 1-bit errors
detect 2-bit errors

Copying §

Be careful when creating copies of objects with nested objects. You usually want a deep copy (as opposed to a shallow copy).

$Z_{256}$ §

Reed Solomon §

$n$ is the number of message bytes

$k$ is the number of EC bytes required

$m = \sum_{i = 0}^{n - 1} M_i x^{k+n-1-i}$ in $Z_{256}$

where $M = m_0 m_1 \cdots m_{n - 1}$

$g = \prod_{i = 0}^{k - 1}(x - 2^i)$ in $Z_{256}$

Programming Principles §

think
don’t program by coincidence
DRY
make it to reuse
avoid unrelated side-effects
refactor {early,often}
don’t live with broken windows
don’t panic :p

Module 6 §

Reading files §

from urllib import request
import codeskulptor

url = codeskulptor.file2url("examples_files_dracula.txt")
netfile = request.urlopen(url)

strdata = netfile.read().decode('utf-8')
print(strdata)

for line in netfile.readlines():
    strline = line.decode('utf-8')
    print(strline[:-1]) # strip extra newline

`numeric` §

import numeric

a = numeric.Matrix([
    [1, 2, 3],
    [4, 5, 6],
])

operator overloading: +, -, @ (matrix multiplication)
matrix.transpose()
matrix.inverse()
numeric.Matrix(ll)
numeric.identity(n)

Linear Algebra §

$AX = B \implies X = A^{-1} B$

$(A+B)^T = A^T + B^T$

$(AB)^T = B^T A^T$

Module 7 §

DFS §

Take BFS, s/queue/stack/g, s/breadth/depth/g
Recursive DFS

A* search §

$g$
$h$
$f = g + h$
“open” set
“closed” set

Inheritance §

class Base:
    pass

class Derived(Base):
    pass