 CMS 1

Data Structures and Algorithms

# Terminology

• Data Structure
• Stores and organizes data so that it can be used efficiently
• Algorithm
• Procedure or formula for solving a problem
• Finite series of computational steps to produce a result
• Independent from any programming language
• Examples for basic algorithms: sorting and searching
• Brute force algorithm: performs an exhaustive search of all possible solutions
• Complexity
• Measure of the difficulty of constructing a solution to a problem
• Estimate of the number of operations needed to execute an algorithm
• Big O notation
• Asymptotic notation
• Estimates the worst case performance of data structures and algorithms
• Pseudo code
• Description of a computation written in an informal notation
• Purpose
• Describe algorithms independent of a programming language
• Use of "magic language" becomes possible
• Focus on framing a solution without worrying about syntax etc.
• Books don't have to be written multiple times
• Less work for authors
• Code examples cannot be used right away
• Syntax cannot be tested for correctness
• Implementation
• Realization of a specification or algorithm as a working computer program
• Black box
• Function, module or object which can be viewed solely in terms of input and output
• Transfer characteristics are known but internal workings are not
• Until you look at the source, everything stays a black box!

# Big O notation

• Analyzes the worst case memory usage/ runtime of algorithms
• Big O provides an upper bound
• History
• First introduced by number theorist Paul Bachmann in 1894
• Popularized in the work of number theorist Edmund Landau (Landau symbol)
• Donald Knuth popularized the notation in computer science
• Problem classes
• Constant
• Logarithmic
• Linear
• Loglinear
• Polynomial
• Exponential
• Factorial
• Orders of common problem classes
• $O(1) < O(\log(n)) < O(n) < O(n \log(n)) < O(n^2) < O(n^k) \mbox{ for } k\gt 2$
• $O(n^k) < O(c^n) \mbox{ for } c\gt 1$
• $O(c^n) < O(n!)$
• Basic rules
• $O(k*n) = O(n)$
• $O(3n + 5n^2) = O(3n) + O(5n^2) = O(n) + O(n^2) = O(n^2)$
• $O(n^2) + O(e^n) = O(e^n)$
• Counting examples
• Calculating a sum
# seq is defined as a sequence of numbers
result = sum(seq)

• Sorting
# seq is defined as a sequence of numbers
result = sort(seq)

• Shaking each other's hands
• Calculating the third power of each element in a $n \times n$ square matrix
• Solving a TSP (brute force)

# Data Structures in Python

• class ListNode(object):
"""
Simple implementation of a singly linked list node.
"""
def __init__(self, data, succ=None):
self.data = data
self.succ = succ

# populating the list
for i in range(3, 0, -1):
tail.succ = ListNode(i)
tail = tail.succ

# using the list (simply print its elements)
while list_iterator:
print(list_iterator.data)
list_iterator = list_iterator.succ

• double_linked_list.py
class ListNode(object):
"""
Simple implementation of a doubly linked list node.
"""
def __init__(self, data, pred=None, succ=None):
"""
succ: the current node's successor
pred: the current node's predecessor
"""
self.data = data
self.pred = pred
self.succ = succ

# populating the list
for i in range(9, 0, -1):
tail.succ = ListNode(i, tail)
tail = tail.succ

# iterate over the list
while list_iterator:
print(list_iterator.data)
list_iterator = list_iterator.succ

# iterate from the back
list_iterator = tail
while list_iterator:
print(list_iterator.data)
list_iterator = list_iterator.pred

• Creating a basic binary tree
• simple_tree.py
class TreeNode(object):
"""
Simple implementation of a binary tree.
"""
def __init__(self, data, left=None, right=None):
self.data = data
self.left = left
self.right = right

# populating the tree
root = TreeNode(10)
root.left = TreeNode(5)
root.right = TreeNode(30)
root.left.left = TreeNode(2)
root.left.right = TreeNode(8)
root.right.left = TreeNode(21)

def print_tree(tree):
"""
Print the elements of a tree in no specific order.
"""
if tree:
print(tree.data)
print_tree(tree.left)
print_tree(tree.right)
print_tree(root)


# Representing Graphs

• Purpose
• Graphs are useful in many OR contexts like
• Shortest and longest path problems
• Maximum network flow
• Minimum spanning trees
• ...
• Ulrich Pferschy's course on graph algorithms is highly recommended!
• Graph G
• Directed graph
• • Useful representations differ depending on the required algorithm and the number of edges in the graph
• Dedicated class
• graph.py
#!/usr/bin/env python3

"""
One of the many ways to represent a graph in Python.
"""

import collections

Vertex = collections.namedtuple("Vertex", ("id", "data"))
Edge = collections.namedtuple("Edge", ("first", "second"))

class Graph(object):
"""
This graph class uses two sets to store the vertices and edges.

However, it would also be possible to use an adjacency matrix or adjacency
list. Depending on the purpose of the graph and the algorithms using the
graph, these alternative implementations might highly benefit the runtime.
"""
def __init__(self, vertici, edges):
self.__vertici = set()
self.__edges = set()
for v in vertici:
for e in edges:

def remove_vertex(self, v):
try:
self.__vertici.remove(v)
except KeyError:
pass
to_remove = {e for e in self.__edges if v == e.first or v == e.second}
self.__edges -= to_remove

# this would be much faster with an adjacency matrix
for e in self.__edges:
if e.first == v1 and e.second == v2:
return True
if e.first == v2 and e.second == v1:
return True
return False

# many more useful methods...

v1 = Vertex('1', '')
v2 = Vertex('2', '')
v3 = Vertex('3', '')
v4 = Vertex('4', '')
v5 = Vertex('5', '')
g = Graph(tuple(), (Edge(v1, v2), Edge(v1, v3), Edge(v2, v3), Edge(v3, v3),
Edge(v3, v4), Edge(v4, v5), Edge(v5, v1)))

• $O(n^2)$ elements where $n$ is the number of vertices
• $$\left( \begin{array}{ccccc} 0 & 1 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 1 & 0 \\ 0 & 0 & 0 & 0 & 1 \\ 1 & 0 & 0 & 0 & 0 \\ \end{array} \right)$$
• Testing adjacency is $O(1)$
• Traversing the neighborhood is $O(n)$
• Edge weights can easily be represented
• Requires less memory for sparse graphs
• graph = {1: [2, 3],
2: ,
3: [3, 4],
4: ,
5: }

• Testing adjacency is $O(\text{# neighbors})$
• Traversing the neighborhood is $O(\text{# neighbors})$

1. Read through at least one of the PuLP optimization case studies. If you lack prior knowledge with regard to linear programming, also study the optimization process and optimization concepts in the PuLP documentation.
2. Do the 9th homework

# Summary

• Analyze algorithmic behavior when needed (memory or runtime issues occur)
• Beware of black boxes
• Pick a data structure that suits your problem