4th Homework

This homework is to be prepared in teams of two students. Find your partner on http://moodle.uni-graz.at/. Download hw_4.tar.gz and extract it, then add your homework solutions to the files contained in the extracted directory. Rename the directory according to the rules in the syllabus before submitting it as a compressed archive. Note that the grader requires the package python3-numpy to be installed. Don't forget to add the correct subject line to the email when submitting.

  1. Telephone numbers with letters
    In some countries it is common practice to encode selected telephone numbers as text because they are easier to memorize that way. Each letter of the text indicates which button to press on a telephone keypad. Use the image below as a reference:
    [Image: telephone keypad with letters] Source: Marnanel (via Wikimedia Commons)
    Implement a function as_numeric(text) that returns the input text with every letter replaced by the digit of its keypad button; digits and spaces are kept as they are. Using the function in a Python 3 shell should look like this:
    >>> as_numeric('0800 reimann')
    '0800 7346266'
    
    Hint: using a Python dictionary to store the translation table facilitates this task.
    Name the program file: telephone_numbers.py
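    A minimal sketch of the dictionary approach from the hint, assuming the standard letter groups found on most keypads (2 = abc, 3 = def, 4 = ghi, 5 = jkl, 6 = mno, 7 = pqrs, 8 = tuv, 9 = wxyz) and assuming that upper-case letters should be treated like lower-case ones. Every character without an entry in the table (digits, spaces, punctuation) is passed through unchanged:

```python
# Assumed standard keypad layout: digit -> letters printed on that button
KEYPAD = {
    '2': 'abc', '3': 'def', '4': 'ghi', '5': 'jkl',
    '6': 'mno', '7': 'pqrs', '8': 'tuv', '9': 'wxyz',
}

# Invert it once into the translation table: letter -> digit
LETTER_TO_DIGIT = {
    letter: digit for digit, letters in KEYPAD.items() for letter in letters
}


def as_numeric(text):
    """Replace each letter by its keypad digit; keep every other character."""
    # .get(char, char) falls back to the character itself for non-letters
    return ''.join(LETTER_TO_DIGIT.get(char, char) for char in text.lower())


print(as_numeric('0800 reimann'))  # 0800 7346266
```

    Building the table once at module level keeps each lookup a single dictionary access, and dict.get with the character itself as default is what lets non-letters fall through unchanged.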
  2. Working on existing programs (2 points)
    Your lecturer just started riding the fake news wave. To illustrate how much fake news is out there, he wrote a small fake news generator. However, the generator is far from perfect.
    For the generator to work, you need to install the faker and wikipedia packages using
    sudo pip3 install ...
    The script deliberately uses a selection of libraries to illustrate the power of Python. At the same time, the example shows that you do not have to understand every line of a script in order to improve it. Start by playing around with the fake news generator from the command line: python3 fake_news_generator.py -h
    Your lecturer needs your help to make the messages more credible. At the moment, credibility is added to a message by citing a "source". Let's assume female sources are more credible. Find the line that adds the name of the source and adjust it so that only female names are used (check the documentation of the faker module if needed).
    Furthermore, the module can use a Wikipedia article as the source for the list of words that make up the fake news. It also contains a function that removes non-word characters from word lists. However, this function is currently not applied. Make sure it is applied, but only to Wikipedia articles, not to the carefully handpicked tweets that serve as the default input.
    Name the program file: fake_news_generator.py
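    The second change boils down to applying a cleaning step conditionally, depending on where the words come from. The sketch below only illustrates that pattern: remove_non_word_chars and load_words are hypothetical stand-ins, not the names used in fake_news_generator.py, and the regex-based cleaning is an assumption about what "non-word characters" means:

```python
import re


def remove_non_word_chars(words):
    """Hypothetical cleaner: strips non-word characters, drops empty results."""
    cleaned = (re.sub(r'\W+', '', word) for word in words)
    return [word for word in cleaned if word]


def load_words(words, from_wikipedia):
    """Hypothetical loader: only Wikipedia text gets cleaned."""
    if from_wikipedia:
        return remove_non_word_chars(words)
    # hand-picked tweets are kept verbatim, hashtags and all
    return list(words)


print(load_words(['Graz,', '(Austria)', '--'], from_wikipedia=True))  # ['Graz', 'Austria']
print(load_words(['#fakenews', 'Sad!'], from_wikipedia=False))  # ['#fakenews', 'Sad!']
```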
  3. Basic statistics (2 points)
    Write a few functions that compute basic statistics from financial data stored in CSV files. The input files must have column headers in their first row. Since you will have to deal with larger amounts of data, it cannot be guaranteed that all of the data fits into your computer's memory. To help with this, you can use the provided generator function items(.), which yields the rows of the data one after the other when iterated over. Each row is yielded as a dictionary whose keys are the column headers from the first row. The provided count(.) function shows how to use the items(.) generator. Implementing calc_median(.) within these memory constraints is a bonus challenge; if you do not manage that, implement it ignoring the constraints. Create the following functions, each computing the corresponding value:
    calc_mean(.)
    calc_stddev(.)
    calc_sum(.)
    calc_variance(.)
    calc_median(.)
    Name the program file: statistics.py
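    A sketch of how the statistics can be computed in a single pass with constant memory, using Welford's online algorithm for mean and variance. Assumptions: items(.) is replaced by a stand-in built on csv.DictReader, every function takes an iterable of row dictionaries plus a hypothetical column parameter, and the variance is the population variance (dividing by n); the real signatures in statistics.py may differ. The calc_median shown here is the fallback that ignores the memory constraint:

```python
import csv
import io
import math


def items(file_obj):
    """Stand-in for the provided items(.): yields one row dict at a time."""
    yield from csv.DictReader(file_obj)


def _welford(rows, column):
    """One pass over the rows; returns (n, mean, sum of squared deviations)."""
    n, mean, m2 = 0, 0.0, 0.0
    for row in rows:
        x = float(row[column])
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)  # uses the already updated mean
    if n == 0:
        raise ValueError('no data rows')
    return n, mean, m2


def calc_sum(rows, column):
    return sum(float(row[column]) for row in rows)


def calc_mean(rows, column):
    return _welford(rows, column)[1]


def calc_variance(rows, column):
    n, _, m2 = _welford(rows, column)
    return m2 / n  # population variance (assumption)


def calc_stddev(rows, column):
    return math.sqrt(calc_variance(rows, column))


def calc_median(rows, column):
    """Fallback that ignores the memory constraint: loads the whole column."""
    values = sorted(float(row[column]) for row in rows)
    if not values:
        raise ValueError('no data rows')
    mid = len(values) // 2
    if len(values) % 2:
        return values[mid]
    return (values[mid - 1] + values[mid]) / 2


data = 'price\n2\n4\n4\n4\n5\n5\n7\n9\n'
print(calc_mean(items(io.StringIO(data)), 'price'))
print(calc_median(items(io.StringIO(data)), 'price'))  # 4.5
```

    Because a generator can only be iterated once, each call gets a fresh items(...) iterator. Also note that importing the standard library's statistics module from a file named statistics.py would import the file itself, which is why the median is computed by hand here.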
  4. Counting unique words in a file (4 points)
    In Prof. Rauch's 'Information Science' course, one of the tasks is to count how many times each word in an article occurs. To make it easier to check whether you performed such a task correctly, write a Python program that does the work for you. You don't have to write the entire program from scratch. Instead, use the provided count_unique.py file and implement count_unique(words).
    Also implement count_unique_sorted(words), which returns a list of named tuples. The first field of each named tuple must be named 'word' and the second 'count'. The tuples in the list must appear in the same order in which the words first occur in the input file.
    Name the program file: unique_words.py
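    A minimal sketch using collections.Counter, assuming count_unique(words) receives a list of words and should return a dictionary mapping each word to its count (the expected return type is not specified above). The named tuple type name WordCount is hypothetical; only its field names word and count come from the task:

```python
from collections import Counter, namedtuple

# Hypothetical type name; the field names 'word' and 'count' are required
WordCount = namedtuple('WordCount', ['word', 'count'])


def count_unique(words):
    """Assumed return type: dict mapping each word to its number of occurrences."""
    return dict(Counter(words))


def count_unique_sorted(words):
    """Named tuples in the order of each word's first occurrence."""
    # Counter keeps insertion order (Python 3.7+), i.e. first-occurrence order
    return [WordCount(word, count) for word, count in Counter(words).items()]


print(count_unique_sorted(['to', 'be', 'or', 'not', 'to', 'be']))
# [WordCount(word='to', count=2), WordCount(word='be', count=2), WordCount(word='or', count=1), WordCount(word='not', count=1)]
```

    Since Python 3.7, Counter remembers insertion order like a regular dict, so iterating over it yields the words in the order of their first occurrence without an extra sorting step.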