Lecture 24 (Sherriff) - String Processing

Lecture Date: Wednesday, October 19

Let's start by finishing up our todo list program (completed code is below).

We are going to look closer at how to parse text and look for the information you want after you download/open a file. Often, text is messy - it isn't nicely laid out like a CSV file where each data point is separated cleanly from the next. Sometimes you have to figure out ways to hunt through a lot of information to pull out just the one nugget you want.

Let's look through the string API to see what we can find!

Python str API - https://docs.python.org/3.5/library/stdtypes.html#text-sequence-type-str

Python string API - https://docs.python.org/3.5/library/string.html

Functions to know:

  • startswith()
  • endswith()
  • strip(), rstrip(), lstrip()
  • count()
  • find(), rfind()
  • index(), rindex()
  • join()
  • replace()
  • split()
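
As a quick tour of the methods listed above, here is a minimal sketch. The sample string in line is made up for illustration and does not come from the lecture.

```python
# A made-up sample line (an assumption for illustration only).
line = "  Alice was beginning to get very tired;  \n"

cleaned = line.strip()                    # remove whitespace from both ends
print(cleaned.startswith("Alice"))        # True
print(cleaned.endswith(";"))              # True
print(cleaned.count("e"))                 # how many times "e" appears
print(cleaned.find("tired"))              # index where "tired" starts
print(cleaned.find("rabbit"))             # -1, because find() reports "not found" with -1
words = cleaned.split()                   # split on whitespace into a list of words
print(words)
print("-".join(words))                    # glue the words back together with dashes
print(cleaned.replace("Alice", "Bob"))    # a new string; cleaned itself is unchanged

# index() works like find(), but raises ValueError when the text is missing.
try:
    cleaned.index("rabbit")
except ValueError:
    print("index() could not find 'rabbit'")
```

Note that none of these methods change the original string. Each one returns a new value that you can print or store in a variable.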

Let's look at "Alice In Wonderland":

import urllib.request

url = "http://cs1110.cs.virginia.edu/alice.txt"

stream = urllib.request.urlopen(url)
for line in stream:
    decoded = line.decode("UTF-8").strip()
    if "Alice" in decoded:
        print(decoded)

What if we wanted to find an email address?

text = '<a href="mailto:sherriff@virginia.edu">Email Me!</a>'

at_sign = text.index('@')
colon = text.index(":")
end_quote = text.index('"', at_sign)

print(text[colon+1:end_quote])
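
The version above assumes the text really does contain an email link, because index() raises a ValueError when it cannot find what it is looking for. One possible way to handle messy text more gently is to use find() and check for -1. This is only a sketch; the function name extract_mailto is hypothetical and not part of the lecture code.

```python
def extract_mailto(text):
    """Return the address after 'mailto:' or None if there isn't one."""
    marker = "mailto:"
    start = text.find(marker)
    if start == -1:               # no mailto link in this text
        return None
    start += len(marker)          # move past the marker to the address itself
    end = text.find('"', start)   # the address ends at the next double quote
    if end == -1:
        return None
    return text[start:end]

print(extract_mailto('<a href="mailto:sherriff@virginia.edu">Email Me!</a>'))
print(extract_mailto("no email here"))
```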

Complete todo list program:

todo_list = []


def read_todo_list():
   datafile = open("todo_list.txt", "r")
   for line in datafile:
       todo_list.append(line.strip())
   datafile.close()


def add_to_list(item):
   todo_list.append(item)


def write_todo_list_file():
   datafile = open("todo_list.txt", "w")
   for item in todo_list:
       datafile.write(item)
       datafile.write("\n")
   datafile.close()


def print_todo_list():
    print()
    print()
    print("Your TODO List")
    print("--------------")
    for i in range(len(todo_list)):
        print(str(i) + ") " + todo_list[i])
    print()



def main():
    done = False
    read_todo_list()

    while not done:
        print_todo_list()
        print("Select an item to remove it, A to add a new item, Q to quit")
        choice = input("Choice?: ")
        if choice.isdigit() and int(choice) < len(todo_list):
            del todo_list[int(choice)]
        elif choice == 'A':
            new_item = input("New item?: ")
            add_to_list(new_item)
        elif choice == 'Q':
            write_todo_list_file()
            done = True


main()


Lecture 23 (Sherriff) - Writing Files

Lecture Date: Monday, October 17

We'll start by finishing up our debate analysis code.

Reading files is great, but what if we want to write stuff to disk? How can we do this... and not blow up our computers? Consider what would happen if we created an infinite loop....

Writing to a file is very similar to calling print(), but we call my_file.write() instead. There are two differences to keep in mind: write() only accepts a string, and it does not add a newline at the end, so you have to write "\n" yourself.
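
Here is a small sketch of how write() compares to print(). The file name demo_output.txt is hypothetical, and it is created in a temporary folder so that nothing in your project gets overwritten.

```python
import os
import tempfile

# Hypothetical file name, placed in a temporary folder (an assumption for this sketch).
path = os.path.join(tempfile.mkdtemp(), "demo_output.txt")

my_file = open(path, "w")
my_file.write("Score: ")        # write() does NOT add a newline for us
my_file.write(str(42))          # write() only accepts strings, so convert numbers first
my_file.write("\n")             # add the line break ourselves
print("Done!", file=my_file)    # print() can also target a file, and it adds the newline
my_file.close()

# Read the file back to see exactly what ended up on disk.
my_file = open(path, "r")
contents = my_file.read()
my_file.close()
print(contents)
```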

Can we get information about the file/folder structure on the disk? Yes!

import os

print(os.listdir(os.getcwd()))

total_size = 0
for filename in os.listdir(os.getcwd()):
    total_size = total_size + os.path.getsize(os.path.join(os.getcwd(), filename))

print(total_size)

output_file = open("output_file.txt", "w")

output_file.write(str(os.listdir(os.getcwd())))
output_file.write('\n')
output_file.write(str(total_size))
output_file.close()

Completed debate program:

import string

def load_debate(filename):
   # read the debate file
   # return a list of lists where each item is one line from the file
   datafile = open(filename, "r")

   list_of_lines = []

   for line in datafile:
       new_line = line.strip().split(";")
       list_of_lines.append(new_line)

   return list_of_lines


def find_interruptions(debate):

   interruptions = {}

   for line in debate:
       if "..." in line[2]:
           if line[1] in interruptions:
               interruptions[line[1]] += 1
           else:
               interruptions[line[1]] = 1

   return interruptions

def word_finder(debate, speaker, word_to_find):

   count = 0
   for line in debate:
       if line[1] == speaker:
           if word_to_find in line[2].lower():
               full_line = line[2].lower()
               no_punctuation_line = ""
               for char in full_line:
                   if char not in string.punctuation:
                       no_punctuation_line += char
               no_punctuation_line_list = no_punctuation_line.split()
               for word in no_punctuation_line_list:
                   if word == word_to_find:
                       count += 1

   return count


debate = load_debate("first_debate.csv")
inter = find_interruptions(debate)
print(inter)
print("Trump Wrongs:", word_finder(speaker="Trump", debate=debate, word_to_find="wrong"))
print("Clinton Americans:", word_finder(debate, "Clinton", "american"))
print("Holt Pleases:", word_finder(debate, "Holt", "please"))

Todo List program (up through the end of class):

todo_list = []

def read_todo_list():
   datafile = open("todo_list.txt", "r")
   for line in datafile:
       todo_list.append(line.strip())

   return todo_list

def add_to_list(item):
   todo_list.append(item)

def write_todo_list_file():
   # 1 - open the file
   # 2 - loop over the list
   # 3 - close

   datafile = open("todo_list.txt", "w")
   for item in todo_list:
       datafile.write(item)
       datafile.write("\n")
   datafile.close()


   return

print(read_todo_list())
add_to_list('Pet a goat')
write_todo_list_file()

Shopping List program (we didn't get to this, but it is here as an example):

shopping_list = []

# Read the file into the list
datafile = open("shopping_list.txt", "r")

for line in datafile:
    line = line.strip()
    shopping_list.append(line)
datafile.close()

print("Your current shopping list is:")
for item in shopping_list:
    print(item)

print()
while True:
    item_to_add = input("Items to add (NONE to stop): ")
    if item_to_add.upper() == "NONE":
        break
    shopping_list.append(item_to_add)

print()
while True:
    item_to_remove = input("Items to remove (NONE to stop): ")
    if item_to_remove.upper() == "NONE":
        break
    if item_to_remove in shopping_list:
        shopping_list.remove(item_to_remove)
    else:
        print("That item is not on the list.")

print()
print("Your current shopping list is:")
for item in shopping_list:
    print(item)

print()
print("Saving to shopping_list.txt...")
datafile = open("shopping_list.txt", "w")

for item in shopping_list:
    datafile.write(item + "\n")
datafile.close()


Lecture 22 (Sherriff) - Files

Lecture Date: Friday, October 14

We're going to keep looking at files today! We're going to work with a couple large data sets and see if we can write some interesting programs.

Download these data sets and put them into your PyCharm project directory:

Here are some other datasets! http://introcs.cs.princeton.edu/java/data/ and https://github.com/fivethirtyeight/data

If we get to it today, we'll look at reading datasets directly from the web, like our weather data: http://www.wunderground.com/history/airport/KCHO/2016/10/14/DailyHistory.html?format=1

Reading the debate:

def load_debate(filename):

    lines = []
    datafile = open(filename,"r")
    for line in datafile:
        lines.append(line.strip().split(";"))

    return lines


transcript = load_debate("first_debate.csv")

for line in transcript:
    if line[1] == 'Clinton':
        if '...' in line[2]:
            print(line[2])

More debate reading (code not final - will finish next class):

def load_debate(filename):
   # read the debate file
   # return a list of lists where each item is one line from the file
   datafile = open(filename, "r")

   list_of_lines = []

   for line in datafile:
       new_line = line.strip().split(";")
       list_of_lines.append(new_line)

   return list_of_lines


def find_interruptions(debate):

   interruptions = {}

   for line in debate:
       if "..." in line[2]:
           if line[1] in interruptions:
               interruptions[line[1]] += 1
           else:
               interruptions[line[1]] = 1

   return interruptions

def wrong_finder(debate, speaker):

   count = 0
   for line in debate:
       if line[1] == speaker:
           if "wrong" in line[2].lower():
               for word in line[2].lower().split():

                   if word == "wrong":
                       count += 1


   return count


debate = load_debate("first_debate.csv")
inter = find_interruptions(debate)
print(inter)
print("Trump Wrongs:", wrong_finder(debate, "Trump"))
print("Clinton Wrongs:", wrong_finder(debate, "Clinton"))
print("Holt Wrongs:", wrong_finder(debate, "Holt"))

Looking for misspellings:

def load_spelling_file(filename):
    correct_spelling = {}
    misspelling = {}

    datafile = open(filename, "r")
    for line in datafile:
        line = line.split(",")
        correct_spelling[line[1].strip()] = line[0]
        misspelling[line[0]] = line[1].strip()

    return correct_spelling, misspelling

correct_spelling, misspelling = load_spelling_file("misspellings.csv")
done = False
while not done:
    word = input("Please enter a word (END to quit): ")
    if word == "END":
        done = True
        break
    if word in correct_spelling:
        print("The word '" + word + "' is spelled correctly!")
    elif word in misspelling:
        print("I think that '" + word + "' is spelled '" + misspelling[word] + "'")
    else:
        print("I don't know that word.  Sorry!")

Reading from the web:

import urllib.request

link = input('Web page: ')

stream = urllib.request.urlopen(link)

for line in stream:
    decoded = line.decode("UTF-8")
    print(decoded.strip())

Building a URL:

import urllib.request
year = "2012"
month = "03"
day = "17"
url = "http://www.wunderground.com/history/airport/KCHO/" + year + "/" + month + "/" + day + "/DailyHistory.html?format=1"

stream = urllib.request.urlopen(url)
for line in stream:
    decoded = line.decode("UTF-8").strip().split(",")
    print(decoded)


Lecture 21 (Sherriff) - Functions and Files 2

Lecture Date: Wednesday, October 12

Up to this point, all of the input for your programs has come from the user at the keyboard. This is great, but what if you wanted to read in hundreds, or even thousands, of data points to run your program against? To say it would get tedious is probably an understatement.

For the next few lectures, we'll learn how to read files as input instead of just the keyboard.

First, though, we'll make our own very simple data set. Create a new file in your project called names.txt. In that file, put the names of the five or six people sitting around you, one name per line.

Then try out this code:

def read_list_of_names(filename):
    names = []
    datafile = open(filename, "r")

    for line in datafile:
        line = line.strip()
        names.append(line)
    datafile.close()

    return names

print(read_list_of_names("names.txt"))

In this example, we have written a function that will read a file based upon a filename we provide. Let's make it even more generic:

filename = input("What file would you like to read?")
print(read_list_of_names(filename))

Now, we can read any file of names that a user provides!

So, why did we write a function for this? We want to 1) be able to read any file, 2) move that code out of the main part of the program so it's easier to read, and 3) make the code much easier to test.
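
To illustrate the testing point, here is one possible way to check the function: build a tiny file whose contents we already know, then compare what the function returns against what we expect. The names in the file and the file name test_names.txt are made up, and the file is written to a temporary folder.

```python
import os
import tempfile

def read_list_of_names(filename):
    names = []
    datafile = open(filename, "r")
    for line in datafile:
        names.append(line.strip())
    datafile.close()
    return names

# Build a tiny test file with known contents (the names are made up).
test_path = os.path.join(tempfile.mkdtemp(), "test_names.txt")
test_file = open(test_path, "w")
test_file.write("Ann\nSteve\nMark\n")
test_file.close()

result = read_list_of_names(test_path)
print(result == ["Ann", "Steve", "Mark"])   # True if the function works as expected
```

Because the file reading lives in its own function, we can test it without running the rest of the program or typing anything at the keyboard.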

Where else can we get data? Let's look at some weather data! Weather Underground Historical Weather

Some files:

And, if we're really brave:

More example code for today:

# input: name of file to read
# output: list of all names in the file
def read_name_list(file_name):
    names = []
    name_file = open(file_name, "r")

    for line in name_file:
        line = line.strip()
        names.append(line)

    return names

# input: name of file to read
# output: list of temperatures in the file
def read_temperature_data(file_name):
    temperatures = []
    data_file = open(file_name, "r")
    burn_line = True

    for line in data_file:
        if burn_line:
            burn_line = False
            continue
        entry = line.split(",")
        temperatures.append(float(entry[1]))

    return temperatures

# input: file name of weather data to read
# output: average temp (float), high temp, low temp
def statistics(file_name):
    temperatures = read_temperature_data(file_name)

    length = len(temperatures)
    total = sum(temperatures)

    return total / length, max(temperatures), min(temperatures)


print(statistics("weather.csv"))

Even more code:

# Open the file in read mode
datafile = open("cville_weather_sept15.csv", "r")

# Burn the column header line
datafile.readline()

list_of_temps = []

# for each line in the file, READ IT!
for line in datafile:
   new_line = line.strip().split(",")
   list_of_temps.append(int(new_line[1]))

print(sum(list_of_temps)/len(list_of_temps))

And yet, even more code:

def fav_cartoon(filename):
   datafile = open(filename, "r")

   cartoon_dict = {}

   datafile.readline()

   for line in datafile:
       print(line.strip().split(","))
       split_line = line.strip().split(",")
       if split_line[1] in cartoon_dict:
           cartoon_dict[split_line[1]] += 1
       else:
           cartoon_dict[split_line[1]] = 1
   return cartoon_dict

to_open = input("filename: ")
print(fav_cartoon(to_open))


Lecture 20 (Sherriff) - Functions and Files

Lecture Date: Monday, October 10

We will continue talking about functions today. We will write a few new functions and cover some specific nuances of function usage:

  • calling a function from another function
  • local and global variables
  • variable name collision
  • keyword parameters
  • global variables vs. global constants
  • how to test a function
  • returning multiple values

Let's consider some different types of functions:

  • Functions might or might not return a value: All functions return - that's what happens when they reach the end of their body or hit the keyword return. The difference is that some functions bring a value back to the place where they were called, and others don't. We call the ones that don't void functions (in Python, they actually return the special value None).
  • Some functions have effects, even if they are void: Consider print(text). This function is technically void - it does not return anything useful. However, it does print text to the screen. A function can have an effect on the system even if it does not return a value. We will see this more when we consider how parameters are passed by value or passed by reference.
  • You can call functions from other files: The point of a library is to provide a set of functions that you can use in your program. You can then import that library into a bunch of different programs. It's completely reusable.
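
A small sketch of the first two points. The function names add and greet are made up for illustration. In Python, a function that finishes without returning a value hands back the special value None.

```python
def add(a, b):
    return a + b              # hands a value back to the caller

def greet(name):
    print("Hello, " + name)   # has an effect (prints), but no return statement

total = add(2, 3)
nothing = greet("Ann")

print(total)      # 5
print(nothing)    # None
```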

When you invoke a function, the following happens in order:

  1. A new piece of memory (called a "stack frame" or "activation record") is created.
  2. The parameters of the function are created in that new memory.
  3. The arguments you pass in to the function invocation get copied into the new memory's parameters in order.
  4. We note where we left off running the old function.
  5. We stack the new memory on top of the old memory, covering up the old memory completely.
  6. We start running the code in the new function's body.

When you return from a function, either with the return keyword or by reaching the end of the body, the new memory is removed ("popped") from the stack, leaving us with the calling function's memory.
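
One way to see separate stack frames in action is to use the same variable name in two functions. This is only a sketch, and the function names inner and outer are made up. Each call gets its own x, so changing x inside inner() does not disturb the x that belongs to outer().

```python
def inner(x):
    x = x + 1          # this x lives in inner()'s own stack frame
    return x

def outer(x):
    y = inner(x)       # outer() pauses here until inner() returns
    return x, y        # outer()'s x was not changed by inner()

result = outer(10)
print(result)          # (10, 11)
```

Pasting this into the visualizer lets you watch the frame for inner() appear on top of the frame for outer() and then get popped off when it returns.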

Let's watch and see what happens at the Python Tutor Visualizer! - http://www.pythontutor.com/visualize.html#mode=edit

Some example code for today:

import random

def who_wins(team_1, team_2):
    winner_choice = random.randint(0,1)
    if winner_choice == 0:
        return team_1
    else:
        return team_2

team_1 = input("Please enter Team 1: ")
team_2 = input("Please enter Team 2: ")

team_1_wins = 0
team_2_wins = 0

for i in range(10000):
    winner = who_wins(team_1, team_2)
    if winner == team_1:
        team_1_wins += 1
    else:
        team_2_wins += 1

if team_1_wins > team_2_wins:
    print(team_1 + " is the overall winner!  (" + str(team_1_wins) + " to " + str(team_2_wins) + ")")
elif team_2_wins > team_1_wins:
    print(team_2 + " is the overall winner!  (" + str(team_2_wins) + " to " + str(team_1_wins) + ")")
else:
    print("It's a tie!  (" + str(team_2_wins) + " to " + str(team_1_wins) + ")")

Example code with optional/named parameters:

def my_function(name="Mark", school="UVa"):
    print(name + " goes to " + school)

my_function()
my_function("Steve")
my_function(school="VT")
my_function("Ann", "GMU")
my_function(school="GMU", name="Ann")


Lecture 19 (Sherriff) - Functions

Lecture Date: Friday, October 7

Since the first day we began coding in this class, we have been using (or "calling") methods and functions to do various things:

def draw_square(t, x, y):
    t.penup()
    t.goto(x,y)
    t.pendown()
    rand_color = random.randint(0,len(colors)-1)
    t.color(colors[rand_color])
    for i in range(4):
        t.forward(100)
        t.left(90)

Everything with ()s after it is either a method or a function. Methods are, in effect, the verbs in our programs. They tell objects to do things, change their state somehow, or provide us with information. In the code above, we are telling the turtle t to do some specific tasks. Each one of those tasks is a method. A function is a bit of code that we separate out to do a particular task, much like a method; the difference is that it is not directly tied to an object and is not called with a period. draw_square() above is a function. We are going to focus on functions in this class.

The things inside the parentheses are called parameters or arguments. Parameters provide extra information to the method to tell it how to perform its task or what data it should act on. So, when we tell t to do forward(100), we are saying that 100 is important information that the method needs in order to execute properly.

Methods form the basis of many programs for several reasons:

  • Code reuse: Often you'll have a bit of code you want to use over and over. Methods make that happen in a very nice way, while also allowing for modification with parameters.
  • Organization: Code is a lot easier to follow when it is built from methods whose names say what they do! Reading forward() and backward() in the Turtle code is completely understandable - reading the code that actually does those things might not be. It's an added layer of abstraction.
  • Testing: Once you verify a function is correct and working, you can move on to other parts of your system.

Another thing to know about functions is how parameters are passed into them.

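  • Pass by Value: This is how immutable types (such as int, float, and str) behave when passed (or given) to a function. The function works as if it had its own copy of the value. Strictly speaking, Python hands the function a reference to the same object, but because an immutable object can never be changed in place, any changes made to the parameter ARE NOT REFLECTED back where the function was called.
  • Pass by Reference: This is how mutable types (such as list and dict) behave when passed to a function. A pointer (think "road sign"), which is a copy of the memory address of the object, is sent to the function. Any changes made to the object through that parameter ARE REFLECTED back where the function was called. (Assigning a brand new object to the parameter name is not such a change; it only affects the function's local name.)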

Here is an example of how pass by value and pass by reference work:

my_list = ['a', 'b', 'c', 'd']
my_value = 11

def change_a_value(some_value):
    print("Inside change_a_value()")
    print("   some_value starts as:", some_value)
    some_value *=2
    print("   some_value now is:", some_value)

def change_a_ref(some_list):
    print("Inside change_a_ref()")
    print("   some_list starts as:", some_list)
    some_list.append('x')
    print("   some_list now is:", some_list)

print("Starting the program!")
print("my_list starts as:", my_list)
change_a_ref(my_list)
print("my_list now is:", my_list)
print("my_value starts as:", my_value)
change_a_value(my_value)
print("my_value now is still:", my_value)
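
One subtlety the example above does not show: with a list, it matters whether the function changes the list itself or just points its parameter at a different list. A minimal sketch follows; the function names rebind_list and mutate_list are made up for illustration.

```python
def rebind_list(some_list):
    some_list = ['new', 'list']    # points the local name at a different list

def mutate_list(some_list):
    some_list.append('x')          # changes the list the caller also sees

my_list = ['a', 'b']
rebind_list(my_list)
after_rebind = list(my_list)       # copy so we can compare later
mutate_list(my_list)

print(after_rebind)   # ['a', 'b']
print(my_list)        # ['a', 'b', 'x']
```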


Lecture 18 (Sherriff) - Encryption Chase

Lecture Date: Wednesday, October 5

Today's lecture is encrypted! Solve the puzzle up on the board / screen to figure out the proper shift value needed to decrypt the first clue. After that, follow the instructions you are given.

ytifd'x qjhyzwj, fx dtz hfs xjj, mfx gjjs jshwduyji. tsqd ymj knwxy ufwy tk ymj qjhyzwj hfs gj ijhwduyji zxnsl ymj hfjxfw hnumjw dtz ozxy zxji. yt xjj jajwdymnsl ymfy ytifd mfx yt tkkjw, dtz'qq mfaj yt zsijwyfpj f kjb yfxpx. knwxy, dtz rfd sty knsnxm jajwdymnsl izwnsl ymj qjhyzwj ynrj ytifd, fsi ymfy'x tpfd. dtz hfs it ymnx tajw ynrj, tw nk dtz hfs'y knsnxm ny, n'qq lnaj dtz ymj ijhwduynts pjdx. gzy ymfy'x st kzs. stb, vznjyqd, ufhp zu dtzw ymnslx fsi qjfaj ymj hqfxxwttr. tshj dtz fwj tzyxnij, lt yt myyux://xyfwithp.hx.anwlnsnf.jiz/qfbshmfxj ktw ymj sjcy xjy tk hqzjx.

No Audio Today


Lecture 17 (Sherriff) - Encryption Overview

Lecture Date: Friday, September 30

What is encryption?

  • Change the form of data to conceal the meaning
  • hidden text

Why do we care? Why in CS?

  • Data Security
  • Do not want people to snoop on us
  • Do want to know we are talking to a particular person

Vocabulary:

  • plaintext - what we are hiding
  • ciphertext - what it looks like after we hide it
  • encrypt - go from plaintext to ciphertext
  • decrypt - go from ciphertext to plaintext
  • cipher - a technique for encrypting/decrypting
  • key - which variant of the cipher we are using

strs:

  • s[index] -> a character at that index (starting at 0)
  • len(s) -> how many chars there are
  • s += "w" -> add a w to the end of s

Assume s is a str. What is the index of the last character in s?
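
A short sketch of the str operations above, using a made-up string. Because indexes start at 0, the last character of a non-empty string sits at index len(s) - 1.

```python
s = "wahoowa"             # any non-empty string works; this one is made up

last_index = len(s) - 1   # indexes start at 0, so the last one is len(s) - 1
print(s[last_index])
print(s[-1])              # Python also lets us count from the end with negative indexes

s += "w"                  # strings are immutable, so this builds a new, longer string
print(len(s))
```

An empty string has no last character, so s[len(s) - 1] only makes sense when len(s) is at least 1.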

Ciphers to discuss:

  • Caesar
  • Vigenère
  • Route
  • Scytale
  • Line-Word-Letter
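
As an illustration of the first cipher on the list, here is a minimal sketch of a Caesar cipher, where the key is the number of places each letter is shifted. The function name caesar and the sample message are assumptions made for this sketch. It only handles the 26 lowercase English letters and leaves everything else alone.

```python
import string

def caesar(text, shift):
    """Shift each lowercase letter by `shift` places, wrapping around at z."""
    alphabet = string.ascii_lowercase
    result = ""
    for char in text.lower():
        if char in alphabet:
            new_index = (alphabet.index(char) + shift) % 26
            result += alphabet[new_index]
        else:
            result += char      # leave spaces and punctuation alone
    return result

secret = caesar("hello, world", 3)
print(secret)
print(caesar(secret, -3))       # shifting back by the same key decrypts
```

Using the vocabulary above: "hello, world" is the plaintext, the printed value of secret is the ciphertext, Caesar is the cipher, and 3 is the key.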

Some example code:

plaintext = input("What should we turn into oppish?: ")
plaintext = plaintext.lower()
vowels = ['a', 'e', 'i', 'o', 'u']
ciphertext = ""
for letter in plaintext:
    if letter in vowels:
        ciphertext += "op"

    ciphertext += letter

print(ciphertext)

index = 0
decrypted = ""

while index < len(ciphertext):
    if ciphertext[index] == 'o' and ciphertext[index + 1] == 'p':
        decrypted += ciphertext[index + 2]
        index += 3
    else:
        decrypted += ciphertext[index]
        index += 1

print(decrypted)
