
Lecture Date: Friday, October 14

We're going to keep looking at files today! We'll work with a couple of large data sets and see if we can write some interesting programs.

Download these data sets and put them into your PyCharm project directory:

Here are some other datasets! http://introcs.cs.princeton.edu/java/data/ and https://github.com/fivethirtyeight/data

If we get to it today, we'll look at reading datasets directly from the web, like our weather data: http://www.wunderground.com/history/airport/KCHO/2016/10/14/DailyHistory.html?format=1

Reading the debate:

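def load_debate(filename):
    # split each line on ";" so that line[1] is the speaker and line[2] is the text
    lines = []
    # "with" closes the file for us when the block ends
    with open(filename, "r") as datafile:
        for line in datafile:
            lines.append(line.strip().split(";"))

    return lines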


transcript = load_debate("first_debate.csv")

# print each of Clinton's lines that contains "..."
for line in transcript:
    if line[1] == 'Clinton' and '...' in line[2]:
        print(line[2])
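
The same split-on-semicolon idea can also be sketched with Python's built-in csv module. This is only a sketch: the three sample rows are made up for illustration (they are not from the real transcript), the column layout number;speaker;text is an assumption, and load_debate_rows is a hypothetical name.

```python
import csv
import io

# Made-up sample rows in the assumed layout: line number;speaker;text
sample = "1;Holt;Good evening.\n2;Clinton;Well, I think...\n3;Trump;Wrong.\n"

def load_debate_rows(datafile):
    # csv.reader splits each line on the delimiter and drops the newline
    return [row for row in csv.reader(datafile, delimiter=";")]

# io.StringIO lets us treat the string like an open file
rows = load_debate_rows(io.StringIO(sample))

for row in rows:
    if row[1] == "Clinton" and "..." in row[2]:
        print(row[2])
```

To use it on a real file, pass an open file object instead of the io.StringIO.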

More debate reading (code not final - will finish next class):

def load_debate(filename):
    # read the debate file
    # return a list of lists where each item is one line from the file
    list_of_lines = []

    # "with" closes the file for us when the block ends
    with open(filename, "r") as datafile:
        for line in datafile:
            new_line = line.strip().split(";")
            list_of_lines.append(new_line)

    return list_of_lines


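def find_interruptions(debate):
    # count, for each speaker, how many of their lines contain "..."
    # return a dictionary of speaker -> count
    interruptions = {}

    for line in debate:
        if "..." in line[2]:
            if line[1] in interruptions:
                interruptions[line[1]] += 1
            else:
                interruptions[line[1]] = 1

    return interruptions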

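def wrong_finder(debate, speaker):
    # count how many times the given speaker says the word "wrong"
    count = 0
    for line in debate:
        if line[1] == speaker:
            for word in line[2].lower().split():
                # strip punctuation so "wrong." and "wrong!" also count
                if word.strip(".,!?;:\"'") == "wrong":
                    count += 1

    return count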


debate = load_debate("first_debate.csv")
inter = find_interruptions(debate)
print(inter)
print("Trump Wrongs:", wrong_finder(debate, "Trump"))
print("Clinton Wrongs:", wrong_finder(debate, "Clinton"))
print("Holt Wrongs:", wrong_finder(debate, "Holt"))
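
The counting pattern in find_interruptions can also be sketched with collections.Counter from the standard library, which treats missing keys as 0. The rows below are made up for illustration, in the same shape load_debate returns, and count_interruptions is a hypothetical name.

```python
from collections import Counter

# Made-up rows in the shape load_debate returns: [number, speaker, text]
sample_debate = [
    ["1", "Holt", "Let's begin."],
    ["2", "Clinton", "Well, I think..."],
    ["3", "Trump", "Wrong. That's wrong..."],
    ["4", "Clinton", "As I was saying..."],
]

def count_interruptions(debate):
    # a Counter treats missing keys as 0, so no if/else is needed
    counts = Counter()
    for line in debate:
        if "..." in line[2]:
            counts[line[1]] += 1
    return counts

sample_counts = count_interruptions(sample_debate)
print(sample_counts)
```

A speaker who never appears with "..." simply reads back as 0 instead of raising a KeyError.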

Looking for misspellings:

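def load_spelling_file(filename):
    # each line of the file is: misspelled word,correct spelling
    correct_spelling = {}   # correct word -> misspelled word
    misspelling = {}        # misspelled word -> correct word

    # "with" closes the file for us when the block ends
    with open(filename, "r") as datafile:
        for line in datafile:
            fields = line.split(",")
            wrong = fields[0].strip()
            right = fields[1].strip()
            correct_spelling[right] = wrong
            misspelling[wrong] = right

    return correct_spelling, misspelling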

correct_spelling, misspelling = load_spelling_file("misspellings.csv")
done = False
while not done:
    word = input("Please enter a word (END to quit): ")
    if word == "END":
        done = True
    elif word in correct_spelling:
        # build the message with + so there are no stray spaces inside the quotes
        print("The word '" + word + "' is spelled correctly!")
    elif word in misspelling:
        print("I think that '" + word + "' is spelled", misspelling[word])
    else:
        print("I don't know that word.  Sorry!")
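
To try the lookup logic without typing words in, the body of the loop can be pulled out into a function that returns the message. This is a sketch only: check_word is a hypothetical helper, and the two small dictionaries are made up rather than loaded from misspellings.csv.

```python
def check_word(word, correct_spelling, misspelling):
    # same three cases as the input loop, but returned instead of printed
    if word in correct_spelling:
        return "The word '" + word + "' is spelled correctly!"
    elif word in misspelling:
        return "I think that '" + word + "' is spelled " + misspelling[word]
    else:
        return "I don't know that word.  Sorry!"

# made-up entries standing in for the dictionaries built from the file
sample_correct = {"receive": "recieve"}
sample_misspelled = {"recieve": "receive"}

for w in ["receive", "recieve", "xyzzy"]:
    print(check_word(w, sample_correct, sample_misspelled))
```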

Reading from the web:

import urllib.request

link = input('Web page: ')

# "with" closes the connection for us when the block ends
with urllib.request.urlopen(link) as stream:
    for line in stream:
        decoded = line.decode("UTF-8")
        print(decoded.strip())
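
The decode step is there because urlopen hands back bytes, not strings. Here is a small sketch of that step that runs without a network connection: io.BytesIO stands in for the stream (an assumption for the demo), and the two lines of data are made up.

```python
import io

# io.BytesIO stands in for the object urlopen returns; the contents are made up
fake_stream = io.BytesIO(b"TimeEST,TemperatureF\n12:53 AM,55.0\n")

decoded_lines = []
for line in fake_stream:
    # each line arrives as bytes, so decode it to a str before using it
    decoded_lines.append(line.decode("UTF-8").strip())

print(decoded_lines)
```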

Building a URL:

import urllib.request

year = "2012"
month = "03"
day = "17"
url = "http://www.wunderground.com/history/airport/KCHO/" + year + "/" + month + "/" + day + "/DailyHistory.html?format=1"

# "with" closes the connection for us when the block ends
with urllib.request.urlopen(url) as stream:
    for line in stream:
        # each line is comma-separated, so split it into a list of fields
        decoded = line.decode("UTF-8").strip().split(",")
        print(decoded)
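
If the date comes in as numbers instead of strings, the month and day need a leading zero to match the URL pattern above. A minimal sketch using str.format; build_history_url and its airport parameter are hypothetical names, and the code only builds the string without contacting the site.

```python
def build_history_url(year, month, day, airport="KCHO"):
    # {:02d} pads single-digit months and days with a leading zero
    pattern = ("http://www.wunderground.com/history/airport/{}/{}/{:02d}/{:02d}"
               "/DailyHistory.html?format=1")
    return pattern.format(airport, year, month, day)

print(build_history_url(2012, 3, 17))
```

The result can be passed to urllib.request.urlopen exactly like the url variable above.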