Date

Due: Friday, October 21, 11:00 AM

Write a program called polling.py that will read 2016 presidential election polling data from a website and do some processing on it.

The Huffington Post provides an online API for developers who want to use publicly-available polling data to do their own analyses. The API can be found at http://elections.huffingtonpost.com/pollster/api.

A set of polls for a given state can be pulled from the site via a specific URL. For instance, to get all of the data for Virginia, you would go to http://elections.huffingtonpost.com/pollster/api/charts/2016-virginia-president-trump-vs-clinton.csv. For North Carolina, you would replace virginia with north-carolina in the link above, and so on for all fifty states.
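Only the state name changes from one link to the next, so the URL can be assembled from a template. The sketch below is one way to do that; build_chart_url is a hypothetical helper name (it is not one of the required functions), and accepting names with spaces or capitals is an assumption beyond what the handout requires.

```python
# Template copied from the Virginia link above; only the state slug varies.
URL_TEMPLATE = ("http://elections.huffingtonpost.com/pollster/api/charts/"
                "2016-{}-president-trump-vs-clinton.csv")


def build_chart_url(state):
    """Hypothetical helper: return the chart URL for a state name."""
    # Assumption: lowercase the name and turn spaces into hyphens,
    # so 'North Carolina' becomes 'north-carolina'.
    slug = state.strip().lower().replace(" ", "-")
    return URL_TEMPLATE.format(slug)
```

Calling build_chart_url('north-carolina') and build_chart_url('North Carolina') both produce the North Carolina link described above.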

The data is in .csv format, so each column is separated by a comma. The fields are (in order):

  • Trump
  • Clinton
  • Other
  • Undecided
  • poll_id
  • pollster
  • start_date
  • end_date
  • sample_subpopulation
  • sample_size
  • mode
  • partisanship
  • partisan_affiliation

We are only going to work with the Trump percentage, Clinton percentage, and end_date fields for this assignment, which are fields 0, 1, and 7 respectively. We will only test/grade your POTD with states whose data has exactly the format listed above (some states may lack some of these fields).

UPDATE: Since this POTD was published, an additional field, Johnson, has been added to the Virginia data set right after Clinton. Because we are already testing with the exact set of fields listed above, we will continue to do so, so that everyone is on equal footing. You should still use fields 0, 1, and 7 as noted above. Some other states that (currently) use the format above include Wisconsin, Maryland, Florida, and South Carolina. We will ONLY test your code with states where the data we care about is in fields 0, 1, and 7.
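To make the field numbering concrete, here is a small sketch that splits one row on commas and picks out fields 0, 1, and 7. The sample row is reconstructed from the first Wisconsin poll in the example output in this handout; splitting with a plain split(",") assumes no field contains a comma of its own.

```python
# One data row, reconstructed from the Wisconsin example output in this handout.
line = ("38.0,50.0,,12.0,26282,PPP (D-End Citizens United),2016-10-18,"
        "2016-10-19,Likely Voters,804,IVR/Online,Sponsor,Dem")

# Assumption: no field contains a comma, so a plain split is enough.
fields = line.strip().split(",")

trump = float(fields[0])    # field 0: Trump percentage
clinton = float(fields[1])  # field 1: Clinton percentage
end_date = fields[7]        # field 7: end_date, kept as a string

print(trump, clinton, end_date)
```

Note that an empty field (such as Other in this row) comes back as an empty string, which matches the '' entries in the example output.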

For this POTD, you need to write the three functions outlined below that will allow you to read and analyze the data:

1. load_state_polls(state): Given a string representation of a state like we would use above in the URL, open the URL, read the data, and return a list of lists in which each poll is a list consisting of the data items mentioned above.

You can read the entire csv line by line to make it easier to search. Example code is below. Consider how you might dynamically create that URL based upon the state that is passed into the function.

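import urllib.request

# Open the CSV for one state; the response is read as raw bytes.
stream = urllib.request.urlopen("http://elections.huffingtonpost.com/pollster/api/charts/2016-oregon-president-trump-vs-clinton.csv")

# Each line arrives as bytes, so decode it to a str before using it.
for line in stream:
    decoded = line.decode("UTF-8")
    print(decoded.strip())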

For example, running load_state_polls('wisconsin') should return:

[['38.0', '50.0', '', '12.0', '26282', 'PPP (D-End Citizens United)', '2016-10-18', '2016-10-19', 'Likely Voters', '804', 'IVR/Online', 'Sponsor', 'Dem'], ['40.0', '47.0', '3.0', '5.0', '26274', 'Monmouth University', '2016-10-15', '2016-10-18', 'Likely Voters', '403', 'Live Phone', 'Nonpartisan', 'None'], ['39.0', '47.0', '6.0', '7.0', '26236', 'St Norbert', '2016-10-13', '2016-10-16', 'Likely Voters', '664', 'Live Phone', 'Nonpartisan', 'None'], ['44.0', '48.0', '', '8.0', '26253', 'WashPost/SurveyMonkey', '2016-10-08', '2016-10-16', 'Likely Voters', '1076', 'Internet', 'Nonpartisan', 'None'], ['42.0', '46.0', '8.0', '4.0', '26150', 'Marquette Law School', '2016-10-06', '2016-10-09', 'Likely Voters', '878', 'Live Phone', 'Nonpartisan', 'None'], ['46.0', '51.0', '4.0', '', '26045', 'UPI/CVOTER', '2016-10-02', '2016-10-09', 'Likely Voters', '349', 'Internet', 'Nonpartisan', 'None'], ['39.0', '43.0', '3.0', '11.0', '26016', 'CBS/YouGov', '2016-10-05', '2016-10-07', 'Likely Voters', '993', 'Internet', 'Nonpartisan', 'None'], ['42.0', '46.0', '', '12.0', '26137', 'Ipsos/Reuters', '2016-09-16', '2016-10-06', 'Likely Voters', '866', 'Internet', 'Nonpartisan', 'None'], ['37.0', '47.0', '7.0', '9.0', '26083', 'Loras College', '2016-10-04', '2016-10-05', 'Likely Voters', '500', 'Live Phone', 'Nonpartisan', 'None'], ['47.0', '50.0', '3.0', '', '25946', 'UPI/CVOTER', '2016-09-19', '2016-10-02', 'Likely Voters', '575', 'Internet', 'Nonpartisan', 'None'], ['42.0', '42.0', '', '16.0', '25890', 'Ipsos/Reuters', '2016-09-09', '2016-09-29', 'Likely Voters', '751', 'Internet', 'Nonpartisan', 'None'], ['46.0', '50.0', '4.0', '', '25782', 'UPI/CVOTER', '2016-09-12', '2016-09-25', 'Likely Voters', '553', 'Internet', 'Nonpartisan', 'None'], ['41.0', '41.0', '', '18.0', '25686', 'Ipsos/Reuters', '2016-09-02', '2016-09-22', 'Likely Voters', '685', 'Internet', 'Nonpartisan', 'None'], ['42.0', '44.0', '8.0', '5.0', '25601', 'Marquette Law School', '2016-09-15', '2016-09-18', 'Likely Voters', '642', 'Live Phone', 'Nonpartisan', 'None'], ['40.0', '43.0', '', '17.0', '25576', 'Ipsos/Reuters', '2016-08-26', '2016-09-15', 'Likely Voters', '695', 'Internet', 'Nonpartisan', 'None'], ['44.0', '46.0', '', '11.0', '25347', 'WashPost/SurveyMonkey', '2016-08-09', '2016-09-01', 'Registered Voters', '2687', 'Internet', 'Nonpartisan', 'None'], ['38.0', '43.0', '4.0', '8.0', '25303', 'Monmouth University', '2016-08-27', '2016-08-30', 'Likely Voters', '404', 'Live Phone', 'Nonpartisan', 'None'], ['42.0', '45.0', '7.0', '4.0', '25304', 'Marquette Law School', '2016-08-25', '2016-08-28', 'Likely Voters', '650', 'Live Phone', 'Nonpartisan', 'None'], ['41.0', '48.0', '', '12.0', '25312', 'PPP (D-NELP) ', '2016-08-26', '2016-08-27', 'Likely Voters', '1054', 'IVR/Online', 'Sponsor', 'Dem'], ['37.0', '52.0', '7.0', '3.0', '25118', 'Marquette Law School', '2016-08-04', '2016-08-07', 'Likely Voters', '683', 'Live Phone', 'Nonpartisan', 'None'], ['41.0', '45.0', '9.0', '6.0', '24879', 'Marquette Law School', '2016-07-07', '2016-07-10', 'Likely Voters', '665', 'Live Phone', 'Nonpartisan', 'None'], ['36.0', '41.0', '11.0', '12.0', '24768', 'CBS/YouGov', '2016-06-21', '2016-06-24', 'Likely Voters', '993', 'Internet', 'Nonpartisan', 'None'], ['39.0', '47.0', '', '14.0', '24790', 'PPP (D-Americans United for Change/Constitutional Responsibility Project) ', '2016-06-22', '2016-06-23', 'Likely Voters', '843', 'IVR/Online', 'Sponsor', 'Dem'], ['36.0', '47.0', '7.0', '10.0', '24849', "GQR (D-Democracy Corps/Women's Voices Women Vote)", '2016-06-11', '2016-06-20', 'Likely Voters', '300', 'Live Phone', 'Sponsor', 'Dem'], ['37.0', '46.0', '13.0', '4.0', '24685', 'Marquette Law School', '2016-06-09', '2016-06-12', 'Likely Voters', '666', 'Live Phone', 'Nonpartisan', 'None'], ['31.0', '43.0', '', '', '24567', 'Public Opinion Strategies (R)/Federation for Children', '2016-05-10', '2016-05-12', 'Likely Voters', '600', 'Live Phone', 'Pollster', 'Rep'], ['34.0', '46.0', '12.0', '9.0', '24316', 'St Norbert/WPR/WPTV', '2016-04-12', '2016-04-15', 'Registered Voters', '616', 'Live Phone', 'Nonpartisan', 'None'], ['37.0', '47.0', '', '17.0', '24195', 'Emerson College Polling Society', '2016-03-30', '2016-04-03', 'Likely Voters', '1198', 'Automated Phone', 'Nonpartisan', 'None'], ['35.0', '49.0', '6.0', '10.0', '24182', 'Fox News', '2016-03-28', '2016-03-30', 'Likely Voters', '1602', 'Live Phone', 'Nonpartisan', 'None'], ['37.0', '47.0', '12.0', '5.0', '24172', 'Marquette Law School', '2016-03-24', '2016-03-28', 'Registered Voters', '1405', 'Live Phone', 'Nonpartisan', 'None'], ['37.0', '48.0', '11.0', '6.0', '23872', 'Marquette Law School', '2016-02-18', '2016-02-21', 'Registered Voters', '802', 'Live Phone', 'Nonpartisan', 'None'], ['38.0', '47.0', '9.0', '6.0', '23623', 'Marquette Law School', '2016-01-21', '2016-01-24', 'Registered Voters', '806', 'Live Phone', 'Nonpartisan', 'None'], ['38.0', '48.0', '9.0', '5.0', '23197', 'Marquette Law School', '2015-11-12', '2015-11-15', 'Registered Voters', '803', 'Live Phone', 'Nonpartisan', 'None'], ['39.0', '50.0', '', '11.0', '22973', 'St Norbert/WPR/WPTV', '2015-10-14', '2015-10-17', 'Registered Voters', '603', 'Live Phone', 'Nonpartisan', 'None'], ['36.0', '50.0', '9.0', '4.0', '22823', 'Marquette Law School', '2015-09-24', '2015-09-28', 'Registered Voters', '803', 'Live Phone', 'Nonpartisan', 'None'], ['35.0', '51.0', '10.0', '4.0', '22596', 'Marquette Law School', '2015-08-13', '2015-08-16', 'Registered Voters', '802', 'Live Phone', 'Nonpartisan', 'None']]
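The example output holds only data rows, with every value kept as a string. The sketch below shows one way to get from raw lines to that shape without touching the network. rows_from_lines is a hypothetical helper, the sample bytes stand in for the stream returned by urllib.request.urlopen, and skipping the first line assumes the file begins with a header row of field names.

```python
def rows_from_lines(lines):
    """Hypothetical helper: turn raw CSV lines (bytes) into a list of lists."""
    polls = []
    for index, line in enumerate(lines):
        if index == 0:
            continue  # assumption: the first line is a header row
        decoded = line.decode("UTF-8").strip()
        if decoded:  # ignore blank lines
            polls.append(decoded.split(","))
    return polls


# Stand-in for the stream returned by urllib.request.urlopen(...).
sample = [
    b"Trump,Clinton,Other,Undecided,poll_id,pollster,start_date,end_date,"
    b"sample_subpopulation,sample_size,mode,partisanship,partisan_affiliation\n",
    b"40.0,47.0,3.0,5.0,26274,Monmouth University,2016-10-15,2016-10-18,"
    b"Likely Voters,403,Live Phone,Nonpartisan,None\n",
]

polls = rows_from_lines(sample)
```

Here polls holds a single inner list that matches the Monmouth University entry in the example output. In your submission this logic would live inside load_state_polls, since no code is allowed outside the three required functions.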

2. polls_average(state,completed_after,completed_before): Given a string representation of a state and two dates formatted like those from the data files, return a list with two string values in it (rounded to two decimal places), representing the Trump and Clinton poll average for all polls for that state in which the end_date falls between the completed_after and completed_before dates (inclusive).

Note that in this data set, dates are stored as strings in the format year-month-day. Thankfully, we can use normal >= and <= comparisons with this format and it works exactly as we think it would.
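String comparison works here because every date has the same zero-padded year-month-day layout, so alphabetical order and chronological order agree. A short sketch, using a hypothetical helper named in_range:

```python
def in_range(end_date, completed_after, completed_before):
    """Hypothetical helper: True if end_date lies in the inclusive range."""
    # Plain string comparison; valid because the dates are zero-padded
    # and ordered year, then month, then day.
    return completed_after <= end_date <= completed_before


for end_date in ["2016-09-29", "2016-09-25", "2016-09-11", "2015-11-15"]:
    print(end_date, in_range(end_date, "2016-09-11", "2016-09-27"))
```

Only 2016-09-25 and 2016-09-11 are reported as in range; the second shows that the boundary dates themselves are included.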

For example, running polls_average('wisconsin', '2016-09-11', '2016-09-27') should return ['42.25', '44.50'], since only four polls in the list shown above have an end_date in that range (those ending 2016-09-25, 2016-09-22, 2016-09-18, and 2016-09-15).
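The arithmetic behind that result can be checked by hand. Four Wisconsin polls in the example output have an end_date between 2016-09-11 and 2016-09-27 (ending 2016-09-25, 2016-09-22, 2016-09-18, and 2016-09-15). The sketch below averages their Trump and Clinton values; format_average is a hypothetical helper, and it does not handle the case where no polls fall in the range, which the handout leaves unspecified.

```python
def format_average(values):
    """Hypothetical helper: mean of a list of floats as a 2-decimal string."""
    return "{:.2f}".format(sum(values) / len(values))


# Values copied from the four qualifying polls, newest first.
trump = [46.0, 41.0, 42.0, 40.0]
clinton = [50.0, 41.0, 44.0, 43.0]

result = [format_average(trump), format_average(clinton)]
print(result)  # ['42.25', '44.50']
```

Formatting with "{:.2f}" keeps the trailing zero in '44.50', which round() followed by str() would drop.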

3. current_winner(poll): Given a list that contains at least 2 fields (the first representing Trump's percentage and the second representing Clinton's percentage - much like what you get from polls_average()), return a string showing the winner and the margin, truncated to an integer (not rounded).

For example, running current_winner(polls_average('wisconsin', '2016-09-11', '2016-09-27')) should return Clinton +2. If the two candidates are tied, return just the word Tie.
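One possible reading of this specification is sketched below; treat it as an illustration, not the reference solution. It assumes the two values may arrive as strings (as polls_average returns them) and that the output is the candidate's name, a space, and a plus sign followed by the truncated margin. The handout does not say what to return when the margin is below one point but not zero; this sketch would return a margin of +0 in that case.

```python
def current_winner(poll):
    """Sketch: return 'Trump +N', 'Clinton +N', or 'Tie'."""
    trump = float(poll[0])
    clinton = float(poll[1])
    if trump == clinton:
        return "Tie"
    # int() truncates toward zero, so a margin of 2.25 becomes 2.
    margin = int(abs(trump - clinton))
    leader = "Trump" if trump > clinton else "Clinton"
    return "{} +{}".format(leader, margin)
```

With the Wisconsin averages ['42.25', '44.50'], the margin is 2.25, which truncates to 2 and gives Clinton +2.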

Testing

We highly suggest you test these functions by importing them into another file and calling them there. For example, you could create a file called testing_polling.py with the following code:

from polling import load_state_polls, polls_average, current_winner

print(load_state_polls('oregon'))
print(polls_average('oregon', '2016-09-11', '2016-09-27'))
print(current_winner(polls_average('oregon', '2016-09-11', '2016-09-27')))

This will allow you to test your polling.py file without adding any extra code to it.

Note that if you have any print statements (even commented out) in your submission, it will not be graded. Further, your submission should not have any code that is not in one of the three functions listed above.

Submission: Please submit one .py file named polling.py to the POTD submission system at https://archimedes.cs.virginia.edu/cs1110/. DO NOT submit any testing files. We only want your polling.py file.