Lecture Date: Wednesday, October 26

Let's review what we've done and do some more examples!

First, back to that weather example we ended with on Monday... Yeah... that was slightly harder than I thought it would be. We have to use a slightly different method to get the one particular tag that holds the temperature. The page marks that tag with an attribute, which we can match by passing a dictionary (the curly braces) to find_all, as shown below.


import urllib.request
import bs4

web = urllib.request.urlopen("")
page = web.read()

parsedPage = bs4.BeautifulSoup(page, "html.parser")

for tag in parsedPage.find_all("span", {"data-variable" : "temperature"}, class_="wx-data"):
    print(tag.get_text())  # assumed loop body: print the text inside each matching tag
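
To see what the dictionary filter is doing without hitting a live website, here is a small sketch that runs the same kind of find_all call on a made-up snippet of HTML. The snippet, the numbers in it, and the variable names are all assumptions for illustration; the real weather page will look different.

```python
import bs4

# Made-up HTML standing in for the real weather page.
sample_html = """
<div>
  <span class="wx-data" data-variable="temperature">54</span>
  <span class="wx-data" data-variable="humidity">80</span>
  <span class="other" data-variable="temperature">99</span>
</div>
"""

parsed = bs4.BeautifulSoup(sample_html, "html.parser")

# The dictionary filters on tag attributes; class_ filters on the CSS class.
# A tag has to pass both filters to be returned.
matches = parsed.find_all("span", {"data-variable": "temperature"}, class_="wx-data")

# Collect the text inside each matching tag.
temperatures = [tag.get_text() for tag in matches]
print(temperatures)

# A tag's attributes can also be read like dictionary entries.
print(matches[0]["data-variable"])
```

Only the first span passes both filters: the second has the wrong data-variable and the third has the wrong class.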

What could you do with this? Well, it's actually pretty easy to keep this code running continuously on a Raspberry Pi with a light bar hooked up to it, and then have the light bar change color based on the temperature! Neat!
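
Here is one possible way to turn a temperature into a color, as a rough sketch only. The function name temperature_to_rgb and the 32 and 90 degree cutoffs are made up for this example, and the code that actually drives the lights is left out because it depends on the hardware you have.

```python
def temperature_to_rgb(temp_f, cold=32.0, hot=90.0):
    """Blend from blue (cold) to red (hot); returns an (r, g, b) tuple of 0-255 ints."""
    # How far the reading sits between the cold and hot cutoffs.
    fraction = (temp_f - cold) / (hot - cold)
    # Clamp to the 0..1 range so extreme readings stay valid colors.
    fraction = max(0.0, min(1.0, fraction))
    red = round(255 * fraction)
    blue = 255 - red
    return (red, 0, blue)


for reading in (20, 61, 95):
    print(reading, temperature_to_rgb(reading))
```

Anything at or below the cold cutoff comes out pure blue, anything at or above the hot cutoff comes out pure red, and readings in between are a mix.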

Have you ever gone to a website and wanted to download, say, all of the pictures on that page without having to click all of them? Maybe you want to download a whole bunch of .mp3 files?

# Code based on
import urllib.request, os, bs4

count = 10 # how many comics to download

url = ''              # starting url
os.makedirs('xkcd', exist_ok=True)   # store comics in ./xkcd

while count > 0:

    # First, download the page.
    print('Downloading page', url)
    webpage = urllib.request.urlopen(url)

    parsed_page = bs4.BeautifulSoup(webpage.read(), "html.parser")

    # Use BeautifulSoup to find the URL of the comic image.
    comic_elem = parsed_page.select('#comic img')
    if comic_elem == []:
        print('Could not find comic image.')
    else:
        # The src attribute has no scheme, so add one.
        comic_url = 'http:' + comic_elem[0].get('src')
        # Download the image.
        print('Downloading image', comic_url)
        comic_page = urllib.request.urlopen(comic_url)
        count -= 1

        # Save the image to ./xkcd (binary mode, since it's image data).
        image_file = os.path.join('xkcd', os.path.basename(comic_url))
        with open(image_file, 'wb') as file:
            file.write(comic_page.read())

    # Get the Prev button's url.
    prev_link = parsed_page.select('a[rel="prev"]')[0]
    url = '' + prev_link.get('href')
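
The program above builds its addresses by gluing strings together. As an alternative sketch (not what the program above does), the standard library's urljoin can resolve both kinds of partial link against the page's own address. The example.com addresses below are placeholders made up for illustration.

```python
import os
from urllib.parse import urljoin

# Placeholder address of the page we are currently on.
page_url = "https://example.com/comics/5/"

# A src that starts with "//" has a host but no scheme;
# urljoin fills in the scheme from the page's address.
image_url = urljoin(page_url, "//imgs.example.com/comics/sample.png")

# An href that starts with "/" is resolved against the site root.
prev_url = urljoin(page_url, "/comics/4/")

# The file name to save under is the last piece of the image's path.
file_name = os.path.basename(image_url)
local_path = os.path.join("xkcd", file_name)

print(image_url)
print(prev_url)
print(local_path)
```

This way the code doesn't have to know ahead of time whether a link is missing its scheme, its host, or nothing at all.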


If there's time, we'll pick a dataset and do some more example programs.