Web data for dummies

Tutorial: Web data for dummies

Learning goals

  • Explain the differences between retrieving data from websites vs. APIs
  • Retrieve web data in Python using the requests library, and store the retrieved data in HTML, JSON, or TXT files for further inspection
  • Use the browser's developer tools (“Inspect”) to develop strategies for selecting and capturing information from websites (e.g., text, numbers, pictures)
  • Select elements from websites using BeautifulSoup (e.g., by class name, attribute, or tag name)
  • Select elements from JSON dictionaries obtained through APIs (attribute-value pairs)
  • Apply programming concepts (e.g., loops, functions) to the collection of web data, and convert dictionaries to JSON files
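
To give a feel for the last two goals, here is a minimal sketch that selects values from a JSON dictionary and writes a dictionary to a JSON file. The sample response, its "results" and "title" keys, the helper functions extract_titles and save_as_json, and the output file name are all hypothetical and chosen only for illustration. In the tutorial, data like this would typically come from an API via the requests library rather than from a hard-coded string.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical API response, hard-coded so the example runs offline.
SAMPLE_RESPONSE = """
{
  "results": [
    {"title": "Book A", "price": 12.5},
    {"title": "Book B", "price": 8.0}
  ]
}
"""


def extract_titles(payload):
    """Loop over the items under 'results' and collect each 'title' value."""
    return [item["title"] for item in payload["results"]]


def save_as_json(data, path):
    """Write a Python dictionary to a JSON file."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(data, f, indent=2)


# Turn the JSON text into a Python dictionary, then select the titles.
payload = json.loads(SAMPLE_RESPONSE)
titles = extract_titles(payload)
print(titles)  # prints ['Book A', 'Book B']

# Store the selected data in a JSON file (here: in the system's temp folder).
out_file = Path(tempfile.gettempdir()) / "sample_output.json"
save_as_json({"titles": titles}, out_file)
```

Opening the resulting file in a text editor is one way to inspect what was stored.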

Prerequisites

Downloading to and starting the tutorial on your computer

  • Click on the download button. If the file opens in your browser, right-click on the download link instead and select “download linked file as…”.
  • Move the file to a convenient location (e.g., somewhere in your course folder).
  • If the downloaded file is a .zip (compressed) file, unzip it.
  • Open Jupyter Notebook (e.g., using the terminal or the Anaconda Navigator), navigate to the folder where you stored the downloaded files, and open the .ipynb file from within Jupyter Notebook.
  • Start with the tutorial!