Tutorial: Web data for dummies
Learning goals
- Explain the differences between retrieving data from websites vs. APIs
- Retrieve web data in Python using the `requests` library, and store the retrieved data in HTML or JSON/TXT files for further inspection.
- Use browser developer tools (“Inspect”) to develop strategies for selecting and capturing information from websites (e.g., text, numbers, pictures)
- Select elements from websites using BeautifulSoup (e.g., by class name, attribute, or tag name)
- Select elements from JSON data obtained through APIs (key-value pairs in dictionaries)
- Apply programming concepts (e.g., loops, functions) to the collection of web data, and convert dictionaries to JSON files.
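The difference between retrieving a website and an API, and storing the result, can be sketched with `requests` as follows. This is a minimal sketch, not the tutorial's own code: the helper names `fetch`, `save_html`, and `save_json`, the 10-second timeout, and the URLs in the usage comments are hypothetical placeholders. The key contrast is that a website returns HTML (read via `response.text`), whereas an API typically returns JSON (parsed via `response.json()`).

```python
import json
from pathlib import Path

import requests


def fetch(url: str) -> requests.Response:
    """Send a GET request; raise an error for 4xx/5xx status codes."""
    response = requests.get(url, timeout=10)  # timeout is an assumed value
    response.raise_for_status()
    return response


def save_html(html: str, filename: str) -> Path:
    """Store raw HTML so it can be reopened in a browser or editor later."""
    path = Path(filename)
    path.write_text(html, encoding="utf-8")
    return path


def save_json(data, filename: str) -> Path:
    """Store parsed API output (dicts/lists) as an indented JSON file."""
    path = Path(filename)
    path.write_text(json.dumps(data, indent=2), encoding="utf-8")
    return path


# Usage (needs internet access; both URLs are hypothetical placeholders):
# page = fetch("https://example.com")
# save_html(page.text, "page.html")          # website -> HTML text
# api = fetch("https://api.example.com/items")
# save_json(api.json(), "items.json")        # API -> JSON -> Python objects
```

Keeping fetching and saving in separate functions means the saved files can be inspected later without downloading the page again.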
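Selecting elements by tag name, class name, and attribute with BeautifulSoup might look like the sketch below. The HTML snippet is made up and stands in for a page downloaded with `requests`. The class names `book`, `title`, and `price` are hypothetical. On a real site you would look up the actual names with the browser's “Inspect” tool.

```python
from bs4 import BeautifulSoup

# Made-up HTML standing in for a downloaded page (class names are hypothetical).
html = """
<html><body>
  <h1>Book list</h1>
  <div class="book">
    <a class="title" href="/books/1">Python Basics</a>
    <span class="price">12.99</span>
  </div>
  <div class="book">
    <a class="title" href="/books/2">Web Scraping 101</a>
    <span class="price">24.50</span>
  </div>
</body></html>
"""

soup = BeautifulSoup(html, "html.parser")  # uses Python's built-in HTML parser

# By tag name: the text of the first <h1> element.
heading = soup.find("h1").get_text(strip=True)

# By class name: every element with class="book" (note the trailing underscore).
books = soup.find_all(class_="book")

records = []
for book in books:
    link = book.find("a", class_="title")
    records.append(
        {
            "title": link.get_text(strip=True),
            "url": link["href"],  # by attribute: read the href value
            "price": float(book.find("span", class_="price").get_text(strip=True)),
        }
    )

# By attribute value: find the link that points to a specific page.
second_link = soup.find("a", attrs={"href": "/books/2"})

print(heading)
print(records)
```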
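API output is usually JSON, which Python turns into nested dictionaries and lists. Selecting a value then means chaining keys and list indices. The response below is invented for illustration, and field names such as `results` and `price` are assumptions because every API defines its own. With `requests`, `response.json()` would produce the same kind of object that `json.loads()` does here.

```python
import json

# Invented API response body; real APIs each use their own field names.
raw = """
{
  "count": 2,
  "results": [
    {"id": 1, "name": "Widget", "price": 9.99, "tags": ["tools", "sale"]},
    {"id": 2, "name": "Gadget", "price": 19.5}
  ]
}
"""

data = json.loads(raw)  # JSON text -> Python dict

count = data["count"]                     # top-level key
first_name = data["results"][0]["name"]   # key -> list index -> key
first_tag = data["results"][0]["tags"][0]

# Loop over a list of dictionaries to pull one field from each.
names = [item["name"] for item in data["results"]]

# .get() returns a default instead of raising KeyError for a missing key
# (the second item has no "tags").
tags_per_item = [item.get("tags", []) for item in data["results"]]

print(count, first_name, first_tag, names, tags_per_item)
```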
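Loops and functions become useful once data is collected from many pages. A function handles one page, a loop repeats it, and the resulting dictionaries are written to a file. In this sketch the list `api_pages` is a hypothetical stand-in for paginated API responses, where in practice each string might come from one `requests.get()` call per page. `extract_posts` is an illustrative helper name. Writing one JSON object per line is just one common choice. A single `json.dump()` of the whole list would also work.

```python
import json

# Hypothetical paginated API responses; in practice each string might come
# from something like requests.get(url, params={"page": n}).text
api_pages = [
    '{"page": 1, "items": [{"id": 1, "title": "First post", "likes": 3}]}',
    '{"page": 2, "items": [{"id": 2, "title": "Second post", "likes": 7}, '
    '{"id": 3, "title": "Third post", "likes": 0}]}',
]


def extract_posts(page_text: str) -> list:
    """Parse one page of API output and keep only the fields we need."""
    page = json.loads(page_text)
    posts = []
    for item in page["items"]:
        posts.append({"id": item["id"], "title": item["title"]})
    return posts


all_posts = []
for page_text in api_pages:  # loop over every downloaded page
    all_posts.extend(extract_posts(page_text))

# Write one JSON object per line; the file also opens fine as plain text.
with open("posts.json", "w", encoding="utf-8") as f:
    for post in all_posts:
        f.write(json.dumps(post) + "\n")

# Read the file back line by line to confirm each line is valid JSON.
with open("posts.json", encoding="utf-8") as f:
    reloaded = [json.loads(line) for line in f]
```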
Prerequisites
- Basic Python skills, for example acquired by working through the Python Bootcamp.
Downloading and starting the tutorial on your computer
- Click the download button. If the file opens in your browser instead, right-click the download link and select “Download Linked File As…” (or your browser's equivalent, such as “Save link as…”).
- Move the file to a convenient location (e.g., somewhere in your course folder).
- If the downloaded file is a `.zip` (compressed) file, unzip it.
- Open Jupyter Notebook (e.g., using the terminal or the Anaconda Navigator), navigate to the folder where you stored the downloaded files, and open the `.ipynb` file from within Jupyter Notebook.
- Start with the tutorial!