Week 2: Workflow & learning how to scrape
Learning goals
- Differentiate between retrieving data from websites and APIs.
- Retrieve and store web data in various formats using Python’s requests library and browser inspection tools.
- Extract and manipulate data from websites and APIs using BeautifulSoup and JSON handling techniques.
- Apply programming concepts to automate data collection, and understand when to use Jupyter Notebooks vs. plain Python files.
Lecture
Laptop required!
- Tutorial (in-class slides; download tutorial; view tutorial in Google Colab; solutions provided by students: download, Google Colab)
Coaching session
- Please check out the project page.
After the lecture and coaching session
- Complete the exercises at the end of the tutorial (about 1-2 hours).
- Watch “What is web scraping and what are Application Programming Interfaces (APIs)?” (30 minutes).
- “Fields of Gold: Scraping Web Data for Marketing Insights”
  - Please watch the webinar (1 hour).
  - Please read the paper carefully (2 hours).
- Finalize team enrollment on Canvas.
This paper provides a guiding framework for the rest of this course, and chances are you’ll need to read it more than once (e.g., first to get an overview, and later to appreciate and apply the details in your project). The web appendix contains valuable tables, so don’t skip it.