Preparation before the course starts

Obtaining data via web scraping and APIs isn’t usually something you do “in your browser”. Certainly for continuous data collection, a local setup is required. That’s why you need to install software on your computer before you can get started. All the software used in this class is available as open source and free of charge, so you don’t need to pay for it.

It’s important to set aside some time before the class begins to install the required software and familiarize yourself with Python.

1) Install Python via Anaconda

Please follow the installation guide for Python via Anaconda. If you already have Anaconda but installed it a while ago, please update it (run conda update --all in your command prompt/terminal; see here for details).

Tips for Installing Python/Anaconda
  • The installation of Anaconda can easily take half an hour! Please install it before the start of the class, and ensure you have administrator rights.
  • We recommend installing Anaconda (Anaconda 64-Bit Graphical Installer; Python version 3.13), which you can find here. Make sure to select the correct version for your operating system.
  • In the tutorial (at 2:23), we open the Command Prompt. On Mac, this program is called Terminal (for more information on the command prompt/terminal, see tip box below).
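
Once the installation has finished, a quick check like the one below can confirm that Python runs on your computer. This is a minimal sketch, not part of the official installation guide: the function name describe_python is made up for illustration, and the version and path it reports will differ from machine to machine.

```python
import platform
import sys


def describe_python():
    """Return a one-line summary of the Python interpreter that is running."""
    version = platform.python_version()  # e.g., "3.13.x"
    # sys.maxsize exceeds 2**32 only on 64-bit builds of Python.
    bits = "64-bit" if sys.maxsize > 2**32 else "32-bit"
    return f"Python {version} ({bits}) at {sys.executable}"


print(describe_python())
```

If the reported path contains a folder named after Anaconda, you are most likely running the Python version that came with Anaconda.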

2) Obtain access to Premium Content at Datacamp.com

We use material provided by Datacamp.com that is otherwise only available via paid premium subscriptions. Students can use this material for free with their @tilburguniversity.edu account. The link is provided in the preparation module on Canvas.

3) Getting to know Python

Eager for more? You can already follow chapters 1-3 of the introductory Python course on Datacamp. Novices will need about 3-4 hours. If you don’t have the time, don’t worry. In the first week of the course, we have set aside some time to work through the tutorial and exercises.

Getting to know the terminal/command line

Besides installing software and getting to know Python, we also recommend familiarizing yourself with the command line (also called terminal). The command line is a text-based interface in which you interact with your computer by entering commands. Knowing how to use it matters when learning web scraping because many scraping tools and libraries are run from the command line, where you navigate directories, run scripts, and manage data extraction more efficiently.
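
To make the link between the command line and Python scripts more concrete, here is a small sketch of a script you could run from the terminal. The file name where_am_i.py and the function list_entries are hypothetical and chosen only for illustration. The script prints the directory from which it was started and the files and folders it finds there.

```python
from pathlib import Path


def list_entries(folder):
    """Return the sorted names of all files and folders inside `folder`."""
    return sorted(entry.name for entry in Path(folder).iterdir())


# The current working directory is the folder your terminal was in
# when you started the script.
here = Path.cwd()
print("Current working directory:", here)
for name in list_entries(here):
    print("  ", name)
```

Assuming you saved the script as where_am_i.py, you would first move to its folder in the terminal (using cd) and then type python where_am_i.py. Starting it from a different folder changes the reported working directory, which is why it helps to know where you are on the command line.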

Never worked with the command line/terminal? Then please develop your command line (Windows) / terminal (Mac) skills!