Preparation before the course starts
Collecting data via web scraping and APIs usually isn’t something you do “in your browser”; continuous data collections in particular require a local setup. That’s why you need to install software on your computer before you can get started. All software used in this class is open source, i.e., you don’t need to pay for it.
Please set aside some time before the class begins to install the required software and familiarize yourself with Python.
1) Install Python via Anaconda
Please follow the installation guide for Python via Anaconda.
- Installing Anaconda can easily take half an hour, so please do it before the class starts and make sure you have administrator rights on your computer.
- We recommend installing the Anaconda Individual Edition (Anaconda 64-Bit Graphical Installer; Python 3.9), which you can find here. Make sure to select the correct package for your operating system.
- In the tutorial (at 2:23), we open the Command Prompt. On a Mac, the equivalent program is called Terminal (for more information, see this interactive walkthrough; optional).
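Once the installer has finished, a tiny script can confirm that Python runs and show which version you got. This is only a sketch: the file name `check_install.py` is our own choice, and it assumes that typing `python` in the Command Prompt (Windows) or Terminal (Mac) starts the Anaconda interpreter, which depends on how the installer set up your PATH.

```python
import sys


def python_version() -> str:
    """Return the running interpreter's version as 'major.minor', e.g. '3.9'."""
    return f"{sys.version_info.major}.{sys.version_info.minor}"


if __name__ == "__main__":
    print("Python version:", python_version())
    # sys.executable is the full path of the interpreter running this script;
    # for an Anaconda install it usually points into the Anaconda folder.
    print("Interpreter:", sys.executable)
```

Save the file and run `python check_install.py`. If you only need the version number, `python --version` prints it without a script.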
2) Obtain access to Premium Content at Datacamp.com
We use material provided by Datacamp.com that is otherwise only available through a paid premium subscription. Students can access this material for free with their @tilburguniversity.edu account.
3) Getting to know Python
Eager for more? You can already work through chapters 1-3 of the introductory Python course on Datacamp. Novices will need about 3-4 hours. If you don’t have the time, don’t worry: in the first week of the course, we have set aside time to work through the tutorial and exercises.
4) Getting to know the terminal/command line
The command line (also called terminal) is a text-based interface in which you interact with your computer by typing commands. Knowing how to use it matters for web scraping because many scraping tools and libraries are run from the command line: you use it to navigate directories, run scripts, and manage your data collection more efficiently.
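To illustrate what “running a script from the command line” looks like, here is a hypothetical launcher that reads its settings from the command line using Python’s built-in `argparse` module. The file name `scraper.py` and the `url` and `--output` arguments are our own illustrative choices; the script does not actually download anything, it only shows how command-line input reaches a Python program.

```python
import argparse


def parse_args(argv=None):
    """Read the settings typed after the script name (argv=None means sys.argv[1:])."""
    parser = argparse.ArgumentParser(description="Hypothetical scraper launcher")
    # Optional positional argument, so running the script without input does not crash
    parser.add_argument("url", nargs="?", default=None, help="page to collect data from")
    parser.add_argument("--output", default="data.csv", help="file to save results to")
    return parser.parse_args(argv)


if __name__ == "__main__":
    args = parse_args()
    if args.url is None:
        print("Usage example: python scraper.py https://example.com --output results.csv")
    else:
        # A real scraper would download args.url here; this sketch only echoes the settings
        print(f"Would scrape {args.url} and save the results to {args.output}")
```

Typing `python scraper.py https://example.com --output results.csv` in the terminal passes both values to the script; `python scraper.py --help` lists the available options.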
Never worked with the command line or terminal before? Then please build your command line (Windows) / terminal (Mac) skills!
- Check out the Datacamp tutorial (first chapter only).
- Also check out this presentation about the command line / terminal.
- Although it targets Mac users, it provides a great overview.
- Windows users can follow along with the same commands by using Git Bash (or Cygwin).
- The main goal here is to understand how directory structures work and how to navigate in the terminal or command line.
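The same directory ideas also show up inside Python scripts, for example when a scraper decides where to save its files. The sketch below uses the standard `pathlib` module to mirror a few basic terminal commands (`pwd`, `ls`, `cd ..`); the `describe` helper and the `data/raw.csv` path are hypothetical examples, not part of the course material.

```python
from pathlib import Path


def describe(folder: Path) -> list:
    """List the names in a folder, marking subfolders with a trailing slash (like `ls -p`)."""
    return sorted(p.name + "/" if p.is_dir() else p.name for p in folder.iterdir())


if __name__ == "__main__":
    here = Path.cwd()                        # like `pwd`: the current working directory
    print("You are in:", here)
    print("Parent folder:", here.parent)     # where `cd ..` would take you
    print("Contents:", describe(here))       # like `ls`
    # A relative path such as data/raw.csv is interpreted relative to the current directory
    print("data/raw.csv points to:", here / "data" / "raw.csv")
```

Running this from different folders in your terminal is a quick way to see that a script’s relative paths depend on where you start it.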