Team Project
Overview and project goal
Collecting data via web scraping and APIs requires practice. Together with your team members, you plan and execute an online data collection throughout the course by closely following the recommendations in “Fields of Gold”.
At the end of the course, you will submit a data package, consisting of
- your collected data,
- all source code, and
- documentation (following this template).
The focus lies on completing an entire data collection project. Keep each stage of your project manageable and feasible. Your project will ultimately be written up as a proper data documentation, following this paper and corresponding template download. Head over to the grading details to find out more!
Getting started
- Workplan, deliverables and coaching
- Grading requirements
- Past projects
- Tips and examples
Organization
Coaching sessions
During the course, you will have the opportunity to meet up with the course instructor for coaching sessions. These sessions are meant for you to receive feedback on your ideas and code. Frequently, this also entails problem-solving & debugging.
Teams participating in coaching sessions attend the entire session. They typically work together on the team project, and the course instructor will “walk around” to address students’ questions. When on Zoom, breakout rooms will be created. There will be approximately 10 minutes per team in total; some teams may prefer to use the entire time at once, while other teams may prefer to ask multiple, short questions over the entire duration of the session.
Deliverables: Most coaching sessions will help the team work on some deliverables, which are always due at the end of a course week and are submitted on Canvas by Friday, 7 pm.
This is how to prepare for coaching sessions:
- You get the most out of the coaching sessions if you have already done some work on your project.
- In addition, ensure you have worked through preparation material for the lectures before meeting as a team.
- Make active use of the academic literature (i.e., Boegershausen et al. 2022 and Guyt et al. 2024)
- If you encounter technical problems, check whether a solution has already been posted
- e.g., on this website, this YouTube playlist, or the associated YouTube channel
- also conduct your own Google/Stack Overflow search
- Start Jupyter Notebook and load your scripts before the start of the coaching session!
- your “problem” (e.g., error message) needs to be “on the screen”, so that your coach can fiddle around with it. Merely showing a screenshot of an error message does not work.
- When meeting online…
- download & install TeamViewer so that the course instructor can use your keyboard and mouse. Try out a connection with a team member before your first meeting to verify your installation works and you can remote-control your peer’s computer.
- ensure you are sharing the screen with Jupyter Notebook, and join the scheduled session.
- Send your TeamViewer ID & temporary password to the course instructor in a private message.
- Be prepared for the course instructor to take over your screen. Be able to talk (i.e., check your microphone settings beforehand!).
Typical issues to discuss in a coaching session
- How to capture data, and convert it into a proper format for storage (e.g., CSV file, JSON file)?
- How to verify whether all data that should have been downloaded/captured indeed was captured?
- How to schedule/run the data collection for extended periods?
- How to deal with technical hurdles (e.g., authentication, logging in on a site, scrolling)?
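To make the first two questions more concrete, here is a minimal sketch in Python of how captured records could be stored as CSV and JSON and then checked for completeness. It is only an illustration under stated assumptions: the records, the field name `id`, the file names, and the helper functions `save_records` and `find_missing_ids` are all hypothetical and not part of the course material. Your own data will look different.

```python
import csv
import json
import tempfile
from pathlib import Path


def save_records(records, csv_path, json_path):
    """Write a list of dictionaries to a CSV file and a JSON file."""
    # Use the union of all keys as columns, so records with missing
    # fields still fit (missing values are written as empty strings).
    fieldnames = sorted({key for record in records for key in record})
    with open(csv_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(records)
    with open(json_path, "w", encoding="utf-8") as f:
        json.dump(records, f, ensure_ascii=False, indent=2)


def find_missing_ids(expected_ids, csv_path, id_field="id"):
    """Return the expected IDs that do not appear in the stored CSV file."""
    with open(csv_path, newline="", encoding="utf-8") as f:
        captured = {row[id_field] for row in csv.DictReader(f)}
    # CSV stores everything as text, so compare on the string form.
    return [i for i in expected_ids if str(i) not in captured]


# Hypothetical example data: two captured items out of three expected ones.
records = [
    {"id": 1, "title": "First item"},
    {"id": 2, "title": "Second item"},
]

with tempfile.TemporaryDirectory() as tmp:
    csv_file = Path(tmp) / "items.csv"
    json_file = Path(tmp) / "items.json"
    save_records(records, csv_file, json_file)
    missing = find_missing_ids([1, 2, 3], csv_file)
    print("Missing IDs:", missing)
```

Comparing what was stored against a list of what should have been captured is one simple way to spot gaps; for a real project, you would typically also log timestamps and errors during the collection.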
Team composition
- 4-5 students per team
- you need to sign up for a team yourself (be present in the live streams for that; registration is on Canvas!)
- we recommend that each team include at least one or two students with coding expertise in Python
Deadline and submission
- Deadline: tba
- Submission of your data package via Canvas in one zip file.