Hannes Datta
We're about to start with today's tutorial (“API 101”).
Browse the documentation of music-to-scrape's API.
1) Have a look at the various endpoints that are available. At what unit of analysis do they allow you to collect data?
2) Why would a platform offer an API in the first place?
Run the snippet below.
import requests
url = "https://api.music-to-scrape.org/users"
response = requests.get(url)
request = response.json()
print(request)
DO: Extend the URL with ?limit=20 and run again. What happens?
DO: Run this snippet to write the JSON output to a .json file, then inspect it using a JSON viewer or VS Code!
import json

with open('users.json', 'w', encoding='utf-8') as f:
    f.write(json.dumps(request))
You can extract an attribute's value from a dictionary in two ways: ['name_of_attribute'] or .get('name_of_attribute').
request['limit']
# or:
request.get('limit')
request['limit'] # this one worked for extracting the value for limit...
Suppose we do not want to extract the ages and country names for the users…
Can you come up with a way to anonymize them (e.g., overwrite them with NA), while keeping the rest of the dictionary intact?
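One possible approach, sketched on a hand-made example (the real response may use different field names than 'age' and 'country', so check the documentation): loop over the users and overwrite the sensitive attributes in place.

```python
# Minimal sketch with made-up data; field names are assumptions.
request = {
    "users": [
        {"username": "user1", "age": 34, "country": "NL"},
        {"username": "user2", "age": 51, "country": "US"},
    ],
    "limit": 2,
}

for user in request["users"]:
    # overwrite the sensitive attributes, keep everything else intact
    user["age"] = "NA"
    user["country"] = "NA"

print(request)
```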
.append() adds one item at a time to an existing list:
users = [] # empty list
users.append('another user')
users.append('yet another user')
users
extend() adds multiple items to an existing list:
users = []
new_users = ['another user','yet another user']
users.extend(new_users)
users
Parameters can be part of the URL path: myapi.com/search/cats_and_dogs [not the case for music-to-scrape]
…or be passed as query parameters: myapi.com/search/?query=cats_and_dogs
The API documentation will tell you what is required!
params needs to be a dictionary with the parameter names and corresponding values:
import requests
url = "https://api.music-to-scrape.org/users"
response = requests.get(url, params = {'limit': 15})
request = response.json()
print(request)
Can you speculate about the benefits of submitting parameters via the params argument rather than hardcoding them in the URL?
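One concrete benefit: requests URL-encodes parameter values for you, which you would otherwise have to do by hand. The sketch below builds (but does not send) a request so you can inspect the URL that requests would generate; the 'note' parameter is made up for illustration.

```python
import requests

# Build a request without sending it; special characters in the
# parameter value (spaces, '&') get URL-encoded automatically.
prepared = requests.Request(
    "GET",
    "https://api.music-to-scrape.org/users",
    params={"limit": 15, "note": "a value with spaces & symbols"},
).prepare()

print(prepared.url)
```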
DO: Add the offset parameter to the snippet below. What happens when you set it to 1? What happens when you set it to 5?
# start with this code
import requests
url = "https://api.music-to-scrape.org/users?limit=10"
response = requests.get(url)
request = response.json()
print(request)
DO: Write a function get_users(), with parameters limit and offset, returning the dictionary of users from the API endpoint /users.
To repeat code, use a for loop:
for x in range(6):
    print(x)
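A sketch of the get_users() function asked for above, assuming the /users endpoint accepts limit and offset query parameters as described in the documentation:

```python
import requests

def get_users(limit=10, offset=0):
    """Return the dictionary of users from the /users endpoint.

    Sketch only: assumes 'limit' and 'offset' are valid query
    parameters -- verify against the API documentation.
    """
    url = "https://api.music-to-scrape.org/users"
    response = requests.get(url, params={"limit": limit, "offset": offset})
    return response.json()

# example call (requires an internet connection):
# users = get_users(limit=10, offset=5)
```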
DO: Modify the snippet below so that it calls get_users() 10 times, incrementing the offset by 10 at each iteration.
There are two types of loops: for loops (you usually know beforehand when to stop), and while loops (the ending point can change, say when "there is no new data coming in").
# for loop
for x in range(6):
    print(x)
# while loop
cntr = 0
while cntr < 6:
    print(cntr)
    cntr = cntr + 1
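The modification asked for above could look like the following sketch, with get_users() being the assumed helper from the earlier exercise (wrapped in a function here so it only runs when you call it):

```python
import requests

def get_users(limit=10, offset=0):
    # assumed helper, as in the earlier exercise: fetch users from /users
    url = "https://api.music-to-scrape.org/users"
    response = requests.get(url, params={"limit": limit, "offset": offset})
    return response.json()

def collect_users(iterations=10, step=10):
    # call get_users() 10 times, incrementing the offset by 10 each time
    all_users = []
    for i in range(iterations):
        data = get_users(limit=step, offset=i * step)
        all_users.extend(data.get("users", []))
    return all_users

# run it (requires an internet connection):
# users = collect_users()
```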
We can now combine our learnings to build a function that extracts all user names to a newline-separated JSON file (one JSON object per line).
DO: Save your code in a .py file and test whether it runs from the command prompt/terminal.
# will develop in class
The tutorial proceeds by introducing a series of additional endpoints:
user/plays - get a user's total number of plays
charts/top-artists - see a list of top-performing artists for this week (and previous weeks)
DO: Try calling user/plays, following the guidelines in the documentation. Do you succeed?
DO: Try calling charts/top-artists. Do you get some output?
# code in class
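A hedged sketch of a call to charts/top-artists; the endpoint path follows the documentation referenced above, but the response structure is not specified here, so inspect the output rather than assuming field names:

```python
import requests

def get_top_artists():
    # endpoint path per the API documentation; the shape of the
    # returned dictionary should be inspected, not assumed
    url = "https://api.music-to-scrape.org/charts/top-artists"
    response = requests.get(url)
    return response.json()

# run it (requires an internet connection):
# print(get_top_artists())
```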
requests?