Hannes Datta
We're about to start with today's tutorial (“API 101”).
Browse the documentation of music-to-scrape's API.
1) Have a look at the various endpoints that are available. At what unit of analysis do they allow you to collect data?
2) Why would a platform offer an API in the first place?
Run the snippet below.
import requests
url = "https://api.music-to-scrape.org/users"
response = requests.get(url)
request = response.json()
print(request)
Append ?limit=20 to the URL and run again. What happens?
Save the output to a .json file and view it in VS Code (install a JSON plugin first).
Do: Run this snippet to write the JSON output to a file! Inspect it using a JSON viewer or VS Code!
import json
f=open('users.json', 'w', encoding='utf-8')
f.write(json.dumps(request))
f.close()
Access a value in the dictionary with ['name_of_attribute'] or .get('name_of_attribute')
request['limit']
# or:
request.get('limit')
request['limit'] # this one worked for extracting the value for limit...
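The two access styles behave differently when a key is missing, which the minimal sketch below illustrates. The dictionary is made up for illustration and only mimics the shape of an API response; it is not real output from the API.

```python
# Hypothetical response-like dictionary (made up for illustration)
example = {'limit': 10, 'data': []}

# Both styles return the same value when the key exists
print(example['limit'])
print(example.get('limit'))

# .get() returns None (or a default you choose) when the key is missing
print(example.get('offset'))
print(example.get('offset', 0))

# Square brackets raise a KeyError when the key is missing
try:
    example['offset']
except KeyError as error:
    print('missing key:', error)
```

Square brackets are handy when a missing key should stop the script; .get() is handy when a missing key is expected now and then.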
#Q1:
request['data']
#Q2:
user_names = [] # initialize empty list
for user in request['data']: # start loop
    user_names.append(user['username']) # append user name to the list of user_names (initialized above)
user_names # inspect the result
Suppose we do not want to extract the ages and country names for the users…
Can you come up with a way to anonymize them (e.g., overwrite them with NA), while keeping the rest of the dictionary intact?
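new_dic = [] # list that will hold the anonymized user dictionaries
for user in request['data']:
    obj = dict(user) # copy, so the original response is not modified
    obj['username'] = 'NA'
    obj['age'] = 'NA'
    new_dic.append(obj)
new_dic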
.append() adds one item at a time to an existing list
users = [] # empty list
users.append('another user')
users.append('yet another user')
users
.extend() adds multiple items to an existing list
users = []
new_users = ['another user','yet another user']
users.extend(new_users)
users
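The difference matters most when the thing you add is itself a list. A minimal sketch with made-up user names:

```python
users = ['first user']
new_users = ['another user', 'yet another user']

appended = list(users)       # copy, so both variants start from the same list
appended.append(new_users)   # adds the whole list as ONE nested item

extended = list(users)
extended.extend(new_users)   # adds each element of new_users separately

print(appended)
print(extended)
```

When collecting pages of results from an API, .extend() is usually the one that gives a flat list of users.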
Parameters as part of the URL path: myapi.com/search/cats_and_dogs [not the case for music-to-scrape]
Parameters as a query string: myapi.com/search/?query=cats_and_dogs
The API documentation will tell you what is required!
params needs to be a dictionary with the parameter names and corresponding values
import requests
url = "https://api.music-to-scrape.org/users"
response = requests.get(url, params = {'limit': 15})
request = response.json()
print(request)
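Can you speculate about the benefits of passing parameters as a dictionary (params) rather than typing them into the URL yourself? (Note: params still end up in the URL's query string, not in the request header.)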
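To get a feel for what happens to a dictionary of parameters, here is a rough sketch using urlencode from Python's standard library. It is not how requests is implemented internally, only an illustration of the same idea: the dictionary is turned into a query string and attached to the URL. The parameter name query in the second example is hypothetical.

```python
from urllib.parse import urlencode

base_url = "https://api.music-to-scrape.org/users"
params = {'limit': 15, 'offset': 30}

# Turn the dictionary into a query string and attach it to the URL
full_url = base_url + '?' + urlencode(params)
print(full_url)

# Special characters are escaped automatically (hypothetical parameter name)
print(urlencode({'query': 'cats & dogs'}))
```

One benefit of a dictionary: you do not have to worry about escaping spaces or ampersands, and parameters are easy to change in a loop.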
Remember iterating through pages on a website to “view” data? APIs know the same concept!
Add the offset parameter to the snippet below. What happens when you set it to 1? What happens when you set it to 5?
# start with this code
import requests
url = "https://api.music-to-scrape.org/users?limit=10"
response = requests.get(url)
request = response.json()
print(request)
# q1: setting the offset parameter
import requests
requests.get("https://api.music-to-scrape.org/users?limit=10").json()
requests.get("https://api.music-to-scrape.org/users?limit=10&offset=1").json()
requests.get("https://api.music-to-scrape.org/users?limit=10&offset=5").json()
# q2:
requests.get("https://api.music-to-scrape.org/users?limit=10&offset=10").json()
Write a function get_users(), with parameters limit and offset, returning the users from the API endpoint /users.
import requests
def get_users(limit, offset):
    obj = requests.get(f"https://api.music-to-scrape.org/users?limit={limit}&offset={offset}").json()
    return obj['data']
get_users(10,1)
The for loop.
for x in range(6):
    print(x)
Do: Modify the snippet below so that it calls get_users() 10 times, incrementing the offset by 10 at each iteration.
offset = 0
for x in range(10):
    print(get_users(limit=10, offset=offset))
    offset = offset + 10
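As a side note, range() also accepts a start, a stop, and a step, so the offsets can be generated directly instead of being incremented by hand. A small sketch (no API call involved):

```python
# range(start, stop, step): start at 0, stop before 100, move in steps of 10
offsets = list(range(0, 100, 10))
print(offsets)

for offset in offsets:
    # here you could call get_users(limit=10, offset=offset)
    print('would request users starting at offset', offset)
```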
There are for loops (you usually know beforehand when to stop), and while loops (the ending point can change, say when “there is no new data coming in”).
# for loop
for x in range(6):
    print(x)
# while loop
cntr = 0
while cntr < 6:
    print(cntr)
    cntr = cntr + 1
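The “no new data coming in” case can be sketched as follows. To keep the example runnable offline, fetch_page() is a hypothetical stand-in for get_users() that slices a made-up list of 25 users. Whether the real API returns an empty list once you move past the last user is an assumption you would need to check in the documentation.

```python
# Stand-in for the API: 25 made-up users, so the example runs offline
FAKE_USERS = [f'user{i}' for i in range(25)]

def fetch_page(limit, offset):
    """Hypothetical helper mimicking get_users(): returns one page of users."""
    return FAKE_USERS[offset:offset + limit]

all_users = []
offset = 0
while True:
    page = fetch_page(limit=10, offset=offset)
    if len(page) == 0:  # no new data coming in: stop
        break
    all_users.extend(page)
    offset = offset + 10

print(len(all_users))
```

Here the loop does not need to know the number of users in advance; it simply keeps going until a page comes back empty.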
We can now combine our learnings to build a script that extracts 100 user names and metadata to a new-line separated JSON file.
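Save the code in a .py file and test whether it runs from the command prompt/terminal.
# will develop in class
import requests
import json

def get_users(limit, offset):
    # same function as before, repeated so the file runs on its own
    obj = requests.get(f"https://api.music-to-scrape.org/users?limit={limit}&offset={offset}").json()
    return obj['data']

cntr = 0
f = open('output.json', 'w', encoding='utf-8')
while cntr < 100: # offsets 0, 10, ..., 90: 10 pages of 10 users = 100 users
    f.write(json.dumps(get_users(limit=10, offset=cntr)))
    f.write('\n')
    cntr = cntr + 10
f.close()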
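Once the data sits in a new-line separated JSON file, it can be read back one line at a time. The sketch below writes two made-up pages of users to a temporary file and reads them back into one flat list. The user names and the temporary path are placeholders for illustration, not real API output.

```python
import json
import os
import tempfile

# Made-up pages standing in for the output of get_users()
pages = [
    [{'username': 'user_a'}, {'username': 'user_b'}],
    [{'username': 'user_c'}],
]

path = os.path.join(tempfile.mkdtemp(), 'output.json')

# Write one JSON document per line
with open(path, 'w', encoding='utf-8') as f:
    for page in pages:
        f.write(json.dumps(page))
        f.write('\n')

# Read the file back line by line and flatten the pages into one list
users = []
with open(path, 'r', encoding='utf-8') as f:
    for line in f:
        users.extend(json.loads(line))

print(len(users))
```

Because every line is a complete JSON document, a script that crashes halfway still leaves a file that can be read up to the last complete line.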
The tutorial proceeds by introducing a series of additional endpoints.
user/plays - get a user's total number of plays
charts/top-artists - see a list of top-performing artists for this week (and previous weeks)
Try user/plays, following the guidelines in the documentation. Do you succeed?
Try charts/top-artists. Do you get some output?
# code in class