The `license` query parameter in our API call lets us filter results by Creative Commons license, so we can specify exactly what type of license our images have. We'll ask for public images, where the copyright is fully waived. The Bing Image Search API supports several other Creative Commons license types, and the full list is in its documentation.

We also set the `imageType` parameter to `photo`. If we don't specify this, we could get back animated GIFs, clip art, or drawings of Aston Martin cars.

We'll use the `requests` package to call the API. Install it with `pip install requests` if you are using an environment that doesn't already have it. Then import the modules we need:

```python
import json
import os
import pprint
import time

import requests
```
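Next, we load the API key from a `config.json` file. We use the built-in `open` function to open the file for reading and the `json` module to parse it. This gives us a dictionary whose keys are the JSON keys, so we can look up each value by name. We then build the search URL from the Bing endpoint.

```python
# Read the API key from config.json; the with block closes the file when done
with open("config.json") as config_file:
    config = json.load(config_file)
api_key = config["apiKey"]

endpoint = "https://api.bing.microsoft.com/"
url = f"{endpoint}v7.0/images/search"
```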
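If you'd rather not keep the key in a file on every machine, one option is to fall back to an environment variable. This is a minimal sketch that assumes the same `apiKey` field used above; `load_api_key` and the `BING_SEARCH_KEY` variable name are hypothetical choices for this example, not something the API defines.

```python
import json
import os


def load_api_key(config_path="config.json", env_var="BING_SEARCH_KEY"):
    """Return the Bing API key from a JSON config file, else from an env var.

    BING_SEARCH_KEY is a hypothetical variable name chosen for this sketch.
    """
    if os.path.exists(config_path):
        with open(config_path) as config_file:
            return json.load(config_file)["apiKey"]
    key = os.environ.get(env_var)
    if key is None:
        raise RuntimeError(f"No {config_path} found and {env_var} is not set")
    return key
```

Calling `load_api_key()` with no arguments behaves like the original snippet when `config.json` is present.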
The API key goes in the `Ocp-Apim-Subscription-Key` header:

```python
headers = {"Ocp-Apim-Subscription-Key": api_key}
```
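Next, we set up the query parameters. The `q` parameter holds our search term, `license` and `imageType` apply the filters described above, and setting `safeSearch` to `Strict` filters out adult content.

```python
params = {
    "q": "aston martin",
    "license": "public",
    "imageType": "photo",
    "safeSearch": "Strict",
}
```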
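To see roughly what goes over the wire, it can help to build the query string yourself. `requests` encodes the `params` dictionary in much the same way as the standard library's `urllib.parse.urlencode`, so this stdlib-only sketch shows approximately the URL that `requests.get` will request. `build_search_url` is a hypothetical helper name.

```python
from urllib.parse import urlencode


def build_search_url(url, params):
    """Append URL-encoded query parameters to the search URL (hypothetical helper)."""
    return f"{url}?{urlencode(params)}"


search_url = build_search_url(
    "https://api.bing.microsoft.com/v7.0/images/search",
    {"q": "aston martin", "license": "public", "imageType": "photo", "safeSearch": "Strict"},
)
print(search_url)
# https://api.bing.microsoft.com/v7.0/images/search?q=aston+martin&license=public&imageType=photo&safeSearch=Strict
```

Note that the space in the search term is encoded as `+`, which is why the query shows up as `q=aston+martin` in the `webSearchUrl` of the response.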
With `requests`, we can just call the `get` method and pass in the URL, the headers, and the parameters. We use the `raise_for_status` method to raise an exception if the response has an error status code (4xx or 5xx). Then we parse the JSON of the response into a variable. Finally, we use `pprint.pprint` to pretty-print the JSON response.

```python
response = requests.get(url, headers=headers, params=params)
response.raise_for_status()  # raises requests.HTTPError on a 4xx/5xx status
result = response.json()
pprint.pprint(result)
```
The printed response looks like this:

```
{'_type': 'Images',
 'currentOffset': 0,
 'instrumentation': {'_type': 'ResponseInstrumentation'},
 'nextOffset': 38,
 'totalEstimatedMatches': 475,
 'value': [{'accentColor': 'C6A105',
            'contentSize': '1204783 B',
            'contentUrl': 'https://www.publicdomainpictures.net/pictures/380000/velka/aston-martin-car-1609287727yik.jpg',
            'creativeCommons': 'PublicNoRightsReserved',
            'datePublished': '2021-02-06T20:45:00.0000000Z',
            'encodingFormat': 'jpeg',
            'height': 1530,
            'hostPageDiscoveredDate': '2021-01-12T00:00:00.0000000Z',
            'hostPageDisplayUrl': 'https://www.publicdomainpictures.net/view-image.php?image=376994&picture=aston-martin-car',
            'hostPageFavIconUrl': 'https://www.bing.com/th?id=ODF.lPqrhQa5EO7xJHf8DMqrJw&pid=Api',
            'hostPageUrl': 'https://www.publicdomainpictures.net/view-image.php?image=376994&picture=aston-martin-car',
            'imageId': '38DBFEF37523B232A6733D7D9109A21FCAB41582',
            'imageInsightsToken': 'ccid_WTqn9r3a*cp_74D633ADFCF41C86F407DFFCF0DEC38F*mid_38DBFEF37523B232A6733D7D9109A21FCAB41582*simid_608053462467504486*thid_OIP.WTqn9r3aKv5TLZxszieEuQHaF5',
            'insightsMetadata': {'availableSizesCount': 1,
                                 'pagesIncludingCount': 1},
            'isFamilyFriendly': True,
            'name': 'Aston Martin Car Free Stock Photo - Public Domain '
                    'Pictures',
            'thumbnail': {'height': 377, 'width': 474},
            'thumbnailUrl': 'https://tse2.mm.bing.net/th?id=OIP.WTqn9r3aKv5TLZxszieEuQHaF5&pid=Api',
            'webSearchUrl': 'https://www.bing.com/images/search?view=detailv2&FORM=OIIRPO&q=aston+martin&id=38DBFEF37523B232A6733D7D9109A21FCAB41582&simid=608053462467504486',
            'width': 1920}]
```
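Since the whole point of the `license` filter is to end up with images we can use, it can be worth double-checking the `creativeCommons` label each result reports before downloading it. This is a minimal sketch over a trimmed copy of the response above. `public_image_urls` is a hypothetical helper, treating `PublicNoRightsReserved` as the only acceptable label is an assumption for this example, and items without a `creativeCommons` field are skipped.

```python
def public_image_urls(result, allowed=("PublicNoRightsReserved",)):
    """Return contentUrl values whose creativeCommons label is in `allowed`.

    The allowed labels are an assumption for this sketch; items with no
    creativeCommons field are skipped.
    """
    return [
        item["contentUrl"]
        for item in result.get("value", [])
        if item.get("creativeCommons") in allowed
    ]


# Trimmed copy of the response printed above
sample_result = {
    "nextOffset": 38,
    "value": [
        {
            "contentUrl": "https://www.publicdomainpictures.net/pictures/380000/velka/aston-martin-car-1609287727yik.jpg",
            "creativeCommons": "PublicNoRightsReserved",
        }
    ],
}
print(public_image_urls(sample_result))
```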
There are two items in the response that we care about:

- `nextOffset`: This will help us page through the results with multiple requests.
- `value.contentUrl`: This is the actual URL of the image. We will use this URL to download the images.

The `nextOffset` item in the API response tells us where the next page of results starts. We can pass this value in another query parameter, `offset`, to get the next page of results.

```python
contentUrls = []  # collects the image URLs from every page
new_offset = 0
while new_offset <= 200:
    print(new_offset)
    params["offset"] = new_offset
    response = requests.get(url, headers=headers, params=params)
    response.raise_for_status()
    result = response.json()
    time.sleep(1)  # wait one second between requests
    new_offset = result["nextOffset"]
    for item in result["value"]:
        contentUrls.append(item["contentUrl"])
```
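The paging pattern above can be pulled out into a function that doesn't touch the network, which makes it easy to test. In this minimal sketch, `collect_content_urls` and `fetch_page` are hypothetical names: `fetch_page(offset)` stands in for the `requests.get(...).json()` call and is assumed to return a dictionary with `value` and `nextOffset` keys like the response above.

```python
def collect_content_urls(fetch_page, max_offset=200):
    """Page through results, following nextOffset until it passes max_offset.

    fetch_page(offset) is a hypothetical callable returning one page of results
    shaped like the Bing response: {"value": [...], "nextOffset": int}.
    """
    content_urls = []
    offset = 0
    while offset <= max_offset:
        page = fetch_page(offset)
        content_urls.extend(item["contentUrl"] for item in page["value"])
        next_offset = page.get("nextOffset")
        # Stop if there is no next page or the offset stops moving forward
        if next_offset is None or next_offset <= offset:
            break
        offset = next_offset
    return content_urls


# Fake pages keyed by offset, standing in for real API responses
pages = {
    0: {"value": [{"contentUrl": "a.jpg"}, {"contentUrl": "b.jpg"}], "nextOffset": 2},
    2: {"value": [{"contentUrl": "c.jpg"}], "nextOffset": 3},
    3: {"value": [], "nextOffset": 3},
}
print(collect_content_urls(lambda offset: pages[offset]))  # ['a.jpg', 'b.jpg', 'c.jpg']
```

The extra check on `nextOffset` is a defensive choice for this sketch: it guarantees the loop ends even if a page comes back without a usable next offset.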
In the `while` loop, we stop once the offset passes 200, which limits us to roughly the first 200 images. Within the loop, we set the `offset` parameter to the current offset, which is 0 initially. Then we make the API call, wait for one second, and set the offset to the `nextOffset` value from the results. We save the `contentUrl` items from the results into a list, and repeat until we reach the limit of our offset.

We now have the `contentUrl` items for each of the images. To use the images as training data, we need to download them. Before we do that, let's set up the directory the images will be downloaded to. First we set the path, and then we use the `os` module to check whether the path exists. If it doesn't, we create it.

```python
dir_path = "./aston-martin/train/"
if not os.path.exists(dir_path):
    os.makedirs(dir_path)
```
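The same step can also be written with `pathlib`, where `exist_ok=True` folds the existence check into the call. A small sketch; `ensure_dir` is a hypothetical helper name.

```python
from pathlib import Path


def ensure_dir(path):
    """Create path (and any missing parents) if needed, and return it as a Path."""
    directory = Path(path)
    # parents=True also creates ./aston-martin; exist_ok=True skips the error if it exists
    directory.mkdir(parents=True, exist_ok=True)
    return directory
```

Calling `ensure_dir("./aston-martin/train")` is safe to repeat on every run.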
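For each URL, we use the last part of the URL as the file name and the `os.path.join` function to build the correct path for the system we're on. Then we use `requests` again, calling the `get` method with the image URL, and use the `open` function in binary write mode (`"wb"`) to write the image contents to that path.

```python
for image_url in contentUrls:
    # Use the last part of the URL as the file name
    file_name = image_url.split("/")[-1]
    path = os.path.join(dir_path, file_name)
    try:
        # Download first, so a failed request doesn't leave an empty file behind
        image_data = requests.get(image_url)
        with open(path, "wb") as f:
            f.write(image_data.content)
    except OSError:
        # Skip images we can't download or save (requests exceptions subclass OSError)
        pass
```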
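Silently passing on every `OSError` makes it hard to tell how many images were actually saved. This sketch records the failures instead. `download_all` is a hypothetical helper, and `fetch_bytes` is a hypothetical callable standing in for `requests.get(url).content` so the loop can be exercised without the network.

```python
import os


def download_all(urls, dir_path, fetch_bytes):
    """Save each URL's bytes under dir_path and return the URLs that failed.

    fetch_bytes(url) is a hypothetical callable returning the image bytes,
    e.g. lambda u: requests.get(u, timeout=10).content in the real script.
    """
    failed = []
    for image_url in urls:
        file_name = image_url.split("/")[-1]
        path = os.path.join(dir_path, file_name)
        try:
            data = fetch_bytes(image_url)  # download first, so failures leave no empty file
            with open(path, "wb") as f:
                f.write(data)
        except OSError as err:
            print(f"Skipping {image_url}: {err}")
            failed.append(image_url)
    return failed
```

Printing `len(failed)` at the end gives a quick sense of how much of the training set is missing.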
Here are some of the image URLs we collected:

```
https://www.publicdomainpictures.net/pictures/380000/velka/aston-martin-car-1609287727yik.jpg
https://images.pexels.com/photos/592253/pexels-photo-592253.jpeg?auto=compress&cs=tinysrgb&h=750&w=1260
https://images.pexels.com/photos/2811239/pexels-photo-2811239.jpeg?cs=srgb&dl=pexels-tadas-lisauskas-2811239.jpg&fm=jpg
https://get.pxhere.com/photo/car-vehicle-classic-car-sports-car-vintage-car-coupe-antique-car-land-vehicle-automotive-design-austin-healey-3000-aston-martin-db2-austin-healey-100-69398.jpg
https://get.pxhere.com/photo/car-automobile-vehicle-automotive-sports-car-supercar-luxury-expensive-coupe-v8-martin-vantage-aston-land-vehicle-automotive-design-luxury-vehicle-performance-car-aston-martin-dbs-aston-martin-db9-aston-martin-virage-aston-martin-v8-aston-martin-dbs-v12-aston-martin-vantage-aston-martin-v8-vantage-2005-aston-martin-rapide-865679.jpg
https://c.pxhere.com/photos/5d/f2/car_desert_ferrari_lamborghini-1277324.jpg!d
```
Most of these URLs end in `jpeg` or `jpg`, but a few have extra parameters on the end. If we build file names from those URLs as they are, the files can fail to save on some systems or end up with the wrong extension, so we won't get usable images. So we need to do a little bit of data cleaning. There are two patterns to look for: a `?` in the URL and a `!` in the URL. With those patterns, we can update our download loop as below to get the correct file name for every image.

```python
for image_url in contentUrls:
    # The file name is the last segment of the URL...
    last_item = image_url.split("/")[-1]
    # ...without any query string after "?"
    last_item = last_item.split("?")[0]
    # ...and without any suffix after "!" (such as "!d")
    last_item = last_item.split("!")[0]
    print(last_item)
    path = os.path.join(dir_path, last_item)
    try:
        image_data = requests.get(image_url)
        image_data.raise_for_status()  # don't save error pages as images
        with open(path, "wb") as f:
            f.write(image_data.content)
    except OSError:
        # requests exceptions, including HTTPError, are OSError subclasses
        pass
```
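Instead of splitting on `?` by hand, `urllib.parse.urlsplit` can separate the path from the query string. The `!` suffix still needs manual handling, because `!` is not a URL delimiter and stays in the path. A sketch; `file_name_from_url` is a hypothetical helper name.

```python
import posixpath
from urllib.parse import urlsplit


def file_name_from_url(image_url):
    """Return a file name for image_url, dropping the query string and any '!' suffix."""
    path = urlsplit(image_url).path  # excludes the query string and fragment
    name = posixpath.basename(path)  # last "/"-separated segment of the path
    return name.split("!")[0]  # strip suffixes such as "!d"


for u in [
    "https://images.pexels.com/photos/592253/pexels-photo-592253.jpeg?auto=compress&cs=tinysrgb&h=750&w=1260",
    "https://c.pxhere.com/photos/5d/f2/car_desert_ferrari_lamborghini-1277324.jpg!d",
]:
    print(file_name_from_url(u))
# pexels-photo-592253.jpeg
# car_desert_ferrari_lamborghini-1277324.jpg
```

One advantage of parsing the URL is that a `/` inside the query string (for example in a redirect parameter) can't end up in the file name.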