# To download the webpage
pip install requests
# To scrape data from the downloaded webpage
# (lxml is the parser passed to BeautifulSoup later on and is a separate package)
pip install beautifulsoup4 lxml
import requests
url = "https://www.imdb.com/search/title?release_date=2019&sort=user_rating,desc&ref_=adv_nxt"
# get() method downloads the entire HTML of the provided url
response = requests.get(url)
# Get the text from the response object
response_text = response.text
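The download step above assumes the request succeeds. As a minimal sketch of a more defensive version, the helper below checks the HTTP status and sets a timeout. The function name `download_page`, the `User-Agent` value and the 10-second timeout are illustrative assumptions, not part of the original tutorial.

```python
import requests


# Hypothetical helper: the name, header value and timeout are illustrative.
def download_page(url, timeout=10):
    """Return the HTML of `url`, raising an exception on HTTP errors."""
    # Some sites reject requests without a browser-like User-Agent (assumption)
    headers = {"User-Agent": "Mozilla/5.0 (tutorial example)"}
    response = requests.get(url, headers=headers, timeout=timeout)
    # Raises requests.HTTPError for 4xx/5xx responses
    response.raise_for_status()
    return response.text
```

Calling `download_page(url)` would then stand in for the `requests.get(url)` and `response.text` pair, with a failed download surfacing as an exception rather than as an unexpected page.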
from bs4 import BeautifulSoup
# Create a BeautifulSoup object
# response_text -> The downloaded webpage
# lxml -> Used for processing HTML and XML pages
soup = BeautifulSoup(response_text,'lxml')
# As we saw, the rating's class name is "ratings-bar";
# we prefix it with "." since it is a class
rating_class_selector = ".ratings-bar"
# Extract all the elements that have the ratings class
rating_list = soup.select(rating_class_selector)
soup.select() returns a list of all the <div> elements that have "ratings-bar" as a class name. We need to get the text from within each div element.
<div class="ratings-bar">
<div class="inline-block ratings-imdb-rating" data-value="10" name="ir">
<span class="global-sprite rating-star imdb-rating"></span>
<strong>10.0</strong>
</div>
...
</div>
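To see what the class selector returns without hitting the network, here is a small sketch run on an inline fragment. The `sample_html` string is made up to mirror the structure shown above, and the built-in `html.parser` is used instead of `lxml` so the example has no extra dependency; both are assumptions for illustration.

```python
from bs4 import BeautifulSoup

# Made-up fragment modelled on the structure shown above
sample_html = """
<div class="ratings-bar">
  <div class="inline-block ratings-imdb-rating" data-value="10" name="ir">
    <strong>10.0</strong>
  </div>
</div>
<div class="ratings-bar">
  <div class="inline-block ratings-imdb-rating" data-value="9.5" name="ir">
    <strong>9.5</strong>
  </div>
</div>
<div class="other">not a rating</div>
"""

sample_soup = BeautifulSoup(sample_html, "html.parser")
# ".ratings-bar" matches only elements whose class list contains "ratings-bar"
sample_rating_list = sample_soup.select(".ratings-bar")

print(len(sample_rating_list))         # number of matched elements
print(sample_rating_list[0]["class"])  # class list of the first match
```

Only the two outer divs are matched; the inner `ratings-imdb-rating` div and the `other` div are not, because neither has `ratings-bar` as one of its classes.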
The rating is inside the <strong> tag. We can extract a tag using the find('tagName') method and get its text using getText().
# This list will store all the ratings
ratings = []
# Iterate through all the rating elements
for rating_object in rating_list:
    # Find the <strong> tag and get its text
    rating_text = rating_object.find('strong').getText()
    # Append the rating to the list
    ratings.append(rating_text)
print(ratings)
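The loop above fails with an AttributeError if any matched element has no <strong> tag, because find() returns None in that case. Below is a hedged sketch of the same extraction wrapped in a function that skips such elements and converts the text to floats. The name `extract_ratings`, the sample HTML and the use of `html.parser` are assumptions for illustration; whether the live page actually contains rating bars without a <strong> tag is not confirmed by the tutorial.

```python
from bs4 import BeautifulSoup


# Hypothetical helper, not part of the original tutorial
def extract_ratings(html):
    """Return the ratings found in `html` as floats, skipping unrated entries."""
    soup = BeautifulSoup(html, "html.parser")
    ratings = []
    for rating_object in soup.select(".ratings-bar"):
        strong_tag = rating_object.find("strong")
        # find() returns None when the element has no <strong> child
        if strong_tag is None:
            continue
        ratings.append(float(strong_tag.getText().strip()))
    return ratings


# Made-up fragment: two rated entries and one without a rating
sample_html = """
<div class="ratings-bar"><div class="inline-block ratings-imdb-rating"><strong>10.0</strong></div></div>
<div class="ratings-bar"><div class="inline-block ratings-imdb-rating"><strong>9.5</strong></div></div>
<div class="ratings-bar"></div>
"""

print(extract_ratings(sample_html))  # prints [10.0, 9.5]
```

Returning floats rather than strings makes the values directly usable for sorting or averaging; keep the strings instead if the exact page text matters.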