Each row of the tracking file stores: the item's name, its URL in the store, and the target price.
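As a sketch, this is how one tracked item could end up as a CSV row once the current price is appended (the name, URL, and prices here are made-up placeholders):

```python
import csv
import io

# build one CSV row in memory; the values are hypothetical placeholders
buffer = io.StringIO()
writer = csv.writer(buffer)
# columns: name, URL in the store, target price, latest scraped price
writer.writerow(["Example Headphones", "https://www.example.com/item", 25.0, 29.99])

print(buffer.getvalue().strip())
```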
import requests
import bs4
import csv
import re
from os.path import exists

def tracker(filepath, item='', target=0):
    if len(item) != 0:
        if not exists(filepath):
            f = open(filepath, 'w', encoding='UTF8')
        else:
            f = open(filepath, 'a')
        writer = csv.writer(f)
[Figure 1. Weird scary stuff that pops up on the browser.]
        rq = requests.get(item, headers={'User-Agent': 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2228.0 Safari/537.36', 'Accept-Language': 'en-US, en;q=0.5'})
NOTE: When we use a browser to access a webpage, the server hosting the page knows which browser we are using, so it can send a version of the page optimized for that browser. If this information is not available we may run into problems: the HTML returned by the request may not match what we see in the browser, or the server may block the unrecognized traffic and prevent Python from getting the info. In this tutorial we pretend to be a regular desktop Chrome browser 😉. In addition to this, we ask for the webpage in English, regardless of our system's preferences.
        info = bs4.BeautifulSoup(rq.content, features='lxml')
[Figure 2. Retrieved information in a nested structure.]
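To see what `find` does on that nested structure, here is a minimal sketch on a handmade scrap of HTML; the ids mirror the ones used below, but the markup itself is made up (and `html.parser` is used instead of `lxml` so the snippet needs no extra parser):

```python
import bs4

# a tiny stand-in for a product page; the ids mimic the real ones
html = """
<html><body>
  <span id="productTitle">  Example Headphones  </span>
  <span id="price_inside_buybox"> $29.99 </span>
</body></html>
"""

# html.parser is the stdlib parser; the tutorial itself uses lxml
info = bs4.BeautifulSoup(html, features="html.parser")
name = info.find(id="productTitle").get_text().strip()
price_text = info.find(id="price_inside_buybox").get_text().strip()
print(name, "->", price_text)
```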
        name = info.find(id="productTitle").get_text().strip()
        price = info.find(id="price_inside_buybox").get_text().strip()
        # drop the leading currency symbol and the thousands separator so the
        # price can be compared with the numeric target
        price = float(re.split(r'(^[^\d]+)', price)[-1].replace(',', ''))
        writer.writerow([name, item, target, price])
        f.close()
        if target > 0 and price <= target:
            print("BUY IT NOW!!!!!!!!!!!!!!!!!!!")
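The regex trick above can be tested on its own. A small sketch (the helper name `parse_price` is mine, not from the tutorial): `re.split` with a capturing group keeps the matched prefix in the result list, so the last element is everything from the first digit onward.

```python
import re

def parse_price(text):
    # drop everything before the first digit (currency symbol, spaces)
    # and remove thousands separators so the value becomes a number
    return float(re.split(r'(^[^\d]+)', text)[-1].replace(',', ''))

print(parse_price('$1,099.99'))
print(parse_price('€24.50'))
```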
    else:
        fread = open(filepath, 'r')
        items = list(csv.reader(fread))
        fread.close()
        fwrite = open(filepath, 'w', newline='')
        wr = csv.writer(fwrite)
        for i in range(len(items)):
            rq = requests.get(items[i][1], headers={'User-Agent': 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2228.0 Safari/537.36', 'Accept-Language': 'en-US, en;q=0.5'})
            info = bs4.BeautifulSoup(rq.content, features='lxml')
            price = float(re.split(r'(^[^\d]+)', info.find(id="price_inside_buybox").get_text().strip())[-1].replace(',', ''))
            items[i].append(price)
            # the target read back from the CSV is a string, so convert it first
            if float(items[i][2]) > 0 and price <= float(items[i][2]):
                print("BUY IT NOW!!!!!!!!!!!!!!!!!!!")
        wr.writerows(items)
        fwrite.close()
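The read-everything / rewrite-everything pattern above can be sketched in isolation with a throwaway file (the file name and prices below are made up):

```python
import csv
import os
import tempfile

# create a throwaway CSV with one tracked item (made-up values)
path = os.path.join(tempfile.mkdtemp(), "items.csv")
with open(path, "w", newline="") as f:
    csv.writer(f).writerow(["Example Headphones", "https://www.example.com/item", "25.0"])

# read all rows into memory, append a new price column, rewrite the file
with open(path, "r") as f:
    rows = list(csv.reader(f))
for row in rows:
    row.append("29.99")  # stand-in for a freshly scraped price
with open(path, "w", newline="") as f:
    csv.writer(f).writerows(rows)

with open(path) as f:
    print(f.read().strip())
```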
⚠ WARNING ⚠
If you run this script a considerable number of times you may get an error stating that the id couldn't be found. Don't worry: this happens because the server hosting the URL has blocked you 🙁. Each time you run this script a request is sent from your IP address to the target server; if the server gets several requests from you in a very short period of time, it will classify your traffic as abnormal and block your access. In this case just wait till the next day 😉.
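One way to make blocking less likely is to space the requests out instead of firing them back to back. A minimal sketch, assuming a caller-supplied fetch function (the helper name and default delays are my own, not from the tutorial):

```python
import random
import time

def polite_fetch_all(urls, fetch, min_delay=2.0, max_delay=5.0):
    # call fetch(url) for each URL, sleeping a random interval between
    # requests so the traffic looks less like an automated burst
    results = []
    for i, url in enumerate(urls):
        if i > 0:
            time.sleep(random.uniform(min_delay, max_delay))
        results.append(fetch(url))
    return results

# usage sketch: fetch could be e.g. lambda u: requests.get(u, headers=...)
```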
To prevent the script from crashing when a URL is blocked, and still check the others, we can add some try/except blocks. Check the full code below.
import requests
import bs4
import csv
import re
from os.path import exists

def tracker(filepath, item='', target=0):
    if len(item) != 0:
        # add a new item to the tracking file
        if not exists(filepath):
            f = open(filepath, 'w', encoding='UTF8')
        else:
            f = open(filepath, 'a')
        writer = csv.writer(f)
        rq = requests.get(item, headers={'User-Agent': 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2228.0 Safari/537.36', 'Accept-Language': 'en-US, en;q=0.5'})
        info = bs4.BeautifulSoup(rq.content, features='lxml')
        try:
            name = info.find(id="productTitle").get_text().strip()
            # drop the currency symbol and thousands separator so the price
            # can be compared with the numeric target
            price = float(re.split(r'(^[^\d]+)', info.find(id="price_inside_buybox").get_text().strip())[-1].replace(',', ''))
            if target > 0 and price <= target:
                print("BUY IT NOW!!!!!!!!!!!!!!!!!!!")
            writer.writerow([name, item, target, price])
        except AttributeError:
            raise Exception("Couldn't retrieve product's info")
        finally:
            f.close()
    else:
        # re-check every item already stored in the file
        fread = open(filepath, 'r')
        items = list(csv.reader(fread))
        fread.close()
        fwrite = open(filepath, 'w', newline='')
        wr = csv.writer(fwrite)
        for i in range(len(items)):
            rq = requests.get(items[i][1], headers={'User-Agent': 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2228.0 Safari/537.36', 'Accept-Language': 'en-US, en;q=0.5'})
            info = bs4.BeautifulSoup(rq.content, features='lxml')
            try:
                price = float(re.split(r'(^[^\d]+)', info.find(id="price_inside_buybox").get_text().strip())[-1].replace(',', ''))
                items[i].append(price)
                if float(items[i][2]) > 0 and price <= float(items[i][2]):
                    print("BUY IT NOW!!!!!!!!!!!!!!!!!!! " + items[i][1])
            except AttributeError:
                # a blocked or changed page: report it and keep checking the others
                print("Couldn't retrieve product's info: " + items[i][1])
        wr.writerows(items)
        fwrite.close()
tracker("items.csv")
To run the script automatically on Windows, create a .bat file containing
<path/to/python.exe> <path/to/python/price/tracker/script>
pause
and point the Task Scheduler at it.
[Figure 3. Steps to schedule a task on Windows.]
On Linux we can use cron instead. Install and enable it (the commands below are for a dnf-based distribution such as Fedora):
dnf install cronie cronie-anacron
systemctl enable crond
chkconfig crond on
Run
crontab -e
to open the file where all scheduled tasks are listed, if any. Add the line
0 10 * * * python/path price/tracker/path
which means the script will run every day at 10 a.m. To play with the scheduling time you can use https://crontab.guru/. Save and exit; you should see this output on the terminal:
crontab: installing new crontab
Finish!