First, make sure that running the
$ scrapy crawl examplespider
command works on your machine. You should also have a requirements.txt, Pipfile, or setup.py in your project so that Heroku knows which dependencies to install.
Next, install the Heroku CLI:

# For macOS:
$ brew tap heroku/brew && brew install heroku
Then cd to your project folder and run heroku login:

$ heroku login
heroku: Press any key to open up the browser to login or q to exit:
Opening browser to https://cli-auth.heroku.com/auth/cli/browser/xxxx-xxxx-xxxx-xxxx-xxxx?requestor=xxxx.xxxx.xxxx
Logging in... done
Logged in as [email protected]
Make sure your project is under version control with git init, git commit, etc. Then:

# i. To create a Heroku application:
$ heroku apps:create scrapy-example-project
# ii. Add a remote to your local repository:
$ heroku git:remote -a scrapy-example-project
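With the remote set up, deploy by pushing your committed code to Heroku (this assumes your default branch is named master; use main if that's what your repository uses):

```
$ git push heroku master
```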
You only need this step if your Scrapy project has a pipeline that stores scraped items in a PostgreSQL database. Note that Heroku Postgres's free tier is limited to 10,000 rows at the time of writing.
# settings.py
import os

# This is just an example, you might be using a different variable name
DATABASE_CONNECTION_STRING = '{drivername}://{user}:{password}@{host}:{port}/{db_name}'.format(
    drivername='postgresql',
    user=os.environ.get('PG_USERNAME', 'postgres'),
    password=os.environ.get('PG_PASSWORD'),
    host=os.environ.get('PG_HOST', 'localhost'),
    port=os.environ.get('PG_PORT', '5432'),
    db_name=os.environ.get('PG_DATABASE', 'burplist'),
)
# Or alternatively:
DATABASE_CONNECTION_STRING = 'postgres://xxxx:[email protected]:5432/xxxxxx'
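As a quick sanity check, you can parse the assembled string with Python's standard library and confirm every component lands where you expect. This is a sketch using the same defaults as the snippet above, with a made-up placeholder password for the demo:

```python
import os
from urllib.parse import urlparse

# Build the string exactly as settings.py does (placeholder password for the demo)
conn = '{drivername}://{user}:{password}@{host}:{port}/{db_name}'.format(
    drivername='postgresql',
    user=os.environ.get('PG_USERNAME', 'postgres'),
    password=os.environ.get('PG_PASSWORD', 'secret'),
    host=os.environ.get('PG_HOST', 'localhost'),
    port=os.environ.get('PG_PORT', '5432'),
    db_name=os.environ.get('PG_DATABASE', 'burplist'),
)

parts = urlparse(conn)
assert parts.scheme == 'postgresql'
assert parts.port == int(os.environ.get('PG_PORT', '5432'))
assert parts.path == '/' + os.environ.get('PG_DATABASE', 'burplist')
print('Connection string looks well-formed')
```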
If you run

$ heroku run scrapy crawl examplespider

on your local terminal, you should see it attempt to run the crawler on your Heroku server.

This section of the article shows you how you can run your crawlers/spiders periodically.
To run the
$ scrapy crawl examplespider
command periodically, simply select a time interval in Heroku Scheduler and save the job.

To run all of your spiders in one go, pipe scrapy list into scrapy crawl:

$ scrapy list | xargs -n 1 scrapy crawl

Or, against your Heroku app:

$ heroku run scrapy list | xargs -n 1 heroku run scrapy crawl
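The pipe works because scrapy list prints one spider name per line, and xargs -n 1 invokes scrapy crawl once per name. You can see the mechanics with plain echo (the spider names here are made up):

```shell
# Simulate `scrapy list` output and fan it out one name at a time
printf 'examplespider\nanotherspider\n' | xargs -n 1 echo scrapy crawl
# scrapy crawl examplespider
# scrapy crawl anotherspider
```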
Alternatively, you can use a shell script to wrap your
$ scrapy crawl examplespider
commands. For example, run bash task.sh weekly 3 to run all spiders every Wednesday:

#!/bin/bash
# Currently Heroku Scheduler only supports scheduling at every 10min/hour/day interval
# Reference: https://dashboard.heroku.com/apps/burplist/scheduler
# To run every Monday
# ./task.sh weekly 1
# To run now
# ./task.sh
if [[ "$1" == "weekly" ]]; then
echo "Frequency: <Weekly> | Day of the week: <$2>"
if [ "$(date +%u)" = "$2" ]; then
echo "Starting 🕷 to get data from the 🕸..."
scrapy list | xargs -n 1 scrapy crawl
echo "Finished running all 🕷."
fi
else
echo "Frequency: <Now>"
echo "Starting 🕷 to get data from the 🕸..."
scrapy list | xargs -n 1 scrapy crawl
echo "Finished running all 🕷."
fi
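The script keys off date +%u, which prints the ISO weekday number: 1 for Monday through 7 for Sunday. That's why ./task.sh weekly 1 fires on Mondays and weekly 3 on Wednesdays. You can check the guard on its own:

```shell
dow=$(date +%u)  # ISO weekday: 1 = Monday ... 7 = Sunday
echo "Today's ISO weekday number is $dow"
if [ "$dow" = "3" ]; then
  echo "It's Wednesday: a 'weekly 3' job would run today"
fi
```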
Done!