UPDATE 2021-06-06
A few days after publishing this I came across a bug where agent jobs would be stuck in a pending state. I've since fixed this and documented the additional changes I made at the end of the post.
Also big thanks to zblesk for their blog post on Huginn which helped me iron out some of the environment variables!
version: "3"
services:
  huginn:
    command:
      - /scripts/init
    container_name: huginn
    environment:
      - SMTP_PORT=587
      - SMTP_SERVER=<SMTP_SERVER>
      - SMTP_PASSWORD=<SMTP_PASSWORD>
      - SMTP_USER_NAME=<SMTP_USER_NAME>
      - SMTP_ENABLE_STARTTLS_AUTO=true
      - SMTP_AUTHENTICATION=plain
      - SMTP_DOMAIN=<SMTP_DOMAIN>
      - DATABASE_POOL=30
      - TIMEZONE=London
      - IMPORT_DEFAULT_SCENARIO_FOR_ALL_USERS=false
    image: huginn/huginn:latest
    ports:
      - 3000:3000/tcp
    restart: unless-stopped
    user: "1000"
    volumes:
      - mysql:/var/lib/mysql
    working_dir: /app
volumes:
  mysql:
    driver: local
Set TIMEZONE to your timezone. I found that having an incorrectly set timezone caused Huginn jobs to be backed up in a pending state. I tried Europe/London first, but that caused the process to crash on boot, so I ultimately got it working with just London.
{
  "apps": {
    "http": {
      "servers": {
        "srv0": {
          "listen": [
            ":443"
          ],
          "routes": [
            {
              "handle": [
                {
                  "handler": "subroute",
                  "routes": [
                    {
                      "handle": [
                        {
                          "handler": "reverse_proxy",
                          "upstreams": [
                            {
                              "dial": "192.168.2.15:3000"
                            }
                          ]
                        }
                      ]
                    }
                  ]
                }
              ],
              "match": [
                {
                  "host": [
                    "huginn.joannet.casa"
                  ]
                }
              ],
              "terminal": true
            },
            // ... others removed for brevity
          ],
          "tls_connection_policies": [
            {
              "match": {
                "sni": [
                  "huginn.joannet.casa",
                  // ... others removed for brevity
                ]
              }
            },
            {}
          ]
        }
      }
    },
    "tls": {
      "automation": {
        "policies": [
          {
            "issuer": {
              "challenges": {
                "dns": {
                  "provider": {
                    "api_token": "CLOUDFLARE_API_TOKEN",
                    "name": "cloudflare"
                  }
                }
              },
              "module": "acme"
            },
            "subjects": [
              "huginn.joannet.casa",
              // ... others removed for brevity
            ]
          }
        ]
      }
    }
  }
}
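Each entry in the routes list above has the same shape: a host matcher plus a subroute that hands the request to reverse_proxy. As a rough sketch, assuming you generate the JSON with a script rather than editing it by hand, the entry could be built like this. The function name build_route is my own and not part of Caddy.

```python
import json


def build_route(host: str, upstream: str) -> dict:
    """Build one route entry shaped like the srv0 route in the config above.

    `host` is the hostname to match and `upstream` is the "ip:port"
    address the request is proxied to. Hypothetical helper for illustration.
    """
    return {
        "handle": [
            {
                "handler": "subroute",
                "routes": [
                    {
                        "handle": [
                            {
                                "handler": "reverse_proxy",
                                "upstreams": [{"dial": upstream}],
                            }
                        ]
                    }
                ],
            }
        ],
        "match": [{"host": [host]}],
        "terminal": True,
    }


# Print the route for the Huginn host used in this post.
print(json.dumps(build_route("huginn.joannet.casa", "192.168.2.15:3000"), indent=2))
```

The hostname still has to be added to tls_connection_policies and to the subjects list of the TLS automation policy separately, as in the full config.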
I then pointed https://huginn.joannet.casa to the IP address of my Caddy server.
Website Agent ------ invokes ------> Email Agent
try-huginn.
If you set IMPORT_DEFAULT_SCENARIO_FOR_ALL_USERS=false then you should not see any there.
NHSScrape
<li> node, which returns 76 results on the page we’re scraping. These are represented by the boxes that are highlighted yellow.
<li> nodes I’m not interested in, so that I only have 1 result left. As I’ve already made my first selection, any subsequent clicks will now filter those out. So now it’s just a case of playing whack-a-mole until all yellow highlighted fields are gone.
<li> elements in this range.
<li> properties also make up headings at the top of the page - so these appear highlighted in yellow too.
<li> range which needs to be filtered.
//ul[(((count(preceding-sibling::*) + 1) = 6) and parent::*)]//li[(((count(preceding-sibling::*) + 1) = 1) and parent::*)]
N.B. since writing this it looks like the xpath changed - I’ve updated the config below with the same. I retrieved it using the same method as described above.
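The generated XPath is hard to read, so here is a small sketch of what it selects. The test (count(preceding-sibling::*) + 1) = N is true for an element that is the N-th element child of its parent, so the expression picks every <li> that is a first child somewhere inside a <ul> that is the N-th child of its parent. Python's built-in ElementTree does not support count() or preceding-sibling, so this walks the tree by hand. The sample markup and the function names are made up for illustration; the real page is much larger and not well-formed XML.

```python
import xml.etree.ElementTree as ET


def nth_children(root, tag, n):
    """Yield each <tag> element that is the n-th element child of its parent."""
    for parent in root.iter():
        children = list(parent)
        if len(children) >= n and children[n - 1].tag == tag:
            yield children[n - 1]


def select_titles(root, ul_position):
    """Mimic //ul[position n]//li[position 1] with normalize-space(.) as value."""
    seen = set()
    titles = []
    for ul in nth_children(root, "ul", ul_position):
        for li in nth_children(ul, "li", 1):
            if id(li) in seen:  # XPath node-sets never contain duplicates
                continue
            seen.add(id(li))
            # normalize-space(.): trim and collapse runs of whitespace
            titles.append(" ".join("".join(li.itertext()).split()))
    return titles


# Made-up sample: the <ul> is the 4th child of <div>.
SAMPLE = """
<div>
  <p>a</p><p>b</p><p>c</p>
  <ul><li>  first
      item </li><li>second item</li></ul>
</div>
"""

print(select_titles(ET.fromstring(SAMPLE), 4))
```

This also shows why the expression is fragile: if the page gains or loses a sibling before the list, the position changes and the XPath has to be regenerated.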
As a future task it’d be great to be alerted when a job’s working status fails, but that feature seems to be missing.
{
  "expected_update_period_in_days": "2",
  "url": "https://www.nhs.uk/conditions/coronavirus-covid-19/coronavirus-vaccination/coronavirus-vaccine/",
  "type": "html",
  "mode": "on_change",
  "extract": {
    "title": {
      "xpath": "//ul[(((count(preceding-sibling::*) + 1) = 4) and parent::*)]//li[(((count(preceding-sibling::*) + 1) = 1) and parent::*)]",
      "value": "normalize-space(.)"
    }
  }
}
{
  "subject": "NHS Coronavirus Page update",
  "headline": "Vaccine age updated",
  "expected_receive_period_in_days": "2"
}
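The "mode": "on_change" setting in the Website Agent config decides whether the Email Agent gets invoked at all. Below is a simplified model of the difference between on_change and all. It is not Huginn's actual implementation, and the function name should_emit and the in-memory set are assumptions for the sketch.

```python
import json


def should_emit(extracted: dict, previous: set, mode: str = "on_change") -> bool:
    """Decide whether a scrape result is passed on to the receivers.

    `previous` holds serialised payloads that were already emitted.
    """
    if mode == "all":
        # Every scrape is forwarded, changed or not.
        return True
    if mode == "on_change":
        key = json.dumps(extracted, sort_keys=True)
        if key in previous:
            return False  # same payload as before: stay quiet
        previous.add(key)
        return True
    raise ValueError(f"unsupported mode in this sketch: {mode}")


seen = set()
print(should_emit({"title": "same text"}, seen))              # first scrape
print(should_emit({"title": "same text"}, seen))              # unchanged page
print(should_emit({"title": "same text"}, seen, mode="all"))  # always forwarded
```

Switching to all is handy while testing the email side, because every run of the Website Agent then produces an email.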
SMTP_DOMAIN=gmail.com
SMTP_SERVER=smtp.gmail.com
SMTP_PORT=587
SMTP_AUTHENTICATION=plain
SMTP_USER_NAME=$EMAIL_ADDRESS
SMTP_PASSWORD=$APP_PASSWORD
SMTP_ENABLE_STARTTLS_AUTO=true
DATABASE_POOL=30
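Before handing these values to Huginn it can save time to check that the credentials work at all. This is a minimal sketch using Python's standard smtplib, assuming the same variable names as above are available as a mapping. The names SmtpSettings, load_smtp_settings and check_login are mine, not Huginn's.

```python
import smtplib
from dataclasses import dataclass


@dataclass
class SmtpSettings:
    server: str
    port: int
    user_name: str
    password: str
    starttls: bool


def load_smtp_settings(env) -> SmtpSettings:
    """Read the SMTP_* variables from a mapping such as os.environ."""
    return SmtpSettings(
        server=env["SMTP_SERVER"],
        port=int(env["SMTP_PORT"]),
        user_name=env["SMTP_USER_NAME"],
        password=env["SMTP_PASSWORD"],
        starttls=env.get("SMTP_ENABLE_STARTTLS_AUTO", "false").lower() == "true",
    )


def check_login(settings: SmtpSettings, timeout: float = 10.0) -> None:
    """Connect, upgrade to TLS if requested, and log in.

    Raises an smtplib.SMTPException (or OSError) if anything fails.
    """
    with smtplib.SMTP(settings.server, settings.port, timeout=timeout) as client:
        client.ehlo()
        if settings.starttls:
            client.starttls()
            client.ehlo()
        client.login(settings.user_name, settings.password)


# Usage, with the variables exported in your shell:
#   import os
#   check_login(load_smtp_settings(os.environ))
```

If check_login raises an authentication error with Gmail, the usual cause is using the account password instead of an app password.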
The mode is set to on_change, which is the desired end mode, and will only trigger its receivers if there has been a change in the property being selected. If we change the mode to all then it will always invoke the receivers.
services:
  huginn:
    command:
      - /scripts/init
    container_name: huginn_huginn
    environment:
      - SMTP_PORT=587
      - SMTP_SERVER=<SMTP_SERVER>
      - SMTP_PASSWORD=<SMTP_PASSWORD>
      - SMTP_USER_NAME=<SMTP_USER_NAME>
      - SMTP_ENABLE_STARTTLS_AUTO=true
      - SMTP_AUTHENTICATION=plain
      - SMTP_DOMAIN=<SMTP_DOMAIN>
      - TIMEZONE=London
      - DATABASE_POOL=30
      - DATABASE_NAME=huginn
      - DATABASE_USERNAME=huginn
      - DATABASE_PASSWORD=<MYSQL_PASSWORD>
      - DATABASE_HOST=huginn_mysql
      - DATABASE_PORT=3306
      - START_MYSQL=false
      - DATABASE_ENCODING=utf8mb4
      - IMPORT_DEFAULT_SCENARIO_FOR_ALL_USERS=false
      - DOMAIN=huginn.joannet.casa
      - INVITATION_CODE=<INVITATION_CODE>
    image: huginn/huginn:latest
    ports:
      - 3000:3000/tcp
    restart: unless-stopped
    user: "1000"
    working_dir: /app
    depends_on:
      - mysql
  mysql:
    image: mysql
    container_name: huginn_mysql
    restart: always
    ports:
      - "3306:3306"
    environment:
      - MYSQL_ROOT_PASSWORD=<MYSQL_ROOT_PASSWORD>
      - MYSQL_DATABASE=huginn
      - MYSQL_USER=huginn
      - MYSQL_PASSWORD=<MYSQL_PASSWORD>
    volumes:
      - mysql:/var/lib/mysql
volumes:
  mysql:
    driver: local
The mysql container is pretty standard so I won't cover that here. There are some env vars I had to add to the huginn container:
DATABASE_HOST - the database hostname to connect to; we can use the container name here
DATABASE_PORT - the port to connect to the database on
DATABASE_NAME - the name of the database to use
DATABASE_USERNAME - who we should connect to the database as
DATABASE_PASSWORD - authentication for the user
DATABASE_ENCODING - a requirement when using a newer version of MySQL, as defined in the documentation
START_MYSQL - whether to use a local mysql daemon or not
IMPORT_DEFAULT_SCENARIO_FOR_ALL_USERS - I don't care about the agents that are added by default
INVITATION_CODE - lock down Huginn by requiring this code for new user sign-ups
DOMAIN - the endpoint that Huginn is available at
I also added depends_on on the database container to assist with orchestration. On first boot, however, it takes some time for mysql to initialise the database, so Huginn may fail as the database is not yet ready to accept connections. Once the initialisation is done, restart the Huginn container and it should be able to bootstrap the database fine.
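Rather than guessing when the database is ready, you can poll the MySQL port and restart Huginn once it answers. This is a rough sketch using only the standard library. The function name wait_for_port is my own, and an open port is only a hint that MySQL has finished initialising, not a guarantee.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 60.0, interval: float = 1.0) -> bool:
    """Return True once a TCP connection to host:port succeeds.

    Retries every `interval` seconds and returns False after `timeout`
    seconds without a successful connection.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)


# Usage from the Docker host, where the compose file publishes 3306:
#   if wait_for_port("127.0.0.1", 3306, timeout=120):
#       print("mysql is accepting connections, restart the huginn container now")
```

From inside the compose network the host would be huginn_mysql instead of 127.0.0.1.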