4. Deploy spiders

Author: Luis Rosenstrauch

This is normally for internal use only.

Scrapyd is a daemon that can be started to schedule and run spider crawls.

https://doc.scrapy.org/en/latest/index.html

http://scrapyd.readthedocs.io/en/latest/

Configure your live instance hostname in scrapy.cfg. Once you have tested everything locally, you can deploy to the live Scrapyd instance and schedule crawls using scrapyd-client.
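The deploy target lives in a [deploy:...] section of scrapy.cfg. A minimal sketch, assuming the scrapyd.hoax.ly hostname and the placeholder htaccess credentials used in the examples below:

[deploy:live]
url = https://scrapyd.hoax.ly/
project = Hoaxlyspiders
username = htaccessusername
password = htaccesspassword

With that target in place, deploy from the cli container: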

docker exec -ti cli bash

scrapyd-deploy live

Once deployed, you can interact directly with Scrapyd through its web API, either using the client:

docker exec -ti cli bash

scrapyd-client -t https://htaccessusername:htaccesspassword@scrapyd.hoax.ly/ schedule -p Hoaxlyspiders climatefeedback.org

or from anywhere else, for example with curl:

curl https://htaccessusername:htaccesspassword@scrapyd.hoax.ly/schedule.json -d project=Hoaxlyspiders -d spider=climatefeedback.org

curl https://htaccessusername:htaccesspassword@scrapyd.hoax.ly/listprojects.json

curl https://htaccessusername:htaccesspassword@scrapyd.hoax.ly/listspiders.json?project=Hoaxlyspiders
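The same endpoints can be called from any HTTP client. As a sketch of the same calls from Python (assuming the requests library and the same placeholder credentials, project, and spider names as above):

import requests

BASE = "https://scrapyd.hoax.ly"
AUTH = ("htaccessusername", "htaccesspassword")  # placeholder htaccess credentials

# Schedule a crawl of the climatefeedback.org spider in the Hoaxlyspiders project.
job = requests.post(f"{BASE}/schedule.json", auth=AUTH,
                    data={"project": "Hoaxlyspiders", "spider": "climatefeedback.org"})
print(job.json())  # on success the response includes a jobid

# List the spiders deployed under the project.
spiders = requests.get(f"{BASE}/listspiders.json", auth=AUTH,
                       params={"project": "Hoaxlyspiders"})
print(spiders.json())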

A crawl can be scheduled to run regularly by deploying it to a dedicated server.
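Scrapyd itself does not re-run jobs on a recurring schedule, so one common approach is a cron entry on that server that calls schedule.json periodically; a sketch assuming the same placeholder credentials and spider as above, running nightly at 03:00:

0 3 * * * curl -s https://htaccessusername:htaccesspassword@scrapyd.hoax.ly/schedule.json -d project=Hoaxlyspiders -d spider=climatefeedback.org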

For Portia spiders, deployment should work the same way, but it currently requires a workaround in our settings.
