4. Deploy spiders

Author: Luis Rosenstrauch

This is normally for internal use only.

Scrapyd is a daemon that schedules and runs spider crawls.



Configure your live instance hostname in scrapy.cfg. Once you have tested everything locally, you can deploy to the live Scrapyd instance and schedule crawls using scrapyd-client:
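A scrapy.cfg deploy target for the live instance might look like the sketch below; the settings module name is an assumption, and the URL should match your actual Scrapyd host:

```ini
[settings]
# assumed settings module for the Hoaxlyspiders project
default = Hoaxlyspiders.settings

[deploy:live]
# Scrapyd instance that `scrapyd-deploy live` will push to
url = https://scrapyd.hoax.ly/
project = Hoaxlyspiders
```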

docker exec -ti cli bash

scrapyd-deploy live

Once deployed, you can interact with Scrapyd directly through its web API, either using the client:

docker exec -ti cli bash

scrapyd-client -t https://htaccessusername:[email protected]/ schedule -p Hoaxlyspiders climatefeedback.org

or from anywhere else, for example with curl:

curl https://htaccessusername:[email protected]/schedule.json -d project=Hoaxlyspiders -d spider=climatefeedback.org

curl https://htaccessusername:[email protected]/listprojects.json

curl https://htaccessusername:[email protected]/listspiders.json?project=Hoaxlyspiders
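The same API calls can also be scripted. The sketch below builds a request against Scrapyd's schedule.json endpoint with Python's standard library; the base URL is the instance above minus credentials, and actually sending the request would additionally require network access and HTTP basic auth:

```python
from urllib.parse import urlencode
from urllib.request import Request

# Base URL of the Scrapyd instance (credentials omitted here)
BASE = "https://scrapyd.hoax.ly"

def schedule_request(project: str, spider: str) -> Request:
    """Build a POST request for Scrapyd's schedule.json endpoint."""
    body = urlencode({"project": project, "spider": spider}).encode()
    return Request(f"{BASE}/schedule.json", data=body, method="POST")

req = schedule_request("Hoaxlyspiders", "climatefeedback.org")
print(req.full_url)       # the schedule.json endpoint URL
print(req.data.decode())  # form-encoded project and spider parameters
# To actually fire it: urllib.request.urlopen(req), adding auth headers.
```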

A crawl can be scheduled to run regularly by deploying the project to a dedicated server and triggering it on a schedule, for example via cron.
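One way to set this up is a crontab entry on the server that hits the schedule endpoint; the interval below is an arbitrary example:

```cron
# Schedule the climatefeedback.org spider every night at 03:00 (example interval)
0 3 * * * curl https://htaccessusername:[email protected]/schedule.json -d project=Hoaxlyspiders -d spider=climatefeedback.org
```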

For Portia spiders, deployment should work the same way, but it currently requires a workaround in our settings.