4. Deploy spiders

Author: Luis Rosenstrauch

This is normally for internal use only.

Scrapyd is a daemon that can be started to schedule and run spider crawls.

https://doc.scrapy.org/en/latest/index.html

http://scrapyd.readthedocs.io/en/latest/

Configure your live instance hostname in scrapy.cfg. Once you have tested everything locally, you can deploy to the live Scrapyd instance and schedule crawls using scrapyd-client.
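The deploy target lives in a [deploy:...] section of scrapy.cfg. A minimal sketch, assuming the scrapyd.hoax.ly hostname and the placeholder htaccess credentials used in the examples below:

[deploy:live]
url = https://scrapyd.hoax.ly/
project = Hoaxlyspiders
username = htaccessusername
password = htaccesspassword

With that target in place, deploy from the cli container: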

docker exec -ti cli bash

scrapyd-deploy live

Once deployed, you can interact directly with Scrapyd through its web API, either using the client:

docker exec -ti cli bash

scrapyd-client -t https://htaccessusername:htaccesspassword@scrapyd.hoax.ly/ schedule -p Hoaxlyspiders climatefeedback.org

or from anywhere else, for example with curl:

curl https://htaccessusername:htaccesspassword@scrapyd.hoax.ly/schedule.json -d project=Hoaxlyspiders -d spider=climatefeedback.org

curl https://htaccessusername:htaccesspassword@scrapyd.hoax.ly/listprojects.json

curl https://htaccessusername:htaccesspassword@scrapyd.hoax.ly/listspiders.json?project=Hoaxlyspiders
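The same endpoints can be called from any HTTP client. As a sketch of the same calls from Python (assuming the requests library and the same placeholder credentials, project, and spider names as above):

import requests

BASE = "https://scrapyd.hoax.ly"
AUTH = ("htaccessusername", "htaccesspassword")  # placeholder htaccess credentials

# Schedule a crawl of the climatefeedback.org spider in the Hoaxlyspiders project.
job = requests.post(f"{BASE}/schedule.json", auth=AUTH,
                    data={"project": "Hoaxlyspiders", "spider": "climatefeedback.org"})
print(job.json())  # on success the response includes a jobid

# List the spiders deployed under the project.
spiders = requests.get(f"{BASE}/listspiders.json", auth=AUTH,
                       params={"project": "Hoaxlyspiders"})
print(spiders.json())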

A crawl can be scheduled to run regularly by deploying it to a dedicated server.
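Scrapyd itself does not re-run jobs on a recurring schedule, so one common approach is a cron entry on that server that calls schedule.json periodically; a sketch assuming the same placeholder credentials and spider as above, running nightly at 03:00:

0 3 * * * curl -s https://htaccessusername:htaccesspassword@scrapyd.hoax.ly/schedule.json -d project=Hoaxlyspiders -d spider=climatefeedback.org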

For Portia spiders, deployment should work the same way, but it currently requires a workaround in our settings.
