4. Deploy spiders
Author: Luis Rosenstrauch
This is normally for internal use only.
Scrapyd is a daemon that runs on a server and schedules spider crawls; see:
https://doc.scrapy.org/en/latest/index.html
http://scrapyd.readthedocs.io/en/latest/
Configure your live instance hostname in scrapy.cfg. Once you have tested everything locally, you can deploy to the live Scrapyd instance and schedule crawls using scrapyd-client.
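For example, the deploy target in scrapy.cfg might look like this (a minimal sketch; the url, project name, and credentials are assumptions based on the hostnames used below):

[deploy:live]
url = https://scrapyd.hoax.ly/
project = Hoaxlyspiders
username = htaccessusername
password = htaccesspassword

Then, from inside the cli container: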
docker exec -ti cli bash
scrapyd-deploy live
Once deployed, you can interact with Scrapyd directly through its web API, either using the client:
docker exec -ti cli bash
scrapyd-client -t https://htaccessusername:htaccesspassword@scrapyd.hoax.ly/ schedule -p Hoaxlyspiders climatefeedback.org
or with curl from anywhere else:
curl https://htaccessusername:htaccesspassword@scrapyd.hoax.ly/schedule.json -d project=Hoaxlyspiders -d spider=climatefeedback.org
curl https://htaccessusername:htaccesspassword@scrapyd.hoax.ly/listprojects.json
curl "https://htaccessusername:htaccesspassword@scrapyd.hoax.ly/listspiders.json?project=Hoaxlyspiders"
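The same web API can also be used to monitor and cancel runs. For example (listjobs.json and cancel.json are standard Scrapyd endpoints; the job id is a placeholder taken from the listjobs output):

curl "https://htaccessusername:htaccesspassword@scrapyd.hoax.ly/listjobs.json?project=Hoaxlyspiders"
curl https://htaccessusername:htaccesspassword@scrapyd.hoax.ly/cancel.json -d project=Hoaxlyspiders -d job=<jobid>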
A crawl can be scheduled to run regularly by deploying the project to a dedicated server and triggering the schedule endpoint on a timer, as in the sketch below.
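For example, a cron entry on that server could trigger the crawl once a day at 03:00 (a sketch; the schedule and spider choice are assumptions):

0 3 * * * curl -s https://htaccessusername:htaccesspassword@scrapyd.hoax.ly/schedule.json -d project=Hoaxlyspiders -d spider=climatefeedback.org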
For Portia spiders, deployment should work the same way, but it currently requires a workaround in our settings.