hoax.ly documentation

4. Deploy spiders

Author: Luis Rosenstrauch


Last updated 6 years ago

Note: this step is normally for internal use only.

Scrapyd is a daemon that schedules and runs Scrapy crawls.

Configure your live instance hostname in scrapy.cfg. Once you have tested everything locally, you can deploy to the live Scrapyd instance and schedule crawls using scrapyd-client:
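For reference, a scrapyd-deploy target in scrapy.cfg looks roughly like this (the url, username, and password values are placeholders; substitute the actual htaccess credentials for the live instance):

```ini
; Deploy target read by `scrapyd-deploy live`
[deploy:live]
url = https://scrapyd.hoax.ly/
project = Hoaxlyspiders
username = htaccessusername
password = htaccesspassword
```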

docker exec -ti cli bash

scrapyd-deploy live

Once deployed, you can interact with Scrapyd directly through its web API, either using the client:

docker exec -ti cli bash

scrapyd-client -t schedule -p Hoaxlyspiders climatefeedback.org

or from anywhere else via curl:

curl https://htaccessusername:htaccesspassword@scrapyd.hoax.ly/schedule.json -d project=Hoaxlyspiders -d spider=climatefeedback.org
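The same schedule.json call can be made from any language. A minimal Python sketch using only the standard library (the function name and bare base URL are illustrative; the live instance additionally requires the htaccess credentials shown elsewhere on this page):

```python
import urllib.parse
import urllib.request

# Base URL of the live Scrapyd instance (placeholder; htaccess credentials
# would need to be supplied, e.g. via an auth handler or in the URL).
BASE_URL = "https://scrapyd.hoax.ly"

def schedule_crawl(project: str, spider: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build a POST request for Scrapyd's schedule.json endpoint."""
    data = urllib.parse.urlencode({"project": project, "spider": spider}).encode()
    return urllib.request.Request(f"{base_url}/schedule.json", data=data, method="POST")

req = schedule_crawl("Hoaxlyspiders", "climatefeedback.org")
# urllib.request.urlopen(req) would then submit the job to Scrapyd.
```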

A crawl can be scheduled to run regularly by deploying it to a dedicated server.
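On such a server, one way to do this is a crontab entry that hits schedule.json on a fixed schedule. A hypothetical example (the timing, spider, and credentials are illustrative):

```cron
# Schedule the climatefeedback.org spider every day at 03:00
0 3 * * * curl -s -d project=Hoaxlyspiders -d spider=climatefeedback.org https://htaccessusername:htaccesspassword@scrapyd.hoax.ly/schedule.json
```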

For Portia spiders, deployment should work the same way, but it currently requires a workaround in our settings.

References and live endpoints:
  • Scrapy documentation: https://doc.scrapy.org/en/latest/index.html
  • Scrapyd documentation: http://scrapyd.readthedocs.io/en/latest/
  • Scrapyd web UI: https://htaccessusername:htaccesspassword@scrapyd.hoax.ly/
  • Schedule a crawl: https://htaccessusername:htaccesspassword@scrapyd.hoax.ly/schedule.json
  • List projects: https://htaccessusername:htaccesspassword@scrapyd.hoax.ly/listprojects.json
  • List spiders: https://htaccessusername:htaccesspassword@scrapyd.hoax.ly/listspiders.json?project=Hoaxlyspiders