hoax.ly documentation

1. Setup environment

Author: Luis Rosenstrauch


Last updated 6 years ago

Requirements:

docker-2.3.0
docker-compose-1.13.0
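A quick way to check the installed tools against these requirements is sketched below. It assumes the listed versions are minimums (the page does not say so explicitly) and that a `sort` supporting `-V` (GNU coreutils) is available. The helper names `version_ge` and `check_tool` are hypothetical and not part of the hoax.ly tooling.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; the version numbers are treated as minimums (assumption).

# Succeeds if version $1 is greater than or equal to version $2.
version_ge() {
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Reports whether a tool is installed and meets the minimum version.
check_tool() {
  local tool="$1" minimum="$2" found
  if ! command -v "$tool" >/dev/null 2>&1; then
    echo "$tool: not installed"
    return 1
  fi
  # Take the first dotted number from the tool's --version output.
  found="$("$tool" --version | grep -oE '[0-9]+(\.[0-9]+)+' | head -n 1)"
  if version_ge "$found" "$minimum"; then
    echo "$tool: $found (ok, need >= $minimum)"
  else
    echo "$tool: $found (too old, need >= $minimum)"
    return 1
  fi
}

check_tool docker 2.3.0 || true
check_tool docker-compose 1.13.0 || true
```

The trailing `|| true` keeps the script going so that both tools are reported even when the first check fails.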

Note: run the following command on your host machine. Elasticsearch needs this kernel setting in order to start.

sudo sysctl -w vm.max_map_count=262144
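To see whether the setting is already high enough before changing it, something like the following may help. It is a minimal sketch that assumes a Linux host exposing `/proc/sys/vm/max_map_count`; the helper name `map_count_ok` is hypothetical.

```shell
#!/usr/bin/env bash
# Value taken from the sysctl command above.
required=262144

# Succeeds when the given value meets the required minimum.
map_count_ok() {
  [ "$1" -ge "$required" ]
}

# Read the live value; fall back to 0 where /proc is unavailable (e.g. macOS).
current="$(cat /proc/sys/vm/max_map_count 2>/dev/null || echo 0)"

if map_count_ok "$current"; then
  echo "vm.max_map_count=$current is sufficient"
else
  echo "vm.max_map_count=$current is too low; run: sudo sysctl -w vm.max_map_count=$required"
fi
```

Note that `sysctl -w` only changes the running kernel, so the value is lost on reboot. On most Linux distributions, adding the line `vm.max_map_count=262144` to `/etc/sysctl.conf` makes it persistent.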

Step 1: Get the code and fetch the images

Start by getting your copy of the spiderbreeder:

$ git clone git@git.acolono.net:hoaxly/hoaxly-scraping-container.git

$ cd hoaxly-scraping-container

Log in to our registry (using your GitLab credentials) to pull the images you need to build and run spiders locally.

docker login registry.acolono.net:444

docker pull registry.acolono.net:444/hoaxly/hoaxly-storage-container

docker pull registry.acolono.net:444/hoaxly/hoaxly-scrapydaemon-container

docker pull registry.acolono.net:444/hoaxly/hoaxly-scraping-container
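If you pull the images regularly, the three pulls can be wrapped in a small loop. This is only a convenience sketch: the registry host and image names are taken from the commands above, while the function names `image_ref` and `pull_all` are hypothetical.

```shell
#!/usr/bin/env bash
# Registry and image names as listed in this step.
registry="registry.acolono.net:444"
images="hoaxly-storage-container hoaxly-scrapydaemon-container hoaxly-scraping-container"

# Builds the full image reference for one image name.
image_ref() {
  printf '%s/hoaxly/%s\n' "$registry" "$1"
}

# Pulls every image in the list and stops at the first failure.
pull_all() {
  local name
  for name in $images; do
    docker pull "$(image_ref "$name")" || return 1
  done
}
```

After `docker login registry.acolono.net:444` has succeeded, calling `pull_all` fetches all three images in turn.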

Step 2: Spin up the local instances and initialize them.

In your project's root folder, run:

fin init

Open your preferred browser and go to:

http://portia.hoaxly.docksal
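The containers may take a moment to come up after initialization. As a rough sketch, the page can be polled from the command line before opening the browser. This assumes `curl` is installed; the function name `wait_for_url` and the retry values are hypothetical.

```shell
#!/usr/bin/env bash
# wait_for_url URL [ATTEMPTS] [DELAY_SECONDS]
# Polls the URL until it answers with a successful HTTP status.
wait_for_url() {
  local url="$1" attempts="${2:-30}" delay="${3:-2}" i
  for i in $(seq 1 "$attempts"); do
    if curl --silent --fail --output /dev/null --max-time 5 "$url"; then
      echo "up: $url"
      return 0
    fi
    sleep "$delay"
  done
  echo "not reachable after $attempts attempts: $url" >&2
  return 1
}
```

For example, `wait_for_url http://portia.hoaxly.docksal` retries for about a minute with the default values and returns a non-zero status if the page never responds.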