hoax.ly documentation

Polite scraping

This documentation is currently only available in English.

Last updated 6 years ago

"The first rule of web crawling is you do not harm the website. The second rule of web crawling is you do NOT harm the website. We’re supporters of the democratization of web data, but not at the expense of the website’s owners."

  • A polite crawler respects robots.txt.
  • A polite crawler never degrades a website’s performance.
  • A polite crawler identifies its creator with contact information.
  • A polite crawler is not a pain in the buttocks of system administrators.

We are fully committed to the polite scraping guidelines defined by scrapinghub (although our spiders don't run on scrapinghub).

Tell us if you have an API we can use instead of scraping your site! If no API is available, you can help us by providing structured content such as schema.org ClaimReview markup.
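
As an illustration of the rules above, here is a minimal sketch of a politeness check built only on Python's standard library. It is not the code our spiders run. The user agent string, the `polite_plan` helper, and the 2-second default delay are assumptions made for this example; only the contact address comes from this page.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical user agent. It names the bot and gives a contact address
# so that site administrators know whom to reach.
USER_AGENT = "hoaxly-example-bot (+mailto:bot@hoax.ly)"


def polite_plan(robots_txt: str, url: str, default_delay: float = 2.0) -> dict:
    """Decide whether `url` may be fetched and how long to wait between requests.

    `robots_txt` is the text of the site's robots.txt file, fetched beforehand.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())

    # Rule 1: respect robots.txt.
    allowed = parser.can_fetch(USER_AGENT, url)

    # Rule 2: never degrade performance. Wait at least `default_delay`
    # seconds, or longer if the site asks for it via Crawl-delay.
    requested_delay = parser.crawl_delay(USER_AGENT) or 0
    delay = max(default_delay, requested_delay)

    # Rule 3: identify the crawler and its creator on every request.
    return {"allowed": allowed, "delay": delay, "headers": {"User-Agent": USER_AGENT}}


if __name__ == "__main__":
    sample = "User-agent: *\nDisallow: /private/\nCrawl-delay: 5\n"
    print(polite_plan(sample, "https://example.org/fact-checks/1"))
    print(polite_plan(sample, "https://example.org/private/page"))
```

A crawler would call such a helper before every request, skip the URL when `allowed` is false, and sleep for `delay` seconds between requests to the same host.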
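
For publishers who want to offer structured content, the following sketch shows one way to generate a schema.org ClaimReview snippet as JSON-LD. The helper name `claim_review_jsonld` and all sample values are hypothetical. The property names are taken from schema.org, but which of them a given consumer reads is not specified here.

```python
import json

SCRIPT_OPEN = '<script type="application/ld+json">'
SCRIPT_CLOSE = "</script>"


def claim_review_jsonld(url, claim, publisher, rating_label, date_published):
    """Return an HTML script tag containing ClaimReview markup for one fact check."""
    data = {
        "@context": "https://schema.org",
        "@type": "ClaimReview",
        "url": url,                        # link to the original review
        "datePublished": date_published,   # ISO 8601 date
        "claimReviewed": claim,            # the claim that was checked
        "author": {"@type": "Organization", "name": publisher},
        "reviewRating": {"@type": "Rating", "alternateName": rating_label},
    }
    return SCRIPT_OPEN + json.dumps(data, indent=2) + SCRIPT_CLOSE


if __name__ == "__main__":
    # Placeholder values for illustration only.
    print(claim_review_jsonld(
        "https://example.org/fact-checks/1",
        "Example claim text",
        "Example Fact Check",
        "False",
        "2018-01-01",
    ))
```

The resulting tag can be placed in the HTML of the review page, so that the review's URL and the publishing organization can be read without parsing the page layout.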

What data do we scrape?

We index only metadata and link to the original source whenever possible. Our tools aim to bring your content more visitors and readers. You can see this in action when you use the chatbot or the Browser Extension. Listed reviews are displayed only as linked headings, with your organization name included.

Is your site being scraped by us and you have a complaint? Write us an email at bot@hoax.ly.