# 2. Create spider

## Create a new branch

![](/files/-LGRsEgG7mInKPJqLAI0)

Repository: <https://github.com/hoaxly/hoaxly-scraping-container>

## After setting up the environment, visit <http://hoaxly.docksal:9001/#/projects/hoaxlyPortia>

## Enter the URL you want to scrape

![](/files/-LGRsK4HpFy_70ImulJd)

## Using the Portia interface, visit the page where you want to start crawling through links

![Example start url](/files/-LGRs_OTUXolSTDkmGW-)

## Create a new spider

![](/files/-LGRsgChVKkFIvph5WVw)

## Follow a link to a sample item you want to scrape

![Sample item link](/files/-LGRslG4Z8m1jm5MdP1p)

![Sample item page](/files/-LGRsu5Q7llSB63FpAzH)

## Create a new sample annotation

![](/files/-LGRtI4SH_RP8irAV2wP)

## Select the appropriate schema (hoaxly)

TODO: screenshot of new schema

![](/files/-LGRtN6559OS8nUjK6zX)

![](/files/-LGRtQEzSVEASGVG5dhm)

## Annotate the first element by clicking on the visible project headline

## Select the appropriate field from schema

![](/files/-LGRu4Pswb-zXCeTsz5S)

## Repeat for all fields in the schema

![](/files/-LGRtiE45AfP8KRW1sjy)

![](/files/-LGRtxo8sWINUIVvneTU)

## Close sample

![](/files/-LGRuEHsxw1k4t-qhuJq)

## Configure the URL crawling schema

![](/files/-LGRuLyQOc5aTBJr_sff)

Using regex:

![](/files/-LGRuPALp8W9UqecQPWc)
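Before saving a follow pattern, it can help to check it against a few URLs from the target site. The sketch below uses Python's `re` module for that. The pattern, the `example.com` URLs, and the `should_follow` helper are all hypothetical and exist only for illustration. Portia's own matching may behave differently, so treat this as a quick sanity check rather than a description of how Portia evaluates patterns.

```python
import re

# Hypothetical follow pattern: item pages under /fact-check/ with a slug.
# Replace it with the pattern you plan to enter in Portia.
FOLLOW_PATTERN = re.compile(r"^https?://example\.com/fact-check/[\w-]+/?$")

# Hypothetical sample URLs, e.g. links copied from the start page.
sample_urls = [
    "https://example.com/fact-check/some-claim",
    "https://example.com/about",
    "https://example.com/fact-check/",
]


def should_follow(url: str) -> bool:
    """Return True if the URL matches the follow pattern."""
    return FOLLOW_PATTERN.search(url) is not None


# Keep only the URLs the pattern would let the crawler follow.
matches = [url for url in sample_urls if should_follow(url)]
print(matches)
```

If a URL you expect to be crawled is missing from the printed list, the pattern is probably too strict. If index or navigation pages show up, it is probably too loose.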

## Export the spider as a Scrapy spider (Python code)

## Add the new spider to the scrapy\_projects directory and commit the new spider

![](/files/-LGRuZuxzV3Pm3UEZgiZ)

`% git add scrapy_projects/hoaxlyPortia/spiders/ -p`

`% git commit scrapy_projects/hoaxlyPortia/spiders/`

Use a commit message that tells us which spider you are adding and which schema it uses.

## Create a merge request

