2. Create spider

Author: Luis Rosenstrauch

Create a new branch

Repository: https://github.com/hoaxly/hoaxly-scraping-container

After setting up the environment, visit http://hoaxly.docksal:9001/#/projects/hoaxlyPortia

Enter the URL you want to scrape

Create a new spider

Create a new sample annotation

Select the appropriate schema (hoaxly)

TODO: screenshot of new schema

Annotate the first element by clicking on the visible project headline

Select the appropriate field from the schema

Repeat for all fields in the schema

Close sample
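Once every field is annotated, a quick way to confirm nothing was skipped is to compare a scraped item against the schema's field list. The following is a minimal sketch, not part of the project: the field names in `REQUIRED_FIELDS` and the helper `missing_fields` are hypothetical and should be replaced with the fields actually defined in the hoaxly schema.

```python
# Hypothetical field names, for illustration only.
# Replace them with the fields defined in the hoaxly schema.
REQUIRED_FIELDS = {"headline", "url", "date_published"}


def missing_fields(item, required=REQUIRED_FIELDS):
    """Return the sorted names of required fields that are absent or empty in item."""
    return sorted(name for name in required if not item.get(name))


if __name__ == "__main__":
    # A sample item as a plain dict; an empty list means every field was filled.
    sample = {"headline": "Example headline", "url": "https://example.org/a"}
    print(missing_fields(sample))
```

Any name the helper reports points to a field that still needs an annotation in the sample.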

Configure the URL crawling rules using regular expressions
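Before entering patterns in the crawling settings, it can help to try them against a few known URLs. The sketch below assumes a simple model in which a URL is followed when it matches at least one follow pattern and no exclude pattern. The patterns, the `example.org` domain, and the helper `should_follow` are all hypothetical, since the patterns used for this project are not given here.

```python
import re

# Hypothetical patterns, for illustration only.
# Replace them with patterns that match the site you are scraping.
FOLLOW_PATTERNS = [r"^https?://example\.org/fact-check/[\w-]+/?$"]
EXCLUDE_PATTERNS = [r"/tag/", r"\?page=\d+"]


def should_follow(url, follow=FOLLOW_PATTERNS, exclude=EXCLUDE_PATTERNS):
    """Return True if url matches a follow pattern and no exclude pattern."""
    if any(re.search(pattern, url) for pattern in exclude):
        return False
    return any(re.search(pattern, url) for pattern in follow)


if __name__ == "__main__":
    candidates = [
        "https://example.org/fact-check/some-claim/",
        "https://example.org/tag/politics",
        "https://example.org/about",
    ]
    for url in candidates:
        print(url, should_follow(url))
```

Checking exclude patterns first keeps listing and tag pages out of the crawl even when a broad follow pattern would otherwise match them.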

Export the spider as a Scrapy spider (Python code)

Add the new spider to the scrapy_projects directory and commit it

% git add scrapy_projects/hoaxlyPortia/spiders/ -p

% git commit scrapy_projects/hoaxlyPortia/spiders/

Use a commit message that states which spider you are adding and which schema it uses

Create a merge request
