2. Create spider
Author: Luis Rosenstrauch
Create a new branch

Repository: https://github.com/hoaxly/hoaxly-scraping-container
After setting up the environment, visit http://hoaxly.docksal:9001/#/projects/hoaxlyPortia
Enter the URL you want to scrape

Using the Portia interface, visit the page where you want to start crawling through links

Create a new spider

Follow a link to a sample item you want to scrape


Create a new sample annotation

Select the appropriate schema (hoaxly)
TODO: screenshot of new schema


Annotate the first element by clicking on the visible project headline
Select the appropriate field from the schema

Repeat for all fields in the schema


Close the sample

Configure the URL crawling schema using regex:
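As a rough illustration of how such patterns behave, the sketch below checks candidate URLs against follow and exclude patterns. It assumes the patterns are ordinary Python regular expressions searched against the full URL. The example.com URLs, the pattern lists, and the should_follow helper are hypothetical and not taken from the hoaxly project.

```python
import re

# Hypothetical patterns for illustration only; replace them with
# patterns that fit the site you are actually crawling.
FOLLOW_PATTERNS = [r"^https?://example\.com/fact-checks/[\w-]+/?$"]
EXCLUDE_PATTERNS = [r"/tag/", r"\?page=\d+"]


def should_follow(url, follow=FOLLOW_PATTERNS, exclude=EXCLUDE_PATTERNS):
    """Return True if url matches a follow pattern and no exclude pattern."""
    if any(re.search(pattern, url) for pattern in exclude):
        return False
    return any(re.search(pattern, url) for pattern in follow)


candidates = [
    "https://example.com/fact-checks/some-claim-reviewed",
    "https://example.com/tag/politics",
    "https://example.com/fact-checks/another-claim/?page=2",
    "https://example.com/about",
]

# Keep only the links the hypothetical patterns would let the spider follow.
followed = [url for url in candidates if should_follow(url)]
print(followed)
```

Testing patterns this way against a handful of real links from the target site, before entering them in the interface, can catch patterns that are too broad or too narrow.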

Export the spider as a Scrapy spider (Python code)
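Add the new spider to the scrapy_projects directory and commit it. Newly exported files are untracked and git add -p skips untracked files, so mark them with git add -N first:

% git add -N scrapy_projects/hoaxlyPortia/spiders/
% git add -p scrapy_projects/hoaxlyPortia/spiders/
% git commit scrapy_projects/hoaxlyPortia/spiders/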
Use a commit message that tells us which spider you are adding and which schema it uses
Create a merge request