Should use the Pipeline framework as its base.
Steps
- Fetch data from websites
- Gather data from FFG and FWF
- Configuration describing how each website is scraped
- Keeping the scraping dynamic and configurable should make it easier to adjust when a website changes
- Results: CSV file of gathered data
- Post-processing:
- Any quick and simple processing that can be done on the data to make further steps easier
- FFG:
- Combine organisations for every project into a single JSON array
- Combine outputs and save them to data storage
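
The FFG post-processing step (one JSON array of organisations per project) could look roughly like the sketch below. It is only an illustration: the column names `project_id`, `title` and `organisation`, the helper names `combine_organisations` and `write_csv`, and the sample rows are all assumptions, since the notes do not specify the scraped CSV layout or how the Pipeline framework passes data between steps.

```python
import csv
import io
import json


def combine_organisations(rows, key="project_id", org_field="organisation"):
    """Collapse one-row-per-organisation records into one row per project.

    `key` and `org_field` are hypothetical column names; adjust them to the
    actual scraped CSV layout.
    """
    projects = {}
    for row in rows:
        project_id = row[key]
        if project_id not in projects:
            # Keep the project's other columns from its first row.
            base = {k: v for k, v in row.items() if k != org_field}
            base["organisations"] = []
            projects[project_id] = base
        projects[project_id]["organisations"].append(row[org_field])

    # Serialise each organisation list as a JSON array so it fits in one CSV cell.
    return [
        {**p, "organisations": json.dumps(p["organisations"], ensure_ascii=False)}
        for p in projects.values()
    ]


def write_csv(rows, fh):
    """Write the combined rows to an open text file handle as CSV."""
    writer = csv.DictWriter(fh, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)


# Made-up sample rows, standing in for the scraper's CSV output.
sample = [
    {"project_id": "P1", "title": "Alpha", "organisation": "Org A"},
    {"project_id": "P1", "title": "Alpha", "organisation": "Org B"},
    {"project_id": "P2", "title": "Beta", "organisation": "Org C"},
]
combined = combine_organisations(sample)

out = io.StringIO()
write_csv(combined, out)
```

Storing the organisations as a JSON string keeps the result a flat CSV, so later steps can decode the cell with `json.loads` when they need the list. Writing to a file handle rather than a fixed path leaves the choice of data storage open.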