- https://www.youtube.com/watch?v=eIWFnNz8mF4&t=217s
- A concurrent scraper written in Go (@golang)
- How it works:
- You call the binary (iterscraper) with a URL template like "http://foo.com/%d", where '%d' is a placeholder that gets replaced by an ID, e.g. 'http://foo.com/1' up to 'http://foo.com/9'. The '-concurrency' flag sets how many goroutines run at the same time, and '-output' sets where the results are written. The '-nameQuery', '-addressQuery', and '-emailQuery' flags are the CSS selectors used to find what we are looking for (name, address, e-mail) on each page (e.g. 'http://foo.com/1').
- A basic package for scraping information from websites whose URLs contain an incrementing integer. Information is retrieved from HTML5 elements and output as a CSV.
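A hypothetical invocation might look like the sketch below. The selector values (`.name`, `.address`, `.email`) and the ID-range flags are made-up placeholders, not confirmed flags of the real binary; only '-concurrency', '-output', '-nameQuery', '-addressQuery', '-emailQuery' and the '%d' URL pattern are named in the notes above.

```shell
# Hypothetical invocation -- selectors and range flags are assumptions:
#   iterscraper "http://foo.com/%d" \
#     -concurrency 5 -output out.csv \
#     -nameQuery ".name" -addressQuery ".address" -emailQuery ".email"

# The '%d' placeholder expands like a printf verb, one URL per ID:
printf 'http://foo.com/%d\n' 1 9
```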
- 1. Fetch the code:
- go get github.com/philipithomas/iterscraper
- 2. Go to the code
- cd $GOPATH/src/github.com/philipithomas/iterscraper
- 3. Create a new branch
- git checkout -b work
- 4. Open VSCode
- code .
- main.go
- -------
- * Defines all the flags
- * Parses those flags
- * Uses a WaitGroup and channels to coordinate the different parts of the work
- * There are three parts:
- * emitTasks --> generates every single task that we need to do. Each task is a URL with an ID,
- and is sent to the 'taskChan' channel.
- * scrape --> a worker that receives tasks from taskChan, fetches and parses the URL, extracts
- whatever we are looking for, and sends the results to the 'dataChan' channel.
- * writeSites --> writes all the output to a CSV file
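The three-part pipeline above can be sketched as a small runnable Go program. The type and function names mirror the ones in the notes (emitTasks, scrape, writeSites, taskChan, dataChan), but the bodies are simplified stand-ins: the "scraped" value is faked so the sketch runs offline, and writeSites collects results in memory instead of writing CSV.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// task and result are illustrative stand-ins for iterscraper's real types.
type task struct {
	id  int
	url string
}

type result struct {
	id   int
	name string // a real result would also carry address and e-mail
}

// emitTasks generates one task per ID and closes the channel when done.
func emitTasks(n int, taskChan chan<- task) {
	for i := 1; i <= n; i++ {
		taskChan <- task{id: i, url: fmt.Sprintf("http://foo.com/%d", i)}
	}
	close(taskChan)
}

// scrape is one worker: it drains taskChan and sends results to dataChan.
// The real worker would fetch t.url and apply the CSS selectors; here the
// extracted value is faked so the sketch stays self-contained.
func scrape(taskChan <-chan task, dataChan chan<- result, wg *sync.WaitGroup) {
	defer wg.Done()
	for t := range taskChan {
		dataChan <- result{id: t.id, name: fmt.Sprintf("name-%d", t.id)}
	}
}

// writeSites collects every result (the real version writes CSV rows).
func writeSites(dataChan <-chan result, done chan<- []result) {
	var all []result
	for r := range dataChan {
		all = append(all, r)
	}
	sort.Slice(all, func(i, j int) bool { return all[i].id < all[j].id })
	done <- all
}

// run wires the three parts together with channels and a WaitGroup.
func run(concurrency, n int) []result {
	taskChan := make(chan task)
	dataChan := make(chan result)
	done := make(chan []result)

	go emitTasks(n, taskChan)
	go writeSites(dataChan, done)

	var wg sync.WaitGroup
	wg.Add(concurrency)
	for i := 0; i < concurrency; i++ {
		go scrape(taskChan, dataChan, &wg)
	}
	wg.Wait()
	close(dataChan) // every worker has finished, so no more results will arrive

	return <-done
}

func main() {
	for _, r := range run(3, 5) {
		fmt.Println(r.id, r.name)
	}
}
```

Closing taskChan ends every worker's range loop, the WaitGroup tells us when all workers are done, and only then is dataChan closed so writeSites can finish.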