Crawling websites with hashes in the URL
Created by: jlvdh
What is the current behavior?
When a website uses hashes (#) for URL navigation, the crawler does not reach the pages behind those hashes.
URLs containing # are not followed.
If the current behavior is a bug, please provide the steps to reproduce
Try crawling a website like mykita.com/en/
What is the motivation / use case for changing the behavior?
Though hashes are not meant to change a page, web developers sometimes use them that way. It would be great to have an option to crawl URLs containing hashes, so that headless chrome crawler could be used for these sites.
Some research suggests it might be an issue with puppeteer:
https://github.com/GoogleChrome/puppeteer/issues/257
And there seems to be a workaround:
https://github.com/GoogleChromeLabs/puppeteer-examples/blob/master/hash_navigation.js
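To illustrate the requested option, here is a minimal sketch of the URL side of the problem. It assumes the crawler decides whether a page was already visited by comparing URLs with the fragment removed; I have not confirmed that this is what headless chrome crawler does. The `crawlKey` helper, the `keepHash` option and the example.com links are hypothetical and not part of any existing API.

```javascript
// Hypothetical helper: builds the key a crawler could use to decide
// whether a URL has already been visited.
// `keepHash` is an assumed option name, not an existing crawler setting.
function crawlKey(rawUrl, { keepHash = false } = {}) {
  const url = new URL(rawUrl); // WHATWG URL, available globally in Node.js
  if (!keepHash) {
    url.hash = ''; // drops the fragment, including the leading '#'
  }
  return url.href;
}

// Made-up links of the kind a hash-routed site might expose.
const links = [
  'https://example.com/en/',
  'https://example.com/en/#/collections',
  'https://example.com/en/#/shops',
];

// Without the option, all three links collapse into a single page.
const withoutHash = new Set(links.map((link) => crawlKey(link)));

// With the option, each hash route counts as its own page.
const withHash = new Set(links.map((link) => crawlKey(link, { keepHash: true })));

console.log(withoutHash.size, withHash.size);
```

Keeping the hash in the key would only cover half of the problem: the page would still need to be loaded in a way that lets the hash route render, which is what the puppeteer issue and the workaround linked above are about.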