Scraping Internal Links of a Website with Python

The following script generates a list of a website's internal URLs. To fetch and parse the HTML source, we use the urllib and BeautifulSoup libraries in Python. The first list contains the links found on the main page. After that, we follow each of those links to collect more URLs. The script goes two levels deep, but you can adapt the depth to your needs.
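The level-by-level idea can be sketched as a breadth-first traversal with a depth limit. This is a minimal sketch, not the author's script. `crawl` and `get_links` are hypothetical names. The fetching and parsing step is passed in as a function, so the depth logic can be read and tested without touching the network. `get_links` could be any function that downloads a page and returns its internal links, for example one built with urllib and BeautifulSoup.

```python
from collections.abc import Callable, Iterable


def crawl(start_url: str,
          get_links: Callable[[str], Iterable[str]],
          max_depth: int = 2) -> set[str]:
    """Collect internal URLs reachable from start_url within max_depth levels.

    Level 1 is the links on the start page; level 2 is the links found on
    those pages, and so on. Pages at the last level are not fetched.
    get_links (hypothetical) returns the internal links found on one page.
    """
    seen = {start_url}
    frontier = [start_url]  # pages whose links we read at the current level
    for _ in range(max_depth):
        next_frontier = []
        for url in frontier:
            for link in get_links(url):
                if link not in seen:  # record and follow each URL only once
                    seen.add(link)
                    next_frontier.append(link)
        frontier = next_frontier
    return seen
```

With `max_depth=2`, `get_links` is called on the start page and then on every link found there. Tracking `seen` across levels keeps pages that link back to each other from being fetched twice.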

This list of links can be used for many purposes. I will use it to test for some security vulnerabilities.

We need an initial URL from which the scan starts. A single GET request to that page generates the first list of internal links.
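A minimal sketch of that first step follows. It assumes the `bs4` package is installed and uses `urllib` from the standard library for the request. The function names `get_internal_links` and `extract_internal_links` are hypothetical, not taken from the author's script. It also assumes that "internal" means "same host name as the starting URL".

```python
from urllib.parse import urldefrag, urljoin, urlparse
from urllib.request import urlopen

from bs4 import BeautifulSoup


def extract_internal_links(html: str, base_url: str) -> set[str]:
    """Return absolute URLs from <a href> tags that stay on base_url's host."""
    host = urlparse(base_url).netloc
    soup = BeautifulSoup(html, "html.parser")
    links = set()
    for anchor in soup.find_all("a", href=True):
        # Resolve relative hrefs such as "/about" against the page URL and
        # drop any "#fragment" so the same page is not listed twice.
        url, _fragment = urldefrag(urljoin(base_url, anchor["href"]))
        parsed = urlparse(url)
        # Skip mailto:, javascript:, and links to other hosts.
        if parsed.scheme in ("http", "https") and parsed.netloc == host:
            links.add(url)
    return links


def get_internal_links(start_url: str) -> set[str]:
    """Make one GET request to start_url and return its internal links."""
    with urlopen(start_url, timeout=10) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        html = response.read().decode(charset, errors="replace")
    return extract_internal_links(html, start_url)
```

For example, `get_internal_links("https://example.com/")` returns a set of absolute URLs on `example.com`. Because the host must match exactly, links to `www.example.com` or other subdomains count as external. Loosen that comparison if the target site spans several subdomains.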

The latest version of the script is available on GitHub.

