web_scraper/README.md
2024-01-15 10:36:36 +01:00

Web Scraper

Simple web scraping with Beautiful Soup 4 (BS4) and Selenium (headless browser).

Cookie banners can be handled; see parse_urls.py. Wait until the banner's button has loaded, click it, and then wait again until the content of the site has loaded. Do this only for the first URL: for the following URLs the cookies are already set.

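With BS4 it is easy to extract information from HTML. To get a div by its id, write elem = soup.find('div', id='my_id'). To find all descendants of that element (children, children of children, etc.) with a given tag, write children = elem.find_all('span'). The older spelling findAll still works but is a legacy alias of find_all.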
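
A small self-contained example of the two calls. The HTML snippet and the variable names are made up for illustration, and the parser used is Python's built-in `html.parser`.

```python
from bs4 import BeautifulSoup

# Made-up HTML, only for demonstration.
html = """
<div id="my_id">
  <span>first</span>
  <p><span>nested</span></p>
</div>
<span>outside</span>
"""

soup = BeautifulSoup(html, "html.parser")

# find() returns the first match, or None if nothing matches.
elem = soup.find("div", id="my_id")

# find_all() searches all descendants of elem, not only direct children.
children = elem.find_all("span")
texts = [span.get_text(strip=True) for span in children]
print(texts)  # ['first', 'nested']

# recursive=False restricts the search to direct children.
direct = [span.get_text(strip=True) for span in elem.find_all("span", recursive=False)]
print(direct)  # ['first']
```

Because `find` returns `None` when no element matches, real scraping code should check `elem` before calling `find_all` on it.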

Chromedriver

To use Selenium with Chrome, you need to download ChromeDriver in the version that matches your installed Chrome (search for "ChromeDriver download").