handle Cookie banner with Selenium

Andre Heber
2024-01-15 10:36:36 +01:00
parent 24b64ec248
commit a1a1bc757b
5 changed files with 208 additions and 18 deletions

README.md Normal file

@@ -0,0 +1,10 @@
# Web Scraper
Simple web scraping with Beautiful Soup 4 (BS4) and Selenium (headless browser).
The cookie banner can be handled; see `parse_urls.py`. Wait until the banner's accept button is loaded, click it, and then wait again until the content of the site is loaded. Do this only for the first URL; for the following URLs the cookies are already set. A sketch of this flow follows below.
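A minimal sketch of that flow, not the actual code from `parse_urls.py`; the URLs and the element ids (`accept-cookies`, `my_id`) are assumptions for illustration:

```python
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

urls = ['https://example.com/a', 'https://example.com/b']  # placeholder URLs
driver = webdriver.Chrome()

for i, url in enumerate(urls):
    driver.get(url)
    if i == 0:
        # Only the first page shows the banner: wait for the (assumed) accept
        # button and click it; the cookies then stay set for the next URLs.
        accept = WebDriverWait(driver, 10).until(
            EC.element_to_be_clickable((By.ID, 'accept-cookies'))
        )
        accept.click()
    # Wait until the actual page content is present before parsing it.
    WebDriverWait(driver, 10).until(
        EC.presence_of_element_located((By.ID, 'my_id'))
    )
    html = driver.page_source  # hand this string over to BS4

driver.quit()
```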
With BS4 it's easy to scrape information out of the HTML. To get a `div` with a certain id, write `elem = soup.find('div', id='my_id')`. To find children (or children of children, etc.) of that element, write `children = elem.findAll('span')`.
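For example, a tiny self-contained snippet using those two calls; the HTML string and the id `my_id` are only illustrations:

```python
from bs4 import BeautifulSoup

html = '<div id="my_id"><span>first</span><span>second</span></div>'
soup = BeautifulSoup(html, 'html.parser')

elem = soup.find('div', id='my_id')      # grab the div by its id
children = elem.findAll('span')          # find_all() is the modern spelling
print([span.get_text() for span in children])  # ['first', 'second']
```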
## Chromedriver
To use Selenium with Chrome, you need to download ChromeDriver (just google it); pick the build that matches your installed Chrome version and point Selenium at the binary, as sketched below.
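One possible way to wire it up as a headless browser; the driver path is an assumption and depends on where you unpacked the download:

```python
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.chrome.options import Options

options = Options()
options.add_argument('--headless=new')      # run Chrome without a visible window

service = Service('/path/to/chromedriver')  # adjust to your download location
driver = webdriver.Chrome(service=service, options=options)
driver.get('https://example.com')
print(driver.title)
driver.quit()
```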