handle Cookie banner with Selenium

Andre Heber
2024-01-15 10:36:36 +01:00
parent 24b64ec248
commit a1a1bc757b
5 changed files with 208 additions and 18 deletions

README.md Normal file

@@ -0,0 +1,10 @@
# Web Scraper
Simple web scraping with Beautiful Soup 4 (BS4) and Selenium (headless browser).
The cookie banner can be handled; see `parse_urls.py`. Wait until the banner's accept button is loaded, click it, and then wait again until the content of the site is loaded. Do this only for the first URL; for the following URLs the cookies are already set. A sketch of this flow follows below.
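A minimal sketch of that flow, not the actual code from `parse_urls.py`; the URLs and the element ids (`accept-cookies`, `my_id`) are assumptions for illustration:

```python
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

urls = ['https://example.com/a', 'https://example.com/b']  # placeholder URLs
driver = webdriver.Chrome()

for i, url in enumerate(urls):
    driver.get(url)
    if i == 0:
        # Only the first page shows the banner: wait for the (assumed) accept
        # button and click it; the cookies then stay set for the next URLs.
        accept = WebDriverWait(driver, 10).until(
            EC.element_to_be_clickable((By.ID, 'accept-cookies'))
        )
        accept.click()
    # Wait until the actual page content is present before parsing it.
    WebDriverWait(driver, 10).until(
        EC.presence_of_element_located((By.ID, 'my_id'))
    )
    html = driver.page_source  # hand this string over to BS4

driver.quit()
```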
With BS4 it's easy to scrape information out of the HTML. To get a `div` with a certain id, write `elem = soup.find('div', id='my_id')`. To find children (or children of children, etc.) of that element, write `children = elem.findAll('span')`.
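For example, a tiny self-contained snippet using those two calls; the HTML string and the id `my_id` are only illustrations:

```python
from bs4 import BeautifulSoup

html = '<div id="my_id"><span>first</span><span>second</span></div>'
soup = BeautifulSoup(html, 'html.parser')

elem = soup.find('div', id='my_id')      # grab the div by its id
children = elem.findAll('span')          # find_all() is the modern spelling
print([span.get_text() for span in children])  # ['first', 'second']
```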
## Chromedriver
To use Selenium with Chrome, you need to download ChromeDriver (just google it); pick the build that matches your installed Chrome version and point Selenium at the binary, as sketched below.
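One possible way to wire it up as a headless browser; the driver path is an assumption and depends on where you unpacked the download:

```python
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.chrome.options import Options

options = Options()
options.add_argument('--headless=new')      # run Chrome without a visible window

service = Service('/path/to/chromedriver')  # adjust to your download location
driver = webdriver.Chrome(service=service, options=options)
driver.get('https://example.com')
print(driver.title)
driver.quit()
```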