handle Cookie banner with Selenium
README.md
# Web Scraper
Simple web scraping with Beautiful Soup 4 (BS4) and Selenium (headless browser).
The cookie banner can be handled; see `parse_urls.py`. Wait until the banner's button is loaded, click it, and then wait again until the content of the site is loaded. Do this only for the first URL; for subsequent URLs the cookies are already set.
BS4 makes it easy to scrape information out of HTML. To get a `div` by id, write `elem = soup.find('div', id='my_id')`. To find descendants of that element (children, children of children, etc.), write `children = elem.find_all('span')`.
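Putting both calls together on a small inline snippet (the HTML and id are made up for illustration):

```python
from bs4 import BeautifulSoup

html = """
<div id="my_id">
  <span>first</span>
  <span>second</span>
</div>
"""

soup = BeautifulSoup(html, "html.parser")
elem = soup.find("div", id="my_id")       # the div with that id
children = elem.find_all("span")          # all span descendants
texts = [span.get_text() for span in children]
# texts is now ["first", "second"]
```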
## Chromedriver
To use Selenium with Chrome, you need to download ChromeDriver matching your installed Chrome version (a quick web search will find it).