Before you can run scripts that use Selenium, you have to set up a web driver. You need to download the correct driver and initialize it in your code. When I first started working with Selenium automation, I found setting up the Selenium web driver annoying. Therefore, I built a small class that sets up the driver on Mac, Windows or Linux. This article may be a bit harder to follow if you have never created classes in Python, although the script needs very little configuration to work.
Selenium web driver
The Selenium web driver allows you to automate QA processes and to build scrapers. Because it drives a real browser, it executes the JavaScript on a page, which libraries such as Scrapy and urllib cannot do on their own. Conceptually, you can think of the Selenium driver as the controller of your browser: the driver tells the browser what to do, so all dynamic elements that depend on JavaScript get loaded. The driver is an actual executable file on your machine, and each browser has its own. The most common drivers are:
- ChromeDriver for Google Chrome and Chromium
- geckodriver for Mozilla Firefox
- msedgedriver (Microsoft Edge WebDriver) for Microsoft Edge
- safaridriver for Safari
This article describes how to set up the Chrome web driver (ChromeDriver) using a simple class, CustomSelenium.
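To make the "controller" idea a bit more concrete: ChromeDriver runs as a small local HTTP server that implements the W3C WebDriver protocol. Selenium sends it JSON commands and ChromeDriver turns them into actions in Chrome. The sketch below only builds the JSON body of a "New Session" request (sent as POST /session); the helper name new_session_payload is hypothetical, and nothing is sent to a real driver. In practice Selenium builds this payload for you from ChromeOptions.

```python
import json
from typing import Optional


def new_session_payload(headless: bool, user_agent: Optional[str] = None) -> dict:
    """Build the body of a W3C WebDriver 'New Session' request for Chrome.

    Hypothetical helper for illustration only; Selenium normally does this.
    """
    args = []
    if headless:
        args.append("--headless")
    if user_agent:
        args.append(f"user-agent={user_agent}")
    return {
        "capabilities": {
            "alwaysMatch": {
                "browserName": "chrome",
                # Chrome-specific settings live under the vendor key "goog:chromeOptions"
                "goog:chromeOptions": {"args": args},
            }
        }
    }


# Print the JSON that would be POSTed to the driver's /session endpoint
print(json.dumps(new_session_payload(headless=True), indent=2))
```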
Initiate the Selenium web driver setup class
I have tried to write a robust class that lets you specify your operating system and your Google Chrome version. You can check your version by opening chrome://version in Google Chrome. The headless argument runs Chrome headless, meaning the browser window is not displayed, which usually makes it run faster. After you run this class for the first time, all files needed to run the Selenium web driver are downloaded. rebuild defaults to False, so the class only downloads chromedriver again when you pass rebuild=True or when the downloads folder is empty.
- your_os: Operating System. Possible values: mac, windows, linux
- your_version: Google Chrome Version
- headless: Run Selenium headless
- rebuild: Whether to rebuild (i.e. redownload) chromedriver
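The rebuild behaviour described above boils down to one check in the constructor. The sketch below isolates it in a hypothetical helper, needs_download, that mirrors the condition used in CustomSelenium.__init__ (download when forced, or when the downloads folder is empty or missing), so you can see when a new download would be triggered without touching the network.

```python
import glob
import os
import tempfile


def needs_download(download_path: str, rebuild: bool = False) -> bool:
    """Mirror the rebuild check in CustomSelenium.__init__ (hypothetical helper)."""
    # glob returns an empty list when the folder is empty or does not exist yet
    existing_files = glob.glob(os.path.join(download_path, "*"))
    return rebuild or len(existing_files) == 0


with tempfile.TemporaryDirectory() as tmp:
    downloads = os.path.join(tmp, "downloads")
    print(needs_download(downloads))  # folder missing -> True
    os.mkdir(downloads)
    open(os.path.join(downloads, "chromedriver.zip"), "wb").close()
    print(needs_download(downloads))  # a file is present -> False
    print(needs_download(downloads, rebuild=True))  # forced -> True
```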
Don’t forget to install the required packages by running the following command in your terminal:

```shell
pip install selenium requests
```
```python
import glob
import os
import zipfile

import requests
from selenium import webdriver
from selenium.webdriver.common.by import By  # for locating elements
from selenium.webdriver.support import expected_conditions as EC  # for explicit waits
from selenium.webdriver.support.ui import WebDriverWait


class CustomSelenium:
    # Map the OS names accepted by the class to ChromeDriver's platform names
    OS_MAPPER = {"mac": "mac64",
                 "windows": "win32",
                 "linux": "linux64"}

    def __init__(self, your_os, your_version, headless, rebuild=False):
        """ Initiate Build Selenium class

        Args:
            your_os (str): Operating System. Possible values: mac, windows, linux
            your_version (str): Google Chrome Version
            headless (bool): Run Selenium headless
            rebuild (bool): Whether to rebuild (i.e. redownload) chromedriver
        """
        # Downloads and the driver are stored relative to the working directory
        cwd = os.getcwd()
        self.driver = None
        self.wait = None
        self.opso = self.OS_MAPPER[your_os.lower()]
        self.headless = headless
        # "88.0.4324.96" -> "88.0.4324", the format used by the LATEST_RELEASE_ endpoint
        self.version = str(your_version).rsplit(".", 1)[0]
        self.latest_release = ""
        self.chromedriver_path = os.path.join(cwd, "chromedriver")
        self.download_path = os.path.join(cwd, "downloads")
        self.chrome_options = webdriver.ChromeOptions()

        # Rebuild if forced or if no chromedriver has been downloaded yet
        nr_chromedriver_files = glob.glob(
            os.path.join(self.download_path, "*"))
        if rebuild or len(nr_chromedriver_files) == 0:
            self.get_latest_release()
            self.download_chromedriver()
        else:
            print("Chromedriver already downloaded, so no need for rebuilding")
        self.init_driver()
```
Download and initialize the Chrome web driver
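I have created two methods that automate the download of the Chrome web driver. Because you already specified your Chrome version when creating the class, get_latest_release can look up the ChromeDriver release that matches your Chrome build. download_chromedriver then downloads the zip file containing chromedriver from Google's servers, unzips it into a folder called chromedriver in the current working directory and makes the binary executable. Note that these chromedriver.storage.googleapis.com URLs only cover Chrome 114 and earlier; for Chrome 115 and later, ChromeDriver is published through the Chrome for Testing downloads instead, and Selenium 4.6 or newer can fetch a matching driver automatically with Selenium Manager.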
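The two URLs these methods request can be built and checked offline. The sketch below uses hypothetical helper names (latest_release_url, download_url) that mirror the string building in get_latest_release and download_chromedriver; the release string passed to download_url is just an example value, not one looked up from the server, and the legacy endpoint only serves Chrome 114 and earlier.

```python
OS_MAPPER = {"mac": "mac64", "windows": "win32", "linux": "linux64"}
BASE_URL = "https://chromedriver.storage.googleapis.com"


def latest_release_url(chrome_version: str) -> str:
    # Drop the last version component: "88.0.4324.96" -> "88.0.4324"
    build = str(chrome_version).rsplit(".", 1)[0]
    return f"{BASE_URL}/LATEST_RELEASE_{build}"


def download_url(release: str, your_os: str) -> str:
    # The zip name depends on the platform, e.g. chromedriver_mac64.zip
    platform = OS_MAPPER[your_os.lower()]
    return f"{BASE_URL}/{release}/chromedriver_{platform}.zip"


print(latest_release_url("88.0.4324.96"))
print(download_url("88.0.4324.96", "mac"))
```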
```python
    def get_latest_release(self):
        """ Get latest Chromedriver release based on Chrome version """
        response = requests.get(
            "https://chromedriver.storage.googleapis.com/LATEST_RELEASE_" + str(self.version))
        print("Response code for latest release: " + str(response.status_code))
        # Stop early on an error such as 404 for an unknown Chrome version
        response.raise_for_status()
        self.latest_release = response.text.strip()

    def download_chromedriver(self):
        """ Download and unzip chromedriver """
        # Download chromedriver based on operating system and latest chromedriver release
        response = requests.get("https://chromedriver.storage.googleapis.com/" +
                                self.latest_release + "/chromedriver_" + self.opso + ".zip")
        response.raise_for_status()

        # Save the zip file, overwriting any previous download
        os.makedirs(self.download_path, exist_ok=True)
        zip_path = os.path.join(self.download_path, "chromedriver.zip")
        with open(zip_path, "wb") as file:
            file.write(response.content)

        # Extract the archive into the chromedriver folder
        os.makedirs(self.chromedriver_path, exist_ok=True)
        with zipfile.ZipFile(zip_path) as archive:
            archive.extractall(path=self.chromedriver_path)

        # zipfile does not restore file permissions, so mark the binary as executable
        binary = "chromedriver.exe" if self.opso == "win32" else "chromedriver"
        os.chmod(os.path.join(self.chromedriver_path, binary), 0o755)
```
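The unzip-and-chmod step at the end of download_chromedriver can be tried out without downloading anything. This sketch builds a fake archive in memory and runs a hypothetical extract_driver helper on it. It shows why the chmod is needed: zipfile.extractall does not restore Unix permission bits, so the extracted binary would otherwise not be executable. The permission check assumes a POSIX system (Linux or macOS).

```python
import io
import os
import stat
import tempfile
import zipfile


def extract_driver(zip_bytes: bytes, target_dir: str, binary_name: str = "chromedriver") -> str:
    """Unzip a chromedriver archive and make the binary executable (sketch)."""
    os.makedirs(target_dir, exist_ok=True)
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        archive.extractall(path=target_dir)
    binary_path = os.path.join(target_dir, binary_name)
    # extractall ignores stored permissions, so set rwxr-xr-x explicitly
    os.chmod(binary_path, 0o755)
    return binary_path


# Build a fake archive in memory instead of downloading a real one
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("chromedriver", "#!/bin/sh\necho fake driver\n")

with tempfile.TemporaryDirectory() as tmp:
    path = extract_driver(buffer.getvalue(), os.path.join(tmp, "chromedriver"))
    print(os.path.basename(path), bool(os.stat(path).st_mode & stat.S_IXUSR))
```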
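The last method, init_driver, sets the Chrome options, starts the browser and returns the driver, so you can use it wherever you want in your code. Because we are using a class, the driver is also stored on the instance as self.driver, together with a WebDriverWait with a 20-second timeout in self.wait.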
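```python
    def init_driver(self):
        """ Sets Selenium options and returns driver

        Returns:
            [obj]: Selenium Driver
        """
        # Selenium 4 passes the driver path through a Service object
        from selenium.webdriver.chrome.service import Service

        # Set a regular desktop user agent to reduce the chance of being blocked as a bot
        self.chrome_options.add_argument(
            "user-agent=Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.138 Safari/537.36")
        # Run without a visible browser window if requested
        if self.headless:
            self.chrome_options.add_argument("--headless")
        # Start Chrome through the downloaded chromedriver binary
        binary = "chromedriver.exe" if self.opso == "win32" else "chromedriver"
        service = Service(os.path.join(self.chromedriver_path, binary))
        self.driver = webdriver.Chrome(service=service, options=self.chrome_options)
        # Explicit wait helper with a 20-second timeout
        self.wait = WebDriverWait(self.driver, 20)
        return self.driver
```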
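In init_driver, the browser arguments are added one at a time with add_argument. If you want to check them without starting Chrome, one option is to collect them in a plain list first. The helper chrome_arguments below is a hypothetical refactoring for illustration, not part of the original class; it reuses the same user-agent string and headless flag.

```python
from typing import List

# Same user-agent string as in init_driver
USER_AGENT = ("Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_4) AppleWebKit/537.36 "
              "(KHTML, like Gecko) Chrome/81.0.4044.138 Safari/537.36")


def chrome_arguments(headless: bool, user_agent: str = USER_AGENT) -> List[str]:
    """Return the Chrome command-line arguments init_driver would add (sketch)."""
    args = [f"user-agent={user_agent}"]
    if headless:
        # Hide the browser window
        args.append("--headless")
    return args


print(chrome_arguments(headless=True))
```

Inside the class, the options could then be filled with a loop such as `for arg in chrome_arguments(self.headless): self.chrome_options.add_argument(arg)`.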
A practical example of the easy Selenium web driver setup
After copying all the code above, you can use this class in your main code. The code below creates a CustomSelenium instance and then navigates to a specific URL.
```python
URL = "https://google.com"
cwd = os.getcwd()

# Build selenium with the CustomSelenium class
cust_sel = CustomSelenium("mac", "88.0.4324.96",
                          headless=False)
# The class has already started Chrome; get the driver from it
driver = cust_sel.driver
print("Opening: %s" % URL)
# Navigate to URL
driver.get(URL)
```
You can find the full code in this GitHub repository. In a future article, I will explain how I built a scraper that checks the availability of the PS5 using Selenium. If you want to learn more about Selenium, you can also check out my article describing the different kinds of waits in the Selenium web driver.
Please let me know if you think this information is useful!