Selenium like a Ninja¶
In order to be a real ninja scraper, you will have to build a custom Selenium driver.
Writing a custom Selenium driver offers several benefits, particularly when dealing with dynamic and complex web pages that may have measures to detect and block automated scraping. It provides a level of control and customization that is necessary for effective scraping of modern web applications.
Our goal¶
Steps to follow
The URL to use for this exercise is: https://www.welcometothejungle.com/fr/jobs?page=1&refinementList%5Bprofession_name.fr.Tech%5D%5B%5D=Data%20Analysis&refinementList%5Bcontract_type_names.fr%5D%5B%5D=CDI
We can notice that:
- There are several pages of results, which can be browsed simply by changing `page=k` in the URL. There are 30 job offers per results page.
- Welcome to the Jungle has implemented anti-scraping measures. In particular, part of the HTML is hidden when the page is requested with `requests`. It is essential to start scrolling the page to trigger the JavaScript code that reveals the hidden content.
$\rightarrow$ To solve this problem, BeautifulSoup alone is not enough. We have to simulate the behavior of a real person browsing the page with their mouse, so Selenium is what we need.
$\rightarrow$ Here is a function that simulates scrolling the page down to the i-th job offer:
def scroll(driver, i):
    '''Scroll the page down to the i-th job offer.'''
    scroll_delta = 250 + 140 * i
    driver.execute_script(f"window.scrollBy(0, {scroll_delta})")
- Another anti-scraping measure concerns the class names, the ids, and even the image links in the HTML code. All these names are random (e.g. `class="sc-1flb27e-5 cdtiMs"`) and change on every page load.
$\rightarrow$ Still, some good news: not all the classes are random, some remain fixed. For the random names, some letters of the name are fixed as well. We can therefore still use similarities to target specific tags (e.g. the header tag, which contains the total number of results, always starts with "hd").
$\rightarrow$ To exploit this weakness, it is advisable to use the Selenium method `find_elements_by_css_selector()` to target specific tags, because this method can identify a tag by a partial text match (e.g. `driver.find_elements_by_css_selector("header[class^='hd']")` for all header classes starting with "hd").
- In the end, we want to save the content of each job offer into a .txt file.
$\rightarrow$ We will therefore have to click on each job offer with Selenium's `.click()` method. For each job offer, the content of the offer is stored in a dictionary inside a `<script>` tag. We can use `json.loads()` to manipulate this dictionary, and finally save it as .txt with the `open()` and `.write()` functions.
- Save the content of each job offer into a Postgres database, then MongoDB.
$\rightarrow$ Use a `dataframe` as an intermediate structure.
$\rightarrow$ What is the problem with Postgres?
$\rightarrow$ What is the difference with MongoDB?
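The `<script>` extraction step described above can be sketched with the standard library alone. This is a minimal sketch under the assumption that the offer is stored as JSON-LD in a `<script type="application/ld+json">` tag (the real page's attributes may differ), using `html.parser` in place of the Selenium page source handling:

```python
import json
from html.parser import HTMLParser

class ScriptJSONExtractor(HTMLParser):
    """Collect and parse the content of <script type="application/ld+json"> tags."""

    def __init__(self):
        super().__init__()
        self._in_target = False
        self.payloads = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and ("type", "application/ld+json") in attrs:
            self._in_target = True

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_target = False

    def handle_data(self, data):
        if self._in_target and data.strip():
            self.payloads.append(json.loads(data))

# toy usage on a hand-made snippet:
html = '<script type="application/ld+json">{"title": "Data Analyst"}</script>'
parser = ScriptJSONExtractor()
parser.feed(html)
# parser.payloads[0] is now the dict {"title": "Data Analyst"}
```

In the notebook itself, the HTML would come from `driver.page_source` after clicking an offer.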
Custom Selenium driver¶
import random

from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager

def initialize_driver(headers_list, proxy_list):
options = Options()
#select a random user-agent from the list
user_agent = random.choice(headers_list)["User-Agent"]
options.add_argument(f"user-agent={user_agent}")
#select a random proxy from the list
proxy = random.choice(proxy_list)
if proxy:
options.add_argument(f"--proxy-server={proxy}")
#add some common options
options.add_argument("--headless")
options.add_argument("--disable-extensions")
options.add_argument("--ignore-certificate-errors")
#initialize Chrome WebDriver with the specified options
service = Service(ChromeDriverManager().install())
driver = webdriver.Chrome(service=service, options=options)
#set implicit wait of 10sec
driver.implicitly_wait(10)
return driver
# Example usage
custom_driver = initialize_driver(headers_list, proxy_list)
custom_driver
<selenium.webdriver.chrome.webdriver.WebDriver (session="a8096916214b4e61e3b2cdc9b2f57dcc")>
Get the main page¶
Our goal here is very simple: write a Python function called MainPage
to get this url : https://www.welcometothejungle.com/fr/jobs?page=1&configure%5Bfilters%5D=website.reference%3Awttj_fr&configure%5BhitsPerPage%5D=30&aroundQuery=France&refinementList%5Boffice.country_code%5D%5B%5D=FR&refinementList%5Bcontract_type_names.fr%5D%5B%5D=CDI&refinementList%5Bcontract_type_names.fr%5D%5B%5D=Stage&query=%22data%20analyst%22&range%5Bexperience_level_minimum%5D%5Bmin%5D=0&range%5Bexperience_level_minimum%5D%5Bmax%5D=1
and then sleep for 3 seconds.
def MainPage(driver, url):
    '''Go to the first page and sleep(3)'''
    #code here
from time import sleep

def MainPage(driver, url):
    '''Go to the first page and sleep(3)'''
    driver.get(url)
    sleep(3)
Get the number of offers per page¶
Write a Python function that returns the number of job offers on a page
def nbOffers(driver):
    '''Return the number of job offers on the current page.'''
    try:
        #code here
        pass
    except Exception as e:
        print("An error occurred in NB_OFFER:", str(e))
        return 0  # Or handle the exception as needed
Example usage of the nbOffers function:
url = f"https://www.welcometothejungle.com/fr/jobs?page=1&configure%5Bfilters%5D=website.reference%3Awttj_fr&configure%5BhitsPerPage%5D=30&aroundQuery=France&refinementList%5Boffice.country_code%5D%5B%5D=FR&refinementList%5Bcontract_type_names.fr%5D%5B%5D=CDI&refinementList%5Bcontract_type_names.fr%5D%5B%5D=Stage&query=%22data%20analyst%22&range%5Bexperience_level_minimum%5D%5Bmin%5D=0&range%5Bexperience_level_minimum%5D%5Bmax%5D=1"
MainPage(driver, url)
nb_offers = nbOffers(driver)
Then write a Python function to get the total number of job posts:
def nbOffers_tot(driver):
    '''Return the total number of job offers across all pages.'''
    try:
        #code here
        pass
    except Exception as e:
        print("An error occurred in NB_OFFER TOTAL:", str(e))
        return 0  # Or handle the exception as needed
# open the page and get the number of offers
url = f"https://www.welcometothejungle.com/fr/jobs?page=1&configure%5Bfilters%5D=website.reference%3Awttj_fr&configure%5BhitsPerPage%5D=30&aroundQuery=France&refinementList%5Boffice.country_code%5D%5B%5D=FR&refinementList%5Bcontract_type_names.fr%5D%5B%5D=CDI&refinementList%5Bcontract_type_names.fr%5D%5B%5D=Stage&query=%22data%20analyst%22&range%5Bexperience_level_minimum%5D%5Bmin%5D=0&range%5Bexperience_level_minimum%5D%5Bmax%5D=1"
MainPage(driver, url)
nb_offers = nbOffers(driver)
nb_offerst = nbOffers_tot(driver)
print(f"\nTotal number of offers: {nb_offerst}\nNumber of offers per page: {nb_offers}")
Total number of offers: 68
Number of offers per page: 30
Click and GetText functions¶
Write a Python function that clicks on a given Selenium element:
def Click(driver, pos):
    '''Click on the pos-th job offer link.'''
    try:
        #code here
        pass
    except Exception as e:
        print("An error occurred in CLICK:", str(e))
        return 0  # Or handle the exception as needed
Then write a function that gets the text of a job post with BeautifulSoup and saves it into a list and a .txt file:
def GetText(driver, jobs):
    '''Parse the current job post, append it to jobs and save it to a .txt file.'''
    sleep(3)
    try:
        #code here
        pass
    except Exception as e:
        print(f"Error HTML PARSING: {e}")
Put it into a loop¶
Write a simple loop over the pages in order to put all the jobs into a list named `jobs`. You can add a `break` statement for the debugging part, since the full run can be long.
# scraping loop
write job : https://www.welcometothejungle.com/fr/companies/securitesociale/jobs/data-analyst-appui-au-pilotage-f-h_beauvais_LSS_jb46pYN?q=b26759548f42311cc511a99f6b39e87c&o=2290808 1/68
write job : https://www.welcometothejungle.com/fr/companies/pwc/jobs/senior-data-analyst-deals-m-a-lyon-cdi-h-f_neuilly-sur-seine?q=0ef9d769ff2a53f24418d1bb9396bc64&o=2255786 2/68
Clean the data¶
Here our mission is simple: clean the data at least a little in order to insert it into a PostgreSQL database.
- Transform our job list into a pandas DataFrame
- Clean the text inside the `description` column
- Write a function called `extract_salary_info()` that splits the `baseSalary` column into `['minSalary', 'maxSalary', 'currency', 'salaryUnit']`
- Extract the `name` variable inside the `hiringOrganization` column
- Extract the `addressLocality` variable inside the `jobLocation` column
- Drop the columns `['@context','baseSalary','educationRequirements','experienceRequirements','FAQPage']`
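As a starting point, `extract_salary_info()` could look like the sketch below. The key names (`'minValue'`, `'maxValue'`, `'currency'`, `'unitText'`) are assumptions about the schema.org MonetaryAmount dictionaries seen in the `baseSalary` column; adjust them to the actual payload:

```python
import pandas as pd

SALARY_COLS = ["minSalary", "maxSalary", "currency", "salaryUnit"]

def extract_salary_info(base_salary):
    """Split one baseSalary dict into a Series (minSalary, maxSalary, currency, salaryUnit).

    Non-dict values (NaN rows) yield all-None columns. Key names are assumptions
    about the MonetaryAmount payload and may need adjusting."""
    if not isinstance(base_salary, dict):
        return pd.Series([None] * 4, index=SALARY_COLS)
    return pd.Series(
        [base_salary.get("minValue"), base_salary.get("maxValue"),
         base_salary.get("currency"), base_salary.get("unitText")],
        index=SALARY_COLS)

# Hypothetical usage on the dataframe:
# jobs_df[SALARY_COLS] = jobs_df["baseSalary"].apply(extract_salary_info)
```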
@context | @type | baseSalary | datePosted | description | employmentType | educationRequirements | experienceRequirements | hiringOrganization | industry | jobLocation | qualifications | title | validThrough | FAQPage | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | http://schema.org | JobPosting | {'@type': 'MonetaryAmount', 'currency': 'EUR',... | 2023-12-08T07:47:36.407881Z | <p>Le Data Analyst « Appui au Pilotage » prend... | FULL_TIME | {'@type': 'EducationalOccupationalCredential',... | {'@type': 'OccupationalExperienceRequirements'... | {'@type': 'Organization', 'name': 'La Sécurité... | Administration publique | [{'@type': 'Place', 'address': {'@type': 'Post... | Que ce soit avec SQL, Business Object XI, Exce... | Data Analyst Appui au Pilotage (F/H) | 2024-03-07T07:47:36.407Z | [{'@type': 'Question', 'name': 'Le télétravail... |
1 | http://schema.org | JobPosting | NaN | 2023-12-07T20:45:19.106393Z | <p>Vous souhaitez intégrer une équipe pluridis... | INTERN | {'@type': 'EducationalOccupationalCredential',... | {'@type': 'OccupationalExperienceRequirements'... | {'@type': 'Organization', 'name': 'Societe Gen... | Banque, FinTech / InsurTech, Finance | [{'@type': 'Place', 'address': {'@type': 'Post... | Vous préparez un Bac +4/5 en Ecole d'Ingénieur... | Data Analyst | 2024-03-06T20:45:19.106Z | [{'@type': 'Question', 'name': 'Le télétravail... |
2 | http://schema.org | JobPosting | {'@type': 'MonetaryAmount', 'currency': 'EUR',... | 2023-12-07T18:31:11.878063Z | <p>Tu seras chargé.e de transformer la donnée ... | INTERN | {'@type': 'EducationalOccupationalCredential',... | {'@type': 'OccupationalExperienceRequirements'... | {'@type': 'Organization', 'name': 'Hello Watt'... | Environnement / Développement durable, Energie... | [{'@type': 'Place', 'address': {'@type': 'Post... | Tu as d'excellentes compétences en Excel, VBA ... | Data Analyst (H/F) - Stage | 2024-03-06T18:31:11.878Z | [{'@type': 'Question', 'name': 'L'envoi d'un C... |
3 | http://schema.org | JobPosting | {'@type': 'MonetaryAmount', 'currency': 'EUR',... | 2023-12-07T13:00:47.450767Z | <p>Afin de mieux comprendre nos clients et leu... | INTERN | {'@type': 'EducationalOccupationalCredential',... | {'@type': 'OccupationalExperienceRequirements'... | {'@type': 'Organization', 'name': 'Matera', 's... | SaaS / Cloud Services, Immobilier commercial, ... | [{'@type': 'Place', 'address': {'@type': 'Post... | Idéalement, tu es en dernière année d'école d... | Data Analyst - Stage de 6 mois | 2024-03-06T13:00:47.450Z | [{'@type': 'Question', 'name': 'Le télétravail... |
4 | http://schema.org | JobPosting | NaN | 2023-12-07T11:36:53.671284Z | <p><img loading="lazy" width="22" alt="" src=... | FULL_TIME | {'@type': 'EducationalOccupationalCredential',... | {'@type': 'OccupationalExperienceRequirements'... | {'@type': 'Organization', 'name': 'Carrefour',... | Grande distribution, E-commerce, Grande consom... | [{'@type': 'Place', 'address': {'@type': 'Post... | Profil : De formation BAC+ 3 minimum Vous dis... | Data Analyst (F/H) | 2024-03-06T11:36:53.671Z | [{'@type': 'Question', 'name': 'Le télétravail... |
Database insertion¶
Our goal in this part is to insert our result data into a Postgres database.
We will use Docker to run our Postgres database in a simple way with this command:
docker run --name posttest -d -p 5432:5432 -e POSTGRES_PASSWORD=fred postgres:alpine
Your mission is simple: write the data to the database in a `job_table` table!
You can use this sample code to connect to your database:
from sqlalchemy import create_engine
# Database credentials
user = 'postgres'
password = 'fred'
host = '0.0.0.0' # or the IP if your PostgreSQL server is running elsewhere
port = '5432' # default port for PostgreSQL used by our docker above
db = 'postgres'
# Create the connection
engine = create_engine(f'postgresql://{user}:{password}@{host}:{port}/{db}')
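With the dataframe in hand, the insertion itself can be a one-liner through pandas. A minimal sketch, assuming the cleaned dataframe is called `jobs_df` (as later in this notebook) and reusing the `engine` created above:

```python
import pandas as pd

def write_jobs(jobs_df: pd.DataFrame, engine) -> None:
    """Write the jobs dataframe into the job_table table, replacing it if it exists."""
    jobs_df.to_sql("job_table", engine, if_exists="replace", index=False)

# Hypothetical usage with the engine created above:
# write_jobs(jobs_df, engine)
```

`if_exists="replace"` makes re-runs idempotent; use `"append"` if you want to accumulate rows across scraping sessions instead.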
Then write a little script that connects to the database, lists all the tables inside, and performs a verification query (e.g., selecting the first 5 rows).
!pip install psycopg2 sqlalchemy
!docker run --name posttest -d -p 5432:5432 -e POSTGRES_PASSWORD=fred postgres:alpine
114d5b36e4f1667e98d5ec1b36c5541d248465e420ea8047b1824d3eab1d873b
from sqlalchemy import create_engine
# Database credentials
user = 'postgres'
password = 'fred'
host = '0.0.0.0' # or the IP if your PostgreSQL server is running elsewhere
port = '5432' # default port for PostgreSQL
db = 'postgres'
# Create the connection
engine = create_engine(f'postgresql://{user}:{password}@{host}:{port}/{db}')
jobs_df
@type | datePosted | description | employmentType | hiringOrganization | industry | jobLocation | qualifications | title | validThrough | minSalary | maxSalary | currency | salaryUnit | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | JobPosting | 2023-12-08T07:47:36.407881Z | Le Data Analyst « Appui au Pilotage » prendra ... | FULL_TIME | La Sécurité Sociale | Administration publique | Beauvais | Que ce soit avec SQL, Business Object XI, Exce... | Data Analyst Appui au Pilotage (F/H) | 2024-03-07T07:47:36.407Z | 38000.0 | 45000.0 | EUR | YEARLY |
1 | JobPosting | 2023-12-07T20:45:19.106393Z | Vous souhaitez intégrer une équipe pluridiscip... | INTERN | Societe Generale | Banque, FinTech / InsurTech, Finance | Fontenay-Sous-Bois | Vous préparez un Bac +4/5 en Ecole d'Ingénieur... | Data Analyst | 2024-03-06T20:45:19.106Z | NaN | NaN | None | None |
2 | JobPosting | 2023-12-07T18:31:11.878063Z | Tu seras chargé.e de transformer la donnée uti... | INTERN | Hello Watt | Environnement / Développement durable, Energie... | Paris | Tu as d'excellentes compétences en Excel, VBA ... | Data Analyst (H/F) - Stage | 2024-03-06T18:31:11.878Z | 1000.0 | 1600.0 | EUR | MONTHLY |
3 | JobPosting | 2023-12-07T13:00:47.450767Z | Afin de mieux comprendre nos clients et leurs ... | INTERN | Matera | SaaS / Cloud Services, Immobilier commercial, ... | Paris | Idéalement, tu es en dernière année d'école d... | Data Analyst - Stage de 6 mois | 2024-03-06T13:00:47.450Z | 1000.0 | 1200.0 | EUR | NONE |
4 | JobPosting | 2023-12-07T11:36:53.671284Z | Le saviez-vous ? : Nous rejoindre, c'est rejoi... | FULL_TIME | Carrefour | Grande distribution, E-commerce, Grande consom... | Mondeville | Profil : De formation BAC+ 3 minimum Vous dis... | Data Analyst (F/H) | 2024-03-06T11:36:53.671Z | NaN | NaN | None | None |
Verification
import psycopg2
import pandas as pd

# connect to the PostgreSQL database
conn = psycopg2.connect(dbname='postgres', user=user, password=password, host=host, port=port)
# create a cursor object and list all the tables
cursor = conn.cursor()
cursor.execute("SELECT tablename FROM pg_catalog.pg_tables WHERE schemaname = 'public'")
print("Tables in the database:", cursor.fetchall())
# verification query: first 5 rows of job_table
print(pd.read_sql("SELECT * FROM job_table LIMIT 5", conn))
Tables in the database:
('job_table',)
('postgres',)
Table 'job_table' exists.
First 5 rows of 'job_table':
        @type                   datePosted  \
0  JobPosting  2023-12-08T07:47:36.407881Z
1  JobPosting  2023-12-07T20:45:19.106393Z
2  JobPosting  2023-12-07T18:31:11.878063Z
3  JobPosting  2023-12-07T13:00:47.450767Z
4  JobPosting  2023-12-07T11:36:53.671284Z
                                         description employmentType  \
0  Le Data Analyst « Appui au Pilotage » prendra ...      FULL_TIME
1  Vous souhaitez intégrer une équipe pluridiscip...         INTERN
2  Tu seras chargé.e de transformer la donnée uti...         INTERN
3  Afin de mieux comprendre nos clients et leurs ...         INTERN
4  Le saviez-vous ? : Nous rejoindre, c'est rejoi...      FULL_TIME
    hiringOrganization                                         industry  \
0  La Sécurité Sociale                          Administration publique
1     Societe Generale             Banque, FinTech / InsurTech, Finance
2           Hello Watt  Environnement / Développement durable, Energie...
3               Matera  SaaS / Cloud Services, Immobilier commercial, ...
4            Carrefour  Grande distribution, E-commerce, Grande consom...
          jobLocation                                   qualifications  \
0            Beauvais  Que ce soit avec SQL, Business Object XI, Exce...
1  Fontenay-Sous-Bois  Vous préparez un Bac +4/5 en Ecole d'Ingénieur...
2               Paris  Tu as d'excellentes compétences en Excel, VBA ...
3               Paris  Idéalement, tu es en dernière année d'école d...
4          Mondeville  Profil : De formation BAC+ 3 minimum Vous dis...
                                   title              validThrough  minSalary  \
0  Data Analyst Appui au Pilotage (F/H)  2024-03-07T07:47:36.407Z    38000.0
1                          Data Analyst  2024-03-06T20:45:19.106Z        NaN
2            Data Analyst (H/F) - Stage  2024-03-06T18:31:11.878Z     1000.0
3        Data Analyst - Stage de 6 mois  2024-03-06T13:00:47.450Z     1000.0
4                    Data Analyst (F/H)  2024-03-06T11:36:53.671Z        NaN
   maxSalary currency salaryUnit
0    45000.0      EUR     YEARLY
1        NaN     None       None
2     1600.0      EUR    MONTHLY
3     1200.0      EUR       NONE
4        NaN     None       None
MongoDB¶
Same mission with Mongo:
docker run -d --name example-mongo -p 27017:27017 mongo
Connect to the Mongo database and do a dummy query, e.g. find the number of documents where the currency is 'EUR'.
#!pip install pymongo
!docker run -d --name example-mongo -p 27017:27017 mongo
from pymongo import MongoClient
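The dummy query can be sketched as follows. The helper is pymongo-style but library-agnostic; the client URL, database name, and collection name in the commented usage are placeholders, not values from this notebook:

```python
def count_eur_offers(collection):
    """Count documents whose 'currency' field is 'EUR'.

    Works with any object exposing a pymongo-style count_documents(filter) method."""
    return collection.count_documents({"currency": "EUR"})

# Hypothetical usage against the Docker container started above
# (database and collection names are placeholders):
# from pymongo import MongoClient
# client = MongoClient("mongodb://localhost:27017/")
# jobs = client["jobs_db"]["job_table"]
# print(count_eur_offers(jobs))
```

Unlike Postgres, Mongo needs no table schema up front: each job offer dict can be inserted as-is with `insert_many`, which is the main practical difference the questions above point at.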