
Computer vision deployment

Deployment of a machine learning (ML) algorithm refers to the process of making an ML model available for use in a production environment where it can make predictions or decisions based on new data. This involves several steps and considerations to ensure the model operates efficiently, reliably, and securely.

In this tutorial we will focus on a simple use case with FastAPI, without dealing with sessions, tokens, load balancing, SLAs... and other complex concerns involved in a massive deployment πŸ˜…

Infrastructure Setup / Integration

Setting up the infrastructure to host the model is one of the main tasks, as you may know. This often includes servers or cloud services: hosting the model on a dedicated server, using cloud services like AWS, Google Cloud, or Azure, or using a custom on-premise VM.

This process almost always uses containerization technologies like Docker to create portable, consistent environments that can be moved from one server to another 🐳

You can also integrate the model into the application or system where it will be used, for example by creating RESTful APIs or other interfaces through which the model can be accessed, as we will see in this tutorial. Alternatively, you can deploy the model as a microservice that other parts of the application can call.

FastAPI

FastAPI is a modern, fast (high-performance), web framework for building APIs with Python based on standard Python type hints.

We will use the FastAPI framework to deploy our face detection model because it is fast and lightweight πŸ˜‚

Follow the installation guide, create a main.py file, and run it according to the documentation, then go to http://127.0.0.1:8000/docs to see your app running!
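
If you just want to check that everything is wired up, here is the minimal main.py from the FastAPI first-steps documentation (a single GET route returning JSON):

main.py
from fastapi import FastAPI

app = FastAPI()

@app.get("/")
async def root():
    return {"message": "Hello World"}

You can start it with uvicorn main:app --reload; the /docs page is the interactive Swagger UI that FastAPI generates automatically.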

Refactor face detection into an API

Our mission now is to write a route that calls our face detection model's prediction function 😎. First we have to set up our environment and install the required libraries:

requirements.txt
deepface
fastapi
uvicorn
numpy
opencv-python
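
For example, a typical setup in a fresh virtual environment looks like this (assuming a Unix-like shell):

python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
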
Then we just have to refactor the script from the facial emotion detection practical into a FastAPI POST endpoint:
main.py
from fastapi import FastAPI, File, UploadFile
import cv2
from deepface import DeepFace
import numpy as np

app = FastAPI()

# Load face cascade classifier
face_cascade = cv2.CascadeClassifier(cv2.data.haarcascades + 'haarcascade_frontalface_default.xml')

@app.post("/detect-emotions/")
async def detect_emotions(file: UploadFile = File(...)):
    # Read the uploaded bytes and decode them into an OpenCV BGR image
    contents = await file.read()
    np_array = np.frombuffer(contents, np.uint8)
    frame = cv2.imdecode(np_array, cv2.IMREAD_COLOR)
    if frame is None:
        return {"error": "Uploaded file could not be decoded as an image"}

    # Convert frame to grayscale
    gray_frame = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

    # Convert grayscale frame to RGB format
    rgb_frame = cv2.cvtColor(gray_frame, cv2.COLOR_GRAY2RGB)

    # Detect faces in the frame
    faces = face_cascade.detectMultiScale(gray_frame, scaleFactor=1.1, minNeighbors=5, minSize=(30, 30))

    results = []
    for (x, y, w, h) in faces:
        # Extract the face ROI (Region of Interest)
        face_roi = rgb_frame[y:y + h, x:x + w]

        # Perform emotion analysis on the face ROI
        result = DeepFace.analyze(face_roi, actions=['emotion'], enforce_detection=False)

        # Determine the dominant emotion
        emotion = result[0]['dominant_emotion']

        # Append the result to the list
        results.append({
            "box": {"x": int(x), "y": int(y), "width": int(w), "height": int(h)},
            "emotion": emotion
        })

        # Draw rectangle around face and label with predicted emotion (optional, for visualization)
        cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 0, 255), 2)
        cv2.putText(frame, emotion, (x, y - 10), cv2.FONT_HERSHEY_SIMPLEX, 0.9, (0, 0, 255), 2)

    # Convert frame back to image format to send as a response (optional, for visualization)
    _, img_encoded = cv2.imencode('.jpg', frame)
    img_bytes = img_encoded.tobytes()

    return {
        "faces": results,
        "image": img_bytes.hex()  # Convert bytes to hex string to send as JSON
    }

if __name__ == "__main__":
    import uvicorn
    uvicorn.run(app, host="0.0.0.0", port=8000)

Then you can run it in your terminal with the command below:

python main.py

You should see this kind of output on the server side, with 200 HTTP codes indicating that everything is working:

INFO:     Started server process [413422]
INFO:     Waiting for application startup.
INFO:     Application startup complete.
INFO:     Uvicorn running on http://0.0.0.0:8000 (Press CTRL+C to quit)
INFO:     127.0.0.1:39878 - "GET / HTTP/1.1" 404 Not Found
INFO:     127.0.0.1:39878 - "GET /favicon.ico HTTP/1.1" 404 Not Found
INFO:     127.0.0.1:39878 - "GET / HTTP/1.1" 404 Not Found
INFO:     127.0.0.1:39878 - "GET /docs HTTP/1.1" 200 OK
INFO:     127.0.0.1:39878 - "GET /openapi.json HTTP/1.1" 200 OK
2024-06-02 21:31:59.987528: I tensorflow/compiler/xla/stream_executor/cuda/cuda_gpu_executor.cc:996] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355
2024-06-02 21:31:59.988998: I tensorflow/compiler/xla/stream_executor/cuda/cuda_gpu_executor.cc:996] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355
2024-06-02 21:32:00.265667: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1956] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
Skipping registering GPU devices...
24-06-02 21:32:00 - facial_expression_model_weights.h5 will be downloaded...
Downloading...
From: https://github.com/serengil/deepface_models/releases/download/v1.0/facial_expression_model_weights.h5
To: /home/benjamin/.deepface/weights/facial_expression_model_weights.h5
100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 5.98M/5.98M [00:00<00:00, 110MB/s]
INFO:     127.0.0.1:47074 - "POST /detect-emotions/ HTTP/1.1" 200 OK
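
You can test the endpoint from another terminal with curl (test.jpg below is a placeholder for any local image file):

curl -X POST "http://127.0.0.1:8000/detect-emotions/" -F "file=@test.jpg"

The JSON response contains one entry per detected face with its bounding box and dominant emotion, plus the annotated image encoded as a hex string.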

More deployment with FastAPI

Docker

Now it is time to package our app with Docker so we can deploy it easily on any on-premise or cloud provider server πŸš€

Below is the Dockerfile to containerize the FastAPI application based on the Python 3.11-slim image. This Dockerfile installs the necessary dependencies and sets up the application to run in a Docker container.

# Use the official Python 3.11 slim image
FROM python:3.11-slim

# Set the working directory in the container
WORKDIR /app

# Copy the requirements.txt file
COPY requirements.txt .

# Install the system libraries required by OpenCV, then the Python dependencies
# (libgl1 is needed for cv2 to import on slim images)
RUN apt-get update && apt-get install -y \
    libgl1 \
    libglib2.0-0 \
    libsm6 \
    libxext6 \
    libxrender-dev \
    && pip install --no-cache-dir -r requirements.txt

# Copy the rest of the application code
COPY . .

# Expose the port the app runs on
EXPOSE 8000

# Command to run the application
CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000"]
Then build the Docker image by running the following command in the terminal:

docker build -t fastapi-emotion-detection .
Finally, once the image is built, run the Docker container:
docker run -p 8000:8000 fastapi-emotion-detection
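
Once the container works locally, the same image can be shipped to any server: tag it and push it to a registry (the registry URL below is a placeholder):

docker tag fastapi-emotion-detection registry.example.com/fastapi-emotion-detection:1.0
docker push registry.example.com/fastapi-emotion-detection:1.0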

Scalability

Scalability is about making sure the deployment can handle varying loads.

Note

The scalability skills you need will often depend on the context of your work: deploying at Netflix scale is very different from serving a small company with a few clients πŸ€“

Basically, these are the two things you have to know if you are dealing with a global deployment (a minimal example follows the list):

  • Load Balancing: Distributing incoming requests across multiple instances of the model to handle high traffic.
  • Auto-scaling: Automatically adjusting the number of instances based on demand.
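
A minimal first step, before a full load-balancing setup, is simply to run several Uvicorn worker processes so one machine can serve more concurrent requests (4 workers below is an arbitrary choice, roughly one per CPU core):

uvicorn main:app --host 0.0.0.0 --port 8000 --workers 4

For true horizontal scaling you would run several containers behind a load balancer (e.g. nginx) and let an orchestrator such as Kubernetes adjust the number of replicas based on demand.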

That's why many people use end-to-end platforms for deploying machine learning models (like Roboflow, which we've seen before: if you pay, you can host and run your models on their platform). This is often easier and more efficient, especially for large-scale deployments, because these platforms offer integrated solutions that handle many aspects of the deployment process.