Machine Learning with OpenCV¶
Computer Vision and Machine Learning are closely related fields in the domain of artificial intelligence (AI) that often intersect in various applications, but they also have distinct characteristics and focal points. Here's an overview to help clarify their similarities, key common points, and differences. 😎
Similarities and Key Common Points¶
- Data-Driven Approach: Both fields heavily rely on data to learn patterns or characteristics. Machine learning models improve their performance by being trained on large datasets, and computer vision systems utilize machine learning algorithms to understand and interpret visual data.
- AI Subfields: Both are considered subfields of artificial intelligence. Machine learning provides the foundation for computers to learn from data and make decisions, while computer vision specifically focuses on enabling machines to interpret and understand visual information from the world.
- Feature Extraction and Pattern Recognition: A critical aspect of both fields is the ability to identify patterns. In machine learning, this could be recognizing patterns in customer behavior or financial trends. In computer vision, it might involve identifying shapes, edges, or objects within an image.
- Use of Neural Networks: Deep learning, a subset of machine learning, is particularly prevalent in both areas. Convolutional Neural Networks (CNNs), a type of deep learning model, are extensively used in computer vision for tasks like image classification, object detection, and more.
Differences¶
- Scope and Application: Machine learning is a broader field that encompasses a wide range of data types and learning tasks (e.g., regression, classification, clustering). Computer vision is more specialized, focusing solely on the processing and analysis of visual information.
- Data Type: The primary difference lies in the type of data they deal with. Computer vision specifically works with visual data (images and videos), while machine learning can work with a variety of data types, including numbers, text, images, and more.
- Challenges and Techniques: The challenges faced in computer vision often revolve around interpreting complex visual data under varying conditions (e.g., different lighting, occlusions, perspectives). This requires specific techniques such as image segmentation, object detection, and image generation. Machine learning, on the other hand, deals with a broader set of challenges like overfitting, underfitting, and feature selection, applicable across different types of data.
- Interdisciplinary Nature: While both fields are interdisciplinary, computer vision often intersects more with optics, signal processing, and geometry, due to its focus on visual data. Machine learning intersects with statistics, probability, and computer science, given its broad application across various types of data and tasks.
In essence, while computer vision and machine learning share foundational principles and techniques, especially through the application of deep learning algorithms, they diverge in their focus, challenges, and applications. Computer vision can be viewed as an application of machine learning with a specific emphasis on visual data, embodying unique challenges that require specialized solutions.
import cv2
import matplotlib.pyplot as plt
import numpy as np
def display_image_in_notebook(img):
    # Convert the image from BGR to RGB
    img_rgb = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
    # Display the image using matplotlib
    plt.imshow(img_rgb)
    # uncomment the next line if you want to hide the axis ticks
    #plt.axis('off')
    plt.show()
# Load our test image
img = cv2.imread('./data/frame_1.png', 1)
# Display the image in Jupyter notebook
display_image_in_notebook(img)
Our mission¶
Let's code a script to analyze the vibration of Arabidopsis stems from mp4 videos. The first thing the script needs to do is identify and track the red marker 🟥 on the top of the stem against a black background. This tracking facilitates the generation of a one-dimensional vibration waveform, based on the marker's coordinates.
Subsequently, the damped natural frequency (denoted as $\omega_d$) is calculated using the Fast Fourier Transform algorithm. Our script produces outputs that list the calculated $\omega_d$ values, alongside some graphical representations, including the raw vibration waveform.
This script was developed for my best friend Felix Barbut, PhD in plant biology at the Umeå Plant Science Centre 🎓
All the functions I used for this notebook are available in this gist here
Tracking the red marker 🕵🏽¶
Let's code an OpenCV function whose goal is to locate the centroid of the red marker 🟥. This kind of function is particularly useful in computer vision applications requiring color-based object tracking or detection. Let's dissect it to understand its components and how it accomplishes its goal.
Color Space Conversion¶
Initially, our function converts the color space of the input frame from BGR (Blue, Green, Red) to HSV (Hue, Saturation, Value) using the OpenCV library. HSV is preferred for color-based filtering tasks because it separates color information (hue) from lighting conditions (saturation and value), making it easier to identify colors under varying lighting.
Defining Red Color Range in HSV¶
Due to the circular nature of the hue channel in the HSV color space, red appears at both ends of the spectrum. Therefore, the function defines two ranges for red: one near the beginning (0-10) and one near the end (170-180) of the hue channel. Combining them ensures that reds on both sides of the hue wrap-around are captured.
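Here is a quick sanity check (a minimal sketch, with illustrative pixel values) of where different reds land on OpenCV's 0-179 hue scale:
import cv2
import numpy as np

# Pure red sits at hue 0; a red leaning slightly toward blue wraps around
# to hue ~178, which is why two hue ranges are needed
for name, bgr in [("pure red", (0, 0, 255)), ("bluish red", (20, 0, 255))]:
    pixel = np.uint8([[bgr]])  # a 1x1 BGR "image"
    h, s, v = cv2.cvtColor(pixel, cv2.COLOR_BGR2HSV)[0, 0]
    print(f"{name}: H={h}")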
Creating a Mask for Red Objects¶
Using the defined red color ranges, the function creates two masks that identify the red areas within the frame and then combines these masks using a bitwise OR operation. This results in a single mask that highlights all the red objects in the frame.
We will use the cv2.inRange and cv2.bitwise_or functions for this.
Optional Image Preprocessing¶
If the use_preprocessing flag is set to True, the function applies morphological operations (dilation followed by erosion) to the mask. These operations help in refining the mask by closing small holes within detected objects and separating objects that are close to each other, enhancing the mask's quality for better centroid calculation.
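As a side note, dilation followed by erosion is known as morphological closing, and OpenCV exposes it directly. A minimal equivalent sketch, assuming mask is the binary mask from above:
# Closing = dilation then erosion; one call instead of two
kernel = np.ones((5, 5), np.uint8)
mask_closed = cv2.morphologyEx(mask, cv2.MORPH_CLOSE, kernel)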
Noise Reduction¶
Let's add noise reduction with cv2.medianBlur: when enabled, the function further processes the mask with a median blur, a technique effective at reducing salt-and-pepper noise. This step smooths the edges of the detected red areas, improving the accuracy of the centroid calculation.
Computing the Centroid¶
Our function calculates the moments of the mask, which are statistical measures used to describe the shape of an object. From these moments, it computes the area of the detected red object(s) (the zeroth moment m00) and their centroid. If no red object is detected (i.e., the area is zero), the function returns None, indicating the absence of red in the frame. Otherwise, it calculates and returns the coordinates of the centroid (cx, cy) = (m10/m00, m01/m00), representing the center of the detected red area(s).
def get_red_centroid(frame, use_preprocessing=False, use_noise_reduction=False):
    # Convert the frame to HSV
    hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
    # Define the lower and upper bounds of the "red" color in the HSV color space.
    # Note: the Hue value for red can be near 0 or near 180 in the HSV color space,
    # so we need to define two ranges and combine them
    lower_red_1 = np.array([0, 100, 100])
    upper_red_1 = np.array([10, 255, 255])
    lower_red_2 = np.array([170, 100, 100])
    upper_red_2 = np.array([180, 255, 255])
    # Create a mask for the red color
    mask_1 = cv2.inRange(hsv, lower_red_1, upper_red_1)
    mask_2 = cv2.inRange(hsv, lower_red_2, upper_red_2)
    mask = cv2.bitwise_or(mask_1, mask_2)
    # Apply image preprocessing (dilation then erosion = morphological closing)
    if use_preprocessing:
        kernel = np.ones((5, 5), np.uint8)
        mask = cv2.dilate(mask, kernel, iterations=1)
        mask = cv2.erode(mask, kernel, iterations=1)
    # Apply noise reduction
    if use_noise_reduction:
        mask = cv2.medianBlur(mask, 5)
    # Compute the moments of the mask image
    moments = cv2.moments(mask)
    area = moments['m00']
    # If the area is zero, return None (no red detected)
    if area == 0:
        return None
    # Compute the centroid
    cx = int(moments['m10'] / area)
    cy = int(moments['m01'] / area)
    return (cx, cy)
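Let's try the function on the test frame we loaded earlier (drawing the detection is just for visual confirmation; the circle parameters are arbitrary):
centroid = get_red_centroid(img, use_preprocessing=True, use_noise_reduction=True)
if centroid is not None:
    cx, cy = centroid
    print(f"Red marker centroid: ({cx}, {cy})")
    # Draw a green circle on a copy of the frame so we can eyeball the detection
    display_image_in_notebook(cv2.circle(img.copy(), (cx, cy), 15, (0, 255, 0), 3))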
Plot the displacement data¶
from stemvib.utils import *
frames = extract_frames('./stemvib/Video_5.MP4')
detected_frames = 0
results = []
for frame in frames:
    centroid = get_red_centroid(frame)
    if centroid:
        results.append(centroid)
        detected_frames += 1
#save_frames(marked_frame,'./test_Col0_3a_trim/threshold_aug')
displacement = centroids_to_displacements(results)
plot_displacement(displacement,'Video TEST 5')
Compute the damped natural frequency $\omega_d$¶
You can see the beautiful sinusoid of the stem vibration above; now we can proceed to computing the damped natural frequency (denoted as $\omega_d$) with the Fast Fourier Transform algorithm from scipy.
dif_array1, start_point1 = displacement_to_difference(displacement)
hanning_array1 = transform_hanning(displacement, start_point1)
freqs1, power1, major_freq1 = displacement_to_major_freq(hanning_array1)
major_freq1
array([8.3449235])
For more information about the functions, go to the gist linked at the top of the file. Let's say a word about all these functions above 🤓
- displacement_to_difference: converts the displacement data into differences
- transform_hanning: computes the Hanning window of the signal
- displacement_to_major_freq: computes the frequency in Hz from the smoothed signal (a minimal sketch of this step follows below)
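For intuition, here is a minimal sketch of what a displacement_to_major_freq-style computation can look like with scipy (the real implementation lives in the gist; the fps value is an assumption about the camera's frame rate):
import numpy as np
from scipy.fft import rfft, rfftfreq

def major_freq_sketch(signal, fps=240):  # fps is an assumed frame rate
    power = np.abs(rfft(signal)) ** 2             # power spectrum of the real signal
    freqs = rfftfreq(len(signal), d=1.0 / fps)    # frequency bins in Hz
    major_freq = freqs[1:][np.argmax(power[1:])]  # skip the DC component
    return freqs, power, major_freq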
Wrap it up into a loop¶
In order to process multiple videos in a folder, let's wrap our code in a function called process_folder_videos(folder_path) and dump all the data in a pandas.DataFrame, because as data lovers we all appreciate the flexibility of your good old DataFrame friend.
import os
import pandas as pd

def process_folder_videos(folder_path):
    # Initialize dataframe
    columns = ['title', 'len_frames', 'detected', 'major_freq1', 'freqs1', 'power1']
    df = pd.DataFrame(columns=columns)
    # Check if folder exists
    if not os.path.isdir(folder_path):
        print(f'Folder {folder_path} does not exist')
        return df
    # Iterate through all files in the folder
    for filename in os.listdir(folder_path):
        # Lowercase the name so .MP4 matches too; add or modify the file
        # extensions that you're interested in
        if filename.lower().endswith((".avi", ".mp4")):
            print(f'\n--- Processing Video {filename} ---\n')
            filepath = os.path.join(folder_path, filename)
            # Split the filename and the file extension
            filebase, fileext = os.path.splitext(filename)
            # Create output filename
            output_filename = filebase + "_output" + fileext
            output_filepath = os.path.join(folder_path, output_filename)
            try:
                # Process the video and get results
                major_freq1 = process_track_compute(filepath, output_filepath)
                # Add the results to the dataframe
                # (DataFrame.append was removed in pandas 2.0; use pd.concat)
                row = pd.DataFrame([{'title': filename,
                                     'len_frames': major_freq1[4],
                                     'detected': major_freq1[3],
                                     'major_freq1': major_freq1[0],
                                     'freqs1': major_freq1[1],
                                     'power1': major_freq1[2]}])
                df = pd.concat([df, row], ignore_index=True)
            except Exception as e:
                print(f'Error processing file {filename}: {e}')
    return df
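A hypothetical call on a folder of recordings (the path is an assumption):
df = process_folder_videos('./stemvib/videos')
df.head()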
Add ML tracking¶
In this section we will see how to increase our detected-frames ratio (aka the share of images where the red marker is found) in order to plot the stem displacement as accurately as possible. To do this we will explore only one option, which is the DBSCAN algorithm 😎
If you do not know how the DBSCAN algorithm works, you can check the article about it in the ML section of the course.
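As a quick refresher, here is DBSCAN on a handful of synthetic 2D points: two tight groups form clusters, and the far-away point is labeled as noise (-1):
import numpy as np
from sklearn.cluster import DBSCAN

pts = np.array([[0, 0], [0, 1], [1, 0],   # cluster A
                [10, 10], [10, 11],       # cluster B
                [50, 50]])                # isolated outlier
labels = DBSCAN(eps=2, min_samples=2).fit(pts).labels_
print(labels)  # [0 0 0 1 1 -1] -> the outlier gets the noise label -1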
from sklearn.cluster import DBSCAN

def compute_centroid_with_dbscan(keypoints, prev_centroid=None, max_distance=3, eps=3, min_samples=2):
    # Prepare data
    points = np.array([kp.pt for kp in keypoints])
    # Check if keypoints are available
    if len(points) == 0:
        print("No keypoints detected. Returning None as the centroid.")
        return None
    # If previous centroid is provided and max_distance is specified,
    # only keep points within max_distance of prev_centroid
    if prev_centroid is not None and max_distance is not None:
        distances = np.sqrt((points[:, 0] - prev_centroid[0])**2 + (points[:, 1] - prev_centroid[1])**2)
        points = points[distances <= max_distance]
        # The distance filter may discard every point
        if len(points) == 0:
            print("No keypoints within max_distance. Returning None as the centroid.")
            return None
    # Apply DBSCAN clustering
    clustering = DBSCAN(eps=eps, min_samples=min_samples).fit(points)
    # Find the label of the largest cluster, ignoring the noise label (-1)
    labels, counts = np.unique(clustering.labels_, return_counts=True)
    valid = labels != -1
    if not valid.any():
        # DBSCAN classified every point as noise
        return None
    largest_cluster_label = labels[valid][np.argmax(counts[valid])]
    # Calculate the centroid of the largest cluster
    largest_cluster_points = points[clustering.labels_ == largest_cluster_label]
    centroid = np.mean(largest_cluster_points, axis=0)
    return centroid
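A quick check with hand-made keypoints (cv2.KeyPoint takes x, y and a size; the coordinates are made up): five points hug (100, 100) and one stray sits outside max_distance, so it gets filtered out before clustering:
kps = [cv2.KeyPoint(x, y, 1) for x, y in
       [(100, 100), (101, 100), (100, 101), (99, 100), (100, 99), (140, 40)]]
# The stray point at (140, 40) is ~72 px from (100, 100), beyond max_distance
print(compute_centroid_with_dbscan(kps, prev_centroid=(100, 100), max_distance=50))
# -> [100. 100.]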
Now let's add our compute_centroid_with_dbscan() function into our video processing function. Because we have a lot of imagination as engineers (you know it if you are here lol), let's name it process_video_ml(video_path, output_folder, tolerance=0.5, max_distance=3), like the other video processing functions we have seen 🤓
def process_video_ml(video_path, output_folder, tolerance=0.5, max_distance=3):
    # Open the video
    cap = cv2.VideoCapture(video_path)
    # Check if video opened successfully
    if not cap.isOpened():
        print("Error: Could not open video.")
        return
    # Frame number counter
    frame_num = 0
    # List to store the processed frames
    processed_frames = []
    centroids = []
    # Variables to store the previous frame and centroid
    prev_frame = None
    prev_centroid = None
    # Process the video frame by frame
    while True:
        # Capture frame-by-frame
        ret, frame = cap.read()
        # Break the loop if we reach the end of the video
        if not ret:
            break
        # Apply the get_red_centroid_threshold function to get the centroid
        centroid, marked_frame = get_red_centroid_threshold(frame, tolerance, prev_centroid)
        # Add ML to reduce noise: cluster the keypoints found near the
        # previous centroid (the keypoints must be extracted first,
        # otherwise they would be undefined on this pass)
        if frame_num != 0 and prev_centroid is not None:
            keypoints, descriptors = get_features_near_centroid(prev_frame, prev_centroid, max_distance)
            centroid = compute_centroid_with_dbscan(keypoints, prev_centroid, max_distance)
        # If the centroid was not found, and we have a previous centroid, apply the ORB technique.
        # Compare with None explicitly: the DBSCAN centroid is a NumPy array,
        # and `not centroid` on an array raises an error
        if centroid is None and prev_centroid is not None:
            # Get features near the previous centroid
            keypoints, descriptors = get_features_near_centroid(prev_frame, prev_centroid, max_distance)
            # Compute the centroid from keypoints
            centroid = compute_centroid_of_keypoints(keypoints)
        # Draw a circle at the new centroid if it was found
        if centroid is not None:
            centroids.append(centroid)
            marked_frame = cv2.circle(marked_frame, (int(centroid[0]), int(centroid[1])), 15, (0, 255, 0), 3)
        # Append the processed frame to the list
        processed_frames.append(marked_frame)
        # Update the previous frame and centroid
        prev_frame = frame
        prev_centroid = centroid
        # Increment frame number
        frame_num += 1
    # Release the video capture object
    cap.release()
    # Save the processed frames to the output folder
    save_frames(processed_frames, output_folder)
    return centroids
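A hypothetical end-to-end run of the ML-assisted pipeline (the paths are assumptions, and centroids_to_displacements/plot_displacement come from the gist):
centroids = process_video_ml('./stemvib/Video_5.MP4', './stemvib/output_ml')
displacement = centroids_to_displacements(centroids)
plot_displacement(displacement, 'Video 5 (ML tracking)')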
And that's it! We have coded a script that detects the vibration of a stem thanks to OpenCV and computes the displacement and the frequency. I hope you have learned a thing or two 😎