Machine Learning with OpenCV¶
Computer Vision and Machine Learning are closely related fields in the domain of artificial intelligence (AI) that often intersect in various applications, but they also have distinct characteristics and focal points. Here's an overview to help clarify their similarities, key common points, and differences. 😎
Similarities and Key Common Points¶
- Data-Driven Approach: Both fields heavily rely on data to learn patterns or characteristics. Machine learning models improve their performance by being trained on large datasets, and computer vision systems utilize machine learning algorithms to understand and interpret visual data.
- AI Subfields: Both are considered subfields of artificial intelligence. Machine learning provides the foundation for computers to learn from data and make decisions, while computer vision specifically focuses on enabling machines to interpret and understand visual information from the world.
- Feature Extraction and Pattern Recognition: A critical aspect of both fields is the ability to identify patterns. In machine learning, this could be recognizing patterns in customer behavior or financial trends. In computer vision, it might involve identifying shapes, edges, or objects within an image.
- Use of Neural Networks: Deep learning, a subset of machine learning, is particularly prevalent in both areas. Convolutional Neural Networks (CNNs), a type of deep learning model, are extensively used in computer vision for tasks like image classification, object detection, and more.
Differences¶
- Scope and Application: Machine learning is a broader field that encompasses a wide range of data types and learning tasks (e.g., regression, classification, clustering). Computer vision is more specialized, focusing solely on the processing and analysis of visual information.
- Data Type: The primary difference lies in the type of data they deal with. Computer vision specifically works with visual data (images and videos), while machine learning can work with a variety of data types, including numbers, text, images, and more.
- Challenges and Techniques: The challenges faced in computer vision often revolve around interpreting complex visual data under varying conditions (e.g., different lighting, occlusions, perspectives). This requires specific techniques such as image segmentation, object detection, and image generation. Machine learning, on the other hand, deals with a broader set of challenges like overfitting, underfitting, and feature selection, applicable across different types of data.
- Interdisciplinary Nature: While both fields are interdisciplinary, computer vision often intersects more with optics, signal processing, and geometry, due to its focus on visual data. Machine learning intersects with statistics, probability, and computer science, given its broad application across various types of data and tasks.
In essence, while computer vision and machine learning share foundational principles and techniques, especially through the application of deep learning algorithms, they diverge in their focus, challenges, and applications. Computer vision can be viewed as an application of machine learning with a specific emphasis on visual data, embodying unique challenges that require specialized solutions.
import cv2
import matplotlib.pyplot as plt
import numpy as np
def display_image_in_notebook(img):
    # Convert the image from BGR to RGB
    img_rgb = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
    # Display the image using matplotlib
    plt.imshow(img_rgb)
    # uncomment the next line if you want to hide the axis ticks
    #plt.axis('off')
    plt.show()
# Load our test image
img = cv2.imread('./data/frame_1.png', 1)
# Display the image in Jupyter notebook
display_image_in_notebook(img)
Our mission¶
Let's code a script to analyze the vibration of Arabidopsis stems from mp4 videos. The first thing the script needs to do is identify and track the red marker 🟥 on the top of the stem against a black background. This tracking facilitates the generation of a one-dimensional vibration waveform, based on the marker's coordinates.
Subsequently, the damped natural frequency (denoted as $\omega_d$) is calculated using the Fast Fourier Transform algorithm. Our script produces outputs that list the calculated $\omega_d$ values, alongside some graphical representations, including the raw vibration waveform.
This script was developed for my best friend Felix Barbut, PhD in plant biology at the Umeå Plant Science Centre 🎓
All the functions I used for this notebook are available in this gist here
Tracking the red marker 🕵🏽¶
Let's code an OpenCV function whose goal is to locate the centroid of the red marker 🟥. This kind of function is particularly useful in computer vision applications requiring color-based object tracking or detection. Let's dissect it to understand its components and how it accomplishes its goal.
Color Space Conversion¶
Initially, our function converts the color space of the input frame from BGR (Blue, Green, Red) to HSV (Hue, Saturation, Value) using the OpenCV library. HSV is preferred for color-based filtering tasks because it separates color information (hue) from lighting conditions (saturation and value), making it easier to identify colors under varying lighting.
Defining Red Color Range in HSV¶
Due to the circular nature of the hue channel in the HSV color space, red appears at both ends of the spectrum. Therefore, the function defines two ranges for red: one near the beginning (0-10) and one near the end (170-180) of the hue channel. Combining them ensures that reds on both sides of the hue wrap-around are captured.
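Here is a quick sanity check (a minimal sketch, with illustrative pixel values) of where different reds land on OpenCV's 0-179 hue scale:
import cv2
import numpy as np

# Pure red sits at hue 0; a red leaning slightly toward blue wraps around
# to hue ~178, which is why two hue ranges are needed
for name, bgr in [("pure red", (0, 0, 255)), ("bluish red", (20, 0, 255))]:
    pixel = np.uint8([[bgr]])  # a 1x1 BGR "image"
    h, s, v = cv2.cvtColor(pixel, cv2.COLOR_BGR2HSV)[0, 0]
    print(f"{name}: H={h}")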
Creating a Mask for Red Objects¶
Using the defined red color ranges, the function creates two masks that identify the red areas within the frame and then combines these masks using a bitwise OR operation. This results in a single mask that highlights all the red objects in the frame.
We will use the cv2.inRange and cv2.bitwise_or functions for this.
Optional Image Preprocessing¶
If the use_preprocessing flag is set to True, the function applies morphological operations (dilation followed by erosion) to the mask. These operations help in refining the mask by closing small holes within detected objects and separating objects that are close to each other, enhancing the mask's quality for better centroid calculation.
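As a side note, dilation followed by erosion is known as morphological closing, and OpenCV exposes it directly. A minimal equivalent sketch, assuming mask is the binary mask from above:
# Closing = dilation then erosion; one call instead of two
kernel = np.ones((5, 5), np.uint8)
mask_closed = cv2.morphologyEx(mask, cv2.MORPH_CLOSE, kernel)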
Noise Reduction¶
Let's add noise reduction with cv2.medianBlur: when enabled, the function further processes the mask with a median blur, a technique effective at reducing salt-and-pepper noise. This step smooths the edges of the detected red areas, improving the accuracy of the centroid calculation.
Computing the Centroid¶
Our function calculates the moments of the mask, which are statistical measures used to describe the shape of an object. From these moments, it computes the area of the detected red object(s) (the zeroth moment m00) and their centroid. If no red object is detected (i.e., the area is zero), the function returns None, indicating the absence of red in the frame. Otherwise, it calculates and returns the coordinates of the centroid (cx, cy) = (m10/m00, m01/m00), representing the center of the detected red area(s).
def get_red_centroid(frame, use_preprocessing=False, use_noise_reduction=False):
    # Convert the frame to HSV
    hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
    # Define the lower and upper bounds of the "red" color in the HSV color space.
    # Note: the Hue value for red can be near 0 or near 180 in the HSV color space,
    # so we need to define two ranges and combine them
    lower_red_1 = np.array([0, 100, 100])
    upper_red_1 = np.array([10, 255, 255])
    lower_red_2 = np.array([170, 100, 100])
    upper_red_2 = np.array([180, 255, 255])
    # Create a mask for the red color
    mask_1 = cv2.inRange(hsv, lower_red_1, upper_red_1)
    mask_2 = cv2.inRange(hsv, lower_red_2, upper_red_2)
    mask = cv2.bitwise_or(mask_1, mask_2)
    # Apply image preprocessing (dilation then erosion = morphological closing)
    if use_preprocessing:
        kernel = np.ones((5, 5), np.uint8)
        mask = cv2.dilate(mask, kernel, iterations=1)
        mask = cv2.erode(mask, kernel, iterations=1)
    # Apply noise reduction
    if use_noise_reduction:
        mask = cv2.medianBlur(mask, 5)
    # Compute the moments of the mask image
    moments = cv2.moments(mask)
    area = moments['m00']
    # If the area is zero, return None (no red detected)
    if area == 0:
        return None
    # Compute the centroid
    cx = int(moments['m10'] / area)
    cy = int(moments['m01'] / area)
    return (cx, cy)
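Let's try the function on the test frame we loaded earlier (drawing the detection is just for visual confirmation; the circle parameters are arbitrary):
centroid = get_red_centroid(img, use_preprocessing=True, use_noise_reduction=True)
if centroid is not None:
    cx, cy = centroid
    print(f"Red marker centroid: ({cx}, {cy})")
    # Draw a green circle on a copy of the frame so we can eyeball the detection
    display_image_in_notebook(cv2.circle(img.copy(), (cx, cy), 15, (0, 255, 0), 3))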
Plot the displacement data¶
from stemvib.utils import *
frames = extract_frames('./stemvib/Video_5.MP4')
detected_frames = 0
results = []
for frame in frames:
    centroid = get_red_centroid(frame)
    if centroid:
        results.append(centroid)
        detected_frames += 1
#save_frames(marked_frame,'./test_Col0_3a_trim/threshold_aug')
displacement = centroids_to_displacements(results)
plot_displacement(displacement,'Video TEST 5')
Compute the damped natural frequency $\omega_d$¶
You can see the beautiful sinusoid of the stem vibration above; now we can proceed to computing the damped natural frequency (denoted as $\omega_d$) with the Fast Fourier Transform algorithm from scipy.
dif_array1, start_point1 = displacement_to_difference(displacement)
hanning_array1 = transform_hanning(displacement, start_point1)
freqs1, power1, major_freq1 = displacement_to_major_freq(hanning_array1)
major_freq1
array([8.3449235])
For more information about the functions, go to the gist linked at the top of the file. Let's say a word about all these functions above 🤓
- displacement_to_difference: converts the displacement data into differences
- transform_hanning: computes the Hanning window of the signal
- displacement_to_major_freq: computes the frequency in Hz from the smoothed signal (a minimal sketch of this step follows below)
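For intuition, here is a minimal sketch of what a displacement_to_major_freq-style computation can look like with scipy (the real implementation lives in the gist; the fps value is an assumption about the camera's frame rate):
import numpy as np
from scipy.fft import rfft, rfftfreq

def major_freq_sketch(signal, fps=240):  # fps is an assumed frame rate
    power = np.abs(rfft(signal)) ** 2             # power spectrum of the real signal
    freqs = rfftfreq(len(signal), d=1.0 / fps)    # frequency bins in Hz
    major_freq = freqs[1:][np.argmax(power[1:])]  # skip the DC component
    return freqs, power, major_freq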
Wrap it up into a loop¶
In order to process multiple videos in a folder, let's wrap our code in a function called process_folder_videos(folder_path) and dump all the data in a pandas.DataFrame, because as data lovers we all appreciate the flexibility of your good old DataFrame friend.
import os
import pandas as pd

def process_folder_videos(folder_path):
    # Initialize dataframe
    columns = ['title', 'len_frames', 'detected', 'major_freq1', 'freqs1', 'power1']
    df = pd.DataFrame(columns=columns)
    # Check if folder exists
    if not os.path.isdir(folder_path):
        print(f'Folder {folder_path} does not exist')
        return df
    # Iterate through all files in the folder
    for filename in os.listdir(folder_path):
        # Lowercase the name so .MP4 matches too; add or modify the file
        # extensions that you're interested in
        if filename.lower().endswith((".avi", ".mp4")):
            print(f'\n--- Processing Video {filename} ---\n')
            filepath = os.path.join(folder_path, filename)
            # Split the filename and the file extension
            filebase, fileext = os.path.splitext(filename)
            # Create output filename
            output_filename = filebase + "_output" + fileext
            output_filepath = os.path.join(folder_path, output_filename)
            try:
                # Process the video and get results
                major_freq1 = process_track_compute(filepath, output_filepath)
                # Add the results to the dataframe
                # (DataFrame.append was removed in pandas 2.0; use pd.concat)
                row = pd.DataFrame([{'title': filename,
                                     'len_frames': major_freq1[4],
                                     'detected': major_freq1[3],
                                     'major_freq1': major_freq1[0],
                                     'freqs1': major_freq1[1],
                                     'power1': major_freq1[2]}])
                df = pd.concat([df, row], ignore_index=True)
            except Exception as e:
                print(f'Error processing file {filename}: {e}')
    return df
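A hypothetical call on a folder of recordings (the path is an assumption):
df = process_folder_videos('./stemvib/videos')
df.head()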
Add ML tracking¶
In this section we will see how to increase our detected-frames ratio (aka the share of images where the red marker is found) in order to plot the stem displacement as accurately as possible. To do this we will explore only one option, which is the DBSCAN algorithm 😎
If you do not know how the DBSCAN algorithm works, you can check the article about it in the ML section of the course.
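As a quick refresher, here is DBSCAN on a handful of synthetic 2D points: two tight groups form clusters, and the far-away point is labeled as noise (-1):
import numpy as np
from sklearn.cluster import DBSCAN

pts = np.array([[0, 0], [0, 1], [1, 0],   # cluster A
                [10, 10], [10, 11],       # cluster B
                [50, 50]])                # isolated outlier
labels = DBSCAN(eps=2, min_samples=2).fit(pts).labels_
print(labels)  # [0 0 0 1 1 -1] -> the outlier gets the noise label -1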
from sklearn.cluster import DBSCAN

def compute_centroid_with_dbscan(keypoints, prev_centroid=None, max_distance=3, eps=3, min_samples=2):
    # Prepare data
    points = np.array([kp.pt for kp in keypoints])
    # Check if keypoints are available
    if len(points) == 0:
        print("No keypoints detected. Returning None as the centroid.")
        return None
    # If previous centroid is provided and max_distance is specified,
    # only keep points within max_distance of prev_centroid
    if prev_centroid is not None and max_distance is not None:
        distances = np.sqrt((points[:, 0] - prev_centroid[0])**2 + (points[:, 1] - prev_centroid[1])**2)
        points = points[distances <= max_distance]
        # The distance filter may discard every point
        if len(points) == 0:
            print("No keypoints within max_distance. Returning None as the centroid.")
            return None
    # Apply DBSCAN clustering
    clustering = DBSCAN(eps=eps, min_samples=min_samples).fit(points)
    # Find the label of the largest cluster, ignoring the noise label (-1)
    labels, counts = np.unique(clustering.labels_, return_counts=True)
    valid = labels != -1
    if not valid.any():
        # DBSCAN classified every point as noise
        return None
    largest_cluster_label = labels[valid][np.argmax(counts[valid])]
    # Calculate the centroid of the largest cluster
    largest_cluster_points = points[clustering.labels_ == largest_cluster_label]
    centroid = np.mean(largest_cluster_points, axis=0)
    return centroid
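A quick check with hand-made keypoints (cv2.KeyPoint takes x, y and a size; the coordinates are made up): five points hug (100, 100) and one stray sits outside max_distance, so it gets filtered out before clustering:
kps = [cv2.KeyPoint(x, y, 1) for x, y in
       [(100, 100), (101, 100), (100, 101), (99, 100), (100, 99), (140, 40)]]
# The stray point at (140, 40) is ~72 px from (100, 100), beyond max_distance
print(compute_centroid_with_dbscan(kps, prev_centroid=(100, 100), max_distance=50))
# -> [100. 100.]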
Now let's add our compute_centroid_with_dbscan() function into our video processing function. Because we have a lot of imagination as engineers (you know it if you are here lol), let's name it process_video_ml(video_path, output_folder, tolerance=0.5, max_distance=3), like the other video processing functions we have seen 🤓
def process_video_ml(video_path, output_folder, tolerance=0.5, max_distance=3):
    # Open the video
    cap = cv2.VideoCapture(video_path)
    # Check if video opened successfully
    if not cap.isOpened():
        print("Error: Could not open video.")
        return
    # Frame number counter
    frame_num = 0
    # List to store the processed frames
    processed_frames = []
    centroids = []
    # Variables to store the previous frame and centroid
    prev_frame = None
    prev_centroid = None
    # Process the video frame by frame
    while True:
        # Capture frame-by-frame
        ret, frame = cap.read()
        # Break the loop if we reach the end of the video
        if not ret:
            break
        # Apply the get_red_centroid_threshold function to get the centroid
        centroid, marked_frame = get_red_centroid_threshold(frame, tolerance, prev_centroid)
        # Add ML to reduce noise: cluster the keypoints found near the
        # previous centroid (the keypoints must be extracted first,
        # otherwise they would be undefined on this pass)
        if frame_num != 0 and prev_centroid is not None:
            keypoints, descriptors = get_features_near_centroid(prev_frame, prev_centroid, max_distance)
            centroid = compute_centroid_with_dbscan(keypoints, prev_centroid, max_distance)
        # If the centroid was not found, and we have a previous centroid, apply the ORB technique.
        # Compare with None explicitly: the DBSCAN centroid is a NumPy array,
        # and `not centroid` on an array raises an error
        if centroid is None and prev_centroid is not None:
            # Get features near the previous centroid
            keypoints, descriptors = get_features_near_centroid(prev_frame, prev_centroid, max_distance)
            # Compute the centroid from keypoints
            centroid = compute_centroid_of_keypoints(keypoints)
        # Draw a circle at the new centroid if it was found
        if centroid is not None:
            centroids.append(centroid)
            marked_frame = cv2.circle(marked_frame, (int(centroid[0]), int(centroid[1])), 15, (0, 255, 0), 3)
        # Append the processed frame to the list
        processed_frames.append(marked_frame)
        # Update the previous frame and centroid
        prev_frame = frame
        prev_centroid = centroid
        # Increment frame number
        frame_num += 1
    # Release the video capture object
    cap.release()
    # Save the processed frames to the output folder
    save_frames(processed_frames, output_folder)
    return centroids
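A hypothetical end-to-end run of the ML-assisted pipeline (the paths are assumptions, and centroids_to_displacements/plot_displacement come from the gist):
centroids = process_video_ml('./stemvib/Video_5.MP4', './stemvib/output_ml')
displacement = centroids_to_displacements(centroids)
plot_displacement(displacement, 'Video 5 (ML tracking)')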
And that's it! We have coded a script that detects the vibration of a stem thanks to OpenCV and computes the displacement and the frequency. I hope you have learned a thing or two 😎