Vehicle Detection and Tracking

The goals / steps of this project are the following:

Perform a Histogram of Oriented Gradients (HOG) feature extraction on a labeled training set of images and train a Linear SVM classifier.
Implement a sliding-window technique and use the trained classifier to search for vehicles in images.
Run the pipeline on a video stream and create a heat map of recurring detections frame by frame to reject outliers and follow detected vehicles.
Estimate a bounding box for each detected vehicle.

Explanation of how I extracted HOG features from the training images.

The training images are in .png format, and matplotlib.image reads PNG files as floating-point arrays already scaled to the range [0, 1], so no additional pixel-range normalization step is needed before training the classifier. After reading in the images, I converted them to the YCrCb color space since it produced the best results. I then extracted the HOG features using the function get_hog_features. Examples of HOG feature extraction from both vehicle and non-vehicle images are produced by display_hog_img in the code further down.
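
The pixel range matters later in the pipeline: frames from a video or a .jpg file arrive as 8-bit integers in [0, 255], while the PNG training images are floats in [0, 1]. As a minimal sketch of keeping the two consistent, the helper to_unit_range below is hypothetical (my code simply divides each frame by 255 in detect_vehicles) and assumes that any floating-point image is already in [0, 1].

```python
import numpy as np

def to_unit_range(image):
    """Return a float32 copy of `image` with pixel values in [0, 1].

    Hypothetical helper for illustration: 8-bit images are divided by 255,
    floating-point images are assumed to be in [0, 1] already.
    """
    if np.issubdtype(image.dtype, np.integer):
        return image.astype(np.float32) / 255.0
    return image.astype(np.float32)

# A fake 8-bit frame, like one read from a video or a .jpg file
frame = np.full((2, 2, 3), 255, dtype=np.uint8)
# A fake training image, like a .png read with matplotlib.image
png_like = np.full((2, 2, 3), 0.5, dtype=np.float32)

print(to_unit_range(frame).max())     # 1.0
print(to_unit_range(png_like).max())  # 0.5
```

If the classifier is trained on one range and fed the other, the HOG features no longer match what it saw during training, so a check like this is cheap insurance.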

Explanation of how I settled on my final choice of HOG parameters.

I used all 3 channels with 18 orientations, 8 pixels per cell and 2 cells per block. This configuration allowed for the best balance between accuracy and speed.
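
To make the cost of these settings concrete, here is a small sketch that computes the length of the resulting HOG feature vector for one 64 x 64 training image. The function name hog_feature_length is made up for this illustration, and it assumes that blocks slide one cell at a time, which is how skimage.feature.hog lays them out.

```python
def hog_feature_length(image_size=64, orient=18, pix_per_cell=8,
                       cell_per_block=2, n_channels=3):
    """Length of the concatenated HOG vector for a square image.

    Assumes blocks advance one cell at a time in both directions.
    """
    cells_per_side = image_size // pix_per_cell
    blocks_per_side = cells_per_side - cell_per_block + 1
    per_channel = blocks_per_side ** 2 * cell_per_block ** 2 * orient
    return per_channel * n_channels

print(hog_feature_length())          # 10584 with my settings
print(hog_feature_length(orient=9))  # 5292 with 9 orientations
```

Doubling the number of orientations doubles the vector length, which is one reason the choice is a trade-off between accuracy and speed.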

Description of how I trained a classifier using my selected HOG features.

First, each image in the training set was converted to a one-dimensional feature vector. The feature vectors were standardized with StandardScaler, and the dataset was split between training and testing data, with 20% of the data being used for the test set. A linear SVM (LinearSVC) was then trained on the training split.
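
The training steps can be sketched on synthetic data as follows. The random feature vectors here are stand-ins and not real HOG features. As a choice made for this sketch, the scaler is fit on the training split only, which keeps test-set statistics out of training.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

rng = np.random.RandomState(0)
# Synthetic stand-ins for feature vectors: 100 "vehicle" and 100 "non-vehicle" rows
vehicle_features = rng.normal(loc=3.0, scale=1.0, size=(100, 20))
non_vehicle_features = rng.normal(loc=0.0, scale=1.0, size=(100, 20))

X = np.vstack((vehicle_features, non_vehicle_features)).astype(np.float64)
y = np.hstack((np.ones(100), np.zeros(100)))

# Hold out 20% of the samples for testing
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

# Fit the scaler on the training split, then apply it to both splits
scaler = StandardScaler().fit(X_train)
clf = LinearSVC()
clf.fit(scaler.transform(X_train), y_train)

# Score on the held-out 20%
accuracy = clf.score(scaler.transform(X_test), y_test)
print("Test accuracy:", accuracy)
```

Scoring on the held-out split is what makes the 20% test set useful: it gives an estimate of how the classifier behaves on images it has not seen.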

Description of how I implemented a sliding window search and how I decided what scales to search and how much to overlap window.

The sliding window technique was used to detect vehicles. First, each image is passed into the function gethotwindows(), which detects vehicles in the image. Since the upper half of each image mostly contains the sky, only the bottom half of the image is searched. Along the x axis, the labels from the previous frame are passed in so that the search range can be narrowed, although the code as listed still searches the full image width. I used 75 x 75 windows in a band just below the horizon and 100 x 100 windows from there down to the bottom of the image, both with an overlap of 80%. Each window is resized to 64 x 64, and its HOG features are extracted and passed to the classifier, which determines whether a vehicle is present or not.
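
To get a feel for how many windows this search produces, the sketch below repeats the window-count arithmetic of slide_window() without building the list. The helper count_windows is hypothetical, and the 1280 x 720 frame size is an assumption, since the frame size is not stated in this write-up.

```python
def count_windows(x_span, y_span, window, overlap):
    """Number of square windows generated over an x_span by y_span region,
    using the same arithmetic as slide_window()."""
    step = int(window * (1 - overlap))  # pixels between window origins
    nx = int(x_span / step) - 1
    ny = int(y_span / step) - 1
    # A negative count means the region is too small for any window
    return max(nx, 0) * max(ny, 0)

# Assumed frame size; replace with the real one
width, height = 1280, 720
half = height // 2

# Same two search bands as gethotwindows()
small = count_windows(width, (half + 100) - (half + 25), 75, 0.8)
large = count_windows(width, height - (half + 75), 100, 0.8)

print("75 x 75 windows:", small)
print("100 x 100 windows:", large)
print("classifier calls per frame:", small + large)
```

Every window needs a resize, a HOG computation on three channels and a classifier call, so the window count is likely the main driver of the per-frame processing time. Because the step is truncated to an integer, the effective overlap can also differ slightly from the nominal 80%.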

Description of how I implemented some kind of filter for false positives and some method for combining overlapping bounding boxes.

First, a heatmap of the detections in each frame is created using the function add_heat(). A rolling list of the last 10 heatmaps is maintained, and the output heatmap is the average of those 10, which helps reduce false positives: a detection that appears in one frame but not in the others gets averaged out. To suppress the remaining false positives, the averaged heatmap is then thresholded so that only pixels with an average heat greater than 2 (more than two overlapping detection windows on average) are kept. Finally, the connected regions that remain are labeled and each one is drawn as a single bounding box, which merges overlapping detections.
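
The averaging and thresholding idea can be sketched in isolation like this. The class HeatmapAverager is a hypothetical simplification: it averages over the frames seen so far, whereas my code pre-fills the list with copies of the first heatmap, and it leaves out the labeling step.

```python
from collections import deque

import numpy as np

class HeatmapAverager:
    """Hypothetical helper: averages recent per-frame heatmaps, then thresholds."""

    def __init__(self, n_frames=10, threshold=2):
        self.history = deque(maxlen=n_frames)  # oldest heatmap drops out automatically
        self.threshold = threshold

    def update(self, shape, boxes):
        # Build this frame's heatmap: +1 for every pixel inside each box
        heat = np.zeros(shape, dtype=np.float64)
        for (x1, y1), (x2, y2) in boxes:
            heat[y1:y2, x1:x2] += 1
        self.history.append(heat)
        # Average over the frames seen so far, then zero out weak pixels
        average = np.mean(list(self.history), axis=0)
        average[average <= self.threshold] = 0
        return average

averager = HeatmapAverager(n_frames=3, threshold=1)
car = [((0, 0), (4, 4))] * 3     # three overlapping hits on a real car
glitch = [((6, 6), (8, 8))] * 3  # three hits that appear in one frame only

first = averager.update((10, 10), car + glitch)
averager.update((10, 10), car)
third = averager.update((10, 10), car)

print(first[7, 7], third[7, 7])  # 3.0 0.0: the glitch is averaged out
print(third[2, 2])               # 3.0: the car stays hot
```

The trade-off is latency: a real car also has to show up in several frames before its average heat clears the threshold.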

Brief discussion on any problems / issues I faced in my implementation of this project. Where will the pipeline likely fail? What could I do to make it more robust?

Although it works well, the vehicle detection pipeline is still not perfect. Sometimes it takes several seconds to detect a car that is present in the images. Other times, it mistakenly merges 2 vehicles into 1 when they are close to each other. The processing time to create the output video is also very long (about 2 hours 40 minutes for the 1261-frame project video, or roughly 8 seconds per frame), which makes it impractical for real-time applications. To make it more robust, a stronger classifier such as a neural network could be used.

Link to my final video output.

Here's the link to my video result

# Load libraries
import glob
import time

import cv2
import matplotlib.image as mpimg
import matplotlib.pyplot as plt
import numpy as np
from skimage.feature import hog
import joblib  # standalone package; sklearn.externals.joblib was removed in scikit-learn 0.23
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC
import pickle
from scipy.ndimage import label  # the scipy.ndimage.measurements namespace is deprecated

%matplotlib inline

# Define hyperparameters
color_space = 'YCrCb'  # Can be RGB, HSV, LUV, HLS, YUV, YCrCb
orient = 18  # HOG orientations
pix_per_cell = 8  # HOG pixels per cell
cell_per_block = 2  # HOG cells per block
hog_channel = "ALL"  # Can be 0, 1, 2, or "ALL"
spatial_size = (64, 64)  # Spatial binning dimensions
hist_bins = 64  # Number of histogram bins
spatial_feat = False  # Spatial features on or off
hist_feat = False  # Histogram features on or off
hog_feat = True  # HOG features on or off

# Define a function to return HOG features and visualization
def get_hog_features(img, orient, pix_per_cell, cell_per_block, 
                        vis=False, feature_vec=True):
    # Note: newer scikit-image versions spell the keyword `visualize`
    # (the old `visualise` spelling was removed)
    # Call with two outputs if vis is True
    if vis:
        features, hog_image = hog(img, orientations=orient, 
                                  pixels_per_cell=(pix_per_cell, pix_per_cell),
                                  cells_per_block=(cell_per_block, cell_per_block), 
                                  transform_sqrt=True, 
                                  visualize=True, feature_vector=feature_vec)
        return features, hog_image
    # Otherwise call with one output
    else:      
        features = hog(img, orientations=orient, 
                       pixels_per_cell=(pix_per_cell, pix_per_cell),
                       cells_per_block=(cell_per_block, cell_per_block), 
                       transform_sqrt=True, 
                       visualize=False, feature_vector=feature_vec)
        return features

# Define a function to compute binned color features  
def bin_spatial(img, size=(32, 32)):
    # Use cv2.resize().ravel() to create the feature vector
    features = cv2.resize(img, size).ravel() 
    # Return the feature vector
    return features

# Define a function to compute color histogram features 
# NEED TO CHANGE bins_range if reading .png files with mpimg!
def color_hist(img, nbins=32, bins_range=(0, 256)):
    # Compute the histogram of the color channels separately
    channel1_hist = np.histogram(img[:,:,0], bins=nbins, range=bins_range)
    channel2_hist = np.histogram(img[:,:,1], bins=nbins, range=bins_range)
    channel3_hist = np.histogram(img[:,:,2], bins=nbins, range=bins_range)
    # Concatenate the histograms into a single feature vector
    hist_features = np.concatenate((channel1_hist[0], channel2_hist[0], channel3_hist[0]))
    # Return the concatenated feature vector
    return hist_features

# Define a function to extract features from a list of images
# Have this function call bin_spatial() and color_hist()
def extract_features(imgs, color_space='RGB', spatial_size=(32, 32),
                        hist_bins=32, orient=9, 
                        pix_per_cell=8, cell_per_block=2, hog_channel=0,
                        spatial_feat=True, hist_feat=True, hog_feat=True):
    # Create a list to append feature vectors to
    features = []
    # Iterate through the list of images
    for file in imgs:
        file_features = []
        # Read in each one by one
        image = mpimg.imread(file)
        # apply color conversion if other than 'RGB'
        if color_space != 'RGB':
            if color_space == 'HSV':
                feature_image = cv2.cvtColor(image, cv2.COLOR_RGB2HSV)
            elif color_space == 'LUV':
                feature_image = cv2.cvtColor(image, cv2.COLOR_RGB2LUV)
            elif color_space == 'HLS':
                feature_image = cv2.cvtColor(image, cv2.COLOR_RGB2HLS)
            elif color_space == 'YUV':
                feature_image = cv2.cvtColor(image, cv2.COLOR_RGB2YUV)
            elif color_space == 'YCrCb':
                feature_image = cv2.cvtColor(image, cv2.COLOR_RGB2YCrCb)
        else: feature_image = np.copy(image)      

        if spatial_feat == True:
            spatial_features = bin_spatial(feature_image, size=spatial_size)
            file_features.append(spatial_features)
        if hist_feat == True:
            # Apply color_hist()
            hist_features = color_hist(feature_image, nbins=hist_bins)
            file_features.append(hist_features)
        if hog_feat == True:
        # Call get_hog_features() with vis=False, feature_vec=True
            if hog_channel == 'ALL':
                hog_features = []
                for channel in range(feature_image.shape[2]):
                    hog_features.append(get_hog_features(feature_image[:,:,channel], 
                                        orient, pix_per_cell, cell_per_block, 
                                        vis=False, feature_vec=True))
                hog_features = np.ravel(hog_features)        
            else:
                hog_features = get_hog_features(feature_image[:,:,hog_channel], orient, 
                            pix_per_cell, cell_per_block, vis=False, feature_vec=True)
            # Append the new feature vector to the features list
            file_features.append(hog_features)
        features.append(np.concatenate(file_features))
    # Return list of feature vectors
    return features

# Define a function that takes an image,
# start and stop positions in both x and y, 
# window size (x and y dimensions),  
# and overlap fraction (for both x and y)
def slide_window(img, x_start_stop=(None, None), y_start_stop=(None, None), 
                    xy_window=(64, 64), xy_overlap=(0.5, 0.5), window_list=None):
    # Work on copies so neither the caller's lists nor the defaults are mutated
    x_start_stop = list(x_start_stop)
    y_start_stop = list(y_start_stop)
    # If x and/or y start/stop positions not defined, set to image size
    if x_start_stop[0] is None:
        x_start_stop[0] = 0
    if x_start_stop[1] is None:
        x_start_stop[1] = img.shape[1]
    if y_start_stop[0] is None:
        y_start_stop[0] = 0
    if y_start_stop[1] is None:
        y_start_stop[1] = img.shape[0]
    # Compute the span of the region to be searched    
    xspan = x_start_stop[1] - x_start_stop[0]
    yspan = y_start_stop[1] - y_start_stop[0]
    # Compute the number of pixels per step in x/y
    # (the np.int alias was removed in NumPy 1.24; the builtin int behaves the same)
    nx_pix_per_step = int(xy_window[0]*(1 - xy_overlap[0]))
    ny_pix_per_step = int(xy_window[1]*(1 - xy_overlap[1]))
    # Compute the number of windows in x/y
    nx_windows = int(xspan/nx_pix_per_step) - 1
    ny_windows = int(yspan/ny_pix_per_step) - 1
    # Initialize a list to append window positions to
    if window_list is None:
        window_list = []
    # Loop through finding x and y window positions
    # Note: you could vectorize this step, but in practice
    # you'll be considering windows one by one with your
    # classifier, so looping makes sense
    for ys in range(ny_windows):
        for xs in range(nx_windows):
            # Calculate window position
            startx = xs*nx_pix_per_step + x_start_stop[0]
            endx = startx + xy_window[0]
            starty = ys*ny_pix_per_step + y_start_stop[0]
            endy = starty + xy_window[1]

            # Append window position to list
            window_list.append(((startx, starty), (endx, endy)))
    # Return the list of windows
    return window_list

# Define a function to draw bounding boxes
def draw_boxes(img, bboxes, color=(0, 0, 255), thick=6):
    # Make a copy of the image
    imcopy = np.copy(img)
    # Iterate through the bounding boxes
    for bbox in bboxes:
        # Draw a rectangle given bbox coordinates
        cv2.rectangle(imcopy, bbox[0], bbox[1], color, thick)
    # Return the image copy with boxes drawn
    return imcopy

# Define a function to extract features from a single image window
# This function is very similar to extract_features()
# just for a single image rather than list of images
def single_img_features(img, color_space='RGB', spatial_size=(32, 32),
                        hist_bins=32, orient=9, 
                        pix_per_cell=8, cell_per_block=2, hog_channel=0,
                        spatial_feat=True, hist_feat=True, hog_feat=True):    
    #1) Define an empty list to receive features
    img_features = []
    #2) Apply color conversion if other than 'RGB'
    if color_space != 'RGB':
        if color_space == 'HSV':
            feature_image = cv2.cvtColor(img, cv2.COLOR_RGB2HSV)
        elif color_space == 'LUV':
            feature_image = cv2.cvtColor(img, cv2.COLOR_RGB2LUV)
        elif color_space == 'HLS':
            feature_image = cv2.cvtColor(img, cv2.COLOR_RGB2HLS)
        elif color_space == 'YUV':
            feature_image = cv2.cvtColor(img, cv2.COLOR_RGB2YUV)
        elif color_space == 'YCrCb':
            feature_image = cv2.cvtColor(img, cv2.COLOR_RGB2YCrCb)
    else: feature_image = np.copy(img)      
    #3) Compute spatial features if flag is set
    if spatial_feat == True:
        spatial_features = bin_spatial(feature_image, size=spatial_size)
        #4) Append features to list
        img_features.append(spatial_features)
    #5) Compute histogram features if flag is set
    if hist_feat == True:
        hist_features = color_hist(feature_image, nbins=hist_bins)
        #6) Append features to list
        img_features.append(hist_features)
    #7) Compute HOG features if flag is set
    if hog_feat == True:
        if hog_channel == 'ALL':
            hog_features = []
            for channel in range(feature_image.shape[2]):
                hog_features.extend(get_hog_features(feature_image[:,:,channel], 
                                    orient, pix_per_cell, cell_per_block, 
                                    vis=False, feature_vec=True))      
        else:
            hog_features = get_hog_features(feature_image[:,:,hog_channel], orient, 
                        pix_per_cell, cell_per_block, vis=False, feature_vec=True)
        #8) Append features to list
        img_features.append(hog_features)

    #9) Return concatenated array of features
    return np.concatenate(img_features)

# Define a function you will pass an image 
# and the list of windows to be searched (output of slide_windows())
def search_windows(img, windows, clf, scaler, color_space='RGB', 
                    spatial_size=(32, 32), hist_bins=32, 
                    hist_range=(0, 256), orient=9, 
                    pix_per_cell=8, cell_per_block=2, 
                    hog_channel=0, spatial_feat=True, 
                    hist_feat=True, hog_feat=True):

    #1) Create an empty list to receive positive detection windows
    on_windows = []
    #2) Iterate over all windows in the list
    for window in windows:
        #3) Extract the test window from original image
        test_img = cv2.resize(img[window[0][1]:window[1][1], window[0][0]:window[1][0]], (64, 64))      
        #4) Extract features for that window using single_img_features()
        features = single_img_features(test_img, color_space=color_space, 
                            spatial_size=spatial_size, hist_bins=hist_bins, 
                            orient=orient, pix_per_cell=pix_per_cell, 
                            cell_per_block=cell_per_block, 
                            hog_channel=hog_channel, spatial_feat=spatial_feat, 
                            hist_feat=hist_feat, hog_feat=hog_feat)
        #5) Scale extracted features to be fed to classifier
        test_features = scaler.transform(np.array(features).reshape(1, -1))
        #6) Predict using your classifier
        prediction = clf.predict(test_features)
        #7) If positive (prediction == 1) then save the window
        if prediction == 1:
            on_windows.append(window)
    #8) Return windows for positive detections
    return on_windows

# Train model
def train():
    # glob already returns lists of file paths; `recursive` only matters for "**" patterns
    Vehicles = glob.glob("vehicles/*/*.png")
    notVehicles = glob.glob("non-vehicles/*/*.png")


    Vehicles_features = extract_features(Vehicles, color_space=color_space,
                                    spatial_size=spatial_size, hist_bins=hist_bins,
                                    orient=orient, pix_per_cell=pix_per_cell,
                                    cell_per_block=cell_per_block,
                                    hog_channel=hog_channel, spatial_feat=spatial_feat,
                                    hist_feat=hist_feat, hog_feat=hog_feat)

    notVehicles_features = extract_features(notVehicles, color_space=color_space,
                                       spatial_size=spatial_size, hist_bins=hist_bins,
                                       orient=orient, pix_per_cell=pix_per_cell,
                                       cell_per_block=cell_per_block,
                                       hog_channel=hog_channel, spatial_feat=spatial_feat,
                                       hist_feat=hist_feat, hog_feat=hog_feat)

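    # Stack both classes into one feature matrix with matching labels (1 = vehicle)
    X = np.vstack((Vehicles_features, notVehicles_features)).astype(np.float64)
    y = np.hstack((np.ones(len(Vehicles_features)), np.zeros(len(notVehicles_features))))

    # Hold out 20% of the samples for testing
    rand_state = np.random.randint(0, 100)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=rand_state)

    # Fit the scaler on the training split only so that no test-set
    # statistics leak into training, then scale both splits
    X_scaler = StandardScaler().fit(X_train)
    X_train = X_scaler.transform(X_train)
    X_test = X_scaler.transform(X_test)

    svm = LinearSVC()
    svm.fit(X_train, y_train)

    return X_scaler, svm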

X_scaler, svm = train()

def add_heat(heatmap, bbox_list):
    # Iterate through list of bboxes
    for box in bbox_list:
        # Add += 1 for all pixels inside each bbox
        # Assuming each "box" takes the form ((x1, y1), (x2, y2))
        heatmap[box[0][1]:box[1][1], box[0][0]:box[1][0]] += 1

    # Return updated heatmap
    return heatmap

def apply_threshold(heatmap, threshold):
    # Zero out pixels below the threshold
    heatmap[heatmap <= threshold] = 0
    # Return thresholded map
    return heatmap

def draw_labeled_bboxes(img, labels):
    # Iterate through all detected cars
    for car_number in range(1, labels[1]+1):
        # Find pixels with each car_number label value
        nonzero = (labels[0] == car_number).nonzero()
        # Identify x and y values of those pixels
        nonzeroy = np.array(nonzero[0])
        nonzerox = np.array(nonzero[1])
        # Define a bounding box based on min/max x and y
        bbox = ((np.min(nonzerox), np.min(nonzeroy)), (np.max(nonzerox), np.max(nonzeroy)))
        # Draw the box on the image
        cv2.rectangle(img, bbox[0], bbox[1], (0,0,255), 6)
    # Return the image
    return img



def gethotwindows(image, previous=None, count=0):
    # Note: `previous` and `count` are accepted but not used by the search below
    half_height = int(image.shape[0]/2)

    # Search the full image width
    x_start_stop = [None, None]

    # 75 x 75 windows in a band just below the horizon (distant cars)
    windows = slide_window(image, x_start_stop=x_start_stop,
                           y_start_stop=[half_height + 25, half_height + 100],
                           xy_window=(75, 75), xy_overlap=(0.8, 0.8), window_list=None)

    # 100 x 100 windows down to the bottom edge (closer cars), appended to the same list
    slide_window(image, x_start_stop=x_start_stop,
                 y_start_stop=[half_height + 75, image.shape[0]],
                 xy_window=(100, 100), xy_overlap=(0.8, 0.8), window_list=windows)

    hot_windows = search_windows(image, windows, svm, X_scaler, color_space=color_space,
                                 spatial_size=spatial_size, hist_bins=hist_bins,
                                 orient=orient, pix_per_cell=pix_per_cell,
                                 cell_per_block=cell_per_block,
                                 hog_channel=hog_channel, spatial_feat=spatial_feat,
                                 hist_feat=hist_feat, hog_feat=hog_feat)
    return hot_windows



def detect_vehicles(image,showheatmap=False,holder=None):

    image_copy = np.copy(image)
    image = image.astype(np.float32) / 255

    count = 0 
    previousLabels = None
    if holder is not None:
        previousLabels = holder.labels
        count = holder.iteration

    hot_windows = gethotwindows(image,previous=previousLabels,count=count)

    heat = np.zeros_like(image[:, :, 0]).astype(np.float64)  # np.float was removed in NumPy 1.24
    heatmap = add_heat(heatmap=heat, bbox_list=hot_windows)

    if holder is None:
        holder = ImageHolder()

    if len(holder.previousHeat)<holder.averageCount:
        for i in range(holder.averageCount):
            holder.previousHeat.append(np.copy(heatmap).astype(np.float64))

    holder.previousHeat[holder.iteration%holder.averageCount] = heatmap
    total = np.zeros(np.array(holder.previousHeat[0]).shape)

    for value in holder.previousHeat:
        total = total + np.array(value)

    averageHeatMap = total/holder.averageCount

    averageHeatMap = apply_threshold(averageHeatMap,2)

    if showheatmap:
        plt.imshow(heatmap)
        plt.show()
    labels = label(averageHeatMap)

    holder.labels = labels
    holder.iteration = holder.iteration + 1

    window_img = draw_labeled_bboxes(image_copy, labels)

    return window_img,averageHeatMap


class ImageHolder:
    def __init__(self):
        self.previousHeat = []
        self.labels = []
        self.iteration = 0 
        self.averageCount = 10

imageHolder = ImageHolder()

def ProcessImage(image):
    # detect_vehicles returns the annotated frame and the averaged heatmap
    img, _ = detect_vehicles(image=image, holder=imageHolder)
    return img

non_vehicle = "non-vehicles/Extras/extra1.png"
vehicle = "vehicles/GTI_Far/image0000.png"
def display_hog_img(file):
    image = mpimg.imread(file)
    feature_image = cv2.cvtColor(image, cv2.COLOR_RGB2YCrCb)
    plt.imshow(feature_image)
    plt.show()
    plt.imshow(feature_image[:,:,0])
    plt.show()
    plt.imshow(feature_image[:,:,1])
    plt.show()
    plt.imshow(feature_image[:,:,2])
    plt.show()

    for channel in range(feature_image.shape[2]):
        features, hog_image = get_hog_features(img=feature_image[:, :, channel],
                                               orient=orient,
                                               cell_per_block=cell_per_block,
                                               pix_per_cell=pix_per_cell,vis=True)
        plt.imshow(hog_image)
        plt.show()


display_hog_img(non_vehicle)
display_hog_img(vehicle)


images = glob.glob("test_images/*")
for file in images:
    image = mpimg.imread(file)
    feature_image = cv2.cvtColor(image, cv2.COLOR_RGB2YCrCb)
    plt.imshow(feature_image)
    plt.show()
    plt.imshow(feature_image[:,:,0])
    plt.show()
    plt.imshow(feature_image[:,:,1])
    plt.show()
    plt.imshow(feature_image[:,:,2])
    plt.show()

    for channel in range(feature_image.shape[2]):
        features, hog_image = get_hog_features(img=feature_image[:, :, channel],
                                               orient=orient,
                                               cell_per_block=cell_per_block,
                                               pix_per_cell=pix_per_cell,vis=True)
        plt.imshow(hog_image)
        plt.show()

    window_img,heatmap = detect_vehicles(image)
    plt.imshow(window_img)
    plt.show()
    plt.imshow(heatmap)    
    plt.show()


import imageio
from imageio.plugins import ffmpeg

from moviepy.editor import VideoFileClip
from IPython.display import HTML

imageHolder = ImageHolder()

white_output = 'output.mp4'
clip1 = VideoFileClip("project_video.mp4")
white_clip = clip1.fl_image(ProcessImage) #NOTE: this function expects color images!!
%time white_clip.write_videofile(white_output, audio=False)

[MoviePy] >>>> Building video output.mp4
[MoviePy] Writing video output.mp4


100%|█████████▉| 1260/1261 [2:40:51<00:07,  7.94s/it]


[MoviePy] Done.
[MoviePy] >>>> Video ready: output.mp4

CPU times: user 2h 39min 57s, sys: 46.3 s, total: 2h 40min 43s
Wall time: 2h 40min 51s