diff --git a/Report/main.pdf b/Report/main.pdf index 9d27741..b57c9f9 100644 Binary files a/Report/main.pdf and b/Report/main.pdf differ diff --git a/Report/main.tex b/Report/main.tex index 753f2b5..00a8e7c 100644 --- a/Report/main.tex +++ b/Report/main.tex @@ -1,4 +1,3 @@ -% File: 8BallPool video analysis \documentclass[letterpaper,12pt]{article} % DO NOT CHANGE THIS \usepackage{amsmath} @@ -103,7 +102,7 @@ % nouns, adverbs, adjectives should be capitalized, including both words in hyphenated terms, while % articles, conjunctions, and prepositions are lower case unless they % directly follow a colon or long dash -\title{8BallPool video analysis} +\title{8-Ball Pool Video Analysis} \author{ %Authors Michele Sprocatti\textsuperscript{\rm 1}, %\equalcontrib, @@ -123,13 +122,13 @@ \{michele.sprocatti\textsuperscript{\rm 1}, alberto.pasqualetto.2\textsuperscript{\rm 2}, michela.schibuola\textsuperscript{\rm 3}\}@studenti.unipd.it \end{center} -% TODO write about restrictions in the input datasets size/aspect ratio - \input{section/introduction} \input{section/workload} -\input{section/elements} +\input{section/executables} + +\input{section/parts} \input{section/results} diff --git a/Report/section/ballsDetection.tex b/Report/section/ballsDetection.tex index bf331c1..e79742f 100644 --- a/Report/section/ballsDetection.tex +++ b/Report/section/ballsDetection.tex @@ -1,10 +1,10 @@ \subsection{Balls detection} -To detect balls, Michele proposed a multi-step preprocessing approach. Initially, the table region is isolated using an approach similar to the segmentation described before. Then the corners area is removed to prevent Hough Circle transform to find them as false positives. Subsequently k-means clustering was applied to the image with k=5 (the number of balls type plus the playing field). The resulting clusterized \texttt{Mat} is converted to gray-scale to be used as \texttt{HoughCircle} input.
The gray-scale output colors were selected to be as different as possible from each other once the color space is changed. +To detect balls, Michele proposed a multi-step preprocessing approach. Initially, the table region is isolated using an approach similar to the segmentation described before. Then the corner areas are removed to prevent the Hough Circle transform from finding them as false positives. Subsequently, k-means clustering was applied to the image with k=5 (the number of ball types plus the playing field). The resulting clustered \texttt{Mat} is converted to gray-scale to be used as \texttt{HoughCircles} input. The gray-scale output colors were selected to be as different as possible from each other once the color space is changed. -Circle parameters, such as radius and center color, were analyzed to identify potential ball regions. By calculating the mean radius of in-table circles with center not selected by the color mask, a radius range was established. Circles within this radius range were considered for further analysis. +Circle parameters, such as radius and center color, are analyzed to identify potential ball regions. By calculating the mean radius of in-table circles with a center not selected by the color mask, a radius range is established. Circles within this radius range are then considered for further analysis. -Ball classification involved creating a circular mask, computing the gray-scale histogram, and excluding background pixels from the values of the histogram. Peak values in the histogram were used to differentiate between striped and solid balls, while HSV color space analysis is used to distinguish white and black balls. +Ball classification involves creating a circular mask, computing the gray-scale histogram, and excluding background pixels from the histogram.
Peak values in the histogram are used to differentiate between striped and solid balls, while HSV color space analysis is used to distinguish white and black balls. -After finding the balls, the team identified an optimization opportunity. Since there's only one white ball and one black ball, Michele implemented non-maxima suppression for white and black balls independently, in order to improve performance. +After finding the balls, the team identified an optimization opportunity. Since an 8-ball game always contains exactly one white ball and exactly one black ball, Michele implemented non-maxima suppression for white and black balls independently, in order to improve performance. The result of the detection process is then used to segment the balls. diff --git a/Report/section/conclusions.tex b/Report/section/conclusions.tex index 8240b3b..2c14aa3 100644 --- a/Report/section/conclusions.tex +++ b/Report/section/conclusions.tex @@ -1,2 +1,10 @@ \section{Conclusions} -Our approach demonstrates consistent performance across the dataset. Notably, table detection achieves high accuracy. However, ball classification presents some challenges due to their varying sizes and colors that sometimes are similar to the one of the playing field, also solid and striped balls are difficult to distinguish because % TODO riflessi e poca parte bianca visibile +Our approach demonstrates consistent performance across the dataset. Notably, table detection achieves high accuracy.
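The uniqueness-based suppression mentioned in the balls-detection subsection can be sketched as follows. This is an illustrative stdlib-only fragment, not the project's code: the \texttt{Detection} record, its confidence field, and the function name are assumptions made for the sketch.

```cpp
#include <string>
#include <vector>

// Hypothetical detection record; the real project stores circles and colors.
struct Detection {
    std::string label;   // "white", "black", "solid" or "striped"
    double confidence;   // detector score, higher is better
};

// Keep only the single best-scoring "white" and "black" detections, since an
// 8-ball game contains exactly one of each; other labels pass through untouched.
std::vector<Detection> suppressUniqueBalls(const std::vector<Detection>& dets) {
    std::vector<Detection> out;
    for (const char* unique : {"white", "black"}) {
        const Detection* best = nullptr;
        for (const Detection& d : dets)
            if (d.label == unique && (best == nullptr || d.confidence > best->confidence))
                best = &d;
        if (best != nullptr) out.push_back(*best);
    }
    for (const Detection& d : dets)
        if (d.label != "white" && d.label != "black")
            out.push_back(d);
    return out;
}
```

Applying the uniqueness constraint per color, rather than a generic overlap-based suppression, is what lets duplicate white or black candidates be discarded cheaply.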
+However, ball classification presents some challenges: the balls vary in size, and their colors are sometimes similar to that of the playing field. Solid and striped balls are also difficult to distinguish; this happens because \texttt{HoughCircles} finds a circle smaller than the ball diameter, which excludes too many of the ball's white pixels. This problem is hard to solve because footage with perspective requires a radius range to address balls at different distances from the camera. + +In other cases, a striped ball can be detected as white if the \texttt{HoughCircles} output is smaller than the real ball and includes many white pixels and few colored pixels; this problem has been mitigated by non-maximum suppression. + +Solid balls look very similar to the black ball if there is a large shadow inside the circle found by \texttt{HoughCircles}; this problem has also been mitigated by non-maximum suppression. + + +Detection and segmentation parameters are optimized for the provided dataset. The algorithm may require adjustments for different datasets, especially for videos with different aspect ratios or resolutions. diff --git a/Report/section/executables.tex b/Report/section/executables.tex index 9b96482..c4ffbd5 100644 --- a/Report/section/executables.tex +++ b/Report/section/executables.tex @@ -1,10 +1,9 @@ \section{Executables} The program contains 4 different executables: \begin{itemize} - \item \texttt{8BallPool}: the main executable that, given a video file, it processes it and creates the output video with the superimposed minimap. + \item \texttt{8BallPool}: the main executable that, given a video file path as a command-line argument, processes the video and creates the output video with the superimposed minimap. \item \texttt{TestAllClip}: it is the executable used to test the detection and segmentation in the first and last frame of all videos through AP and IoU by comparing them with the ground truth.
- \item \texttt{ShowSegmentationColored}: is an helper which has been used to show the ground truth of the segmentation of a particular frame using human-readable colors and it was also used as a test for the code that computes the metrics because it computes the performance of the ground truth on itself. + \item \texttt{ShowSegmentationColored}: is a helper executable used to show the ground truth segmentation of a particular frame using human-readable colors; it was also used as a test for the code that computes the metrics, since it computes the performance of the ground truth against itself. \item \texttt{ComputePerformance}: is used to compute the performance across the dataset, that is, the mAP and the mIoU. \end{itemize} -% TODO cmd parameters diff --git a/Report/section/metrics.tex b/Report/section/metrics.tex index 34a9590..42a768d 100644 --- a/Report/section/metrics.tex +++ b/Report/section/metrics.tex @@ -1,7 +1,6 @@ \section{Metrics} The \texttt{ComputePerformance} executable handles both mAP and mIoU calculations. -% TODO check if explanations are correct \noindent\textbf{\emph{mAP (mean Average Precision)}}: \begin{enumerate} \item Predictions are performed for all first and last frames in a video. @@ -12,9 +11,9 @@ \section{Metrics} \noindent\textbf{\emph{mIoU (mean Intersection over Union)}}: \begin{enumerate} - \item IoU is calculated for the first and the last frame for each video. - \item The average IoU is then computed for each object class across all 20 images (10 videos each one with 2 frame) in the dataset. + \item IoU is calculated for the first and the last frames of each video. + \item The average IoU is then computed for each object class across all 20 images (10 videos, each with 2 frames) in the dataset.
\end{enumerate} -The 8BallPool executable displays the performance metrics (AP and IoU) achieved by the method for the specific input video. +The \texttt{8BallPool} executable displays the performance metrics (AP and IoU) achieved by the method for the specific input video. diff --git a/Report/section/minimap.tex b/Report/section/minimap.tex index 39f6dfb..f20d3c7 100644 --- a/Report/section/minimap.tex +++ b/Report/section/minimap.tex @@ -7,8 +7,8 @@ \subsection{Minimap creation} \end{itemize} \subsubsection{Empty minimap image} -As a first step, an image of an empty billiard table has been selected, and its corner positions and dimensions have been stored in constant variables by testing different values. In particular Alberto had the idea of converting the image into a byte array and inserting it in a header file through ImageMagick (\url{https://imagemagick.org/}). -This step has been performed with the aim of creating a self-contained executable without the need of the png image dependency. +As a first step, an image of an empty billiard table has been selected, and its corner positions and dimensions have been stored in constant variables by testing different values. In particular, Alberto had the idea of converting the image into a byte array and inserting it in a header file through ImageMagick (\url{https://imagemagick.org/}). +This step has been performed with the aim of creating a self-contained executable without the need for the \texttt{.png} image dependency. The byte array is then used to create a \texttt{Mat} object through the \texttt{imdecode} function. \subsubsection{Computation of the transformation matrix} @@ -20,7 +20,7 @@ \subsubsection{Check if the corners are in the required order} To check this information, the “percentage of table” with respect to the pocket in a rectangle placed in the center of the edge (with dimensions proportional to the real table and pocket dimensions) has been computed for all the edges. 
This computation has been done in the table image previously transformed and cropped to the table dimensions; in this way, the center between two corners corresponds to the real one (otherwise, if the table has some perspective effect, the center between the two corners may not correspond to the real one). Then, the edges have been ordered by using this percentage. To understand how the corners were oriented, three cases have been considered: \begin{itemize} \item If the edges with "more pocket" are opposite edges, then they are the longest edges; This happens, for example, in Figure \ref{fig:game2_clip1_orientation}. - \item If the edge with "more pocket" is opposite to the one with "less pocket", then they are not the longest edges; This happen, for example, in Figure \ref{fig:game3_clip1_orientation} and Figure \ref{fig:game4_clip1_orientation}, when there is an occlusion or much noise in the center of the edge with "more pocket". + \item If the edge with "more pocket" is opposite to the one with "less pocket", then they are not the longest edges; This happens, for example, in Figure \ref{fig:game3_clip1_orientation} and Figure \ref{fig:game4_clip1_orientation}, when there is an occlusion or much noise in the center of the edge with "more pocket". \item Otherwise, there is uncertainty, and then, probably, the one with "more pocket" is the longest edge. \end{itemize} If the table is not horizontal as expected (for example in Figure \ref{fig:game1_clip1_orientation}), then all the edges are rotated and the transformation matrix is re-computed. @@ -39,7 +39,7 @@ \subsubsection{Check if the corners are in the required order} \caption{Transformation of the table to the minimap table size} %\label{fig:game1_clip1_mask} \end{subfigure} - \caption{game1\_clip1 first frame. The table is transformed in a wrong way, because the pockets are located in the shortest edges rather than the longest ones.} + \caption{game1\_clip1 first frame.
The table is transformed in the wrong way because the pockets are located on the shortest edges rather than on the longest ones.} \label{fig:game1_clip1_orientation} \end{figure} @@ -54,7 +54,7 @@ \subsubsection{Check if the corners are in the required order} %\label{fig:game2_clip1_mask} \includegraphics[width=0.48\textwidth]{images/TableOrientation/g2_c1_mask.jpg} } - \caption{game2\_clip1 first frame. The table is correctly transformed. In this case the pockets are lightly visible, but they allow to detect the correct orientation.} + \caption{game2\_clip1 first frame. The table is correctly transformed. In this case, the pockets are barely visible, but they still allow the correct orientation to be detected.} \label{fig:game2_clip1_orientation} \end{figure} @@ -69,7 +69,7 @@ \subsubsection{Check if the corners are in the required order} %\label{fig:game3_clip1_mask} \includegraphics[width=0.48\textwidth]{images/TableOrientation/g3_c1_mask.jpg} } - \caption{game3\_clip1 first frame. The table is correctly transformed. In this case, the center of one of the shortest edges has some noise due to the person playing the game; the result is correct, because in the opposite edge there is no noise.} + \caption{game3\_clip1 first frame. The table is correctly transformed. In this case, the center of one of the shortest edges has some noise due to the person playing the game; the result is correct because the opposite edge contains no noise.} \label{fig:game3_clip1_orientation} \end{figure} @@ -84,13 +84,15 @@ \subsubsection{Check if the corners are in the required order} %\label{fig:game4_clip1_mask} \includegraphics[width=0.48\textwidth]{images/TableOrientation/g4_c1_mask.jpg} } - \caption{game4\_clip1 first frame. The table is correctly transformed. In this case, the center of one of the shortest edges has some noise due to the light of the table; the result is correct, because in the opposite edge there is no noise.} + \caption{game4\_clip1 first frame.
The table is correctly transformed. In this case, the center of one of the shortest edges has some noise due to the light of the table; the result is correct because the opposite edge contains no noise.} \label{fig:game4_clip1_orientation} \end{figure} \subsubsection{Draw the minimap with tracking lines and balls} -Given the transformation matrix and the ball positions in the frame, it is possible to compute the positions of the balls in the minimap. This computation has been done in the \texttt{drawMinimap} method. Every time this method is called, the ball positions and the positions of the balls in the previous frame (if they have been computed by the tracker) are computed by using the \texttt{perspectiveTransform} method. For each ball in the frame, a line between the previous position and the current position is drawn on the minimap image, passed as a parameter by reference such that all the tracking lines are kept in a single image (Figure \ref{fig:game2_clip1_tracking}). Then this image is cloned into a copy, and the current balls are drawn on it. This image is then returned (Figure \ref{fig:game2_clip1_balls}). This implementation idea comes from Alberto. +Given the transformation matrix and the ball positions in the frame, it is possible to compute the positions of the balls in the minimap. This computation has been done in the \texttt{drawMinimap} method. Every time this method is called, the current ball positions and the previous-frame positions (if they have been computed by the tracker) are mapped to minimap coordinates by using the + +\noindent\texttt{perspectiveTransform} method. For each ball in the frame, a line between the previous position and the current position is drawn on the minimap image, passed as a parameter by reference such that all the tracking lines are kept in a single image (Figure \ref{fig:game2_clip1_tracking}). Then this image is cloned, and the current balls are drawn on the copy.
This image is then returned (Figure \ref{fig:game2_clip1_balls}). This implementation idea comes from Alberto. \begin{figure}[H] \centering @@ -110,4 +112,6 @@ \subsubsection{Draw the minimap with tracking lines and balls} \label{fig:game2_clip1_balls_and_tracking} \end{figure} -The ideas of using and the implementation of \texttt{getPerspectiveTransform} and \texttt{perspectiveTransform}, and how to check the orientation of the table were from Michela. +The idea of using \texttt{getPerspectiveTransform} and + +\noindent\texttt{perspectiveTransform}, their implementation, and the check of the table orientation were Michela's contributions. diff --git a/Report/section/elements.tex b/Report/section/parts.tex similarity index 63% rename from Report/section/elements.tex rename to Report/section/parts.tex index 05a53a2..29673f0 100644 --- a/Report/section/elements.tex +++ b/Report/section/parts.tex @@ -1,6 +1,4 @@ -\section{Elements of our project} % TODO change name -% TODO ThE FOLLOWING ARE AGAIN SECTIONS -> FIX -\input{section/executables} +\section{Parts of the project} \input{section/tableDetection} \input{section/tableSegmentation} \input{section/ballsDetection} diff --git a/Report/section/radiusAttempt.tex b/Report/section/radiusAttempt.tex index c41f1c0..4d9b8f9 100644 --- a/Report/section/radiusAttempt.tex +++ b/Report/section/radiusAttempt.tex @@ -18,7 +18,7 @@ \subsubsection{Attempt to find the ball radius relative to the distance and perspective} \item Otherwise it is a value between 0 and 1, which indicates the percentage of slope between the camera and the table; for example, if the value is 0.5, then the camera is about 45° from the table.
\end{itemize} -To compute the final interval, the minimum and maximum values are computed by subtracting and incrementing a value, which increases with the percentage of slope (more the slope, more the variance) by multiplying the percentage of slope with the mean radius previously computed, and a precision value is added due to some other variables in the images. +To compute the final interval, the minimum and maximum values are obtained by decreasing and increasing the mean radius by an amount that grows with the percentage of slope (the greater the slope, the greater the variance), computed as the percentage of slope times the previously computed mean radius, plus a precision margin that accounts for other sources of variability in the images. \begin{equation} min\_radius = mean\_radius - mean\_radius \times percentage\_slope - precision \end{equation} diff --git a/Report/section/results.tex b/Report/section/results.tex index 0ea92c2..74a33b9 100644 --- a/Report/section/results.tex +++ b/Report/section/results.tex @@ -1,10 +1,7 @@ \section{Results} Table detection exhibits very high accuracy across the dataset, in particular for each initial frame of each video 4 corner points are always identified, the assumption that we made is that the camera does not move during a single clip so, once the table is detected in the first frame, we can use that information for all the other frames in the same video. -In contrast, ball detection is influenced by k-means clustering. To achieve consistent and satisfactory results, a fixed random seed is incorporated into the code. %TODO scrivere perchè - -This method results in an average mAP of 0.51 for the dataset. % TODO non molto alto, spiegare perchè -% TODO more quantitative results +In contrast, ball detection is influenced by k-means clustering. To achieve consistent and satisfactory results, a fixed random seed is incorporated into the code.
The seed is fixed so that the k-means++ initialization of the centroids becomes reproducible, making it possible to focus on optimizing the performance of the algorithm. \subsection{Quantitative results} @@ -64,14 +61,11 @@ \subsection{Quantitative results} \label{tab: performance across dataset} \end{table} -% TODO tables per ogni video? - -While the algorithm successfully detects tables, backgrounds, and both white and black balls with high Intersection over Union (IoU), it struggles with solid and striped balls due to inaccurate distinction. % TODO "distinction" è la parola giusta? non credo +While the algorithm successfully detects tables, backgrounds, and both white and black balls with high Intersection over Union (IoU), it struggles with solid and striped balls due to inaccurate distinction between the two. This leads to a lower overall mean Average Precision (mAP), which is computed only on balls, but a good mean Intersection over Union (mIoU), because the background and playing field are well segmented. \subsection{Qualitative results} -Some qualitative results are presented below. - -% TOOD descrivere un minimo qualitativamente le immagini con qualcosa di vero su tutto il dataset (osservazioni sul funzionamento/qualità) +Sometimes the detection finds incorrect circles near hands; we tried to mitigate this problem by tuning the circle radius parameter, performing dilation on the color mask, and removing false positive circles that include pixels with colors very similar to skin tones, but the problem is still present in a few clips. +Some qualitative results are presented below.
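For reference, the per-class IoU behind the quantitative figures above can be sketched as follows. This is a schematic over flattened label masks with an illustrative function name, not the project's \texttt{ComputePerformance} implementation.

```cpp
#include <cstddef>
#include <vector>

// Intersection-over-union of one class between a predicted and a ground-truth
// segmentation mask, both flattened to vectors of per-pixel class ids.
double classIoU(const std::vector<int>& pred,
                const std::vector<int>& truth,
                int classId) {
    std::size_t inter = 0, uni = 0;
    for (std::size_t i = 0; i < pred.size(); ++i) {
        const bool p = pred[i] == classId;
        const bool t = truth[i] == classId;
        inter += (p && t);   // pixel labeled classId in both masks
        uni += (p || t);     // pixel labeled classId in at least one mask
    }
    return uni == 0 ? 0.0 : static_cast<double>(inter) / static_cast<double>(uni);
}
```

Averaging this value per class over the 20 evaluation frames, and then over the classes, yields the mIoU described in the metrics section.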
\input{section/outputImages.tex} diff --git a/Report/section/tableDetection.tex b/Report/section/tableDetection.tex index c5920e5..06df022 100644 --- a/Report/section/tableDetection.tex +++ b/Report/section/tableDetection.tex @@ -1,5 +1,5 @@ \subsection{Table detection} -A mask-based approach was implemented for table detection, exploiting the fact that in billiard games' footage, the table should be always located in the middle of the frame; this assumption is confirmed by all the videos in the dataset. +A mask-based approach was implemented for table detection, exploiting the fact that in billiard game footage, the table should always be located in the middle of the frame; this assumption is confirmed by all the videos in the dataset. The mask was generated by identifying the most common color in the image's central columns; the color is represented by a range in the Hue component (with respect to HSV color space). Building upon this initial step, Michele exploits the Canny edge detector and OpenCV's HoughLinesP function. Then the function analyzes the found intersections and merges nearby points, ensuring the consistent identification of 4 corner points corresponding to the table's corners in the processed dataset. diff --git a/Report/section/tableSegmentation.tex b/Report/section/tableSegmentation.tex index d8e8c92..f4ea35a 100644 --- a/Report/section/tableSegmentation.tex +++ b/Report/section/tableSegmentation.tex @@ -1,9 +1,9 @@ \subsection{Table segmentation} -In order to isolate the table with high precision, Michele employed a combination on two different binary masks through a voting system. +In order to isolate the table with high precision, Michele employed a combination of two different binary masks through a voting system. The involved masks are: \begin{description} - \item[Color-based mask] Created by identifying pixels corresponding to the table's color range.
This mask is very robust on shadows and reflections since it defined through the Hue component of the HSV color space, which is invariant to brightness. + \item[Color-based mask] Created by identifying pixels corresponding to the table's color range. This mask is very robust to shadows and reflections since it is defined through the Hue component of the HSV color space, which is invariant to brightness. \item[k-means clustering mask] Generated by applying the k-means algorithm on the image to separate the table from the background. Useful when the table color range is too broad to be captured by the color-based mask. \end{description} These two masks are combined through a voting system, which classifies as table the pixels marked as such by at least one of the two masks, gaining the best of both approaches. -Finally the isolated area is limited to the pixels inside the polygon defined by the previously detected corners. +Finally, the isolated area is limited to the pixels inside the polygon defined by the previously detected corners. diff --git a/Report/section/tracking.tex b/Report/section/tracking.tex index c3e0bec..a7eac79 100644 --- a/Report/section/tracking.tex +++ b/Report/section/tracking.tex @@ -1,14 +1,14 @@ \subsection{Tracking} The tracking has been performed exploiting the \texttt{TrackerCSRT} class of the OpenCV library. -In our application a new class \texttt{BilliardTracker} has been implemented, which is responsible for the tracking of all the balls on the billiard table while adding the lower level implementation of OpenCV \texttt{TrackerCSRT}. The class works by creating a new \texttt{TrackerCSRT} object for each ball to be tracked. +In our application, a new class \texttt{BilliardTracker} has been implemented, which is responsible for the tracking of all the balls on the billiard table while wrapping the lower-level OpenCV \texttt{TrackerCSRT} implementation.
The class works by creating a new \texttt{TrackerCSRT} object for each ball to be tracked. During the tracking process, at every frame, the tracker updates the global \texttt{Ball} vector with the new position of the ball through a pointer which will be used to draw the ball and its trace on the minimap. -The position of the ball is only updated if the updated bounding box is moved by at least a certain threshold, which is set to the 70\% of the IoU between the previous and the new bounding box. This is done to avoid some false positives that may occur during the tracking which lead to small wiggles in the ball position, even if they are steady. -The IoU threshold value is a trade off between the wiggle reduction and the time frequency sampling of the ball position, used to draw the trace of the ball on the minimap. +The position of the ball is updated only if the new bounding box has moved enough from the previous one, that is, if the IoU between the previous and the new bounding box falls below 70\%. This is done to avoid false positives that may occur during the tracking, which lead to small wiggles in the position of balls that are actually steady. +The IoU threshold value is a trade-off between wiggle reduction and the temporal sampling frequency of the ball position, used to draw the trace of the ball on the minimap. -If the ball is no more visible since it has been scored, then the relative tracking is stopped. +If the ball is no longer visible because it has been scored, then its tracking is stopped. -Based on our experiments the tracking is performed on a 10 pixels padded version of the ball's bounding box since the tracker gains much more performances in its ability to track the ball, even without occlusions. +Based on our experiments, the tracking is performed on a 10-pixel padded version of the ball's bounding box, since this markedly improves the tracker's ability to follow the ball, even without occlusions.
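The IoU-based update rule can be sketched as follows. This is an illustrative fragment under the reading that a stored position is refreshed once the previous and new boxes overlap by less than 70\%; the \texttt{Box} struct and function names are assumptions, not the project's API.

```cpp
#include <algorithm>

// Axis-aligned bounding box; illustrative, not the project's types.
struct Box { double x, y, w, h; };

double boxIoU(const Box& a, const Box& b) {
    const double x1 = std::max(a.x, b.x);
    const double y1 = std::max(a.y, b.y);
    const double x2 = std::min(a.x + a.w, b.x + b.w);
    const double y2 = std::min(a.y + a.h, b.y + b.h);
    const double inter = std::max(0.0, x2 - x1) * std::max(0.0, y2 - y1);
    const double uni = a.w * a.h + b.w * b.h - inter;
    return uni <= 0.0 ? 0.0 : inter / uni;
}

// Refresh the stored position only when the new box has drifted enough from
// the previous one, i.e. their overlap has dropped below the threshold.
bool shouldUpdatePosition(const Box& prev, const Box& next,
                          double iouThreshold = 0.7) {
    return boxIoU(prev, next) < iouThreshold;
}
```

A steady ball produces nearly identical boxes (IoU close to 1), so small tracker wiggles are filtered out, while a genuinely moving ball quickly drops below the threshold and triggers an update.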
-This tracking implementation reveals to be very robust and accurate on all the frames of all videos of the dataset, but it is also very slow; this is the bottleneck of the application, since the tracking is performed on every frame of the video, and the tracking of each ball is performed independently from the others. +This tracking implementation proves to be very robust and accurate on all the frames of all videos of the dataset, but it is also very slow; this is the bottleneck of the application, since the tracking is performed on every frame of the video, and each ball is tracked independently from the others. diff --git a/Report/section/video.tex b/Report/section/video.tex index 304cc82..1698639 100644 --- a/Report/section/video.tex +++ b/Report/section/video.tex @@ -2,7 +2,7 @@ \subsection{Video creation} To create the output video Michele used the \texttt{VideoWriter} class of OpenCV. For each frame of the input video tracking is performed and the minimap is created, then the minimap is superimposed on the frame and the frame is saved in the output video. -To avoid creating the output video file and then run in some exception resulting in a non-readable file, Alberto thought about using a temporary file and then renaming it to the output file only at the end if no exception has been thrown and the program is sure that the file is complete and readable. +To avoid the case in which an exception thrown during writing leaves a non-readable output file, Alberto used a temporary file that is renamed to the output file only at the end, when no exception has been thrown and the program is sure that the file is complete and readable. % To create the output video we take the minimap and the current frame and Michele decide to do as follows: % At the beginning the function computes two values: the scaling factor for the minimap and the offset (the first row where the minimap is placed).
For the scaling factor Michele thought that the minimap should have 0.3 of the total columns of the frame, so the scaling factor is computed as follows: diff --git a/Report/section/workload.tex b/Report/section/workload.tex index 5776781..c59cdc2 100644 --- a/Report/section/workload.tex +++ b/Report/section/workload.tex @@ -16,4 +16,3 @@ \subsection{Working hour per member} \item[Michela:] 50 hours \item[Alberto:] 50 hours \end{description} -% TODO hours