Deep learning has been applied almost everywhere. From ImageNet classification [1] to disease identification [2] to large-scale video classification [3] to text classification [4], there are few areas left where deep learning has not been tried. Interestingly, however, there has been very little work applying data science and deep learning to the game of cricket. This post is a detailed overview of my final year project at FAST National University. We have developed a deep learning based system that automates many tasks in cricket. Some of these tasks are:
- Scene Segmentation from a Cricket Match Video
- Scene Classification
- Automatic Commentary Generation
- Targeted Highlights Generation
- Player Identification
- Player Stats Extraction
These tasks are usually done through manual effort or not done at all. We believe that automating them is extremely advantageous. For example, in many countries like Pakistan, there is no detailed ball-by-ball commentary for county cricket matches. Moreover, since each player only has aggregated stats, it is not possible to examine a player's strengths and weaknesses in detail. For example, one may want to know the area where a player gets out most often, or the area where the player plays most of his/her shots. Our system is able to do most of these tasks with fairly decent accuracy. Let us define our problem statement first.
Given a video \(V\), we want to extract scenes \(S_1, S_2, \ldots, S_n\) and classify them. For each scene that contains a shot, we want to generate a description of the shot based on a set of labels. For a specific type of scene, we want to generate highlights, e.g. highlights of all shots in the match. Finally, given a shot, we want to identify both the bowler and the batsman in the shot. Apart from scene segmentation, all of these are video classification problems.
One of the most novel aspects of this project is the data. Data collection took several weeks, and the result is, to our knowledge, the first dataset of its kind. It consists of 20 IPL matches from the official IPL YouTube channel, each around 4 hours long. We extracted scenes from 5 matches and hand-labeled each scene with multiple labels such as shot type, shot aggression, shot area, batsman name, bowler name, ball length, etc. The data from these 5 matches is used for training, and testing is done on the remaining 15 matches.
The figure below shows the complete block diagram of our deep learning based system.
First of all, our system takes in a cricket match video and segments each scene from the video. Each scene is then classified into a set of labels. After classification, either highlights can be generated or players can be further identified. Moreover, following scene classification, a template based commentary can be generated. Let us explain each block in detail now.
The first part of the pipeline is scene segmentation. We use a naive method based on the histogram difference between frames. We assume that a new scene starts as soon as the histogram of the current frame differs drastically from those of the previous three frames. If this difference exceeds a certain threshold, we start a new scene and save the preceding frames as a completed scene.
The figure above shows a bar chart of the cosine histogram distance between the current frame and the mean histogram of the previous three frames. We can see that whenever a scene changes, there is a spike in the distance. Although this assumption is very naive, we observed empirically that it worked well for our problem setting. Using this approach, we were able to segment a match into a large number of individual scenes. Before moving to the next part, we hand-labeled each scene; this was a tedious task that took us a few weeks.
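The segmentation logic above can be sketched in a few lines. This is a minimal illustration, not our production code: the 0.6 threshold, 32-bin grayscale histograms, and the function names are illustrative assumptions (in practice the threshold is tuned empirically on real footage).

```python
import numpy as np

def frame_histogram(frame, bins=32):
    """Grayscale intensity histogram of one frame (bin count is an assumption)."""
    hist, _ = np.histogram(frame, bins=bins, range=(0, 256))
    return hist.astype(float)

def cosine_distance(a, b):
    """1 - cosine similarity between two histograms."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return 1.0 - float(np.dot(a, b)) / denom if denom > 0 else 1.0

def segment_scenes(frames, threshold=0.6, window=3):
    """Start a new scene when the current frame's histogram differs
    sharply from the mean histogram of the previous `window` frames."""
    scenes, current = [], [frames[0]]
    hists = [frame_histogram(frames[0])]
    for frame in frames[1:]:
        h = frame_histogram(frame)
        mean_prev = np.mean(hists[-window:], axis=0)
        if cosine_distance(h, mean_prev) > threshold:
            scenes.append(current)  # close the previous scene at the cut
            current = []
        current.append(frame)
        hists.append(h)
    scenes.append(current)
    return scenes
```

On a synthetic sequence of ten dark frames followed by ten bright frames, this splits exactly at the brightness change, mirroring the spike visible in the figure.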
This is where the deep learning magic begins. Since we had already collected labeled data, all that remained was supervised learning. For this problem, we used a combination of Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs): the CNN extracts features from each frame of the video, while the RNN processes the resulting feature sequence. The figure below shows our model architecture.
We use this same model for each label, i.e. we actually predict multiple labels after the dense layer; for simplicity, only one output block is shown. We implemented this model in Keras and trained for 20 epochs. For the CNN, we used a VGG19 pre-trained on ImageNet; for the RNN, we used a BiLSTM with 1024 units. Let us look at some results to see how our model performs. These results are on a held-out test set.
We can see that the results are pretty decent and the model is able to classify many scenes correctly.
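The architecture described above can be sketched in Keras roughly as follows. The clip length, frame size, and number of classes are placeholder assumptions, and `weights=None` is used here only to keep the sketch self-contained (we used ImageNet-pretrained weights in the actual project).

```python
from tensorflow.keras import layers, models
from tensorflow.keras.applications import VGG19

NUM_FRAMES, H, W = 16, 112, 112  # assumed clip length and frame size
NUM_CLASSES = 5                   # assumed number of scene labels

# Frame-level feature extractor (VGG19; pass weights="imagenet" to
# reproduce the pre-trained setup described above).
cnn = VGG19(weights=None, include_top=False, pooling="avg",
            input_shape=(H, W, 3))

inputs = layers.Input(shape=(NUM_FRAMES, H, W, 3))
# Apply the CNN to every frame, then model the sequence with a BiLSTM.
features = layers.TimeDistributed(cnn)(inputs)
seq = layers.Bidirectional(layers.LSTM(1024))(features)
outputs = layers.Dense(NUM_CLASSES, activation="softmax")(seq)

model = models.Model(inputs, outputs)
model.compile(optimizer="adam", loss="categorical_crossentropy",
              metrics=["accuracy"])
```

For the multi-label setup described above, one would attach several such `Dense` output heads after the BiLSTM instead of the single head shown.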
Now that our model can classify each scene, we can generate targeted highlights. More concretely, given an input parameter \(p\), where \(p\) is one of the labels in our model, we collect all the scenes with label \(p\) and append them into a single video. The resulting video is a highlight reel of the match for the targeted parameter. Below, we show some highlights for the label 'Shot'.
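The collection step is straightforward once scenes and their predicted labels are available. A minimal sketch (function and variable names are illustrative; writing the frames out as a video would be done with a library such as OpenCV or moviepy):

```python
def generate_highlights(scenes, labels, target):
    """Concatenate all scenes whose predicted label matches `target`.

    `scenes` is a list of frame lists and `labels` holds the per-scene
    predictions from the classifier, in the same order.
    """
    highlight = []
    for scene, label in zip(scenes, labels):
        if label == target:
            highlight.extend(scene)
    return highlight

# Usage: collect every scene classified as a shot.
# shot_reel = generate_highlights(scenes, labels, "Shot")
```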
This was perhaps the most interesting part, and the one with the biggest implications. If we can identify each player in a shot along with multiple labels for that shot (shot aggression, shot region, etc.), we can build detailed statistics for every county cricket player. This would better enable coaches and cricket boards to scout new talent, and a player could analyze his past performances thoroughly to work on his strengths and weaknesses, provided it is done with reasonable accuracy. Using the same model shown previously, we trained batsman and bowler prediction over 10 players, including Virat Kohli, Shane Watson, Darren Sammy, Faulkner and a few more. We ran this module for only 10 players because of the laborious effort required to label each shot with a batsman and bowler name; after all the previous labeling, we did not label batsmen and bowlers at a larger scale. One interesting thing to note is that we did no segmentation, localization or object detection to isolate the players before labeling: we still pass the complete frames of the clip as input. Below is a clip that shows predictions for the batsman in a shot.
The last module of our pipeline is commentary generation. This again ties in with the scene classification module: using a template-based text, we fill in the predicted labels to generate a static commentary for each shot. Let us look at a figure to better understand this.
The bold text in the caption is the commentary we generate. Although it is not as engaging as the commentary people write on Cricinfo, we believe this is still a big step forward in automating the game using deep learning models.
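Template-based generation amounts to slotting the predicted labels into fixed text. A minimal sketch, in which the template wording, the label names, and the example values are all illustrative assumptions rather than our exact templates:

```python
# Hypothetical template; the real system may use several variants.
COMMENTARY_TEMPLATE = (
    "{batsman} plays a {aggression} {shot_type} towards {region} "
    "off a {length} delivery from {bowler}."
)

def generate_commentary(labels):
    """Fill the fixed template with the labels predicted for one shot."""
    return COMMENTARY_TEMPLATE.format(**labels)

labels = {
    "batsman": "Virat Kohli", "aggression": "aggressive",
    "shot_type": "cover drive", "region": "the off side",
    "length": "good-length", "bowler": "Faulkner",
}
print(generate_commentary(labels))
```

With richer label sets, one would pick among multiple templates to avoid every line of commentary reading identically.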
By now, you will have realized that we use one single architecture for multi-class classification and reuse it for multiple tasks. We have shown qualitative results above; below, we present quantitative metrics such as accuracy and precision for the different labels in our modules.
The figure above shows our metric scores for scene classification. We can see that the model is fairly accurate when predicting shot and fielding scenes. The only label with somewhat low accuracy is the fielder scene, usually because the model cannot always distinguish between a batsman and a fielder. The figure below shows the metrics for shot labels.
We can see that the average accuracy is lower than for scene classification. This makes sense, since shot classification is a harder problem given the amount of data we had (240 balls × 5 matches = 1200 shots).
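For reference, the per-label metrics reported above reduce to simple counts over the test predictions. A self-contained sketch (the toy labels in the usage example are made up, not our data):

```python
def accuracy(y_true, y_pred):
    """Fraction of scenes whose predicted label matches the ground truth."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def precision_recall(y_true, y_pred, label):
    """Per-label precision and recall from true/false positive counts."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if p == label and t == label)
    fp = sum(1 for t, p in zip(y_true, y_pred) if p == label and t != label)
    fn = sum(1 for t, p in zip(y_true, y_pred) if p != label and t == label)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Toy usage with made-up labels:
y_true = ["shot", "shot", "field", "field"]
y_pred = ["shot", "field", "field", "field"]
```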
To conclude, I think there is a lot of potential in applying the latest deep learning trends to the game of cricket. For example, many state-of-the-art object detection models could be used for automatic score extraction, and more granular measurements could be made, e.g. ball swing via optical flow analysis. One of the biggest hurdles we faced was the lack of labeled data. We have tried to take a step towards automation in cricket, and we hope this work catches the eye of a few people so we can continue working on it. Our aim is to deploy such components at a large scale and make cricket less laborious for coaches, cricket boards and players. Finally, although our results are highly encouraging, they are by no means perfect and our model definitely made mistakes. But we hope that this post will be a stepping stone in the direction of 'automation in cricket'.
[1] Krizhevsky, Alex, Ilya Sutskever, and Geoffrey E. Hinton. "Imagenet classification with deep convolutional neural networks." Advances in Neural Information Processing Systems. 2012.
[2] Lakhani, Paras, and Baskaran Sundaram. "Deep learning at chest radiography: automated classification of pulmonary tuberculosis by using convolutional neural networks." Radiology 284.2 (2017): 574-582.
[3] Karpathy, Andrej, et al. "Large-scale video classification with convolutional neural networks." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2014.
[4] Zhang, Xiang, Junbo Zhao, and Yann LeCun. "Character-level convolutional networks for text classification." Advances in Neural Information Processing Systems. 2015.