Behaviors Violence Detection of Surveillance Video Using Spatial-Temporal Convolution and Atrous Convolutional

EasyChair Preprint 5357
6 pages
Date: April 20, 2021

Abstract

Detecting moving objects in each frame is an essential step in video analysis and violence detection. This paper presents a new method for separating frames that contain motion information and detecting violence in them. In the proposed method, a two-level network separates the frames containing motion information and detects violence in them. At level one, atrous convolution receives the input video and separates the frames containing motion information by applying semantic segmentation to the input frames, then passes them to level two, a spatial-temporal convolution, for violence detection. Finally, to ensure that the network operates correctly, a regression unit checks the output, classifies it into two classes, violent and non-violent, and assigns it a score. The closer the score is to 0, the less violence is detected; the closer the score is to 1, the more violence is detected. To show the accuracy of the proposed algorithm, two data sets were examined: the accuracy obtained is 96% on the UCF-Crime data set and 93% on the surveillance video data set.

Keyphrases: Adaptive Key Frame, Atrous Convolutional, Dvsnet, Flownet, spatial-temporal Convolutional
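To illustrate the atrous (dilated) convolution that the abstract names as the first level of the network, here is a minimal one-dimensional sketch in plain Python. It is not the authors' implementation: the function names `atrous_conv1d` and `receptive_field`, the sample signal, and the kernel are all hypothetical, chosen only to show how a dilation rate widens the region each output value sees without adding kernel weights.

```python
def receptive_field(kernel_size, rate):
    """Number of input samples spanned by one output value."""
    return (kernel_size - 1) * rate + 1


def atrous_conv1d(signal, kernel, rate=1):
    """Valid-mode 1-D atrous convolution (computed as cross-correlation,
    as is usual in deep learning libraries).

    `rate` is the dilation rate: the step between the input samples read
    by consecutive kernel taps. rate=1 gives an ordinary convolution.
    """
    if rate < 1:
        raise ValueError("rate must be >= 1")
    span = receptive_field(len(kernel), rate)
    n_out = max(len(signal) - span + 1, 0)
    return [
        sum(kernel[k] * signal[i + k * rate] for k in range(len(kernel)))
        for i in range(n_out)
    ]


# Hypothetical data: a ramp signal and a simple difference kernel.
signal = [1, 2, 3, 4, 5, 6, 7]
kernel = [1, 0, -1]

dense = atrous_conv1d(signal, kernel, rate=1)    # each output spans 3 samples
dilated = atrous_conv1d(signal, kernel, rate=2)  # each output spans 5 samples
```

With the same three kernel weights, raising the rate from 1 to 2 widens the span of each output from 3 to 5 input samples. This property is what makes atrous convolution attractive for semantic segmentation, where wide context is needed at full resolution.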