This paper presents a study of concealed weapon detection using image processing and machine learning. To replace the traditional X-ray method of detecting hidden weapons with an automated and potentially less error-prone procedure, alternative techniques such as neural networks and image fusion were studied to identify the most suitable solution. We propose a method that fuses a thermal/IR image with a conventional RGB or HSV image in order to reduce image noise and retain the critical features of the image, enabling both weapon detection and facial feature extraction.
I. INTRODUCTION
When we consider security and terrorism, the first word that comes to mind is weapons. Attacks such as hijackings, and any attack intended to fill people's minds with fear, are made possible by the presence of weapons. Security is of utmost importance, not only for us but also for the people around us. Various measures are taken today to ensure public safety, but we can never be entirely sure how secure those measures actually are. Taking airports as an example, smuggling and carrying illegal arms from one place to another is not common, yet there have still been instances of it. This is where concealed weapon detection comes into the picture. We aim to control the carriage and smuggling of illegal weapons in order to eliminate, or at least reduce, the possibility of such attacks, thereby ensuring the safety of citizens and ultimately bringing down the rate of crime.
The basic aim of concealed weapon detection (CWD) is to detect whether a person is carrying a weapon. CWD is an increasingly important topic in the general area of law enforcement, and it appears to be a critical technology for dealing with terrorism, which appears to be the most significant law enforcement problem for the next decade. Existing image sensor technologies for concealed weapon detection include thermal/infrared (IR), millimeter wave (MMW), and X-ray. Apart from these techniques, image fusion has been identified as a key technology for improving the detection of concealed weapons. Image fusion is the process of combining complementary information from multiple sensor images to generate a single image that contains a more accurate description of the scene than any of the individual images. With this technique, we aim to achieve the following objectives:
II. PROCESS FLOW
Algorithm to implement concealed weapon detection using the proposed method:
III. IR/THERMAL AND RGB IMAGING
Thermal imaging is based on the science of infrared energy (otherwise known as "heat"), which is emitted by all objects. This energy is also referred to as the object's "heat signature", and the quantity of radiation emitted tends to increase with the temperature of the object. The fundamental idea behind using a thermal image instead of a conventional image is that, because the weapon is concealed behind a piece of clothing, a normal RGB image would not contain the details needed to identify it. The thermal signature (temperature) of the weapon, however, will differ markedly from that of the human body, so the weapon will be distinctly identifiable in the thermal image. Apart from the IR/thermal image, an RGB image is also captured in order to retain the facial characteristics of the person carrying the weapon.
An RGB image is the conventional image captured by a phone camera or any other ordinary camera. It consists of three channels: red, green and blue.
A. Convert IR Image to HSV
The captured IR/Thermal image is converted to the HSV color model. HSV stands for Hue, Saturation and Value (brightness). Since the IR/Thermal image also depends partly on the amount of light hitting the object, some information may be lost in those images. This step is carried out to preserve as much information as possible, as descriptions in terms of brightness and hue can potentially be more relevant. This concept of conversion is taken from [4].
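As an illustration of this conversion, the following minimal sketch converts a small pseudo-colored frame to HSV using only NumPy and the standard-library `colorsys` module. The function name `rgb_to_hsv_image` and the sample pixel values are assumptions made for this example; the source does not specify which library or scaling was used.

```python
import colorsys
import numpy as np

def rgb_to_hsv_image(rgb):
    """Convert an (H, W, 3) uint8 RGB array to float HSV with all channels in [0, 1]."""
    scaled = rgb.astype(np.float64) / 255.0
    flat = scaled.reshape(-1, 3)
    # colorsys works on one pixel at a time; this is slow but dependency-free.
    hsv = np.array([colorsys.rgb_to_hsv(r, g, b) for r, g, b in flat])
    return hsv.reshape(rgb.shape)

# Hypothetical 1x3 pseudo-colored thermal frame: pure red, pure green, mid grey.
ir_frame = np.array([[[255, 0, 0], [0, 255, 0], [128, 128, 128]]], dtype=np.uint8)
hsv_frame = rgb_to_hsv_image(ir_frame)
print(hsv_frame)
```

The third channel (Value) carries the brightness of each pixel independently of its hue, which is the property this step relies on.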
B. Complement of IR/Thermal Image
Since the IR/Thermal image can sometimes have irregular levels of brightness, and by extension of information, which can make object discrimination difficult, we take the complement of the IR/Thermal image to remove the darkness and improve feature extraction. This operation subtracts each element of the IR matrix from 255, because the intensities of the IR image range from 0 to 255; equivalently, it inverts the value of each component.
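The complement operation can be sketched in a few lines of NumPy. The function name `complement` and the sample values are assumptions for illustration; the only requirement taken from the text is that intensities lie in the range 0 to 255.

```python
import numpy as np

def complement(image):
    """Return the complement (negative) of an 8-bit image: 255 - pixel value."""
    if image.dtype != np.uint8:
        raise TypeError("expected an 8-bit (uint8) image")
    return 255 - image

# Hypothetical 1x3 IR intensity row: darkest, intermediate and brightest pixel.
ir = np.array([[0, 100, 255]], dtype=np.uint8)
ir_complement = complement(ir)
print(ir_complement)
```

Applying the operation twice returns the original image, so no information is lost by this step.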
IV. METHODOLOGY OF IMAGE PROCESSING
Figure 2 shows the detailed system design from input to output, in two steps, for obtaining the weapon detection results. The image processing stage is the most complicated stage of this method, as many manipulation operations are needed to obtain an image robust enough for the concealed weapon to be detected with maximum accuracy. There has been a lot of recent research in the domain of image processing. [4] explains how DWT (Discrete Wavelet Transform) can be a potentially strong method for RGB and IR images, while [2] explores how well LatLRR (Latent Low-Rank Representation) performs when fusing the IR/Thermal image with RGB or HSV images. Fig. 3 compares the results of DWT image fusion with the LatLRR technique. Since the DWT results contained a more robust image of the hidden weapon, DWT was chosen for concealed weapon detection.
A. Image Preparation and Pre-processing
Any image taken by a sensor can contain surplus parts, such as a dark or semi-dark background. Pre-processing is necessary to remove the undesirable parts of the image and bring the images into a suitable state. In addition, image fusion requires both input images to be of the same size, i.e., the same dimensions. Since the two input images are taken by two different image sensing devices, they differ in size. Hence, the input images need to be resized.
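The resizing step can be sketched as follows. Nearest-neighbour interpolation is assumed here only because it is the simplest scheme; the source does not state which interpolation method was used, and the function name `resize_nearest` and the image sizes are hypothetical.

```python
import numpy as np

def resize_nearest(image, out_h, out_w):
    """Resize a 2-D or 3-D image array to (out_h, out_w) by nearest-neighbour sampling."""
    in_h, in_w = image.shape[:2]
    # For each output row/column, pick the index of the source row/column it maps to.
    rows = np.arange(out_h) * in_h // out_h
    cols = np.arange(out_w) * in_w // out_w
    return image[rows[:, None], cols]

# Hypothetical sizes: a 4x6 RGB image and a smaller 2x3 IR image.
rgb = np.zeros((4, 6, 3), dtype=np.uint8)
ir = np.array([[1, 2, 3], [4, 5, 6]], dtype=np.uint8)

ir_resized = resize_nearest(ir, rgb.shape[0], rgb.shape[1])
print(ir_resized.shape)
```

After this call both arrays share the same height and width, which is the precondition for the fusion step.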
B. Combining the RGB and IR/Thermal Image Before Image Fusion
Before the image fusion step, we combine the RGB image with the complement of the IR/Thermal image by adding the two images. This is done so that the input images are as robust as possible and contain the maximum amount of information.
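A minimal sketch of this addition is given below. It assumes a single-channel IR image that is broadcast across the three RGB channels, and it assumes that sums above 255 are clipped (saturated) rather than wrapped around; neither detail is specified in the source. The function name `add_saturating` and the pixel values are hypothetical.

```python
import numpy as np

def add_saturating(a, b):
    """Add two uint8 images element-wise, clipping the result to the range 0..255."""
    total = a.astype(np.uint16) + b.astype(np.uint16)  # widen to avoid overflow
    return np.clip(total, 0, 255).astype(np.uint8)

# Hypothetical 1x2 RGB image and matching single-channel IR image.
rgb = np.array([[[10, 20, 30], [200, 210, 220]]], dtype=np.uint8)
ir = np.array([[250, 100]], dtype=np.uint8)

ir_complement = 255 - ir                      # complement of the IR image
combined = add_saturating(rgb, ir_complement[..., None])  # broadcast over channels
print(combined)
```

Widening to 16 bits before adding matters: adding two `uint8` arrays directly would wrap around at 256 and turn bright regions dark.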
Since the requirement that the input images (HSV, IR/Thermal + RGB) have the same dimensions is satisfied, the fusion procedure can start. The advantages of image fusion over visual comparison of multi-modality images are: (a) the fusion technique is useful to correct for variability in orientation, position and dimension; (b) it allows precise anatomic and physiological correlation; (c) it permits regional quantization. After considering the fusion processes in [4] and [2], DWT was selected as the most robust process for this stage.
The discrete wavelet transform (DWT) is a spatial-frequency decomposition that provides a flexible multi-resolution analysis of an image. Many image processing operations, such as denoising, contrast enhancement, edge detection, segmentation, texture analysis and compression, can be performed easily and successfully in the wavelet domain. Wavelet techniques thus provide a powerful set of tools for image enhancement and analysis, together with a common framework for various fusion tasks.
DWT converts the image from the spatial domain to the frequency domain, where the image is divided into four sub-bands: LL, LH, HL and HH. The LL sub-band holds the low-frequency approximation of the image, i.e., its coarse overall content, while the LH, HL and HH sub-bands hold the high-frequency details, i.e., the edges and contours of the objects in the image. Multi-scale decomposition of the image based on DWT thus extracts the low-frequency information as well as the high-frequency details in the horizontal, vertical and diagonal directions. The first band is called the approximation coefficient band; the other three are detail coefficient bands corresponding to the three orientations of the edges in the image: horizontal, vertical and diagonal. The extracted DWT coefficients of the input images are then used in the fusion process.
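To make the four sub-bands concrete, the sketch below implements a single-level 2-D DWT with the Haar wavelet in plain NumPy. The choice of the Haar wavelet, the function name `haar_dwt2`, and the LH/HL naming order are assumptions for this example (naming conventions differ between libraries); the source does not state which wavelet was used.

```python
import numpy as np

def haar_dwt2(x):
    """Single-level 2-D Haar DWT of a 2-D array with even height and width.

    Returns (ll, lh, hl, hh): the approximation band and three detail bands.
    """
    x = x.astype(np.float64)
    s = np.sqrt(2.0)
    # Step 1: low-pass (sum) and high-pass (difference) over neighbouring columns.
    lo = (x[:, 0::2] + x[:, 1::2]) / s
    hi = (x[:, 0::2] - x[:, 1::2]) / s
    # Step 2: the same two filters over neighbouring rows of each result.
    ll = (lo[0::2] + lo[1::2]) / s   # low/low: coarse approximation
    lh = (lo[0::2] - lo[1::2]) / s   # row-to-row differences (horizontal edges)
    hl = (hi[0::2] + hi[1::2]) / s   # column-to-column differences (vertical edges)
    hh = (hi[0::2] - hi[1::2]) / s   # diagonal detail
    return ll, lh, hl, hh

rng = np.random.default_rng(0)
channel = rng.integers(0, 256, size=(8, 8))      # hypothetical image channel
ll, lh, hl, hh = haar_dwt2(channel)

flat = np.full((8, 8), 50)                       # featureless image
flat_ll, flat_lh, flat_hl, flat_hh = haar_dwt2(flat)
print(ll.shape, np.abs(flat_lh).max(), np.abs(flat_hl).max(), np.abs(flat_hh).max())
```

Each sub-band has half the height and width of the input, and a featureless image produces zero detail coefficients, which illustrates that only edges and texture reach the LH, HL and HH bands.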
Both of our images consist of three channels; therefore, we fuse these channels individually and then recombine them to form a pseudo-colored image. We then apply the inverse DWT (IDWT) to obtain the reconstructed fused image. The results obtained in [10] show that the important area (the area of the hidden weapon) exhibits a high degree of variation relative to the other areas of the image. The process is illustrated in Fig. 4.
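The per-channel fusion and reconstruction can be sketched as follows. This is an illustration under stated assumptions, not the exact procedure of the paper: it uses a single-level Haar wavelet, averages the approximation bands, and keeps the detail coefficient with the larger magnitude. The fusion rule and all function names (`dwt2`, `idwt2`, `fuse_channel`, `fuse_images`) are hypothetical, since the source does not specify them.

```python
import numpy as np

S = np.sqrt(2.0)

def dwt2(x):
    """Single-level 2-D Haar DWT (even-sized input); returns (ll, lh, hl, hh)."""
    x = x.astype(np.float64)
    lo, hi = (x[:, 0::2] + x[:, 1::2]) / S, (x[:, 0::2] - x[:, 1::2]) / S
    return ((lo[0::2] + lo[1::2]) / S, (lo[0::2] - lo[1::2]) / S,
            (hi[0::2] + hi[1::2]) / S, (hi[0::2] - hi[1::2]) / S)

def idwt2(ll, lh, hl, hh):
    """Inverse of dwt2: rebuild the image from its four sub-bands."""
    h, w = ll.shape
    lo, hi = np.empty((2 * h, w)), np.empty((2 * h, w))
    lo[0::2], lo[1::2] = (ll + lh) / S, (ll - lh) / S
    hi[0::2], hi[1::2] = (hl + hh) / S, (hl - hh) / S
    x = np.empty((2 * h, 2 * w))
    x[:, 0::2], x[:, 1::2] = (lo + hi) / S, (lo - hi) / S
    return x

def fuse_channel(a, b):
    """Fuse two single-channel images in the wavelet domain."""
    ca, cb = dwt2(a), dwt2(b)
    ll = (ca[0] + cb[0]) / 2.0  # assumed rule: average the approximations
    # Assumed rule: keep the stronger detail coefficient from either image.
    details = [np.where(np.abs(da) >= np.abs(db), da, db)
               for da, db in zip(ca[1:], cb[1:])]
    return idwt2(ll, *details)

def fuse_images(img_a, img_b):
    """Fuse two (H, W, C) uint8 images channel by channel and restack them."""
    channels = [fuse_channel(img_a[..., c], img_b[..., c])
                for c in range(img_a.shape[-1])]
    fused = np.stack(channels, axis=-1)
    return np.clip(np.rint(fused), 0, 255).astype(np.uint8)

rng = np.random.default_rng(1)
img_a = rng.integers(0, 256, size=(8, 8, 3), dtype=np.uint8)  # hypothetical inputs
img_b = rng.integers(0, 256, size=(8, 8, 3), dtype=np.uint8)
fused = fuse_images(img_a, img_b)
print(fused.shape, fused.dtype)
```

Keeping the larger detail coefficient is one common way to preserve the sharpest edges from either input; other rules could be substituted in `fuse_channel` without changing the rest of the pipeline.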
V. WEAPON DETECTION
Once the binary image is received from the image fusion and processing step, it is passed to the neural network to determine whether a weapon is present. Various computer vision algorithms can be used for this purpose. [6] discusses using the YOLOv6 algorithm specifically for object detection, the object in this case being a weapon. According to [6], this approach can isolate only the important segments of the image, reducing the possibility of false positives. These segments are then used as input to the firearm detection model, which is a Convolutional Neural Network. For firearm detection, VGG Net can be used, since it uses small convolutional filters while implementing a large number of layers. According to [8], Fuzzy KNN (K-Nearest Neighbours) is another technique that can potentially be used for weapon detection in X-ray images. Since the scope of this research is to establish a method that can replace the conventional method, it was decided to explore other potential alternatives.
A. Convolutional Neural Networks
A Convolutional Neural Network (CNN) is a deep learning algorithm that takes an input image, assigns importance (learnable weights and biases) to various aspects or objects in the image, and is able to differentiate one from another. Convolutional neural networks are composed of multiple layers of artificial neurons. Artificial neurons, a rough imitation of their biological counterparts, are mathematical functions that calculate the weighted sum of multiple inputs and output an activation value. The behavior of each neuron is defined by its weights. When fed with pixel values, the artificial neurons of a CNN pick out various visual features. CNNs are among the oldest and most popular neural networks used for computer vision. A CNN could be used for weapon detection, but its computational cost is significant, which is why a CNN is not used for this objective.
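The weighted-sum-plus-activation behavior described above can be illustrated with a single convolutional filter. This is a minimal sketch, not a trained network: the kernel weights are hand-picked rather than learned, and the names `conv2d_valid` and `relu` are hypothetical. As in most deep learning frameworks, the "convolution" here is computed without flipping the kernel.

```python
import numpy as np

def conv2d_valid(image, kernel, bias=0.0):
    """Slide a kernel over a 2-D image; each output is a weighted sum plus a bias."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.empty((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # One artificial neuron: weighted sum of the pixels under the kernel.
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel) + bias
    return out

def relu(x):
    """Activation function: pass positive values, zero out negative ones."""
    return np.maximum(x, 0.0)

# Hypothetical 4x4 image with a dark-to-bright vertical edge in the middle.
image = np.array([[0, 0, 1, 1]] * 4, dtype=np.float64)
# Hand-picked weights that respond to dark-to-bright transitions from left to right.
kernel = np.array([[-1.0, 1.0], [-1.0, 1.0]])

feature_map = relu(conv2d_valid(image, kernel))
print(feature_map)
```

The feature map is non-zero only where the kernel straddles the edge, which is how a neuron "picks out" a visual feature. In a real CNN many such kernels are learned per layer, which is the source of the computational cost noted above.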
B. Logistic Regression
Although using this concept for object detection may deviate from the tried and tested methods, it can prove to be a very powerful technique [14]. Logistic regression is a statistical method for predicting binary classes. The outcome or target variable is dichotomous, meaning that there are only two possible classes. The algorithm therefore classifies images as weapon versus non-weapon rather than merely identifying the hidden object. It computes the probability of an event occurring, which in this case is the presence of a weapon. The algorithm can be further extended to identify the weapon, for example by classifying it as gun versus knife or as firearm versus other weapon.
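A minimal sketch of binary logistic regression trained by gradient descent is shown below. It runs on synthetic two-dimensional feature vectors, not on the image dataset used in this work; how images are turned into feature vectors is left open here. The function names, learning rate, epoch count and cluster positions are all assumptions chosen for illustration.

```python
import numpy as np

def sigmoid(z):
    """Map a real-valued score to a probability in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def train_logistic(X, y, lr=0.1, epochs=2000):
    """Fit weights and bias by gradient descent on the mean cross-entropy loss."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        p = sigmoid(X @ w + b)            # predicted probability of class 1
        w -= lr * (X.T @ (p - y)) / len(y)
        b -= lr * np.mean(p - y)
    return w, b

def predict_proba(X, w, b):
    """Probability that each sample belongs to class 1 ("weapon")."""
    return sigmoid(X @ w + b)

# Synthetic stand-in data: two clusters of hypothetical 2-D feature vectors.
rng = np.random.default_rng(0)
non_weapon = rng.normal(loc=-2.0, scale=1.0, size=(50, 2))
weapon = rng.normal(loc=2.0, scale=1.0, size=(50, 2))
X = np.vstack([non_weapon, weapon])
y = np.concatenate([np.zeros(50), np.ones(50)])

w, b = train_logistic(X, y)
probabilities = predict_proba(X, w, b)
accuracy = np.mean((probabilities >= 0.5) == y)
print(f"training accuracy on synthetic data: {accuracy:.2f}")
```

Thresholding the probability at 0.5 gives the weapon versus non-weapon decision; a further split such as gun versus knife could be handled by training an additional binary classifier on the samples labelled as weapons.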
VI. TEST RESULTS
A. Image Processing
The image processing stage described in Section IV was carried out on a couple of sample images, and the test results are presented in Figs. 6, 7, 8 and 9.
The segmentation procedure mentioned in Section IV. D. was carried out so that the fused image is fully processed and the weapon can be detected reliably. The results are shown in Figs. 10, 11, 12 and 13.
C. Weapon Detection/Classification
To detect and classify the weapon, we used logistic regression as specified in Section V. B. A neural network model was created and trained on a custom dataset that we built, containing 1759 images of weapons. One of the major advantages of logistic regression is that it does not require a very large dataset to predict with high accuracy. Fig. 14 shows the performance parameters of the model.
To verify the performance parameters, 32 random images were passed to the model for testing. Table 1 shows the outcome of the testing.
Table 1. Testing Result Statistics