# A T-CNN time series classification method based on the Gram matrix

As mentioned above, time series are converted to Gram time-domain images, and these images are used as the input matrices of convolutional neural networks for classification. To address the heavy computation and slow training of convolutional neural networks, we propose a method that replaces the convolution operation of the convolution layer with a Toeplitz matrix product, and we introduce the idea of the triplet network into the loss function to improve the efficiency and accuracy of classification.

### Convolution based on Toeplitz matrix multiplication

The convolution operation based on the Toeplitz matrix product is shown in Fig. 3, where the dark blue square represents the convolution kernel and the light blue square represents the matrix to be convolved. The kernel is 2 × 2, the matrix to be convolved is 3 × 3, and the stride is 1. Traditional convolution is shown in the upper part of Fig. 3. The kernel moves across the matrix with a stride of 1 and must visit four positions to cover the whole matrix. At each position, the kernel and the part of the matrix it overlaps are multiplied element-wise and summed, and the resulting value is the local convolution result at the corresponding position. Because traditional convolution has to traverse the whole image, its computational complexity is high.

As shown in the lower part of Fig. 3, in convolution based on the Toeplitz matrix product, each of the four 3 × 3 process matrices produced as the kernel traverses the matrix is expanded in row order into a 1 × 9 row vector, and the four row vectors are stacked into a large matrix H. The matrix to be convolved is then expanded in row order into a 9 × 1 column vector X. The product of H, built from the convolution kernel, and X, built from the matrix to be convolved, replaces the convolution computation. Specifically, the convolution kernel matrix H consists of six small matrices: the matrices in the pink boxes, the matrices in the yellow boxes, and the zero matrices in the two white parts.

In Fig. 3, the matrix in the pink box conforms to the definition of a Toeplitz matrix. Similarly, the matrix in the yellow box and the zero matrix are Toeplitz matrices. The constructed convolution kernel matrix is therefore a large Toeplitz matrix composed of several small Toeplitz matrices. The Toeplitz matrix product is used to replace the traditional convolution operation: the convolution kernel is built directly into the convolution kernel matrix, without traversing the image stride by stride, and the product of the two matrices is computed to reduce the computational complexity.
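The Fig. 3 setting (2 × 2 kernel, 3 × 3 matrix, stride 1) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the numeric values are hypothetical, the function names `sliding_window` and `unrolled_kernel_matrix` are not from the paper, and the kernel is applied without flipping, as is usual in CNN layers.

```python
def sliding_window(x, k):
    """Slide kernel k over x with stride 1 (no padding, no kernel flip)."""
    out_r = len(x) - len(k) + 1
    out_c = len(x[0]) - len(k[0]) + 1
    return [[sum(k[i][j] * x[r + i][c + j]
                 for i in range(len(k)) for j in range(len(k[0])))
             for c in range(out_c)]
            for r in range(out_r)]


def unrolled_kernel_matrix(k, rows, cols):
    """One row per kernel position: the zero-padded kernel flattened in row order."""
    out_r = rows - len(k) + 1
    out_c = cols - len(k[0]) + 1
    h = []
    for r in range(out_r):
        for c in range(out_c):
            row = [0] * (rows * cols)
            for i in range(len(k)):
                for j in range(len(k[0])):
                    row[(r + i) * cols + (c + j)] = k[i][j]
            h.append(row)
    return h


# Hypothetical values, chosen only for illustration.
x = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
k = [[1, 2], [3, 4]]

H = unrolled_kernel_matrix(k, 3, 3)          # 4 x 9 matrix
x_vec = [v for row in x for v in row]        # 9 x 1 vector, row order
product = [sum(hv * xv for hv, xv in zip(h_row, x_vec)) for h_row in H]
direct = sliding_window(x, k)                # 2 x 2 result
```

Read as 2 × 3 blocks, the 4 × 9 matrix `H` contains two copies of the Toeplitz block built from the first kernel row, two copies of the block built from the second kernel row, and two zero blocks, which matches the six small matrices described for Fig. 3.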

Definition 2 (Toeplitz matrix): A matrix whose elements are identical along every diagonal from top left to bottom right is a Toeplitz matrix; that is, \(A_{i,j} = A_{i+1,j+1} = a_{i-j}\). Mathematically,

$$A = \left( \begin{array}{*{20}c} a_{0} & a_{-1} & a_{-2} & \ldots & \ldots & a_{-(n-1)} \\ a_{1} & a_{0} & a_{-1} & \ddots & & \vdots \\ a_{2} & a_{1} & \ddots & \ddots & \ddots & \vdots \\ \vdots & \ddots & \ddots & \ddots & a_{-1} & a_{-2} \\ \vdots & & \ddots & a_{1} & a_{0} & a_{-1} \\ a_{n-1} & \ldots & \ldots & a_{2} & a_{1} & a_{0} \end{array} \right)$$

(9)
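Definition 2 can be checked mechanically. The following is a minimal sketch, assuming the matrix is stored as a list of rows; the helper names and the coefficient values are hypothetical.

```python
def is_toeplitz(a):
    """True if every entry equals its lower-right diagonal neighbour."""
    return all(a[i][j] == a[i + 1][j + 1]
               for i in range(len(a) - 1)
               for j in range(len(a[0]) - 1))


def toeplitz_from_diagonals(diag):
    """Build an n x n Toeplitz matrix from a dict mapping k = i - j to a_k."""
    n = (len(diag) + 1) // 2
    return [[diag[i - j] for j in range(n)] for i in range(n)]


# Hypothetical coefficients a_{-2}, ..., a_{2} for a 3 x 3 example.
coeffs = {-2: 7, -1: 8, 0: 1, 1: 2, 2: 3}
A = toeplitz_from_diagonals(coeffs)
```

Here `A[i][j]` depends only on `i - j`, which is exactly the property \(A_{i,j} = a_{i-j}\) in Eq. (9). Rectangular matrices pass the same check, which is why the non-square blocks used later also count as Toeplitz.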

#### Toeplitz convolution kernel matrix construction

To replace the convolution calculation with Toeplitz matrix multiplication, the convolution kernel matrix H is built into the Toeplitz convolution kernel matrix \(H_t\). Given an arbitrary convolution kernel matrix as follows:

$$H = \left( \begin{array}{*{20}c} h_{11} & h_{12} & \cdots & h_{1D} \\ h_{21} & h_{22} & \cdots & h_{2D} \\ \vdots & \vdots & \vdots & \vdots \\ h_{C1} & h_{C2} & \cdots & h_{CD} \end{array} \right)$$

(10)

The corresponding construction steps of the Toeplitz convolution kernel matrix are as follows:

1. A small Toeplitz matrix is generated from the elements of each row of the convolution kernel matrix. Since the convolution kernel matrix has size C × D, H is divided into C Toeplitz matrices \(H_0, H_1, H_2, \ldots, H_{C-1}\). To build \(H_0\), the element \(h_{11}\) in the first row and first column of H is zero-padded, where the number of inserted zeros is the number of columns of H minus 1, and the result is taken as the first row of \(H_0\). Then \(h_{12}\) is inserted in the second row according to the Toeplitz property, and so on until 2D − 1 rows are formed and \(H_0\) is complete. By analogy, \(H_i\) is the matrix obtained in the same way from the elements of row i + 1 of H. For example, if the convolution kernel matrix is \(H = \left[ \begin{array}{*{20}c} 1 & 2 \\ 3 & 4 \end{array} \right]\), then H is divided into two matrices \(H_{0} = \left[ \begin{array}{*{20}c} 1 & 0 \\ 2 & 1 \\ 0 & 2 \end{array} \right]\) and \(H_{1} = \left[ \begin{array}{*{20}c} 3 & 0 \\ 4 & 3 \\ 0 & 4 \end{array} \right]\).

2. The small Toeplitz matrices obtained in Step (1) are assembled into a large Toeplitz matrix:

$$H_{t} = \left( \begin{array}{*{20}c} H_{0} & 0 & \ldots & 0 & 0 \\ H_{1} & H_{0} & \ddots & \vdots & \vdots \\ H_{2} & H_{1} & \ddots & 0 & 0 \\ \vdots & H_{2} & \ddots & H_{0} & 0 \\ H_{C-2} & \vdots & \ddots & H_{1} & H_{0} \\ H_{C-1} & H_{C-2} & \vdots & H_{2} & H_{1} \\ 0 & H_{C-1} & H_{C-2} & \vdots & H_{2} \\ 0 & 0 & H_{C-1} & H_{C-2} & \vdots \\ \vdots & \vdots & \vdots & H_{C-1} & H_{C-2} \\ 0 & 0 & 0 & \cdots & H_{C-1} \end{array} \right)$$

(11)

For the example in Step (1), Eq. (11) gives \(H_{t} = \left[ \begin{array}{*{20}c} H_{0} & 0 \\ H_{1} & H_{0} \\ 0 & H_{1} \end{array} \right]\), where 0 denotes a 3 × 2 zero matrix.
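The two construction steps can be sketched as follows. This is an illustrative reading, not the authors' implementation: the function names are hypothetical, and the block height is taken as the input width plus D − 1, which equals the 2D − 1 rows quoted in Step (1) when the input has as many columns as the kernel, as in the running example.

```python
def row_toeplitz(kernel_row, in_cols):
    """Step (1): the (in_cols + D - 1) x in_cols Toeplitz block for one kernel row."""
    d = len(kernel_row)
    return [[kernel_row[r - c] if 0 <= r - c < d else 0
             for c in range(in_cols)]
            for r in range(in_cols + d - 1)]


def toeplitz_kernel_matrix(kernel, in_rows, in_cols):
    """Step (2): arrange the blocks H_0 .. H_{C-1} as the block-Toeplitz H_t."""
    c_rows, d = len(kernel), len(kernel[0])
    blocks = [row_toeplitz(row, in_cols) for row in kernel]
    zero = [[0] * in_cols for _ in range(in_cols + d - 1)]
    h_t = []
    for p in range(in_rows + c_rows - 1):            # block rows
        chosen = [blocks[p - q] if 0 <= p - q < c_rows else zero
                  for q in range(in_rows)]           # block columns
        for r in range(in_cols + d - 1):
            h_t.append([v for block in chosen for v in block[r]])
    return h_t


H0 = row_toeplitz([1, 2], 2)
H1 = row_toeplitz([3, 4], 2)
Ht = toeplitz_kernel_matrix([[1, 2], [3, 4]], 2, 2)   # 9 x 4 matrix
```

For the 2 × 2 kernel of the example and a 2 × 2 input, `H0` and `H1` reproduce the two blocks given in Step (1), and `Ht` has the three-by-two block layout stated for Eq. (11).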

#### Toeplitz matrix convolution

After the Toeplitz convolution kernel matrix is obtained as described in the "Toeplitz convolution kernel matrix construction" section, traditional convolution can be replaced by Toeplitz matrix multiplication using Eq. (12).

$$X * H = H_{t} \times X_{T}$$

(12)

where \(X = \left( \begin{array}{*{20}c} x_{11} & x_{12} & \cdots & x_{1B} \\ x_{21} & x_{22} & \cdots & x_{2B} \\ \vdots & \vdots & \vdots & \vdots \\ x_{A1} & x_{A2} & \cdots & x_{AB} \end{array} \right)\) denotes the matrix to be convolved, \(H = \left( \begin{array}{*{20}c} h_{11} & h_{12} & \cdots & h_{1D} \\ h_{21} & h_{22} & \cdots & h_{2D} \\ \vdots & \vdots & \vdots & \vdots \\ h_{C1} & h_{C2} & \cdots & h_{CD} \end{array} \right)\) denotes the convolution kernel, \(H_t\) is the Toeplitz convolution kernel matrix from the "Toeplitz convolution kernel matrix construction" section, and \(X_T\) is the column vector obtained by arranging all the elements of X in row order. With the full convolution method, the matrix to be convolved is zero-padded and the result contains all the data produced by the convolution. The convolution result matrix has M = A + C − 1 rows and N = B + D − 1 columns.

For example, when \(X = \left[ \begin{array}{*{20}c} 5 & 6 \\ 7 & 8 \end{array} \right]\), then \(X_{T} = \left[ \begin{array}{*{20}c} 5 & 6 & 7 & 8 \end{array} \right]^{T}\), and the result of the direct convolution calculation is \(X * H = \left[ \begin{array}{*{20}c} 5 & 6 \\ 7 & 8 \end{array} \right] * \left[ \begin{array}{*{20}c} 1 & 2 \\ 3 & 4 \end{array} \right] = \left[ \begin{array}{*{20}c} 5 & 16 & 12 \\ 22 & 60 & 40 \\ 21 & 52 & 32 \end{array} \right]\). The result of the convolution operation based on the Toeplitz matrix is

\(H_{t} \times X_{T} = \left[ \begin{array}{*{20}c} H_{0} & 0 \\ H_{1} & H_{0} \\ 0 & H_{1} \end{array} \right] \times \left[ \begin{array}{*{20}c} 5 & 6 & 7 & 8 \end{array} \right]^{T} = \left[ \begin{array}{*{20}c} 5 & 16 & 12 & 22 & 60 & 40 & 21 & 52 & 32 \end{array} \right]^{T}\).

The resulting column vector is then reshaped into a 3 × 3 matrix according to M = A + C − 1 = 3 and N = B + D − 1 = 3, which is identical to the result of the direct convolution calculation.
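The worked example can be reproduced with a short script. This sketch writes out \(H_t\) for the example by hand and compares its product with a direct full convolution; the function name `full_conv2d` is hypothetical.

```python
def full_conv2d(x, h):
    """Direct full 2-D convolution; the output is (A + C - 1) x (B + D - 1)."""
    a, b, c, d = len(x), len(x[0]), len(h), len(h[0])
    out = [[0] * (b + d - 1) for _ in range(a + c - 1)]
    for i in range(a):
        for j in range(b):
            for p in range(c):
                for q in range(d):
                    out[i + p][j + q] += x[i][j] * h[p][q]
    return out


X = [[5, 6], [7, 8]]
H = [[1, 2], [3, 4]]

# H_t for this kernel, written out block by block: [H0 0; H1 H0; 0 H1].
Ht = [
    [1, 0, 0, 0], [2, 1, 0, 0], [0, 2, 0, 0],
    [3, 0, 1, 0], [4, 3, 2, 1], [0, 4, 0, 2],
    [0, 0, 3, 0], [0, 0, 4, 3], [0, 0, 0, 4],
]

x_t = [v for row in X for v in row]                       # X_T in row order
column = [sum(hv * xv for hv, xv in zip(row, x_t)) for row in Ht]

M = len(X) + len(H) - 1                                   # A + C - 1
N = len(X[0]) + len(H[0]) - 1                             # B + D - 1
reshaped = [column[r * N:(r + 1) * N] for r in range(M)]  # back to M x N
direct = full_conv2d(X, H)
```

Both `reshaped` and `direct` hold the same 3 × 3 matrix, so the single matrix-vector product gives the same values as the four nested loops of the direct calculation.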

We use Toeplitz matrix multiplication to replace the convolution operation. In terms of time complexity, let the input time-domain image have size A × B and the convolution kernel have size C × D. The convolution operation requires the kernel to traverse the time-domain image step by step and performs A × B × C × D multiplications.

With Toeplitz matrix multiplication, only a single matrix multiplication is needed. Fig. 3 shows that each row of the matrix contains many zeros, which do not need to be multiplied. The work per row is therefore at most C × D multiplications, the number of rows equals the number of kernel positions, and the total is roughly A × B × C × D multiplications. For a single calculation, the computational cost of the two methods is thus roughly the same. However, each time a new time-domain image is fed into traditional convolution, the calculation involves many shift operations, which greatly increases the calculation time.
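As a rough check of this count for the full convolution of the previous section, and assuming every kernel entry is nonzero: each input element meets each kernel element exactly once, so every column of \(H_t\) contains all C × D kernel entries, and the number of nonzero entries, which is also the number of multiplications needed, is

$$\operatorname{nnz}(H_{t}) = \underbrace{A \cdot B}_{\text{columns of } H_{t}} \times \underbrace{C \cdot D}_{\text{kernel entries per column}}, \qquad \text{for example } 2 \cdot 2 \cdot 2 \cdot 2 = 16 .$$

In the 9 × 4 example matrix, the rows contain 1, 2, 1, 2, 4, 2, 1, 2, and 1 nonzero entries, which sum to 16.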

Although constructing the Toeplitz matrix takes some time, the Toeplitz matrix for a given convolution kernel has to be constructed only once; the matrix multiplication can then be applied directly to all input time-domain images to obtain the convolution results. For datasets with large training and test sets, the convolution time is therefore greatly reduced.

### T-CNN model classification

When a CNN model is used for classification, its fully connected layers perform convergence operations, and a loss function must be specified. In this paper, the Triplet network is introduced into the loss function, and the T-CNN model is proposed.

Let the sample set of m samples be \(\left\{ \left( x^{(1)}, y^{(1)} \right), \left( x^{(2)}, y^{(2)} \right), \ldots, \left( x^{(m)}, y^{(m)} \right) \right\}\), with n classes among these samples, where \(y^{(i)}\) represents the expected output of \(x^{(i)}\). The loss function of CNNs is shown in Eq. (13):

$$R\left( \omega ,b \right) = \frac{1}{m}\sum\limits_{i = 1}^{m} \left( \frac{1}{2}\left\| p_{\omega ,b} \left( x^{(i)} \right) - y^{(i)} \right\|^{2} \right)$$

(13)

where \(\omega\) is the weight of each neuron, \(b\) is the bias, and \(p_{\omega ,b} \left( x^{(i)} \right)\) is the actual output for the sample. The CNN model continually adjusts the parameters \(\omega\) and \(b\) during training to minimize \(R\left( \omega ,b \right)\). Equation (13) is the square loss function of the traditional convolutional neural network model; it considers only the class of the image itself and not the differences between classes. We therefore improve it below.
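Read as the mean over samples of half the squared distance between actual and expected outputs, the square loss can be evaluated as in the following sketch. The output and target vectors are hypothetical, and the function name is not from the paper.

```python
def square_loss(outputs, targets):
    """Mean over samples of half the squared distance between output and target."""
    m = len(outputs)
    total = 0.0
    for p, y in zip(outputs, targets):
        total += 0.5 * sum((pk - yk) ** 2 for pk, yk in zip(p, y))
    return total / m


# Hypothetical network outputs and one-hot expected outputs for two samples.
outputs = [[0.8, 0.2], [0.4, 0.6]]
targets = [[1.0, 0.0], [0.0, 1.0]]
loss = square_loss(outputs, targets)
```

The first sample contributes 0.04 and the second 0.16, so the loss is about 0.1. Each sample is compared only with its own expected output, which is the limitation noted above.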

The CNN uses the gradient descent method to adjust the parameters \(\omega\) and \(b\) so as to minimize \(R(\omega ,b)\), as shown in Eqs. (14) and (15):

$$\omega_{ij} = \omega_{ij} - a\frac{\partial }{\partial \omega_{ij}}R\left( \omega ,b \right)$$

(14)

$$b_{ij} = b_{ij} - a\frac{\partial }{\partial b_{ij}}R\left( \omega ,b \right)$$

(15)

where a is the learning rate and \(R\left( \omega ,b \right)\) is the CNN loss function. Equations (14) and (15) update the network parameters \(\omega\) and \(b\) by gradient descent. In other words, the iteration settles on the values of \(\omega\) and \(b\) at which the derivative of the loss function is 0.
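The update rule can be illustrated on a deliberately small stand-in for the network. The sketch below assumes a one-weight linear model \(p = \omega x + b\) with the square loss, which is not the paper's CNN; the data and function names are hypothetical.

```python
def gradients(w, b, xs, ys):
    """Partial derivatives of R = (1/m) * sum(0.5 * (w*x + b - y)**2)."""
    m = len(xs)
    residuals = [w * x + b - y for x, y in zip(xs, ys)]
    grad_w = sum(r * x for r, x in zip(residuals, xs)) / m
    grad_b = sum(residuals) / m
    return grad_w, grad_b


def gradient_descent(xs, ys, a=0.1, steps=1000):
    """Repeat the updates w <- w - a*dR/dw and b <- b - a*dR/db."""
    w, b = 0.0, 0.0
    for _ in range(steps):
        grad_w, grad_b = gradients(w, b, xs, ys)
        w, b = w - a * grad_w, b - a * grad_b
    return w, b


# Hypothetical data generated by y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
w, b = gradient_descent(xs, ys)
```

With this data the iteration approaches `w = 2` and `b = 1`, the point where both partial derivatives vanish.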

To improve classification accuracy, the Triplet network is introduced into the CNN loss function as a constraint, and a T-CNN model based on the Triplet loss function is proposed. The idea of the T-CNN model is to input three time-domain images at a time, two from the same class and one from another class. Through training, the T-CNN model obtains the features of the time-domain images, from which it computes the feature difference function \(L_{1}\) of the two images from the same class and the feature difference function \(L_{2}\) of two images from different classes. \(L_{1}\) and \(L_{2}\) are then used to adjust the parameters of the T-CNN model. They are given in Eqs. (16) and (17), respectively:

$$L_{1} = \frac{1}{2}\left\| p_{\omega ,b}^{\left( l_{1} \right)} - p_{\omega ,b}^{\left( l_{2} \right)} \right\|^{2}$$

(16)

$$L_{2} = \frac{1}{2}\min \left\| n_{\omega ,b}^{\left( l \right)} - p_{\omega ,b}^{\left( l_{i} \right)} \right\|^{2} ,\quad \left( i = 1,2 \right)$$

(17)

where \(p_{\omega ,b}^{\left( l_{i} \right)}\) is the output for a sample of the same class and \(n_{\omega ,b}^{\left( l \right)}\) is the output for the sample of the different class. The two feature difference functions are combined in Eq. (18):

$$L_{T} = \max \left( 0, L_{1} - L_{2} + \gamma \right)$$

(18)

where \(\gamma\) represents the minimum margin between the between-class and within-class feature difference functions (set to 0.1 in this paper). In the experiments, \(\gamma\) was set to 0.01, 0.05, 0.1, 0.2, and 0.5 in turn for comparison, and 0.1 gave the best result. In each backpropagation iteration, \(L_{T}\) gradually approaches zero. As shown in Fig. 4, when the feature difference function \(L_{1}\) of images of the same class is larger than the feature difference function \(L_{2}\) of images of different classes minus the parameter \(\gamma\), \(L_{T}\) is greater than zero, and the model is adjusted by backpropagation to make \(L_{1}\) smaller and \(L_{2}\) larger. Reference 21 has verified that the Triplet loss function brings samples of the same class close to each other and pushes samples of different classes far apart.

In Fig. 4, A and P belong to the same class, while N belongs to a different class. Before the adjustment, the distance between A and P is larger than that between A and N, and the difference function \(L_{T}\) is greater than zero, so the model parameters need to be adjusted by backpropagation. After the adjustment, the distance between A and N becomes larger, while the distance between A and P becomes smaller.
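Equations (16) to (18) can be evaluated directly on three output vectors. The sketch below uses hypothetical two-dimensional outputs to mimic the "before" and "after" situations of Fig. 4; the function names are not from the paper.

```python
def half_sq_dist(u, v):
    """Half the squared Euclidean distance between two output vectors."""
    return 0.5 * sum((a - b) ** 2 for a, b in zip(u, v))


def triplet_terms(p1, p2, n, gamma=0.1):
    """Return (L1, L2, LT) for two same-class outputs p1, p2 and one other-class output n."""
    l1 = half_sq_dist(p1, p2)                             # Eq. (16)
    l2 = min(half_sq_dist(n, p1), half_sq_dist(n, p2))    # Eq. (17)
    l_t = max(0.0, l1 - l2 + gamma)                       # Eq. (18)
    return l1, l2, l_t


# Hypothetical outputs: the other-class sample sits between the same-class pair.
before = triplet_terms([0.0, 0.0], [1.0, 0.0], [0.5, 0.5])

# Hypothetical outputs after training: the pair is close, the other class is far.
after = triplet_terms([0.0, 0.0], [0.2, 0.0], [2.0, 2.0])
```

In the first case \(L_{1}\) exceeds \(L_{2} - \gamma\), so \(L_{T}\) is positive and an adjustment is triggered. In the second case the margin is satisfied and \(L_{T}\) is zero.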

According to Eqs. (16) and (17), in each backpropagation iteration \(L_{1}\) makes the feature difference within the same class smaller, while \(L_{2}\) makes the feature difference between different classes larger. On this basis, a Triplet loss function is proposed, as shown in Eq. (19):

$$L\left( \omega ,b \right) = R\left( \omega ,b \right) + \alpha L_{1} - \beta L_{2}$$

(19)

where \(R\left( \omega ,b \right)\) denotes the CNN square loss function, and \(\alpha\) and \(\beta\) are weight coefficients greater than zero. In the experiments, we tested several values of \(\alpha\) and \(\beta\): \(\alpha\) was set to 0.1, 0.01, 0.3, 0.4, and so on, and \(\beta\) to 0.9, 0.99, 0.7, 0.6, and so on. After several experiments, \(\alpha = 0.4\) and \(\beta = 0.6\) gave the best results. \(L_{1}\) is the feature difference function of the same class, and \(L_{2}\) is the feature difference function of different classes. The new backpropagation update for each layer is therefore as follows:

$$\omega_{ij} = \omega_{ij} - a\frac{\partial }{\partial \omega_{ij}}L\left( \omega ,b \right)$$

(20)

$$b_{ij} = b_{ij} - a\frac{\partial }{\partial b_{ij}}L\left( \omega ,b \right)$$

(21)
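As a small numerical illustration of Eq. (19), the combined loss can be computed from the three terms with the reported coefficients. The input values are hypothetical and the function name is not from the paper.

```python
def combined_loss(r, l1, l2, alpha=0.4, beta=0.6):
    """L = R + alpha * L1 - beta * L2, with the coefficients reported as best."""
    return r + alpha * l1 - beta * l2


# Hypothetical values: square loss 0.1, same-class term 0.5, different-class term 0.25.
loss_before = combined_loss(0.1, 0.5, 0.25)

# A smaller L1 and a larger L2 both lower the combined loss.
loss_after = combined_loss(0.1, 0.02, 3.62)
```

Because \(L_{1}\) enters with a positive sign and \(L_{2}\) with a negative sign, descending the gradient of \(L\) in Eqs. (20) and (21) pulls same-class outputs together and pushes different-class outputs apart.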

The T-CNN model based on the Triplet network adds the within-class feature difference function and the between-class feature difference function to the CNN loss function \(R\left( \omega ,b \right)\), which helps the parameters extract more discriminative features more quickly during weight adjustment. The partial derivatives of \(L\left( \omega ,b \right)\) drive the backpropagation residual calculation that yields the new parameters \(\omega\) and \(b\). Each iteration follows the gradient descent direction more closely, which makes the model converge faster and improves classification efficiency.

The T-CNN model used in this paper has the following structure: a 5 × 5 convolution with 128 neurons in the first layer, a 5 × 5 convolution with 128 neurons in the second layer, a 2 × 2 max pooling layer in the third layer, a 3 × 3 convolution with 256 neurons in the fourth layer, a 3 × 3 convolution with 256 neurons in the fifth layer, a 2 × 2 max pooling layer in the sixth layer, and a fully connected layer with 1024 neurons in the seventh layer. The loss function is the Triplet-based loss function, and the activation function is the sigmoid function, whose range is (0, 1). Figure 5 shows the model structure, and the T-CNN time series classification algorithm is given in Algorithm 1.
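The layer list can be written down as data and used to trace feature-map sizes. Several details below are assumptions rather than facts from the paper: a 64 × 64 single-channel input, stride 1 and no padding in the convolutions, and non-overlapping pooling windows. The names `LAYERS` and `trace_shapes` are hypothetical.

```python
# (kind, kernel size, number of neurons) for the seven layers described above.
LAYERS = [
    ("conv", 5, 128),
    ("conv", 5, 128),
    ("maxpool", 2, None),
    ("conv", 3, 256),
    ("conv", 3, 256),
    ("maxpool", 2, None),
    ("fc", None, 1024),
]


def trace_shapes(size, channels=1):
    """Output shape after each layer, assuming stride-1 unpadded convolutions."""
    shapes = []
    for kind, k, units in LAYERS:
        if kind == "conv":
            size, channels = size - k + 1, units
            shapes.append((size, size, channels))
        elif kind == "maxpool":
            size = size // k
            shapes.append((size, size, channels))
        else:  # fully connected layer
            shapes.append((units,))
    return shapes


shapes = trace_shapes(64)   # assumed 64 x 64 Gram time-domain image
```

Under these assumptions the feature maps shrink from 60 × 60 to 12 × 12 before the fully connected layer. A different input size or padding choice changes the intermediate sizes but not the layer order.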