# Data augmentation

Data augmentation is a technique in machine learning used to reduce overfitting[1] by training models on several slightly modified copies of existing data.

## Data augmentation for signal processing

Residual or block bootstrap can be used for time series augmentation.
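As an illustration of the block bootstrap, the following is a minimal sketch of the moving-block variant, assuming a one-dimensional series stored in a NumPy array. The function name `moving_block_bootstrap` and the parameter `block_length` are hypothetical names chosen for this example; the residual bootstrap is not shown.

```python
import numpy as np


def moving_block_bootstrap(series, block_length, rng):
    """Resample a 1-D series by concatenating randomly chosen overlapping blocks.

    Hypothetical helper for illustration: values inside each block keep their
    original order, so short-range temporal dependence is preserved.
    """
    series = np.asarray(series)
    n = series.shape[0]
    if not 1 <= block_length <= n:
        raise ValueError("block_length must be between 1 and len(series)")
    # Number of blocks needed to cover n samples (ceiling division).
    n_blocks = -(-n // block_length)
    # Valid block start indices are 0 .. n - block_length (inclusive).
    starts = rng.integers(0, n - block_length + 1, size=n_blocks)
    blocks = [series[s:s + block_length] for s in starts]
    # Trim the concatenation back to the original length.
    return np.concatenate(blocks)[:n]


rng = np.random.default_rng(0)
t = np.arange(200)
signal = np.sin(2 * np.pi * t / 50) + 0.1 * rng.standard_normal(200)
augmented = moving_block_bootstrap(signal, block_length=20, rng=rng)
```

The block length is a design choice: longer blocks preserve more of the temporal structure but yield less variety between resampled series.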

### Biological signals

Synthetic data augmentation is particularly important for machine learning classification of biological data, which tend to be high-dimensional and scarce. Applications of robotic control and augmentation in disabled and able-bodied subjects still rely mainly on subject-specific analyses. Data scarcity is notable in signal processing problems such as Parkinson's disease electromyography (EMG) signals, which are difficult to source. Zanini et al. noted that a generative adversarial network (in particular, a DCGAN) can be used to perform style transfer in order to generate synthetic electromyographic signals corresponding to those exhibited by people with Parkinson's disease.[2]

These approaches are also important in electroencephalography (EEG; brainwaves). Wang et al. explored the use of deep convolutional neural networks for EEG-based emotion recognition; their results show that emotion recognition improved when data augmentation was used.[3]

A common approach is to generate synthetic signals by rearranging components of real data. Lotte[4] proposed a method of "Artificial Trial Generation Based on Analogy", in which three data examples $x_1, x_2, x_3$ are used to form an artificial example $x_{\text{synthetic}}$ that is to $x_3$ what $x_2$ is to $x_1$. A transformation is applied to $x_1$ to make it more similar to $x_2$; the same transformation is then applied to $x_3$, which generates $x_{\text{synthetic}}$. This approach was shown to improve the performance of a linear discriminant analysis classifier on three different datasets.
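The analogy idea can be sketched with a deliberately simplified transformation. The example below assumes the transformation is a plain additive offset, so the synthetic example is $x_3 + (x_2 - x_1)$; this is an assumption made for illustration only and is not the transformation used in the cited work. The function name `analogy_trial` is hypothetical.

```python
import numpy as np


def analogy_trial(x1, x2, x3):
    """Form a synthetic example that is to x3 what x2 is to x1.

    Simplifying assumption (not the cited method): the transformation that
    moves x1 toward x2 is the additive offset (x2 - x1), which is then
    applied unchanged to x3.
    """
    x1, x2, x3 = (np.asarray(a, dtype=float) for a in (x1, x2, x3))
    if not (x1.shape == x2.shape == x3.shape):
        raise ValueError("all three examples must have the same shape")
    return x3 + (x2 - x1)


# Three toy "trials" of two samples each.
x_synthetic = analogy_trial([1.0, 2.0], [2.0, 4.0], [10.0, 10.0])
```

Under this additive assumption, choosing $x_3 = x_1$ simply returns $x_2$, which is a quick sanity check that the analogy holds.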

Research shows that relatively simple techniques can have a large effect. For example, Freer[5] observed that introducing noise into gathered data to form additional data points improved the learning ability of several models which otherwise performed relatively poorly. Tsinganos et al.[6] studied the approaches of magnitude warping, wavelet decomposition, and synthetic surface EMG models (generative approaches) for hand gesture recognition, finding classification performance increases of up to 16% when augmented data was introduced during training. More recently, data augmentation studies have begun to focus on deep learning, and more specifically on the ability of generative models to create artificial data which is then introduced during training of the classification model. In 2018, Luo et al.[7] observed that useful EEG signal data could be generated by conditional Wasserstein generative adversarial networks (GANs); the generated data was then added to the training set in a classical train-test learning framework. The authors found that classification performance improved when such techniques were introduced.
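Two of the simpler techniques mentioned above, noise injection and magnitude warping, can be sketched as follows. This is a minimal illustration under stated assumptions rather than the procedure of any cited study: the noise is assumed to be zero-mean Gaussian, and the warping curve is piecewise linear (published variants often use a smooth spline). The names `add_gaussian_noise`, `magnitude_warp`, `noise_std`, `n_knots`, and `sigma` are hypothetical.

```python
import numpy as np


def add_gaussian_noise(x, noise_std, rng):
    """Return a copy of x with zero-mean Gaussian noise added to every sample."""
    x = np.asarray(x, dtype=float)
    return x + rng.normal(0.0, noise_std, size=x.shape)


def magnitude_warp(x, n_knots, sigma, rng):
    """Scale a 1-D signal by a slowly varying random curve centred on 1.0.

    Assumption for this sketch: the curve is obtained by linear interpolation
    between n_knots random values drawn from N(1, sigma^2).
    """
    x = np.asarray(x, dtype=float)
    n = x.shape[0]
    knot_positions = np.linspace(0, n - 1, n_knots)
    knot_values = rng.normal(1.0, sigma, size=n_knots)
    curve = np.interp(np.arange(n), knot_positions, knot_values)
    return x * curve


rng = np.random.default_rng(0)
trial = np.sin(np.linspace(0, 4 * np.pi, 256))
noisy = add_gaussian_noise(trial, noise_std=0.05, rng=rng)
warped = magnitude_warp(trial, n_knots=6, sigma=0.2, rng=rng)

# The augmented copies are added to the training data alongside the original.
train_set = np.stack([trial, noisy, warped])
```

Augmented copies would normally be generated from training examples only, so that the test set still reflects unmodified data.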

### Mechanical signals

The prediction of mechanical signals based on data augmentation supports technological innovation in areas such as new energy dispatch, 5G communications, and robotics control engineering.[8] In 2022, Yang et al.[8] integrated constraints, optimization and control into a deep network framework based on data augmentation and data pruning with spatio-temporal data correlation, improving the interpretability, safety and controllability of deep learning in real industrial projects through explicit mathematical programming equations and analytical solutions.