Birds play an important role in nature. Knowing which birds inhabit a specific area helps us understand its ecology and evaluate its environmental quality, which is of great significance to the protection of the natural environment. Bird recognition helps us get along better with nature and also gives researchers a new perspective for maintaining ecological balance and monitoring ecosystems. As the language of birds, birdsong is an important physiological feature, and the songs of different species differ greatly [1,2]. Bird recognition therefore focuses on birdsong. Many researchers have collected birdsong signals and carried out extensive research on them. Typically, feature parameters are extracted from the birdsong signals with various techniques, and machine learning algorithms are then used to classify and recognize the songs. Extracting accurate feature parameters and choosing a better learning algorithm both play a key role in the classification results.
Feature parameters commonly used in birdsong recognition include Mel-frequency cepstral coefficients (MFCC), short-time energy (STE), linear predictive cepstral coefficients (LPCC) and linear predictive coding (LPC). For example, Wang et al. took eight bird species as the research object, divided the recordings into tweets and songs, extracted MFCC feature parameters from each, and used a dual Gaussian mixture model for training and recognition [3]. Xu et al. studied birdsong recognition based on syllable length, MFCC, and an LPC-based dynamic time warping (DTW) model, combined with time–frequency texture features and multi-label classifiers. With eleven bird species as the research object, they selected the optimal feature parameters and classifiers to improve on the recognition performance of a single classifier [4].
Classical classifiers include random forests, decision trees, Gaussian mixture models, neural networks and the extreme learning machine (ELM). ELM is a randomized fast learning algorithm with good generalization ability [5,6], and its use for classification and recognition problems has become a research hotspot. For example, Xue et al. classified and recognized power quality events based on the wavelet transform and ELM; their method effectively recognizes eight kinds of disturbances and is strongly robust [7]. Lin et al. used ELM to assist in the diagnosis of Alzheimer’s disease, reaching a diagnostic accuracy of 87.62% [8]. Venkatalakshmi et al. extracted a feature set from breast X-ray images and combined it with an ELM classifier to classify the images as normal, benign or malignant; the accuracy, sensitivity and specificity of the method were better than those of similar techniques [9]. Kashif et al. proposed an ELM-based consonant phoneme recognition model for recognizing the accents of native Arabic speakers pronouncing English consonant phonemes; the accuracy of the model reached 88% [10].
ELM generally generates its input layer weights and hidden layer thresholds at random and then obtains the output weights by calculation. There is no uniform rule for selecting these parameters, so the optimal values can only be found through a large amount of training, which is time-consuming because of the computational complexity. Even then, the final result may not be the optimal solution, and the performance of the classifier is unstable [11]. It is therefore necessary to optimize the ELM parameters with an intelligent algorithm so that the classifier achieves better results. Differential evolution (DE) is a population-based random search algorithm [12,13] that searches through mutation and crossover and ensures that the best individuals are retained for further use. With fast convergence and good global search performance, DE is one of the most powerful and general-purpose evolutionary optimizers for continuous parameter spaces. For example, to reduce the prediction time of ELM and avoid falling into local optima, Yang et al. proposed a differential evolution coral reef optimization algorithm that hybridizes DE with the metaheuristic coral reef optimization to balance exploration and exploitation and achieve better performance [14]. Dahou et al. combined DE with convolutional neural networks (CNN) to solve the problem of Arabic sentiment classification, using the DE algorithm to optimize the CNN parameters. Experiments showed that DE-CNN performs well in terms of accuracy and time consumption [15]. Li et al. used principal component analysis to reduce the dimensionality of the input features and a sequential floating backward algorithm to perform feature selection, and then fed the optimal feature set into a differential evolution ELM to evaluate the transient stability of a power system. Compared with other ELMs, this model greatly improved performance in transient stability classification [16].
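The ELM training scheme described above (random input weights and thresholds, output weights obtained by calculation) can be illustrated with a minimal sketch. It assumes a sigmoid activation, a uniform initialization range of [-1, 1], and a pseudoinverse solution for the output weights; the function names `train_elm` and `predict_elm` are hypothetical and are not taken from the paper.

```python
import numpy as np

def train_elm(X, y_onehot, n_hidden, rng):
    """Fit a single-hidden-layer ELM; only the output weights are solved for."""
    n_features = X.shape[1]
    # Input weights and hidden thresholds are drawn at random and never trained
    # (assumed range [-1, 1]; the source does not state one).
    W = rng.uniform(-1.0, 1.0, size=(n_features, n_hidden))
    b = rng.uniform(-1.0, 1.0, size=n_hidden)
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))  # sigmoid hidden-layer output
    # Output weights: least-squares solution via the Moore-Penrose pseudoinverse.
    beta = np.linalg.pinv(H) @ y_onehot
    return W, b, beta

def predict_elm(X, W, b, beta):
    """Return the predicted class index for each row of X."""
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))
    return np.argmax(H @ beta, axis=1)
```

Because `W` and `b` depend on the random generator, two runs with different seeds can give noticeably different classifiers, which is the instability that motivates optimizing these parameters.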
However, the standard DE algorithm is prone to premature convergence and search stagnation [17], and many scholars have therefore worked on improving it. For example, Singh et al. used multi-objective DE to tune the initial parameters of a CNN, and the optimized CNN effectively classified chest CT images for COVID-19 [18]. A memetic differential evolution algorithm was proposed for text clustering; it improved the mutation strategy of DE, hybridized it with a memetic algorithm, and outperformed existing text clustering algorithms in terms of AUC, F-measure and statistical analysis [19]. Vivekanandan et al. used “DE/rand/2/exp” as the differential strategy to select the optimal features of cardiovascular disease and predicted heart disease with a fuzzy analytic hierarchy process and a feedforward neural network; the accuracy of the model reached 83% [20]. Duan et al. combined the “DE/best/2” and “DE/rand/2” mutation operators into a dual strategy and dynamically adjusted the control factor \(\uplambda\) during evolution, which significantly improved global optimization performance [21]. However, these works all adopt a single or dual mutation strategy for DE, and all use a single classifier for classification. To improve population diversity, convergence speed and global search ability, this paper proposes a DE algorithm with multi-strategy mutation and builds a classification model that ensembles multiple DE-optimized ELMs to enhance recognition performance and generalization ability.
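For readers unfamiliar with the strategy names quoted above, the following sketch shows the textbook forms of the “DE/rand/2” and “DE/best/2” mutation operators together with a binomial crossover. It is not the multi-strategy mutation proposed in this paper, whose strategies are not specified in this section. The function names are hypothetical, minimization is assumed, and the population must contain at least six individuals.

```python
import numpy as np

def mutate_rand_2(pop, i, F, rng):
    """DE/rand/2: random base vector plus two scaled difference vectors."""
    candidates = [k for k in range(len(pop)) if k != i]
    r1, r2, r3, r4, r5 = rng.choice(candidates, size=5, replace=False)
    return pop[r1] + F * (pop[r2] - pop[r3]) + F * (pop[r4] - pop[r5])

def mutate_best_2(pop, fitness, i, F, rng):
    """DE/best/2: best individual as base plus two scaled difference vectors."""
    best = pop[np.argmin(fitness)]  # minimization is an assumption
    candidates = [k for k in range(len(pop)) if k != i]
    r1, r2, r3, r4 = rng.choice(candidates, size=4, replace=False)
    return best + F * (pop[r1] - pop[r2]) + F * (pop[r3] - pop[r4])

def binomial_crossover(target, mutant, CR, rng):
    """Take each gene from the mutant with probability CR; force at least one."""
    mask = rng.random(target.size) < CR
    mask[rng.integers(target.size)] = True
    return np.where(mask, mutant, target)
```

A random base vector favors exploration, while the best individual as base speeds up convergence at the cost of diversity, which is why several works combine the two.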
This paper takes birdsong as the research object. First, the MFCC feature parameters of the birdsong data are extracted, and a differential calculation is performed on the MFCC to preserve the time-domain continuity of the audio signal [22]. Second, three mutation strategies are combined into a multi-strategy mutation, and the control parameters (scaling factor F and crossover probability CR) are adjusted adaptively, to improve the standard DE. The improved DE is then used to tune the input layer weights and hidden layer thresholds of the ELM. Finally, the optimized ELM models are combined into an ensemble to classify birdsong. This model better addresses the unstable performance of the ELM classifier and the difficulty of determining the optimal number of hidden layer neurons in birdsong recognition.
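The differential calculation on MFCC can be sketched as follows. This section does not give the exact formula, so the sketch assumes the widely used regression-based delta with a window of N = 2 frames and edge padding; the function name `delta` and the input layout (frames by coefficients) are likewise assumptions.

```python
import numpy as np

def delta(features, N=2):
    """First-order delta of a (frames, coeffs) matrix via the regression formula."""
    T = features.shape[0]
    # Repeat the first and last frames so every frame has N neighbours on each side.
    padded = np.pad(features, ((N, N), (0, 0)), mode="edge")
    denom = 2.0 * sum(n * n for n in range(1, N + 1))
    out = np.zeros_like(features, dtype=float)
    for n in range(1, N + 1):
        # padded[N + t + n] is frame t+n and padded[N + t - n] is frame t-n.
        out += n * (padded[N + n : N + n + T] - padded[N - n : N - n + T])
    return out / denom
```

Given a hypothetical MFCC matrix `mfcc`, the static and differential features could then be stacked with `np.hstack([mfcc, delta(mfcc)])` so that each frame carries both its spectrum shape and its local trend.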
The main contributions of this paper can be summarized as follows:
Adopt multi-strategy mutation in the DE algorithm (M-SDE) to improve population diversity and global search ability;
Use the M-SDE algorithm to optimize the input layer weights and hidden layer thresholds of the ELM model;
Extract the differential MFCC feature parameters of birdsong, and build an ensemble of optimized ELMs (M-SDE_EnELM) to improve model stability and recognition accuracy.
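As an illustration of how an ensemble of classifiers can produce a single decision, the sketch below combines the label predictions of several models by majority voting. The combination rule is an assumption made for illustration, since this section does not state how M-SDE_EnELM merges its member outputs; the function name `majority_vote` is hypothetical.

```python
import numpy as np

def majority_vote(predictions, n_classes):
    """Combine per-model label predictions of shape (n_models, n_samples)."""
    predictions = np.asarray(predictions)
    n_samples = predictions.shape[1]
    votes = np.zeros((n_samples, n_classes), dtype=int)
    for model_preds in predictions:
        # Each model adds one vote per sample to the class it predicted.
        votes[np.arange(n_samples), model_preds] += 1
    return np.argmax(votes, axis=1)  # ties go to the lowest class index
```

Averaging over several independently initialized models reduces the influence of any one unlucky random initialization, which is the usual argument for ensembling unstable classifiers.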
The rest of this paper is organized as follows. First, ELM and differential evolution are described. Second, the multi-strategy differential evolution algorithm is proposed. Third, the MFCC feature extraction process for birdsong and the birdsong recognition model based on an ensemble of ELMs with multi-strategy differential evolution are introduced. Fourth, the experimental results and limitations are discussed. Finally, conclusions are given.