Feature Selection Algorithm Based on Multi Strategy Grey Wolf Optimizer

Abstract. Feature selection is an important part of data mining, image recognition and other fields. The efficiency and accuracy of a classification algorithm can be improved by selecting the best feature subset. Classical feature selection techniques have some limitations, and heuristic optimization algorithms offer an alternative way to overcome these limitations and find the optimal solution. In this paper, we propose a Multi Strategy Grey Wolf Optimizer algorithm (MSGWO) for feature selection, based on random guidance, local search and subgroup cooperation strategies. It addresses the tendency of the traditional grey wolf optimizer (GWO), which relies on a single search strategy, to fall into local optima. The random guidance strategy exploits randomness to enhance the global search ability of the population; the local search strategy lets grey wolf individuals make full use of the search space around the current best solution; and the subgroup cooperation strategy balances the global and local search of the algorithm during the iterative process. In MSGWO, the three strategies cooperate to update the positions of grey wolf individuals, which enhances both their global and local search ability. Experimental results show that MSGWO can quickly find the optimal feature combination and effectively improve the performance of the classification model.


Introduction
Supported by the National Natural Science Foundation of China (No. 61673396) and the Natural Science Foundation of Shandong Province, China (No. ZR2017MF032).
Feature selection plays an important role in machine learning, data mining and other classification applications. Its goal is to remove the noise in the original data and select the most discriminative features. In addition, feature selection can improve the efficiency of classification by reducing the dimension of the original data. In recent years, more and more heuristic search algorithms have been used for feature selection. A heuristic search algorithm evaluates a group of candidate solutions at a time and can obtain good results at a lower time and computational cost. Many researchers have studied heuristic search algorithms. The Genetic Algorithm (GA) is an evolutionary algorithm that searches randomly and finds the optimal solution based on the evolutionary laws of nature [1]. Particle Swarm Optimization (PSO) is a classical swarm intelligence optimization algorithm based on research into the foraging behavior of birds. Each solution is regarded as a particle with a specific position, fitness and velocity vector; its direction and speed of motion are adjusted according to the global optimal solution and the best solution found by the particle itself, so that it gradually approaches the optimal solution [2]. The Whale Optimization Algorithm (WOA) is a heuristic optimization algorithm that simulates the predatory behavior of humpback whales in nature. Its main difference from other swarm optimization algorithms is that WOA simulates the bubble-net attack of whales by following the best or a random individual and by using a spiral mechanism [3]. The Grey Wolf Optimizer (GWO) is a newer evolutionary algorithm that simulates the predatory behavior of a grey wolf pack and performs optimized search through the process of tracking, encircling, chasing and attacking prey [4]. Because GWO has a simple principle, few parameters to adjust, is easy to implement and has strong global search ability, research on it has made remarkable progress. Emary et al. first applied GWO to feature selection in 2015 and proposed two binary GWO feature selection methods based on different update mechanisms [5].
The GWO algorithm cannot effectively find the globally optimal feature combination because of its single search strategy and insufficient global search ability [6]. Therefore, to improve the effectiveness of GWO for feature selection, this paper proposes a Multi Strategy Grey Wolf Optimizer algorithm (MSGWO), which solves the problem caused by a single search strategy and improves the accuracy of the original GWO. MSGWO includes three different search strategies: the random guidance strategy, the local search strategy and the sub group cooperation strategy. With these three strategies, the grey wolf optimizer can further improve its search efficiency and find the optimal feature combination.

Grey Wolf Optimizer (GWO)
GWO is an intelligent optimization algorithm proposed by Mirjalili [4] in 2014. Owing to its simple principle, few parameters to adjust, simple implementation and strong global search ability, the method is becoming more and more popular, and much research has been carried out using GWO [7][8][9][10][11]. The algorithm is inspired by the predatory behavior of grey wolves, and it performs optimization by searching for, encircling and attacking prey. Grey wolves follow a strict hierarchy: α, β, δ and ω represent different grades of grey wolves, and dominance decreases from top to bottom. To model the social system of grey wolves mathematically, α is regarded as the optimal solution, and β and δ are regarded as the second-best and third-best solutions, respectively. They lead the other wolves toward the probable position of the prey. ω represents the rest of the solutions, which are updated according to the positions of α, β and δ. Three definitions of the algorithm [4] are given below.

Definition 1 Distance between Grey Wolf and Prey

$$\vec{D} = \left| \vec{C} \cdot \vec{X}_p(t) - \vec{X}(t) \right| \tag{1}$$

$$\vec{C} = 2 \vec{r}_1 \tag{2}$$

where $t$ indicates the current iteration, $\vec{X}_p$ is the position vector of the prey, $\vec{X}(t)$ is the current position vector of the grey wolf, $\vec{r}_1$ is a random vector in [0, 1], and $\vec{C}$ is a coefficient vector. The search space can be explored and exploited by randomly enhancing (C > 1) or weakening (C < 1) the distance between prey and grey wolf.

Definition 2 Update position of Grey Wolf

$$\vec{X}(t+1) = \vec{X}_p(t) - \vec{A} \cdot \vec{D} \tag{3}$$

$$\vec{A} = 2 \vec{a} \cdot \vec{r}_2 - \vec{a} \tag{4}$$

where the components of $\vec{a}$ decrease linearly from 2 to 0 over the iterations and $\vec{r}_2$ is a random vector in [0, 1]. As A decreases, half of the iterations are used for exploring (|A| > 1) and the rest for exploiting (|A| < 1).

Definition 3 Determine position of prey

In the abstract search space, the exact position of the prey (the optimal solution) is not known. According to the hierarchy of grey wolves, hunting is usually guided by α, β and δ. It is therefore assumed that α (the best candidate solution), β (the second-best candidate solution) and δ (the third-best candidate solution) have better knowledge of the position of the prey, that is, that they are the wolves closest to it. By preserving the three best solutions obtained in each iteration, the orientation of the prey can be determined from their positions, and the other grey wolf individuals are forced to update their positions according to these three solutions. The mathematical description of grey wolf individuals tracking the prey is as follows:

$$\vec{D}_\alpha = \left| \vec{C}_1 \cdot \vec{X}_\alpha - \vec{X} \right|, \quad \vec{D}_\beta = \left| \vec{C}_2 \cdot \vec{X}_\beta - \vec{X} \right|, \quad \vec{D}_\delta = \left| \vec{C}_3 \cdot \vec{X}_\delta - \vec{X} \right| \tag{5}$$

$$\vec{X}_1 = \vec{X}_\alpha - \vec{A}_1 \cdot \vec{D}_\alpha, \quad \vec{X}_2 = \vec{X}_\beta - \vec{A}_2 \cdot \vec{D}_\beta, \quad \vec{X}_3 = \vec{X}_\delta - \vec{A}_3 \cdot \vec{D}_\delta \tag{6}$$

$$\vec{X}(t+1) = \frac{\vec{X}_1 + \vec{X}_2 + \vec{X}_3}{3} \tag{7}$$

The distances between a grey wolf individual and α, β and δ, together with the corresponding candidate positions, are calculated by formulas (5) and (6). The direction in which the individual moves toward the prey is then determined by formula (7).

Multi Strategy Grey Wolf Optimizer algorithm
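As a baseline for the strategies described in this section, the standard GWO update of formulas (5) to (7) can be sketched in Python as follows. This is a minimal illustration rather than the authors' implementation: the function name `gwo_step`, the list-of-lists population, and the convention that a larger fitness value is better are assumptions made for the example, and the population is assumed to contain at least three wolves.

```python
import random


def gwo_step(wolves, fitness, a, rng=random):
    """One standard GWO iteration: every wolf moves toward alpha, beta and delta.

    wolves  -- list of positions, each a list of floats
    fitness -- callable scoring a position (assumed: larger is better)
    a       -- control parameter, decreased linearly from 2 to 0 by the caller
    """
    # The three best wolves act as alpha, beta and delta.
    leaders = sorted(wolves, key=fitness, reverse=True)[:3]
    new_wolves = []
    for x in wolves:
        new_x = []
        for d in range(len(x)):
            candidates = []
            for leader in leaders:
                A = 2 * a * rng.random() - a        # coefficient A, shrinks with a
                C = 2 * rng.random()                # coefficient C in [0, 2)
                D = abs(C * leader[d] - x[d])       # distance to this leader
                candidates.append(leader[d] - A * D)  # position suggested by this leader
            # The new coordinate is the mean of the three suggestions.
            new_x.append(sum(candidates) / len(candidates))
        new_wolves.append(new_x)
    return new_wolves
```

When `a` reaches 0 the coefficient A vanishes, so every wolf lands exactly on the mean of the three leaders; larger values of `a` leave room for exploration around them.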

Random guidance strategy
In GWO, α, β and δ lead ω toward promising regions in the search for the optimal solution. However, updating positions only by following the current best solutions easily leads to premature convergence at the current best position, so that GWO falls into a local optimum. In this paper, we randomly select the position of one grey wolf, $\vec{X}_{rand}$, and let the other individuals update their positions according to $\vec{X}_{rand}$. In the mathematical description of this strategy, $\vec{D}_{rand}$ is the distance between a grey wolf individual and $\vec{X}_{rand}$, $\vec{X}(t+1)$ is the updated position, and $\vec{C}_g$ is a random vector. This strategy allows grey wolf individuals that have converged prematurely to jump out of the local optimum, expands the global search range of the population, and increases the probability of finding the global optimum.
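The random guidance update can be sketched in Python as follows. The sketch rests on two assumptions that go beyond the description above: that the update mirrors the encircling rule of standard GWO with the randomly chosen wolf taking the place of the prey, and that the random vector C_g is drawn as twice a uniform random number. The function name `random_guidance_step` is likewise hypothetical, so this should be read as one plausible reading, not as the authors' exact formulation.

```python
import random


def random_guidance_step(wolves, a, rng=random):
    """Move every wolf relative to a randomly chosen member of the population.

    wolves -- list of positions, each a list of floats
    a      -- control parameter, decreased from 2 to 0 by the caller
    """
    new_wolves = []
    for x in wolves:
        guide = rng.choice(wolves)                  # X_rand: a randomly selected wolf
        new_x = []
        for d in range(len(x)):
            A = 2 * a * rng.random() - a            # assumed: same A as in standard GWO
            C_g = 2 * rng.random()                  # assumed form of the random vector C_g
            D_rand = abs(C_g * guide[d] - x[d])     # distance to the random guide
            new_x.append(guide[d] - A * D_rand)     # updated coordinate
        new_wolves.append(new_x)
    return new_wolves
```

Because the guide is not necessarily one of the best wolves, individuals are pulled toward regions that the leaders have not reached, which is the diversity effect the strategy relies on.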

Local search strategy
In the whale optimization algorithm, whales approach the prey through the shrinking encirclement mechanism while moving along a spiral path through the spiral update mechanism, which expands their local search range. Inspired by the whale optimization algorithm, this paper improves the single update mechanism of the grey wolf position, so that grey wolf individuals explore the surrounding solutions while moving toward the optimal solution, which greatly expands the range of local search. In the mathematical description of the strategy, $\vec{D}$ is the mean distance between a grey wolf individual and α, β and δ, $b$ is a constant defining the shape of a logarithmic spiral, and $l$ is a random number in [-1, 1].
The local search strategy makes the grey wolf move along the spiral path as well as within the shrinking circle. To simulate these two simultaneous behaviors, we assume that the shrinking encirclement mechanism and the spiral update mechanism are each chosen with probability 0.5 to update the position of a grey wolf. In the mathematical description of this choice, $p$ is a random number in [0, 1].
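The choice between the two mechanisms can be sketched in Python as follows. Several details are assumptions made for the example rather than taken from the text: the spiral is centred on α (by analogy with WOA, which spirals around the best solution), the shrinking branch reuses the standard three-leader GWO update, the mean distance is computed per coordinate, and `b` defaults to 1. The function name `local_search_step` is hypothetical.

```python
import math
import random


def local_search_step(x, alpha, beta, delta, a, b=1.0, rng=random):
    """Update one wolf by shrinking encirclement or by a spiral move (p < 0.5 or not)."""
    p = rng.random()                 # selects the mechanism
    l = rng.uniform(-1.0, 1.0)       # spiral parameter in [-1, 1]
    new_x = []
    for d in range(len(x)):
        leaders = (alpha[d], beta[d], delta[d])
        if p < 0.5:
            # Shrinking encirclement: assumed to be the standard GWO update.
            candidates = []
            for leader in leaders:
                A = 2 * a * rng.random() - a
                C = 2 * rng.random()
                candidates.append(leader - A * abs(C * leader - x[d]))
            new_x.append(sum(candidates) / 3)
        else:
            # Spiral update: mean distance to the leaders, scaled along a
            # logarithmic spiral and added to alpha (assumed spiral centre).
            D = sum(abs(leader - x[d]) for leader in leaders) / 3
            new_x.append(D * math.exp(b * l) * math.cos(2 * math.pi * l) + alpha[d])
    return new_x
```

Since the spiral term is proportional to the mean distance D, the step shrinks as a wolf closes in on the leaders, which keeps the search concentrated around the current best region.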

Sub group cooperation strategy
To give full play to the advantages of the random guidance strategy and the local search strategy, expand the search space of the algorithm as much as possible, and help the algorithm jump out of local optima, this paper proposes a sub group cooperation strategy. The basic idea is as follows. During the evolutionary process, the population is divided into three subgroups A, B and C according to fitness value: A contains the individuals with large fitness values, B those with medium fitness values, and C those with poor fitness values. The large fitness values in subgroup A indicate a high degree of convergence, so these individuals easily fall into a local optimum. They therefore update their positions according to formula (13), so that they search more finely around the extreme point and find positions with better fitness than before, which enhances the local search ability. Individuals in subgroup B have medium fitness values and are updated according to formulas (5)-(7) of the standard grey wolf algorithm. Individuals in subgroup C have poor fitness values and are updated according to formulas (8)-(10); through the random guidance strategy, this subgroup covers as many candidate solutions as possible, which enhances the global search ability. The individuals in subgroups A, B and C evolve according to their own update strategies, and after every iteration each individual migrates to the subgroup corresponding to its new fitness value, until the termination condition is satisfied.
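The division into subgroups can be sketched in Python as follows. The text does not state how large each subgroup is, so the split into three roughly equal parts is an assumption, as are the function name `split_into_subgroups` and the convention that a larger fitness value is better.

```python
def split_into_subgroups(wolves, fitness):
    """Rank the population by fitness and cut it into subgroups A, B and C.

    Assumed for illustration: larger fitness is better, and each subgroup
    holds about one third of the population (the remainder goes to C).
    """
    ranked = sorted(wolves, key=fitness, reverse=True)
    third = len(ranked) // 3
    group_a = ranked[:third]             # large fitness: local search strategy
    group_b = ranked[third:2 * third]    # medium fitness: standard GWO update
    group_c = ranked[2 * third:]         # poor fitness: random guidance strategy
    return group_a, group_b, group_c
```

Calling this function again after every iteration reproduces the migration step: a wolf whose fitness has improved or deteriorated simply falls into a different slice of the ranking and is updated by that subgroup's strategy in the next iteration.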

Multi Strategy Grey Wolf Optimizer algorithm (MSGWO)
In this paper, we combine the random guidance strategy, the local search strategy, the sub group cooperation strategy and the standard grey wolf algorithm to propose a Multi Strategy Grey Wolf Optimizer algorithm. In the continuous MSGWO, each individual can change its position to any point in the space. Because the purpose of this paper is to use MSGWO for feature selection, the value of each dimension of an individual's position is limited to 0 or 1: 0 means that the feature at the corresponding position is not selected, and 1 means that it is selected. Thus, the updating formula of MSGWO for feature selection converts the continuous position into a binary one by means of a sigmoid function. To describe the MSGWO algorithm more intuitively, its flowchart is shown in Fig. 1.
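The conversion from a continuous position to a binary feature mask can be sketched in Python as follows. The sketch assumes the plain logistic sigmoid and a stochastic threshold drawn per dimension; the paper's own sigmoid may differ, for example in scaling or offset, and the names `sigmoid` and `binarize` are chosen for the example only.

```python
import math
import random


def sigmoid(v):
    """Numerically stable logistic function (assumed form of the sigmoid)."""
    if v >= 0:
        return 1.0 / (1.0 + math.exp(-v))
    e = math.exp(v)
    return e / (1.0 + e)


def binarize(position, rng=random):
    """Map a continuous position to a 0/1 mask: 1 selects the feature, 0 drops it."""
    # Assumed rule: a feature is selected when the sigmoid of its coordinate
    # is at least as large as a fresh uniform random number.
    return [1 if sigmoid(v) >= rng.random() else 0 for v in position]


# Example: indices of the selected features for one wolf.
mask = binarize([2.3, -1.7, 0.4, -0.2], rng=random.Random(0))
selected = [i for i, bit in enumerate(mask) if bit == 1]
```

The resulting mask has one entry per feature, so `selected` can be used directly to pick the columns of the dataset that are passed to the classifier.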
As can be seen from Fig. 1, the local search strategy enables grey wolf individuals to make full use of the search space around the current best solution; it has good local search ability and helps to find more accurate solutions. The random guidance strategy exploits randomness to improve the diversity of the population and thus enhances the global search ability. In addition to these two strategies, the sub group cooperation strategy balances the global search and the local search during the iterative process. It not only guarantees the convergence speed of the algorithm, but also expands the search range of the population and prevents the algorithm from stagnating locally in the later stage of the iteration. Because convergence is slow in the early stage and fast in the later stage, the time complexity of MSGWO is the same as that of GWO.

Experiment
To demonstrate the effectiveness of the proposed MSGWO for feature selection, we evaluated it on the Vehicle, Wine, Glass, Zoo, Landsat and Segment public datasets from the UCI repository. Table 1 lists the details of the six datasets used in the evaluation. We used the classical K-Nearest-Neighbor (KNN) classifier and the Support Vector Machine (SVM) as the benchmark classifiers, and compared the performance on the six public datasets of our proposed MSGWO method with that of several classical feature selection methods, including analysis of variance (ANOVA) [12], principal component analysis (PCA) [13], PSO [2], WOA [3] and GWO; the case without feature selection is recorded as NFS. In the experiments, KNN and SVM adopt default parameters: for KNN, k is set to 5; for SVM, the penalty parameter c is set to 1 and the Gaussian kernel is used as the kernel function. We ran five-fold cross validation 5 times, and the numbers of grey wolves in GWO, particles in PSO and whales in WOA were all set to 15.
Table 2 and Table 3 list the average accuracy and F1 values of five-fold cross validation on SVM and KNN, respectively. From Table 2 and Table 3, we can see that the classification effect of NFS is very poor, which shows the necessity of feature selection. The performance of ANOVA and PCA is not high, because the filter selection method needs to manually specify the number of selected features, and the most appropriate number of features is difficult to determine. The heuristic methods achieve good feature selection results. The performance of the standard GWO algorithm varies across datasets: it is sometimes better and sometimes worse than that of PSO and WOA. The MSGWO proposed in this paper is better than GWO, PSO and WOA on all the datasets listed in this paper in terms of accuracy and F1 value, which demonstrates the effectiveness of MSGWO.
Because the heuristic method is a random search method, the results of each search may differ. To verify that the MSGWO method proposed in this paper not only selects the optimal feature combination but also has good stability, the results of repeated runs are compared; only the accuracy is analyzed here. Fig. 2 and Fig. 3 show box plots of the accuracy over five random runs of the four heuristic feature selection algorithms on the SVM and KNN classifiers. From Fig. 2 and Fig. 3, it can be concluded that, whether SVM or KNN is used as the base classifier, the maximum, minimum and average accuracy of the MSGWO algorithm are superior to those of the other three heuristic methods, which further shows that MSGWO can effectively improve the performance of the classifier. In addition, by introducing the  4, it can be seen that the accuracy of MSGWO is relatively stable and its standard deviation is the smallest on almost all datasets, which shows that MSGWO is stable.

Conclusion
In this paper, we propose a Multi Strategy Grey Wolf Optimizer algorithm based on random guidance, local search and sub group cooperation strategies for feature selection. In MSGWO, the search agent updates its position through the cooperation of three search strategies, which improves the global and local search ability of the algorithm. MSGWO not only retains the fast convergence of the GWO algorithm, but also makes full use of the characteristics of the various search strategies and balances the global and local search ability, which makes it easier to find the optimal feature subset. We compared a variety of feature selection methods on six public datasets. The results show that our MSGWO feature selection method improves the accuracy of the search and finds the optimal solution, and that it is an efficient and reliable algorithm.

Fig. 2. Comparison of the accuracy of heuristic feature selection methods on SVM

Fig. 3. Comparison of the accuracy of heuristic feature selection methods on KNN

Table 1. Details of the six datasets used in the evaluation

Table 2. Comparison of classification performance between various feature selection algorithms on SVM

Table 3. Comparison of classification performance between various feature selection algorithms on KNN