Random Forest Band Selection
What is Random Forest? Can RF measure variable importance? How can bands be selected with Random Forest?
Random forest (RF) is a popular machine learning algorithm built on classification and regression trees (CART). It is an ensemble model-building strategy whose estimators predict the response variable while minimizing the prediction error (the classification error, in the classification case). In RF, many binary decision trees are built, each from a bootstrap sample of the training observations. Unlike CART, random forest performs no pruning, so all trees in the forest are maximal trees. At each node, a subset of variables is randomly chosen, and the best split is computed only within that subset. Two key parameters strongly affect the results: ntree (the number of trees in the forest) and mtry (the number of variables randomly selected as split candidates at each node).
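The two sources of randomness described above can be sketched in a few lines. This is only an illustration in plain Python, not the implementation used in this study: the function names `bootstrap_split` and `candidate_bands` are hypothetical, the values chosen for `ntree` and `mtry` are arbitrary, and no trees are actually grown. The sample and band counts (63 spectra, 256 bands) are those of the data set used in this study.

```python
import random

def bootstrap_split(n_samples, rng):
    """Draw one bootstrap sample (with replacement).

    Returns the in-bag indices used to grow a tree and the
    out-of-bag indices that the tree never sees.
    """
    in_bag = [rng.randrange(n_samples) for _ in range(n_samples)]
    chosen = set(in_bag)
    out_of_bag = [i for i in range(n_samples) if i not in chosen]
    return in_bag, out_of_bag

def candidate_bands(n_bands, mtry, rng):
    """Pick mtry distinct band indices to be tried at one node split."""
    return sorted(rng.sample(range(n_bands), mtry))

rng = random.Random(0)
n_samples, n_bands = 63, 256   # sizes of the data set in this study
ntree, mtry = 500, 16          # illustrative values only, not tuned

oob_sizes = []
for _ in range(ntree):
    in_bag, oob = bootstrap_split(n_samples, rng)
    oob_sizes.append(len(oob))

mean_oob_fraction = sum(oob_sizes) / (ntree * n_samples)
print(f"mean out-of-bag fraction over {ntree} trees: {mean_oob_fraction:.3f}")
print("bands tried at one node:", candidate_bands(n_bands, mtry, rng))
```

On average a little over one third of the spectra are left out of each bootstrap sample, since the chance that a given spectrum is never drawn in n draws is (1 - 1/n)^n, which approaches 1/e as n grows.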
In addition, the out-of-bag (OOB) error is central to assessing the performance of an RF model. It serves as an estimate of the prediction error (for regression) or of the misclassification rate (for classification), computed for each tree on the samples that were left out of the bootstrap sample used to build that tree. By randomly permuting the values of a variable in the OOB samples and measuring the resulting increase in error, an importance index can be calculated for each variable; these indices are then used to select the optimal variable subset. Although RF has been increasingly used for variable selection in statistics, how to apply it to hyperspectral data still needs to be investigated.
Hyperspectral band selection by RF follows the same process as other common variable selection tasks. A preliminary classification of the data is indispensable, because RF is trained on labeled samples. In this study, the data consisted of 63 mineral spectra belonging to 7 types. The spectra, labeled with their mineral types, are fed into the RF-based variable selection. Because suitable values of the input parameters “ntree” and “mtry” are not known in advance, the RF process is iterated until a satisfactory error is reached. With the optimal parameters, RF is applied to the data, and the selected variables and the OOB error are obtained simultaneously. If the error is too high, the input parameters need to be re-adjusted. The next step is to subset the spectra to the selected bands and use them to map the minerals. To assess the result, the original 256-band data are also used for mineral mapping without any reduction. Detailed validation is presented in the results analysis.
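The permutation idea behind the importance index, followed by the band-subsetting step, can be sketched as follows. This is a minimal illustration under loud assumptions, not the procedure used in this study: a simple nearest-centroid classifier stands in for the random forest, a separate held-out set stands in for the OOB samples, the six-band spectra are synthetic, and all function and variable names are hypothetical.

```python
import random

def nearest_centroid_fit(X, y):
    """Stand-in classifier (NOT a random forest): one mean spectrum per class."""
    sums, counts = {}, {}
    for row, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(row))
        for j, v in enumerate(row):
            acc[j] += v
        counts[label] = counts.get(label, 0) + 1
    return {c: [s / counts[c] for s in acc] for c, acc in sums.items()}

def predict(centroids, row):
    """Assign the class whose centroid is closest (squared Euclidean distance)."""
    return min(centroids,
               key=lambda c: sum((a - b) ** 2 for a, b in zip(row, centroids[c])))

def error_rate(centroids, X, y):
    wrong = sum(predict(centroids, row) != label for row, label in zip(X, y))
    return wrong / len(y)

def permutation_importance(centroids, X, y, rng, n_repeats=10):
    """Importance of band j = mean increase in error after shuffling column j."""
    base = error_rate(centroids, X, y)
    importances = []
    for j in range(len(X[0])):
        increase = 0.0
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            increase += error_rate(centroids, X_perm, y) - base
        importances.append(increase / n_repeats)
    return importances

rng = random.Random(42)

def make_spectra(n_per_class):
    """Synthetic 6-band spectra in which only band 2 separates the two classes."""
    X, y = [], []
    for label, level in (("mineral_A", 0.0), ("mineral_B", 1.0)):
        for _ in range(n_per_class):
            row = [rng.gauss(0.5, 0.1) for _ in range(6)]
            row[2] = rng.gauss(level, 0.1)
            X.append(row)
            y.append(label)
    return X, y

X_train, y_train = make_spectra(20)
X_held, y_held = make_spectra(20)   # plays the role of the OOB samples

model = nearest_centroid_fit(X_train, y_train)
importances = permutation_importance(model, X_held, y_held, rng)

# Keep the k most important bands and subset the spectra accordingly.
k = 2
ranked = sorted(range(len(importances)), key=lambda j: importances[j], reverse=True)
selected_bands = sorted(ranked[:k])
X_reduced = [[row[j] for j in selected_bands] for row in X_held]
print("selected bands:", selected_bands)
```

In an actual RF run the permutation is done per tree on that tree's own OOB samples and the increases in error are averaged over the forest; the stand-in above only shows why shuffling an informative band raises the error while shuffling an uninformative one does not.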
Figure 3. RF-based band selection approaches
NSSA Band Selection
The n-dimensional spectral solid angle (NSSA) method is used here only for comparison with RF. NSSA calculates the n-dimensional spectral solid angle among a set of endmembers (spectra), rather than the traditional spectral angle (SA), which measures the similarity between two endmembers. NSSA has been shown, through hyperspectral unmixing, to select effective bands and capture the absorption features of spectra (Minghua, 2014). Although it is well programmed, the NSSA method is too time-consuming.
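For reference, the traditional pairwise spectral angle that NSSA generalizes can be computed as below. This sketch covers only the two-endmember SA, treating each spectrum as a vector and taking the arccosine of the normalized dot product; it does not implement NSSA itself, and the function name `spectral_angle` and the example spectra are hypothetical.

```python
import math

def spectral_angle(a, b):
    """Angle in radians between two spectra treated as vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    # Clamp to [-1, 1] to guard against floating-point rounding before acos.
    cos_theta = max(-1.0, min(1.0, dot / (norm_a * norm_b)))
    return math.acos(cos_theta)

spectrum = [0.12, 0.30, 0.45, 0.28]           # made-up reflectance values
brighter = [2.0 * v for v in spectrum]        # same shape, different brightness
different = [0.40, 0.10, 0.05, 0.50]

print(spectral_angle(spectrum, brighter))     # near zero: SA ignores overall scaling
print(spectral_angle(spectrum, different))    # larger angle: dissimilar shapes
```

A small angle means the two spectra have similar shapes regardless of overall brightness.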