RF-based band selection results
A tentative RF-based band selection is applied to the hyperspectral data. The first step is to determine suitable values of “ntree” and “mtry” through iterative tests with adjusted input parameters. The error reported by RF guides the choice of “ntree” (Fig. 7 b). The default value of “mtry” (for classification) is the square root of the number of variables, whereas the optimal value in this study was found to be 80: tests over a range of values show a turning point at 80, beyond which the error no longer fluctuates.
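The tuning step can be illustrated with a minimal Python sketch. It is not the code used in this study: the function names, the tolerance, the band count of 224 and the error values are all assumptions chosen for illustration. It computes the square-root default for “mtry” and finds the smallest tested value after which the error stays close to its final level.

```python
import math

def default_mtry(n_variables):
    """Square-root rule for classification: floor(sqrt(p)), at least 1."""
    return max(1, math.isqrt(n_variables))

def stabilisation_point(values, errors, tol=0.005):
    """Smallest tested value from which the error stays within `tol`
    of the error at the last tested value."""
    final = errors[-1]
    chosen = values[-1]
    # Walk backwards from the last value until the error leaves the band.
    for value, error in zip(reversed(values), reversed(errors)):
        if abs(error - final) > tol:
            break
        chosen = value
    return chosen

# Hypothetical OOB errors for a grid of mtry values (illustrative only,
# not results from this study).
mtry_grid = [10, 20, 40, 60, 80, 100, 120]
oob_error = [0.120, 0.095, 0.070, 0.052, 0.040, 0.041, 0.040]

print(default_mtry(224))                          # 224 bands is an assumption
print(stabilisation_point(mtry_grid, oob_error))
```

The same helper could be applied to an error curve over “ntree”, since both parameters are chosen from the point where the error stops changing.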
The variable importance is then obtained from RF (Fig. 7 c). Three R packages are used to select bands: “randomForest”, “VarSelRF” and “FSelector”. All three are RF-based; they differ only in how variable importance is calculated. The bands output by “VarSelRF”, which measures importance by eliminating variables according to the OOB error, best maximize the spectral contrast, as shown in the spectral plot (Fig. 8 a). According to the OOB error trend generated by RF (Fig. 7 d), 30 bands should remain after selection. The optimal band set is therefore: "204,203,202,201,76,3,205,200,157,159,158,2,160,161,75,198,196,197,199,195,4,1,162,223,169,221,194,156,166".
Figure 7. RF-based band selection and relevant analysis
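Elimination of variables guided by the OOB error can be sketched as a generic backward-elimination loop. The Python code below is a simplified illustration and not the “VarSelRF” implementation: the drop fraction, the `oob_error_fn` hook (standing in for refitting a forest on each subset) and the toy error function are assumptions.

```python
import math

def backward_elimination(ranked_bands, oob_error_fn, drop_fraction=0.2, min_bands=2):
    """Repeatedly drop the least important bands and record the error.

    ranked_bands : band ids ordered from most to least important
    oob_error_fn : callable(list_of_bands) -> error estimate; a hypothetical
                   hook that would refit a forest on the given subset
    """
    history = []
    current = list(ranked_bands)
    while len(current) >= min_bands:
        history.append((list(current), oob_error_fn(current)))
        n_drop = max(1, math.floor(drop_fraction * len(current)))
        if len(current) - n_drop < min_bands:
            break
        current = current[:-n_drop]  # remove the tail of the ranking
    return history

def smallest_good_subset(history, tolerance=0.0):
    """Smallest recorded subset whose error is within `tolerance` of the best."""
    best = min(error for _, error in history)
    eligible = [bands for bands, error in history if error <= best + tolerance]
    return min(eligible, key=len)

# Toy setting (assumption): bands 1-3 carry the signal, the rest add noise.
informative = {1, 2, 3}

def toy_error(bands):
    hits = len(informative.intersection(bands))
    return 0.30 - 0.08 * hits + 0.001 * len(bands)

history = backward_elimination(list(range(1, 11)), toy_error)
selected = smallest_good_subset(history)
print([len(bands) for bands, _ in history])
print(selected)
```

Plotting the recorded errors against the subset size gives a curve of the kind used here to decide how many bands to retain.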
RF selection compared with NSSA
NSSA is a band selection method based on the “N-dimensional spectral solid angle”. Its principle is to calculate the solid angles of n spectra over contiguous bands and rank them as a measure of band importance (Minghua, 2014). Although NSSA detects subtle features across the full band range in detail, it is time-consuming because of the computational burden of n-dimensional angles, and becomes even more costly when the number of spectra exceeds 10. Random Forest compensates for this time cost because it is highly efficient on high-dimensional data.
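The idea of scoring contiguous bands by the angle between spectra can be illustrated with a much simpler stand-in. The Python sketch below does not compute the n-dimensional solid angle; it uses the mean pairwise spectral angle within a sliding window of contiguous bands, and the window width and toy spectra are assumptions.

```python
import math
from itertools import combinations

def spectral_angle(a, b):
    """Angle in radians between two spectra treated as vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    cosine = max(-1.0, min(1.0, dot / norm))  # guard against rounding
    return math.acos(cosine)

def window_scores(spectra, width):
    """Mean pairwise angle for every window of `width` contiguous bands."""
    n_bands = len(spectra[0])
    pairs = list(combinations(range(len(spectra)), 2))
    scores = []
    for start in range(n_bands - width + 1):
        cut = [s[start:start + width] for s in spectra]
        total = sum(spectral_angle(cut[i], cut[j]) for i, j in pairs)
        scores.append(total / len(pairs))
    return scores

# Toy spectra (assumption): identical in the first bands, different at the end.
spectra = [
    [0.5, 0.5, 0.5, 0.9, 0.2],
    [0.5, 0.5, 0.5, 0.2, 0.9],
    [0.5, 0.5, 0.5, 0.5, 0.5],
]
scores = window_scores(spectra, width=2)
best_start = scores.index(max(scores))
print(best_start)
```

Even this pairwise version evaluates every pair of spectra in every window, which gives a sense of why an angle-based ranking becomes expensive as the number of spectra grows.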
Even though RF overlooks some features and may be less accurate than NSSA, the spectral subset selected by RF can still discriminate these minerals. In a general comparison of the two methods (Fig. 8), the bands selected by NSSA describe the spectral features better, particularly at longer wavelengths. These distinctive features represent the 7 mineral types to some extent. However, this higher accuracy comes at a large time cost: on the same data size, RF needs less than 5 minutes, whereas NSSA takes 2 hours. RF therefore offers adequate accuracy at much higher efficiency.
A prospect of this analysis is to combine RF and NSSA for band selection. For example, the NSSA threshold could be defined according to the number of bands selected by RF. This combined strategy could be investigated in future research.
Figure 8. Band selection method comparison: RF and NSSA
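The suggested combination, in which the number of bands selected by RF fixes the NSSA threshold, could look like the following Python sketch. The function names and the scores are hypothetical; the scores stand in for per-band NSSA values and are not taken from this study.

```python
def threshold_for_band_count(scores, n_bands):
    """Score of the n_bands-th best band; bands scoring at least this are kept."""
    if not 1 <= n_bands <= len(scores):
        raise ValueError("n_bands must be between 1 and the number of scores")
    return sorted(scores, reverse=True)[n_bands - 1]

def select_above(scores, threshold):
    """1-based band numbers whose score reaches the threshold."""
    return [i + 1 for i, s in enumerate(scores) if s >= threshold]

# Hypothetical per-band scores (illustrative only).
nssa_scores = [0.10, 0.42, 0.05, 0.37, 0.29, 0.08]
n_rf_bands = 3  # assumed number of bands kept by RF in this toy case

threshold = threshold_for_band_count(nssa_scores, n_rf_bands)
print(threshold, select_above(nssa_scores, threshold))
```

If several bands tie at the threshold, this rule keeps all of them, so the result can contain slightly more bands than RF selected.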
Evaluation by mineral mapping
Mineral mapping illustrates the performance of RF-based band selection. Mapping with the spectral subset is fast because the data are reduced, and precise because band selection decreases the correlation between contiguous bands. The difference between the mapping results gives a general indication of how well RF performed. Some texture and shadow were detected after band selection. Nevertheless, a quantitative evaluation is more objective for verifying that RF can discriminate similar minerals using a small portion of the spectrum. For this reason, test data of pure endmembers are mapped with and without band selection. The mapped mineral abundance values from the 30 RF-selected bands (Fig. 9 a) are more concentrated around the baseline (true value 1) than those from the original mapping (Fig. 9 b). This demonstrates that band selection clearly improves mineral discrimination.
Figure 9. Mapping result evaluation: full bands and selected bands
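One way to put a number on how closely the mapped abundances concentrate around the true value of 1 is the root-mean-square deviation from that value. The Python sketch below uses the RMSE as an assumed metric, and the abundance values are invented for illustration; they are not the results shown in Fig. 9.

```python
import math

def rmse_from_truth(abundances, truth=1.0):
    """Root-mean-square deviation of mapped abundances from the true value."""
    squared = [(a - truth) ** 2 for a in abundances]
    return math.sqrt(sum(squared) / len(squared))

# Hypothetical abundances of pure endmember test pixels (illustrative only).
full_bands = [0.82, 1.15, 0.74, 1.21, 0.90]
selected_bands = [0.96, 1.03, 0.94, 1.05, 0.99]

rmse_full = rmse_from_truth(full_bands)
rmse_selected = rmse_from_truth(selected_bands)
print(round(rmse_full, 3), round(rmse_selected, 3))
```

A smaller value for the selected bands than for the full bands would correspond to abundances lying closer to the baseline.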
In conclusion, Random Forest is an effective band selection method for mineral mapping and analogous applications. The first part of the results shows that RF is feasible for hyperspectral band selection, since this automatic machine learning method objectively captures the spectral features of similar targets. In the second analysis, RF shows an advantage in efficiency over NSSA. Lastly, RF band selection improves the accuracy of mineral mapping, as indicated by a quantitative comparison between full bands and selected bands.