Abstract: Species distribution models (SDMs) are used in applications such as conservation planning and assessing the impact of climate change on species distributions. An SDM establishes a correlative relationship between occurrences of a target species and environmental conditions (both bioclimatic and anthropogenic). This relationship is then projected across the entire environmental space to predict the potential distribution of the target species. In the present study, we first review widely used SDMs and summarize the approaches used to evaluate them. SDMs can be classified into two categories according to the data required to construct the correlative relationship: those that predict the potential distribution of a species from presence-only records (PO models), and those that use presence-absence records (PA models). If reliable absence records are available, PA models are recommended. SDMs can also be classified by output format: those that give predictions as continuous probabilities (the higher the probability, the more suitable the location for the species), and those that give predictions as binary values (1 for suitable and 0 for unsuitable). Corresponding to these output formats, SDM performance can be evaluated by threshold-independent strategies (for models with continuous probabilities) and threshold-dependent strategies (for models with binary predictions). Threshold-independent strategies include calculating maximum overall accuracy, maximum kappa, maximum vertical distance, area under the receiver operating characteristic (ROC) curve (AUC), Gini index, point biserial correlation coefficient, mean square error, root mean square error, coefficient of determination, mean absolute prediction error, and others. 
Threshold-dependent strategies include calculating sensitivity, specificity, positive predictive value, negative predictive value, positive likelihood ratio, negative likelihood ratio, true skill statistic, odds ratio, Yule's Y, Yule's Q, Phi coefficient, Kappa, normalized mutual information, extreme dependency score, and others. Based on existing problems in the development, application, and evaluation of SDMs, the present study makes the following suggestions. (1) Because biased presence samples can influence predictions, integrating a sample selection module into the SDM could improve the reliability of model prediction. (2) Bioclimatic variables (such as those from WorldClim) that are calculated using a Digital Elevation Model (DEM) may be collinear, and such collinearity may result in overfitting when modeling the potential distribution of a species. Selecting variables on the basis of the Variance Inflation Factor (VIF) is therefore a suitable means of avoiding overfitting. (3) In addition to abiotic factors, biotic factors are important determinants of species distribution. The use of biotic variables could therefore improve model results, although biotic factors are not easy to delineate in geographic space. (4) The spatial and temporal extrapolation of SDMs, which address the potential distribution of species over different geographic ranges and at different time points (past and/or future), respectively, assumes an equilibrium relationship between the target species and environmental conditions. This assumption is open to challenge, however, because species are capable of adaptation and dispersal. (5) The Partial AUC (PAUC) is suitable for evaluating the performance of a single model, and the Akaike Information Criterion (AIC) could provide an objective comparison of the performance of several SDMs.
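
As an illustration of the two evaluation strategies, the following minimal sketch computes four of the threshold-dependent measures listed above (sensitivity, specificity, true skill statistic, and Kappa) from a binary prediction, and the threshold-independent AUC from continuous scores. The function names, the toy presence-absence data, and the 0.5 threshold are assumptions made for this example only; they are not taken from any particular SDM package. The sketch also assumes that both presences and absences occur in the evaluation data.

```python
def confusion_counts(observed, predicted):
    """Count (tp, fp, fn, tn) for two equal-length sequences of 0/1 values."""
    pairs = list(zip(observed, predicted))
    tp = sum(1 for o, p in pairs if o == 1 and p == 1)
    fp = sum(1 for o, p in pairs if o == 0 and p == 1)
    fn = sum(1 for o, p in pairs if o == 1 and p == 0)
    tn = sum(1 for o, p in pairs if o == 0 and p == 0)
    return tp, fp, fn, tn


def threshold_dependent_metrics(observed, predicted):
    """Sensitivity, specificity, TSS and Cohen's kappa for a binary prediction."""
    tp, fp, fn, tn = confusion_counts(observed, predicted)
    n = tp + fp + fn + tn
    sensitivity = tp / (tp + fn)   # proportion of presences predicted as suitable
    specificity = tn / (tn + fp)   # proportion of absences predicted as unsuitable
    tss = sensitivity + specificity - 1
    agreement = (tp + tn) / n      # observed agreement
    # Agreement expected by chance, from the row and column totals.
    chance = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / n ** 2
    kappa = (agreement - chance) / (1 - chance)
    return {"sensitivity": sensitivity, "specificity": specificity,
            "tss": tss, "kappa": kappa}


def auc(observed, scores):
    """AUC as the probability that a random presence outscores a random absence.

    Ties count as half a win. This pairwise form is quadratic in the number
    of records, which is acceptable for a small illustration.
    """
    presences = [s for o, s in zip(observed, scores) if o == 1]
    absences = [s for o, s in zip(observed, scores) if o == 0]
    wins = sum(1.0 if p > a else 0.5 if p == a else 0.0
               for p in presences for a in absences)
    return wins / (len(presences) * len(absences))


# Toy data (hypothetical): 1 = presence, 0 = absence, with model probabilities.
observed = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.6, 0.3, 0.7, 0.4, 0.2, 0.1]

threshold = 0.5  # assumed cut-off; the threshold-dependent results change with it
predicted = [1 if s >= threshold else 0 for s in scores]

metrics = threshold_dependent_metrics(observed, predicted)
auc_value = auc(observed, scores)
print(metrics, auc_value)
```

Changing `threshold` alters every value in `metrics` but leaves `auc_value` untouched, which is the practical difference between the two families of measures.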
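
Suggestion (2) can be sketched in the same spirit. The VIF of predictor j is 1 / (1 - R_j^2), where R_j^2 is obtained by regressing predictor j on all the others, and it equals the j-th diagonal entry of the inverse of the predictors' correlation matrix. The example below uses that identity and then drops the predictor with the largest VIF until all remaining values fall below a cut-off. The variable names, the synthetic data, and the cut-off of 10 (a common rule of thumb) are assumptions for illustration; the study itself does not prescribe a particular procedure or threshold.

```python
import random
from statistics import mean


def correlation_matrix(columns):
    """Pearson correlation matrix for a list of equal-length numeric columns."""
    centered = [[v - mean(col) for v in col] for col in columns]
    norms = [sum(v * v for v in c) ** 0.5 for c in centered]
    k = len(columns)
    return [[sum(a * b for a, b in zip(centered[i], centered[j]))
             / (norms[i] * norms[j]) for j in range(k)] for i in range(k)]


def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination with pivoting."""
    k = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(k)]
           for i, row in enumerate(matrix)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("predictors are perfectly collinear")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(k):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [row[k:] for row in aug]


def variance_inflation_factors(columns):
    """VIF of each column: the diagonal of the inverse correlation matrix."""
    inverse = invert(correlation_matrix(columns))
    return [inverse[j][j] for j in range(len(columns))]


def select_by_vif(named_columns, threshold=10.0):
    """Repeatedly drop the predictor with the largest VIF above the threshold."""
    kept = dict(named_columns)
    while len(kept) > 1:
        names = list(kept)
        vifs = variance_inflation_factors([kept[n] for n in names])
        worst = max(range(len(names)), key=lambda i: vifs[i])
        if vifs[worst] <= threshold:
            break
        del kept[names[worst]]
    return list(kept)


# Synthetic predictors (hypothetical names): temp_warmest is temp_mean plus
# a little noise, so the two are nearly collinear; precip is independent.
rng = random.Random(0)
temp_mean = [rng.gauss(0, 1) for _ in range(200)]
precip = [rng.gauss(0, 1) for _ in range(200)]
temp_warmest = [t + 0.05 * rng.gauss(0, 1) for t in temp_mean]

predictors = {"temp_mean": temp_mean, "precip": precip,
              "temp_warmest": temp_warmest}
vifs = variance_inflation_factors(list(predictors.values()))
kept = select_by_vif(predictors)
print(vifs, kept)
```

Which of the two temperature variables is dropped depends on small numerical differences in their VIF values; in practice that choice is usually guided by ecological relevance rather than left to the algorithm.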