multivariate adaptive regression spline
in journals of the Geography group -
Identifying the factors that drive forest fires is now of great importance, because large areas of the world's forests are destroyed by fire every year, and the recurrence of such events can, in the long term, inflict irreparable damage on the Earth and its inhabitants. By identifying these factors we can determine the times and locations with a high fire risk and counter the triggers of fire through effective laws and management policies, public education, and closer monitoring. This study sought to identify the factors affecting fires in the Golestan forest, using three methods for this purpose: multiple linear regression, logistic regression, and multivariate adaptive regression spline, each in combination with a genetic algorithm. The results showed that both biophysical and human factors influence fires in the study area. Of these, only minimum temperature and maximum wind speed were found to be effective in all three cases. The multivariate adaptive regression spline performed better than the other two methods. The normalized RMSE values were 0.4291 for multiple linear regression, 0.9416 for logistic regression, and 0.1757 for multivariate adaptive regression spline, and the corresponding R2 values were 0.9862, 0.9912, and 0.9886, respectively.
Keywords: Forest fire, Multivariate adaptive regression spline, Multiple linear regression, Logistic regression, Genetic algorithm
Introduction
Determining the factors that influence fire is very important, because large areas of forest around the world are destroyed by fire every year, and its recurrence can in the long term cause irreparable damage to the Earth and its inhabitants. Knowing these factors helps us identify the most dangerous locations and times for forest fire. Hence, many of the driving factors of forest fire can be countered through law enforcement, efficient forest management policies, and closer supervision. In the current study, we identified the factors affecting fire in the Golestan forest using three different methods, namely multiple linear regression, logistic regression, and multivariate adaptive regression spline, each integrated with a genetic algorithm.
Study Area
Golestan Province is in the north of Iran, and 18% of it is covered by forests. It is a tourist destination, several roads pass through its forests, and according to statistical records most fires occurred close to these roads. Our study area is located at 36°53′-37°25′N and 55°5′-55°50′E and covers about 3719.5 km2. We selected this area because it includes most of the fires that have occurred in Golestan Province in recent years.
Materials and Methods
A large fire occurred on 12 December 2010 in our study area, and we used it as the dependent variable. The actual burnt area and other data, such as the Digital Elevation Model (DEM), the road network, the rivers, the land uses, and the soil types in the area, were provided by the Golestan Province Department of Natural Resources. The geographic coordinates of the synoptic weather stations near the area and their data for December 2010, including maximum, minimum, and mean temperature, total rainfall, and maximum wind speed and azimuth, were obtained from the National Meteorological Organization of Iran.
The land use and soil layers were at a scale of 1:100000 and the road and river layers at 1:5000; all of them date from 2006. The DEM of the region was generated from 1:25000 topographic maps of the Iran National Cartographic Center with a spatial resolution of 30 m, and we derived the slope and aspect layers from it in ArcGIS at the same resolution. The roads and rivers were in vector format, so we used Euclidean Distance analysis to generate rasters in which each cell holds the distance to the nearest road or river.
Initially we had only 5 weather stations, which is very few for GWR. We therefore generated 1000 random points in the area and interpolated the station data to these points using the Ordinary Kriging method with an exponential semivariogram model at 30 m resolution in ArcGIS.
The multiple linear regression (MLR) model is the generalization of simple linear regression; it models the linear relation between one dependent variable and several independent variables. The general formula of MLR is given below: (1)
The unknown coefficients are obtained by least squares adjustment as follows: (2)
The logistic regression (LR) model is a nonlinear model for determining the relation between a binary dependent variable and several independent variables. If we use the values 0 and 1 for non-fire and fire points respectively, then the probability that a point is a fire point is obtained by Eq. (3): (3)
If the number of parameters is small compared to the number of observations, then we use the unconditional maximum likelihood estimation shown in Eq. (4) to compute the unknown coefficients of this model. (4)
The multivariate adaptive regression spline (MARS) model is a flexible non-parametric model that requires no assumption about the relation between the dependent and independent variables. Hence it is well suited to determining complex nonlinear relations among the variables. The general formula of MARS is given below: (5)
The m-th basis function in Eq. (5) is obtained by Eq. (6): (6)
These basis functions are chosen so as to minimize the RMSE of the model.
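The bodies of the equations numbered (1) to (6) are not present in this listing. As a hedged reference only, the standard textbook forms of these three models are written out below. All symbols ($\beta$, $X$, $P_j$, $B_m$, $s_{km}$, $t_{km}$, $v(k,m)$) are assumptions taken from the common literature and may differ from the authors' own notation.

```latex
\begin{align}
y &= \beta_0 + \sum_{i=1}^{n} \beta_i x_i + \varepsilon \tag{1} \\
\hat{\boldsymbol{\beta}} &= (X^{\top} X)^{-1} X^{\top} \mathbf{y} \tag{2} \\
P(y = 1 \mid \mathbf{x}) &= \frac{1}{1 + \exp\!\left(-\left(\beta_0 + \sum_{i=1}^{n} \beta_i x_i\right)\right)} \tag{3} \\
L(\boldsymbol{\beta}) &= \prod_{j=1}^{N} P_j^{\,y_j} \left(1 - P_j\right)^{1 - y_j} \tag{4} \\
\hat{f}(\mathbf{x}) &= \beta_0 + \sum_{m=1}^{M} \beta_m B_m(\mathbf{x}) \tag{5} \\
B_m(\mathbf{x}) &= \prod_{k=1}^{K_m} \left[\, s_{km} \left(x_{v(k,m)} - t_{km}\right) \right]_{+} \tag{6}
\end{align}
```

Here $[\,\cdot\,]_{+}$ denotes the positive part, $t_{km}$ is a knot location, $s_{km} = \pm 1$ sets the direction of the hinge, and $v(k,m)$ indexes the input variable used by the $k$-th factor of the $m$-th basis function.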
We use the genetic algorithm (GA), with the normalized RMSE as the fitness function, to select the optimum combination of factors affecting forest fire.
Results and Discussion
In this paper we study the dependence of forest fire on the 14 factors shown in Table 1 for the study area. Our results are shown in Figures 1 to 3.
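The GA-based factor selection described in Materials and Methods can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function `ga_select`, its operators (truncation selection, single-point crossover, bit-flip mutation, elitism), its parameter values, and the toy cost function standing in for the model's normalized RMSE are all assumptions.

```python
import random

def ga_select(n_factors, cost, pop_size=20, generations=40,
              mutation_rate=0.1, seed=0):
    """Search for the factor subset (list of bools) with the lowest cost.

    `cost` is any callable mapping a subset to a number to minimize;
    in the study it would be the normalized RMSE of the fitted model.
    """
    rng = random.Random(seed)
    # Assumption: start from the full factor set plus random subsets.
    population = [[True] * n_factors]
    while len(population) < pop_size:
        population.append([rng.random() < 0.5 for _ in range(n_factors)])
    for _ in range(generations):
        ranked = sorted(population, key=cost)
        next_gen = [ranked[0][:]]            # elitism: keep the best subset
        parents = ranked[: pop_size // 2]    # truncation selection
        while len(next_gen) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n_factors)   # single-point crossover
            child = a[:cut] + b[cut:]
            # Bit-flip mutation: toggle each factor with a small probability.
            child = [(not g) if rng.random() < mutation_rate else g
                     for g in child]
            next_gen.append(child)
        population = next_gen
    return min(population, key=cost)

# Hypothetical stand-in for the NRMSE of a fitted model: factors 0 and 3
# are "useful", and every other selected factor adds a small penalty.
def toy_cost(mask):
    missing = (not mask[0]) + (not mask[3])
    extra = sum(mask) - mask[0] - mask[3]
    return missing * 1.0 + extra * 0.1

best = ga_select(14, toy_cost)
```

Because the best subset of each generation is carried over unchanged, the returned subset is never worse than the full 14-factor set that the search starts from.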
Conclusion
This research shows that both biophysical and anthropogenic factors have significant effects on forest fire in our study area. Only two factors, minimum temperature and maximum wind speed, were identified as effective in all three cases. The study yielded NRMSE=0.4291 and R2=0.9862 for multiple linear regression, NRMSE=0.9416 and R2=0.9912 for logistic regression, and NRMSE=0.1757 and R2=0.9886 for multivariate adaptive regression spline. Overall, the multivariate adaptive regression spline method performed better than the other two methods.
Keywords: Forest fire, Multivariate Adaptive Regression Spline, Multiple Linear Regression, Logistic Regression, Genetic Algorithm -
Forests are among the most important natural and ecological resources on Earth and are regarded as a key pillar of sustainable development in every country. Fire destroys about 5500 hectares of forest in Iran every year. In this study, fire points were identified using the fire data of the Forests Organization in combination with MODIS sensor data between 1391 and 1396 in the Iranian calendar (2012 to 2017). Because more than 75% of the fires had occurred in the warm season, that is, the three months of Tir, Mordad, and Shahrivar, the data of these three months were used for modeling. The parameters affecting fire occurrence were evaluated and the dependent parameters were removed. Then two methods, multiple linear regression and multivariate adaptive regression spline, were examined for predicting the risk of fire occurrence. Several important measures were used for evaluation: root-mean-square error, the coefficient of determination R2, the percentage of correctly estimated fire and non-fire points, and the error distribution. The results showed that the multivariate adaptive regression spline, with a root-mean-square residual error of 0.1628 on the training data, an R2 of 0.8932 on the training data, close to 94% correct prediction of test fire points, close to 88% correct prediction of test non-fire points, and a more suitable error distribution, performs better than the other method. This in fact reflects the more accurate modeling of a local method compared with a non-local one. For this reason, the risk map produced with the multivariate adaptive regression spline is more reliable than that of the other method. Finally, the high-risk areas were identified using the risk map of this method. These areas were characterized by short distance to residential areas and roads, soil rich in organic matter, relatively high temperature, and low elevation.
Keywords: Forest fire, Multivariate adaptive regression spline (MARS), Multiple linear regression (MLR), Fire risk map
Forest areas are among the most important natural and ecological resources on Earth and are considered one of the main pillars of sustainable development in any country. Fires ruin almost 5500 hectares of Iran's forests every year. In this research, the fire points were first identified using the fire data of the Forest Organization in combination with MODIS sensor data between 2012 and 2017. Because more than 75% of the fires happened in the hot season of the year (June, July, and August), the data of these three months were used for modeling. Then the parameters affecting fire occurrence were evaluated and the dependent parameters were removed. Two methods, multiple linear regression and multivariate adaptive regression spline, were then studied for predicting the fire risk. Several measures were used for evaluation: the root-mean-square error (RMSE), R2, the correct estimation percentage of fire and non-fire points, and the error distribution. The modeling showed that the multivariate adaptive regression spline performs better: its RMSE on test data was 0.1628, its R2 on test data was 0.893, its correct estimation percentages for test fire points and test non-fire points were near 94% and 88% respectively, and its error distribution was better than that of the other method. This suggests that modeling with a local method is more accurate than modeling with a global method. Therefore, the risk map produced by the multivariate adaptive regression spline is more reliable than that of the other method. Finally, the high-risk areas were identified using the risk map of this method. These areas were characterized by a short distance to residential areas and roads, soil rich in organic matter, relatively high temperature, and low height.
Introduction
In 2000, a convention was held at the United Nations to improve the quality of human life, at which the principles of the Millennium Development Goals were adopted. One of these goals was to ensure the sustainability of the environment and natural resources. In the contemporary world, the value of forests is about 120 billion dollars, and the livelihood of almost 9.1 people depends directly or indirectly on forests.
According to global experts, including the FAO, a country whose forest cover is less than 25% of its area is in a critical condition in terms of the human environment. Almost 190000 hectares of Iranian forests were destroyed by fire over a 28-year period. Forest fire not only changes the natural ecosystem and destroys many plant and animal species of a region, but also has other destructive effects such as air pollution, respiratory problems, soil erosion, increased surface runoff, increased soil acidity, decreased fertility, losses to tourism, manufacturing, and the economy, and even climate change.
Immediate and accurate detection of the fire location, the ability to determine the parameters that affect it, and the detection of areas with a high risk of fire are among the main concerns of environmental protection and disaster management. We can prevent fire by training people, making effective regulations and management policies, and increasing monitoring to deal with fire triggers. Moreover, when a fire occurs, we must take necessary actions such as deploying fire-fighting equipment near hazardous areas and providing easy access to these areas. In fact, the increasing importance of protecting forests and natural resources has shifted the focus from crisis management to risk management.
Methodology
Modeling is not possible without non-fire points. Accordingly, some points were first selected randomly across the whole area, at a certain distance from the fire points, and labeled as non-fire points. To implement the methods in the MATLAB programming environment, the parameters used in the modeling were first extracted for the fire and non-fire points from the maps of these parameters. These parameters were used as inputs to each method.
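The selection of non-fire points at a certain distance from the fire points could look roughly like the rejection-sampling sketch below. The source uses MATLAB and does not state the distance or the number of points, so the function name `sample_non_fire_points`, the rectangular bounds, and all numeric values here are hypothetical.

```python
import math
import random

def sample_non_fire_points(fire_points, bounds, n_points, min_dist,
                           seed=0, max_tries=100000):
    """Draw random points inside `bounds` = (xmin, ymin, xmax, ymax) that lie
    at least `min_dist` away from every fire point."""
    xmin, ymin, xmax, ymax = bounds
    rng = random.Random(seed)
    accepted = []
    for _ in range(max_tries):
        if len(accepted) == n_points:
            break
        x, y = rng.uniform(xmin, xmax), rng.uniform(ymin, ymax)
        # Reject candidates that fall too close to any recorded fire.
        if all(math.hypot(x - fx, y - fy) >= min_dist
               for fx, fy in fire_points):
            accepted.append((x, y))
    return accepted

# Illustrative coordinates only (arbitrary units, not data from the study).
fire_points = [(2.0, 3.0), (7.5, 6.0)]
non_fire = sample_non_fire_points(fire_points, (0.0, 0.0, 10.0, 10.0),
                                  n_points=5, min_dist=1.5)
```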
Throughout, 70% of the selected data were used as training data and 30% as test data. The multiple linear regression was applied first, followed by the multivariate adaptive regression spline. The steps of the research implementation are shown in Figure (1).
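A 70/30 split of this kind can be sketched as follows. The shuffling, the fixed seed, and the helper name `train_test_split` are assumptions for illustration; the source does not say how the split was drawn.

```python
import random

def train_test_split(samples, train_fraction=0.7, seed=0):
    """Shuffle a copy of `samples` and split it into training and test lists."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)   # fixed seed keeps the split repeatable
    n_train = round(train_fraction * len(shuffled))
    return shuffled[:n_train], shuffled[n_train:]

# Illustrative data: 100 sample indices standing in for fire and non-fire points.
samples = list(range(100))
train, test = train_test_split(samples)
```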
After the modeling, the evaluation parameters of each method were computed for comparison. Then the risk map of the area was produced for each method from the test points using Inverse Distance Weighting (IDW) with 12 neighboring points (Figures 2 and 3). The points with a high risk were extracted from the resulting map, and the main traits of these points were taken as the traits of high-risk points.
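The IDW step with 12 neighboring points can be illustrated with the sketch below. The distance power of 2 and the function name `idw` are assumptions, since the source states only the method and the number of neighbors.

```python
import math

def idw(query, points, values, k=12, power=2.0):
    """Inverse-distance-weighted estimate at `query` from the k nearest samples.

    Assumption: weights are 1 / distance**power with power = 2.
    """
    nearest = sorted(
        (math.hypot(query[0] - px, query[1] - py), v)
        for (px, py), v in zip(points, values)
    )[:k]
    # A query that coincides with a sample takes that sample's value.
    if nearest[0][0] == 0.0:
        return nearest[0][1]
    weights = [1.0 / d ** power for d, _ in nearest]
    return sum(w * v for w, (_, v) in zip(weights, nearest)) / sum(weights)

# Two illustrative samples: the midpoint receives the average of their risks.
midpoint_risk = idw((0.5, 0.0), [(0.0, 0.0), (1.0, 0.0)], [0.2, 0.8])
```

Evaluating `idw` at every cell center of a grid would yield a risk raster comparable in spirit to the maps in Figures 2 and 3.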
Fig. 1. The steps of the research implementation
Fig. 2. Fire risk map provided using the MLR method on test data
Fig. 3. Fire risk map provided using the MARS method on test data
Discussion and Results
After the dependent parameters were removed from the parameters affecting fire, the optimal effective parameters were obtained; they are presented in Table (1). These parameters fall into three groups: climate, ground physical, and human parameters.
The fire risk was modeled by two methods. Table (2) presents the RMSE and R2 on the training and test data for the multivariate adaptive regression spline and multiple linear regression methods, respectively. The training results indicate that the multivariate adaptive regression spline is more accurate (R2 closer to 1) and has less error (lower RMSE) than the multiple linear regression method. The acceptable values of the evaluation parameters on the test data show that the models do not suffer from over-fitting.
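For reference, the two evaluation measures named here can be computed as in the sketch below, using their standard definitions. The sample values are illustrative only and are not results from the study.

```python
import math

def rmse(y_true, y_pred):
    """Root-mean-square error between observed and predicted values."""
    return math.sqrt(
        sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    )

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# Illustrative labels (0 = non-fire, 1 = fire) and predicted risks.
y_true = [0, 0, 1, 1]
y_pred = [0.1, 0.2, 0.8, 0.9]
example_rmse = rmse(y_true, y_pred)
example_r2 = r_squared(y_true, y_pred)
```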
Table 1. Effective parameters on fire occurrence in the case-study area
In the linear regression method, both the correct estimation percentage of fire points and that of non-fire points are low; this is the worst possible case, and the risk map has the least reliability. In the multivariate adaptive regression spline, the fire and non-fire points are both estimated with high accuracy, which makes the risk map produced by this method more reliable.
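The two correct-estimation percentages can be computed per class as sketched below. The decision threshold of 0.5 on the predicted risk and the function name `correct_percentages` are assumptions; the source does not state how a predicted risk is turned into a fire or non-fire label.

```python
def correct_percentages(labels, risks, threshold=0.5):
    """Return (% of fire points, % of non-fire points) classified correctly.

    Assumption: a point is predicted as fire when its risk >= threshold.
    """
    fire = [r for y, r in zip(labels, risks) if y == 1]
    non_fire = [r for y, r in zip(labels, risks) if y == 0]
    fire_ok = sum(r >= threshold for r in fire)
    non_fire_ok = sum(r < threshold for r in non_fire)
    return 100.0 * fire_ok / len(fire), 100.0 * non_fire_ok / len(non_fire)

# Illustrative data: 4 fire points and 4 non-fire points.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
risks = [0.9, 0.7, 0.6, 0.4, 0.1, 0.2, 0.6, 0.3]
fire_pct, non_fire_pct = correct_percentages(labels, risks)
```

Reporting the two percentages separately matters because a model can score well on one class while failing on the other, which is the situation described for the linear regression method.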
As seen in the results, the risk map provided by the multivariate adaptive regression spline method is considerably more reliable than the risk map provided by the multiple linear regression method. Hence, the risk map from the former method was used to determine the features of the areas with a high risk of fire (Figure 4).
Since the fire risk has a normal distribution, the areas which satisfy Equation (1) are among the 2.5% of the areas that have the most fire risk.
(1)
where μ is the average, σ is the standard deviation, and R is the fire risk. The main features of these areas can serve as important tools for decision making. The high-risk areas were extracted in the ArcGIS environment. Statistical analysis of the effective parameters in these areas reveals some key points: short distance from residential regions (less than 2 km), short distance from roads (less than 2 km), mollisol soil, relatively high average temperature (more than …), and low height (less than 50 m).
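The body of Equation (1) is missing from this listing. Under the stated normality assumption, the top 2.5% of a distribution lies above roughly the mean plus 1.96 standard deviations, so one plausible reading is a threshold of that form. The sketch below computes it; the exact form used by the authors, the function name `high_risk_threshold`, and the sample values are assumptions.

```python
from statistics import NormalDist, mean, pstdev

def high_risk_threshold(risks, top_fraction=0.025):
    """Risk value above which a normally distributed risk falls in the top fraction.

    Assumption: threshold = mu + z * sigma, with z the standard normal
    quantile for 1 - top_fraction (about 1.96 for 2.5%).
    """
    mu, sigma = mean(risks), pstdev(risks)
    z = NormalDist().inv_cdf(1.0 - top_fraction)
    return mu + z * sigma

# Illustrative risk values only.
risks = [0.2, 0.4, 0.6, 0.8]
threshold = high_risk_threshold(risks)
```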
Fig. 4. High risk map provided using the MARS method on test data
Conclusions
This research attempted to identify the optimal method for modeling fire risk using climate, ground physical, and human parameters. To this end, an accurate local method (MARS) was used alongside a non-local method (MLR).
On both the test and training data, the MARS method had the lowest RMSE and an R2 value closer to 1. The outputs showed that the MARS method estimated the fire and non-fire points more accurately than the MLR method, which indicates the high reliability of the MARS method. After the optimal method for modeling fire occurrence in the area was determined, the points with a high risk of fire were detected. Statistical analysis showed that these points share some fundamental features: short distance from residential regions (less than 2 km), short distance from roads (less than 2 km), mollisol soil, relatively high average temperature (more than …), and low height (less than 50 m).
Keywords: Forest Fire, Multiple Linear Regression, Multivariate Adaptive Regression Spline, Risk Map
- Results are sorted by publication date.
- The keyword was searched only in the keywords field of the articles. To exclude unrelated results, the search covered only articles in journals that share the subject of the source journal.