geographically weighted regression
in medical sciences journals -
Background
Cancer remains a critical public health issue in India, with rising cases of breast cancer and cervical cancer. Accurate predictions and spatial analysis of cancer incidence are essential for shaping prevention strategies and targeting interventions in high-risk regions.
Methods
This study used a big data framework with machine learning techniques from the SparkML library to predict cancer cases and analyze their spatial distribution across Indian states from 2016 to 2021. Three models were applied to the dataset: Random Forest Regressor, Gradient Boosting Regressor, and Geographically Weighted Regression (GWR). Spatial autocorrelation was assessed with Moran’s I statistic to identify clustering patterns.
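As an illustration of the Moran’s I statistic named in the Methods, the following is a minimal sketch in plain Python, not the authors' SparkML pipeline. The function names and the chain-shaped contiguity matrix are assumptions made for the example. The sketch computes only the index itself; the z-score and p-value reported in the abstract would need an additional significance test.

```python
def morans_i(values, weights):
    """Global Moran's I for one variable.

    values  -- list of n observations (e.g. case counts per region)
    weights -- n x n spatial weights matrix (list of lists), zero diagonal
    """
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    s0 = sum(sum(row) for row in weights)  # sum of all spatial weights
    # Weighted cross-products of deviations between every pair of regions
    cross = sum(weights[i][j] * dev[i] * dev[j]
                for i in range(n) for j in range(n))
    total = sum(d * d for d in dev)
    return (n / s0) * (cross / total)


def chain_weights(n):
    """Hypothetical binary contiguity: region i neighbours i-1 and i+1."""
    w = [[0.0] * n for _ in range(n)]
    for i in range(n - 1):
        w[i][i + 1] = w[i + 1][i] = 1.0
    return w


if __name__ == "__main__":
    w = chain_weights(4)
    print(morans_i([1, 2, 3, 4], w))    # similar neighbours: positive I
    print(morans_i([1, -1, 1, -1], w))  # alternating values: negative I
```

Positive values indicate that neighbouring regions tend to have similar values (clustering), and negative values indicate that neighbours tend to differ.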
Results
The spatial analysis revealed significant clustering of cancer cases, particularly in 2020, with a z-score of 2.23, a p-value of 0.02, and a Moran’s index of 0.15. Among the machine learning models, GWR achieved a predictive accuracy of 98% for both breast cancer and cervical cancer, while the Random Forest Regressor and Gradient Boosting Regressor achieved 95% and 97% accuracy, respectively, over the six-year period. Gradient Boosting outperformed other models in identifying key predictors and ensuring high predictive accuracy.
Conclusions
The findings highlight the efficacy of Gradient Boosting and GWR in predicting cancer incidence and analyzing spatial patterns. These models provide critical insights into cancer clustering and risk factors, supporting the development of targeted prevention strategies and policy interventions for high-risk regions in India. The results emphasize the utility of machine learning techniques in public health research and cancer control.
Keywords: Cancer, Big Data, Machine Learning, Gradient Boosting, Geographically Weighted Regression -
Introduction
In 2022, gastric and breast cancer had a mortality rate of 6.8 cases, ranking fourth despite intervention efforts. A 2020 study by the International Agency for Research on Cancer found global variations in incidence rates. This study examines key risk factors and preventive measures [1-2].
Materials and Methods
This applied ecological study uses the MAGWGPRS (Multivariate Adaptive Geographically Weighted Generalized Poisson Regression Spline) model, integrating MARS and GWGPR, to analyze cancer registry data. The model identifies geographic variations and hotspots in cancer risk. Data sources include pathology reports, death records, biopsy data, and a non-communicable disease risk factor survey. The dataset comprises patient age, location, cancer case counts, and relevant risk factors. Analysis is conducted using R, with ArcGIS for map visualization.
Results
According to the International Agency for Research on Cancer, key risk factors for stomach cancer include obesity, smoking, physical inactivity, poor nutrition, age, and population density. The MAGWGPRS model, a Geographically Weighted Model, identifies regional variations in these factors by weighting observations based on distance using a kernel function and optimizing the model with the GCV criterion. Our analysis highlights vegetable consumption, smoking, low physical activity, and age as the primary determinants of gastric cancer risk.
Conclusion
Our model identifies vegetable consumption, smoking, low physical activity, and aging as significant risk factors for gastric cancer. Further research is needed to refine obesity risk based on BMI criteria. The MAGWGPRS model is a valuable tool for identifying high-risk regions, enabling targeted interventions and prioritizing key risk factors across diverse geographic areas.
Keywords: Stomach Neoplasms, Pathology, Diagnosis, Aged, Body Mass Index, Obesity, Complications, Toxicity, (Diet, Vegetarian), Statistics, Numerical Data, Analyses Spatial, Geographically Weighted Regression, Poisson Distribution, Spatial Regression, Iran -
Background and Objectives
Infection of birds with Highly Pathogenic Avian Influenza (HPAI) and the resulting bird deaths impose heavy losses on the livestock and poultry industry and on public health. Given the volume and variety of data now available, location-based technologies and data mining have become necessary. This study aims to model the prevalence of HPAI using spatial analysis.
Materials and Methods
In this analytical-ecological study, 2016 (1395 in the Iranian calendar) was selected as the target year because of the high prevalence of the disease in that year. Seventeen variables (climatic, environmental, and man-made) and their spatial layers were prepared for Guilan province. The weights of the variables were computed by combining Boosted Regression Trees (BRT) analysis with Geographically Weighted Regression (GWR). The resulting disease prevalence model was then validated with the Receiver Operating Characteristic (ROC) curve.
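The ROC-based validation mentioned in the Methods is usually summarized by the area under the curve (AUC). The sketch below is an assumption-laden illustration in plain Python, not the authors' procedure: it treats each site as labelled 1 (outbreak) or 0 (no outbreak) and scores it with a modelled prevalence value. The function name `roc_auc` and the example inputs are hypothetical.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen positive site receives
    a higher score than a randomly chosen negative site (ties count half).
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical labels and model scores for four sites
    print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
```

An AUC near 1 means the model ranks outbreak sites above non-outbreak sites almost always, and a value near 0.5 means it ranks them no better than chance.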
Results
The wetland, live poultry market, and pool variables received the highest weights in the BRT analysis, at 18.91, 15.59, and 12.8 percent, respectively. In terms of timing, the month of Bahman (roughly February) had the highest prevalence among the three cold months of the year.
Conclusion
The disease was observed in areas around wetlands and pools and near live poultry markets. Therefore, the General Veterinary Administration, as a regulatory and policy-making body, and poultry producers and sellers, as executive agents, can play a significant role in monitoring, controlling, and preventing the spread of the disease.
Keywords: Avian influenza, Spatial analysis, Boosted regression, Geographically weighted regression -
Ordinary linear regression (OLR) is one of the most common statistical techniques for determining the association between an outcome variable and its related factors. The method estimates a single association that is assumed to hold across the whole study area, that is, a global association. In public health and the social sciences this assumption does not always hold, especially when the relationship between variables is known to vary across the study area. In such cases the model should be calibrated to account for this spatial variability. In this paper, we demonstrate the use of geographically weighted regression (GWR) to account for spatial heterogeneity. GWR fits local models in which the association varies by location, reflecting local variation in the variables. The technique uses geographical weights: observations near a regression point receive relatively large weights (close to 1), and observations farther away receive smaller weights. We apply GWR and compare it with OLR using Demographic and Health Survey (DHS) data from Tanzania, focusing on the association between the percentage of children with acute respiratory infection (ARI) and its related factors. In the OLR model, the percentage of females with higher education had the largest significant association with ARI (P = 0.027). In contrast to this single global coefficient, GWR returned coefficients that varied from -0.15 to -0.01 (P < 0.001) over the study area. We suggest that identifying significant spatially varying associations will help policymakers recognize local areas of interest and design targeted interventions.
Keywords: Acute Respiratory Infection (ARI), Geographically weighted regression, Ordinary linear regression, Tanzania
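The geographical weighting described in this abstract can be sketched as a weighted least-squares fit repeated at every location. The code below is a minimal illustration in plain Python with a single predictor. The Gaussian kernel, the fixed `bandwidth` parameter, and the function names are assumptions for the example; the abstract does not state which kernel or bandwidth the authors used.

```python
import math


def gaussian_weight(distance, bandwidth):
    """Weight close to 1 near the regression point, decaying with distance."""
    return math.exp(-0.5 * (distance / bandwidth) ** 2)


def gwr_fit(coords, x, y, bandwidth):
    """Fit a local line y = a + b*x at every observation location.

    coords -- list of (easting, northing) pairs
    x, y   -- predictor and outcome values, one per location
    Returns a list of (intercept, slope) pairs, one per location.
    """
    results = []
    for cx, cy in coords:
        # Weights of all observations relative to this regression point
        w = [gaussian_weight(math.hypot(cx - px, cy - py), bandwidth)
             for px, py in coords]
        sw = sum(w)
        swx = sum(wi * xi for wi, xi in zip(w, x))
        swy = sum(wi * yi for wi, yi in zip(w, y))
        swxx = sum(wi * xi * xi for wi, xi in zip(w, x))
        swxy = sum(wi * xi * yi for wi, xi, yi in zip(w, x, y))
        # Closed-form weighted least squares for one predictor
        slope = (sw * swxy - swx * swy) / (sw * swxx - swx ** 2)
        intercept = (swy - slope * swx) / sw
        results.append((intercept, slope))
    return results
```

With a very large bandwidth every weight approaches 1 and each local fit approaches the single global OLR line. A small bandwidth lets the coefficients vary across the study area, which is the contrast the abstract draws between the two methods.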
-
Background and Aim
Environmental and climatic conditions in different geographical areas favor the occurrence of certain diseases. Skin cancer is one of the most common cancers, and its incidence rate differs between geographical areas. This study aims to determine the effect of climatic and environmental parameters on skin cancer incidence and to map the geographical distribution of skin cancer in Iran.
Methods
The study used data on skin cancer patients, national population data, and climatic and environmental data that affect skin cancer incidence. After calculating the skin cancer incidence rate for the whole country, we used a Geographically Weighted Regression model to relate the climatic and environmental data to the incidence rate. The coefficient of determination between the observed incidence rate map and the modeled map was also calculated.
Results
Correlation coefficients showed that solar UV and relative humidity had the strongest positive and negative correlations with skin cancer incidence, respectively. Southern, eastern, and central Iran had the highest incidence rates, while the northern coast and northwestern Iran had the lowest. Validation of the modeled incidence rate map against the observed map yielded a coefficient of determination of 0.71.
Conclusion
All of the climatic and environmental parameters considered in this study affected the incidence of skin cancer.
Keywords: skin cancer, melanoma, remote sensing, Geographically Weighted Regression -
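The map-to-map validation reported in this abstract compares observed and modeled incidence rates. One common way to compute such a figure is sketched below in plain Python. It assumes the two maps have been flattened into equal-length lists of values and uses the definition 1 - SS_res / SS_tot. The abstract does not say which definition the authors used (a squared correlation coefficient is another possibility), and the function name is hypothetical.

```python
def r_squared(observed, modeled):
    """Coefficient of determination between observed and modeled values."""
    if len(observed) != len(modeled):
        raise ValueError("observed and modeled must have the same length")
    mean_obs = sum(observed) / len(observed)
    # Residual sum of squares: disagreement between the two maps
    ss_res = sum((o - m) ** 2 for o, m in zip(observed, modeled))
    # Total sum of squares: spread of the observed values around their mean
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    # Hypothetical incidence rates for four regions
    print(r_squared([1.0, 2.0, 3.0, 4.0], [1.1, 1.9, 3.2, 3.8]))
```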
Background
Although intellectual disability (ID) is common in Iran, the spatial distribution of people with ID has not been investigated at the national level. Maps that identify areas of higher ID prevalence and their local neighborhoods, or that show the effect of socio-demographic factors on this distribution, are not yet available. This gap motivated us to assess the population with ID in Iran.
Methods
In a cross-sectional study, we applied Moran’s Index (Moran’s I), which measures the strength of association between neighboring counties, as a global univariate assessment of spatial distribution. Geographically weighted regression was used to explore the relationship between ID prevalence and several socio-demographic factors: migration and illiteracy rates, number of physicians (PN) per 10,000 people, and number of health-care centers (HCCs) per 10,000 people.
Results
Spatial clusters of ID patients exist among Iranian counties (Moran’s I = 0.36, P < 0.01) and within the rural population (Moran’s I = 0.20, P < 0.01). We also detected spatial associations between ID prevalence and all of the investigated socio-demographic factors at the national scale. In rural areas, illiteracy is strongly associated with ID, especially in southern Iran. In urban areas, ID patients are distributed randomly both within and between counties (Moran’s I = 0.01, P > 0.3).
Conclusions
The results support our initial hypothesis that people with ID in Iran are spatially clustered. The spatial association of migration and illiteracy rates with ID prevalence also agreed with our hypothesis. However, our supposition that prevalence would be inversely related to PN and HCC was rejected.
Keywords: Geographic information system, geographically weighted regression, intellectual disability, Iran, prevalence
- Results are sorted by publication date.
- The keyword was searched only in the keywords field of articles. To exclude unrelated results, the search covered only articles from journals in the same subject area as the source journal.