The Studies of Decision Tree in Estimation of Breast Cancer Risk by Using Polymorphism Nucleotide

Abstract:
Introduction
Decision tree is the data mining tools to collect, accurate prediction and sift information from massive amounts of data that are used widely in the field of computational biology and bioinformatics. In bioinformatics can be predict on diseases, including breast cancer. The use of genomic data including single nucleotide polymorphisms is a very important factor in predicting the risk of diseases. The number of seven important SNP among hundreds of thousands genetic markers were identified as factors associated with breast cancer. The objective of this study is to evaluate the training data on decision tree predictor error of the risk of breast cancer by using single nucleotide polymorphism genotype.
Methods
The risk of breast cancer were calculated associated with the use of SNP formula:xj = fo * In human, The decision tree can be used To predict the probability of disease using single nucleotide polymorphisms .Seven SNP with different odds ratio associated with breast cancer considered and coding and design of decision tree model, C4.5, by Csharp2013 programming language were done. In the decision tree created with the coding, the four important associated SNP was considered. The decision tree error in two case of coding and using WEKA were assessment and percentage of decision tree accuracy in prediction of breast cancer were calculated. The number of trained samples was obtained with systematic sampling. With coding, two scenarios as well as software WEKA, three scenarios with different sets of data and the number of different learning and testing, were evaluated.
Results
In both scenarios of coding, by increasing the training percentage from 66/66 to 86/42, the error reduced from 55/56 to 9/09. Also by running of WEKA on three scenarios with different sets of data, the number of different education, and different tests by increasing records number from 81 to 2187, the error rate decreased from 48/15 to 13/46. Also in the majority of scenarios, prevalence of the disease, had no effect on errors in the WEKA and code.
Conclusion
The results suggest that with increased training, and thus the accuracy of prediction error decision tree to reduce the risk of breast cancer increases with the use of decision trees. In Biological data, decision trees error is high even with a 66/66% training. On the other hand by increasing the number of SNP from 4 to 7 decision tree, decision tree error dramatically decreased at 70/1% training. In general we can say that with increased training and increasing the number of SNP in the decision tree, the prediction accuracy increased and errors reduced. In the CODING and WEKA, percentage of disease prevalence had no significant effect on errors,” Because of selecting set of training and testing by systemic method “.
Language:
Persian
Published:
Journal of Shaeed Sdoughi University of Medical Sciences Yazd, Volume:25 Issue: 4, 2017
Pages:
300 to 310
magiran.com/p1748370  
دانلود و مطالعه متن این مقاله با یکی از روشهای زیر امکان پذیر است:
اشتراک شخصی
با عضویت و پرداخت آنلاین حق اشتراک یک‌ساله به مبلغ 1,390,000ريال می‌توانید 70 عنوان مطلب دانلود کنید!
اشتراک سازمانی
به کتابخانه دانشگاه یا محل کار خود پیشنهاد کنید تا اشتراک سازمانی این پایگاه را برای دسترسی نامحدود همه کاربران به متن مطالب تهیه نمایند!
توجه!
  • حق عضویت دریافتی صرف حمایت از نشریات عضو و نگهداری، تکمیل و توسعه مگیران می‌شود.
  • پرداخت حق اشتراک و دانلود مقالات اجازه بازنشر آن در سایر رسانه‌های چاپی و دیجیتال را به کاربر نمی‌دهد.
دسترسی سراسری کاربران دانشگاه پیام نور!
اعضای هیئت علمی و دانشجویان دانشگاه پیام نور در سراسر کشور، در صورت ثبت نام با ایمیل دانشگاهی، تا پایان فروردین ماه 1403 به مقالات سایت دسترسی خواهند داشت!
In order to view content subscription is required

Personal subscription
Subscribe magiran.com for 70 € euros via PayPal and download 70 articles during a year.
Organization subscription
Please contact us to subscribe your university or library for unlimited access!