The impact of disease prevalence rate in training set on performance of random forest and threshold Bayes A methods

Author(s):
Article Type:
Research/Original Article (accredited ranking)
Abstract:
The objective of the current study was to investigate the effect of the disease prevalence rate in the training set and of genomic architecture on the performance of random forest (RF) and threshold Bayes A (TBA) for threshold traits. For this purpose, genomic populations were simulated to reflect variation in heritability (0.05 and 0.25), number of QTL (150 and 600), and linkage disequilibrium (low and high) across 30 chromosomes. To create binary phenotypes with different disease prevalence rates, the 5 percent of training-set animals with the lowest phenotypic values were first coded 1 (diseased) and the remaining 95 percent were coded 0 (healthy). This process was repeated in 5 percent increments until 50 percent of the animals in the training set were coded 1. For both RF and TBA, genomic accuracy increased as the disease prevalence rate rose from 5 to 20 percent and then decreased as it rose further to 50 percent. High prevalence rates reduced genomic accuracy more than low prevalence rates did. Overall, the performance of RF fluctuated with variations in genetic architecture and disease prevalence rate. Although TBA was generally more accurate across scenarios, RF performed better when high-heritability traits were controlled by a large number of QTL. Despite the important role of the genetic basis of the analyzed population, the best method for predicting genomic breeding values of threshold traits depends on the disease prevalence rate.
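The coding scheme described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `binarize_by_prevalence`, the simulated `phenotypes` list, and the training-set size of 1000 are all assumptions introduced here.

```python
import random


def binarize_by_prevalence(phenotypes, prevalence_percent):
    """Code the lowest `prevalence_percent` of phenotypes as 1 (diseased), the rest as 0 (healthy)."""
    n = len(phenotypes)
    n_cases = n * prevalence_percent // 100  # integer arithmetic avoids float rounding
    # Indices ordered from the lowest to the highest phenotypic value
    order = sorted(range(n), key=lambda i: phenotypes[i])
    labels = [0] * n
    for i in order[:n_cases]:
        labels[i] = 1
    return labels


# Hypothetical continuous phenotypes for 1000 training animals (assumed size)
rng = random.Random(0)
phenotypes = [rng.gauss(0.0, 1.0) for _ in range(1000)]

# Prevalence levels from 5% to 50% in 5% steps, as stated in the abstract
labels_by_prevalence = {
    pct: binarize_by_prevalence(phenotypes, pct) for pct in range(5, 55, 5)
}
```

Each entry of `labels_by_prevalence` would then serve as the binary response for one training scenario, with the same animals and genotypes reused across prevalence levels.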
Language:
Persian
Published:
Animal Sciences Journal, Volume: 32, Issue: 124, 2019
Pages:
131 to 146
https://www.magiran.com/p2069412