Data Mining and Deployment of Multilingual Iranian Cultural Thesaurus (ASFA) Dataset in the CRISP Framework

Article Type:
Research/Original Article (accredited rank)
Abstract:
Purpose

The Simple Knowledge Organization System (SKOS) is a widely used data model for sharing and linking knowledge organization systems on the web. It offers a cost-effective way to migrate existing knowledge organization systems to the Semantic Web. To integrate ASFA into the Semantic Web, the ASFA dataset needs to be converted and deployed as an RDF graph based on SKOS. To achieve this, the records in ASFA's Iran MARC format must be re-engineered. This study aims to re-engineer the ASFA dataset using data mining in the CRISP framework and deploy it on the open-source platform Skosmos.

Method

This developmental-applied study employed the CRISP-DM methodology, using unsupervised data mining with a hierarchical clustering technique. The first step was to understand the business objective, which was to convert the ASFA dataset into the SKOS data model as an RDF graph. It was found that ASFA's heritage data comprises 1,880 records categorized into 18 fields, including education, literature, communication, economy, history, Sufism and mysticism, sociology, geography, law, psychology, linguistics, religion, political science, philosophy, technology, experimental science, librarianship and information, management, culture, and art. The data was prepared by identifying and correcting missing and outlier data. The modeling stage used a hierarchical clustering macro in Excel to generate target feature values. The model was evaluated through visual inspection and random sampling. In the sixth step, the Iran MARC data was converted to SKOS as an RDF graph using the SkosPlay tool, and the data was transferred to the VocBench platform. The ASFA dataset was then deployed on the Skosmos platform in Turtle format.
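The conversion step can be illustrated with a minimal Python sketch that serializes one thesaurus record as a SKOS concept in Turtle. This is not the SkosPlay workflow used in the study. The record keys (`pref`, `alt`, `note`, `bt`, `nt`, `rt`), the `ex:` namespace, and the function names are hypothetical placeholders, and they do not reproduce the study's Iran MARC mapping table.

```python
# Illustrative sketch only: the record layout and the ex: namespace are assumptions.
PREFIXES = (
    "@prefix skos: <http://www.w3.org/2004/02/skos/core#> .\n"
    "@prefix ex: <http://example.org/asfa/> .\n"
)

# Hypothetical record keys mapped to SKOS properties that take literal values.
LITERAL_PROPS = {
    "pref": "skos:prefLabel",
    "alt": "skos:altLabel",
    "note": "skos:scopeNote",
}

# Hypothetical record keys mapped to SKOS properties that link to other concepts.
LINK_PROPS = {
    "bt": "skos:broader",
    "nt": "skos:narrower",
    "rt": "skos:related",
}


def escape(text):
    """Escape backslashes and double quotes for a Turtle string literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"')


def record_to_turtle(record):
    """Serialize one record (a dict) as a skos:Concept in Turtle syntax."""
    lines = [f"ex:{record['id']} a skos:Concept"]
    for key, prop in LITERAL_PROPS.items():
        # Literal values are (text, language tag) pairs to keep the data multilingual.
        for text, lang in record.get(key, []):
            lines.append(f'    {prop} "{escape(text)}"@{lang}')
    for key, prop in LINK_PROPS.items():
        for target in record.get(key, []):
            lines.append(f"    {prop} ex:{target}")
    return " ;\n".join(lines) + " .\n"


if __name__ == "__main__":
    sample = {"id": "c1", "pref": [("Art", "en"), ("هنر", "fa")], "bt": ["c0"]}
    print(PREFIXES + "\n" + record_to_turtle(sample))
```

Keeping the mapping in two small dictionaries separates the field-to-property decisions from the serialization logic, so a real mapping table could replace the placeholders without changing the function.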

Findings

The main finding of this study is the development and deployment of the ASFA dataset based on SKOS/RDF on the open-source platform Skosmos at kosmos.nlai.ir. The total number of records increased to 11,880 once collection records were created for clustering. An important finding from the data preparation stage was the compilation of a mapping table between SKOS core elements and Iran MARC fields.
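One way to picture how collection records add to the record count is the sketch below, which groups concepts by an already assigned cluster label and emits one `skos:Collection` per group. It assumes the clusters are represented as SKOS collections, which the abstract does not state explicitly. It is a plain grouping step, not the hierarchical clustering macro itself, and the identifiers, labels, and `ex:` prefix are hypothetical.

```python
from collections import defaultdict


def build_collections(concepts):
    """Group concept IDs by cluster label; each group becomes one collection record.

    `concepts` is an iterable of (concept_id, cluster_label) pairs. The labels are
    assumed to come from an earlier clustering step that is not shown here.
    """
    groups = defaultdict(list)
    for concept_id, label in concepts:
        groups[label].append(concept_id)
    return dict(groups)


def collection_to_turtle(name, members):
    """Serialize one group as a skos:Collection (assumes the ex: prefix is declared)."""
    member_list = ", ".join(f"ex:{m}" for m in members)
    return (
        f"ex:coll_{name} a skos:Collection ;\n"
        f'    skos:prefLabel "{name}"@en ;\n'
        f"    skos:member {member_list} .\n"
    )


if __name__ == "__main__":
    pairs = [("c1", "art"), ("c2", "law"), ("c3", "art")]
    for name, members in build_collections(pairs).items():
        print(collection_to_turtle(name, members))
```

Each collection is an additional record alongside the concepts it groups, so the total grows by the number of groups.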

Conclusion

By integrating stages of the methodologies identified in the literature review into the CRISP framework, the study developed an innovative method for converting thesauri into a lightweight ontology in SKOS/RDF graph format.

Language:
Persian
Published:
Librarianship and Information Organization Studies, Volume:34 Issue: 1, 2023
Pages:
58 to 82
magiran.com/p2632572  