Gazetteer Enrichment Using Real Estate Advertisements

Article Type:
Research/Original Article (accredited ranking)
Abstract:
Introduction

Gazetteers are geospatial dictionaries of geographic names that contain triples of place name, geographic footprint, and feature type for named geographic places. As an important element of Geospatial Information Retrieval (GIR), these resources should be enriched to meet the needs of new applications. Identifying and adding new place names and keeping the gazetteer up to date are central issues in gazetteer enrichment. The main challenge in this area is that most gazetteers are built with a top-down approach only, so most local place names are missing from them. In addition, updating gazetteers is time-consuming and expensive. Since the emergence of Web 2.0, Volunteered Geographic Information (VGI) and social media have attracted the attention of many researchers as sources for harvesting place names, because they contain local and recently created place names. Similarly, online property ads published by the public contain such place names. This article presents a data-driven method for identifying urban place names, including neighborhoods and main streets, from online real estate advertisements.

Materials and Methods

Online real estate ads for four metropolises (Tehran, Mashhad, Isfahan, and Shiraz) were mined from the Divar website. After n-gram extraction and the required preprocessing, the n-grams were labeled. To remove outlier points from each n-gram's point set, and to handle the case where several places in a city share the same name, the point set of each n-gram was clustered. Based on a set of spatial statistics, random forest models were trained on the housing data of each city and then tested on the ads data of the other cities.
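
The first steps of this pipeline can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not list the spatial statistics used, so the count, centroid, and standard distance below are assumed stand-ins, and the function names, sample ads, and coordinates are hypothetical. Clustering and the random forest step are omitted. The sketch extracts word n-grams from each ad, collects the ad locations per n-gram, and summarizes each point set.

```python
from collections import defaultdict
from math import sqrt


def extract_ngrams(text, max_n=3):
    """Return the set of word n-grams (n = 1..max_n) found in one ad text."""
    tokens = text.split()
    return {
        " ".join(tokens[i:i + n])
        for n in range(1, max_n + 1)
        for i in range(len(tokens) - n + 1)
    }


def collect_points(ads, max_n=3):
    """Map each n-gram to the (lon, lat) points of the ads that contain it."""
    points = defaultdict(list)
    for text, lon, lat in ads:
        for gram in extract_ngrams(text, max_n):
            points[gram].append((lon, lat))
    return points


def spatial_stats(pts):
    """Assumed example statistics: point count, centroid, standard distance.

    Distances are computed on raw degrees, which is only acceptable for a toy
    example; real data would be projected to a planar coordinate system first.
    """
    n = len(pts)
    cx = sum(x for x, _ in pts) / n
    cy = sum(y for _, y in pts) / n
    std_dist = sqrt(sum((x - cx) ** 2 + (y - cy) ** 2 for x, y in pts) / n)
    return {"count": n, "centroid": (cx, cy), "std_distance": std_dist}


# Hypothetical ads: (text, longitude, latitude). Not data from the study.
ads = [
    ("apartment near Vanak square", 51.41, 35.76),
    ("flat in Vanak", 51.40, 35.75),
    ("apartment in Tajrish", 51.43, 35.80),
]

points = collect_points(ads)
vanak = spatial_stats(points["Vanak"])
apartment = spatial_stats(points["apartment"])
```

In this toy data the point set for "Vanak" is more compact than the one for the generic word "apartment", which is the kind of difference a classifier trained on spatial statistics could exploit.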

Discussion and Results

The results show that, for both main streets and neighborhoods, a model trained on ads data from one city predicts successfully on the others. For example, the models trained on Tehran data and tested on Mashhad data achieved 61% and 74% in identifying streets and neighborhoods, respectively. However, precision was reduced by factors such as dataset imbalance, data labeling challenges, and, in some cases, non-spatial n-grams being identified as place names as a result of clustering. Recall also decreased slightly because of differences in urban patterns and place naming patterns between the cities.

Conclusion

A place can be referenced in two ways: (1) by its name and (2) by its coordinates. Gazetteers act as a bridge between these two types of georeferencing. Given the importance of these resources in geospatial applications, enriching them is a necessity. Because they contain local place names, online property listings are a valuable resource for harvesting toponyms and enriching gazetteers. Most users who publish online property ads mention a neighborhood or main street name that is well known to readers, so these place names are usually written without any textual clue that would mark them as locations during text processing. The behavior of an n-gram with respect to a set of spatial statistics can therefore serve as a spatial signature for recognizing it as a neighborhood or street name.

Language:
Persian
Published:
Journal of Geomatics Science and Technology, Volume:11 Issue: 2, 2021
Pages:
1 to 14
magiran.com/p2376926  