Georeferencing Semi-Structured Place-Based Web Resources Using Machine Learning
In recent years, the amount of content shared on the web has grown significantly. A large part of this information is publicly available as semi-structured data, and a significant share of it is related to place. Such information refers to a location on the Earth but does not contain explicit coordinates. In this research, we georeferenced semi-structured web resources using machine learning. To this end, we used real estate advertisements for the city of Tehran, Iran, published on the Divar website. The advertisements were extracted from the website with a crawling approach, and coordinates were assigned to them using the Random Forests algorithm. The results show that, with this approach, the advertisements can be georeferenced at the precision of neighborhoods. The resulting precision is about 2 km in the latitude direction and about 6 km in the longitude direction. Moreover, the results demonstrate that the price of the property has higher importance than the other variables considered in this study. It can be concluded that property prices in Tehran show a stronger spatial pattern in the north-south direction than in the east-west direction.
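The abstract reports precision separately along the latitude and longitude directions, in kilometres, but does not state how those figures were computed. The following is a minimal sketch of one way such per-axis errors could be derived from predicted and reference coordinates given in degrees. The function name `axis_errors_km`, the spherical Earth radius, and the use of mean absolute error are assumptions made for illustration, not the authors' procedure.

```python
import math

# Mean Earth radius in km; a spherical approximation (assumption).
EARTH_RADIUS_KM = 6371.0


def axis_errors_km(true_coords, pred_coords):
    """Mean absolute error along each axis, in kilometres.

    Both arguments are equal-length sequences of (lat, lon) pairs in degrees.
    Returns (north_south_km, east_west_km).
    """
    if not true_coords or len(true_coords) != len(pred_coords):
        raise ValueError("need two non-empty sequences of equal length")

    # Length of one degree of latitude on the sphere.
    km_per_deg = math.pi * EARTH_RADIUS_KM / 180.0

    ns_total = 0.0
    ew_total = 0.0
    for (lat_t, lon_t), (lat_p, lon_p) in zip(true_coords, pred_coords):
        # A difference in latitude is a north-south displacement.
        ns_total += abs(lat_p - lat_t) * km_per_deg
        # A degree of longitude shrinks with cos(latitude); use the
        # midpoint latitude of the pair for the scaling.
        mid_lat = math.radians((lat_t + lat_p) / 2.0)
        ew_total += abs(lon_p - lon_t) * km_per_deg * math.cos(mid_lat)

    n = len(true_coords)
    return ns_total / n, ew_total / n


if __name__ == "__main__":
    # Hypothetical coordinates near Tehran, for illustration only.
    reference = [(35.70, 51.40), (35.75, 51.35)]
    predicted = [(35.72, 51.45), (35.74, 51.30)]
    ns_km, ew_km = axis_errors_km(reference, predicted)
    print(f"north-south error: {ns_km:.2f} km, east-west error: {ew_km:.2f} km")
```

At Tehran's latitude a degree of longitude spans a shorter distance than a degree of latitude, so equal errors in degrees do not correspond to equal errors in kilometres. Converting each axis separately keeps the two figures comparable.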