Magiran | Machine Vision and Image Processing, Volume 9, Issue 1 (Spring 1401)


طراحی و ارزیابی یک شبکه‏ عصبی کپسولی جدید برای طبقه ‏بندی نامتوازن تصاویر

حامد جباری، نوشین بیگدلی* صفحات 1-15

طبقه ‏بندی نامتوازن تصاویر یکی از مسایل مهم و دشوار در زمینه داده کاوی است. با عدم توانایی الگوریتم های طبقه بندی استاندارد، شبکه‏ های عصبی کپسولی با درنظر گرفتن ارتباطات فضایی ویژگی‏ ها، در مقایسه با سایر شبکه‏ های عمیق مثل شبکه ‏های عصبی کانولوشنی بستر مناسبی را برای طراحی مدل‏ های طبقه ‏بندی نامتوازن فراهم می‏ کنند. ازطرف‏ دیگر چندشاخگی در ترک‏ های سطحی یکی از ناهنجاری‏ ها و دسته ‏های اقلیت موجود در سازه‏ های بتنی است که تشخیص آن می ‏تواند در نگهداری سازه‏ های بتنی و مدیریت هزینه‏ ها موثر باشد. به‏ همین ‏منظور در این مقاله یک معماری جدید بر اساس شبکه‏ های عصبی کپسولی برای ارزیابی طبقه ‏بندی نامتوازن تصاویر ترک ‏های سطحی در سازه ‏های بتنی معرفی شده است. بررسی و مقایسه شبکه پیشنهادی با شبکه‏ های کانولوشنی در طبقه‏ بندی متوازن و نامتوازن ترک‏های سطحی روی 13500 مجموعه تصاویر جمع‏ آوری‏ شده، نشان از برتری شبکه پیشنهادی داشت. شبکه پیشنهادی در بررسی اثر کاهش تعداد تصاویر آموزش در دقت طبقه ‏بندی نیز برتری چشم‏گیری در مقایسه با شبکه ‏های کانولوشنی از خود نشان داد. این شبکه‏ طبقه ‏بندی متوازن ترک‏های سطحی را با دقت 99/56 درصد انجام داد. هم‏چنین شبکه پیشنهادی تا عدم توازن دسته اقلیت به اکثریت 1 به 8، دقت بالای 80 درصد داشت که نسبت به سایر روش ‏ها بسیار مناسب است.
کلیدواژگان: طبقه‏ بندی تصاویر، طبقه‏ بندی نامتوازن، ترک‏ های سطحی، چندشاخگی، یادگیری عمیق، شبکه‏ های عصبی کپسولی

شناسایی کانی های موجود در مقاطع نازک سنگ با استفاده از پردازش تصاویر رنگی

شکوفه ساعدی، عبدالله چاله چاله* صفحات 17-29

در روش سنتی برای مطالعه کانی های موجود در مقاطع نازک، مرز کانی ها به صورت دستی جدا شده و هر بخش برچسب گذاری می شود. این روش هزینه بر و نیازمند دانش، تخصص و تجربه بالایی است. بنابراین وجود یک سامانه شناسایی خودکار در این حوزه ضروری است. چنین سامانه ای می تواند باعث افزایش دقت و کاهش خطاهای انسانی، هزینه و زمان تشخیص کانی ها شود. هدف این پژوهش، پیشنهاد یک سامانه تشخیص خودکار است که با استفاده از پردازش تصویر، کانی های موجود را شناسایی و طبقه بندی کند. مراحل اصلی روش ارایه شده شامل جمع آوری تصاویر از مقاطع نازک، قطعه بندی، استخراج ویژگی و طبقه بندی است. پس از ایجاد پایگاه تصاویر، الگوریتم JSEG برای قطعه بندی انتخاب و اعمال شده است. سپس ویژگی های رنگ و بافت در دو فضای رنگی RGB و HSI از هر ناحیه استخراج شده اند. این ویژگی ها، برای طبقه بندی به طبقه بند فرستاده شده و طبقه بند هر ناحیه را به عنوان یک کانی برچسب گذاری کرده است. به علاوه، در این پژوهش کارایی شش طبقه بند مختلف نیز برای این منظور مورد ارزیابی قرار گرفته است. براساس نتایج، طبقه بند Bagged Tree دارای بالاترین دقت به میزان 95٫52 و کمترین میزان میانگین خطای مطلق برابر با 0٫04 می باشد. همچنین همه طبقه بندها دارای دقت بالای 93% هستند که نشان می دهد روش استخراج ویژگی پیشنهادی دارای قابلیت مناسبی است.

کلیدواژگان: پردازش تصویر، شناسایی خودکار، مقاطع نازک، کانی، قطعه بندی، طبقه بندی

ترکیب روش منظم سازی تنک و آسیب مغزی بهینه در کوچک سازی یک مدل یادگیری عمیق

محمود امین طوسی* صفحات 31-45

یکی از چالش های شبکه های عصبی پیچشی، به عنوان ابزار اصلی یادگیری عمیق، حجم زیاد برخی از مدل های مربوطه است. یک شبکه ی عصبی پیچشی به مثابه مدلی از مغز، متشکل از میلیون ها اتصال است. کاهش حجم این مدل ها از طریق حذف (هرس) اتصالات اضافی مدل انجام می شود که همانند یک آسیب مغزی است. دو روش منظم سازی تنک و آسیب مغزی بهینه از جمله مشهورترین شیوه های هرس مدل هستند. در این نوشتار با ترکیب این دو شیوه نتایج بهتری در کاهش حجم مدل حاصل شده است. ابتدا با استفاده از روش انتقال یادگیری، یک مدل بزرگ شبکه های عصبی پیچشی برای شناسایی طبقات هدف، آموزش داده شد؛ سپس با روش های منظم سازی تنک و آسیب مغزی بهینه ، اتصالات اضافی آن هرس شدند. نتایج آزمایشات نشان داده است که در بیشتر مجموعه دادگان مورد بررسی، اعمال شیوه ی ترکیبی منظم سازی تنک و آسیب مغزی بهینه نسبت به اعمال هر یک از آنها به صورت جداگانه کاراتر است. برای یکی از مجموعه دادگان مورد بررسی، با روش ترکیبی پیشنهادی تعداد اتصالات مدل 76 درصد کاهش داده شد، بدون آنکه کارایی آن کاهش یابد. این کاهش حجم مدل، زمان پردازشی را به یک سوم تقلیل داده است. کاهش حجم مدل می تواند امکان استفاده از آن در مرورگرها و سخت افزارهای ضعیف تر و همه گیرتر را تسهیل سازد.

کلیدواژگان: شبکه های عصبی پیچشی، هرس شبکه، یادگیری عمیق، بهینه سازی تنک، منظم سازی تنک

ارائه روشی مبتنی بر رای گیری برای ترکیب خروجی های شبکه های عمیق جهت آنالیز قالب بندی اسناد چاپی

امیررضا فاتح، محسن رضوانی، علیرضا تجری، منصور فاتح* صفحات 47-64

در چند دهه گذشته، تحقیقات فراوانی در زمینه OCR یا نویسه خوان نوری انجام شده است. نویسه خوان نوری، یکی از راه های تبدیل تصاویر متنی به متن قابل ویرایش و شناسایی حروف و کلمات به صورت خودکار است. تشخیص مناطق متنی و غیرمتنی درون سند به آنالیز قالب بندی اسناد شناخته می شود و یکی از گام های کلیدی در روند تبدیل تصویر سند به متن قابل ویرایش است. جداسازی مناطق متنی و غیرمتنی درون یک تصویر از تاثیرگذارترین پیش پردازش های ممکن در سیستم های نویسه خوان نوری است. نبودن یک قالب یکسان در تمامی صفحات، وجود پس زمینه های پیچیده، نویزهای مختلف، کیفیت پایین، چرخش تصاویر و تصاویر چندین ستونه مانع از شناسایی درست مناطق حاوی متن می شوند. عدم تشخیص درست مناطق حاوی متن و به تبع آن عدم تشخیص صحیح مختصات خطوط، تمامی بخش های بعدی یک سیستم نویسه خوان نوری را دچار اخلال می کند. در این تحقیق، روشی نوین برای تشخیص مناطق متنی درون تصویر ارایه شده است. روش پیشنهادی، با بکارگیری از چندین روش مختلف و استفاده از سیستم رای گیری در میان آن ها، مناطق متنی تصویر را استخراج می نماید که تا کنون در کارهای پیشین از آن بهره گرفته نشده است. روش پیشنهادی بر روی دادگانی از تصاویر با بیش از 950 صفحه مورد آموزش و آزمون قرار گرفته است که نتایج آزمون حاکی از ارایه دقت 97.94% در روش پیشنهادی است. مجموعه دادگان ارایه شده در این مقاله به صورت آزاد در دسترس است.
کلیدواژگان: تقسیم بندی تصویر، آنالیز قالب بندی سند، آشکارسازی متن، آشکارسازی تصویر، رای گیری

ارائه یک روش یادگیری خود - نظارتی عمیق مبتنی بر تبدیل موجک گسسته دو بعدی برای تعمیم دامنه تصاویر

سارا فرهمندی نیا، مهدی افتخاری*، کاوه بهرامن صفحات 65-76

در یادگیری ماشین، انتقال و تعمیم دانش یادگرفته شده از یک دامنه به دامنه های دیگر، یکی از قابلیت های مهم و اساسی به شمار می رود. از آن جا که یادگیری با نظارت هرگز نمی تواند کامل باشد، استفاده از روش های دیگری همچون روش های یادگیری خود - نظارتی می تواند برای مساله ی تعمیم دامنه بسیار کمک کننده باشد. در این مقاله، ما روشی را ارایه می دهیم که علاوه بر طبقه بندی تصاویر اصلی به منظور یادگیری برچسب های داده در فرایند با نظارت، سعی می کند که تصاویر حاصل از اعمال تبدیل موجک گسسته بر روی تصاویر اصلی را با تولید شبه برچسب هایی برای آنها طبقه بندی کند. این کار به عنوان یک وظیفه ی خود - نظارتی می تواند باعث یادگیری ویژگی های مفید و یک بازنمایش کلی در میان تصاویر دامنه های مختلف شود، که می تواند به بهبود مساله ی تعمیم دامنه بسیار کمک کند. در ادامه با ترکیب روش های خود - نظارتی مانند پازل jigsaw و حدس زاویه چرخش با تبدیل موجک گسسته، نشان می دهیم که این ترکیب می تواند باعث بهبود نتایج برای مساله ی تعمیم دامنه شود. در این مقاله، ما از مجموعه داده های معروف PACS، VLCS و Office-Home برای انجام آزمایش ها استفاده کردیم و نتایج نشان می دهند که روش پیشنهادی ما می تواند از روش های پیشرفته و به روز تعمیم دامنه بهتر عمل کند.
کلیدواژگان: تطبیق دامنه، تعمیم دامنه، دامنه منبع، دامنه هدف، یادگیری - خودنظارتی، تبدیل موجک

روشی خودکار به منظور کالیبراسیون نسبی و زمانی دوربین های غیرحرفه ای با هدف تولید ویدئوهای سه بعدی

عطیه گنجعلی، علیرضا صفدری نژاد* صفحات 77-91

در این مقاله راهکاری خودکار به منظور تولید ویدیوهای سه بعدی از طریق کنارهم قراردادن دو دوربین غیرحرفه ای پیشنهاد شده است. عدم امکان تامین هم زمانی دوربین ها در شروع فیلم برداری، نرخ نامشابه نمونه برداری فریم ها، معلوم نبودن پارامترهای کالیبراسیون داخلی و همچنین محدودیت های مربوط به تنظیم سخت افزاری ارتباط نسبی دوربین ها، چالش های این راهکار قلمداد می شوند. در راهکار پیشنهادی، ابتدا هم زمانی ویدیوها از طریق تناظریابی شاخص های زمانی تامین شده و در ادامه مقاطع زمانی توام با سکون در طول ویدیوها شناسایی می شوند. در ادامه، مجموعه ای از نقاط متناظر در دو ویدیو به کمک اجرای تناظریابی خودکار شناسایی شده و در روندی اصلاحی مورد پالایش قرار می گیرند. نقاط متناظر پالایش شده در برآورد هم زمان پارامترهای کالیبراسیون داخلی و نسبی دوربین های استریو به کار گرفته شده و در آخر، ویدیوهای سه بعدی نرمال شده از طریق بازنمونه برداری مبتنی بر هندسه ی اپی پلار تولید می گردند. این روش در مورد چندین ویدیوی سه بعدی از چهار جنبه ی مختلف کمی و کیفی مورد ارزیابی قرار گرفته است. دقت هندسی مطلوب در تولید ویدیوهای نرمال شده، هم زمانی دقیق ویدیوهای سه بعدی، تعمیم پذیری مطلوب روش پیشنهادی در تولید ویدیوهای سه بعدی در شرایط محیطی مختلف و رضایت تماشاگران ویدیوهای سه بعدی از منظر درک بصری عمق، از ویژگی های نتایج این روش محسوب می شوند.
کلیدواژگان: انتروپی، توجیه نسبی، تصاویر نرمال، فیلم سه بعدی، شار نوری، هم زمان سازی خودکار



Design and Evaluation of a New Capsule Neural Network (CapsNet) for Imbalanced Images Classification

Hamed Jabbari, Nooshin Bigdeli * Pages 1-15

Imbalanced image classification is an important and difficult problem in data mining. Standard classification algorithms handle class imbalance poorly, whereas capsule neural networks (CapsNet), which take the spatial relationships among features into account, provide a better basis for designing imbalanced classification models than other deep networks such as convolutional neural networks (CNNs). Crack bifurcation in surface cracks is an anomaly and a minority class in concrete structures, and detecting it can help with the maintenance of concrete structures and with cost management. Surface crack image sets are also well suited to evaluating imbalanced classification because of their characteristics. In this paper, a new CapsNet-based architecture is therefore introduced and evaluated on the imbalanced classification of surface crack images in concrete structures. Comparing the proposed network with CNNs on balanced and imbalanced classification of surface cracks, using a collected set of 13,500 images, showed the superiority of the proposed network. The proposed network also showed a significant advantage over CNNs when the number of training images was reduced. It performed balanced classification of surface cracks with 99.56% accuracy, and it maintained an accuracy above 80% up to a minority-to-majority imbalance ratio of 1:8, which compares favorably with CNNs.
Keywords: Image classification, Imbalance Classification, Surface Cracks, Crack Bifurcation, Deep Learning, CapsNet

Research/Original Article. Original language: Persian.
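
The 1:8 minority-to-majority imbalance mentioned in the abstract above can be illustrated with a small subsampling sketch. This is not the authors' protocol, which the abstract does not describe; the function name `make_imbalanced_indices`, its parameters, and the toy labels are assumptions made only for illustration.

```python
import numpy as np

def make_imbalanced_indices(labels, minority_class, ratio, seed=0):
    """Return sorted indices that keep every majority sample and about
    ratio * (number of majority samples) randomly chosen minority samples."""
    labels = np.asarray(labels)
    rng = np.random.default_rng(seed)
    minority_idx = np.flatnonzero(labels == minority_class)
    majority_idx = np.flatnonzero(labels != minority_class)
    # Number of minority samples to keep, capped by what is available.
    n_keep = min(len(minority_idx), int(round(ratio * len(majority_idx))))
    kept_minority = rng.choice(minority_idx, size=n_keep, replace=False)
    return np.sort(np.concatenate([majority_idx, kept_minority]))

# Toy data (hypothetical): 80 majority samples (label 0), 80 minority (label 1).
labels = np.array([0] * 80 + [1] * 80)
idx = make_imbalanced_indices(labels, minority_class=1, ratio=1 / 8)
counts = np.bincount(labels[idx])  # samples per class in the imbalanced subset
```

With these toy labels the subset keeps all 80 majority samples and 10 minority samples, which corresponds to a 1:8 ratio.
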
Recognition of Minerals in Thin Sections Using Color Image Processing

Shokoofeh Saedi, Abdolah Chalechale * Pages 17-29

In the traditional method of studying minerals in thin sections, the mineral boundaries are separated manually and each segment is labeled. This approach is expensive and requires considerable knowledge, expertise and experience, so an automatic identification system is needed in this field. Such a system can increase accuracy and reduce human error, cost and the time needed to identify minerals. The aim of this study is to propose an automated identification system that uses image processing to identify and classify the minerals present. The main steps of the proposed method are collecting images from thin sections, segmentation, feature extraction and classification. After the image database is created, the JSEG algorithm is applied for segmentation. Color and texture features are then extracted from each region in both the RGB and HSI color spaces and passed to the classifier, which labels each region as a mineral. The performance of six different classifiers is also evaluated. According to the results, the Bagged Tree classifier has the highest accuracy, 95.52%, and the lowest mean absolute error, 0.04. All classifiers reach accuracies above 93%, which indicates that the proposed feature extraction method is well suited to identifying minerals.

Keywords: image processing, thin section, Mineral, Segmentation, Classification

Research/Original Article. Original language: Persian.
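
As a rough illustration of extracting color features in the RGB and HSI spaces, the sketch below converts RGB pixels to HSI with the standard textbook formulas and summarizes one region by per-channel means and standard deviations. The abstract does not list the exact features used, so the choice of statistics and the names `rgb_to_hsi` and `region_color_features` are assumptions; texture features and the JSEG segmentation are not shown.

```python
import numpy as np

def rgb_to_hsi(rgb):
    """Convert an (..., 3) float RGB array in [0, 1] to HSI.
    Hue is in radians in [0, 2*pi) and is set to 0 where it is undefined."""
    rgb = np.asarray(rgb, dtype=float)
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    intensity = (r + g + b) / 3.0
    min_c = np.minimum(np.minimum(r, g), b)
    safe_i = np.where(intensity > 0, intensity, 1.0)
    saturation = np.where(intensity > 0, 1.0 - min_c / safe_i, 0.0)
    num = 0.5 * ((r - g) + (r - b))
    # The radicand is non-negative in exact arithmetic; clamp rounding noise.
    den = np.sqrt(np.maximum((r - g) ** 2 + (r - b) * (g - b), 0.0))
    safe_den = np.where(den > 0, den, 1.0)
    theta = np.arccos(np.clip(num / safe_den, -1.0, 1.0))
    hue = np.where(b <= g, theta, 2.0 * np.pi - theta)
    hue = np.where(den > 0, hue, 0.0)  # gray pixels have no defined hue
    return np.stack([hue, saturation, intensity], axis=-1)

def region_color_features(rgb_pixels):
    """Mean and standard deviation of R, G, B, S and I over an (N, 3) region."""
    rgb_pixels = np.asarray(rgb_pixels, dtype=float)
    hsi = rgb_to_hsi(rgb_pixels)
    # Hue is circular, so it is left out of the plain mean/std statistics.
    channels = np.concatenate([rgb_pixels, hsi[:, 1:]], axis=1)
    return np.concatenate([channels.mean(axis=0), channels.std(axis=0)])

# Four example pixels: pure red, green, blue and mid gray.
pixels = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0],
                   [0.0, 0.0, 1.0], [0.5, 0.5, 0.5]])
hsi = rgb_to_hsi(pixels)
features = region_color_features(pixels)
```

A feature vector like `features` could then be passed to any classifier; which classifier implementation the authors used beyond the name Bagged Tree is not stated in the abstract.
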
Combining a Regularization Method and the Optimal Brain Damage Method for Reducing a Deep Learning Model Size

Mahmood Amintoosi * Pages 31-45

One of the challenges of convolutional neural networks (CNNs), the main tool of deep learning, is the large size of some of the models. CNNs, inspired by the brain, have millions of connections. The size of these models is reduced by removing (pruning) redundant connections. Optimal Brain Damage (OBD) and sparse regularization are among the best-known methods in this field. In this study, a deep learning model was trained, and the effect of reducing its connections with these methods on its performance was investigated. In the proposed approach, the redundant connections are pruned by combining OBD and sparse regularization. The resulting model is smaller, needs less memory and computation than the original model, and performs no worse. The experimental results show that the hybrid approach is more efficient than either method alone on most of the tested datasets. On one dataset, the proposed method reduced the number of connections by 76% without sacrificing model performance. This reduction in model size decreased the processing time by 66 percent. A smaller model is easier to deploy on weaker, widely available hardware and in web applications.

Keywords: Convolutional Neural Networks, Network Pruning, Deep Learning, Sparse Optimization, Sparse Regularization

Research/Original Article. Original language: Persian.
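
To make the pruning idea concrete, the following sketch ranks weights by the classic Optimal Brain Damage saliency, 0.5 * h_kk * w_k^2, where h_kk is a diagonal Hessian entry, and removes the least salient fraction. It covers only the OBD ranking step: the sparse regularization term and the way the paper combines the two methods are not shown, and the weight and Hessian values are made-up numbers for illustration.

```python
import numpy as np

def obd_prune_mask(weights, hessian_diag, prune_fraction):
    """Return a boolean mask (True = keep) that removes the given fraction
    of weights with the smallest OBD saliency 0.5 * h_kk * w_k**2."""
    weights = np.asarray(weights, dtype=float)
    saliency = 0.5 * np.asarray(hessian_diag, dtype=float) * weights ** 2
    n_prune = int(prune_fraction * weights.size)
    mask = np.ones(weights.size, dtype=bool)
    if n_prune > 0:
        # Indices of the n_prune least salient weights.
        prune_idx = np.argsort(saliency.ravel(), kind="stable")[:n_prune]
        mask[prune_idx] = False
    return mask.reshape(weights.shape)

w = np.array([0.9, -0.05, 0.4, 0.01])   # hypothetical weights
h = np.array([1.0, 2.0, 0.5, 4.0])      # hypothetical diagonal Hessian entries
mask = obd_prune_mask(w, h, prune_fraction=0.5)
pruned_w = w * mask                      # pruned connections are set to zero
```

In this toy case the two weights with the smallest saliency (the second and fourth) are removed, even though the fourth has the largest Hessian entry, because its magnitude is very small.
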
Providing a Voting-Based Method for Combining Deep Neural Network Outputs to Layout Analysis of Printed Documents

Amirreza Fateh, Mohsen Rezvani, Alireza Tajary, Mansoor Fateh * Pages 47-64

Over the last few decades, a great deal of research has been done on optical character recognition (OCR). OCR is one way to convert images of text into editable text and to recognize letters and words automatically. Recognizing textual and non-textual areas within a document is known as document layout analysis and is one of the key steps in converting a document image to editable text. Separating textual and non-textual areas within an image is one of the most influential preprocessing steps in OCR systems. The lack of a uniform template across pages, complex backgrounds, various kinds of noise, low quality, image rotation, and multi-column layouts hinder the correct recognition of text areas. Failure to recognize text areas correctly, and the resulting errors in line coordinates, disrupts all subsequent stages of an OCR system. In this research, a new method is proposed for recognizing textual areas within an image. The proposed method applies several different methods and uses a voting system among them to extract the textual areas of the image, an approach that has not been used in previous work. The proposed method was trained and tested on a dataset of more than 950 images and reached 97.94% accuracy. The dataset presented in this article is openly available.
Keywords: Image Segmentation, document layout analysis, Text detection, image detection, Voting

Research/Original Article. Original language: Persian.
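
One simple way to combine the outputs of several detectors by voting is a per-pixel majority vote over binary text masks. The abstract does not specify the voting rule, so the strict-majority rule, the function name `majority_vote`, and the toy masks below are assumptions for illustration only.

```python
import numpy as np

def majority_vote(masks):
    """Combine binary masks of identical shape: a pixel is labeled as text
    when strictly more than half of the masks mark it as text."""
    stacked = np.stack([np.asarray(m, dtype=bool) for m in masks])
    votes = stacked.sum(axis=0)
    return votes * 2 > stacked.shape[0]

# Hypothetical text masks from three different models (1 = text pixel).
m1 = np.array([[1, 1, 0], [0, 0, 0]])
m2 = np.array([[1, 0, 0], [0, 1, 0]])
m3 = np.array([[1, 1, 0], [0, 0, 1]])
combined = majority_vote([m1, m2, m3])
```

Here only the first two pixels of the top row receive at least two of three votes, so only they are kept as text; pixels flagged by a single model are discarded.
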
Proposing a deep self – supervised learning method based on two dimensional discrete wavelet transform for image domain generalization

Sara Farahmandinia, Mahdi Eftekhari *, Kaveh Bahraman Pages 65-76

In machine learning, transferring and generalizing knowledge learned in one domain to other domains is an important and basic capability. Since supervised learning can never be complete, other approaches such as self-supervised learning can be very helpful for domain generalization. In this paper, we present a method that, in addition to classifying the original images in order to learn the data labels in a supervised process, attempts to classify images obtained by applying the discrete wavelet transform to the original images, by generating pseudo-labels for them. This additional self-supervised task can lead to learning useful features and a general representation for images from different domains, which can greatly help domain generalization. We further show that combining self-supervised tasks such as jigsaw puzzles and rotation-angle prediction with the discrete wavelet transform can improve domain generalization results. We ran experiments on the well-known PACS, VLCS and Office-Home datasets, and the results show that the proposed method can outperform state-of-the-art domain generalization methods.
Keywords: Domain adaptation, domain generalization, Source Domain, target domain, self – supervised learning, wavelet transform

Research/Original Article. Original language: Persian.
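
The wavelet-based pretext task can be pictured with a one-level 2D Haar transform whose four subbands are each paired with a pseudo-label. The abstract does not say which wavelet is used or how the pseudo-labels are assigned, so the Haar filter, the use of the subband index as the pseudo-label, and the function names are all assumptions made for this sketch.

```python
import numpy as np

def haar_dwt2(image):
    """One-level 2D orthonormal Haar transform of an array with even height
    and width. Returns (approx, col_detail, row_detail, diag_detail)."""
    x = np.asarray(image, dtype=float)
    a, b = x[0::2, 0::2], x[0::2, 1::2]
    c, d = x[1::2, 0::2], x[1::2, 1::2]
    approx = (a + b + c + d) / 2.0
    col_detail = (a - b + c - d) / 2.0   # differences between adjacent columns
    row_detail = (a + b - c - d) / 2.0   # differences between adjacent rows
    diag_detail = (a - b - c + d) / 2.0
    return approx, col_detail, row_detail, diag_detail

def wavelet_pseudo_labeled(image):
    """Pair each subband with its index (0..3) as a pseudo-label."""
    return [(band, label) for label, band in enumerate(haar_dwt2(image))]

image = np.arange(16.0).reshape(4, 4)    # toy 4x4 grayscale image
samples = wavelet_pseudo_labeled(image)
pseudo_labels = [label for _, label in samples]
```

Each `(band, label)` pair could serve as a training sample for an auxiliary classification head trained alongside the supervised classifier; how the losses are weighted is not stated in the abstract.
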
A method for automatic temporal and relative calibration of the amateur cameras to produce 3D videos

Atiyeh Ganjali, Alireza Safdarinezhad * Pages 77-91

In this paper, a novel method is proposed for automatically producing 3D videos from two amateur digital cameras mounted side by side. The main challenges of producing 3D videos with amateur cameras are the asynchrony of the videos (different frame rates and start times), the unavailability of the exact internal camera parameters, and the technical limits on precisely adjusting the stereo cameras relative to each other. In the proposed method, the videos acquired by the stereo cameras are first synchronized by automatically matching temporal indices. The calm periods of the videos (times with zero relative velocity between the cameras and the scene) are then detected and used to select properly matched image frames. Matched points are found in these frames with a geometrically constrained feature-based matching method. The matched points are used for self-calibration and for estimating the relative parameters of the stereo cameras. In the last step, epipolar resampling is used to generate normalized videos. The evaluation showed automatic and precise synchronization of the stereo videos and good generalization of the proposed approach across different sample datasets. Spectator satisfaction with the depth perception of the 3D videos is another qualitative achievement of the proposed method.
Keywords: Entropy, relative orientation, normalized images, 3D video, optical flow, automatic synchronization

Research/Original Article. Original language: Persian.
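
The synchronization step can be illustrated by estimating the start-time offset between two videos from a per-frame temporal signal. The sketch below uses plain cross-correlation, which is a stand-in technique and not necessarily the matching procedure of the paper; it also assumes both videos have the same frame rate, which the paper explicitly does not require. The per-frame signal (for example a motion or entropy measure) and the function name are hypothetical.

```python
import numpy as np

def estimate_frame_offset(signal_a, signal_b):
    """Return the integer lag d (in frames) that best aligns the signals,
    in the sense signal_b[n] ~ signal_a[n - d], via full cross-correlation."""
    a = np.asarray(signal_a, dtype=float)
    b = np.asarray(signal_b, dtype=float)
    a = a - a.mean()
    b = b - b.mean()
    corr = np.correlate(b, a, mode="full")
    # Index i of the full correlation corresponds to lag i - (len(a) - 1).
    return int(np.argmax(corr)) - (len(a) - 1)

# Toy per-frame signals: camera B sees the same two events 3 frames later.
a = np.zeros(20)
a[5], a[9] = 1.0, 2.0
b = np.zeros(20)
b[8], b[12] = 1.0, 2.0
offset = estimate_frame_offset(a, b)
```

A positive offset means the second video lags the first by that many frames; swapping the arguments flips the sign.
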
