Articles related to the keyword "parallel processing" in electrical engineering journals


An Efficient Region-of-Interest (ROI) based Scalable Framework for Free Viewpoint Video Application

H. Roodaki *

Journal of Electrical and Computer Engineering Innovations, Volume 12, Issue 1, Winter-Spring 2024, pp. 283-293

Background and Objectives
From the multiview recorded video, free viewpoint video provides flexible viewpoint navigation. Thus, a lot of views need to be sent to the receivers in an encoded format. The scalable nature of the coded bitstream is one method of lowering the volume of data. However, adhering to the limitations of the free viewpoint application heavily relies on the kind of scalable modality chosen. The perceptual quality of the received sequences and the efficiency of the compression technique are significantly impacted by the scalable modality that was chosen.
Methods
In order to address the primary issues with free-viewpoint video, such as high bandwidth requirements and computational complexity, this paper suggests a scalable framework. The two components of the suggested framework are as follows: 1) introducing appropriate scalable modality and data assignment to the base and enhancement layers; and 2) bit budget allocation to the base and enhancement layers using a rate control algorithm. In our novel scalable modality, termed Tile-based scalability, the idea of Region of Interest (ROI) is employed, and the region of interest is extracted using the tile coding concept first presented in the MV-HEVC.
Results
Compared to state-of-the-art techniques, our approach reduces computational complexity by an average of 44%, thanks to tile coding and its parallel processing capabilities. Furthermore, in comparison to standard MV-HEVC, our suggested rate control achieves an average 17.7% reduction in bandwidth and a 1.2 dB improvement in video quality on the Bjøntegaard-Bitrate and Bjøntegaard-PSNR scales, respectively.
Conclusion
Using new tile-based scalability, a novel scalable framework for free-viewpoint video applications is proposed. It assigns appropriate regions to the base and enhancement layers based on the unique features of free viewpoint scalability. Next, a rate control strategy is put forth to allocate a suitable bitrate to both the base and enhancement layers. According to experimental results, the suggested method can achieve a good coding efficiency with significantly less computational complexity than state-of-the-art techniques that used the λ-domain rate control method.

Keywords: Tile-based Scalability, Region of Interest, λ-domain rate control algorithm, MV-HEVC, Parallel processing

Research/Original Article. Original language: English
Real-time direction finding of underwater sound sources using a graphics processing unit

Ehsan Imani Far, Amir Akhavan*, Ali Asghar Abniki

Signal and Data Processing (quarterly), Year 18, No. 2 (Serial 48, Summer 1400 SH), pp. 45-56

Direction finding of sound sources with phased-array methods is of great importance in many fields, including sonar, robot vision, and mechanical fault detection. Adaptive beamforming methods, such as the minimum variance distortionless algorithm, offer higher resolution than non-adaptive methods, but this advantage comes at the cost of computational complexity. As a result, these algorithms are rarely used in applications that require real-time sound source direction finding. On the other hand, an important feature of adaptive beamforming methods, including minimum variance, is their high potential for parallelization. The aim of this paper is a parallel implementation of the minimum variance algorithm on a graphics processing unit (GPU) instead of a central processing unit (CPU), in order to increase execution speed and reach real-time operation. To this end, the CUDA programming model is used to implement the algorithm on the GPU. To evaluate the parallel implementation, two different GPU models as well as a CPU were used. The correctness of the various implementations was verified with real sonar data and with simulated data. The results show that, with a 64-sensor array, the direction of underwater sound sources can be estimated in real time and with high resolution using the minimum variance algorithm.

Keywords: sound source direction finding, minimum variance algorithm, parallel processing, graphics processing unit, CUDA programming model

Research/Original Article. Language: Persian

Real-Time DOA Estimation of Underwater Sound Sources Using GPU

Ehsan Imani Far, Amir Akhavan*, Ali Asghar Abniki

Signal and Data Processing, Volume 18, Issue 2, 2021, pp. 45-56

Direction of Arrival (DOA) estimation of sound sources using phased-array methods is of great importance in various fields, including sonar, robot vision, and mechanical defect detection. Adaptive beamforming methods, such as the MVDR (Minimum Variance Distortionless Response) algorithm, offer higher resolution than non-adaptive methods, but this advantage comes at the cost of higher computational complexity. This makes these algorithms hard to use in applications that require real-time DOA estimation. On the other hand, an important feature of adaptive beamforming methods, including MVDR, is their high potential for parallelization. This paper presents a parallel implementation of the MVDR algorithm on a GPU instead of a CPU in order to increase execution speed and reach real-time operation. To this end, the CUDA programming model is used to implement the algorithm on the GPU. To evaluate the performance of the parallel implementation, two different GPUs as well as CPUs were used. The correctness of the various implementations was confirmed with real sonar data as well as simulated data. The results show that, using an array of 64 sensors, the DOA of underwater sound sources can be estimated in real time and with high resolution using the MVDR algorithm.
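
For readers unfamiliar with MVDR, the standard textbook form of the beamformer is summarized below. This is a sketch under stated assumptions, not the paper's formulation: the uniform-linear-array steering vector, the sensor spacing $d$, the wavelength $\lambda$, and the sample covariance over $N$ snapshots are illustrative choices, since the abstract does not state the array geometry. $M$ is the number of sensors (64 in the abstract).

```latex
% Sample covariance of the M-sensor snapshots x[n] (assumed estimator)
\hat{\mathbf{R}} = \frac{1}{N}\sum_{n=1}^{N} \mathbf{x}[n]\,\mathbf{x}[n]^{H}

% Steering vector of an assumed uniform linear array
\mathbf{a}(\theta) = \bigl[\,1,\; e^{-j 2\pi \frac{d}{\lambda}\sin\theta},\; \dots,\; e^{-j 2\pi (M-1)\frac{d}{\lambda}\sin\theta}\,\bigr]^{T}

% MVDR weights and spatial spectrum for scan angle \theta
\mathbf{w}(\theta) = \frac{\hat{\mathbf{R}}^{-1}\mathbf{a}(\theta)}{\mathbf{a}(\theta)^{H}\hat{\mathbf{R}}^{-1}\mathbf{a}(\theta)},
\qquad
P(\theta) = \frac{1}{\mathbf{a}(\theta)^{H}\hat{\mathbf{R}}^{-1}\mathbf{a}(\theta)}
```

Once $\hat{\mathbf{R}}^{-1}$ is available, $P(\theta)$ can be evaluated independently for every scan angle, which is one plausible way to map the search onto GPU threads.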

Keywords: DOA estimation of sound sources, MVDR algorithm, Parallel processing, GPU, CUDA

Research/Original Article. Original language: Persian
A distributed sailfish algorithm based on multi-agent systems for solving non-convex and scalable functions, and its implementation on graphics processors

Soodeh Shadravan, Hamid Reza Naji*, Vahid Khatibi

Journal of Artificial Intelligence and Data Mining, Year 9, No. 1 (Winter 2021), pp. 59-71

Research/Original Article. Language: English

A Distributed Sailfish Optimizer Based on Multi-Agent Systems for Solving Non-Convex and Scalable Optimization Problems Implemented on GPU

S. Shadravan, H. Naji *, V. Khatibi

Journal of Artificial Intelligence and Data Mining, Volume 9, Issue 1, Winter 2021, pp. 59-71

The SailFish Optimizer (SFO) is a metaheuristic algorithm inspired by a group of hunting sailfish that alternate their attacks on a group of prey. The SFO algorithm uses a simple method to balance the exploration and exploitation phases dynamically, create swarm diversity, avoid local optima, and guarantee high convergence speed. Nowadays, multi-agent systems and metaheuristic algorithms can provide high-performance solutions for combinatorial optimization problems. These methods offer a prominent way to reduce execution time and improve solution quality. In this paper, we develop a multi-agent-based, distributed version of the sailfish optimizer (DSFO), which improves the execution time and speedup of the algorithm while maintaining high-quality optimization results. Graphics Processing Units (GPUs) programmed with Compute Unified Device Architecture (CUDA) handle the massive computation requirements of this approach. We present the implementation details and performance observations of the DSFO algorithm. A comparative study of the distributed and sequential SFO is also performed on a set of standard benchmark optimization functions. Moreover, the execution time of the distributed SFO is compared with that of other parallel algorithms to show the speed of the proposed algorithm on unconstrained optimization problems. The final results indicate that the proposed method runs up to about 14 times faster than other parallel algorithms and demonstrate the ability of DSFO to solve non-separable, non-convex, and scalable optimization problems.

Keywords: SailFish Optimizer (SFO), Multi agent system, parallel processing, shared memory, Graphic processing units

Research/Original Article. Original language: English
Improving the performance of the two-dimensional discrete wavelet transform using a data-level parallelization technique

Abdolbasir Tibash, Asadollah Shahbahrami*

Journal of Electrical Engineering, Year 49, No. 4 (Serial 90, Winter 1398 SH), pp. 1547-1558

The two-dimensional discrete wavelet transform (2D-DWT) is widely used in multimedia data processing applications, including image and video compression standards. However, it has higher computational complexity than conventional transforms such as the discrete cosine transform and other functions in compression standards, and it accounts for the largest share of execution time. In this paper, to improve the performance of 2D-DWT, we propose using the Advanced Vector Extensions AVX/AVX2 and Fused Multiply-Add (FMA) instruction sets, which process 256 bits of data using the single instruction, multiple data (SIMD) architecture and are supported by most general-purpose processors (GPPs). These technologies make it possible to process eight 32-bit floating-point values or sixteen 16-bit integers in the SIMD registers of a GPP. In addition, the wavelet transforms are mapped using two approaches: row-column processing, which handles rows and columns separately, and line-based processing, which handles both the rows and the columns of the image in a single loop. Results of the parallel implementation of various transforms on a GPP platform show that 2D-DWT performance, across different image sizes, can be improved by up to 28.8x over the serial implementation. Moreover, the line-based mapping, which makes better use of the memory hierarchy, improves performance more than the row-column mapping.

Keywords: general-purpose processors, parallel processing, two-dimensional discrete wavelet transform, data-level parallelism, single instruction multiple data

Research/Original Article. Language: Persian

Performance Improvement of 2D Discrete Wavelet Transform using Data-Level Parallelism Technique

A. Tibash, A. Shahbahrami *

Journal of Electrical Engineering, Volume 49, Issue 4, 2020, pp. 1547-1558

The Two-Dimensional Discrete Wavelet Transform (2D-DWT) is widely used in various multimedia data processing applications, including image and video compression standards. However, this transform is more computationally intensive than conventional transforms such as the discrete cosine transform. In this paper, to improve the performance of 2D-DWT, we use Single Instruction, Multiple Data (SIMD) instruction sets, including Advanced Vector Extensions (AVX), Fused Multiply-Add (FMA), and AVX2, which are supported by most General-Purpose Processors (GPPs). These technologies can process 256 bits of data held in SIMD registers. AVX can process eight 32-bit floating-point numbers, while AVX2 processes sixteen 16-bit fixed-point numbers. In other words, it is possible to exploit 8-way and 16-way data-level parallelism. In addition, two different mappings are used: the Row-Column Wavelet Transform (RCWT), which processes rows and columns separately, and the Line-Based Wavelet Transform (LBWT), which processes both rows and columns in a single loop. Experimental results for different wavelet transforms and image sizes on a GPP show speedups of up to 28.8x. Furthermore, the LBWT approach improves performance more than RCWT because it uses the memory hierarchy more efficiently.
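
To make the row-column mapping concrete, here is a minimal pure-Python sketch of one decomposition level. It is an illustration only: the Haar filter, the averaging normalization, and the function names `haar_1d` and `haar_2d_row_column` are assumptions, as the abstract does not say which wavelets or code structure the paper uses. The independent per-row and per-column passes are the kind of work that SIMD lanes can process several elements at a time.

```python
def haar_1d(x):
    """One Haar level on an even-length sequence: averages, then differences."""
    half = len(x) // 2
    approx = [(x[2 * i] + x[2 * i + 1]) / 2 for i in range(half)]
    detail = [(x[2 * i] - x[2 * i + 1]) / 2 for i in range(half)]
    return approx + detail


def haar_2d_row_column(img):
    """Row-column mapping: transform every row, then every column."""
    # Row pass: each row is independent of the others.
    rows = [haar_1d(row) for row in img]
    # Column pass: transpose, transform each column, transpose back.
    cols = [haar_1d(list(col)) for col in zip(*rows)]
    return [list(row) for row in zip(*cols)]


if __name__ == "__main__":
    print(haar_2d_row_column([[1, 3], [5, 7]]))
```

A line-based variant would fuse the two passes into one loop so that data is reused while it is still in cache, which is the memory-hierarchy advantage the abstract attributes to LBWT.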

Keywords: Data-Level Parallelism, Discrete Wavelet Transform, General-Purpose Processor, Parallel processing, Single Instruction, Multiple Data

Research/Original Article. Original language: Persian
Speeding up the seam carving algorithm by decomposition into even and odd subimages

Fatemeh Siar, Saeed Mozaffari*

Iranian Journal of Electrical and Computer Engineering, Year 15, No. 4 (Serial 45, Winter 1396 SH), p. 315

Seam carving is a content-aware resizing method. In this method, connected paths of low-importance pixels that run from the top to the bottom or from the left to the right of the image, called seams, are extracted. Removing seams from the image or adding them to it respectively decreases or increases the image dimensions. Seam carving can be studied from two perspectives, speed and quality. This paper presents a parallelization method for speeding up the algorithm in which the original image is decomposed into even and odd subimages and the search is carried out independently on the two. Compared with standard seam carving, the proposed method at least doubles the speed while largely preserving image quality. Any earlier seam search method can be used within the proposed method, and it can be combined with other parallel methods. Finally, the proposed method is refined to improve quality.

Keywords: seam carving, content-aware resizing, parallel processing, image decomposition

Research/Original Article. Language: Persian

Seam Carving Speed Improvement by Odd and Even Subimages Decomposition

F. Siar, S. Mozaffari *

Iranian Journal of Electrical and Computer Engineering, Volume 15, Issue 4, 2018, p. 315

Seam carving is a content-aware image retargeting technique. In this method, a path of lowest-energy pixels, called a seam, that crosses the image from top to bottom or from left to right is extracted. By removing or inserting seams, the size of the image can be changed. Speed and quality are the two main concerns in seam carving. In this paper, a new method for speeding up seam carving is proposed. The input image is decomposed into odd and even subimages, and the search for seams is performed in parallel on these subimages. Compared to the original seam carving, the proposed method at least doubles the speed while largely maintaining image quality. Previous seam search algorithms can be used within our method, and it can be combined with other parallel processing schemes. Finally, the image quality of the proposed seam carving method is improved.
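
The idea of searching two subimages independently can be sketched as follows. This is a minimal illustration, not the authors' implementation: splitting by column parity, working on a precomputed energy map, and the helper names `min_vertical_seam`, `split_even_odd_columns`, and `seams_on_subimages` are all assumptions, because the abstract does not specify how the subimages are formed.

```python
from concurrent.futures import ThreadPoolExecutor


def min_vertical_seam(energy):
    """Dynamic-programming search for the cheapest top-to-bottom seam.

    Returns (total_energy, columns), with one column index per row.
    """
    rows, cols = len(energy), len(energy[0])
    cost = [list(energy[0])]
    back = []  # back[r - 1][c] = predecessor column of (r, c) in row r - 1
    for r in range(1, rows):
        prev = cost[-1]
        row_cost, row_back = [], []
        for c in range(cols):
            # A seam may arrive from up-left, straight up, or up-right.
            cands = [k for k in (c - 1, c, c + 1) if 0 <= k < cols]
            best = min(cands, key=lambda k: prev[k])
            row_cost.append(energy[r][c] + prev[best])
            row_back.append(best)
        cost.append(row_cost)
        back.append(row_back)
    c = min(range(cols), key=lambda k: cost[-1][k])
    total = cost[-1][c]
    seam = [c]
    for r in range(rows - 2, -1, -1):
        c = back[r][c]
        seam.append(c)
    seam.reverse()
    return total, seam


def split_even_odd_columns(energy):
    """Assumed decomposition: even-indexed and odd-indexed columns."""
    even = [row[0::2] for row in energy]
    odd = [row[1::2] for row in energy]
    return even, odd


def seams_on_subimages(energy):
    """Search both subimages independently (threads only show independence)."""
    even, odd = split_even_odd_columns(energy)
    with ThreadPoolExecutor(max_workers=2) as pool:
        return list(pool.map(min_vertical_seam, (even, odd)))


if __name__ == "__main__":
    demo = [[1, 9, 9, 9], [9, 1, 9, 9], [9, 9, 1, 9]]
    print(min_vertical_seam(demo))
    print(seams_on_subimages(demo))
```

Because the two searches share no state, they can run on separate cores or devices. In CPython the thread pool mainly demonstrates that independence rather than delivering a real speedup.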

Keywords: Seam carving, content aware image retargeting, parallel processing, image decomposition

Research/Original Article. Original language: Persian
Signal detection based on GPU-based parallel processing in infrastructure-based acoustic sensor networks

Hamed Sadeghi, Amir Akhavan

Signal and Data Processing (quarterly), Year 14, No. 4 (Serial 34, Winter 1396 SH), pp. 19-30

The Fisher algorithm is one of the best-known and most widely used methods for array detection of low-frequency acoustic signals in infrastructure-based sensor networks. A major problem in using it, however, is its long processing time, which in practice makes real-time implementation of the detector difficult. In this paper, we show how to implement the Fisher algorithm on a graphics processing unit (GPU) in order to achieve fast computation and near-real-time processing. In particular, to improve computation speed as much as possible, the detection algorithm is implemented using GPU-based parallel processing. Simulation results show a considerable speedup of the Fisher detector, which will improve the performance of the acoustic sensor network.

Keywords: sensor network, array processing, beamforming, parallel processing, GPU

Research/Original Article. Language: Persian

Signal Detection Based on GPU-Assisted Parallel Processing for Infrastructure-based Acoustical Sensor Networks

Dr. Amir Akhavan Bitaghsir

Signal and Data Processing, Volume 14, Issue 4, 2018, pp. 19-30

Nowadays, several infrastructure-based low-frequency acoustical sensor networks are employed in different applications to monitor diverse natural and man-made phenomena, such as avalanches, earthquakes, volcanic eruptions, severe storms, and supersonic aircraft flights. Two signal detection methods are usually implemented in these networks to identify event occurrences: the progressive multi-channel correlator (PMCC) and the so-called Fisher detector. The Fisher method is the more important and applicable of the two under low signal-to-noise ratio (SNR) conditions, which are of special interest in acoustical monitoring networks. Unfortunately, an important disadvantage of this algorithm is its relatively high detection time, which limits its use in real-time detection scenarios. This disadvantage is fundamentally due to the beamforming process in the Fisher algorithm, which requires a complete search over a slowness grid constructed from the possible incoming wavefront directions and speeds. To address this issue, we propose a method for implementing this beamforming on a graphics processing unit (GPU) in order to realize a fast, near-real-time signal processing technique. In addition, we propose a parallel processing algorithm to further enhance the performance of this GPU-based Fisher detector. Simulation results confirm the performance improvement of the Fisher detector in terms of the processing time required for acoustical signal detection applications.

Keywords: Sensor network, array processing, beamforming, parallel processing, GPU

Research/Original Article. Original language: Persian
Modal Analysis of Two-Dimensional Beams Using Parallel Finite Element Method

Soroush Heydari, Saeed Asil Gharebaghi *

Scientia Iranica, Volume 24, Issue 6, Nov-Dec 2017, pp. 2762-2775

Modal analysis is the process of determining the natural frequencies and mode shapes of structures. In practical problems, modal analysis may be repeated many times, which results in a huge amount of computation. Although parallel processing can reduce the analysis time, it is rarely implemented by civil engineers because it requires more programming skill as well as the design of parallel algorithms. In the present paper, the Davidson algorithm is adapted for parallel modal analysis of two-dimensional beams. More precisely, the parallel version of the Davidson algorithm is implemented from scratch. A new method, called the "Modified Checkered Method" (MCM), is introduced, and four versions of the algorithm are implemented. Two of the four versions use the Row-wise method and MCM in combination with the Compressed Sparse Row algorithm, while the others use the same methods without matrix compression. It is shown that the speedup increases when the main matrix of the standard form of the eigenvalue problem is not compressed. Moreover, the speedup is higher with MCM than with the Row-wise division method. Notably, the implemented Parallel Finite Element source code can be used in conjunction with a wide variety of finite elements.

Keywords: Eigenvalue problem, Parallel processing, FEM, CSR matrix compression, Davidson algorithm, Modified Checkered Method

Research/Original Article. Original language: English
Families of communication architectures for data centers and parallel processing derived by switching network dilation

B. Parhami

Scientia Iranica, Volume 23, Issue 6, 2016, p. 2891

Network dilation is a way of offering system families, at a range of sizes and computational powers, which share an underlying communication architecture and routing algorithm. We consider indirect networks that connect processing nodes via intermediate switch nodes. In the simplest such indirect networks, there is a switching network of some regular topology, where each switch is connected to d other switches and to exactly one processing node. A variant, which we adopt here because it is more robust in the sense of not losing any processing capability to single-switch failures, is the use of 2-port processing nodes that connect to two neighboring switches. This alternate architecture also has the advantage of increasing the number of processing nodes from n to (d/2)n with a factor-of-2 increase in internode distances. A k-dilated version of the latter architecture replaces each processing node with a path network (linear array) of length k, thus growing the network size to k(d/2)n and also further increasing internode distances. In this paper, we study topological and performance attributes of such dilated network architectures, proving general theorems about worst-case and average internode distances and deriving the routing algorithm from that of the underlying switch network.
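
A short worked example of the node counts quoted in the abstract may help. The values $n = 16$, $d = 4$, and $k = 3$ are hypothetical and chosen only for illustration. The link-counting step is an interpretation consistent with the abstract's $(d/2)n$ figure, not a statement taken from the paper.

```latex
% n switches, each connected to d other switches, give nd/2 switch-to-switch links.
% One 2-port processing node per link yields (d/2)n processing nodes:
\frac{d}{2}\,n = \frac{4}{2}\cdot 16 = 32

% A k-dilated network replaces each processing node with a path of k nodes:
k\,\frac{d}{2}\,n = 3\cdot 32 = 96
```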

Keywords: Communication, Graph theory, Interconnection network, Parallel processing, Routing algorithm, Symmetric network

Original language: English

Note

Results are sorted by publication date.
The keyword was searched only in the keywords field of articles. To exclude unrelated results, the search covers only articles from journals that share a subject with the source journal.