Regularized Knowledge Transfer for Multi-Agent Reinforcement Learning

Article Type:
Research/Original Article (published in an accredited journal)
Abstract:

Reinforcement learning (RL) refers to training machine learning models to make a sequence of decisions, in which an agent learns by interacting with its environment, observing the outcomes of its actions, and receiving positive or negative rewards accordingly. RL has many applications in multi-agent systems, especially in dynamic and unknown environments. However, most multi-agent reinforcement learning (MARL) algorithms suffer from the exponential computational complexity of the joint state-action space, which limits their scalability in realistic multi-agent problems. Applications of MARL range from robot soccer, networking, cloud computing, and job scheduling to optimal reactive power dispatch. Equilibrium-based reinforcement learning algorithms face serious challenges, including their limited practical applicability and the high computational cost of finding equilibria. On the other hand, when agents have no concept of equilibrium policies, they tend to act aggressively toward their goals, which results in a high probability of collisions. Consequently, this paper presents a novel algorithm called Regularized Knowledge Transfer for Multi-Agent Reinforcement Learning (RKT-MARL), which relies on the Markov decision process (MDP) model. Unlike traditional reinforcement learning methods, RKT-MARL exploits sparse interactions and knowledge transfer to achieve an equilibrium across agents. Moreover, RKT-MARL employs negotiation to find the equilibrium set, uses the minimum-variance method to select the best action within that set, and transfers the knowledge of state-action values across agents. RKT-MARL also initializes the Q-values in coordination states as a weighted combination of current environmental information and previously acquired knowledge.
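The two mechanisms named above (minimum-variance action selection over the negotiated equilibrium set, and Q-value initialization as a blend of current and transferred knowledge) can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation: the function names, the blending coefficient `beta`, and the per-agent value samples are all assumptions introduced for illustration.

```python
import statistics

def init_q_value(current_estimate, transferred_estimate, beta=0.5):
    """Hypothetical initialization of a Q-value in a coordination state:
    a weighted combination of current environmental information and
    knowledge transferred from other agents (beta is an assumed weight)."""
    return beta * current_estimate + (1.0 - beta) * transferred_estimate

def min_variance_action(equilibrium_set, q_samples):
    """Pick the action from the equilibrium set whose state-action value
    estimates (gathered across agents) have the smallest variance."""
    return min(equilibrium_set, key=lambda a: statistics.pvariance(q_samples[a]))

# Toy usage: two candidate actions surviving negotiation, each with
# value estimates from three agents.
q_samples = {"up": [1.0, 1.2, 0.9], "left": [1.1, 1.1, 1.05]}
best = min_variance_action(["up", "left"], q_samples)
print(best)  # "left": its estimates agree most closely across agents
```

The minimum-variance criterion here prefers the equilibrium action on which the agents' value estimates are most consistent, which matches the abstract's description at a high level.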
To evaluate the performance of the proposed method, groups of experiments are conducted on five grid-world games; the results show fast convergence and high scalability of RKT-MARL. The fast convergence indicates that the agents quickly solve the reinforcement learning problem and approach their goals.

Language:
Persian
Published:
Signal and Data Processing, Volume:20 Issue: 4, 2024
Pages:
141 to 159
magiran.com/p2710849  