Design and Tuning of a UAV Control System Based on a Deep Reinforcement Learning Algorithm

Alizadeh, Mohammad Hosein; Toloei, Alireza; Ghasemi, Reza

Article type: Research article

Authors

Mohammad Hosein Alizadeh ¹

Alireza Toloei ²

Reza Ghasemi ³

¹ PhD candidate, Faculty of New Technologies and Aerospace Engineering, Shahid Beheshti University, Tehran, Iran.

² Associate Professor, Faculty of New Technologies and Aerospace Engineering, Shahid Beheshti University, Tehran, Iran.

³ Associate Professor, Faculty of Electrical Engineering, University of Qom, Qom, Iran.

Abstract

Analysis of UAV dynamics is complex and imprecise because the mission changes within an environment that contains uncertainty. In this UAV, in addition to environmental conditions, the allocation of shared actuators for control creates interference between the control channels. In this research, the nonlinear dynamics of the UAV are introduced and, of necessity, an approximate linearized model is derived through simplification. Classical controllers are then designed under specified performance constraints to track the pitch and roll angles and to damp the angular rates. Because these controllers are designed for an approximate transfer function and for conditions without uncertainty, they do not necessarily behave well in all situations and under all disturbances. The aim of this paper is to present a method for correcting the effect of imprecise modeling and uncertainty in classical control system design. The proposed method uses reinforcement learning to correct the controller coefficients. The computations are performed offline; after learning, during operation, the result is applied to the classical controller as corrective gains. The results show an increase of at least 20 percent in the average received reward and a threefold reduction in the number of unstable or quasi-stable simulations. In other words, the reliability of the UAV's performance increases.

Keywords

UAV

Classical controller

Machine learning

Reinforcement learning

Monte Carlo algorithm

Neural network

Subjects

Guidance, control, navigation

Article title (English)

Design and tuning of a UAV control system based on deep reinforcement learning algorithms

Authors (English)

MohammadHosein Alizadeh ¹

Alireza Toloei ²

Reza Ghasemi ³

¹ PhD candidate, Faculty of New Technologies and Aerospace Engineering, Shahid Beheshti University, Tehran, Iran.

² Associate Professor, Faculty of New Technologies and Aerospace Engineering, Shahid Beheshti University, Tehran, Iran.

³ Assistant Professor, Electrical Engineering Department, University of Qom, Qom, Iran.

Abstract (English)

Dynamic analysis of UAVs becomes complex and imprecise due to mission changes in uncertain environments. In this UAV, besides environmental conditions, shared actuators used for control introduce interference across control channels. In this study, the nonlinear dynamics of the UAV are first introduced, and an approximate linearized model is derived through simplification. Based on this model, classical controllers are designed under specified performance constraints for pitch and roll angle tracking, as well as for damping angular velocity. However, since these controllers are designed based on approximate transfer functions and under nominal conditions without uncertainty, they may not perform adequately in all situations and disturbances. The main objective of this paper is to propose a method to compensate for modeling inaccuracies and uncertainties in classical control system design. The proposed method employs reinforcement learning to adjust controller parameters. The training process is conducted offline, and the learned corrective gains are then integrated into the classical controller during operation. The results demonstrate at least a 20% increase in the average cumulative reward and a threefold reduction in the number of unstable or quasi-stable simulations, thereby improving the UAV’s reliability and performance.
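The idea of learning corrective gains offline and then applying them to a fixed classical controller can be illustrated with a small sketch. Everything in it is an assumption made for illustration: the single-axis toy plant, the nominal PD gains, the set of inertia values standing in for model uncertainty, and the reward definition are hypothetical, and a simple seeded random search replaces the paper's deep reinforcement learning algorithm, which the abstract does not specify in detail.

```python
import random


def simulate(kp, kd, inertia, target=0.2, dt=0.01, steps=500):
    """Episode reward (negative accumulated tracking error) for a PD attitude
    loop on a toy single-axis model: inertia * theta'' = torque."""
    theta, rate, reward = 0.0, 0.0, 0.0
    for _ in range(steps):
        error = target - theta
        torque = kp * error - kd * rate
        torque = max(-1.0, min(1.0, torque))  # hypothetical actuator limit
        rate += (torque / inertia) * dt
        theta += rate * dt
        reward -= abs(error) * dt
    return reward


def average_reward(scale, nominal=(2.0, 0.5), inertias=(0.5, 1.0, 2.0)):
    """Average reward over several plant variants (stand-in for uncertainty).
    `scale` holds the corrective multipliers applied to the nominal gains."""
    kp = nominal[0] * scale[0]
    kd = nominal[1] * scale[1]
    return sum(simulate(kp, kd, j) for j in inertias) / len(inertias)


def tune_offline(iterations=200, seed=0):
    """Offline search: perturb the gain multipliers at random and keep a
    candidate only if it improves the average reward."""
    rng = random.Random(seed)
    best_scale = (1.0, 1.0)
    best_reward = average_reward(best_scale)
    for _ in range(iterations):
        candidate = tuple(max(0.1, s + rng.gauss(0.0, 0.2)) for s in best_scale)
        reward = average_reward(candidate)
        if reward > best_reward:
            best_scale, best_reward = candidate, reward
    return best_scale, best_reward


baseline = average_reward((1.0, 1.0))
scale, tuned = tune_offline()
print(f"baseline reward {baseline:.3f}, tuned reward {tuned:.3f}, scale {scale}")
```

In operation, only the multipliers returned by `tune_offline` would be applied to the classical controller, which mirrors the offline-training, online-application split described in the abstract. The reward values this sketch prints say nothing about the paper's reported results.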

Keywords (English)

UAV

Classic Controller

Machine Learning

Reinforcement Learning

Monte-Carlo

Neural Network

Volume 13, Issue 2 - Serial Number 2
Esfand 1403 (February-March 2025)
Pages 95-112


Journal: Aerospace Knowledge and Technology
