Design of a Model-Free Optimal Adaptive Controller for Spacecraft Formation Flying Reconfiguration Using Reinforcement Learning

Article Type: Research Article

Authors

1 PhD Student, School of Electrical Engineering, Iran University of Science and Technology

2 Faculty Member, School of Electrical Engineering, Iran University of Science and Technology

3 Faculty Member, School of Computer Engineering, Iran University of Science and Technology

Abstract

This paper presents a model-free optimal adaptive controller for spacecraft formation flying reconfiguration. Reconfiguration is a key capability for achieving the goals of formation flying missions. Because an exact mathematical model is difficult to extract, and because of orbital perturbations and uncertainties, designing an optimal controller is challenging. In this work, an optimal controller is first derived based on a discounted generalized cost function, and its stability is then proven using the Lyapunov method. The controller is designed digitally so that it can be implemented on spacecraft onboard computers. Next, using reinforcement learning methods, an algorithm is presented that solves the formation flying reconfiguration tracking problem online and without any need for a model. Finally, the performance of the proposed method is validated in a formation flying reconfiguration mission scenario.
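The abstract does not reproduce the cost function itself; as a hedged illustration only, a discounted generalized quadratic tracking cost of the kind described here (the cross-weighting term S is our assumption behind "generalized") would take the discrete-time form

\[
J(e_k) = \sum_{i=k}^{\infty} \gamma^{\,i-k} \left( e_i^{\top} Q\, e_i + u_i^{\top} R\, u_i + 2\, e_i^{\top} S\, u_i \right), \qquad 0 < \gamma < 1,
\]

where e_i is the tracking error between the current and the desired relative state, u_i is the control input, Q and R are positive semidefinite and positive definite weighting matrices, and the discount factor γ keeps the cost bounded for reference trajectories that do not decay to zero.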



Article Title [English]

Model-free adaptive optimal control of spacecraft formation flying reconfiguration using Q-Learning

Authors [English]

  • Mohammadrasoul Kankashvar 1
  • Hossein Bolandi 2
  • Naser Mozayani 3
1 Iran University of Science and Technology
2 Iran University of Science and Technology
3 Iran University of Science and Technology
Abstract [English]

This paper investigates an optimal adaptive controller based on reinforcement learning in the presence of orbital perturbations. The controller achieves mission goals online, without any model. Reconfiguration capabilities provide great flexibility in achieving formation flying mission goals: in reconfiguration, spacecraft migrate from the current formation to a new one. Orbital perturbations, the difficulty of extracting exact mathematical models, and unknown system dynamics make the optimal reconfiguration problem challenging. Moreover, due to the digital nature of spacecraft computer systems, controllers have to be implemented digitally. Accordingly, this paper introduces an adaptive optimal digital controller for a discounted generalized cost function. The stability of the proposed controller is proven by the Lyapunov method. Then, using the Q-learning method, an algorithm is presented so that the controller can find the optimal control gains in a model-free fashion. Finally, numerical simulations of a formation flying mission scenario confirm the effectiveness of the method.
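To make the model-free mechanism concrete, the following is a minimal sketch of the kind of Q-learning value iteration the abstract alludes to, applied to a generic linear discrete-time plant rather than to the paper's formation dynamics. For a quadratic Q-function Q(x, u) = z^T H z with z = [x; u], the Bellman equation

\[
z_k^{\top} H z_k = c(x_k, u_k) + \gamma\, z_{k+1}^{\top} H z_{k+1}
\]

is solved for the kernel H by least squares from measured transitions alone, and the greedy feedback gain is read off the blocks of H as K = H_{uu}^{-1} H_{ux}. The double-integrator plant, the weights, and the discount factor below are placeholder assumptions, not the paper's model or tuning.

import numpy as np

# Hypothetical stand-in plant; the learner never reads A or B directly.
dt = 0.1
A = np.array([[1.0, dt],
              [0.0, 1.0]])
B = np.array([[0.5 * dt ** 2],
              [dt]])
n, m = 2, 1

Qc = np.eye(n)        # state weighting (assumed)
Rc = 0.1 * np.eye(m)  # control weighting (assumed)
gamma = 0.95          # discount factor (assumed)

def quad_features(z):
    # phi(z) such that phi(z) @ theta == z^T H z for symmetric H,
    # with theta holding the upper-triangular entries of H.
    outer = np.outer(z, z)
    r, c = np.triu_indices(len(z))
    return outer[r, c] * np.where(r == c, 1.0, 2.0)

def unvech(theta, d):
    # Rebuild the symmetric kernel H from its upper-triangular entries.
    H = np.zeros((d, d))
    H[np.triu_indices(d)] = theta
    return H + H.T - np.diag(np.diag(H))

rng = np.random.default_rng(0)
d = n + m
H = np.zeros((d, d))   # Q-function kernel, learned from data
K = np.zeros((m, n))   # feedback gain, u = -K x

for _ in range(60):
    Phi, y = [], []
    for x in rng.uniform(-1.0, 1.0, size=(200, n)):
        u = -K @ x + 0.5 * rng.standard_normal(m)  # probing noise for excitation
        x_next = A @ x + B @ u                     # plant queried as a black box
        cost = x @ Qc @ x + u @ Rc @ u
        z = np.concatenate([x, u])
        z_next = np.concatenate([x_next, -K @ x_next])
        Phi.append(quad_features(z))
        y.append(cost + gamma * z_next @ H @ z_next)   # Bellman target
    theta, *_ = np.linalg.lstsq(np.asarray(Phi), np.asarray(y), rcond=None)
    H = unvech(theta, d)
    K = np.linalg.solve(H[n:, n:], H[n:, :n])          # greedy gain from H blocks

print("learned model-free gain K =", K)

On a real mission the transitions (x_k, u_k, x_{k+1}) would come from measured relative states, and the tracking version of the problem augments the state with the reference dynamics; the point of the sketch is only that A and B never enter the gain update.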

Keywords [English]

  • Reinforcement learning
  • Spacecraft formation flying
  • Q-learning
  • Optimal adaptive control
  • Multi-agent systems