Model-free adaptive optimal control of spacecraft formation flying reconfiguration using Q-Learning

Document Type: Research Paper

Authors

Iran University of Science and Technology

Abstract

This paper investigates an optimal adaptive controller based on reinforcement learning in the presence of orbital perturbations. The controller achieves mission goals online, without any model of the system. Reconfiguration capability provides great flexibility in formation flying missions: the spacecraft migrate from their current formation to a new one, thereby meeting changing mission goals. Orbital perturbations, the difficulty of extracting exact mathematical models, and unknown system dynamics make the optimal reconfiguration problem challenging. Moreover, because spacecraft computer systems are digital, the controller has to be implemented digitally. Accordingly, this paper introduces an adaptive optimal digital controller for a discounted generalized cost function. The stability of the proposed controller is proven by the Lyapunov method. A Q-learning algorithm is then presented that lets the controller find the optimal control gains in a model-free fashion. Finally, numerical simulations of a formation flying mission scenario confirm the effectiveness of the method.
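To make the approach concrete, the following is a minimal sketch of Q-learning for a discounted linear-quadratic problem of the kind the abstract describes, assuming a cost of the form J = sum_k gamma^k (x_k' Q x_k + u_k' R u_k). The Q-function is parameterized as a quadratic form z' H z with z = [x; u], the kernel H is fit by least squares from the Bellman relation z_k' H z_k = x_k' Q x_k + u_k' R u_k + gamma * z_{k+1}' H z_{k+1}, and the gain is improved as K = inv(H_uu) H_ux; the plant matrices are used only to simulate data, never by the learner. The double-integrator plant, weights, discount factor, noise level, and iteration counts are illustrative assumptions, not the paper's mission dynamics or tuning.

```python
import numpy as np

# Hypothetical double-integrator stand-in for the relative-motion dynamics;
# A and B are used only to generate data and are never given to the learner.
dt = 0.1
A = np.array([[1.0, dt], [0.0, 1.0]])
B = np.array([[0.0], [dt]])
n, m = 2, 1

Qc = np.eye(n)        # state weight (assumed)
Rc = np.eye(m)        # control weight (assumed)
gamma = 0.9           # discount factor (assumed)

rng = np.random.default_rng(0)
K = np.zeros((m, n))  # initial policy gain, u = -K x

for it in range(10):
    # Collect transitions under the current policy plus probing noise.
    Z, Z_next, costs = [], [], []
    x = rng.standard_normal(n)
    for k in range(200):
        u = -K @ x + 0.5 * rng.standard_normal(m)  # exploration for excitation
        x_next = A @ x + B @ u
        u_next = -K @ x_next                       # target-policy action
        Z.append(np.concatenate([x, u]))
        Z_next.append(np.concatenate([x_next, u_next]))
        costs.append(x @ Qc @ x + u @ Rc @ u)
        x = x_next
        if np.linalg.norm(x) > 1e3:                # restart if a rollout diverges
            x = rng.standard_normal(n)

    # Policy evaluation: least-squares fit of the Q-function kernel H from
    # the Bellman relation  z' H z = cost + gamma * z_next' H z_next.
    Phi = np.array([np.kron(z, z) - gamma * np.kron(zn, zn)
                    for z, zn in zip(Z, Z_next)])
    h, *_ = np.linalg.lstsq(Phi, np.array(costs), rcond=None)
    H = h.reshape(n + m, n + m)
    H = 0.5 * (H + H.T)                            # enforce symmetry

    # Policy improvement from the blocks of H -- no model required.
    Huu, Hux = H[n:, n:], H[n:, :n]
    K = np.linalg.solve(Huu, Hux)

print("learned gain K:", K)
```

With sufficient exploration, the learned gain should approach the discounted LQR gain that a Riccati solution would give if A and B were known; this is the mechanism by which Q-learning removes the model from the design loop.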
