Please use this identifier to cite or link to this item: http://20.198.91.3:8080/jspui/handle/123456789/8942
Title: Reinforcement learning based stabilization of cart inverted pendulum system
Authors: Dutta, Sajay
Advisors: Chakraborty, Sayantan
Keywords: State Dependent Riccati Equation (SDRE);LQR;DDC;Q-learning
Issue Date: 2023
Publisher: Jadavpur University, Kolkata, West Bengal
Abstract: In the classical approach to control, the system must first be identified; a suitable control law is then designed from the resulting model. System identification is often difficult because most systems are non-linear and their dynamics are unknown. Data-Driven Control (DDC) design techniques avoid this issue by controlling the system directly from input-output data alone, without knowing the system model. Reinforcement Learning has shown encouraging results in regulating such systems, even when they are non-linear. The inverted pendulum is a well-known benchmark problem for developing new control strategies. Several traditional methods exist for balancing an inverted pendulum, such as the PID controller, the LQR controller, and the State Dependent Riccati Equation (SDRE) controller. All of these techniques have significant drawbacks: the PID gain constants (kP, kI and kD) and the LQR weighting matrices (Q and R) must be tuned, while the SDRE controller requires solving a complex algebraic equation at every step. The values of these constants can affect the system significantly, making it unstable or oscillatory. The aim of this study is to implement and compare different Reinforcement Learning algorithms for balancing an inverted pendulum. Instead of classical control algorithms that require a model of the system to be controlled, we use model-free algorithms, namely Q-learning, Policy Gradient, and Actor-Critic. We demonstrate that reinforcement learning can successfully balance and control the inverted pendulum without a detailed model. The stabilization time of the Q-learning algorithm is comparatively long, while policy-based algorithms such as Actor-Critic and Policy Gradient also achieved excellent results given the right set of hyperparameters.
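To illustrate the model-free idea the abstract refers to, the following is a minimal sketch of tabular Q-learning. The toy chain environment, reward scheme, and hyperparameters (alpha, gamma, epsilon) are assumptions for illustration only; the thesis's actual cart-pendulum dynamics and settings are not given in this abstract.

```python
import random

# Toy stand-in environment (assumed, not the thesis's cart-pendulum):
# states 0..4 on a chain; action 0 moves left, action 1 moves right.
# Reaching state 4 (a stand-in for "pendulum balanced") gives reward 1
# and ends the episode; all other transitions give reward 0.
N_STATES, N_ACTIONS = 5, 2
ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1  # assumed hyperparameters

def step(s, a):
    s2 = max(0, s - 1) if a == 0 else min(N_STATES - 1, s + 1)
    done = (s2 == N_STATES - 1)
    return s2, (1.0 if done else 0.0), done

def train(episodes=500, seed=0):
    rng = random.Random(seed)
    Q = [[0.0] * N_ACTIONS for _ in range(N_STATES)]
    for _ in range(episodes):
        s, done = 0, False
        while not done:
            # epsilon-greedy exploration
            if rng.random() < EPSILON:
                a = rng.randrange(N_ACTIONS)
            else:
                a = max(range(N_ACTIONS), key=lambda x: Q[s][x])
            s2, r, done = step(s, a)
            # Q-learning temporal-difference update:
            # Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
            Q[s][a] += ALPHA * (r + GAMMA * max(Q[s2]) - Q[s][a])
            s = s2
    return Q

if __name__ == "__main__":
    Q = train()
    # Greedy policy per state; on this chain it should learn to move right.
    print([max(range(N_ACTIONS), key=lambda a: Q[s][a]) for s in range(N_STATES)])
```

The same update rule applies to the cart inverted pendulum once the continuous state (cart position, cart velocity, pendulum angle, angular velocity) is discretized into table indices, which is also why tabular Q-learning tends to stabilize slowly compared with policy-based methods.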
URI: http://20.198.91.3:8080/jspui/handle/123456789/8942
Appears in Collections:Dissertations

Files in This Item:
File: M.E.(Control System Engineering) Sajay Dutta.pdf  Size: 3.27 MB  Format: Adobe PDF


Items in IR@JU are protected by copyright, with all rights reserved, unless otherwise indicated.