This project implements the Deep Q-Learning algorithm for a simulated lunar landing. The objective is to train an AI agent capable of landing a lunar module safely on the moon's surface. The implementation is built on the Gymnasium library and uses PyTorch for the neural network architecture.
- Python 3.x
- Gymnasium
- PyTorch
- NumPy
To install the required packages, you can use the following commands:
pip install gymnasium
pip install "gymnasium[atari, accept-rom-license]"
apt-get install -y swig
pip install "gymnasium[box2d]"
pip install torch
pip install numpy
The project consists of the following sections:
- Installing the required packages and importing the libraries: This part includes commands to install necessary libraries and packages.
- Building the AI: This part involves creating the architecture of the neural network.
- Training the AI: This part covers the training loop, including how the agent interacts with the environment, stores experiences, and updates its knowledge through training.
- Evaluation: This part evaluates the performance of the trained model.
To run the project, execute the Python script:
python deep_q_learning_for_lunar_landing.py
Ensure you have all the required libraries installed as mentioned in the installation section.
Deep Q-Learning is a reinforcement learning algorithm that uses a neural network to approximate the Q-value function. The key components include:
- Q-Value: Represents the expected cumulative (discounted) future reward for taking an action in a given state.
- Experience Replay: Stores the agent's experiences to break the correlation between consecutive samples.
- Target Network: A periodically updated copy of the main network that stabilizes training by keeping the Q-value targets fixed between updates.
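The experience replay component described above can be sketched as a small buffer class (the capacity and batch size shown here are illustrative defaults, not values taken from the project):

```python
import random
from collections import deque

class ReplayBuffer:
    """Fixed-size buffer storing (state, action, reward, next_state, done) tuples."""

    def __init__(self, capacity=100_000):
        self.memory = deque(maxlen=capacity)  # oldest experiences are discarded first

    def push(self, state, action, reward, next_state, done):
        self.memory.append((state, action, reward, next_state, done))

    def sample(self, batch_size=64):
        # Uniform random sampling breaks the correlation between consecutive steps
        return random.sample(self.memory, batch_size)

    def __len__(self):
        return len(self.memory)
```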
The neural network consists of:
- An input layer that takes the state representation.
- One or more fully connected hidden layers with nonlinear activation functions (e.g., ReLU).
- An output layer that outputs Q-values for each action.
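A network matching the description above might look like the following in PyTorch. Gymnasium's Lunar Lander environment has an 8-dimensional state and 4 discrete actions; the hidden-layer size here is illustrative:

```python
import torch
import torch.nn as nn

class QNetwork(nn.Module):
    """Maps a state vector to one Q-value per action."""

    def __init__(self, state_size=8, action_size=4, hidden_size=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_size, hidden_size),   # input layer -> first hidden layer
            nn.ReLU(),
            nn.Linear(hidden_size, hidden_size),  # second hidden layer
            nn.ReLU(),
            nn.Linear(hidden_size, action_size),  # one Q-value per action
        )

    def forward(self, state):
        return self.net(state)
```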
- Initialize the environment and the network.
- Interact with the environment to collect experiences.
- Store experiences in a replay buffer.
- Sample a mini-batch of experiences from the replay buffer.
- Calculate the loss from the difference between the predicted Q-values and the target Q-values (reward plus the discounted maximum Q-value of the next state, computed with the target network).
- Backpropagate the loss and update the network weights.
- Periodically update the target network.
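The loss and weight-update steps in the loop above can be sketched as a single function. This is a minimal sketch, assuming `QNetwork`-style models with one output per action; the discount factor shown is an illustrative default:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def dqn_update(local_net, target_net, optimizer, batch, gamma=0.99):
    """One Deep Q-Learning step on a mini-batch sampled from the replay buffer."""
    states, actions, rewards, next_states, dones = batch

    # Q-values the local network currently predicts for the actions actually taken
    q_pred = local_net(states).gather(1, actions)

    # Target: reward + discounted best next-state Q-value (zero if the episode ended)
    with torch.no_grad():
        q_next = target_net(next_states).max(dim=1, keepdim=True).values
        q_target = rewards + gamma * q_next * (1 - dones)

    loss = F.mse_loss(q_pred, q_target)  # TD error
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```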
After training, the AI agent should be able to land the lunar module safely. The performance can be evaluated based on the average reward per episode and the number of successful landings.
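The average-reward evaluation could be implemented along these lines. The helper below is illustrative; it assumes a Gymnasium-style environment API (`reset` returning `(state, info)`, `step` returning `(state, reward, terminated, truncated, info)`) and acts greedily, with no exploration:

```python
import torch

def evaluate(env, net, episodes=10):
    """Run greedy episodes and report the average total reward per episode."""
    total = 0.0
    for _ in range(episodes):
        state, _ = env.reset()
        done = False
        while not done:
            with torch.no_grad():
                q = net(torch.as_tensor(state, dtype=torch.float32).unsqueeze(0))
            action = int(q.argmax(dim=1).item())  # pick the highest-Q action
            state, reward, terminated, truncated, _ = env.step(action)
            total += reward
            done = terminated or truncated
    return total / episodes
```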