Andrei Shostak CS 370: Artificial Intelligence
This project is an implementation of a pathfinding intelligent agent for a treasure hunt game. The primary goal was to develop and train an agent, represented as a pirate, to navigate an 8x8 maze environment and find the treasure in the most efficient way possible. The core of this project is the application of Deep Q-Learning, a reinforcement learning technique that utilizes a neural network to approximate the optimal action-value function. The project was developed in Python using libraries such as Keras/TensorFlow for the neural network and NumPy for data manipulation.
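The "optimal action-value function" mentioned above is the quantity Q-learning estimates. The standard update rule (a textbook form, not copied from the notebook) is:

```latex
Q(s, a) \leftarrow Q(s, a) + \alpha \Big[ r + \gamma \max_{a'} Q(s', a') - Q(s, a) \Big]
```

where \(s\) is the current maze state, \(a\) the chosen action, \(r\) the reward received, \(s'\) the resulting state, \(\alpha\) the learning rate, and \(\gamma\) the discount factor. In Deep Q-Learning, the table \(Q\) is replaced by a neural network, and the bracketed target is what the network is trained to predict.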
The repository contains the final, completed Jupyter Notebook (Shostak_Andrei_ProjectTwo.ipynb) which includes the implemented algorithm, the training process, and the final results demonstrating the agent's success.
This project involved working with a partially completed codebase to implement a sophisticated AI algorithm.
- Given Code: I was provided with a foundational framework that included:
  - TreasureMaze.py: A Python class that defined the 8x8 maze environment, its rules, the reward system, and the possible actions the agent could take.
  - GameExperience.py: A class designed to handle the "experience replay" mechanism, a critical component of Deep Q-Learning that stores and samples past experiences to train the model effectively.
  - TreasureHuntGame.ipynb: A skeleton Jupyter Notebook that included the setup, helper functions for visualization (show), a pre-defined neural network architecture (build_model), and functions to test the final model (play_game, completion_check).
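The experience-replay idea behind GameExperience.py can be sketched in a few lines. The class and method names below are illustrative stand-ins, not the actual interface of the provided file:

```python
import numpy as np

class ReplayBuffer:
    """Minimal experience-replay store (illustrative sketch; the real
    GameExperience.py class in this project may differ)."""

    def __init__(self, max_size=512):
        self.max_size = max_size
        self.memory = []  # list of (state, action, reward, next_state, done)

    def remember(self, experience):
        """Append one experience, discarding the oldest when full."""
        self.memory.append(experience)
        if len(self.memory) > self.max_size:
            self.memory.pop(0)

    def sample(self, batch_size=32):
        """Draw a random mini-batch. Random sampling breaks the
        correlation between consecutive steps, which stabilizes
        neural-network training."""
        idx = np.random.choice(len(self.memory),
                               size=min(batch_size, len(self.memory)),
                               replace=False)
        return [self.memory[i] for i in idx]
```

The fixed-size buffer plus random sampling is what lets the agent learn from a decorrelated mix of old and new experiences rather than only its most recent moves.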
- Created Code: My primary contribution was to complete the qtrain function within the Jupyter Notebook. This involved translating the theoretical Deep Q-Learning algorithm into functional code. Specifically, I implemented the main training loop in which the agent:
  - Interacts with the maze environment on an episode-by-episode basis.
  - Uses an epsilon-greedy strategy to balance exploration (taking random actions to discover the maze) against exploitation (using its learned knowledge to make optimal moves).
  - Stores each (state, action, reward, new_state) tuple in the experience replay buffer.
  - Samples mini-batches of experiences from the buffer to train the neural network after every step.
  - Continuously updates its policy based on the feedback from the training process, ultimately learning the optimal path to the treasure.
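The core of the loop described above can be sketched as two helpers: epsilon-greedy action selection, and construction of training targets from a sampled mini-batch. The function names and the model_predict callable are placeholders, not the notebook's exact API; in the real qtrain, model_predict corresponds to the Keras model's prediction call and the resulting targets feed the model's fit step:

```python
import numpy as np

def choose_action(q_values, epsilon, n_actions=4):
    """Epsilon-greedy: explore with probability epsilon, else exploit."""
    if np.random.rand() < epsilon:
        return np.random.randint(n_actions)   # explore: random move
    return int(np.argmax(q_values))           # exploit: best known move

def q_targets(model_predict, batch, gamma=0.95):
    """Build (inputs, targets) for one mini-batch of experiences.
    model_predict maps a state to its vector of Q-values and stands in
    for the neural network here."""
    inputs, targets = [], []
    for state, action, reward, next_state, done in batch:
        q = model_predict(state).copy()
        if done:
            q[action] = reward  # terminal step: no future reward
        else:
            # Bellman target: immediate reward plus discounted best
            # Q-value of the next state.
            q[action] = reward + gamma * np.max(model_predict(next_state))
        inputs.append(state)
        targets.append(q)
    return np.array(inputs), np.array(targets)
```

Decaying epsilon over the course of training shifts the agent from mostly exploring early on to mostly exploiting its learned policy later, which is what drives the win rate upward.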
This project was a practical exercise in applying core computer science principles to a complex problem.
- What do computer scientists do, and why does it matter? Computer scientists design and build solutions to problems by creating models and implementing algorithms. This project illustrates that process well. We took an abstract problem, finding an optimal path, and modeled it computationally with a maze, states, and rewards. We then applied a specific algorithm (Deep Q-Learning) to find a solution. This matters because the same fundamental approach can be used to solve critical real-world problems far beyond game AI, such as optimizing logistics and supply chains, discovering new medical treatments, or creating more efficient energy grids.
- How do I approach a problem as a computer scientist? My approach has become more structured throughout this course. First, I focus on decomposition: breaking the problem down into smaller, manageable components, just as this project was divided into the environment, memory, model, and training logic. Second, I think in terms of abstraction, leveraging existing libraries and frameworks like Keras and NumPy rather than reinventing the wheel. Third, I follow an iterative development cycle: implement a version of the solution, test it, analyze the results (like the win rate), identify shortcomings (like inefficient training), and refine the implementation. This cycle of building, testing, and improving is fundamental to computer science.
- What are my ethical responsibilities to the end user and the organization? As a computer scientist, my ethical responsibilities are paramount. This project highlighted several key areas.
  - Bias and Fairness: The agent's behavior is entirely dependent on its training data and reward function. A poorly designed reward function could lead the agent to learn unintended or harmful behaviors. It is my responsibility to design systems that are fair and whose objectives align with human values.
  - Transparency and Explainability: The decisions made by our neural network are largely a "black box." In a low-stakes game, this is acceptable. However, in critical applications, it would be my ethical duty to strive for systems where the decision-making process is as transparent as possible, allowing users to understand and trust the outcomes.
  - Security and Privacy: While not a major component of this specific project, any AI that learns from user data carries an implicit responsibility to protect that data. It is my duty to ensure data is handled securely, with user consent, and in a way that respects privacy.