An AI Agent to Solve Raven's Progressive Matrices

Artificial intelligence is often measured against human cognition. One established benchmark for that comparison is Raven’s Progressive Matrices (RPM), a test of abstract reasoning and problem-solving. For my project, I developed an AI agent that solves these visual puzzles using Python and image-processing techniques.

What are Raven’s Progressive Matrices?

Raven’s Progressive Matrices are a type of intelligence test. Each puzzle is a grid of images (typically 2x2 or 3x3) with one cell left empty, and the solver must pick, from a set of candidate answers, the image that completes the pattern. These problems test spatial reasoning and logic, which makes them a distinctive challenge for AI systems.

The RPM agent I designed takes a heuristic-driven approach built around Dark Pixel Ratio (DPR), a metric that compares the proportion of black pixels from one image to the next, as a lightweight stand-in for human-like pattern recognition.
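To make the DPR idea concrete, here is a minimal Python sketch for the 2x2 case. It is an illustration under assumptions rather than the agent's actual code: figures are modeled as plain grids of 0/1 values (1 = dark) instead of Pillow images, and the names `dark_fraction`, `dpr`, and `pick_answer_2x2` are hypothetical. The idea is to measure how the share of dark pixels changes from A to B, then pick the candidate that changes C by the closest amount.

```python
def dark_fraction(image):
    """Share of pixels that are dark; image is a grid of 0/1 with 1 = dark."""
    total = sum(len(row) for row in image)
    return sum(sum(row) for row in image) / total


def dpr(x, y):
    """Dark Pixel Ratio between two figures: change in dark share from x to y."""
    return dark_fraction(y) - dark_fraction(x)


def pick_answer_2x2(a, b, c, candidates):
    """Return the name of the candidate whose C -> candidate DPR is closest to A -> B."""
    target = dpr(a, b)
    return min(candidates, key=lambda name: abs(dpr(c, candidates[name]) - target))


# Toy 2x2-pixel figures (assumed data): B has one more dark pixel than A.
a = [[1, 0], [0, 0]]
b = [[1, 1], [0, 0]]
c = [[1, 0], [1, 0]]
candidates = {
    "1": [[1, 0], [1, 0]],  # unchanged from C
    "2": [[1, 1], [1, 0]],  # one extra dark pixel, like A -> B
    "3": [[1, 1], [1, 1]],  # fully dark
}
best = pick_answer_2x2(a, b, c, candidates)
```

In this toy case `best` is "2", the candidate that adds one dark pixel to C just as B adds one to A. A ratio alone cannot tell where the pixels sit, which is why a heuristic like this is usually paired with others.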

Agent Design and Methodology

The AI agent was designed with efficiency and adaptability in mind, implementing methods to solve both 2x2 and 3x3 RPM problems. Key components of the design include:

Image Preprocessing
- Grayscale and binary conversions (using Pillow).
- Removal of dithering for accurate pixel operations.
Dark Pixel Ratio (DPR)
- A heuristic-based method comparing black pixel ratios between rows, columns, and diagonals.
Additional Heuristics
- Pixel Matching: Counts and compares black-and-white pixel overlaps.
- Logical Operations: Applies AND, OR, and XOR transformations to infer patterns.
- Affine Transformations: Detects image flips and rotations (though less successful overall).
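The components above could be sketched with Pillow roughly as follows. This is a hedged illustration, not the agent's source: the threshold of 128, the choice to store dark pixels as "on", and the helper names `to_binary`, `count_on`, `combine`, and `is_mirror` are all assumptions made for the example. It thresholds a grayscale image with a fixed cutoff (so no dithering is introduced), applies AND, OR, and XOR to two figures, and checks for a left-right flip.

```python
from PIL import Image, ImageChops, ImageOps


def to_binary(img, threshold=128):
    """Grayscale, then hard-threshold to mode "1" with dark pixels stored as 'on'."""
    gray = img.convert("L")
    # A fixed cutoff via point() sidesteps the dithering a plain convert("1") applies.
    return gray.point(lambda p: 255 if p < threshold else 0, mode="1")


def count_on(img):
    """Number of 'on' (originally dark) pixels in a mode "1" image."""
    return img.convert("L").histogram()[255]


def combine(a, b):
    """AND, OR and XOR of two binary figures, each a possible 'third' image."""
    return {
        "and": ImageChops.logical_and(a, b),
        "or": ImageChops.logical_or(a, b),
        "xor": ImageChops.logical_xor(a, b),
    }


def is_mirror(a, b):
    """True if b is exactly the left-right flip of a (no tolerance for noise)."""
    return count_on(ImageChops.logical_xor(ImageOps.mirror(a), b)) == 0


# Synthetic 4x4 figures stand in for real puzzle images.
a = Image.new("L", (4, 4), 255)
b = Image.new("L", (4, 4), 255)
a.putpixel((0, 0), 0)    # black pixel, top-left
a.putpixel((1, 1), 40)   # dark grey, should also count as dark
b.putpixel((3, 0), 0)    # black pixel, top-right
b.putpixel((1, 1), 0)

bin_a, bin_b = to_binary(a), to_binary(b)
merged = combine(bin_a, bin_b)
```

Storing dark pixels as "on" means the logical operations act on the drawn shapes rather than on the white background. An exact-equality flip check like `is_mirror` is brittle on noisy scans, which may be one reason transformation-based heuristics are harder to get right than ratio-based ones.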

Visualizing the Process

An example of pixel-matching operations is shown below for a 3x3 problem. The agent evaluates relationships across horizontal, vertical, and diagonal patterns to identify the correct solution:
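As a rough code-level illustration of that evaluation (a sketch under assumptions, not the agent's actual implementation), figures are modeled here as grids of 0/1 values with 1 = dark, the grid cells are named A to H in reading order, and the helper names `match_ratio` and `pick_answer_3x3` are hypothetical. Each candidate is scored by how closely its pixel overlap with its neighbors matches the overlap already observed along a row, a column, and the main diagonal.

```python
def match_ratio(x, y):
    """Fraction of pixel positions where two equally sized 0/1 grids agree."""
    pairs = [(p, q) for row_x, row_y in zip(x, y) for p, q in zip(row_x, row_y)]
    return sum(p == q for p, q in pairs) / len(pairs)


def pick_answer_3x3(grid, candidates):
    """grid maps 'A'..'H' to figures; return the name of the lowest-scoring candidate."""
    # Reference similarities from pairs that are fully known.
    horizontal = match_ratio(grid["E"], grid["F"])  # step along the middle row
    vertical = match_ratio(grid["E"], grid["H"])    # step down the middle column
    diagonal = match_ratio(grid["A"], grid["E"])    # step along the main diagonal

    def score(cand):
        # Smaller is better: the candidate should repeat each observed similarity.
        return (
            abs(match_ratio(grid["H"], cand) - horizontal)
            + abs(match_ratio(grid["F"], cand) - vertical)
            + abs(match_ratio(grid["E"], cand) - diagonal)
        )

    return min(candidates, key=lambda name: score(candidates[name]))


# Toy puzzle (assumed data): every row repeats one figure, rows grow darker.
x = [[1, 0], [0, 0]]
y = [[1, 1], [0, 0]]
z = [[1, 1], [1, 0]]
grid = {"A": x, "B": x, "C": x, "D": y, "E": y, "F": y, "G": z, "H": z}
candidates = {"1": x, "2": z, "3": [[1, 1], [1, 1]]}
best = pick_answer_3x3(grid, candidates)
```

Here candidate "2" scores 0 on all three directions and is selected. Summing the three differences weights each direction equally; a real agent might weight them differently or fall back to another heuristic when scores tie.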

Key Achievements

- Solved a wide variety of RPM problems across varying levels of complexity.
- Tackled more abstract and challenging problems by combining several heuristics rather than relying on one.
- Kept runtime under 10 seconds per problem, balancing accuracy against computational cost.

Challenges and Limitations

While the agent excelled in many cases, its inability to recognize shapes or apply learned knowledge from previous problems highlighted areas for improvement. The reliance on pixel-based heuristics, though efficient, led to some errors in harder problem sets.

Key challenges included:

- Distinguishing the correct answer from visually similar incorrect ones.
- Improving accuracy on more abstract problems, whose patterns are not well captured by pixel counts.

How This Project Mirrors Human Cognition

Like a human solver, the agent analyzes patterns across rows, columns, and diagonals. The difference lies in what each looks at: humans tend to rely on shape recognition and prior experience, while the agent depends purely on pixel ratios and logical operations. Despite these differences, the agent was highly efficient, solving problems in seconds compared to a human’s minutes.

Key Takeaways

This project reinforced my understanding of AI problem-solving and computer vision techniques. By combining heuristics, algorithmic logic, and Python’s image-processing capabilities, I was able to design an agent that demonstrated a blend of creativity and precision.

Through this work, I’ve deepened my expertise in:

- Designing AI systems for abstract reasoning.
- Applying computer vision techniques for real-world problem-solving.
- Iteratively improving performance through data-driven optimization.

By working on this agent, I gained a greater appreciation for the interplay between human cognition and AI design, pushing me closer to my goal of creating AI systems that think more like us.