Will AI be able to solve confusing but elementary geometric reasoning problems in 2024?

Experimental log for resolving the Manifold Market on this question follows.

Everything below was tried late on 2024-12-31 or within the first few hours of 2025-01-01, since the point was to assess AI capabilities at the end of 2024. Trials run later than that are flagged with a bracketed date.

Problem 0 (original)

I have a geometric reasoning problem for you:

  1. Draw a circle
  2. Draw a triangle mostly inside the circle but with a corner poking out
  3. Now draw a teeny-tiny square fully inside the part of the triangle poking out of the circle
  4. Draw a line segment from the center of that square to the center of the circle

What is the minimum and maximum number of line intersections in that drawing?

Answer: Min 4, max 5. That’s 2 intersections for the triangle poking out of the circle, 1 for the segment exiting the square, and 1 for the segment entering the circle. If the triangle contains the center of the circle, we’re done at 4. If it doesn’t, we get one more intersection where the segment exits the triangle.
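The two counts can be sanity-checked numerically. Here is a minimal sketch, assuming a unit circle at the origin and two hand-picked triangles (one containing the circle’s center, one not). The coordinates and helper names are illustrative choices, not part of the original problem, and tangencies and exact corner hits are ignored.

```python
import math

def side(o, u, v):
    # Cross product: sign tells which side of line o->u the point v is on.
    return (u[0] - o[0]) * (v[1] - o[1]) - (u[1] - o[1]) * (v[0] - o[0])

def seg_seg_hits(p, q, a, b):
    # 1 if segments pq and ab properly cross, else 0.
    return int(side(p, q, a) * side(p, q, b) < 0 and side(a, b, p) * side(a, b, q) < 0)

def seg_circle_hits(p, q, r=1.0):
    # Number of points where segment pq meets the circle of radius r about the origin.
    dx, dy = q[0] - p[0], q[1] - p[1]
    a = dx * dx + dy * dy
    b = 2 * (p[0] * dx + p[1] * dy)
    c = p[0] ** 2 + p[1] ** 2 - r * r
    disc = b * b - 4 * a * c
    if disc <= 0:
        return 0
    root = math.sqrt(disc)
    return sum(0 <= t <= 1 for t in ((-b - root) / (2 * a), (-b + root) / (2 * a)))

def edges(poly):
    return list(zip(poly, poly[1:] + poly[:1]))

def count_intersections(triangle, square, start):
    seg = (start, (0.0, 0.0))  # from the square's center to the circle's center
    total = sum(seg_circle_hits(p, q) for p, q in edges(triangle))  # triangle x circle
    total += seg_circle_hits(*seg)                                  # segment x circle
    for poly in (triangle, square):                                 # segment x triangle, square
        total += sum(seg_seg_hits(seg[0], seg[1], a, b) for a, b in edges(poly))
    return total

# Teeny square centered at (0, 1.3), inside the corner that pokes out of the circle.
square = [(-0.02, 1.28), (0.02, 1.28), (0.02, 1.32), (-0.02, 1.32)]
contains_center = [(0.0, 1.5), (-0.5, -0.5), (0.5, -0.5)]
misses_center = [(0.0, 1.5), (-0.5, 0.5), (0.5, 0.5)]

min_count = count_intersections(contains_center, square, (0.0, 1.3))
max_count = count_intersections(misses_center, square, (0.0, 1.3))
print(min_count, max_count)  # 4 5
```

This only checks two specific drawings. It does not prove that no other drawing gives fewer than 4 or more than 5.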

Trials:

  1. o1: correct, mostly (I had to nudge it to agree that a line exiting a triangle right at the corner still only counts as 1 intersection. I would say it was wrong to say otherwise but not wrong in a way that betrays a failure to understand the geometry.)
  2. o1 pro: wrong (it said 3 and 4, mostly understanding but, charitably, forgetting the intersection with the square)
  3. o1 pro: wrong (it said 4 and 6, thinking the line could cross the triangle twice)
  4. o1: correct
  5. o1 pro: wrong (similar to trial 3)
  6. o1: wrong (said 3 and 10, and then 4 and 6 with coaxing)
  7. o1: wrong (said 3 and 5, missing the square-line intersection and thinking there could be 2 line-triangle intersections)
  8. o1: wrong (said 4 and 6, similar to trial 3)
  9. o1 pro: wrong (4 and 10 with all kinds of wrongness)
  10. o1 pro: wrong (3 and 10)
  11. o1 pro: wrong (similar to trial 3)
  12. o1: wrong (3 and 6)

o1: 2 out of 6 correct
o1 pro: 0 out of 6 correct
combined: 2 out of 12 correct

Problem 1

Here’s a geometric reasoning puzzle for you:

  1. Draw a big square
  2. Draw a skinny rectangle with one short edge fully inside the square and the other fully outside the square
  3. Draw a tiny circle fully inside the rectangle and fully outside the square
  4. Draw a line (technically a ray) starting at the center of the circle, going through the center of the square, and then continuing infinitely

What is the minimum and maximum number of line intersections in the drawing? Count both the intersections the ray makes with the shapes and the intersections of the shapes with each other.

Answer: Min 6, max 6. That’s 2 intersections for the square and rectangle, 1 to exit the circle, 2 to enter and exit the square, and 1 to exit the rectangle.
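One concrete drawing can be counted mechanically. Here is a minimal sketch, assuming hand-picked coordinates for the square, rectangle, and circle, and approximating the ray by a long segment (`reach=10` is an assumption: long enough to leave this particular drawing). All names are hypothetical.

```python
import math

def side(o, u, v):
    return (u[0] - o[0]) * (v[1] - o[1]) - (u[1] - o[1]) * (v[0] - o[0])

def seg_seg(p, q, a, b):
    # 1 if segments pq and ab properly cross, else 0.
    return int(side(p, q, a) * side(p, q, b) < 0 and side(a, b, p) * side(a, b, q) < 0)

def seg_circle(p, q, c, r):
    # Number of points where segment pq meets the circle with center c, radius r.
    dx, dy = q[0] - p[0], q[1] - p[1]
    fx, fy = p[0] - c[0], p[1] - c[1]
    A = dx * dx + dy * dy
    B = 2 * (fx * dx + fy * dy)
    C = fx * fx + fy * fy - r * r
    disc = B * B - 4 * A * C
    if disc <= 0:
        return 0
    root = math.sqrt(disc)
    return sum(0 <= t <= 1 for t in ((-B - root) / (2 * A), (-B + root) / (2 * A)))

def edges(poly):
    return list(zip(poly, poly[1:] + poly[:1]))

def total_intersections(square, rect, circle_center, circle_radius, reach=10):
    cx = sum(x for x, _ in square) / 4
    cy = sum(y for _, y in square) / 4
    start = circle_center
    end = (start[0] + reach * (cx - start[0]), start[1] + reach * (cy - start[1]))
    shape_edges = edges(square) + edges(rect)
    total = sum(seg_seg(a, b, c, d) for a, b in edges(square) for c, d in edges(rect))
    total += sum(seg_circle(a, b, circle_center, circle_radius) for a, b in shape_edges)
    total += seg_circle(start, end, circle_center, circle_radius)  # ray leaves the circle
    total += sum(seg_seg(start, end, a, b) for a, b in shape_edges)
    return total

square = [(-1, -1), (1, -1), (1, 1), (-1, 1)]
rect = [(0.5, -0.1), (3, -0.1), (3, 0.1), (0.5, 0.1)]  # short edges at x=0.5 and x=3
problem1_total = total_intersections(square, rect, (2.5, 0.03), 0.05)
print(problem1_total)  # 6
```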

This may be slightly easier since there’s no way to change the number of intersections.

Trials:

  1. o1: wrong but got there with a bunch of coaxing
  2. o1: wrong (6 and 8, but it seemed to at least understand when I explained why 8 was wrong)
  3. o1 pro: wrong (similar to trial 2)
  4. o1 pro: wrong (6 and 8 but then got the right answer from a single “are you sure?”) [2025-01-01]

Problem 2 (ever-so-slightly harder variant of problem 1)

Here’s a geometric reasoning puzzle for you:

  1. Draw a big square
  2. Draw a skinny rectangle with one short edge fully inside the square and the other fully outside the square
  3. Draw a tiny circle fully inside the rectangle and fully outside the square
  4. Draw a big circle fully outside all that — not touching, overlapping, or containing anything drawn so far
  5. Draw a line (technically a ray) starting at the center of the tiny circle, going through the center of the square, and then continuing infinitely

What is the minimum and maximum number of line intersections in the drawing? Count both the intersections the ray makes with the shapes and the intersections of the shapes with each other.

Answer: Min 6, max 8. Same as problem 1 but the ray may or may not hit the big circle.
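The only thing that varies is how many times the ray hits the big circle. Here is a minimal sketch of that piece, taking the base count of 6 from problem 1 as given and using two hypothetical placements of the big circle (the coordinates are illustrative, and tangency is ignored).

```python
import math

def ray_circle_hits(start, through, center, radius):
    # Count points where the ray from `start` through `through` meets the circle.
    dx, dy = through[0] - start[0], through[1] - start[1]
    fx, fy = start[0] - center[0], start[1] - center[1]
    a = dx * dx + dy * dy
    b = 2 * (fx * dx + fy * dy)
    c = fx * fx + fy * fy - radius ** 2
    disc = b * b - 4 * a * c
    if disc <= 0:
        return 0
    root = math.sqrt(disc)
    return sum(t >= 0 for t in ((-b - root) / (2 * a), (-b + root) / (2 * a)))

BASE = 6                  # problem 1's count, taken from the text above
start = (2.5, 0.03)       # assumed center of the tiny circle
square_center = (0.0, 0.0)

in_the_way = BASE + ray_circle_hits(start, square_center, (-6.0, 0.0), 2.0)
off_to_the_side = BASE + ray_circle_hits(start, square_center, (0.0, 6.0), 2.0)
print(off_to_the_side, in_the_way)  # 6 8
```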

Trials:

  1. o1: correct
  2. o1 pro: correct [2025-01-01]

Problem 3

Draw an equilateral triangle pointing up. Next draw an identical one pointing down and put them on top of each other so the point of one is at the midpoint of the base of the other. Notice that the height of the resulting triangle mish-mash is exactly the same as the height of the triangles. Considering just the outer boundary (not the convex hull) of what we’ve drawn, what kind of polygon (if any) is it?

Answer: It’s an irregular hexagon, like a rectangle with the sides dented in. It’s not a hexagram, like you get if you let the tips of the triangles poke out past each other.

This is pretty easy if you don’t jump to the hexagram conclusion.

Trials:

  1. o1: wrong (it said a convex hexagon or a rhombus even with lots of coaxing)

Problem 4

  1. Draw an equilateral triangle pointing up
  2. Make an identical one but pointing down and put them on top of each other so the point of one is at the midpoint of the base of the other

How many regions have we partitioned the plane into?

To clarify “regions”: if we draw a single square, say, that’s 2 regions — inside and outside. If we draw the diagonals of the square, that’s 4 regions inside the square, plus the outside is 5 regions.

Answer: 6 because we have what looks like two hourglasses next to each other, so we have 4 for the hourglass halves, 1 for the area between them, and 1 for the outside.
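The region count can be cross-checked with Euler’s formula for planar graphs, regions = E - V + 1 + C, where C is the number of connected pieces. Here is a minimal sketch, assuming no two segments overlap along a shared line. To keep the arithmetic exact it uses isosceles triangles with rational coordinates as a stand-in for equilateral ones, which changes the shape but not how the edges cross. The function names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations

def meet(s, t):
    # Point where segments s and t meet, or None. Parallel segments count as not meeting.
    (x1, y1), (x2, y2) = s
    (x3, y3), (x4, y4) = t
    d = (x2 - x1) * (y4 - y3) - (y2 - y1) * (x4 - x3)
    if d == 0:
        return None
    a = Fraction((x3 - x1) * (y4 - y3) - (y3 - y1) * (x4 - x3)) / d
    b = Fraction((x3 - x1) * (y2 - y1) - (y3 - y1) * (x2 - x1)) / d
    if 0 <= a <= 1 and 0 <= b <= 1:
        return (x1 + a * (x2 - x1), y1 + a * (y2 - y1))
    return None

def polygon(points):
    return list(zip(points, points[1:] + points[:1]))

def count_regions(segments):
    # Every segment is cut at its endpoints and wherever another segment meets it.
    cuts = [set(seg) for seg in segments]
    for i, j in combinations(range(len(segments)), 2):
        p = meet(segments[i], segments[j])
        if p is not None:
            cuts[i].add(p)
            cuts[j].add(p)
    vertices = set().union(*cuts)
    n_edges = sum(len(c) - 1 for c in cuts)
    # Union-find to count connected components.
    parent = {v: v for v in vertices}
    def find(v):
        while parent[v] != v:
            v = parent[v]
        return v
    for c in cuts:
        first, *rest = c
        for v in rest:
            parent[find(v)] = find(first)
    components = len({find(v) for v in vertices})
    return n_edges - len(vertices) + 1 + components

square = polygon([(0, 0), (1, 0), (1, 1), (0, 1)])
diagonals = [((0, 0), (1, 1)), ((1, 0), (0, 1))]
up = polygon([(0, 0), (4, 0), (2, 3)])
down = polygon([(0, 3), (4, 3), (2, 0)])

print(count_regions(square))              # 2
print(count_regions(square + diagonals))  # 5
print(count_regions(up + down))           # 6
```

The first two calls reproduce the square examples from the clarification in the prompt, and the last gives V = 8, E = 12, so 12 - 8 + 2 = 6.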

Trials:

  1. o1: wrong (it thought it makes a hexagram)

Problem 5

  1. Draw a rectangle twice as wide as it is high
  2. Connect opposite corners with straight lines to make an X
  3. For each of the four resulting triangles with corners at the center of the rectangle, make a mirror copy of it, reflecting along the side that the triangle and rectangle share

Considering just the outermost boundary of what we’ve drawn, what is the resulting shape? It may help to draw it on graph paper to make sure you’re getting the symmetries right. And if you’re not certain, try first with a square instead of a rectangle — the same basic thing happens either way.

Answer: A rhombus, so 4 sides, not 8, because the outer sides of adjacent mirrored triangles are collinear.
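The collinearity is easy to confirm with cross products. Here is a minimal sketch, assuming the rectangle is centered at the origin, so that mirroring the center across each side gives the four new apexes. The eight candidate boundary points are listed by hand, and the helper names are hypothetical.

```python
def cross(o, a, b):
    # Zero exactly when o, a, b lie on one line.
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def outline(half_w, half_h):
    # Mirrored apexes alternate with rectangle corners, going counterclockwise.
    a, b = half_w, half_h
    return [(2 * a, 0), (a, b), (0, 2 * b), (-a, b),
            (-2 * a, 0), (-a, -b), (0, -2 * b), (a, -b)]

def drop_collinear(poly):
    # Keep only points where the boundary actually turns.
    n = len(poly)
    return [poly[i] for i in range(n)
            if cross(poly[i - 1], poly[i], poly[(i + 1) % n]) != 0]

rectangle_case = drop_collinear(outline(2, 1))  # twice as wide as high
square_case = drop_collinear(outline(1, 1))
print(rectangle_case)  # [(4, 0), (0, 2), (-4, 0), (0, -2)]
print(square_case)     # [(2, 0), (0, 2), (-2, 0), (0, -2)]
```

In both cases the four rectangle corners drop out, leaving only the four mirrored apexes as vertices.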

Trials:

  1. o1: wrong (it thought it was a concave octagon) but got there with a little coaxing (“did you try with a square first?” and “i see why you might think that but if you try drawing it…”)

Problem 6

[via Floris van Doorn]

Draw a square in a plane. Now draw an equilateral triangle in the same plane, overlapping (not just abutting) the square, such that one side of the triangle and one side of the square lie on the same line, and moreover these two sides have the same midpoint. Consider the shape formed by both the triangle and the square combined. The number of vertices of this shape depends on the relative sizes of the square and the triangle. What are the possible number of vertices this shape can have?

Answer: 4, 6, 9, 3. You get 4 if you start with the triangle wholly inside the square. If you then gradually grow the triangle, you get 2 extra vertices (6 total) when the base sticks out, then 3 additional ones (9 total) when both the base and the tip stick out, and finally 3 when the triangle subsumes the square.
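The growth sequence can be written as a function of the ratio of triangle side to square side. This is a sketch of the case analysis, not an independent geometric check. The two thresholds are derived here rather than taken from the original: the tip reaches the square’s far side at a ratio of 2/√3, and the triangle’s slanted sides reach the square’s far corners at 1 + 2/√3.

```python
import math

TIP_REACHES_FAR_SIDE = 2 / math.sqrt(3)       # about 1.155
TRIANGLE_SWALLOWS_SQUARE = 1 + 2 / math.sqrt(3)  # about 2.155

def union_vertex_count(triangle_side, square_side=1.0):
    ratio = triangle_side / square_side
    if ratio <= 1:
        return 4  # triangle inside the square
    if ratio <= TIP_REACHES_FAR_SIDE:
        return 6  # only the base sticks out
    if ratio < TRIANGLE_SWALLOWS_SQUARE:
        return 9  # base and tip both stick out
    return 3      # square inside the triangle

possible = sorted({union_vertex_count(t) for t in (0.5, 1.1, 1.5, 3.0)})
print(possible)  # [3, 4, 6, 9]
```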

Trials:

  1. o1: wrong (it understood how 4 and 3 happen but seemed hopelessly confused about the other cases)

Problem 7

[via Floris van Doorn]

Draw a square in a plane. Now draw an equilateral triangle in the same plane that does not overlap with the square, but touches it. One side of the triangle and one side of the square lie on the same line, and moreover these two sides have the same midpoint. Consider the shape formed by both the triangle and the square combined. The number of vertices of this shape depends on the relative sizes of the square and the triangle. What are the possible number of vertices this shape can have?

Answer: 5 and 7. If the triangle’s side is either shorter or longer than the square’s, you get all 4 of the square’s vertices as well as all 3 of the triangle’s. If the sides match up, you get a house shape with 5 vertices.

Trials:

  1. o1: wrong (it mostly understood but came up with 5, 6, 7)

Problem 8

Draw a normal up arrow, like two sides of a triangle and then a much longer line that bisects that angle. Call the long line L and the short lines A and B. Now draw two tiny circles symmetrically on either side of L and just touching the ends of A and B at the tops of the circles. Next, draw a line connecting the centers of the circles. Finally, draw two lines from the centers of the circle to the far end of L, so both these lines meet at the opposite side from the arrowhead.

How many regions have we divided the plane into?

To clarify “regions”: if we draw a single square, say, that’s 2 regions — inside and outside. If we draw the diagonals of the square, that’s 4 regions inside the square, plus the outside is 5 regions.

Answer: 9. That’s 4 triangle-ish regions inside the kite-like thing we’ve drawn, 2 regions in each of the 2 circles, plus the outside.
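The 9 agrees with Euler’s formula, regions = E - V + 2 for a connected drawing. Below is a hand tally of the vertices and edges, written as a small script. The tally itself is an assumption about how the drawing comes out (the connecting line crosses L, and every line that starts at a circle’s center leaves that circle exactly once). It is not computed from coordinates.

```python
# Points where lines or arcs end or meet.
vertices = {
    "arrow tip": 1,
    "ends of A and B (tops of the circles)": 2,
    "circle centers": 2,
    "where the connecting line leaves each circle": 2,
    "where each slanted line leaves its circle": 2,
    "where the connecting line crosses L": 1,
    "far end of L": 1,
}

# Pieces of line or arc between consecutive vertices.
edges = {
    "A and B": 2,
    "L, cut in two by the connecting line": 2,
    "connecting line, cut in four": 4,
    "slanted lines, each cut in two by its circle": 4,
    "circle arcs, three per circle": 6,
}

V = sum(vertices.values())
E = sum(edges.values())
regions = E - V + 2  # Euler's formula for a connected planar drawing
print(V, E, regions)  # 11 18 9
```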

Trials:

  1. o1: wrong

Problem 9

[DM’d from Odoacre on Manifold]

Draw an equilateral triangle in the plane. Now draw a square in the same plane, abutting but not overlapping the triangle: one side of the triangle and one side of the square lie on the same line, and moreover these two sides have the same midpoint. Now pick another side of the triangle and add another square abutting it in the same way, but not allowing the two squares to overlap. (For example, if both squares are tiny compared to the triangle, we have a triangle with two square bumps sticking out.) Consider the shape formed by the triangle and the two squares combined as a whole. The number of vertices of this shape depends on the relative sizes of the square and the triangle. What are the possible number of vertices this shape can have?

Answer: Two tiny squares is one possible case, yielding 11 vertices. Two other cases are one or both squares having the same side length as the triangle, yielding 9 and 7 vertices respectively. Finally, one of the squares can have a bigger side length than the triangle, in which case the other square must be either smaller than or the same side length as the triangle, yielding 11 and 9 vertices respectively. So all in all, that’s 7, 9, or 11 vertices.
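The cases can be enumerated by classifying each square as smaller than, equal to, or larger than the triangle’s side. Here is a sketch of that bookkeeping, assuming the triangle always keeps its 3 vertices and each square adds a fixed number on top: 4 if it is smaller or larger, 2 if it matches the side exactly. Those per-square contributions are worked out here, not stated in the original, and two larger squares are excluded because they would overlap.

```python
from itertools import combinations_with_replacement

# Vertices each square adds beyond the triangle's own 3 (assumed, see above).
ADDED = {"smaller": 4, "equal": 2, "larger": 4}

def vertex_count(square_1, square_2):
    if square_1 == square_2 == "larger":
        return None  # not allowed: the two squares would overlap
    return 3 + ADDED[square_1] + ADDED[square_2]

cases = {pair: vertex_count(*pair) for pair in combinations_with_replacement(ADDED, 2)}
possible_counts = sorted({n for n in cases.values() if n is not None})
print(possible_counts)  # [7, 9, 11]
```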

Trials:

  1. o1: wrong (5, 6, 7)
  2. o1 pro: wrong (7, 8, 9, 10 and then with coaxing it understands why 11 is one of the answers but still makes other errors)