In working on a post for tomorrow on whether Large Language Models like GPT-4 and Bard-2 have beliefs, I asked GPT-4 what I thought would be a not-too-hard question about chemistry: "What element is two to the right of manganese on the periodic table?" It crashed, burned, and exploded on the spot, giving two different wrong answers one right after the other, without noticing the contradiction:
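For reference, the question itself is a simple positional lookup. Here is a minimal Python sketch that answers it by walking along period 4 of the periodic table; the `period_4` list and `element_to_the_right` helper are hand-written for illustration, not taken from any chemistry library:

```python
# Period 4 of the periodic table, left to right (standard textbook data).
period_4 = [
    "K", "Ca", "Sc", "Ti", "V", "Cr", "Mn",
    "Fe", "Co", "Ni", "Cu", "Zn", "Ga", "Ge", "As", "Se", "Br", "Kr",
]

def element_to_the_right(symbol: str, steps: int) -> str:
    """Return the element `steps` places to the right of `symbol` within period 4."""
    return period_4[period_4.index(symbol) + steps]

print(element_to_the_right("Mn", 2))  # -> "Co"
```

Two steps to the right of manganese lands on cobalt, which is the answer GPT-4 failed to produce.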
These results aren't surprising if you consider what sort of sentences you would expect to predominate in the training data. Vertical relationships are important in chemistry, but horizontal relationships aren't especially meaningful, so you would expect it to do a better job on vertical movements than on horizontal ones. (I think omitting seaborgium is OK, since seaborgium compounds wouldn't be prominent in discussions of group 6 chemistry.)
The diagonal relationship you asked about, "down and to the left," is also not chemically meaningful; however, "down and to the right" (or, equivalently, "up and to the left") can be:
https://en.wikipedia.org/wiki/Diagonal_relationship
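As a concrete illustration of that "down and to the right" pattern, here is a sketch in the same spirit as the one above; the `diagonal_pairs` mapping is the set of classic textbook diagonal pairs, hand-coded here for illustration:

```python
# Classic diagonal relationships: each of these period-2 elements behaves
# chemically like the period-3 element one step down and to its right.
diagonal_pairs = {"Li": "Mg", "Be": "Al", "B": "Si"}

for light, heavy in diagonal_pairs.items():
    print(f"{light} is chemically similar to {heavy}")
```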
I predict that LLMs will do a better job on these sorts of diagonal relationships than on the ones you were asking about.