Can Robots Play Games?
It's an AI agent versus an old school FAQ in the battle to guide a player through Fire Emblem: Path of Radiance!
If a player relies on a Large Language Model (LLM) AI agent to guide them through a videogame, how will the robot perform…? Can it give the player decent advice? And will it do as well as (or better than) an old school FAQ (Frequently Asked Questions)?
It’s no secret that I am a sceptic about AI, an inevitable consequence of having studied the subject for my Master’s degree. As impressive as the LLMs are at creating blocks of text that portray the illusion of intelligence, the mirage relies upon absorbing probabilistic relationships from actual human discussions. The impression of reasoning is supplied by the inference engine, which effectively performs symbolic manipulations of logical propositions. Some of the things that LLM-based AI can do are certainly impressive - it can effortlessly produce text in specific written styles, for instance, which is much tougher for a human to do on demand. But when push comes to shove, can it actually provide a meaningful service like guiding a player through a game?
To investigate this, I decided to put it up in a battle against the venerable FAQ. The long-form FAQ replaced videogame magazines as a player guide around 1995 (when GameFAQs was set up), and was in turn largely supplanted by the ‘Let’s Play’ video in the 2010s. Now the AI agent is emerging as a potential challenger to both approaches, and I wanted to discover how good a game guide a robot could be. I’m focussing here on AI versus FAQ, and setting the video format to one side, but I’ll make a few remarks about gameplay videos as player guides later on.
The Case Study - Fire Emblem: Path of Radiance
I’ve long wanted to play a Fire Emblem game, so when the Switch 2 GameCube emulator offered Path of Radiance, I was keen to dive in. This seemed like a perfect scenario to test the efficacy and accuracy of an LLM-based AI at guiding players - a role I expect they will increasingly be called upon to perform in the years ahead. The game was released in 2005, so there was plenty of written material online for the LLM to have sucked up during its training, and there were unlikely to be many ambiguities about how to play it well at this point.
I focussed the experiment on the housekeeping aspects of the game more than the battles, that is, on how to train my mercenaries and how to compose the team in terms of which characters to recruit and level up. Fire Emblem games give a lot of characters to choose from - there are 46 in Path of Radiance! - and there are important choices to make in the game regarding which weapons to equip (each character can have four at a time, each offering a different tactical option), as well as where to spend permanent attribute-increasing items like Speedwings and Dracoshields (in this respect, Fire Emblem is very much like Pokémon, which is unsurprising as they all descend from Dragon Quest).
The ‘control’ in this experiment was a FAQ written by ‘Ask B 007’ and published on GameFAQs, the content of which was last updated in March 2007. The author had clearly revised the guide with input from the Fire Emblem community, and this particular guide was tagged ‘Most Recommended’. I used this resource to assess the answers the robot provided, sometimes after following its advice and sometimes before following it. I also sometimes checked Rick52’s Skills FAQ, not because the Skills weren’t covered by Ask B 007, but because it was an easier resource to search.
I played through the game twice, the first time in around forty hours and the second time in about twenty hours. I used Copilot as the LLM agent simply because it has seemingly better web search integration than other AI agents I’ve used. During the first run, I asked many more questions than in the second outing, which I used to refine and test what looked like the robot’s strengths in terms of its power to answer questions.
How did it do? Let’s look at four cases.
Right Answer, Wrong Reason
Sometimes the robot would give me an answer that was accurate in terms of what it was claiming, but was justified completely incorrectly. For instance, it recommended giving the Adept scroll (which gives a hero a percentage chance to trigger a special attack) to Zihark and Marcia, and claimed that this Skill was activated on the character’s SPD (Speed) attribute. In fact, there are no Skills in Path of Radiance that are activated on SPD; they are all activated on SKL (Skill), a relationship a human would be hard-pressed to get mixed up about. To human eyes, it’s pretty obvious that with a game mechanic called ‘Skills’, the SKL attribute does the work.
What happened in this instance was that players had discussed the advantages of giving the Adept scroll to faster characters because in these games you get two attacks (’double’ an opponent) if your SPD is 4 points higher than the enemy. As a result, faster characters double more often, and trigger more attacks, and players seemingly assumed this meant more activations of Adept. However, this turns out not to be so! Skill activation is based on the attack (or counter) activations, not the number of actual attacks. It would be true that there would be more relevant cases if we were talking about Criticals, since every attack (including doubles) qualifies for a potential Critical. But it wasn’t true of the Adept skill.
Nonetheless, Zihark and Marcia are potentially good recipients of Adept because of their high SPD. Since their high SPD triggers doubles fairly often, they potentially get three attacks instead of one, which can lead to outright kills rather than merely wounding the enemy - always a tactical advantage. This is an example where the advice the robot gave was sound, but its reasons for giving it were entirely incorrect. It looks as if it had picked up the community advice around the game, but could not construct logical propositions to adequately explain its reasoning because fundamentally it had no understanding of how the game worked.
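The two activation models that got conflated here are easy enough to sketch in code. Below is a toy model in Python - the SPD-difference rule for doubling is as described above, but the activation rate and stats are purely illustrative numbers, not the game’s actual formulas:

```python
# Toy model of the Adept mix-up: the community assumed one skill roll
# per strike (so doubling would mean more rolls), whereas activation is
# actually per attack or counter activation. Rates here are illustrative.

def strikes(own_spd: int, enemy_spd: int) -> int:
    """A unit 'doubles' (strikes twice) if its SPD is 4+ points higher."""
    return 2 if own_spd - enemy_spd >= 4 else 1

def adept_procs_assumed(own_spd: int, enemy_spd: int, rate: float) -> float:
    """The community's (incorrect) model: one Adept roll per strike."""
    return strikes(own_spd, enemy_spd) * rate

def adept_procs_actual(rate: float) -> float:
    """One roll per attack (or counter) activation, regardless of doubling."""
    return rate

# A fast unit that doubles looks twice as good under the assumed model,
# but no better at all under the actual one:
fast_assumed = adept_procs_assumed(own_spd=20, enemy_spd=10, rate=0.2)  # 0.4
fast_actual = adept_procs_actual(rate=0.2)                              # 0.2
```

Under the per-strike assumption a doubling character appears twice as likely to trigger the Skill per engagement, which is exactly the faulty inference the robot absorbed from the community discussions.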
Wrong Answer, Right Reason
Sometimes the robot’s reasoning would be sound but the answer given would be completely unusable. This happened frequently in a series of cases where it insisted that certain characters wielded weapon types they were flatly incapable of using. For instance, Brom can only use Lances prior to promotion, while after promotion he can use Swords as well. The robot insisted Brom was an Axe wielder, not a Lance Knight - up to and including claiming to have examined the source code(!).
It simply would not bend on this issue despite being completely wrong. There is something highly surreal about telling a robot that you have the game open and are looking at it and can see that something is a certain way, and having it come back and tell you that you are mistaken, and that you must be playing a hacked version of the game, or an obscure Japanese variant. Anything but admit that it was wrong! Premature certainty is one of LLM-based AI’s greatest weaknesses.
On this occasion, the cause of the robot’s confusion was that Brom (like many of the other characters in Path of Radiance) appears in other Fire Emblem games, including the sequel, Radiant Dawn, in which he can indeed use Axes. Because LLMs process the likelihood of words appearing next to one another, it had absorbed all the discussions about Brom and Axes relating to the sequel and could not distinguish between that and the first game. It was just a high probability of ‘Brom’ and ‘Axes’ appearing near each other.
In these cases, the propositional reasoning was often sound. It recommended Axes because, frankly, they are overpowered in Path of Radiance, partly because there is a surplus of Lance enemies (and in the game’s ‘weapon triangle’, Axes are strong against Lances but weak against Swords), and partly because the Hand Axe gives Axe wielders a strong counter that works against ranged attacks as well as melee attacks. It knew that Axes were a good bet, and therefore recommended them. The trouble was, it was not useful advice, because its suggestion was completely impossible to follow.
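The weapon triangle itself is simple enough to capture in a few lines. Here is a minimal sketch in Python - a toy encoding of the rock-paper-scissors relationship described above, not how the game actually implements it:

```python
# The weapon triangle: Axes beat Lances, Lances beat Swords, Swords beat Axes.
BEATS = {"Axe": "Lance", "Lance": "Sword", "Sword": "Axe"}

def triangle_advantage(attacker: str, defender: str) -> int:
    """+1 if the attacker has the advantage, -1 at a disadvantage, 0 if neutral."""
    if BEATS.get(attacker) == defender:
        return 1
    if BEATS.get(defender) == attacker:
        return -1
    return 0
```

A relationship this mechanical is precisely the kind of thing the robot could reason about correctly - it was the roster facts feeding into that reasoning that it got wrong.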
Wrong Answer, Wrong Reason
Then there were cases where the answers given were just complete nonsense. It thought that Arms Scrolls raised all of a hero’s weapon ranks, rather than just the rank of the currently equipped weapon, and so advised giving them to those heroes with a lot of different weapon combinations (chiefly the Mages). I suppose you might say that its logic was sound here, but it was premised on false assumptions, and as such ought to be considered incorrect reasoning, as well as an unhelpful answer.
Likewise, it consistently proposed Skill combinations that were flatly impossible to achieve. Most characters have either 25 or 20 points available for assigning Skills, and if you use an Occult scroll to acquire a Mastery Skill, it uses up 20 of that capacity. The robot was forever recommending combining 10-capacity Skills with a Mastery Skill that takes 20, a combination that no character could ever possibly have in practice. Again, you could say that its logic was sound in that the combinations it proposed would be very powerful if they were possible. But the fact that they could never be achieved makes the reasoning incorrect, and the answers themselves were also utterly wrong.
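What makes this failure so striking is that the constraint is trivial to check mechanically. A minimal sketch of the capacity check, using only the figures given above (20 or 25 capacity; a Mastery Skill costs 20):

```python
def combo_fits(capacity: int, skill_costs: list[int]) -> bool:
    """True if the proposed Skill combination fits within a character's capacity."""
    return sum(skill_costs) <= capacity

# The robot's favourite impossible suggestion: a 10-cost Skill plus a
# 20-cost Mastery Skill, on a character with the maximum 25 capacity.
combo_fits(25, [10, 20])  # False: 30 > 25

# Whereas a Mastery Skill plus a 5-cost Skill fits exactly:
combo_fits(25, [5, 20])   # True: 25 <= 25
```

One addition and one comparison - and yet the robot could not reliably apply it, because it was pattern-matching on discussions of powerful combinations rather than checking the arithmetic.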
When pressed, it would sometimes make bizarre suggestions, like this one:
Right — since Zihark already has Vantage + Adept, Wrath is officially off the table for him in Path of Radiance. You can’t stack three skills unless you remove one with an Occult scroll, and Vantage+Adept is already one of the strongest combos in the game
While it’s correct about Vantage with Adept, the idea that you ‘remove one with an Occult scroll’ is completely erroneous, since an Occult scroll is used to add a Mastery Skill, and you don’t need anything to make a character forget a Skill. Also, you can in fact stack three Skills, because there are 5-point Skills and two of them are available as scrolls.
Furthermore, it endlessly got the details wrong. It thought that the Boots were a hidden item in the fourth stage of Chapter 17, but they actually appear in Chapter 15. It correctly stated that they were a hidden item, but had the wrong level and the wrong location. Worse, because I failed to double-check the FAQ on this occasion, I entirely missed my chance to get the Boots (which grant 2 extra movement, and are exceptionally useful because of it): by the time I knew the robot had made a mistake, I couldn’t easily go back and correct it.
Similarly, when asked to perform a task requiring gathering data across the length of the game, it performed very poorly. Tasked with locating the four Occult scrolls (which should have been easy, since every FAQ has this answer) it got the first one entirely correct, got the second and third one in the wrong chapter of the game and with the wrong method, and for the last one got the correct chapter but the wrong method. In general, these kinds of ‘list all the instances’ questions were almost always either full of errors, or missing content.
Right Answer, Right Reason
Still, there were times when it got the right answer with the correct reasoning. It was reasonably good, for instance, at working out what was missing when I proposed different possible team compositions for the second run. It could spot rather well that we were missing a ‘tank’ to hold chokepoints, or that we didn’t have ranged support, and other such gaps in role coverage. However, it seldom if ever volunteered the relevant analysis. More often, I would ask what it would mean to add such-and-such a character to the team, and it would correctly analyse the gap in our line-up that would be filled by that mercenary’s abilities.
In some cases, it gave answers that were correct, and correctly reasoned, but did not apply to the part of the game I was in. For instance, it thought that Spirit Dust (which raises MAG by 2 points) would be good on the Healer, Rhys. This was good advice in some respects, in that after promotion, Rhys gains Light magic, and an extra two points of Magic power would give him stronger killing power with those spells. However, prior to promotion, he can only use healing staves, and he has plenty of MAG such that even the weakest such staff is enough to heal any team member fully.
Using the Spirit Dust on Rhys would have been an investment in the late game, but it wasn’t good advice for where I was in the game (which it was well aware of, since many of its answers took into account the chapter I had mentioned I was currently on). You could argue that its reasoning was wrong in these cases, in that I might have benefited more immediately from giving it to a different character, but fundamentally, investing in endgame power was not bad advice - it was just something I didn’t fully comprehend until much later in the campaign.
AI vs FAQ
Ultimately, the FAQ was orders of magnitude more reliable than the AI: it made essentially zero mistakes (even aesthetic choices were qualified accordingly in the text), provided all the helpful advice required to get through the game, and did so in a form that was helpfully arranged and easy to look up. I learned an enormous amount from the FAQ, which was entirely reliable on everything I checked against it. I note that if I’d used a ‘Let’s Play’ video, it would have been basically useless unless I watched the whole thing, since I wasn’t looking at how to get past a particular point in the game (which videos work well for), but rather asking questions requiring knowledge. The FAQ had it. The robot did not.
But the robot was not a complete dud. For a start, it could provide answers far faster than checking the FAQ (since searching for the information in the long-form text requires finding the specific areas relevant to the answer), and additionally it could provide answers in cases that were far beyond the FAQ’s ability to address, such as reasoning about team compositions. Of course, that speed and reach came at the cost of accuracy - I would say it got at least a quarter of its answers factually or logically incorrect, and another quarter of its answers were missing important information. That’s a serious failure rate, one that in a few cases where I didn’t cross-refer with the FAQ cost me things I valued for my playthrough, like the Boots.
Additionally, it was an enormous help in overcoming analysis paralysis. I perpetually struggle with permanent stat-increasing items in games, since you cannot undo them if you get them wrong. I often finish Pokémon games with leftover HP Ups and Vitamins because I can never convince myself that there isn’t a future scenario in which they would be more valuable. In my Fire Emblem games, the AI could easily compile a list of all the attribute-raising items and the first and second choices of character to give them to. I didn’t always follow its advice (I didn’t give the Spirit Dust to Rhys, for instance). But I never got stuck quavering in indecision, because it always gave an authoritative answer - even though many times it was completely wrong, and at one point it tried to gaslight me by saying it had checked the source code and that I couldn’t be seeing what I was witnessing with my own eyes.
AI vs Let’s Play
If we consider the difference between the AI agent and a video walkthrough, there’s another serious difference to bear in mind. None of the questions I asked the AI agent could be adequately answered by watching a playthrough video without watching the entire thing, which would be a 30-40 hour investment, much like playing the game itself. This is obviously a bad return.
A video works if you’re stuck at a particular point, as you can find the relevant part of the playthrough and watch the solution being played out. But it’s useless in almost every other situation. As such, the AI agent could easily win out for video users who don’t have the literacy or patience to work with a more detailed guide format like a FAQ.
Worryingly, the AI agent could end up driving the FAQ extinct (videos having already made them endangered) by being such an easy-to-use aid to lazy players. This is part of a larger cultural shift away from reading and writing and towards video and audio, one that AI tools look set to accelerate. It would be tragic for the games industry to lose the long-form guide, but given the speed at which an LLM can return definitive answers (albeit, frequently wrong answers), there’s a genuine risk that they might finish off FAQs for good. We’ll have to see how this plays out over the next few decades.
A Philosophical Conclusion
My short-form philosophy book Wikipedia Knows Nothing is getting a second edition this year, and its key concept of knowledge as a practice is highly relevant to why the FAQ outperformed the robot. The human who made the FAQ had played the game and built up practices for doing so. Those practices, those skills, gave the author actual knowledge, from which individual facts and observations could be derived. The facts were the side effect of having the knowledge, understood as practical skills rather than as statements. The FAQ was a series of facts, but it was produced as the residue of acquiring genuine knowledge.
The LLM-based AI, on the other hand, has no knowledge whatsoever - like the Wikipedia, it ‘knows nothing’. All it could execute was factmongering. It was able to take the propositional statements that other people had made about the game and deploy its inference engine to manipulate these using symbolic logic. This allowed it to derive further logical claims, but it could achieve nothing more than this. Sometimes these claims were based on faulty premises, but it couldn’t ever tell this was the case, because all it could do was factmonger and not apply its knowledge. It simply had no knowledge whatsoever to apply.
Have you used an LLM-based AI to help you play a game? If you have, let me know about it in the comments! I’d be interested in your experiences.