- so I went and looked at that game theory (the field of study) textbook that has algorithms for solving payoff matrices
- and I took the time to actually understand the estimation algorithm
- turns out the estimation algorithm is very computationally cheap, easily feasible on a gba
- you could probably run it even on a gameboy
- so, now that actually solving a payoff matrix on a game boy is possible, here is how you would design a slightly more complex AI:
- - firstly, we need to explain payoff matrices:
- - this is an example of a payoff matrix: https://cdn.bulbagarden.net/upload/c/ca/GameTheory2.png
- - player 1 is represented by the vertical strategies, while player 2 is represented by the horizontal strategies
- - (note: the vertical move user is tyranitar (you), and the horizontal move user is alakazam (opponent))
- - each cell holds the payoff (in this case the effective base power of the move) each player gets, based on the strategies both players choose
- - for example, if tyranitar chooses earthquake and alakazam chooses HP fire, then player 1 gets a base power of 100 and player 2 gets a base power of 35
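To make that concrete, here is a minimal sketch of a payoff matrix in code. Python is used for readability (a real GBA implementation would be C), and every cell value except the earthquake/HP fire one is made up for illustration, since I can't read the exact numbers out of the image:

```python
# Hypothetical payoff matrix in the spirit of the example image:
# rows are tyranitar's (player 1) moves, columns are alakazam's
# (player 2) moves. Numbers are illustrative, not exact game data.
P1_MOVES = ["Earthquake", "Pursuit", "Crunch"]
P2_MOVES = ["HP Fire", "Shock Wave", "Psychic"]

# payoff[i][j] = (player 1 payoff, player 2 payoff) when player 1
# picks move i and player 2 picks move j
payoff = [
    [(100, 35), (100, 60), (100, 90)],   # Earthquake row
    [(40, 35),  (40, 60),  (80, 90)],    # Pursuit row (made-up values)
    [(80, 35),  (80, 60),  (80, 90)],    # Crunch row (made-up values)
]

def cell(p1_move, p2_move):
    """Look up both players' payoffs for a pair of chosen moves."""
    i = P1_MOVES.index(p1_move)
    j = P2_MOVES.index(p2_move)
    return payoff[i][j]

print(cell("Earthquake", "HP Fire"))  # (100, 35), matching the example
```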
- - there are algorithms which "solve" payoff matrices by finding the nash equilibrium, which is a state where neither player can do better by changing their own strategy if the other player does not change their strategy
- - there are two types of strategy profiles: pure strategy (always pick this strategy) and mixed strategy (pick a combination of strategies with each strategy having a probability of being picked)
- - in addition, there is something called "expected payoff", which is the payoff you would expect given both players' strategies
- - if both players are playing pure strategies, then the expected payoff is exactly the payoff you'd get
- - if either player is playing a mixed strategy, then the expected payoff is the average payoff over a long (in the limit, infinite) run of games
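The expected payoff under mixed strategies is just a probability-weighted average over all cells. A small sketch (the 2x2 matrix and the 50/50 mixes are made-up toy values):

```python
def expected_payoff(matrix, p, q):
    """Expected payoff for the row player when the row player mixes
    over rows with probabilities p and the column player mixes over
    columns with probabilities q."""
    total = 0.0
    for i, row in enumerate(matrix):
        for j, value in enumerate(row):
            # Each cell is weighted by how likely that pair of picks is.
            total += p[i] * q[j] * value
    return total

# Toy 2x2 example: both players mix 50/50 over their two strategies.
m = [[3, 0],
     [1, 2]]
print(expected_payoff(m, [0.5, 0.5], [0.5, 0.5]))  # 1.5
```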
- - for this specific example, assuming that all you care about is effective base power, the nash equilibrium would be for player 1 to always pick pursuit and for player 2 to always pick shock wave
- - however, effective base power is not the only measure of a good choice in pokemon
- - if we look at pokemon hypothetically, the only reward of a battle is winning (excluding side effects like levelling up)
- - so if we look at a hypothetical turn before the battle ends (assuming that battles are finite), you either win or lose
- - if we encode the payoff of "win" as 1 and "lose" as 0, then the expected payoff is an average of wins and losses, which is exactly the probability of winning
- - now, with the probability of winning on that final turn in hand, we can step back to the turn before, where each cell gets its own probability of winning
- - thus, we can work our way back all the way to turn 1 to get the probability of winning for every choice
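The backward-reasoning idea above can be sketched on a toy abstract game. Everything here is hypothetical (a real battle engine would supply the states, moves, and transitions), and for simplicity it uses a pure-strategy maximin value as a crude stand-in for a full mixed-strategy equilibrium solve of each turn's matrix:

```python
from functools import lru_cache

# Toy abstract game: states are integers, we "win" at >= 4 and "lose"
# at <= 0, and each (our move, their move) pair shifts the state.
# All made up purely to illustrate working backward from end states.
MOVES = [1, 2]

def is_terminal(state):
    return state <= 0 or state >= 4

def winner_value(state):
    return 1.0 if state >= 4 else 0.0

def next_state(state, our_move, their_move):
    return state + our_move - their_move

@lru_cache(maxsize=None)
def win_probability(state, depth=6):
    if is_terminal(state):
        return winner_value(state)   # win = 1, lose = 0
    if depth == 0:
        return 0.5                   # unknown beyond the horizon: coin flip
    # Build the payoff matrix of next-turn win probabilities...
    matrix = [[win_probability(next_state(state, a, b), depth - 1)
               for b in MOVES] for a in MOVES]
    # ...then take the maximin value as a simple stand-in for solving it.
    return max(min(row) for row in matrix)

print(win_probability(2))
```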
- - note that this is somewhat of an oversimplification, but I think the general concept is correct
- - of course, you wouldn't actually be able to search all the way to "all possible endgame states", as you'd run out of memory (and time)
- - but this idea tells us that hypothetically, the payoffs are probabilities of winning
- - of course, the issue is that estimating the probability of winning from turn 1 is really hard, and the estimates might hover around 50% for every action
- - and even if there were a way to get very accurate probability estimates, just playing those probabilities isn't the only mark of a good AI: a good AI also exploits its opponent
- - for example, there are rock paper scissors AIs which get a higher win % than what would be expected (33% win, 33% tie, 33% lose), because it exploits the fact that humans are terrible random number generators
- - e.g. try playing 50 rounds of rock paper scissors (WITHOUT USING A RANDOM NUMBER GENERATOR): https://www.afiniti.com/corporate/rock-paper-scissors
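The exploitation idea can be shown with a deliberately simple rock-paper-scissors bot. This is a toy frequency counter of my own (real RPS bots model sequences and conditional patterns, not just raw frequencies):

```python
import random
from collections import Counter

BEATS = {"rock": "paper", "paper": "scissors", "scissors": "rock"}

class FrequencyExploiter:
    """Toy exploiting AI: count how often the human has thrown each
    option, then play whatever beats their most frequent throw."""

    def __init__(self):
        self.history = Counter()

    def observe(self, human_throw):
        self.history[human_throw] += 1

    def choose(self):
        if not self.history:
            return random.choice(list(BEATS))  # no data yet: play randomly
        favourite = self.history.most_common(1)[0][0]
        return BEATS[favourite]

ai = FrequencyExploiter()
for throw in ["rock", "rock", "paper", "rock"]:
    ai.observe(throw)
print(ai.choose())  # "paper": beats the human's favourite throw, rock
```

Against a perfectly random opponent this bot gains nothing, but against a human who over-plays one option it wins more than a third of rounds.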
- - also, human players are not necessarily rational, cannot evaluate a large number of options, and we don't necessarily know the optimal estimation strategy anyway
- - therefore, an exploiting AI would recognize patterns that are common among human players and factor them into its calculations
- - for example, you can assume that on turn 1, if the player has a status move (e.g. toxic) then they are likely to use it. made-up probabilities could be 75% status move, 25% damaging move for the player
- - an even better AI would profile the player to use their past actions as an indicator of what they would do
- - one way to do this would be to create preset profiles of how most pokemon players play games, and try to guess which profile the player matches the most
- - note that I don't have an algorithm to determine the best probabilities to pick a move given that we know how the player will choose their move, but I am guessing that it exists somewhere and is simple to implement
- - so the tl;dr is:
- - make a payoff matrix given every possible action of the player and AI, and guess the probability of winning for each combination of strategies
- - find the nash equilibrium of the payoff matrix
- - out of the possible strategies, choose each strategy with the probability the equilibrium assigns to it
- - even more simplified tl;dr:
- - do something like the image except the numbers are "probability of winning"
- - the numbers don't necessarily need to be "probability of winning", if you find something that works better then use it
- sample code for the algorithm (no context so it probably won't help unless you already have a basic understanding of game theory): https://gist.github.com/luckytyphlosion/e7c520d34dd7db6fa904d02df44a8205
- Textbook referenced: https://www.math.ucla.edu/~tom/Game_Theory/mat.pdf
- Algorithm taken from "4.7 Approximating the Solution: Fictitious Play"
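For reference, here is a minimal sketch of the fictitious play idea for a two-player zero-sum payoff matrix. This is my own paraphrase of the general technique from that textbook section, not the code from the gist; `matrix[i][j]` is the row player's payoff and the column player receives its negative. Note the inner loop is just integer counters, additions, and comparisons, which is why it's cheap enough for GBA-class hardware:

```python
def fictitious_play(matrix, iterations=1000):
    """Approximate a mixed-strategy equilibrium of a zero-sum game:
    each round, both players best-respond to the opponent's play so
    far; the empirical frequencies approach an equilibrium mix."""
    rows, cols = len(matrix), len(matrix[0])
    row_counts = [0] * rows    # how often each row strategy was played
    col_counts = [0] * cols    # how often each column strategy was played
    # Cumulative payoff each pure strategy would have earned against
    # the opponent's history.
    row_scores = [0.0] * rows
    col_scores = [0.0] * cols
    for _ in range(iterations):
        # Row maximizes, column minimizes, the row player's payoff.
        i = max(range(rows), key=lambda r: row_scores[r])
        j = min(range(cols), key=lambda c: col_scores[c])
        row_counts[i] += 1
        col_counts[j] += 1
        for r in range(rows):
            row_scores[r] += matrix[r][j]
        for c in range(cols):
            col_scores[c] += matrix[i][c]
    row_mix = [n / iterations for n in row_counts]
    col_mix = [n / iterations for n in col_counts]
    return row_mix, col_mix

# Matching pennies: the known equilibrium mixes both options 50/50.
mix, _ = fictitious_play([[1, -1], [-1, 1]])
print(mix)  # roughly [0.5, 0.5]
```

The approximation improves with more iterations, so an AI on weak hardware can trade accuracy for speed by tuning `iterations`.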