luckytyphlosion

Pokemon AI Model.

Jun 11th, 2019
  1. so I went and looked at that game theory (the field of study) textbook that had algorithms to solve payoff matrices
  2. and I took the time to actually understand the estimation algorithm
  3. turns out the estimation algorithm is very very very computationally feasible on a gba
  4. you could probably run it even on a gameboy
  5. so, now that actually solving a payoff matrix on a game boy is possible, this is how you would design a slightly more complex AI:
  6. - firstly, we need to explain payoff matrices:
  7. - this is an example of a payoff matrix: https://cdn.bulbagarden.net/upload/c/ca/GameTheory2.png
  8. - player 1 is represented by the vertical strategies, while player 2 is represented by the horizontal strategies
  9. - (note: the vertical move user is tyranitar (you), and the horizontal move user is alakazam (opponent))
  10. - each cell represents the payoff (in this case the effective base power of the move) each player gets based on the strategy that they both choose
  11. - for example, if tyranitar chooses earthquake and alakazam chooses HP fire, then player 1 would have a base power of 100 and player 2 would have a base power of 35
  12. - there are algorithms which "solve" payoff matrices by finding the nash equilibrium, which is a state where neither player can do better by changing their own strategy if the other player does not change their strategy
  13. - there are two types of strategy profiles: pure strategy (always pick this strategy) and mixed strategy (pick a combination of strategies with each strategy having a probability of being picked)
  14. - in addition, there is something called "expected payoff", which is the payoff you would expect given your strategy
  15. - if both players are playing a pure strategy, then the expected payoff is exactly the payoff you'd get
  16. - if either player is playing a mixed strategy, then the expected payoff is the average payoff over a long (infinite) number of games
  17. - for this specific example, assuming that all you care about is effective base power, the nash equilibrium would be for player 1 to always pick pursuit and for player 2 to always pick shock wave
  18. - however, effective base power is not the only measure of a good choice in pokemon
  19. - if we look at pokemon hypothetically, the only reward of a battle is winning (excluding side effects like levelling up)
  20. - so if we look at a hypothetical turn before the battle ends (assuming that battles are finite), you either win or lose
  21. - if we score a win as 1 and a loss as 0, the expected payoff is either absolute (a certain win or loss) or an average of the two. therefore it represents the probability of winning
  22. - now with our probability of winning this turn, we can go back to the turn before, where each cell would have its own probability of winning
  23. - thus, we can work our way back all the way to turn 1 to get the probability of winning
  24. - note that this is somewhat of an oversimplification, but I think the general concept is correct
  25. - of course, you wouldn't be able to search deep enough to reach "all possible endgame states" as you'd run out of memory
  26. - but this idea tells us that hypothetically, the payoffs are probabilities of winning
  27. - of course, the issue is that estimating the probabilities of winning from turn 1 is really hard and the probabilities might be around 50% for each action
  28. - and even if there was a way to get very accurate estimations of probabilities, just picking probabilities isn't the only key thing for a good AI. exploitation is what a good AI does
  29. - for example, there are rock paper scissors AIs which get a higher win % than what would be expected (33% win, 33% tie, 33% lose), because they exploit the fact that humans are terrible random number generators
  30. - e.g. try playing 50 rounds of rock paper scissors (WITHOUT USING A RANDOM NUMBER GENERATOR): https://www.afiniti.com/corporate/rock-paper-scissors
  31. - also, human players are not necessarily rational, are not able to evaluate a large number of options, and we don't necessarily know the optimal estimation strategy
  32. - therefore, an AI that exploits would recognize common patterns that human players do and factor that into calculations
  33. - for example, you can assume that on turn 1, if the player has a statusing move (e.g. toxic) then they are likely to use it. made-up probabilities could be 75% status move, 25% damaging move for the player
  34. - an even better AI would profile the player to use their past actions as an indicator of what they would do
  35. - one way to do this would be to create preset profiles of how most pokemon players play games, and try to guess which profile the player matches the most
  36. - note that I don't have an algorithm to determine the best probabilities to pick a move given that we know how the player will choose their move, but I am guessing that it exists somewhere and is simple to implement
  37.  
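
a rough python sketch of two ideas from the notes above: the "expected payoff" of a pair of mixed strategies, and exploiting a guess about how the player picks moves. everything here is an assumption for illustration: the WIN_PROB numbers are made up, the function names are hypothetical, and the matrix holds the AI's probability of winning (so the player's is 1 minus that). for the exploitation part, one simple candidate is a "best response": if the predicted player probabilities are taken as given, the expected payoff is linear in the AI's own probabilities, so some single move always does at least as well as any mix. this is not necessarily the algorithm the notes are looking for, and it is only as good as the prediction.

```python
# Hypothetical win probabilities for the AI (made-up numbers).
# Rows are AI moves, columns are player moves.
WIN_PROB = [
    [0.60, 0.30],  # AI move 0
    [0.40, 0.70],  # AI move 1
]


def expected_payoff(matrix, row_probs, col_probs):
    """Average payoff to the row player when both sides play mixed strategies."""
    return sum(
        row_probs[i] * col_probs[j] * matrix[i][j]
        for i in range(len(matrix))
        for j in range(len(matrix[0]))
    )


def best_response(matrix, col_probs):
    """Return (row index, expected payoff) of the best row against a fixed column mix."""
    row_values = [
        sum(p * cell for p, cell in zip(col_probs, row))
        for row in matrix
    ]
    best_row = max(range(len(matrix)), key=lambda i: row_values[i])
    return best_row, row_values[best_row]


# Made-up prediction from the notes: 75% for player move 0, 25% for player move 1.
predicted_player = [0.75, 0.25]
move, value = best_response(WIN_PROB, predicted_player)
```

with these made-up numbers, AI move 0 is worth 0.75 * 0.60 + 0.25 * 0.30 = 0.525 and AI move 1 is worth 0.75 * 0.40 + 0.25 * 0.70 = 0.475, so the best response is move 0. always playing the best response is itself predictable, so a real AI would probably blend it with the equilibrium strategy.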
  38. - so the tl;dr is:
  39. - make a payoff matrix given every possible action of the player and AI, and guess the probability of winning for each combination of strategies
  40. - find the nash equilibrium of the payoff matrix
  41. - out of the possible strategies, choose each strategy with the probability provided
  42.  
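
the last step of the tl;dr (pick each strategy with the probability the solution gives) could look something like this in python. the move names and probabilities below are placeholders, not a solved matrix, and choose_move is a hypothetical helper name.

```python
import random


def choose_move(moves, probabilities, rng=random):
    """Pick one move at random, weighted by the mixed strategy's probabilities."""
    # random.choices returns a list of k picks; we only want one.
    return rng.choices(moves, weights=probabilities, k=1)[0]


# Placeholder mixed strategy (made-up numbers).
moves = ["move A", "move B", "move C"]
mixed_strategy = [0.5, 0.3, 0.2]
picked = choose_move(moves, mixed_strategy)
```

passing a seeded random.Random instance as rng makes the picks reproducible, which helps when testing.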
  43. - even more simplified tl;dr:
  44. - do something like the image except the numbers are "probability of winning"
  45. - the numbers don't necessarily need to be "probability of winning", if you find something that works better then use it
  46.  
  47. sample code for the algorithm (no context so it probably won't help unless you already have a basic understanding of game theory): https://gist.github.com/luckytyphlosion/e7c520d34dd7db6fa904d02df44a8205
  48.  
  49. Textbook referenced: https://www.math.ucla.edu/~tom/Game_Theory/mat.pdf
  50. Algorithm taken from "4.7 Approximating the Solution: Fictitious Play"
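
for reference, a minimal python sketch of fictitious play for a zero-sum payoff matrix. this is not the code from the gist above, just an illustration of the general technique under these assumptions: the row player maximizes the payoff, the column player minimizes it, the first row is an arbitrary opening choice, and ties go to the lowest index. the players alternate, each picking the best reply to the other's past choices, and the fraction of rounds each move was picked is the estimated mixed strategy. the function name and the rounds parameter are made up.

```python
def fictitious_play(matrix, rounds=10000):
    """Estimate mixed strategies for a zero-sum matrix game by fictitious play."""
    n_rows, n_cols = len(matrix), len(matrix[0])
    row_counts = [0] * n_rows
    col_counts = [0] * n_cols
    # row_totals[i]: total payoff row i would have earned against the column history.
    row_totals = [0.0] * n_rows
    # col_totals[j]: total payoff column j would have conceded against the row history.
    col_totals = [0.0] * n_cols

    row = 0  # arbitrary opening move for the row player
    for _ in range(rounds):
        row_counts[row] += 1
        for j in range(n_cols):
            col_totals[j] += matrix[row][j]

        # Column player: best reply is the column that concedes the least so far.
        col = min(range(n_cols), key=lambda j: col_totals[j])
        col_counts[col] += 1
        for i in range(n_rows):
            row_totals[i] += matrix[i][col]

        # Row player: best reply is the row that has earned the most so far.
        row = max(range(n_rows), key=lambda i: row_totals[i])

    row_strategy = [count / rounds for count in row_counts]
    col_strategy = [count / rounds for count in col_counts]
    return row_strategy, col_strategy


# Matching pennies: the row player wins 1 on a match and loses 1 otherwise.
pennies = [[1, -1], [-1, 1]]
row_mix, col_mix = fictitious_play(pennies, rounds=1000)
```

the loop only needs additions, comparisons and a few small arrays, which fits the point that it should be cheap enough for a gba. the estimates get closer to a solution as rounds grows, but convergence can be slow, so the result is an approximation.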