obi
formerly david stone
I'm working on some new and exciting usage stats. I've just finished parsing some old data as a test -- 2019-12-01 through 2020-02-29. The sample data was anonymized, which means I don't have access to player ratings, so I'm unable to do any smart weighting work (although I have plenty of interesting ideas for that, too!). Despite that limitation, I think what I have so far is super useful. I wanted to share the kind of statistics my new parser is able to generate.
I generate the usual stats. You can tell how often each Pokemon is used. You can tell how often each Pokemon has a move, item, or ability. These are the stats that we already have, so they aren't that interesting. My stats have two advantages:
- It took about 70 minutes for me to run the stat generator over 3 months of OU logs using a single thread of execution. I am about to do some optimizations to bring that number down to probably around half of that, and I could easily cut it down a lot further.
- It generates more stats than just the usual ones.
Most excitingly, though, you can correlate most interesting factors with a Pokemon's move. For each of a Pokemon's 20 most used moves, you can determine how often that move on that Pokemon is used with every item, every ability, every other Pokemon, and in fact, every other move on every Pokemon.
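To make the idea concrete, here is a minimal sketch of how counts like these could be gathered. It is not the actual parser: the team layout (a list of dicts with `species`, `item`, `ability`, `moves`) and the names `correlate` and `share` are assumptions for illustration, and the two teams are made-up toy data.

```python
from collections import Counter, defaultdict


def correlate(teams, tracked_moves):
    """Count what appears alongside each tracked (species, move) pair.

    tracked_moves maps species -> set of moves to track (e.g. its top 20).
    """
    totals = Counter()  # occurrences of each (species, move) key
    stats = defaultdict(lambda: defaultdict(Counter))  # key -> category -> counts
    for team in teams:
        for mon in team:
            tracked = tracked_moves.get(mon["species"], set())
            for move in mon["moves"]:
                if move not in tracked:
                    continue
                key = (mon["species"], move)
                totals[key] += 1
                stats[key]["item"][mon["item"]] += 1
                stats[key]["ability"][mon["ability"]] += 1
                # Other moves on the same Pokemon.
                for other in mon["moves"]:
                    if other != move:
                        stats[key]["move"][other] += 1
                # Teammates, and every move on every teammate.
                for mate in team:
                    if mate is mon:
                        continue
                    stats[key]["teammate"][mate["species"]] += 1
                    for mate_move in mate["moves"]:
                        stats[key]["teammate_move"][(mate["species"], mate_move)] += 1
    return totals, stats


def share(totals, stats, key, category, value):
    """Fraction of `key` occurrences that co-occur with `value`."""
    return stats[key][category][value] / totals[key] if totals[key] else 0.0


# Toy data only; not taken from the real logs.
teams = [
    [
        {"species": "Mew", "item": "Red Card", "ability": "Synchronize",
         "moves": ["Spikes", "Stealth Rock", "Taunt"]},
        {"species": "Cloyster", "item": "White Herb", "ability": "Skill Link",
         "moves": ["Shell Smash", "Icicle Spear"]},
    ],
    [
        {"species": "Mew", "item": "Leftovers", "ability": "Synchronize",
         "moves": ["Spikes", "Roost"]},
    ],
]
totals, stats = correlate(teams, {"Mew": {"Spikes"}})
key = ("Mew", "Spikes")
print(share(totals, stats, key, "item", "Red Card"))      # 0.5
print(share(totals, stats, key, "teammate", "Cloyster"))  # 0.5
```

Restricting `tracked_moves` to a fixed number of moves per Pokemon is what bounds the number of keys, and therefore the memory use, in a layout like this.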
Let me give a couple examples to help explain what this gives you.
The data set contained 13,232,976 teams (so half that many battles, or 6,616,488). My output file is 450 MiB large. For comparison, the current Gen 8 OU chaos stats are 14 MiB. My stats just on Mew are 1.7 MiB.
The first result I looked at was Abomasnow because alphabetically, it's first. 6.6% (3112) of Abomasnow use Ice Beam.
If Abomasnow has Ice Beam:
- its most common ability is Snow Warning (99.4%), not Soundproof (0.6%)
- its most common item is Leftovers (29%)
- the most common second move is Giga Drain (55.3%)
- it's rarely used with Hippowdon (1%)
- it's never used with Heatran
- if you see a Hydreigon on its team (20.8% of the time you will), Hydreigon has a 16% chance to have U-turn
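The Hydreigon numbers in the last bullet are conditional on each other, so they can be multiplied to get the chance that an Ice Beam Abomasnow team has a U-turn Hydreigon at all. A quick sketch of that arithmetic (the variable names are just for illustration):

```python
# Both figures come from the list above and are conditional on
# the team having an Ice Beam Abomasnow.
p_hydreigon = 0.208             # P(Hydreigon on the team)
p_uturn_given_hydreigon = 0.16  # P(U-turn | that Hydreigon)

# Chain rule: P(Hydreigon and U-turn) = P(Hydreigon) * P(U-turn | Hydreigon)
p_uturn_hydreigon = p_hydreigon * p_uturn_given_hydreigon
print(f"{p_uturn_hydreigon:.1%}")  # 3.3%
```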
Ice Beam Abomasnow is not exactly a metagame threat, so maybe those stats aren't that interesting -- Ice Beam Abomasnow was used on 0.02% of teams in that 3 month period. Let's look at something a little more OU: Mew.
Mew was on 8.19% (541906) of teams. 29.5% (159650) of Mew use Spikes. If Mew has Spikes:
- its most common item is Red Card (51.8%)
- the top few most common second moves are:
  - Stealth Rock (87.8%)
  - Taunt (75.3%)
  - Self-Destruct (22.4%)
  - Flare Blitz (20.1%)
  - Skill Swap (16.3%)
- Cloyster has a 52.8% chance of being on the same team. If they have a Cloyster, its top moves are:
  - Icicle Spear (99.3%)
  - Shell Smash (99.2%)
  - Rock Blast (94.2%)
  - Ice Shard (61.7%)
  - Liquidation (29.8%)
  - Everything else has a 2.3% chance or less
- Despite the fact that Cloyster in general has a 5.6% chance to have Spikes, when used with Spikes Mew, it has only a 1.5% chance to have Spikes.
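One way to put a number on that last point is to divide the conditional rate by the overall rate. In association-rule mining this ratio is usually called "lift"; that term and the `lift` helper below are not from the stats themselves, just a convenient way to read them.

```python
def lift(conditional, baseline):
    """Ratio of a conditional rate to the overall rate.

    Below 1 means the pairing is rarer than usual, above 1 means more common.
    """
    return conditional / baseline


cloyster_spikes_overall = 0.056          # Cloyster in general
cloyster_spikes_with_spikes_mew = 0.015  # Cloyster next to Spikes Mew

ratio = lift(cloyster_spikes_with_spikes_mew, cloyster_spikes_overall)
print(f"{ratio:.2f}")  # 0.27
```

So a Cloyster next to Spikes Mew carries Spikes at roughly a quarter of its usual rate, which fits the intuition that a team rarely needs two Spikes users.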
Like I said, the file is roughly half a GiB. This is too large for a normal person to open and read. If you tried, it would probably crash your text editor or at least make it run very slowly. Part of the reason for the current size is an arbitrary decision I made: I do correlations with the top 20 moves for each Pokemon. I didn't have any basis for the number 20; I just needed to choose something to ensure my script would be able to fit in memory (it could theoretically require 1.5 TiB of RAM just to run). I'm going to do some analysis to determine what % of move usage is accounted for by the top N moves so I can make a better decision based on that data. However, I also want to add even more data, which means an even bigger file. I'm working on how best to let people browse the data and get useful insights -- a file probably isn't it.
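The "top N" analysis mentioned above could look something like the following sketch. The function names and the move counts are hypothetical; the idea is just to sort one Pokemon's moves by usage and see how much of the total the first N cover.

```python
from collections import Counter


def coverage_by_top_n(move_counts):
    """Entry n-1 is the share of all move usage covered by the top n moves."""
    total = sum(move_counts.values())
    covered = 0
    result = []
    for _, count in move_counts.most_common():
        covered += count
        result.append(covered / total)
    return result


def smallest_n_for(move_counts, target):
    """Smallest N whose top-N moves cover at least `target` of usage."""
    for n, covered in enumerate(coverage_by_top_n(move_counts), start=1):
        if covered >= target:
            return n
    return len(move_counts)


# Made-up counts for a single Pokemon, not real usage data.
counts = Counter({"Move A": 50, "Move B": 30, "Move C": 15, "Move D": 5})
print(coverage_by_top_n(counts))    # [0.5, 0.8, 0.95, 1.0]
print(smallest_n_for(counts, 0.9))  # 3
```

Running this per Pokemon and taking, say, the largest N needed to reach some coverage target would give a cutoff with an actual basis instead of a fixed 20.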
I have a team generator / team predictor that can take advantage of these stats, and chain the results of multiple correlations to generate a coherent team. This is the algorithm that Technical Machine (my AI) has used to generate teams for years, but I've had to disable most of the intelligent parts of that algorithm because (until now) the usage stats generated for Pokemon Showdown have been insufficient.
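As a rough illustration of what "chaining" correlations can mean, here is a small sketch. It is not Technical Machine's algorithm: the `generate_team` function, the simple multiply-the-conditionals heuristic, and all the numbers are assumptions made up for this example. Each pick is weighted by overall usage times how often the candidate appears alongside every Pokemon already chosen.

```python
import random


def generate_team(usage, teammate_given, size=6, rng=None):
    """Pick a team one Pokemon at a time.

    usage: species -> overall usage weight.
    teammate_given: species -> {teammate: P(teammate | species)}.
    """
    rng = rng or random.Random()
    team = []
    while len(team) < size:
        weights = {}
        for candidate, base in usage.items():
            if candidate in team:
                continue
            weight = base
            # Chain the correlations: multiply in how often the candidate
            # is seen with each Pokemon already on the team (a heuristic).
            for member in team:
                weight *= teammate_given.get(member, {}).get(candidate, 0.0)
            if weight > 0:
                weights[candidate] = weight
        if not weights:
            break  # nothing left that fits with the current picks
        names = list(weights)
        pick = rng.choices(names, weights=[weights[n] for n in names])[0]
        team.append(pick)
    return team


# Made-up numbers, only loosely in the spirit of the examples above.
usage = {"Mew": 0.08, "Cloyster": 0.05, "Heatran": 0.07, "Abomasnow": 0.01}
teammate_given = {
    "Mew": {"Cloyster": 0.5, "Heatran": 0.2, "Abomasnow": 0.01},
    "Cloyster": {"Mew": 0.6, "Heatran": 0.1, "Abomasnow": 0.02},
    "Heatran": {"Mew": 0.2, "Cloyster": 0.1, "Abomasnow": 0.0},
    "Abomasnow": {"Mew": 0.1, "Cloyster": 0.1, "Heatran": 0.0},
}
print(generate_team(usage, teammate_given, size=3, rng=random.Random(0)))
```

Because a zero conditional zeroes out the whole product, pairs that never appear together in the data (Abomasnow and Heatran in this toy table) can never end up on the same generated team. The same chaining could be extended from teammates to moves, items, and abilities.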
There is still some work to do, but I hope to finish it up over the next few days and get something out there that people can use. Feel free to let me know if you have any questions or suggestions for more useful stats. The next step for me is to wait for the data approval process so I can get access to battle logs with user ratings in them. I'll have a follow-up post here soon explaining exactly what I plan to do with that, but it includes a (hopefully) smarter way to weight teams than we do right now.