Metagame Quantitative Ubers

I've been thinking about "metagame". What is a metagame, really? It appears to be a consensus among players, derived from how pokemon perform in games. Since we can observe that performance quantitatively, it should be possible to model the metagame.
Current method
In the viability thread, people debate the niches of pokemon and how they interact with each other, but the amount of data being used is laughable. Sure, there's the odd damage calc, but does it really take everything into account? Did people realize the connection between Bugceus and Zygarde while ranking Bugceus at D and Zygarde at A-? Probably not.

Proposed method
You might have noticed how I bolded the word "niche". The reason will become apparent shortly. I propose that the metagame is an ecosystem. Each pokemon has its own set (set = genotype), which interacts with the environment (environment = the game mechanics, which are really just complex interactions of individual mechanics) in order to survive (winning games = fitness). As people play the game, the metagame evolves, as we can observe from changes in the viability thread. Some sets turn out to be bad, which means they have lower fitness than other sets. Therefore, it is possible to model the metagame as a fitness landscape and keep updating it as the metagame evolves, so that we can watch the thinking of Ubers players evolve as it happens.

If that went over your head, here's a simple example:

Magearna vs Primal Groudon. What happens here? Primal Groudon beats Magearna. Now consider a parallel situation: a wolf and a rabbit. The wolf eats the rabbit.

Primal Groudon is able to check Xern, Kyogre, etc. Magearna is able to check Xern, Yveltal, etc.

Wolves eat squirrels, birds, etc. Rabbits eat berries, insects, carrots, etc.

So now you can see how a "metagame" for wolves and rabbits emerges, just like in pokemon! This metagame is usually called a food web. Now, how do you measure the most successful species? It's not just population size, since rabbits breed a lot but still get owned by wolves anyway. Maybe rabbits will mutate in some way that lets them beat wolves. Who knows? This is evolution acting on the food web, in the same way that pokemon sets change over time. The fitness-based view I described earlier is a way to look at this objectively.

I intend to post more thoughts/ideas and create the actual tools to make quant ubers a reality. Please feel free to post ideas, critiques, and whatever.
 
Ok, I see what you mean about Primal Groudon beating Magearna, and about Magearna always checking Xern, like a wolf eating a rabbit. But sometimes this isn't the case. Take Primal Groudon vs Primal Kyogre: if Kyogre comes in on Pdon, it can check it if it's faster, but if Primal Groudon has higher Speed and enough Attack, Primal Groudon checks Kyogre. (There are a lot more mons that fight each other like this, but this was the only example I could think of atm.) So it doesn't really fit the wolf-eats-rabbit idea, and it's not really about finding a new set that always beats the other mon (like the mutation that lets rabbits beat the wolves). So how do you measure these mons? This is the only problem I see with this; other than that, I like what you said.
 
That is the problem with a simplified example. You're right that it's really complicated and messy. This is why I mentioned set = genotype. Let's not use Kyogre vs Groudon as the example, because the weather changing on switch-in makes it tedious to work through.

Let's use Primal Groudon and Arceus-Water as an example. We will restrict the possible sets to the following to keep this fast and simple:

Defog Waterceus
Arceus-Water @ Splash Plate
Ability: Multitype
EVs: 248 HP / 200 Def / 56 Spe
Bold Nature
IVs: 0 Atk
- Toxic
- Ice Beam
- Defog
- Recover

Mixed Offensive Pdon
Groudon-Primal @ Red Orb
Ability: Desolate Land
EVs: 120 Atk / 252 SpA / 136 Spe
Mild Nature
IVs: 30 Atk
- Rock Polish
- Fire Blast
- Hidden Power [Ice]
- Precipice Blades

Swords Dance Stealth Rock Pdon
Groudon-Primal @ Red Orb
Ability: Desolate Land
EVs: 252 Atk / 4 SpD / 252 Spe
Jolly Nature
- Swords Dance
- Stealth Rock
- Precipice Blades
- Stone Edge

Defog Waterceus vs Mixed Offensive Pdon - Defog Waterceus wins!
Defog Waterceus vs Swords Dance Stealth Rock Pdon - Swords Dance Stealth Rock Pdon wins!
Swords Dance Stealth Rock Pdon vs Mixed Offensive Pdon - Swords Dance Stealth Rock Pdon wins!

So, we can see that a pokemon (species) can both win and lose against other pokemon. Knowing which pokemon beats which only gives us limited information. From a pokemon-level analysis, it would seem like Primal Groudon is the best pokemon ever and Waterceus just beats it sometimes, as if by magic. This is why we focus on sets (genotypes). With set-level analysis, we can look at the scenarios I listed and conclude that Mixed Offensive Pdon is a bad set (genotype): it loses both of its matchups, so it has zero fitness in this example.
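To make the set-versus-species point concrete, here is a minimal Python sketch (not the author's tool) that scores the three 1v1 results above. It assumes fitness is simply the share of matchups a set wins; the function names are hypothetical. Collapsing the sets into species gives both species 0.5, which hides the fact that Mixed Offensive Pdon never wins.

```python
from collections import defaultdict

# The three 1v1 results from the example above, as (winner, loser) set names.
RESULTS = [
    ("Defog Waterceus", "Mixed Offensive Pdon"),
    ("Swords Dance Stealth Rock Pdon", "Defog Waterceus"),
    ("Swords Dance Stealth Rock Pdon", "Mixed Offensive Pdon"),
]

# Which species (pokemon) each set (genotype) belongs to.
SPECIES = {
    "Defog Waterceus": "Arceus-Water",
    "Mixed Offensive Pdon": "Groudon-Primal",
    "Swords Dance Stealth Rock Pdon": "Groudon-Primal",
}


def fitness(results):
    """Fitness = share of an entry's matchups that it wins (0.0 to 1.0)."""
    wins = defaultdict(int)
    games = defaultdict(int)
    for winner, loser in results:
        wins[winner] += 1
        games[winner] += 1
        games[loser] += 1
    return {name: wins[name] / games[name] for name in games}


def species_fitness(results, species):
    """The same measure after collapsing every set into its species."""
    collapsed = [(species[w], species[l]) for w, l in results]
    # Mirror matches (Pdon vs Pdon) say nothing about the species, so drop them.
    return fitness([(w, l) for w, l in collapsed if w != l])


print(fitness(RESULTS))
# {'Defog Waterceus': 0.5, 'Mixed Offensive Pdon': 0.0, 'Swords Dance Stealth Rock Pdon': 1.0}
print(species_fitness(RESULTS, SPECIES))
# {'Arceus-Water': 0.5, 'Groudon-Primal': 0.5}
```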

Why am I emphasizing genotypes (sets)? A genotype is the information encoded in an organism (pokemon) that produces the tools (moves, stats, and abilities) it uses to interact with the mechanics of its environment (the game) and with other organisms (other pokemon). Every genotype is distinct from every other genotype. So, by observing how the population of a particular genotype interacts with all other genotypes, we can record the data and model it to create a quantitative metagame, even if it's really complex.
 
I personally find this quite a confusing, albeit intriguing, idea.

A check compendium, where we see what checks / counters what, would be simpler to use and more useful for a player.
 
thanks for minimodding
Why do we need this? Is this a way to determine an absolute ranking of Pokemon based on best to worst?
why do we do anything on smogon? it's not absolute. it will change as the metagame evolves. it's just a way to do this quantitatively rather than qualitatively.
an automated, comprehensive check compendium is one of the possible outcomes of this project.
 
What I meant was that, from what I'm reading, you're trying to build a dynamic absolute ranking system for Pokemon in Ubers.

In my honest opinion, if you don't have a working prototype of this method (either by hand or as a Python/any-language script), then it's sincerely not worth talking about, let alone starting a thread on. I have a million ideas on how to handle bans and suspect tests in UU, but I'm not going to start a new thread about them.
 

mags

primal groudon beats pogre, but pogre can defeat or escape pdon's wrath with certain sets. It's like how a dog beats a squirrel if it catches it, but the squirrel can climb up a tree and drop acorns on the dog till it dies. Orch is a genius.
 
What I meant was that, from what I'm reading, you're trying to build a dynamic absolute ranking system for Pokemon in Ubers.

In my honest opinion, if you don't have a working prototype of this method (either by hand or as a Python/any-language script), then it's sincerely not worth talking about, let alone starting a thread on. I have a million ideas on how to handle bans and suspect tests in UU, but I'm not going to start a new thread about them.
and people complain about the lack of interesting threads in the ubers subforum. maybe if people like you got your stick out of your ass, we'd have a far more active and interesting subforum instead of worrying about what to talk about. you aren't even a regular in the ubers community, and you are actively trying to undermine content in a barely active subforum.

i am currently programming, and of course that takes time. it is very convenient to see discussion while i program, so that i can catch missteps in logic before they happen. also, i am not an effective communicator, so letting people post and ask me questions will help clarify my progress as well. ranking is only one of a few potential applications of this project:

a) objective ranking of all sets vs all sets in a 1v1 setting (obviously flawed, since it excludes the dynamism of a typical ubers game)
b) get ladder games and see how often a set on a particular team beats the other team (all sets on the winning team get increased fitness, and all sets on the losing team get decreased fitness). this -should- get me dynamic data that somewhat resembles the actual reality of a pokemon game
c) use the above data to recreate mazar
d) automatically generate a graphic of all sets vs all sets
e) observe overlaps in niches
f) observe differences between ladder and tournament play by doing b on different data sets
g) see how the evolution of each environment (ladder/tour/whatever) differs.
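Idea b) could look something like the minimal sketch below. The `Game` record and the +1/-1 scoring are illustrative assumptions rather than the project's actual design, and the team lists are made up.

```python
from collections import Counter
from typing import List, NamedTuple


class Game(NamedTuple):
    # Hypothetical record of one ladder game: the sets on each side.
    winning_team: List[str]
    losing_team: List[str]


def team_fitness(games):
    """Naive credit assignment: every set on the winning team gains a point,
    every set on the losing team loses one. Also counts games played so the
    net score can be turned into a rate later."""
    score, played = Counter(), Counter()
    for game in games:
        for s in game.winning_team:
            score[s] += 1
            played[s] += 1
        for s in game.losing_team:
            score[s] -= 1
            played[s] += 1
    return score, played


games = [
    Game(["SD Pdon", "Defog Waterceus", "Klefki"], ["Xerneas", "Primal Kyogre", "Klefki"]),
    Game(["Xerneas", "Primal Kyogre", "Yveltal"], ["SD Pdon", "Defog Waterceus", "Ho-Oh"]),
]
score, played = team_fitness(games)
for name in sorted(played):
    print(f"{name}: net {score[name]:+d} over {played[name]} games")
```

Crediting every set on a team equally is the crude part: a set that never left the back of the team gets the same credit as the sweeper. Dividing the net score by games played gives a rate that is comparable across sets with different usage.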

magsyy seems to get it somehow lol...
 
Clearly you're not effective at communicating when your response to someone disagreeing with you is telling them to get the "stick out of my ass". I may not be a regular, but your concept extends to other tiers as well (OU, UU, RU, etc.).

If we want to talk about side projects, take http://azelf.info/. This guy's project was to make looking up usage stats more dynamic and interactive for the average Showdown player. When he pitched the idea, he had a fully working prototype, and he posted it in the Technical Projects section on Smogon, not in the Ubers subforum.

So you essentially did three things wrong:
1) Pitch a technical project with no working prototype at all.

I could have an idea for a robo-trader that makes perfectly accurate trades to maximize profits, but if I don't have a working prototype of it, my idea holds pretty much no weight. I'm sure you're coding the script right now, but please talk about it after you have something working (e.g. a theoretical metagame with 5 or so Pokemon in it as a proof of concept).

2) Post the project in the completely wrong subforum (literally, this post is only tangentially related to Ubers)

This is more or less a technical project and has NOTHING to do with metagame discussion.

3) Insult a person giving you actual feedback.

Most of what I'm telling you also applies in the real world. Do you think a corporate board will greenlight your project if you don't even have a working demonstration of your concept? Besides, I didn't insult you in any way (call you names, say your idea is stupid, etc.). I'm only asking legitimate questions: Why do we need this? Is this better than the current model? Is there a working prototype? If you want to be immature about it, go ahead. People will just be less inclined to take you seriously.

Also, Magsyy is most likely mocking you if you couldn't already tell from the "...drop acorns on the dog till it dies" bit.
 

aVocado

This is an interesting idea. Anything that can somehow visualize how good a Pokemon is/can be is always a good thing imo. And unlike traditional viability rankings, this can also be used for other things.

If I understand this correctly, you're basically making a list of Ubers pokemon and ranking them by how well they do in the "metagame", measured by how many pokemon they beat? Are there any other criteria that you haven't mentioned or I haven't thought of? Set variability, ease of using/building around it, etc. could also be factors, I suppose.

hope this isn't locked tbh, cuz I always disliked the concept of "u need to get the thread approved by a mod and only certain types of threads are allowed!!"
 

mags

Don't assume what my intentions are, you non-ubers-regular scum. I think the logic is common sense, as the world has always lived by this system, whether it be the food web, business, or mons. I'm not sure if it's worth the effort to make a prototype, but it's a cool thought. As for orch not having a prototype ready, that's a dumb argument, especially when he said he wanted the thread to provoke discussion that can help him make sure his ideas aren't flawed. Also, this concept might be able to extend to other tiers, but if you want my honest opinion, ubers is so different from other tiers that it would need its own regulations, so it's not exactly a bad idea to start a thread in the ubers subforum. I doubt this got permission from a mod, but let them handle it. This thread shouldn't be locked, though, since the subforums are dead and discussion of the ubers metagame, what makes a mon good, etc. should always be welcome, especially since the viability rankings are locked, so there are no other platforms where you can talk about why mons are good and why they deserve to be above others. I was meming with the acorn part, but I wasn't meming on the idea itself.
 
Ok, I'm going to outline my plan as of now. There are two different goals here:
1) See how sets match up 1v1
a) gather data (sets)
b) pit sets against each other
c) measure fitness of sets
d) observe potential fitness (limited to offensive power)
2) See how sets match up in real games (this includes the intangibles)
a) gather data (replays)
b) use replays data to measure fitness of sets via win/loss record vs all other sets
c) observe realized fitness
d) observe evolution of players
3) help me out here

#2 is probably more interesting, but since the new game is coming out within about 2 months, I figured it'd be better to build the tools to gather replay data for a fresh meta and observe its evolution in its entirety.

What I have done so far:
Translated the text file of sets into a convenient data structure
What will I do?:
Create a tool that feeds this data into the damage calculator to automate the data gathering process
Refactor the project structure so that adding new sets is easy
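For "translating the text file into a convenient data structure", one possible shape is a small parser for the Showdown-style export format used for the sets earlier in the thread. This is a sketch under assumptions: each set may start with a label line such as "Defog Waterceus", nicknames and gender markers are not handled, and `PokemonSet`/`parse_sets` are hypothetical names.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, List, Optional

STATS = ("HP", "Atk", "Def", "SpA", "SpD", "Spe")


@dataclass
class PokemonSet:
    species: str
    item: Optional[str] = None
    label: Optional[str] = None  # e.g. "Defog Waterceus"
    ability: Optional[str] = None
    nature: Optional[str] = None
    evs: Dict[str, int] = field(default_factory=dict)
    # Unlisted IVs default to 31, as in a Showdown export.
    ivs: Dict[str, int] = field(default_factory=lambda: {s: 31 for s in STATS})
    moves: List[str] = field(default_factory=list)


def parse_spread(text):
    """'248 HP / 200 Def / 56 Spe' -> {'HP': 248, 'Def': 200, 'Spe': 56}"""
    spread = {}
    for part in text.split("/"):
        value, stat = part.split()
        spread[stat] = int(value)
    return spread


def parse_block(lines):
    """Parse one set; a first line without '@' followed by a
    'Species @ Item' line is treated as the set's label."""
    label = None
    if "@" not in lines[0] and len(lines) > 1 and "@" in lines[1]:
        label, lines = lines[0], lines[1:]
    species, _, item = (part.strip() for part in lines[0].partition("@"))
    pset = PokemonSet(species=species, item=item or None, label=label)
    for line in lines[1:]:
        if line.startswith("Ability:"):
            pset.ability = line.split(":", 1)[1].strip()
        elif line.startswith("EVs:"):
            pset.evs = parse_spread(line.split(":", 1)[1])
        elif line.startswith("IVs:"):
            pset.ivs.update(parse_spread(line.split(":", 1)[1]))
        elif line.endswith(" Nature"):
            pset.nature = line[: -len(" Nature")]
        elif line.startswith("- "):
            pset.moves.append(line[2:].strip())
    return pset


def parse_sets(text):
    """Sets are separated by blank lines."""
    blocks = re.split(r"\n\s*\n", text.strip())
    return [parse_block([ln.strip() for ln in b.splitlines()]) for b in blocks if b.strip()]


SAMPLE = """
Defog Waterceus
Arceus-Water @ Splash Plate
Ability: Multitype
EVs: 248 HP / 200 Def / 56 Spe
Bold Nature
IVs: 0 Atk
- Toxic
- Ice Beam
- Defog
- Recover

Swords Dance Stealth Rock Pdon
Groudon-Primal @ Red Orb
Ability: Desolate Land
EVs: 252 Atk / 4 SpD / 252 Spe
Jolly Nature
- Swords Dance
- Stealth Rock
- Precipice Blades
- Stone Edge
"""

for s in parse_sets(SAMPLE):
    print(s.label, "|", s.species, "@", s.item, "|", s.moves)
```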
 
It's an interesting idea, but how will you handle slight variations in a set? For example, Rock Polish Pdon can run PBlades/Stone Edge/Fire Punch, but also Fire Blast or Eruption over Fire Punch. It can also have HP Ice over Stone Edge, or Rock Slide over Stone Edge... That leads to something like 10 variations for 1 set of 1 mon! So if Arc-Water beats all of those Pdon sets, does Arceus-Water win 10 different matchups? That kind of skews your measures, in my opinion.

One way to solve this issue would be to weight each set by its usage (sorry if that's not clear, I can't find the words). For example, Rock Polish/Fire Punch/PBlades/Stone Edge has 10% usage, the Fire Blast version has 1.5%, etc. So in the end we can say "Arc-Water beats 80% of Pdon sets by usage", even if it only loses to one or 2 sets out of the 100 Pdon variations in your list.
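The usage weighting suggested above can be sketched in a few lines of Python. The variant list is hypothetical: only the 10% and 1.5% usage figures come from the post, while the third variant's usage and all of the win/loss flags are invented for illustration. The point is that the raw share of variants beaten and the usage-weighted share can differ.

```python
def usage_weighted_win_share(matchups):
    """matchups: (variant, usage_percent, beaten) triples for one opposing species.

    Returns the share of that species' listed usage that our set beats."""
    total = sum(usage for _, usage, _ in matchups)
    if total == 0:
        raise ValueError("no usage data for these variants")
    beaten = sum(usage for _, usage, won in matchups if won)
    return beaten / total


# Hypothetical Arc-Water results vs three Pdon variants (made-up flags and third usage).
pdon_variants = [
    ("Rock Polish / Fire Punch / PBlades / Stone Edge", 10.0, True),
    ("Rock Polish with Fire Blast", 1.5, True),
    ("Swords Dance / Stealth Rock", 4.0, False),
]

unweighted = sum(won for _, _, won in pdon_variants) / len(pdon_variants)
weighted = usage_weighted_win_share(pdon_variants)
print(f"raw: {unweighted:.0%} of variants, by usage: {weighted:.0%}")
```

Note that the weights are normalised over the listed variants only, so variants missing from the list silently drop out of the denominator.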

So if I understand correctly, the aim of your project is to say "That one defensive Arc-Water set beats 80% of Pdon sets, 100% of Zygarde-C sets, 0% of Toxapex sets", etc... Am I right?

Anyway, I love this kind of project, I hope we'll see it working soon! :)
 

Ropalme1914

So, I found the idea very interesting, but would there be any way to weight the stats? For example, Arceus-Water may beat more things than Zygarde-C in a 1v1 situation, but Zygarde-C can be better overall because it beats more relevant things, like most Primal Groudon. We could have some way to weight individual sets (SD Primal Groudon is more relevant than Mixed Primal Groudon, so winning against the first set would give you more points), or we could do it in a more general way, like "This Pokémon wins against 45% of the metagame in a 1v1 situation, so it's in 15th place".
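A sketch of the relevance weighting described here, with invented numbers: each opponent set gets a weight (for example its usage), and a mon's score is the weighted share of opponents it beats 1v1. `weighted_scores` and `ranking` are hypothetical helpers, and "Set C"/"Set D" are placeholder opponents.

```python
def weighted_scores(beats, weight):
    """beats: mon -> set of opponent sets it wins against 1v1.
    weight: opponent set -> relevance (e.g. usage %); covers every opponent."""
    total = sum(weight.values())
    return {mon: sum(weight[o] for o in opponents) / total for mon, opponents in beats.items()}


def ranking(scores):
    """Mons ordered from highest to lowest weighted score."""
    return sorted(scores, key=scores.get, reverse=True)


# Invented numbers for illustration only.
weight = {"SD Pdon": 50, "Mixed Pdon": 10, "Set C": 20, "Set D": 20}
beats = {
    "Arceus-Water": {"Mixed Pdon", "Set C", "Set D"},  # beats more sets...
    "Zygarde-C": {"SD Pdon", "Mixed Pdon"},            # ...but these matter more
}
scores = weighted_scores(beats, weight)
print(scores, ranking(scores))
```

With these made-up numbers, Arceus-Water beats three sets to Zygarde-C's two, but Zygarde-C ranks first because SD Pdon carries half the weight.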
 
It's an interesting idea, but how will you handle slight variations in a set? For example, Rock Polish Pdon can run PBlades/Stone Edge/Fire Punch, but also Fire Blast or Eruption over Fire Punch. It can also have HP Ice over Stone Edge, or Rock Slide over Stone Edge... That leads to something like 10 variations for 1 set of 1 mon! So if Arc-Water beats all of those Pdon sets, does Arceus-Water win 10 different matchups? That kind of skews your measures, in my opinion.

One way to solve this issue would be to weight each set by its usage (sorry if that's not clear, I can't find the words). For example, Rock Polish/Fire Punch/PBlades/Stone Edge has 10% usage, the Fire Blast version has 1.5%, etc. So in the end we can say "Arc-Water beats 80% of Pdon sets by usage", even if it only loses to one or 2 sets out of the 100 Pdon variations in your list.

So if I understand correctly, the aim of your project is to say "That one defensive Arc-Water set beats 80% of Pdon sets, 100% of Zygarde-C sets, 0% of Toxapex sets", etc... Am I right?

Anyway, I love this kind of project, I hope we'll see it working soon! :)
good point. i think these minor variations are important, and they all have their own subtle niches, which i hope will become apparent with my project. basically, what i want to happen is: there are 20 pdon sets with all these minor differences, and you can pick the one with the exact niche that you need for your team. ultimately, this project is about finding these niches even if there are overlaps; it's the differences that interest me. for example, fire blast pdon is usually used as a rock polish or stealth rock set. eruption is usually only used on sticky teams and never as a sweeper. fire punch is usually used on support or physical offensive sets. so even though the typing and purpose are superficially similar, the realized niches are very different. as for how this influences the weighting of the ranking, usage is a good way to weight it. ultimately, step #2 of my previous post will make this an inherent part of the process (real battle data includes usage data implicitly). another idea i had was to weight sets by unrealized niche, so that sets with a larger unrealized niche weigh more than sets with a smaller one. weighting by usage stats is basically what weighting by realized niche would be.

unrealized niche = the full range of potential that a set can do
realized niche = what actually happens

So, I found the idea very interesting, but would there be any way to weight the stats? For example, Arceus-Water may beat more things than Zygarde-C in a 1v1 situation, but Zygarde-C can be better overall because it beats more relevant things, like most Primal Groudon. We could have some way to weight individual sets (SD Primal Groudon is more relevant than Mixed Primal Groudon, so winning against the first set would give you more points), or we could do it in a more general way, like "This Pokémon wins against 45% of the metagame in a 1v1 situation, so it's in 15th place".
see step #2 in my previous post. i don't plan to base it only on damage calcs; eventually i want to use data from real battles. of course, 1v1 data is flawed: klefki loses practically all of its 1v1 matchups, yet it's clearly a useful pokemon with a powerful niche. the benefits that klefki provides are intangible to an extent. however, i suppose one could measure the average hazard damage over games and the impact of status (every time an opposing pokemon ends up slower than your pokemon is a "win" for klefki). yet that doesn't quite capture the full utility of klefki. any ideas/suggestions about this problem are useful.
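As a starting point for the Klefki problem, the two measurements mentioned here (average hazard damage per game and status-driven speed "wins") could be aggregated as in this sketch. The `SupportLog` fields are hypothetical; filling them in would need a replay parser that tracks hazard damage and turn order, which this sketch does not attempt.

```python
from dataclasses import dataclass, field
from statistics import mean
from typing import List


@dataclass
class SupportLog:
    # Hypothetical per-game summary that a replay parser would have to fill in.
    hazard_damage: List[float] = field(default_factory=list)  # % HP foes lost to hazards, per switch-in
    status_outspeeds: int = 0  # turns where a statused foe moved after a mon it normally outspeeds


def support_metrics(games):
    """Average hazard damage per game, plus total speed-control 'wins' across games."""
    if not games:
        return 0.0, 0
    avg_hazard = mean(sum(g.hazard_damage) for g in games)
    speed_wins = sum(g.status_outspeeds for g in games)
    return avg_hazard, speed_wins


games = [SupportLog([12.5, 25.0], 2), SupportLog([12.5], 0)]
print(support_metrics(games))
```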

now that i think about it more, step #1 seems like a total waste of time. it's simply a measurement of offensive power that neglects the support/defensive strength that many unique sets have.

Saving for myself http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0175138
 
ty hyw

So, I've been busy for the last two weeks, but that doesn't mean I didn't put any thought into this project! I reached the conclusion that my original goal was far too ambitious. Finding genotypes would require access to full set information. Unfortunately, the vast majority of battles end with some moves and other necessary details unrevealed. Therefore, if I wanted to continue in that direction, I would have to get privileged access to PS, which I do not want as a competitive Ubers player. So, I am scaling back to a weaker goal.

The weaker goal happens to coincide with what evolutionary and population geneticists do: assess genotypes by population. So, what does this mean in pokemon terms? I would be looking at pokemon as pokemon rather than as sets, so I would treat all Primal Groudon as Primal Groudon even if the sets have different purposes. Now, the next logical question is: isn't this what the current usage stats tell us? I argue no. The current usage stats are an incomplete picture of a metagame. Having a superficial winrate for every pokemon versus every other pokemon honestly doesn't tell us very much at all.

What I think is more interesting is specific matchups. For example, what is Primal Kyogre's winrate when it faces Xerneas? What is the winrate of Ho-Oh + Defog versus Ho-Oh + Rapid Spin? The list goes on, and I think this is valuable information that is both accessible and interesting.

cont... I'll update this post as I work more on my scraper/analyzer today.
Thoughts: save moves for later. Moves require dynamic scraping and some other stuff that I haven't thought through yet.
1) Get battle data: winner name, winner's pokemon, loser name, loser's pokemon.
2) Derive winrate data for a specified selection. E.g. for the Xern vs Kyogre winrate, I get the games where Xern was on the winning side and Kyogre on the losing side (A), and the games where Xern was on the losing side and Kyogre on the winning side (B).
3) Add up the total number of matches that meet these conditions (A + B). The winrate is A divided by that total.
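Steps 1 to 3 can be sketched as a small Python function. It is a reconstruction, not the author's scraper: the `Battle` record is hypothetical, and it computes the winrate as A / (A + B), which is consistent with the 50% Zygarde mirror result posted later in the thread. The toy battles are invented.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Battle:
    winner: str
    winner_team: List[str]
    loser: str
    loser_team: List[str]


def matchup_winrate(battles: List[Battle], mon: str, opponent: str) -> Tuple[Optional[float], int]:
    """Winrate of teams containing `mon` against teams containing `opponent`.

    A = games with `mon` on the winning side and `opponent` on the losing side.
    B = games with `opponent` on the winning side and `mon` on the losing side.
    Returns (A / (A + B), A + B), or (None, 0) if the pairing never happened.
    """
    a = sum(1 for bt in battles if mon in bt.winner_team and opponent in bt.loser_team)
    b = sum(1 for bt in battles if opponent in bt.winner_team and mon in bt.loser_team)
    total = a + b
    if total == 0:
        return None, 0
    return a / total, total


# Toy data, not the thread's dataset.
battles = [
    Battle("Alice", ["Arceus-Ground", "Zygarde"], "Bob", ["Magikarp", "Zygarde"]),
    Battle("Carol", ["Arceus-Ground"], "Dave", ["Magikarp"]),
]
for mon, opp in [("Magikarp", "Arceus-Ground"), ("Arceus-Ground", "Magikarp"),
                 ("Xerneas", "Arceus-Ground"), ("Zygarde", "Zygarde")]:
    rate, n = matchup_winrate(battles, mon, opp)
    print(mon, "vs", opp, "->", "never happened" if rate is None else f"{rate:.0%} over {n} games")
```

In a mirror match the same game counts once in A and once in B, which is why a mon's winrate against itself always comes out at 50%.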

finally got the scraper to scrape Arceus types too! that was frustrating...
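One way to pull step 1's fields out of a replay is to read the battle log line by line. This sketch assumes the Pokemon Showdown protocol messages `|player|`, `|poke|` (team preview), `|switch|`/`|drag|` and `|win|`; check them against the protocol documentation before relying on it. As far as I know, team preview can hide some formes behind a placeholder such as `Arceus-*`, so the sketch swaps in the forme seen when that pokemon switches in (if it never switches in, the placeholder stays). Other forme changes are not handled, and `parse_battle_log` is a hypothetical name.

```python
def species_of(details):
    """'Arceus-Ground, L100, M' -> 'Arceus-Ground'"""
    return details.split(",")[0].strip()


def parse_battle_log(log):
    """Return winner/loser names and teams from a Showdown battle log, or None."""
    players = {}
    teams = {"p1": [], "p2": []}
    seen = {"p1": set(), "p2": set()}
    winner = None
    for line in log.splitlines():
        parts = line.split("|")
        if len(parts) < 3:
            continue
        tag = parts[1]
        if tag == "player" and len(parts) > 3 and parts[3]:
            players[parts[2]] = parts[3]
        elif tag == "poke":
            teams[parts[2]].append(species_of(parts[3]))
        elif tag in ("switch", "drag") and len(parts) > 3:
            seen[parts[2][:2]].add(species_of(parts[3]))  # 'p1a: Nick' -> 'p1'
        elif tag == "win":
            winner = parts[2]
    # Replace 'Arceus-*'-style placeholders with the forme seen on switch-in.
    for side, team in teams.items():
        for i, mon in enumerate(team):
            if mon.endswith("-*"):
                base = mon[:-2]
                formes = sorted(s for s in seen[side] if s.split("-")[0] == base)
                if formes:
                    team[i] = formes[0]
    win_side = next((side for side, name in players.items() if name == winner), None)
    if win_side is None:
        return None  # tie, unfinished game, or unknown winner
    lose_side = "p2" if win_side == "p1" else "p1"
    return {
        "winner": players[win_side], "winner_team": teams[win_side],
        "loser": players.get(lose_side), "loser_team": teams[lose_side],
    }


SAMPLE_LOG = """\
|player|p1|Alice|1|
|player|p2|Bob|2|
|poke|p1|Arceus-*, L100|
|poke|p1|Zygarde, L100|
|poke|p2|Magikarp, L100, M|
|poke|p2|Zygarde, L100|
|start
|switch|p1a: Arceus|Arceus-Ground, L100|100/100
|switch|p2a: Magikarp|Magikarp, L100, M|100/100
|win|Alice
"""
print(parse_battle_log(SAMPLE_LOG))
```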
 

hyw

ok now that is a much better model that we can work with

approaching this by applying ideas from stochastic processes to a network of variables is much more efficient and meaningful than trying to map everything out by hand, piece by piece, which seems to be what ur original proposal was

we can start off macro and get nitty gritty abt the details once we have an overview of what this map will look like; it seems counterintuitive to work from the inside out

math enthusiasts unite! :D
 
It's here!
Code:
Magikarp's winrate against Arceus-Ground is  Magikarp has always lost to Arceus-Ground
Arceus-Ground's winrate versus Magikarp is  100%
Xerneas' winrate versus Arceus-Ground is  This combination have never happened in this dataset
Arceus-Ground's winrate versus Xerneas is  This combination have never happened in this dataset
Zygarde's winrate against Zygarde is  50%
The next step is to get a lot of replay files. Suggestions on where to get them are appreciated.

e: I also made a script that automates building a spreadsheet from this data, which makes it easier to visualize!
https://docs.google.com/spreadsheet...8o-esufHnX/pubhtml?gid=1056132193&single=true



How to read the graph: row versus column. Each cell is the left (row) pokemon's winrate against the pokemon at the top (column). The number after the percentage is the total number of games for that matchup.
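A table in this layout can be generated from win/loss records with Python's standard `csv` module. This minimal sketch writes CSV text (which Google Sheets can import) where each cell is the row pokemon's winrate against the column pokemon, followed by the number of games in parentheses; the function names and toy data are assumptions, and percentages are rounded to whole numbers.

```python
import csv
import io


def winrate_cell(battles, row, col):
    """'75% (4)' style cell: row mon's winrate vs col mon, then total games."""
    a = sum(1 for won, lost in battles if row in won and col in lost)
    b = sum(1 for won, lost in battles if col in won and row in lost)
    total = a + b
    return f"{round(100 * a / total)}% ({total})" if total else ""


def matchup_csv(battles, mons):
    """CSV text with one row and one column per mon (row vs column)."""
    out = io.StringIO()
    writer = csv.writer(out)
    writer.writerow([""] + mons)
    for row in mons:
        writer.writerow([row] + [winrate_cell(battles, row, col) for col in mons])
    return out.getvalue()


# Toy data: (winning team, losing team) pairs.
battles = [
    (["Arceus-Ground", "Zygarde"], ["Magikarp", "Zygarde"]),
    (["Arceus-Ground"], ["Magikarp"]),
]
mons = ["Arceus-Ground", "Magikarp", "Zygarde"]
print(matchup_csv(battles, mons))
```

Empty cells mean the pairing never happened in the data, which keeps them distinct from a real 0% winrate.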
 
