Turning Battle Logs into Usage Stats

As I'm sure many of you have seen, I recently took a crack at generating usage stats.

Unlike my predecessors, however, the only raw data I was able to access were the battle logs stored on the server. These logs, which are pretty much identical to the ones that get generated client-side, leave a lot to be desired--they only show the pokemon that actually appeared in the battle, they don't contain natures/items/EV spreads/movesets, and they don't record the players' current rankings.

Also, they're in HTML. Great for turning into warstories, pretty annoying to cull data from.

But, nonetheless, I managed to write a few python scripts which turn these battle logs into usage stats (what we're going to end up DOING with these stats is a question for another thread), and I'm posting them here. Feel free to suggest modifications or improvements--I'll need all the help I can get.

Course of Action

To turn battle logs into usage stats, here's what needs to be done:

  1. Identify the tier and whether the battle was rated.
  2. Make sure the battle meets whatever arbitrary criteria we decide upon ("longer than 5 turns," "player has rating above 1000," "loser said gg after the battle"...)
  3. Find all lines beginning with <div class="SendOut">
  4. Identify the name of the trainer and the species of the pokemon sent out (THANK GOD we play with Species clause). This is a bit tricky because the string is different depending on whether the pokemon was nicknamed or not.
  5. Remove redundant entries (to account for switching)
  6. Write the species of all pokemon used in the battle to a file (write the species name twice if both trainers used it, obviously).
  7. Make another script. This one will take that giant file and simply tally each pokemon's usage (doing this step separately, rather than keeping a running tally, prevents race conditions if you're parallelizing the workload).
  8. Sort the usage stats.
  9. PROFIT!!!
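
Steps 3-5 are the fiddly part, so here's a minimal sketch of them in isolation (written for Python 3, whereas the scripts below are Python 2). The two line shapes, "Trainer sent out Nick! (Species)" and "Trainer sent out Species!", are inferred from the slicing the script below does. The function names and the sample trainers are made up for illustration, so treat this as a sketch and not the actual parser.

```python
import re

# Assumed shape of a "sent out" line, inferred from the slicing in the script.
SEND_OUT = re.compile(r'<div class="SendOut">(.*?) sent out (.*)</div>')


def parse_send_out(line):
    """Return (trainer, species) for a SendOut line, or None for other lines."""
    m = SEND_OUT.match(line)
    if m is None:
        return None
    trainer, rest = m.group(1), m.group(2)
    if rest.endswith(")"):
        # Nicknamed: "Nick! (Species)" -> take what is inside the last parentheses
        species = rest[rest.rfind("(") + 1:-1]
    else:
        # Not nicknamed: "Species!" -> drop the trailing exclamation mark
        species = rest.rstrip("!")
    return trainer, species


def unique_send_outs(lines):
    """Collect each (trainer, species) pair once, so switching adds no repeats."""
    seen = []
    for line in lines:
        entry = parse_send_out(line)
        if entry is not None and entry not in seen:
            seen.append(entry)
    return seen
```

Dropping repeats only works because of Species clause: a (trainer, species) pair can only ever refer to one pokemon on a team.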

This script will take a battle log (server version 1.0.23) and write the names of all pokemon used in the battle to a file corresponding to the battle's tier.

python "name-of-log-file.html"
import string
import sys
filename = str(sys.argv[1])
file = open(filename)
log = file.readlines()

#a log this short can't contain a real battle
if (len(log) < 15):
	sys.exit()
#determine tier
if log[2][0:25] != '<div class="TierSection">':
	sys.exit()
tier = log[2][string.find(log[2],"</b>")+4:len(log[2])-7]
#determine whether the battle was rated (the line it's on varies)
if log[3][0:19] == '<div class="Rated">':
	rated = log[3][string.find(log[3],"</b>")+4:len(log[3])-7]
elif log[5][0:19] == '<div class="Rated">':
	rated = log[5][string.find(log[5],"</b>")+4:len(log[5])-7]
else:
	print "Can't find the rating"
	for line in range(0,15):
		print log[line]
	sys.exit()

#make sure the battle lasted at least six turns (to discard early forfeits)
longEnough = False
for line in log:
	if line == '<div class="BeginTurn"><b><span style=\'color:#0000ff\'>Start of turn 6</span></b></div>\n':
		longEnough = True
if longEnough == False:
	sys.exit()

#trainer = []
#species = []
ts = [] #[trainer, species] pairs, handled in one array to allow for sorting
#find all "sent out" messages
for line in range(6,len(log)):
	if log[line][0:21] == '<div class="SendOut">':
		ttemp = log[line][21:string.find(log[line],' sent out ')]

		#determine whether the pokemon is nicknamed or not
		if log[line][len(log[line])-8] == ')':
			stemp = log[line][string.rfind(log[line],'(')+1:len(log[line])-8]
		else:
			stemp = log[line][string.rfind(log[line],'sent out ')+9:len(log[line])-8]

		#determine whether this entry is already in the list
		match = 0
		for i in range(0,len(ts)):
			if (ts[i][0] == ttemp) and (ts[i][1] == stemp):
				match = 1
		if match == 0:
			ts.append([ttemp,stemp])

ts=sorted(ts, key=lambda ts:ts[0])

outname = "Raw/"+tier+" "+rated+".txt"
outfile = open(outname,'a')

#output layout (matches what the tally script reads): first trainer's name
#and team, then "***", then the second trainer's name and team, then "---"
outfile.write(ts[0][0]+"\n")
i = 0
while (i < len(ts)) and (ts[i][0] == ts[0][0]):
	outfile.write(ts[i][1]+"\n")
	i = i + 1
outfile.write("***\n")
outfile.write(ts[len(ts)-1][0]+"\n")
for j in range(i,len(ts)):
	outfile.write(ts[j][1]+"\n")
outfile.write("---\n")
outfile.close()

Here's a new version that does quite a bit more--this one not only identifies usage but also culls data for other "pokemetrics." It does this by keeping track of every matchup in a battle and the outcome of each one.

import string
import sys
filename = str(sys.argv[1])
file = open(filename)
log = file.readlines()

if (len(log) < 15):
	sys.exit()

oldWay = 1
#determine tier
if log[2][0:25] == '<div class="TierSection">':
	tier = log[2][string.find(log[2],"</b>")+4:len(log[2])-7]

	if log[3][0:19] == '<div class="Rated">':
		rated = log[3][string.find(log[3],"</b>")+4:len(log[3])-7]
		if log[5][0:19] == '<div class="Rated">':
			rated = log[5][string.find(log[5],"</b>")+4:len(log[5])-7]
			print "Can't find the rating for "+filename
			for line in range(0,15):
				print log[line]
	if log[5][0:25] != '<div class="TierSection">':
		print "Can't find the tier for "+filename
	tier = log[5][string.find(log[5],"</b>")+4:len(log[5])-7]
	if log[6][0:19] == '<div class="Rated">':
		rated = log[6][string.find(log[6],"</b>")+4:len(log[6])-7]
		if log[8][0:19] == '<div class="Rated">':
			rated = log[8][string.find(log[8],"</b>")+4:len(log[8])-7]
			print "Can't find the rating for "+filename
			for line in range(0,15):
				print log[line]

#make sure the battle lasted at least six turns (to discard early forfeits)
longEnough = False
for line in log:
	if line == '<div class="BeginTurn"><b><span style=\'color:#0000ff\'>Start of turn 6</span></b></div>\n':
		longEnough = True
if longEnough == False:
	sys.exit()

#get info on the trainers & pokes involved
ts = []
skip = 0
if oldWay == 0:
	for line in range(1,len(log)):
		if log[line][0:19] == '<div class="Teams">':
			for x in range(0,2):
				trainer = log[line+x][50:string.rfind(log[line+x],"'s team:")]
				if string.find(trainer,"send out") > -1:
					print trainer+" is a dick."
				stemp = ""
				for i in range(string.rfind(log[line+x],"</span></b>")+11,len(log[line+x])):
					if log[line+x][i:i+3] == ' / ':
						skip = 3
					if log[line+x][i] == '<':
					if skip > 0:
						stemp = stemp+log[line+x][i]

if (line == len(log)) or oldWay == 1: #it's an old log, so find pokes the old way
	#find all "sent out" messages
	for line in range(5,len(log)):
		if log[line][0:21] == '<div class="SendOut">':
			ttemp = log[line][21:string.find(log[line],' sent out ')]
			#determine whether the pokemon is nicknamed or not
			if log[line][len(log[line])-8] == ')':
				stemp = log[line][string.rfind(log[line],'(')+1:len(log[line])-8]
			else:
				stemp = log[line][string.rfind(log[line],'sent out ')+9:len(log[line])-8]

			#determine whether this entry is already in the list
			match = 0
			for i in range(0,len(ts)):
				if (ts[i][0] == ttemp) & (ts[i][1] == stemp):
					match = 1
			if match == 0:
				ts.append([ttemp,stemp])
	ts=sorted(ts, key=lambda ts:ts[0])
	#gotta fill in the gaps
	while (ts[i][0] == ts[0][0]):
	if i<6:
		for j in range(i,6):
	ts=sorted(ts, key=lambda ts:ts[0])
	if len(ts)<12:
		for j in range(i,12):

#find where battle starts
active = [-1,-1]
for line in range(1,len(log)):
	if log[line][0:21] == '<div class="SendOut">':
		for x in range(0,2):
			#ID trainer
			trainer = log[line+x][21:string.find(log[line+x],' sent out ')]
			if trainer == ts[0][0]:
				t = 0
			else:
				t = 1
			#it matters whether the poke is nicknamed or not
			if log[line+x][len(log[line+x])-8] == ')':
				species = log[line+x][string.rfind(log[line+x],'(')+1:len(log[line+x])-8]
			else:
				species = log[line+x][string.rfind(log[line+x],'sent out ')+9:len(log[line+x])-8]
			for i in range(0,6):
				if species == ts[6*t+i][1]:
					active[t] = i
start = line +2

#metrics get declared here
turnsOut = [] #turns out on the field (a measure of stall)
matchups = [] #poke1, poke2, what happened

for i in range(0,12):

#parse the damn log

roar = 0
uturn = 0
ko = 0
switch = 0
doubleSwitch = -1
uturnko = 0
ignore = 0

for line in range(start,len(log)):
	#identify what kind of message is on this line
	linetype = log[line][12:string.find(log[line],'">')]

	if linetype == "BeginTurn":
		#reset for start of turn
		roar = uturn = switch = ko = uturnko = 0
		doubleSwitch = -1

		#Mark each poke as having been out for an additional turn

	if linetype == "UseAttack": #check for Roar, etc.; U-Turn, etc.
		#identify move
		move = log[line][string.rfind(log[line],"'>")+2:len(log[line])-19]
		if move in ["Roar","Whirlwind","Circle Throw","Dragon Tail"]:
			roar = 1
		elif move in ["U-Turn","Volt Switch","Baton Pass"]:
			if line+3 < len(log):
				if log[line][12:string.find(log[line],'">')] == "SendBack":
					uturn = 1

	elif linetype == "ItemMessage": #check for Red Card, Eject Button
		#search for relevant items
		if string.rfind(log[line],"Red Card") > -1:
			roar = 1
		elif string.rfind(log[line],"Eject Button") > -1:
			uturn = 1

	elif linetype == "Ko": #KO
		ko = ko+1
		#make sure it's not the end of the battle
		o = p = 0
		if line+2 < len(log):
			o = 1
		if line+1 < len(log):
			p = 1
		if log[line+2*o][12:string.find(log[line+2*o],'">')] == "BattleEnd":
			pokes = [ts[active[0]][1],ts[active[1]+6][1]]
			matchup=pokes[0]+' vs. '+pokes[1]+': '
			if ko == 1:
				matchup = matchup + ts[active[t]+6*t][1] + " was KOed"
			elif ko == 2:
				matchup = matchup + "double down"
				matchup = matchup + "no clue what happened"
		elif log[line+p][12:string.find(log[line+p],'">')] == "SendBack":
	elif linetype == "SendBack": #switch out
		switch = 1
	elif linetype == "SendOut":
		#ID trainer
		trainer = log[line][21:string.find(log[line],' sent out ')]
		if trainer == ts[0][0]:
			t = 0
		else:
			t = 1
		#make sure it's not a double-switch
		o = 0
		if line+2 < len(log):
			o = 1
		if ignore == 1:
			ignore = 0
		elif (o == 1) and (log[line+2*o][12:string.find(log[line+2*o],'">')] == "SendBack"):
			doubleSwitch = active[t]+t*6
			#close out old matchup
			if doubleSwitch > -1:
				pokes = [ts[active[0]][1],ts[doubleSwitch][1]]
				pokes = [ts[active[0]][1],ts[active[1]+6][1]]
			pokes=sorted(pokes, key=lambda pokes:pokes)
			matchup=pokes[0]+' vs. '+pokes[1]+': '
			if doubleSwitch > -1:
				matchup = matchup + "double switch"
			elif (uturnko == 1):
				matchup = matchup + ts[active[(t+1)%2]+((t+1)%2)*6][1] + " was u-turn KOed"
				ignore = 1
			elif ko == 1:
				matchup = matchup + ts[active[t]+6*t][1] + " was KOed"
			elif ko == 2:
				matchup = matchup + "double down"
				ignore = 1
			elif roar == 1:
				matchup = matchup + ts[active[t]+6*t][1] + " was forced out"
			elif (uturn == 1) or (switch == 1):
				matchup = matchup + ts[active[t]+6*t][1] + " was switched out"
				matchup = matchup + "no clue what happened"

		#new matchup!
		uturn = roar = 0
		#it matters whether the poke is nicknamed or not
		if log[line][len(log[line])-8] == ')':
			species = log[line][string.rfind(log[line],'(')+1:len(log[line])-8]
		else:
			species = log[line][string.rfind(log[line],'sent out ')+9:len(log[line])-8]
		for i in range(0,6):
			if species == ts[6*t+i][1]:
				active[t] = i
outname = "Raw/"+tier+" "+rated+".txt"

while (ts[i][0] == ts[0][0]):
	outfile.write(ts[i][1]+" ("+str(turnsOut[i])+")\n")
	i = i + 1
for j in range(i,len(ts)):
	outfile.write(ts[j][1]+" ("+str(turnsOut[j])+")\n")
for line in matchups:

2011/09/14 -- Instead of just writing the species names, this version writes the trainers' names, too, and groups the pokemon used in the battle by team.

Once the LogReader has been run over the set of battle logs, you're left with a bunch of pokemon names and not much else. The script below tallies these lists and turns them into usage stats.

I'm planning to modify this script soon so that the end result comes out in a forum-friendly format, rather than the Excel-friendly csv it currently produces.

python "Raw/[Tier].txt"
where [Tier] is the tier you want to generate the stats for, e.g. "Raw/Standard OU Rated.txt"
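
Before the real script, here's a compact sketch of the tallying idea (Python 3, unlike the Python 2 script that follows). It assumes the Raw file layout the script below reads: a trainer name, that trainer's pokemon one per line, "***", the second trainer's name and pokemon, then "---" to end the battle. The function name and the sample data are mine, not part of the original scripts.

```python
from collections import Counter


def tally_usage(lines, full_teams_only=True):
    """Count how many teams each species appears on.

    Returns (usage Counter, number of teams counted, number of battles).
    """
    usage = Counter()
    teams = battles = 0
    team = []
    expect_trainer = True  # the first line of every team is the trainer's name
    for raw in lines:
        line = raw.rstrip("\n")
        if expect_trainer:
            team = []
            expect_trainer = False
        elif line in ("***", "---"):
            # end of a team; "---" also marks the end of the battle
            if len(team) == 6 or not full_teams_only:
                usage.update(team)
                teams += 1
            if line == "---":
                battles += 1
            expect_trainer = True
        else:
            team.append(line)
    return usage, teams, battles
```

A species' usage percentage is then 100.0 * usage[name] / teams, which is the same figure the table in the script below reports.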

import string
import sys

file = open("pokemons.txt")
pokelist = file.readlines()

lsnum = []
lsname = []
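for line in range(0,len(pokelist)):
	lsnum.append(pokelist[line][0:str.find(pokelist[line],' ')]) #ID number: everything before the first space
	lsname.append(pokelist[line][str.find(pokelist[line],' ')+1:len(pokelist[line])])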
filename = str(sys.argv[1])
file = open(filename)
species = file.readlines()
battleCount = 0
teamCount = 0
counter = [0 for i in range(len(lsnum))]
trainerNextLine = True #the file starts with a trainer's name
for entry in range(0,len(species)):
	found = False
	if trainerNextLine:
		trainer = species[entry]
		trainerNextLine = False
		ctemp = []
	else:
		if species[entry] == "***\n" or species[entry] == "---\n":
			trainerNextLine = True
			#decide whether to count the team or not
			#if you were going to compare the trainer name against a database,
			#you'd do it here.
			if len(ctemp) == 6: #only count teams with all six pokemon
				for i in ctemp:
					counter[i] = counter[i]+1.0 #rather than weighting equally, we
					#could use the trainer ratings db to weight these... 
				teamCount = teamCount+1
			if species[entry] == "---\n":
				battleCount = battleCount+1
		else:
			for i in range(0,len(lsnum)):
				if species[entry] == lsname[i]:
					ctemp.append(i)
					found = True
					break
			if not found:
				print species[entry]+" not found!"
total = sum(counter)

#for appearance-only form variations, we gotta manually correct (blegh)
counter[172] = counter[172] + counter[173] #spiky pichu
for i in range(507,534):
	counter[202] = counter[202]+counter[i] #unown
counter[352] = counter[352] + counter[553] + counter[554] + counter[555] #castform--if this is an issue, I will be EXTREMELY surprised
counter[413] = counter[413] + counter[551] + counter[552] #burmy
counter[422] = counter[422] + counter[556]  #cherrim
counter[423] = counter[423] + counter[557] #shellos
counter[424] = counter[424] + counter[558] #gastrodon
counter[615] = counter[615] + counter[616] #basculin
counter[621] = counter[621] + counter[622] #darmanitan
counter[652] = counter[652] + counter[653] + counter[654] + counter[655] #deerling
counter[656] = counter[656] + counter[657] + counter[658] + counter[659] #sawsbuck
counter[721] = counter[721] + counter[722] #meloetta
for i in range(507,534):
	counter[i] = 0
counter[173] = counter[553] = counter[554] = counter[555] = counter[551] = counter[552] = counter[556] = counter[557] = counter[558] = counter[616] = counter[622] = counter[653] = counter[654] = counter[655] = counter[657] = counter[658] = counter[659] = counter[722] = 0

#sort by usage
pokes = []
for i in range(0,len(lsname)):
	pokes.append([lsname[i][0:len(lsname[i])-1],counter[i]]) #[name without the newline, usage]
pokes=sorted(pokes, key=lambda pokes:-pokes[1])

print " Total battles: "+str(battleCount)
print " Total teams: "+str(teamCount)
print " Total pokemon: "+str(total)
print " + ---- + --------------- + ------ + ------- + "
print " | Rank | Pokemon         | Usage  | Percent | "
print " + ---- + --------------- + ------ + ------- + "
for i in range(0,len(pokes)):
	if pokes[i][1] == 0:
		break #sorted by usage, so everything after this is unused too
	print ' | %-4d | %-15s | %-6d | %6.3f%% |' % (i+1,pokes[i][0],pokes[i][1],100.0*pokes[i][1]/teamCount)

#csv output
#for i in range(len(lsnum)):
#	if (counter[i] > 0):
#		print lsnum[i]+","+lsname[i][0:len(lsname[i])-1]+","+str(counter[i])+","+str(round(100.0*counter[i]/battleCount/2,5))+"%"

2011/09/14 -- Updated for compatibility with the new LogReader output (and to make use of the new data). Also, this implementation only counts teams with six pokemon (but you can comment that part out quite easily).

import string
import sys

file = open("pokemons.txt")
pokelist = file.readlines()

lsnum = []
lsname = []
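for line in range(0,len(pokelist)):
	lsnum.append(pokelist[line][0:str.find(pokelist[line],' ')]) #ID number: everything before the first space
	lsname.append(pokelist[line][str.find(pokelist[line],' ')+1:len(pokelist[line])])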
filename = str(sys.argv[1])
file = open(filename)
species = file.readlines()
battleCount = 0
counter = [0 for i in range(len(lsnum))]
for entry in range(0,len(species)):
	if species[entry] == "---\n":
		battleCount = battleCount+1
	else:
		for i in range(0,len(lsnum)):
			if species[entry] == lsname[i]:
				counter[i] = counter[i]+1
				break
total = sum(counter)

#for appearance-only form variations, we gotta manually correct (blegh)
counter[172] = counter[172] + counter[173] #spiky pichu
for i in range(507,534):
	counter[202] = counter[202]+counter[i] #unown
counter[352] = counter[352] + counter[553] + counter[554] + counter[555] #castform--if this is an issue, I will be EXTREMELY surprised
counter[413] = counter[413] + counter[551] + counter[552] #burmy
counter[422] = counter[422] + counter[556]  #cherrim
counter[423] = counter[423] + counter[557] #shellos
counter[424] = counter[424] + counter[558] #gastrodon
counter[615] = counter[615] + counter[616] #basculin
counter[621] = counter[621] + counter[622] #darmanitan
counter[652] = counter[652] + counter[653] + counter[654] + counter[655] #deerling
counter[656] = counter[656] + counter[657] + counter[658] + counter[659] #sawsbuck
counter[721] = counter[721] + counter[722] #meloetta
for i in range(507,534):
	counter[i] = 0
counter[173] = counter[553] = counter[554] = counter[555] = counter[551] = counter[552] = counter[556] = counter[557] = counter[558] = counter[616] = counter[622] = counter[653] = counter[654] = counter[655] = counter[657] = counter[658] = counter[659] = counter[722] = 0

print "Total battles: "+str(battleCount)
print "Total pokemon: "+str(total)
for i in range(len(lsnum)):
	if (counter[i] > 0):
		print lsnum[i]+","+lsname[i][0:len(lsname[i])-1]+","+str(counter[i])+","+str(round(100.0*counter[i]/battleCount/2,5))+"%"

This version only counts teams where the trainer had a rating greater than or equal to 1337 at the time I pulled the player rankings. An example "rankings.txt" is included below.
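
The rating filter boils down to building a set of qualifying trainer names and checking each team's trainer against it. Here's a minimal Python 3 sketch of that idea. It assumes each line of the rankings file is tab-separated with the rank first, the trainer's name second and the rating last; the rank and rating positions match how the scripts here slice the line, but the name column is my assumption, as are the function name and the sample data.

```python
def elite_trainers(lines, threshold=1337):
    """Return the set of trainer names rated at or above the threshold.

    Assumed line layout (an assumption): rank<TAB>name<TAB>rating
    """
    elite = set()
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        if int(fields[-1]) >= threshold:
            elite.add(fields[1])
    return elite
```

A set makes the later "is this trainer elite?" check cheap, which matters when it runs once per team over a whole month of battles.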

python Raw/[Tier].txt
Sample "rankings.txt" (all this info is publicly available on the Smogon server, so I don't feel bad about posting it):
import string
import sys

file = open("pokemons.txt")
pokelist = file.readlines()

lsnum = []
lsname = []
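for line in range(0,len(pokelist)):
	lsnum.append(pokelist[line][0:str.find(pokelist[line],' ')]) #ID number: everything before the first space
	lsname.append(pokelist[line][str.find(pokelist[line],' ')+1:len(pokelist[line])])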

file = open("rankings.txt")
ratings = file.readlines()

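elite = []
for line in ratings:
	if int(line[str.rfind(line,'\t')+1:len(line)-1]) < 1337:
		continue #rating too low
	#ASSUMPTION: each line is rank<TAB>name<TAB>rating, so the name sits between the first and last tabs
	elite.append(line[str.find(line,'\t')+1:str.rfind(line,'\t')])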

filename = str(sys.argv[1])
file = open(filename)
species = file.readlines()
battleCount = 0
teamCount = 0
counter = [0 for i in range(len(lsnum))]
trainerNextLine = True #the file starts with a trainer's name
for entry in range(0,len(species)):
	found = False
	if trainerNextLine:
		trainer = species[entry][0:len(species[entry])-1]
		trainerNextLine = False
		ctemp = []
	else:
		if species[entry] == "***\n" or species[entry] == "---\n":
			trainerNextLine = True
			#decide whether to count the team or not
			#if you were going to compare the trainer name against a database,
			#you'd do it here.
			if trainer in elite:
			#if len(ctemp) == 6: #only count teams with all six pokemon
				for i in ctemp:
					counter[i] = counter[i]+1.0 #rather than weighting equally, we
				#could use the trainer ratings db to weight these... 
				teamCount = teamCount+1
			if species[entry] == "---\n":
				battleCount = battleCount+1
		else:
			for i in range(0,len(lsnum)):
				if species[entry] == lsname[i]:
					ctemp.append(i)
					found = True
					break
			if not found:
				print species[entry]+" not found!"
total = sum(counter)

#for appearance-only form variations, we gotta manually correct (blegh)
counter[172] = counter[172] + counter[173] #spiky pichu
for i in range(507,534):
	counter[202] = counter[202]+counter[i] #unown
counter[352] = counter[352] + counter[553] + counter[554] + counter[555] #castform--if this is an issue, I will be EXTREMELY surprised
counter[413] = counter[413] + counter[551] + counter[552] #burmy
counter[422] = counter[422] + counter[556]  #cherrim
counter[423] = counter[423] + counter[557] #shellos
counter[424] = counter[424] + counter[558] #gastrodon
counter[615] = counter[615] + counter[616] #basculin
counter[621] = counter[621] + counter[622] #darmanitan
counter[652] = counter[652] + counter[653] + counter[654] + counter[655] #deerling
counter[656] = counter[656] + counter[657] + counter[658] + counter[659] #sawsbuck
counter[721] = counter[721] + counter[722] #meloetta
for i in range(507,534):
	counter[i] = 0
counter[173] = counter[553] = counter[554] = counter[555] = counter[551] = counter[552] = counter[556] = counter[557] = counter[558] = counter[616] = counter[622] = counter[653] = counter[654] = counter[655] = counter[657] = counter[658] = counter[659] = counter[722] = 0

#sort by usage
pokes = []
for i in range(0,len(lsname)):
	pokes.append([lsname[i][0:len(lsname[i])-1],counter[i]]) #[name without the newline, usage]
pokes=sorted(pokes, key=lambda pokes:-pokes[1])

print " Total battles: "+str(battleCount)
print " Total teams: "+str(teamCount)
print " Total pokemon: "+str(int(total))
print " + ---- + --------------- + ------ + ------- + "
print " | Rank | Pokemon         | Usage  | Percent | "
print " + ---- + --------------- + ------ + ------- + "
for i in range(0,len(pokes)):
	if pokes[i][1] == 0:
		break #sorted by usage, so everything after this is unused too
	print ' | %-4d | %-15s | %-6d | %6.3f%% |' % (i+1,pokes[i][0],pokes[i][1],100.0*pokes[i][1]/total*6.0)

#csv output
#for i in range(len(lsnum)):
#	if (counter[i] > 0):
#		print lsnum[i]+","+lsname[i][0:len(lsname[i])-1]+","+str(counter[i])+","+str(round(100.0*counter[i]/battleCount/2,5))+"%"

This version is designed to work with the newer LogReader above (the one that records matchups) and generates a table containing not only usage stats, but two relevant "pokemetrics." It also creates an "encounter matrix" that keeps track of what happens when two pokemon go head-to-head, but I'm not sure how to process it yet (hence, I pickle it for later).

python "Raw/[Tier].txt" matrix.p
where matrix.p is the name of the file where you're going to dump your matrix.
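
To make the encounter matrix less opaque, here's a minimal Python 3 sketch of how one matchup line maps onto its nine slots. The slot numbers follow the script below: for the pair (i, j), slot 0 = i was KOed, 1 = j was KOed, 2 = double down, 3 = i switched out, 4 = j switched out, 5 = double switch, 6 = i forced out, 7 = j forced out, 8 = no clue. The function names and the dict-based matrix are my own simplification, not the author's data structure.

```python
from collections import defaultdict

SYMMETRIC = {"double down": 2, "double switch": 5, "no clue what happened": 8}
ONE_SIDED = {"KOed": 0, "u-turn KOed": 0, "switched out": 3, "forced out": 6}


def classify_matchup(line):
    """Parse 'A vs. B: <event>' into (A, B, slot for A's row, slot for B's row)."""
    head, event = line.rstrip("\n").split(": ", 1)
    poke1, poke2 = head.split(" vs. ", 1)
    if event in SYMMETRIC:
        return poke1, poke2, SYMMETRIC[event], SYMMETRIC[event]
    subject, _, what = event.partition(" was ")
    if what not in ONE_SIDED or subject not in (poke1, poke2):
        raise ValueError("unrecognized matchup line: " + line)
    p = 0 if subject == poke1 else 1  # which side the event happened to
    base = ONE_SIDED[what]
    return poke1, poke2, base + p, base + (1 - p)


def build_matrix(lines):
    """Accumulate counts keyed by (pokemon, opponent) pairs."""
    matrix = defaultdict(lambda: [0] * 9)
    for line in lines:
        a, b, e, f = classify_matchup(line)
        matrix[(a, b)][e] += 1
        matrix[(b, a)][f] += 1
    return matrix
```

Note that a defaultdict built with a lambda can't be pickled directly, so convert it with dict(matrix) first if you want to dump it to a file the way the script does.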

import string
import sys
import cPickle as pickle

file = open("pokemons.txt")
pokelist = file.readlines()

lsnum = []
lsname = []
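for line in range(0,len(pokelist)):
	lsnum.append(pokelist[line][0:str.find(pokelist[line],' ')]) #ID number: everything before the first space
	lsname.append(pokelist[line][str.find(pokelist[line],' ')+1:len(pokelist[line])])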
filename = str(sys.argv[1])
file = open(filename)
species = file.readlines()
battleCount = 0
teamCount = 0
counter = [0 for i in range(len(lsnum))]
realCounter = [0 for i in range(len(lsnum))]
turnCounter = [0 for i in range(len(lsnum))]
encounterMatrix = [[[0 for k in range(9)] for j in range(len(lsnum))] for i in range(len(lsnum))]
for entry in range(0,len(species)):
	found = False
	if trainerNextLine:
		trainer = species[entry]
		trainerNextLine = False
		ctemp = []
		turnt = []
	elif eventNextLine:
		if species[entry] == "---\n":
			eventNextLine = False
			trainerNextLine = True
			poke1 = species[entry][0:string.find(species[entry]," vs.")]
			poke2 = species[entry][string.find(species[entry]," vs.")+5:string.find(species[entry],":")]
			event = species[entry][string.find(species[entry],":")+2:len(species[entry])-1]
			#ID pokemon involved
			for i in range(0,len(lsnum)):
				if poke1+"\n" == lsname[i]:
			if i == len(lsnum):
				print poke1+" not found!"
			for j in range(0,len(lsnum)):
				if poke2+"\n" == lsname[j]:
			if j == len(lsnum):
				print poke2+" not found!"
			#ID event type
			e = f = -1
			if (event == "double down"):
				e = f = 2
			elif (event == "double switch"):
				e = f = 5
			elif (event == "no clue what happened"):
				e = f = 8
				poke = event[0:string.find(event," was")]
				event2 = event[len(poke)+5:len(event)]
				p = 1
				if poke1 == poke:
					p = 0
				elif poke2 != poke:
					print "Houston, we have a problem."
					print entry
				if (event2 == "KOed") or (event2 == "u-turn KOed"):
					e = p
					f = (p+1)%2
				elif (event2 == "switched out"):
					e = p+3
					f = (p+1)%2+3
				elif (event2 == "forced out"):
					e = p+6
					f = (p+1)%2+6
					print "Houston, we have a problem."
					print entry
				encounterMatrix[i][j][e] = encounterMatrix[i][j][e]+1
				encounterMatrix[j][i][f] = encounterMatrix[j][i][f]+1

	elif species[entry] == "***\n" or species[entry] == "@@@\n":
			if species[entry] == "***\n":
				trainerNextLine = True
				eventNextLine = True
			#decide whether to count the team or not
			#if you were going to compare the trainer name against a database,
			#you'd do it here.
			#if len(ctemp) == 6: #only count teams with all six pokemon
			for i in range(len(ctemp)):
				counter[ctemp[i]] = counter[ctemp[i]]+1.0 #rather than weighting equally, we
				turnCounter[ctemp[i]] = turnCounter[ctemp[i]]+turnt[i]
				if turnt[i] > 0:
					realCounter[ctemp[i]] = realCounter[ctemp[i]]+1.0
				#could use the trainer ratings db to weight these... 
			teamCount = teamCount+1
		stemp = species[entry][0:string.rfind(species[entry]," (")]+"\n"
		turns = eval(species[entry][string.rfind(species[entry]," (")+2:string.rfind(species[entry],")")])
		if stemp != "???\n":
			for i in range(0,len(lsnum)):
				if stemp == lsname[i]:
					found = True
			if not found:
				print stemp+" not found!"
total = sum(counter)
for i in range(len(lsnum)):
	if realCounter[i] > 0:
		turnCounter[i] = turnCounter[i]/realCounter[i]


pokes = []
for i in range(0,len(lsname)):
	for j in range(0,len(lsname)):
		pokes[i][3] = pokes[i][3] + encounterMatrix[i][j][1]+encounterMatrix[i][j][2]
	if pokes[i][2] > 0:
		pokes[i][3] = pokes[i][3]/pokes[i][2]

#for appearance-only form variations, we gotta manually correct (blegh)
for j in range(1,5):
	pokes[172][j] = pokes[172][j] + pokes[173][j] #spiky pichu
	for i in range(507,534):
		pokes[202][j] = pokes[202][j]+pokes[i][j] #unown
	pokes[352][j] = pokes[352][j] + pokes[553][j] + pokes[554][j] + pokes[555][j] #castform--if this is an issue, I will be EXTREMELY surprised
	pokes[413][j] = pokes[413][j] + pokes[551][j] + pokes[552][j] #burmy
	pokes[422][j] = pokes[422][j] + pokes[556][j]  #cherrim
	pokes[423][j] = pokes[423][j] + pokes[557][j] #shellos
	pokes[424][j] = pokes[424][j] + pokes[558][j] #gastrodon
	pokes[615][j] = pokes[615][j] + pokes[616][j] #basculin
	pokes[621][j] = pokes[621][j] + pokes[622][j] #darmanitan
	pokes[652][j] = pokes[652][j] + pokes[653][j] + pokes[654][j] + pokes[655][j] #deerling
	pokes[656][j] = pokes[656][j] + pokes[657][j] + pokes[658][j] + pokes[659][j] #sawsbuck
	pokes[721][j] = pokes[721][j] + pokes[722][j] #meloetta
	for i in range(507,534):
		pokes[i][j] = 0
	pokes[173][j] = pokes[553][j] = pokes[554][j] = pokes[555][j] = pokes[551][j] = pokes[552][j] = pokes[556][j] = pokes[557][j] = pokes[558][j] = pokes[616][j] = pokes[622][j] = pokes[653][j] = pokes[654][j] = pokes[655][j] = pokes[657][j] = pokes[658][j] = pokes[659][j] = pokes[722][j] = 0

#sort by usage
pokes=sorted(pokes, key=lambda pokes:-pokes[1])
print " Total battles: "+str(battleCount)
print " Total teams: "+str(teamCount)
print " Total pokemon: "+str(int(total))
print " + ---- + --------------- + ------ + ------ + ------ + ------- + "
print " | Rank | Pokemon         | Usage  | KOs/b  | Turns/b| Percent | "
print " + ---- + --------------- + ------ + ------ + ------ + ------- + "
for i in range(0,len(pokes)):
	if pokes[i][1] == 0:
	print ' | %-4d | %-15s | %-6d | %6.3f | %6.3f | %6.3f%% |' % (l,pokes[i][0],pokes[i][1],pokes[i][3],pokes[i][4],100.0*pokes[i][1]/total*6.0)
print " + ---- + --------------- + ------ + ------ + ------ + ------- + "
print "Sorted by KOs/battle"
pokes=sorted(pokes, key=lambda pokes:-pokes[3])
print " + ---- + --------------- + ------ + ------ + ------ + ------- + "
print " | Rank | Pokemon         | Usage  | KOs/b  | Turns/b| Percent | "
print " + ---- + --------------- + ------ + ------ + ------ + ------- + "
for i in range(0,len(pokes)):
	if pokes[i][1] == 0:
	if (pokes[i][1] > 100) or (100.0*pokes[i][1]/total*6.0 > 1.0): #otherwise you get all sorts of silliness
		print ' | %-4d | %-15s | %-6d | %6.3f | %6.3f | %6.3f%% |' % (l,pokes[i][0],pokes[i][1],pokes[i][3],pokes[i][4],100.0*pokes[i][1]/total*6.0)
print " + ---- + --------------- + ------ + ------ + ------ + ------- + "
print "Sorted by Turns in/battle"
pokes=sorted(pokes, key=lambda pokes:-pokes[4])
print " + ---- + --------------- + ------ + ------ + ------ + ------- + "
print " | Rank | Pokemon         | Usage  | KOs/b  | Turns/b| Percent | "
print " + ---- + --------------- + ------ + ------ + ------ + ------- + "
for i in range(0,len(pokes)):
	if pokes[i][1] == 0:
	if (pokes[i][1] > 100) or (100.0*pokes[i][1]/total*6.0 > 1.0): #otherwise you get all sorts of silliness
		print ' | %-4d | %-15s | %-6d | %6.3f | %6.3f | %6.3f%% |' % (l,pokes[i][0],pokes[i][1],pokes[i][3],pokes[i][4],100.0*pokes[i][1]/total*6.0)

print " + ---- + --------------- + ------ + ------ + ------ + ------- + "
#csv output
#for i in range(len(lsnum)):
#	if (counter[i] > 0):
#		print lsnum[i]+","+lsname[i][0:len(lsname[i])-1]+","+str(counter[i])+","+str(round(100.0*counter[i]/battleCount/2,5))+"%"

Putting it all together, I wrote a bash script to compile stats for the entire month on my Linux computer.

The computer has multiple processor cores, so I did some parallelizing to make use of them.
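
If you'd rather not juggle bash job control, the same "at most N jobs at once" idea can be sketched in Python 3 with a worker pool. This is an alternative to the bash script below, not what I actually ran; `run_bounded` and the `worker` callable are hypothetical names, and in practice the worker would launch the log reader on one file (for example through `subprocess.run`).

```python
from concurrent.futures import ThreadPoolExecutor


def run_bounded(paths, worker, max_jobs=6):
    """Apply worker to every path with at most max_jobs running at once."""
    with ThreadPoolExecutor(max_workers=max_jobs) as pool:
        # pool.map keeps the results in input order and re-raises worker errors
        return list(pool.map(worker, paths))
```

Threads are enough here as long as each worker spends its time waiting on a child process. Either way, every job appends to the same Raw files, which is why the tallying is left to a separate pass afterwards.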

File Structure:
  • This bash script sits in a folder with my two python scripts.
  • The month's battle logs are all in a folder called "2011-08".
  • In that folder are sub-folders for each day's logs (example: "2011-08-05").
  • Back in the main folder where the scripts sit, there are two empty folders, called "Raw" and "Stats". "Raw" will contain the lists of pokemon, while "Stats" will contain the stats.

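rm Raw/*
rm Stats/*

maxjobs=6 #set to number of processor cores

for i in 2011-08/*
do
	for j in "$i"/*
	do
		#wait until a job slot frees up
		jobcnt=(`jobs -p`)
		while [ ${#jobcnt[@]} -ge $maxjobs ]
		do
			sleep 1
			jobcnt=(`jobs -p`)
		done
		echo Processing $j
		python "$j" &
	done
done
wait #let the last jobs finish before touching the Raw files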
#serial version:
#	for j in "$i"/*
#	do
#		echo Processing $j
#		python "$j"
#	done


#stupid tier name changes--gotta consolidate...
cat "Raw/BW LC Rated.txt" >> "Raw/Standard LC Rated.txt"
cat "Raw/BW LC Unrated.txt" >> "Raw/Standard LC Unrated.txt"
cat "Raw/BW OU Rated.txt" >> "Raw/Standard OU Rated.txt"
cat "Raw/BW OU Unrated.txt" >> "Raw/Standard OU Unrated.txt"
cat "Raw/BW UU Rated.txt" >> "Raw/Standard UU Rated.txt"
cat "Raw/BW UU Unrated.txt" >> "Raw/Standard UU Unrated.txt"
cat "Raw/BW RU Rated.txt" >> "Raw/Standard RU Rated.txt"
cat "Raw/BW RU Unrated.txt" >> "Raw/Standard RU Unrated.txt"
cat "Raw/BW Uber Rated.txt" >> "Raw/Standard Ubers Rated.txt"
cat "Raw/BW Uber Unrated.txt" >> "Raw/Standard Ubers Unrated.txt"
rm Raw/BW*.txt

echo Compiling Stats...
for i in Raw/*; do python "$i" > "Stats/${i/Raw}" ; done
Miscellaneous Scripts

For a file in the "Raw" folder (list of pokemon used in battle), this script will generate a list of the number of pokemon used in each battle. This list can then be imported into an analysis program for easy binning and plotting.
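
As a quick illustration of the counting and binning, here's a minimal Python 3 sketch. It assumes the older Raw layout, where a file holds nothing but species names with a "---" line closing each battle (with the newer layout, trainer names and "***" lines would be counted too). The function names are made up for the example.

```python
from collections import Counter


def pokemon_per_battle(lines):
    """Return one count per battle: the number of lines before each '---'."""
    counts = []
    ppb = 0
    for line in lines:
        if line.strip() == "---":
            counts.append(ppb)
            ppb = 0
        else:
            ppb += 1
    return counts


def histogram(counts):
    """Bin the per-battle counts: {pokemon seen in battle: number of battles}."""
    return dict(sorted(Counter(counts).items()))
```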

python "Raw/[Tier].txt" > [output.dat]
Example plot:

import string
import sys

filename = str(sys.argv[1])
file = open(filename)
species = file.readlines()


ppb = 0 #pokemon per battle
for entry in range(0,len(species)):
	if species[entry] == "---\n":
		print ppb
		ppb = 0
	else:
		ppb = ppb+1

Reads in a standard Smogon usage table and turns it into a csv sorted by species. Useful for comparing pokemon usage from month to month, or for doing statistics on multiple months.

python file.txt
import string
import sys

file = open("pokemons.txt")
pokelist = file.readlines()

lsname = []
for line in range(0,len(pokelist)):
	lsname.append(pokelist[line][str.find(pokelist[line],' ')+1:len(pokelist[line])-1])

filename = str(sys.argv[1])
file = open(filename)
table = file.readlines()

counter = [0 for i in range(len(lsname))]
for i in range(5,len(table)):
	#the name sits in columns 10-24, padded with spaces; scan back to its last character
	j = 24
	found = False
	while found == False:
		if table[i][j] != ' ':
			found = True		
		else:
			j = j - 1

	name = table[i][10:j+1]
	found = False
	for j in range(0,len(lsname)):
		if name == lsname[j]:
			counter[j]=float(table[i][string.rfind(table[i],' ',0,40)+1:43])
			found = True
	if found == False:
		print name+" not found..."

for i in range(len(lsname)):
	print lsname[i][0:len(lsname[i])]+","+str(counter[i])

I modified the PO source code in a really simple way to get it to write player rankings to the console (which can, of course, be redirected to a file) whenever I view them on PO, getting around the "can't copy/paste" issue. Unfortunately, you still need to page through each and every page of the rankings to get all the stats, and the PO server flags rapid paging as "overactive" and has a tendency to kick you. When that happens, you have to log back in, RELOAD the page, and find where you left off, and you end up with a LOT of redundant entries. This script removes them.
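
The trick is that ranks should simply count up 1, 2, 3, ..., so any line whose rank is lower than the next one expected has been seen before. Here's a minimal Python 3 sketch of that idea, assuming each line starts with the rank followed by a tab (as the script below does); the function name and the sample rows are made up.

```python
def dedupe_rankings(lines):
    """Keep only the first occurrence of each rank, in order."""
    kept = []
    for line in lines:
        rank = int(line.split("\t", 1)[0])
        expected = len(kept) + 1
        if rank < expected:
            continue  # already have this rank: a redundant entry from re-paging
        if rank > expected:
            # a rank is missing, which means a page was skipped while collecting
            raise ValueError("missing rank %d before line: %r" % (expected, line))
        kept.append(line)
    return kept
```

Unlike the script below, this sketch stops at the first gap instead of printing a warning and carrying on, since everything after a gap would be off by one anyway.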

python filename.txt
import string
import sys

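filename = str(sys.argv[1])
file = open(filename)
ranking = file.readlines()

i = 0
while i < len(ranking):
	rank = int(ranking[i][0:str.find(ranking[i],'\t')])
	if rank < i+1:
		del ranking[i] #already have this rank, so it's a redundant entry
	else:
		if rank > i+1:
			print "You screwed up." #a rank is missing, i.e. a page got skipped
		i = i + 1

for line in ranking:
	print line[0:len(line)-1]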

This script combines data from the previous three months, weighted 20-3-1 (most recent month first). It needs the CSVs generated by the usage-table-to-csv script above as its inputs, although if I were less lazy, I could rewrite it to work with the standard usage tables.
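
In other words, each pokemon's combined figure is (1 x oldest month + 3 x middle month + 20 x latest month) / 24. A minimal Python 3 sketch of that arithmetic, using dicts keyed by species name instead of the parallel lists the script uses (the function name and the dict layout are my own choices):

```python
def weighted_usage(oldest, middle, newest, weights=(1.0, 3.0, 20.0)):
    """Blend three months of usage percentages, favoring the latest month.

    Each argument maps species name -> usage percentage for that month.
    A species missing from an older month is treated as 0% there.
    """
    w_old, w_mid, w_new = weights
    total = w_old + w_mid + w_new
    return {
        name: (w_old * oldest.get(name, 0.0)
               + w_mid * middle.get(name, 0.0)
               + w_new * newest[name]) / total
        for name in newest
    }
```

With these weights the latest month accounts for 20/24 of the result, so the older months only nudge the numbers.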

python ThreeMonthsAgo.csv TwoMonthsAgo.csv LastMonth.csv
import csv
import string
import sys

may = []
jun = []
aug = []
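#each CSV row is "species,percent"; all three files must list the species in the same order
maycsv = csv.reader(open(sys.argv[1], 'rb'), delimiter=',')
for line in maycsv:
	may.append([line[0],float(line[1])])
juncsv = csv.reader(open(sys.argv[2], 'rb'), delimiter=',')
for line in juncsv:
	jun.append([line[0],float(line[1])])
augcsv = csv.reader(open(sys.argv[3], 'rb'), delimiter=',')
for line in augcsv:
	aug.append([line[0],float(line[1])])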

counter = [0 for i in range(len(aug))]
for i in range(0,len(aug)):
	counter[i] = (1.0*may[i][1] + 3.0*jun[i][1] + 20.0*aug[i][1])/24.0
	#counter[i] = (3.0*jun[i][1] + 20.0*aug[i][1])/23.0

pokes = []
for i in range(0,len(aug)):
	pokes.append([aug[i][0],counter[i]])
pokes=sorted(pokes, key=lambda pokes:-pokes[1])

print " + ---- + --------------- + ------ + ------- + "
print " | Rank | Pokemon         | Usage  | Percent | "
print " + ---- + --------------- + ------ + ------- + "
for i in range(0,len(pokes)):
	if pokes[i][1] == 0:
		break #the list is sorted, so everything from here on has zero usage
	print ' | %-4d | %-15s | %-6d | %6.3f%% |' % (i+1,pokes[i][0],0,pokes[i][1])
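To make the 20-3-1 weighting concrete, here is a small Python 3 sketch. The function names are made up, and unlike the script above it matches species by name (using dicts) rather than by row position, so the three months don't need to be in the same order.

```python
def weighted_usage(oldest, middle, newest, weights=(1.0, 3.0, 20.0)):
    """Weighted average of three monthly percentages (1-3-20, oldest to newest)."""
    total = sum(weights)
    return (weights[0] * oldest + weights[1] * middle + weights[2] * newest) / total


def combine_months(oldest, middle, newest):
    """Combine three {species: percent} dicts, sorted from highest to lowest."""
    species = set(oldest) | set(middle) | set(newest)
    combined = {
        name: weighted_usage(
            oldest.get(name, 0.0), middle.get(name, 0.0), newest.get(name, 0.0)
        )
        for name in species
    }
    return sorted(combined.items(), key=lambda item: -item[1])
```

For example, a pokemon at 10% three months ago, 12% two months ago and 6% last month works out to (1×10 + 3×12 + 20×6)/24 = 166/24, or roughly 6.9%, so last month's number dominates.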
This code takes the usage tables and pulls out the list of pokemon that have a usage greater than 3.41%.

python filename.txt
import string
import sys

file = open("pokemons.txt")
pokelist = file.readlines()

lsname = []
for line in range(0,len(pokelist)):
	lsname.append(pokelist[line][str.find(pokelist[line],' ')+1:len(pokelist[line])-1])

filename = str(sys.argv[1])
file = open(filename)
table = file.readlines()

counter = [0 for i in range(len(lsname))]
for i in range(6,len(table)):
	#the name is left-justified in columns 10-24, so scan back from column 24 to its last character
	j = 24
	while table[i][j] == ' ':
		j = j-1

	name = table[i][10:j+1]
	found = False
	for j in range(0,len(lsname)):
		if name == lsname[j]:
			counter[j]=float(table[i][table[i].rfind(' ',0,40)+1:43])
			found = True
	if found == False:
		print name+" not found..."
outstring = ''
for i in range(0,len(lsname)):
	if counter[i] > 3.41:
		outstring = outstring+"\n"+lsname[i][0:len(lsname[i])]
print outstring
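Once the percentages are parsed, the cutoff itself is a one-liner. A Python 3 sketch, assuming the usage has already been read into a {species: percent} dict (`above_cutoff` is a made-up name; 3.41 is the cutoff from the description above):

```python
def above_cutoff(usage, cutoff=3.41):
    """Return the species whose usage percentage is strictly above the cutoff."""
    return [name for name, percent in usage.items() if percent > cutoff]
```

The comparison is strict, matching the script, so a pokemon sitting at exactly 3.41% doesn't make the list.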
If you take the lists of pokemon in each tier from and put them in one file, you don't *quite* have a tier list yet, since pokemon that moved up a tier will be shown twice, and pokemon that moved down a tier will disappear altogether. So this program takes the previous tier list and the current "tier list" (as generated through and some fancy concatenation--you'll need to put in Ubers and BL yourself) and generates a NEW tier list, perfect for posting on forums. Note that the old and new tier list files that the program takes as inputs are NOT in the same format. I've included two sample files for reference.

python currentTiers.txt oldTiers.txt
Example of currentTiers.txt




Example of oldTiers.txt



Source (since I've got UBB stuff in here, you're going to want to look at the source--hit the "quote" button):
import string
import sys

#read in files
file = open("pokemons.txt")
pokelist = file.readlines()
file = open(sys.argv[1])
curList = file.readlines() #current lists
file = open(sys.argv[2])
oldList = file.readlines() #previous cycle's tiers

#parse files into tier lists
curUber = []
curOU = []
curBL = []
curUU = []
curRU = []

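#tiers are separated by blank lines, in the order Uber, OU, BL, UU, RU (assumed: one species per line)
tn = 0
for line in curList:
	if line == '\n':
		tn = tn+1
	elif tn == 0:
		curUber.append(line)
	elif tn == 1:
		curOU.append(line)
	elif tn == 2:
		curBL.append(line)
	elif tn == 3:
		curUU.append(line)
	elif tn == 4:
		curRU.append(line)
	else:
		print "You screwed up, bub."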
oldUber = []
oldOU = []
oldBL = []
oldUU = []

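#same idea for the old list, in the order Uber, OU, BL, UU (assumed: one species per line)
tn = 0
for line in oldList:
	if line == '\n':
		tn = tn+1
	elif tn == 0:
		oldUber.append(line)
	elif tn == 1:
		oldOU.append(line)
	elif tn == 2:
		oldBL.append(line)
	elif tn == 3:
		oldUU.append(line)
	else:
		print "You screwed up, bub."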

tiers = []
for line in range(0,len(pokelist)):
	tn = 5
	name = pokelist[line][str.find(pokelist[line],' ')+1:len(pokelist[line])-1]

	#identify current tier
	for i in range(0,len(curUber)):
		if name == curUber[i][0:len(curUber[i])-1]:
			tn = 0
	if tn == 5:
		for i in range(0,len(curOU)):
			if name == curOU[i][0:len(curOU[i])-1]:
				tn = 1
	if tn == 5:
		for i in range(0,len(curBL)):
			if name == curBL[i][0:len(curBL[i])-1]:
				tn = 2
	if tn == 5:
		for i in range(0,len(curUU)):
			if name == curUU[i][0:len(curUU[i])-1]:
				tn = 3
	if tn == 5:
		for i in range(0,len(curRU)):
			if name == curRU[i][0:len(curRU[i])-1]:
				tn = 4

	#make sure the poke isn't "NU" because it fell down a tier
	if tn == 5:
		otn = 5
		for i in range(0,len(oldUber)):
			if name == oldUber[i][0:len(oldUber[i])-1]:
				otn = 0
		if otn == 5:
			for i in range(0,len(oldOU)):
				if name == oldOU[i][0:len(oldOU[i])-1]:
					otn = 1
		if otn == 5:
			for i in range(0,len(oldBL)):
				if name == oldBL[i][0:len(oldBL[i])-1]:
					otn = 2
		if otn == 5:
			for i in range(0,len(oldUU)):
				if name == oldUU[i][0:len(oldUU[i])-1]:
					otn = 3
		#no need to search RU
		if otn == 0:
			tn = 1 #not that I think anyone is coming off the Ubers list
		elif otn == 1:
			tn = 3 #OU to UU (we don't want 'em going straight to BL)
		elif otn == 2:
			tn = 3 #coming off the BL list. Put 'em back in UU
		elif otn == 3:
			tn = 4 #UU to RU
	#get name of tier
	if tn == 0:
		tier = 'Uber'
	elif tn == 1:
		tier = 'OU'
	elif tn == 2:
		tier = 'BL'
	elif tn == 3:
		tier = 'UU'
	elif tn == 4:
		tier = 'RU'
	elif tn == 5:
		tier = 'NU'
	tiers.append([tn,name])


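tierNames = ['Uber','OU','BL','UU','RU','NU']
tiers=sorted(tiers, key=lambda tiers:tiers[0])
print '[B]Uber[/B]\n[CODE]'
print tiers[0][1]
for i in range(1,len(tiers)):
	if tiers[i][0] == 5:
		break #NU is everything that's left, so it isn't listed
	if tiers[i][0] > tiers[i-1][0]:
		#close the previous tier's block and open the next one (tag layout assumed from the Uber header above)
		print '[/CODE]\n[B]'+tierNames[tiers[i][0]]+'[/B]\n[CODE]'
	print tiers[i][1]
print '[/CODE]'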
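The tier-movement rule buried in that script is easier to see on its own. Here is a Python 3 sketch using sets; `new_tier`, `TIERS` and `DROP_TO` are made-up names, and the drop rules are taken from the comments in the script (Uber to OU, OU to UU, BL to UU, UU to RU).

```python
TIERS = ["Uber", "OU", "BL", "UU", "RU", "NU"]

# Where a pokemon lands if it is missing from every current list,
# keyed by the tier it was in last cycle.
DROP_TO = {"Uber": "OU", "OU": "UU", "BL": "UU", "UU": "RU"}


def new_tier(name, current, previous):
    """Return the new tier for a species.

    current and previous map tier names to sets of species names.
    """
    # A pokemon that moved up is listed twice; the highest tier wins.
    for tier in TIERS[:-1]:
        if name in current.get(tier, ()):
            return tier
    # Not in any current list: drop one step down from last cycle's tier.
    for tier in TIERS:
        if name in previous.get(tier, ()):
            return DROP_TO.get(tier, "NU")
    return "NU"
```

With some made-up lists, `new_tier("Gengar", {"OU": {"Gengar"}, "UU": {"Gengar"}}, {"UU": {"Gengar"}})` gives `"OU"` (it moved up and is listed twice), while a pokemon that was OU last cycle and appears in no current list comes out as `"UU"`.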
Smogon isn't really friendly to developers, is it?
Do you have any idea what to do with these stats?
That's what we've been discussing here.
I made a script that converts PO binary usage stats...

Do you have any specific knowledge of whether the Smogon server is still generating this data? Because if it is, with your help, I'll be able to parse it, and all the problems described in the thread above will vanish.
I'm pretty sure it is. It's done by a server plugin, and since Smogon provides limited usage stats each month, I assume they collect it. You'll need to ask the new server administrator about that, though.
I used Beta's stats since they are always available.

As for the script, here's the package (never mind the Russian, just press the big black button).
It's my second python script (after Hello, world!), so it might be coded pretty poorly.
The idea is pretty simple: It converts PO's binary files directly into MyISAM files (this is the fastest way) and adds necessary files (db structure and Index file) from templates so it can be used by MySQL.
I'm pretty sure it is, it's done by server plugin and since Smogon provides limited usage stats each month
I remember having an issue like this, but wasn't it fixed?

Or so is my understanding...
Your client version (1.0.30) doesn't match with the server's (1.0.23).
Oh, nevermind...
I guess we have to wait until Smogon gets better with server handling.
While this is obviously pretty cool, I have one question: does this only take into account battles where 'Save Log' is on? Cause then the stats would be kinda off...
I do not believe so. It's already been shown that the server and the client software produce slightly different battle logs (client version 1.0.30 gives the full teams, while no version of the server does so currently), so I really doubt the server is querying whether the users have opted to save their battle logs.

But the only way to be 100% sure would be to dig around in the PO source code, and--I'll be honest--I'm not going to be doing that.

