Wolfram Alpha’s New Data about Pokémon

carlob · on Oct 17, 2013

I think it's pretty cool that you can actually compute stuff about the data you extract from these queries.

https://www.wolframalpha.com/input/?i=linear+regression+of+h...

_frog · on Oct 17, 2013

It would be extremely cool if they'd expose information such as what moveset each Pokémon has available to them so I could make queries like 'what water type Pokémon that can learn Hydro Pump has the highest Special Attack stat'.

mappu · on Oct 17, 2013

Maybe consider scraping bulbapedia into an SQL database? :)

    -- i'm dreaming
    SELECT pokemon.name
    FROM
      pokemon
      JOIN pokemon_learn_moves ON pokemon.id = pokemon_learn_moves.pokemon_id
      JOIN move ON pokemon_learn_moves.move_id = move.id
    WHERE
      move.name_en = 'Hydro Pump'
    ORDER BY pokemon.sp_atk DESC LIMIT 1;

Actually, from this perspective there's a lot of boilerplate; no wonder people like key-value stores... Then use python-nltk to make an English wrapper (and say goodbye to your free time for the next month!)
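For anyone who wants to actually run a join like the one above, here is a minimal sketch with Python's standard-library `sqlite3`. It assumes the hypothetical schema from the query (`pokemon`, `move`, `pokemon_learn_moves`) plus a hypothetical `type1` column so it can answer the full "Water type that learns Hydro Pump" question; the three rows are toy data for illustration only.

```python
import sqlite3

def build_toy_db():
    """Create an in-memory database with the hypothetical schema and three toy rows."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE pokemon (id INTEGER PRIMARY KEY, name TEXT,
                              type1 TEXT, sp_atk INTEGER);  -- type1 is an assumed column
        CREATE TABLE move (id INTEGER PRIMARY KEY, name_en TEXT);
        CREATE TABLE pokemon_learn_moves (pokemon_id INTEGER REFERENCES pokemon(id),
                                          move_id INTEGER REFERENCES move(id));
        INSERT INTO pokemon VALUES (1, 'Blastoise', 'Water', 85),
                                   (2, 'Vaporeon', 'Water', 110),
                                   (3, 'Pikachu', 'Electric', 50);
        INSERT INTO move VALUES (1, 'Hydro Pump'), (2, 'Thunderbolt');
        INSERT INTO pokemon_learn_moves VALUES (1, 1), (2, 1), (3, 2);
    """)
    return db

# Same select/join/project as above, plus the type filter; values are bound
# with ? placeholders instead of being pasted into the SQL string.
QUERY = """
    SELECT pokemon.name
    FROM pokemon
      JOIN pokemon_learn_moves ON pokemon.id = pokemon_learn_moves.pokemon_id
      JOIN move ON pokemon_learn_moves.move_id = move.id
    WHERE move.name_en = ? AND pokemon.type1 = ?
    ORDER BY pokemon.sp_atk DESC
    LIMIT 1
"""

def best_special_attacker(db, move_name, type_name):
    """Name of the highest-sp_atk Pokemon of type_name that learns move_name, or None."""
    row = db.execute(QUERY, (move_name, type_name)).fetchone()
    return row[0] if row else None

db = build_toy_db()
print(best_special_attacker(db, "Hydro Pump", "Water"))  # Vaporeon
```

Swapping the toy rows for real data (e.g. the veekun CSVs) is left as an exercise.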

gonnakillme · on Oct 17, 2013

The veekun pokedex[1] is on GitHub[2] and they have a crapload of CSV[3] files available.

[1]: http://veekun.com/dex [2]: https://github.com/veekun [3]: https://github.com/veekun/pokedex/tree/master/pokedex/data/c...

mappu · on Oct 17, 2013

That's excellent, thank you for pointing those out! Renders the whole discussion somewhat moot.

klodolph · on Oct 17, 2013

> no wonder people like key-value stores

You're doing a relational query: select, join, project. Translate it to a key-value store and you're going to end up with way more boilerplate. Using a key-value store for a relational query is like trying to use a screwdriver to drive a nail.

mappu · on Oct 17, 2013

It definitely suits a relational query.

But assuming you have indexes `pokemon_learn_moves_by_move_id` and `moves_by_name`, you could write

    pokemon_learn_moves_by_move_id[ moves_by_name["Hydro Pump"].id ]
        .map( |id| [pokemon[id].sp_atk, pokemon[id].name] )
        .greatest()[1];

(using a hypothetical system), which I think is a bit less boilerplate. Although now I've explicitly written an execution plan rather than letting the SQL engine decide (I had `sort()[0]` there instead of `greatest()` for a while, which I think sums up why the SQL approach is generally better).
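Here is one way the hypothetical key-value version might look with plain Python dicts standing in for the two indexes (the index names come from the comment above; the ids and rows are toy data). It also shows the `sort()[0]` pitfall: an ascending sort puts the *lowest* Sp. Atk first, while `max()` finds the greatest in a single pass.

```python
# Toy "table" and hand-maintained indexes (hypothetical data).
pokemon = {
    1: {"name": "Blastoise", "sp_atk": 85},
    2: {"name": "Vaporeon", "sp_atk": 110},
}
moves_by_name = {"Hydro Pump": {"id": 1}}
pokemon_learn_moves_by_move_id = {1: [1, 2]}

def greatest_sp_atk(move_name):
    """Name of the learner of move_name with the highest sp_atk, or None."""
    move = moves_by_name.get(move_name)
    if move is None:
        return None
    learner_ids = pokemon_learn_moves_by_move_id.get(move["id"], [])
    # The explicit "execution plan": one linear scan, no sort needed.
    best_id = max(learner_ids, key=lambda pid: pokemon[pid]["sp_atk"], default=None)
    return None if best_id is None else pokemon[best_id]["name"]

def sorted_first(move_name):
    """The sort()[0] variant: ascending order, so this returns the LOWEST sp_atk."""
    learner_ids = pokemon_learn_moves_by_move_id[moves_by_name[move_name]["id"]]
    ordered = sorted(learner_ids, key=lambda pid: pokemon[pid]["sp_atk"])
    return pokemon[ordered[0]]["name"]

print(greatest_sp_atk("Hydro Pump"))  # Vaporeon
print(sorted_first("Hydro Pump"))     # Blastoise (the bug)
```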

_frog · on Oct 17, 2013

I've actually thought about doing this before; I just don't know how they'd feel about someone scraping their content.

mappu · on Oct 17, 2013

It's not like Bulbapedia owns the fundamental content. Do they place a public license on the wiki pages?

Ideally Bulbapedia would provide MediaWiki dumps for this, but they don't, and they've gone on record saying they don't intend to. They did leave the MediaWiki API open, though, if you want to crawl a clean rip of each page's wikitext. The default MediaWiki API guidelines are also intact: single-threaded crawls should be acceptable in almost all instances, but you should warn the site owner before starting a multi-threaded scrape.
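A minimal single-threaded sketch of that approach using only Python's standard library. The `action=query&prop=revisions&rvprop=content` parameters are standard MediaWiki API; the endpoint path, the `rvslots=main` / `formatversion=2` response shape (which needs a reasonably recent MediaWiki), the User-Agent string and the one-second delay are all assumptions to check against the site before running it.

```python
import json
import time
import urllib.parse
import urllib.request

# Assumptions: endpoint path, User-Agent and delay are placeholders to verify first.
API_URL = "https://bulbapedia.bulbagarden.net/w/api.php"
USER_AGENT = "pokedex-sketch/0.1 (contact: you@example.com)"
DELAY_SECONDS = 1.0

def wikitext_params(title):
    """Query-string parameters asking for the current wikitext of one page."""
    return {
        "action": "query",
        "prop": "revisions",
        "rvprop": "content",
        "rvslots": "main",
        "titles": title,
        "format": "json",
        "formatversion": "2",
    }

def extract_wikitext(payload):
    """Wikitext from a formatversion=2 response, or None if the page is missing."""
    pages = payload.get("query", {}).get("pages", [])
    if not pages or "revisions" not in pages[0]:
        return None
    return pages[0]["revisions"][0]["slots"]["main"]["content"]

def crawl(titles):
    """Fetch pages strictly one at a time (single-threaded), pausing between requests."""
    results = {}
    for title in titles:
        url = API_URL + "?" + urllib.parse.urlencode(wikitext_params(title))
        request = urllib.request.Request(url, headers={"User-Agent": USER_AGENT})
        with urllib.request.urlopen(request) as response:
            results[title] = extract_wikitext(json.load(response))
        time.sleep(DELAY_SECONDS)
    return results
```

Nothing is fetched until you call `crawl([...])` yourself.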

_frog · on Oct 17, 2013

It looks like all of Bulbapedia's content is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike license. http://bulbapedia.bulbagarden.net/wiki/Bulbapedia:Copyrights

kozlovsky · on Oct 17, 2013

Actually, it is not always necessary to write a lot of boilerplate code when using a SQL database. For example, with Pony ORM the same SQL query could be written as follows:

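    from pony.orm import select, desc

    # Pokemon: an entity mapped to the pokemon table above
    best = select(p for p in Pokemon
                  if "Hydro Pump" in p.learn_moves.move.name_en
                  ).order_by(desc(Pokemon.sp_atk)).first()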

arcatek · on Oct 17, 2013

Somewhat related: a few months ago I made a Prolog library for this kind of query. I haven't added the moves (yet), but learning Prolog with this kind of project is pretty fun!

https://github.com/arcanis/trivia.pokemon-prolog

officemonkey · on Oct 16, 2013

If only "pikachu versus bulbasaur" suggested the best strategy for each. :-D

_frog · on Oct 17, 2013

I'm going to say Bulbasaur has the edge here. Pikachu has a pretty decent speed stat, but it doesn't have access to many worthwhile moves. Let's talk about the build you'd use to pretty decisively beat a Pikachu.

Pikachu's common passive ability 'Static' means that it can potentially inflict paralysis when hit with contact moves, which includes most physical moves. On top of that, Bulbasaur has a fairly high Special Attack stat, so it'd be a good idea for it to use Special moves rather than Physical ones. Pokémon also get a Same Type Attack Bonus (STAB) for using moves that match their type, so we want a special Grass-type move. 'Leaf Storm' is probably the best move we could use in this case. To get a Bulbasaur with that move, you have to do some breeding, but it's totally possible.

Now we'd also want to take advantage of Bulbasaur's own common ability, 'Overgrow', which boosts the power of Grass-type moves by 50% once its HP is at or below 1/3. Combined with our STAB, that means Grass-type moves now hit at 1.5 × 1.5 = 2.25 times normal power, i.e. 225%. Of course, once our HP is that low we're going to want to avoid taking much more damage. Using Bulbasaur's 'Sleep Powder' at this point can put Pikachu to sleep (if it lands; it's only 75% accurate), keeping it from attacking for a few turns.
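To make the stacking explicit: the boosts multiply rather than add, so STAB (1.5×) together with an active Overgrow (1.5×) gives 2.25×. A small Python sketch, assuming the "at or below one third of max HP" Overgrow threshold and ignoring every other damage modifier:

```python
STAB = 1.5      # Same Type Attack Bonus
OVERGROW = 1.5  # Overgrow boost to Grass moves when HP is low

def overgrow_active(hp, max_hp):
    """True when HP is at or below one third of max HP (integer math, no rounding)."""
    return 3 * hp <= max_hp

def grass_move_multiplier(hp, max_hp):
    """Combined STAB and Overgrow multiplier for a Grass-type using a Grass move."""
    multiplier = STAB
    if overgrow_active(hp, max_hp):
        multiplier *= OVERGROW
    return multiplier

print(grass_move_multiplier(45, 45))  # 1.5 (full HP: STAB only)
print(grass_move_multiplier(15, 45))  # 2.25 (exactly 1/3 HP: STAB and Overgrow)
```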

So we've got our basic strategy: get down to 1/3 of our HP, put Pikachu to sleep, and then wail on it with Leaf Storm. While we're letting our HP drop, we could use something like 'Swords Dance' to boost our attack stat as well.

omnipath · on Oct 17, 2013

Bulbasaur may have an edge type-wise, but I'd put money on Pikachu actually winning. A Pikachu with Light Screen, Thunder Wave/Knock Off, Protect/Wish, and Hidden Power (Ice), holding a Light Ball, would completely take down a Bulbasaur. http://bulbapedia.bulbagarden.net/wiki/Pikachu_(Pok%C3%A9mon...

Bulbasaur would need Toxic/Sleep Powder, Growth, Energy Ball/Petal Dance/Giga Drain, and, if you must, Hidden Power (Ground), holding Leftovers. I would NOT suggest using Leaf Storm unless you know you can knock out the opposing pokemon within two hits, even with the two-stage drop in your Special Attack. http://bulbapedia.bulbagarden.net/wiki/Bulbasaur_(Pok%C3%A9m...
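The two-stage drop compounds quickly. Assuming the usual stat-stage formula (stage +n multiplies the stat by (2+n)/2, stage -n by 2/(2+n)), each Leaf Storm knocks Special Attack down two more stages:

```python
from fractions import Fraction

def stage_multiplier(stage):
    """Multiplier for a battle stat stage between -6 and +6."""
    if not -6 <= stage <= 6:
        raise ValueError("stat stages run from -6 to +6")
    if stage >= 0:
        return Fraction(2 + stage, 2)
    return Fraction(2, 2 - stage)

# Special Attack multiplier before the 1st, 2nd and 3rd Leaf Storm (-2 stages each).
print([str(stage_multiplier(-2 * uses)) for uses in range(3)])  # ['1', '1/2', '1/3']
```

So the second Leaf Storm is fired with half the Special Attack of the first, which is why it only pays off if two hits are enough.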

The problem with your strategy is that it doesn't take critical hits into account. Bulbasaur could take a critical hit and drop all the way to around 30% health. The problem with that is that Bulbasaur is MUCH slower than Pikachu, so by the time that round is over, you don't have enough health to survive another neutral hit. The only way I could see Bulbasaur winning is with crits, or by stalling Pikachu out: by swapping Growth for Protect and Hidden Power (Ground) for Leech Seed, and going with Giga Drain, you'd have a really good chance of stalling out Pikachu (as long as Pikachu can't 3HKO Bulbasaur), and then you just hope it doesn't have Hidden Power (Ice).

(If you're not using physical attacks, why waste a slot on Swords Dance, a move that boosts the physical Attack stat??)

_frog · on Oct 17, 2013

Yep, I've been totally outclassed. I've only recently started getting into the online side of Pokémon, so I still have a ton to learn about strategy. Though does the Speed stat really matter all that much beyond first-mover advantage?

omnipath · on Oct 17, 2013

Yes. Outside of battles designed around stalling, or ones with a lot of misses, pokemon as a rule faint within two to three hits. Unless Pikachu's stats are a really bad fit for a special attacker and Bulbasaur's are built entirely for special walling, chances are that even attacking with Thunderbolt, Pikachu could take down Bulbasaur within 4 rounds. With Hidden Power (Ice) and a held Light Ball, probably 2 rounds.

Now, let's say we put all of Bulbasaur's extra EV training into Special Attack. It would still take about two hits to knock out Pikachu. The problem is that since Pikachu is faster, it lands its second hit before Bulbasaur can land its own. This is why both Pikachu and Bulbasaur are considered LC (Little Cup) or NU (Never Used): they have a lot of trouble dealing with many of the pokemon that show up in battles.
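A toy model of the "faster side gets its hits in first" argument, assuming both sides simply attack every turn, never miss or crit, and need a fixed number of hits to KO (the speeds and hit counts below are illustrative inputs):

```python
def duel_winner(a, b):
    """Each side is (name, speed, hits_needed_to_ko_the_other).

    Returns the winner's name, or None on a speed tie (decided randomly in-game).
    """
    if a[1] == b[1]:
        return None
    fast, slow = (a, b) if a[1] > b[1] else (b, a)
    # Each turn the faster side attacks first, so it lands its k-th hit before
    # the slower side lands its k-th: it wins whenever it needs no more hits.
    return fast[0] if fast[2] <= slow[2] else slow[0]

print(duel_winner(("Pikachu", 90, 2), ("Bulbasaur", 45, 2)))  # Pikachu
print(duel_winner(("Pikachu", 90, 3), ("Bulbasaur", 45, 2)))  # Bulbasaur
```

With equal hit counts, speed alone decides the fight, which is more than just a first-mover tiebreak.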

May I suggest you try a website called Pokemon Showdown? It's an online simulator which allows you to try out different strategies, and test them against other people. http://play.pokemonshowdown.com/

joshschreuder · on Oct 17, 2013

Great analysis, looks like someone is a Pokemon master.

recuter · on Oct 17, 2013

Umm, that's a very unbalanced matchup, bulbasaur wins every time. Water > Electricity. If I recall, even a Raichu would lose to bulbasaur.

Probably shouldn't be posting this. :P

mtinkerhess · on Oct 17, 2013

But Bulbasaur is grass / poison type. Electric is not very effective against grass, so Bulbasaur will have an advantage unless Pikachu avoids using its electric attacks.

aspensmonster · on Oct 17, 2013

IIRC, neither one has an advantage or disadvantage against the other. It'd actually be an even match!

Edit: According to http://pokemondb.net/type , when Grass attacks Electric, the attack is "normal" and 100% of the damage goes through. However, when Electric attacks Grass, the attack is "not very effective" and only 50% of the damage goes through.
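Dual types multiply: Bulbasaur is Grass/Poison, so an Electric move is 0.5× against Grass and 1× against Poison, 0.5× overall. A Python sketch with only the handful of matchups that come up in this thread hard-coded; everything else defaults to neutral, which is only right for pairs that really are neutral.

```python
# A tiny slice of the type chart: (attacking type, defending type) -> multiplier.
PARTIAL_CHART = {
    ("Electric", "Water"): 2.0,
    ("Electric", "Grass"): 0.5,
    ("Electric", "Ground"): 0.0,
    ("Ground", "Electric"): 2.0,
    ("Ice", "Grass"): 2.0,
}

def effectiveness(attack_type, defender_types):
    """Multiply the per-type multipliers; unlisted pairs are treated as neutral."""
    multiplier = 1.0
    for defending in defender_types:
        multiplier *= PARTIAL_CHART.get((attack_type, defending), 1.0)
    return multiplier

print(effectiveness("Electric", ["Grass", "Poison"]))  # 0.5
print(effectiveness("Grass", ["Electric"]))            # 1.0
```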

So Bulbasaur is totally going to win this.

joyeuse6701 · on Oct 17, 2013

May have meant Squirtle

joshschreuder · on Oct 17, 2013

Bulbasaur is a grass type Pokemon!

gosukiwi · on Oct 17, 2013

Electric > Water

Ground > Electric

Bulbasaur is Grass, though :p Electric is not very effective against Grass. /endOfPokemonNerdComment

solox3 · on Oct 17, 2013

A level 100 Raichu can kill a level 5 Bulbasaur using one Body Slam.

I wonder what Wolfram has to say about this...

NicoJuicy · on Oct 16, 2013

Always liked WolframAlpha, this is also awesome.

But what I really like is comparing stuff like protein in bananas vs. spinach to decide what I'm going to (prefer to) eat soon ^^

So, thanks!

scott_karana · on Oct 17, 2013

Wow, I didn't realize it had food. Good call! :)

NicoJuicy · on Oct 17, 2013

Yeah, no problem, just shared some awesomesauce

aestra · on Oct 17, 2013

How come when I search "food by protein" I only get results from Wendy's?

joshfraser · on Oct 17, 2013

The technology behind WolframAlpha is truly incredible. It's likely one of the most undervalued resources of our day.

jnazario · on Oct 17, 2013

yes and no. it's a tremendous amount of work to organize information and build associations between what you ask and what you get, although it certainly has a lot of gaps. it's also a neat way to look at how to combine information.

that said it's within grasp of many of us: natural language interfaces (NLI) and SPARQL databases and endpoints. have a look at this semanticweb q&a:

http://answers.semanticweb.com/questions/12747/natural-langu...

some good links in there. basically: find your SPARQL endpoints, keep a list of synonyms that maps terms from your inputs (which you parse with NLP tools like weka or the stanford parser, or even python's nltk) to terms in the ontology (or ontologies) behind your endpoints, and use it to map your query onto them. then try the candidate answers in turn.
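a rough sketch of just the synonym-mapping step, with everything made up for illustration: the `ex:` namespace, the predicate names and the synonym table are hypothetical, and the parsing is reduced to passing in an already-extracted attribute and entity. it produces candidate SPARQL queries to try against an endpoint in order, stopping at the first one that returns results.

```python
# Hypothetical synonym table: user phrasing -> candidate predicates, best guess first.
SYNONYMS = {
    "special attack": ["ex:specialAttack", "ex:spAtk"],
    "sp atk": ["ex:specialAttack"],
    "speed": ["ex:speed"],
}

PREFIXES = (
    "PREFIX ex: <http://example.org/pokedex#>\n"
    "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n"
)

def candidate_queries(attribute, entity):
    """SPARQL queries to try, in order, for 'attribute of entity'."""
    # Escape backslashes and quotes so the label is a valid SPARQL string literal.
    label = entity.replace("\\", "\\\\").replace('"', '\\"')
    queries = []
    for predicate in SYNONYMS.get(attribute.lower().strip(), []):
        queries.append(
            PREFIXES
            + "SELECT ?value WHERE {\n"
            + f'  ?s rdfs:label "{label}" ;\n'
            + f"     {predicate} ?value .\n"
            + "}"
        )
    return queries

for q in candidate_queries("Special Attack", "Bulbasaur"):
    print(q, end="\n\n")
```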

a good, simple interface to play around with that is quepy:

http://quepy.machinalis.com/

a few others exist.

hope that helps. despite challenges in the adoption rates of the semantic web, i think it's the future of information retrieval because it makes sense for us as users and truly organizes information.

taliesinb · on Oct 17, 2013

Some of our stuff is simple database lookup (à la Google's Knowledge Graph); other stuff is more algorithmic and computational in nature.

The problem we've had with SPARQL and co is that we feel it isn't optimized for computational queries. Ontologies don't matter as much in that case, and inference in tuple stores costs you significantly in performance, although the technology is improving.

As often as not, however, the computationally irreducible work lies in making a domain suitable for computational consumption, not in the technology used for representation.

To analogize, UTF-8 is great, but without the notion of Unicode code points it wouldn't exist.

luikore · on Oct 17, 2013

Do you have any reference for WolframAlpha using SPARQL? I don't think they're using anything similar.

Helianthus · on Oct 17, 2013

... It's a parlor trick.

The computational complexity going on is minimal compared to the effort it takes to create and maintain the dataset. Accessing it with a formal query language is then straightforward.

The parlor trick is making the formal query language seem as informal and colloquial as possible. But fiddle with breaking it and you'll see you're just running searches on very specific and very specifically tagged data.

Edit: Having thought about it more, the reason it's 'underutilized' is that it's actually not that useful. Your knowledge of its dataset is more important than its ability to provide it for you: witness the person below who knew that it could provide nutrition information.

Your index on its information is better than its index, in other words.

ecto · on Oct 16, 2013

How would one generate this equation from an image? http://www.wolframalpha.com/input/?i=snorlax+plane+curve

zalzane · on Oct 16, 2013

I've never done it, but here's how I would do it off the top of my head.

-Vectorize the image into a set of bezier curves that have endpoints at intersections

-Walk the bezier curves so that the entire image can be represented by a single "strip" of piecewise bezier sections. If there are any overlapping sections from having to re-walk a curve, maybe apply some offsets/small modifications to those curves to give it a "sketched" look.

-Rasterize the bezier curves in order at your desired resolution into a list of XY coordinates. Make sure the XY coordinates stay in the order they were rasterized.

-Split the coordinates of each rasterized pixel so that you have two lists, (t, x) and (t, y), where t is the index of the rasterized pixel. Now you can represent the X and Y coordinates of your sketch on two coordinate planes as a function of t.

-For each coordinate plane X and Y, record the indices where the second derivative of the rasterized graph changes. This lets you distinguish subcurves on the line. For this to work, it's probably important that the rasterization was done at a relatively high resolution.

-For each subcurve, generate an extremely low-frequency trigonometric function that can represent that subcurve. Ideally, outside of that subcurve, the trig function should be at or very near zero. This might require layering some additional trig functions to eliminate any noise.

-With the trig functions generated, return the results as a parametric function.
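A simplified Python sketch of the middle steps, under a loud assumption: it treats the image as one closed strip and fits a single truncated Fourier series to each of x(t) and y(t) over the whole strip, instead of the per-subcurve functions described above. It samples cubic Bézier segments into ordered points, splits them into (t, x) and (t, y), and computes the Fourier coefficients directly (no numpy).

```python
import math

def sample_cubic(p0, p1, p2, p3, n):
    """n ordered points along a cubic Bezier, excluding the end point (it starts the next segment)."""
    points = []
    for i in range(n):
        s = i / n
        u = 1 - s
        x = u**3 * p0[0] + 3 * u**2 * s * p1[0] + 3 * u * s**2 * p2[0] + s**3 * p3[0]
        y = u**3 * p0[1] + 3 * u**2 * s * p1[1] + 3 * u * s**2 * p2[1] + s**3 * p3[1]
        points.append((x, y))
    return points

def fourier_fit(values, harmonics):
    """(a0, [(a_k, b_k), ...]) of a truncated Fourier series for a periodic sequence."""
    n = len(values)
    a0 = sum(values) / n
    coeffs = []
    for k in range(1, harmonics + 1):
        a = 2 / n * sum(v * math.cos(2 * math.pi * k * t / n) for t, v in enumerate(values))
        b = 2 / n * sum(v * math.sin(2 * math.pi * k * t / n) for t, v in enumerate(values))
        coeffs.append((a, b))
    return a0, coeffs

def evaluate(a0, coeffs, t, n):
    """Evaluate the fitted series at (possibly fractional) index t."""
    return a0 + sum(
        a * math.cos(2 * math.pi * k * t / n) + b * math.sin(2 * math.pi * k * t / n)
        for k, (a, b) in enumerate(coeffs, start=1)
    )

# A closed toy "strip": out along one Bezier arc and back along another.
strip = (sample_cubic((0, 0), (1, 2), (3, 2), (4, 0), 50)
         + sample_cubic((4, 0), (3, -1), (1, -1), (0, 0), 50))
xs = [x for x, _ in strip]  # the (t, x) list; t is the list index
ys = [y for _, y in strip]  # the (t, y) list
n = len(strip)
fit_x, fit_y = fourier_fit(xs, 8), fourier_fit(ys, 8)

def x_of_t(t):
    return evaluate(*fit_x, t, n)

def y_of_t(t):
    return evaluate(*fit_y, t, n)
```

`x_of_t` and `y_of_t` together are the parametric curve; more harmonics trade a longer formula for a closer fit.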

pkaye · on Oct 16, 2013

The equation is provided on the same page. It is using the Heaviside step function which allows you to piecewise define the curves and add them all together.
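A small Python sketch of that idea with two made-up pieces (a half circle and a straight line closing it). Each piece is multiplied by a window built from two step functions, θ(t − start)·θ(end − t), so it is zero outside its own t-interval, and the windowed pieces are simply summed. The step function's value at exactly 0 is a convention choice here.

```python
import math

def theta(x):
    """Unit step: 1 for x > 0, else 0 (the value at exactly 0 is a convention choice)."""
    return 1.0 if x > 0 else 0.0

def window(t, start, end):
    """1 for start < t < end, 0 elsewhere: a product of two step functions."""
    return theta(t - start) * theta(end - t)

# Hypothetical pieces: (start, end, f) where f(t) -> (x, y).
PIECES = [
    (0.0, math.pi, lambda t: (math.cos(t), math.sin(t))),                      # upper half circle
    (math.pi, 2 * math.pi, lambda t: (2 * (t - math.pi) / math.pi - 1, 0.0)),  # line from (-1,0) to (1,0)
]

def curve(t):
    """Sum of windowed pieces: only the piece whose interval contains t contributes."""
    x = sum(window(t, a, b) * f(t)[0] for a, b, f in PIECES)
    y = sum(window(t, a, b) * f(t)[1] for a, b, f in PIECES)
    return x, y
```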

unknownian · on Oct 17, 2013

I tried Xerneas (the new legendary), but it's still in the process of being researched. Also, for those who haven't played Pokémon in a while, it's much more than just type matchups, no matter what the anime leads you to think. There's a lot more strategy involved, as it's a fairly detailed RPG.

greyfox · on Oct 17, 2013

pwnt~ http://www.wolframalpha.com/input/?i=porygon-like+curve

tharshan09 · on Oct 17, 2013

This is pretty neat. Anyone have any ideas about how they obtained such complete data? Is it really from scraping the other sites mentioned in the threads?

bitemix · on Oct 17, 2013

If they did this for League of Legends characters and build calculations, I'd never leave the site.

KnightHawk3 · on Oct 17, 2013

That is actually a really good idea for a website.

I should attempt it as a hobby sometime.

bdz · on Oct 17, 2013

And Dota2 as well! (:

Derikulous · on Oct 17, 2013

only to pwn some noobs

jwr · on Oct 17, 2013

I'd be much more interested in WolframAlpha adding data about StarCraft II.

There is a lot of computing going on there: DPS (Damage Per Second), possibly modified by upgrades and enemy type, etc. This data could be genuinely useful to the SC community.
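As a rough illustration of that kind of computation, a Python sketch of single-target DPS with weapon upgrades and bonus damage against a target attribute. The model and every number in it are simplified assumptions (armor subtracted once per hit, an assumed 0.5 minimum damage per hit, no multi-hit attacks or splash), not official unit data.

```python
MIN_DAMAGE = 0.5  # assumed floor on damage per hit after armor

def dps(base_damage, cooldown, *, armor=0.0, upgrade_level=0, per_upgrade=1.0,
        bonus_damage=0.0, target_has_bonus_attribute=False):
    """Damage per second of one attacker against one target (simplified model)."""
    damage = base_damage + upgrade_level * per_upgrade
    if target_has_bonus_attribute:
        damage += bonus_damage
    damage = max(damage - armor, MIN_DAMAGE)
    return damage / cooldown

print(dps(10, 2.0, armor=1))                   # 4.5
print(dps(10, 2.0, armor=1, upgrade_level=2))  # 5.5
```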

benvds · on Oct 17, 2013

And when including timings you could easily calculate and optimize build orders.

hayksaakian · on Oct 17, 2013

Does this mean that Siri can now answer questions about pokemon?

cbhl · on Oct 17, 2013

Ironically, the new dataset doesn't seem to contain any of the Pokémon introduced in X or Y (yet).

cmiller1 · on Oct 17, 2013

I entered in "(the pokedex number of geodude)*5"

Expected result: 370. Actual result: 74.

hiddensanctum · on Oct 17, 2013

All I have to say is that is awesome

thealphanerd · on Oct 16, 2013

I love the pikachu-like curve

Cookingboy · on Oct 17, 2013

I just burst out laughing imagining people saying that as a pickup line.

pyrocat · on Oct 16, 2013

That's... cool, I guess? There are a bunch of other sites that already do this, though: Serebii, Bulbapedia, veekun, pokemondb, etc.

scott_karana · on Oct 17, 2013

Those other sites allow you to plot distribution and find sets of specific numerical characteristics that easily? I only remember seeing generic statistic listings.

omnipath · on Oct 17, 2013

http://veekun.com/dex/pokemon/search does everything Wolfram Alpha is claiming to do, other than the grouping, which, to be honest, I'm not sure will be useful beyond statistical note-keeping.

jseip · on Oct 17, 2013

New pickup line: I maintain the Pokemon database for Wolfram Alpha #chicksdignerds #notreally #hopeforIPOmoney