I'm happy to answer any questions - this is based on the QuickDT library (see ht...

bockris · on Dec 7, 2011

Followup: I see now that 'city' is in and 'region' is out.

Did you do that, or did it figure it out on its own?

Is there any way to see the full decision tree without re-voting? I've accidentally refreshed a couple of times, so my extra votes are skewing the tree, but I'm interested in seeing how the tree evolves.

sllrpr · on Dec 7, 2011

It figures it out on its own.

You can see the entire tree at http://cord-sa.appspot.com/dumpDT

bockris · on Dec 7, 2011

Nice!

How big is the pool of variables it can choose from?

Any insight on why it chose 'region' and 'browserHeight' first?

sllrpr · on Dec 7, 2011

Ugh, I thought I'd answered this, but it looks like HN lost my response :-(

> How big is the pool of variables it can choose from?

Currently it is just: screenWidth/Height, browserWidth/Height, browser, browserVersion, os, referrer, city, region, country.

It picks the variables based on how well they partition the dataset into sub-groups that differ from one another. This is the code that does that job:

https://github.com/sanity/quickdt/blob/master/src/main/java/...
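
To give a rough idea of what "how well a variable partitions the data" can mean, here is a minimal sketch in Python. It is not QuickDT's actual code (that is the Java linked above). It assumes Gini impurity as the scoring measure, which is a common choice but not necessarily the one QuickDT uses, and the function names, the "pet" label key, and the sample data are all made up for illustration.

```python
from collections import Counter, defaultdict


def gini(labels):
    """Gini impurity of a list of class labels (0.0 means all labels agree)."""
    total = len(labels)
    if total == 0:
        return 0.0
    return 1.0 - sum((n / total) ** 2 for n in Counter(labels).values())


def split_score(examples, attribute, label_key="pet"):
    """Impurity reduction obtained by partitioning the examples on one attribute."""
    groups = defaultdict(list)
    for ex in examples:
        groups[ex[attribute]].append(ex[label_key])
    parent = gini([ex[label_key] for ex in examples])
    # Weight each sub-group's impurity by its share of the examples.
    weighted = sum(len(g) / len(examples) * gini(g) for g in groups.values())
    return parent - weighted


def best_attribute(examples, attributes, label_key="pet"):
    """Pick the attribute whose partition scores highest."""
    return max(attributes, key=lambda a: split_score(examples, a, label_key))


# Hypothetical votes: region separates the answers cleanly, os does not.
examples = [
    {"region": "CA", "os": "Mac", "pet": "cat"},
    {"region": "CA", "os": "Windows", "pet": "cat"},
    {"region": "TX", "os": "Mac", "pet": "dog"},
    {"region": "TX", "os": "Windows", "pet": "dog"},
]

print(best_attribute(examples, ["region", "os"]))  # prints "region"
```

This only handles categorical variables. A numeric one such as browserHeight would additionally need a threshold to split on, which the sketch leaves out.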

bockris · on Dec 7, 2011

It guessed wrong for me but if the 'ExampleCount' value is accurate, you are dealing with a paucity of data.

Will it look for more decision nodes to add or are you stuck with region and browserHeight?

If not, how certain are you that you have the right data to accurately model the problem?

sllrpr · on Dec 7, 2011

> It guessed wrong for me but if the 'ExampleCount' value is accurate, you are dealing with a paucity of data.

Yes, you guys are the guinea pigs, sorry about that :-) We'll need a lot more data before it is really going to get interesting.

> Will it look for more decision nodes to add or are you stuck with region and browserHeight?

There are quite a few more; I pasted them in another comment. You can also see them if you "view source" on that page.

> If not, how certain are you that you have the right data to accurately model the problem?

I'm not; the fun of this is seeing how well people's favorite pets correlate with information their browsers give up automatically.