Speed tests for *json and cPickle in Python

thezilch · on April 1, 2011

Decode speed matters too, because caching systems are typically a means of speeding up reads, whereas the article focuses only on writes.

Python 2.5 -- including pickle's (faster) bin mode

    Starting simplejson dumps...
    done: 31.7636611462
    Starting cjson encode...
    done: 3.80062890053
    Starting pickle dumps...
    done: 10.4873199463
    Starting pickle dumps (protocol=-1 -- force HIGHEST_PROTOCOL)...
    done: 5.98110699654
    Starting simplejson loads...
    done: 138.709590197
    Starting cjson decode...
    done: 1.85300803185
    Starting pickle loads...
    done: 5.24735999107
    Starting pickle loads (protocol=-1 -- force HIGHEST_PROTOCOL)...
    done: 3.66428518295
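For anyone who wants to repeat the read-versus-write comparison, here is a minimal sketch using only the standard library. It assumes Python 3, where cPickle's C implementation is picked up automatically by `pickle`; the `payload` dict and the `time_call` helper are hypothetical stand-ins, since the original benchmark data is not shown in this thread. simplejson and cjson are third-party and are not covered here.

```python
import json
import pickle
import timeit

# Hypothetical sample payload; the data used in the original benchmark is unknown.
payload = {"foo": "bar", "numbers": list(range(100)), "nested": {"a": [1, 2, 3]}}


def time_call(func, number=2000):
    """Return the total seconds taken by `number` calls of func()."""
    return timeit.timeit(func, number=number)


# Pre-encode once so the decode timings measure only the loads step.
json_blob = json.dumps(payload)
pickle_blob = pickle.dumps(payload, protocol=pickle.HIGHEST_PROTOCOL)

results = {
    "json dumps": time_call(lambda: json.dumps(payload)),
    "json loads": time_call(lambda: json.loads(json_blob)),
    "pickle dumps": time_call(
        lambda: pickle.dumps(payload, protocol=pickle.HIGHEST_PROTOCOL)
    ),
    "pickle loads": time_call(lambda: pickle.loads(pickle_blob)),
}

for name, seconds in results.items():
    print("%-13s %.4f s" % (name, seconds))
```

The absolute numbers depend heavily on the payload shape and the interpreter version, so they should only be compared against each other on the same machine.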

j2d2j2d2 · on April 1, 2011

Great point, thanks.

spenrose · on April 1, 2011

He doesn't use the cPickle binary format, which is significantly faster: http://docs.python.org/library/pickle.html#pickle.HIGHEST_PR...

j2d2j2d2 · on April 1, 2011

Thanks for pointing this out.

I am going to do some follow up tests to cover the advice I got here. I will include this, tnetstrings and decodes too.

thezilch · on April 1, 2011

Any interest in also including Google's Protocol Buffers and Facebook's Thrift, since it would appear you intend to use these blobs in an RPC?

d0mine · on April 1, 2011

Python 2.7 on Ubuntu:

  $ python -mtimeit -s "from json import dumps; 
  d = {
      'foo': 'bar',
      'food': 'barf',
      'good': 'bars',
      'dood': 'wheres your car?',
      'wheres your car': 'dude?',
  }
  " "dumps(d)"
  100000 loops, best of 3: 6.89 usec per loop

  $ python -mtimeit -s "from cPickle import dumps; 
  d = {
      'foo': 'bar',
      'food': 'barf',
      'good': 'bars',
      'dood': 'wheres your car?',
      'wheres your car': 'dude?',
  }
  " "dumps(d)"
  100000 loops, best of 3: 7.78 usec per loop

So `json` seems faster than `cPickle`. Right? Wrong:

  $ python -mtimeit -s "from cPickle import dumps; 
  d = {
      'foo': 'bar',
      'food': 'barf',
      'good': 'bars',
      'dood': 'wheres your car?',
      'wheres your car': 'dude?',
  }
  " "dumps(d, -1)"
  100000 loops, best of 3: 3.59 usec per loop

trafficlight · on April 1, 2011

What does the "-1" dumps argument mean in the 3rd example?

thezilch · on April 1, 2011

It is the equivalent of using pickle.HIGHEST_PROTOCOL, as in pickle.dumps(obj, protocol=pickle.HIGHEST_PROTOCOL) -- any negative value will force the highest protocol.
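A quick way to check this yourself, sketched here with Python 3's `pickle` (the `obj` dict is just an illustrative value):

```python
import pickle

obj = {"foo": "bar", "food": "barf"}

# A negative protocol number selects pickle.HIGHEST_PROTOCOL.
negative = pickle.dumps(obj, -1)
also_negative = pickle.dumps(obj, -2)
highest = pickle.dumps(obj, pickle.HIGHEST_PROTOCOL)

# All three calls should produce byte-for-byte identical output.
same_bytes = negative == highest == also_negative
print(same_bytes)

# The result still round-trips back to the original object.
print(pickle.loads(negative) == obj)
```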

kingkilr · on April 1, 2011

It says to use the highest optimization level.

d0mine · on April 1, 2011

No. It says use the highest available protocol.

The highest available protocol is not necessarily the fastest. There might be reasons other than pure speed to introduce a new pickle protocol.
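Rather than assuming, it is easy to time every protocol the interpreter offers. This is a rough sketch for Python 3 that reuses the small dict from the timeit runs above; the `timings` name and the loop count are arbitrary choices, not anything from the article.

```python
import pickle
import timeit

d = {
    "foo": "bar",
    "food": "barf",
    "good": "bars",
    "dood": "wheres your car?",
    "wheres your car": "dude?",
}

# Time dumps() once per protocol number, from 0 up to the highest available.
timings = {}
for protocol in range(pickle.HIGHEST_PROTOCOL + 1):
    timings[protocol] = timeit.timeit(
        lambda: pickle.dumps(d, protocol), number=20000
    )

for protocol, seconds in sorted(timings.items()):
    print("protocol %d: %.4f s" % (protocol, seconds))
```

Which protocol wins can vary with the data and the Python version, so treat the output as a measurement for one setup only.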

michaelfairley · on April 2, 2011

Apples and oranges (even though they're mostly used for the same thing). pickle is a Turing complete language used to recreate Python objects, while json is only used for serialization. pickle is slightly more powerful, but it should not be used to load untrusted data. See http://nadiana.com/python-pickle-insecure
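To make the difference concrete, here is a deliberately harmless sketch for Python 3. The `Sneaky` class is hypothetical; its `__reduce__` method tells the unpickler to call `sum` on load, and a hostile pickle could name a far more dangerous callable in the same way.

```python
import json
import pickle


class Sneaky(object):
    """Hypothetical class whose pickle instructs the loader to call a function."""

    def __reduce__(self):
        # On load, pickle calls sum([1, 2, 3]) instead of rebuilding a Sneaky.
        return (sum, ([1, 2, 3],))


blob = pickle.dumps(Sneaky())
loaded = pickle.loads(blob)  # the call to sum() happens here, during loading
print(loaded)

# JSON only describes data, so decoding it never invokes anything it names.
decoded = json.loads('{"call": "sum", "args": [1, 2, 3]}')
print(decoded)
```

The pickle load returns the result of the function call, while the JSON load returns a plain dict that merely mentions the function by name.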

ericflo · on April 1, 2011

We were just talking about this a few weeks back, with a few more libraries compared with interesting results: https://convore.com/python/faster-json-library/

j2d2j2d2 · on April 1, 2011

demjson looks crazy fast.

llambda · on April 1, 2011

I'm confused, is cjson still recommended? Is it still maintained? Its status seems ambiguous based on the PyPI discussion. Furthermore, it seems there are a number of issues with cjson that need to be resolved...

bdarnell · on April 1, 2011

When I tested cjson (~6 months ago?) it had bugs with escaping - strings containing quotes and/or backslashes would be encoded/decoded incorrectly. Not recommended, unless there have been fixes since then.
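A round-trip check along these lines is one way to look for that class of bug. This is only a sketch: it runs against the standard library `json` module, and the `round_trips` helper and the `tricky` strings are made up for illustration. To test another library, pass its encode and decode functions instead.

```python
import json

# Strings mixing quotes and backslashes, the cases described above.
tricky = [
    'say "hi"',
    "back\\slash",
    'both \\" together',
    'C:\\path\\"quoted"\\',
    "\\\\",
]


def round_trips(encode, decode, value):
    """Return True if decode(encode(value)) gives back the original value."""
    return decode(encode(value)) == value


# Collect every string that does not survive an encode/decode cycle.
failures = [s for s in tricky if not round_trips(json.dumps, json.loads, s)]
print(failures)
```

An empty list means every string survived; anything left in `failures` is an input the library under test mangles.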

llambda · on April 1, 2011

Based on the comments on the PyPI page for cjson, many problems still persist, notwithstanding new updates that have yet to be pushed.

ericflo · on April 1, 2011

My understanding is that cjson is not recommended or maintained.

davvid · on April 1, 2011

Maybe our information is out of date, but here's what we learned:

https://github.com/jsonpickle/jsonpickle/commit/0b97652e4102...

jsonpickle: Remove cjson support

"First, please don't use cjson for anything. It's got multiple bugs and misfeatures, and is generally unsuited for anything except impressive benchmarks. It was easier to write my own library, from scratch, than try to fix cjson."

-- John Millikin, author of jsonlib

davvid · on April 1, 2011

Here are some more relevant links:

http://news.ycombinator.com/item?id=529104

http://metaoptimize.com/blog/2009/03/22/fast-deserialization...

jchrisa · on April 1, 2011

Sounds like the kind of project a decent coder could make big improvements on, fast.

antlong · on April 1, 2011

insane