Speed tests for *json and cPickle in Python (j2labs.tumblr.com)
40 points by j2d2j2d2 on April 1, 2011 | 21 comments


Also, caching systems are typically a means of speeding up reads, whereas the article only looks at writes.

Python 2.5, including pickle's (faster) binary mode:

    Starting simplejson dumps...
    done: 31.7636611462
    Starting cjson encode...
    done: 3.80062890053
    Starting pickle dumps...
    done: 10.4873199463
    Starting pickle dumps (protocol=-1 -- force HIGHEST_PROTOCOL)...
    done: 5.98110699654
    Starting simplejson loads...
    done: 138.709590197
    Starting cjson decode...
    done: 1.85300803185
    Starting pickle loads...
    done: 5.24735999107
    Starting pickle loads (protocol=-1 -- force HIGHEST_PROTOCOL)...
    done: 3.66428518295
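Since a cache is read far more often than it is written, the decode side deserves its own measurement. Here is a rough Python 3 sketch of that. It assumes only the stdlib `json` and `pickle` modules (Python 3's `pickle` uses its C accelerator automatically, so there is no separate `cPickle`) and a made-up sample record. cjson and simplejson are left out, and the timings it prints will vary by machine:

```python
import json
import pickle
import timeit

# Hypothetical sample record; substitute whatever your cache actually stores.
record = {"id": 42, "name": "widget", "tags": ["a", "b", "c"], "price": 9.99}

# Serialize once up front so the timing loop measures decoding only,
# which is the hot path for a read-heavy cache.
blobs = {
    "json": (json.dumps(record), json.loads),
    "pickle protocol 0": (pickle.dumps(record, protocol=0), pickle.loads),
    "pickle highest": (pickle.dumps(record, protocol=pickle.HIGHEST_PROTOCOL), pickle.loads),
}


def bench_loads(number=100_000):
    """Return {label: seconds for `number` decodes} for each format."""
    results = {}
    for label, (blob, loads) in blobs.items():
        # Every format must decode back to the original record.
        assert loads(blob) == record
        results[label] = timeit.timeit(lambda: loads(blob), number=number)
    return results


if __name__ == "__main__":
    for label, seconds in bench_loads().items():
        print(f"{label:20s} {seconds:.3f}s")
```

Pre-serializing outside the timed lambda keeps the encode cost out of the read measurement.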


Great point, thanks.


He doesn't use the cPickle binary format, which is significantly faster: http://docs.python.org/library/pickle.html#pickle.HIGHEST_PR...
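To see what "binary format" means here, the following small Python 3 sketch uses only the stdlib `pickle`. It compares protocol 0, the text-based format that was the default in Python 2, with the highest binary protocol the interpreter offers. The sample dict is borrowed from the timeit runs later in this thread:

```python
import pickle

d = {
    'foo': 'bar',
    'food': 'barf',
    'good': 'bars',
    'dood': 'wheres your car?',
    'wheres your car': 'dude?',
}

# Protocol 0 is the original human-readable, text-based format.
text_blob = pickle.dumps(d, protocol=0)
# Binary protocols (2 and up) begin with the PROTO opcode b'\x80'
# followed by a byte holding the protocol number.
binary_blob = pickle.dumps(d, protocol=pickle.HIGHEST_PROTOCOL)

print(len(text_blob), text_blob[:20])
print(len(binary_blob), binary_blob[:2])

# Both decode back to the same dict.
assert pickle.loads(text_blob) == pickle.loads(binary_blob) == d
```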


Thanks for pointing this out.

I am going to do some follow-up tests to cover the advice I got here. I will include this, tnetstrings, and decoding too.


Any interest in also including Google's Protocol Buffers and Facebook's Thrift, since it appears you intend to use these blobs for RPC?


Python 2.7 on Ubuntu:

  $ python -mtimeit -s "from json import dumps; 
  d = {
      'foo': 'bar',
      'food': 'barf',
      'good': 'bars',
      'dood': 'wheres your car?',
      'wheres your car': 'dude?',
  }
  " "dumps(d)"
  100000 loops, best of 3: 6.89 usec per loop

  $ python -mtimeit -s "from cPickle import dumps; 
  d = {
      'foo': 'bar',
      'food': 'barf',
      'good': 'bars',
      'dood': 'wheres your car?',
      'wheres your car': 'dude?',
  }
  " "dumps(d)"
  100000 loops, best of 3: 7.78 usec per loop
So `json` seems faster than `cPickle`, right? Wrong:

  $ python -mtimeit -s "from cPickle import dumps; 
  d = {
      'foo': 'bar',
      'food': 'barf',
      'good': 'bars',
      'dood': 'wheres your car?',
      'wheres your car': 'dude?',
  }
  " "dumps(d, -1)"
  100000 loops, best of 3: 3.59 usec per loop
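The same three measurements can be scripted with the `timeit` module instead of `python -mtimeit`. This sketch targets Python 3, where `cPickle` no longer exists and `pickle` uses the C implementation automatically. Because Python 2's `dumps(d)` defaulted to protocol 0, that case is spelled out explicitly. It takes the best of three repeats, as `-mtimeit` does, and the absolute numbers will differ from the 2011 ones above:

```python
import json
import pickle
import timeit

d = {
    'foo': 'bar',
    'food': 'barf',
    'good': 'bars',
    'dood': 'wheres your car?',
    'wheres your car': 'dude?',
}

candidates = {
    "json.dumps(d)": lambda: json.dumps(d),
    "pickle.dumps(d, 0)": lambda: pickle.dumps(d, 0),    # text protocol, Python 2's default
    "pickle.dumps(d, -1)": lambda: pickle.dumps(d, -1),  # highest available protocol
}


def best_usec(func, number=100_000, repeat=3):
    """Best-of-`repeat` time per call in microseconds, like `-mtimeit`."""
    times = timeit.repeat(func, number=number, repeat=repeat)
    return min(times) / number * 1e6


if __name__ == "__main__":
    for label, func in candidates.items():
        print(f"{label:22s} {best_usec(func):.2f} usec per loop")
```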


What does the "-1" argument to dumps mean in the third example?


It's equivalent to passing pickle.HIGHEST_PROTOCOL, i.e. pickle.dumps(obj, protocol=pickle.HIGHEST_PROTOCOL); any negative value selects the highest protocol.
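A quick check of that rule with Python 3's `pickle` (Python 2's `cPickle` documents the same behavior for negative values). A negative protocol produces exactly the same bytes as `pickle.HIGHEST_PROTOCOL`:

```python
import pickle

d = {'foo': 'bar', 'wheres your car': 'dude?'}

# Any negative protocol number is treated as pickle.HIGHEST_PROTOCOL.
neg_one = pickle.dumps(d, -1)
neg_five = pickle.dumps(d, protocol=-5)
highest = pickle.dumps(d, protocol=pickle.HIGHEST_PROTOCOL)

print(pickle.HIGHEST_PROTOCOL)         # depends on the Python version
print(neg_one == neg_five == highest)  # True
```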


It says to use the highest optimization level.


No. It says to use the highest available protocol.

The highest available protocol is not necessarily the fastest. There may be reasons other than pure speed to introduce a new pickle protocol.
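Whether the newest protocol is also the fastest for a given workload is easy to check empirically. This sketch uses only the Python 3 stdlib, and the payload and iteration count are arbitrary assumptions. It records the size and times `dumps` and `loads` for every protocol the running interpreter supports:

```python
import pickle
import timeit

# Arbitrary sample payload; results depend heavily on the data's shape.
payload = {f"key{i}": [i, str(i), float(i)] for i in range(100)}


def sweep(number=2_000):
    """Return (protocol, size_bytes, dumps_s, loads_s) for each protocol."""
    rows = []
    for proto in range(pickle.HIGHEST_PROTOCOL + 1):
        blob = pickle.dumps(payload, protocol=proto)
        assert pickle.loads(blob) == payload  # every protocol must round-trip
        dumps_s = timeit.timeit(lambda: pickle.dumps(payload, protocol=proto), number=number)
        loads_s = timeit.timeit(lambda: pickle.loads(blob), number=number)
        rows.append((proto, len(blob), dumps_s, loads_s))
    return rows


if __name__ == "__main__":
    for proto, size, dumps_s, loads_s in sweep():
        print(f"protocol {proto}: {size:6d} bytes  dumps {dumps_s:.3f}s  loads {loads_s:.3f}s")
```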


Apples and oranges (even though they're mostly used for the same thing). pickle is a Turing-complete language used to recreate Python objects, while JSON is only a data serialization format. pickle is more powerful, but it should never be used to load untrusted data. See http://nadiana.com/python-pickle-insecure
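A short sketch of both halves of that comment, using Python 3's stdlib modules. `pickle` round-trips Python-specific types that JSON cannot represent, but unpickling calls whatever callable the data names. The `Payload` class is hypothetical and a deliberately harmless stand-in for malicious input, since it only evaluates a constant expression:

```python
import json
import pickle

# pickle restores Python types exactly; JSON has no tuple or set type.
data = {"point": (1, 2), "tags": {"a", "b"}}
restored = pickle.loads(pickle.dumps(data, pickle.HIGHEST_PROTOCOL))
assert restored == data

try:
    json.dumps(data)
except TypeError as exc:
    print("json:", exc)  # the set is not JSON serializable

# Tuples survive a JSON round trip only as lists.
as_json = json.loads(json.dumps({"point": (1, 2)}))
print(as_json)  # {'point': [1, 2]}


class Payload:
    """Hypothetical 'malicious' object: __reduce__ names a callable
    that pickle.loads() will invoke for whoever unpickles the bytes."""

    def __reduce__(self):
        return (eval, ("6 * 7",))


blob = pickle.dumps(Payload())
result = pickle.loads(blob)  # eval("6 * 7") runs here, during loading
print(result)  # 42
```

A real attacker would name something like `os.system` instead of `eval` on a constant, which is why `pickle.loads` must only ever see trusted bytes.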


We were just talking about this a few weeks back, comparing a few more libraries with interesting results: https://convore.com/python/faster-json-library/


demjson looks crazy fast.


I'm confused: is cjson still recommended? Is it still maintained? Its status seems ambiguous based on the PyPI discussion. Furthermore, there seem to be a number of issues with cjson that need to be resolved...


When I tested cjson (~6 months ago?), it had escaping bugs: strings containing quotes and/or backslashes were encoded or decoded incorrectly. Not recommended, unless there have been fixes since then.
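A minimal round-trip test for the kind of escaping bugs described above. It uses the stdlib `json` module as a stand-in, since cjson may not be installed. To check another library, pass in its encode/decode functions. The `encode`/`decode` parameters belong to this sketch, not to any library's API:

```python
import json

# Strings that exercise the escaping rules: quotes, backslashes,
# control characters, and non-ASCII text (including an astral code point).
TRICKY = [
    'say "hi"',
    "back\\slash",
    '\\"',
    "tab\tnewline\n",
    "\u00e9\u4e2d\U0001f600",
    "",
]


def check_roundtrip(encode=json.dumps, decode=json.loads, samples=TRICKY):
    """Return the samples that do not survive encode -> decode unchanged."""
    failures = []
    for s in samples:
        try:
            # Wrap in a list so libraries that only accept containers also work.
            ok = decode(encode([s])) == [s]
        except ValueError:  # json.JSONDecodeError subclasses ValueError
            ok = False
        if not ok:
            failures.append(s)
    return failures


print(check_roundtrip())  # [] for a correct implementation
```

A decode error counts as a failure rather than crashing the check, so one broken string does not hide the others.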


Judging by the comments on the PyPI page for cjson, many problems still persist, notwithstanding new updates that have yet to be pushed.


My understanding is that cjson is not recommended or maintained.


Maybe our information is out of date, but here's what we learned:

https://github.com/jsonpickle/jsonpickle/commit/0b97652e4102...

jsonpickle: Remove cjson support

"First, please don't use cjson for anything. It's got multiple bugs and misfeatures, and is generally unsuited for anything except impressive benchmarks. It was easier to write my own library, from scratch, than try to fix cjson."

-- John Millikin, author of jsonlib



Sounds like the kind of project a decent coder could improve a lot, fast.


insane



