February 1
Introducing CyPB - Improving the performance of Protocol Buffers under Python
At connex.io we use Google's Protocol Buffers quite heavily. An analysis of our system showed that the Python implementation of Protocol Buffers was a bottleneck. No faster alternative was available back in December, so we started implementing a faster Python version ourselves.
The goal: making it available to everyone.
A first version was announced on the Protocol Buffers mailing list on January 7th. We received valuable feedback, which we have since incorporated. Just as we were on the verge of releasing it, Greplin "rained" on our parade by launching their own fast implementation, fast-python-pb. We could have stepped back and watched how this played out, but we remained convinced that our own implementation, CyPB, is of value to the public. So we sat down and worked hard to make it usable by others.
So here it is: CyPB is a fast and lightweight Protocol Buffers decoder for Python, which we are releasing to the public today under the New BSD license.
Key features:
- Speed. The full decoder is about twice as fast as Google's new C++ wrapper module in Protocol Buffers 2.4.
- Lazy decoding support. Messages are decoded on the fly, as each attribute is accessed for the first time. This yields a sizeable performance gain (~100x in some cases) when only some parts of a message are accessed.
- No additional dependencies or libraries. The automatically generated C code compiles to a Python module.
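To illustrate the lazy-decoding idea, here is a minimal pure-Python sketch. It is not CyPB's implementation (which is generated C code): the `LazyMessage` and `Person` classes, the `FIELDS` mapping and the `read_varint` helper are hypothetical, and only varint and length-delimited fields are handled. The raw bytes are kept untouched, and a field is parsed and cached the first time its attribute is read:

```python
def read_varint(buf, pos):
    """Decode a base-128 varint starting at buf[pos]; return (value, next_pos)."""
    result = shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


class LazyMessage(object):
    # Hypothetical schema: attribute name -> field number (varint fields only).
    FIELDS = {}

    def __init__(self, data):
        self._data = data  # raw serialized bytes; nothing is parsed yet

    def __getattr__(self, name):
        # Python calls __getattr__ only when normal lookup fails, i.e. on the
        # first access to a field that has not been cached yet.
        number = type(self).FIELDS.get(name)
        if number is None:
            raise AttributeError(name)
        value = self._find(number)
        setattr(self, name, value)  # cache it; later lookups bypass __getattr__
        return value

    def _find(self, number):
        """Scan the buffer, keeping only the value of field `number`."""
        buf, pos, value = self._data, 0, None
        while pos < len(buf):
            key, pos = read_varint(buf, pos)
            field, wire_type = key >> 3, key & 0x7
            if wire_type == 0:
                v, pos = read_varint(buf, pos)
                if field == number:
                    value = v  # the last occurrence wins, as in protobuf
            elif wire_type == 2:
                length, pos = read_varint(buf, pos)
                pos += length
            else:
                raise ValueError("wire type %d not handled in this sketch" % wire_type)
        return value


class Person(LazyMessage):
    FIELDS = {"id": 1, "age": 2}


# Field 1 = 150 (bytes 08 96 01), field 2 = 30 (bytes 10 1e).
p = Person(b"\x08\x96\x01\x10\x1e")
print(p.age)  # 30; "id" is not cached until it is first accessed
```

A real lazy decoder would avoid rescanning the whole buffer for every field, for example by recording field offsets during the first pass; this sketch trades that for brevity.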
Benchmark
In benchmarking against other implementations, we only used the full decoding method (forgoing the potential performance boost of lazy decoding). To measure performance, the same message was decoded 5,000 times. A total of ten runs were carried out on a MacBook Pro (2.66 GHz Intel Core 2 Duo, Python 2.6.1, GCC 4.2.1). The averaged results for each implementation are as follows:
- Google's Python module (2.3): 0.711271 seconds
- Google's C++ implementation for Python (2.4): 0.093410 seconds
- CyPB: 0.041584 seconds
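The measurement procedure above (the same message decoded 5,000 times per run, ten runs, averaged) can be reproduced with Python's standard `timeit` module. In this sketch, `decode_all` and `mean_decode_time` are hypothetical helpers, and `decode_all` is a stand-in decoder for varint-only messages; to compare real implementations, you would pass each library's own parse function instead:

```python
import timeit


def read_varint(buf, pos):
    """Decode a base-128 varint starting at buf[pos]; return (value, next_pos)."""
    result = shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def decode_all(buf):
    """Stand-in full decoder: parse every (varint-only) field into a dict."""
    fields, pos = {}, 0
    while pos < len(buf):
        key, pos = read_varint(buf, pos)
        if (key & 0x7) != 0:
            raise ValueError("this sketch only handles varint fields")
        value, pos = read_varint(buf, pos)
        fields[key >> 3] = value
    return fields


def mean_decode_time(decode, message, number=5000, repeat=10):
    """Decode `message` `number` times per run for `repeat` runs; return the mean run time."""
    runs = timeit.repeat(lambda: decode(message), number=number, repeat=repeat)
    return sum(runs) / len(runs)


MESSAGE = b"\x08\x96\x01\x10\x1e"
print("decode_all: %.6f seconds" % mean_decode_time(decode_all, MESSAGE))
```

The `timeit` documentation suggests that the minimum of repeated runs is usually the more robust figure; the mean is used here only to mirror the averaging described above.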
The Way Forward
We currently consider CyPB to be alpha software: the API is very likely to change. In the near future we want to work on:
- Support for all data types (doubles, bytes, etc.)
- Proper handling of unknown fields based on their wire type
- Raising an exception when an invalid attribute is accessed
- Merging lazy decoding and full decoding into one API
- An encoder
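On the item about unknown fields: in the Protocol Buffers wire format, every field key carries a wire type that tells the decoder how many bytes the value occupies, so a reader can skip fields it does not know without understanding them. The following sketch shows that technique in plain Python, using hypothetical helpers `read_varint`, `skip_field` and `decode_known`; it is not CyPB code:

```python
def read_varint(buf, pos):
    """Decode a base-128 varint starting at buf[pos]; return (value, next_pos)."""
    result = shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def skip_field(buf, pos, wire_type):
    """Return the position just past a field value of the given wire type."""
    if wire_type == 0:  # varint: read until a byte without the continuation bit
        _, pos = read_varint(buf, pos)
    elif wire_type == 1:  # 64-bit fixed width (double, fixed64, sfixed64)
        pos += 8
    elif wire_type == 2:  # length-delimited (string, bytes, embedded message)
        length, pos = read_varint(buf, pos)
        pos += length
    elif wire_type == 5:  # 32-bit fixed width (float, fixed32, sfixed32)
        pos += 4
    else:  # 3 and 4 are the deprecated start/end group markers
        raise ValueError("unsupported wire type %d" % wire_type)
    if pos > len(buf):
        raise ValueError("truncated field")
    return pos


def decode_known(buf, known):
    """Decode the varint fields listed in `known`; skip all others by wire type."""
    result, pos = {}, 0
    while pos < len(buf):
        key, pos = read_varint(buf, pos)
        number, wire_type = key >> 3, key & 0x7
        if number in known and wire_type == 0:
            value, pos = read_varint(buf, pos)
            result[known[number]] = value
        else:
            pos = skip_field(buf, pos, wire_type)
    return result


# Fields 3 (string "hi") and 4 (float 1.0) are unknown to this reader.
buf = b"\x08\x96\x01\x1a\x02hi\x25\x00\x00\x80\x3f\x10\x1e"
print(decode_known(buf, {1: "id", 2: "age"}))  # {'id': 150, 'age': 30}
```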
Any contributions and feedback are highly welcome and greatly appreciated.
We are also looking for talented developers. If you are a Python hacker, take a look at our latest job post.
About Us
connex.io keeps your address book clean, complete and up-to-date. By merging contacts from your phone, email and social networks we effortlessly make them accessible for you - anytime, anywhere.
