February 7

Why we replaced SyncML with our own contact sync protocol

At connex.io we have created our own protocol to connect with different devices and services and synchronize their address books. In this post we explain why we chose to create our own syncing protocol, outline the ideas behind it and introduce its main components.

SyncML is an open standard which allows, among other things, the synchronization of contacts. It is supported by many devices and services out of the box and therefore could have been an obvious choice for connex.io. We were, however, not too happy with SyncML and the problems it caused.

The SyncML standard specifies that only the changes since the last sync are transmitted between the server and the client. This makes the implementation of client plugins challenging: there is a mismatch between what the server wants and what the underlying storage on the client side provides. Because only changes may be transmitted, additional information has to be stored for each contact on the client, which leads to the aforementioned complication.

As almost all address book APIs - be it on the web or on devices - expose a simple CRUD interface, we decided to leverage these and move the complexity of keeping track of changes to the server side.
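To illustrate the idea of tracking changes on the server, here is a minimal sketch. It assumes the server keeps one fingerprint per contact from the previous sync and compares it with a fresh listing read through the client's CRUD interface. The function names and the dict-based contact representation are hypothetical and not taken from our code base.

```python
import hashlib
import json


def fingerprint(contact):
    """Stable hash of a contact's fields (assumed representation: a plain dict)."""
    payload = json.dumps(contact, sort_keys=True).encode("utf-8")
    return hashlib.sha1(payload).hexdigest()


def diff_snapshots(previous, current):
    """Compare the server's stored fingerprints with a fresh client listing.

    previous: dict mapping contact id -> fingerprint stored on the server
    current:  dict mapping contact id -> contact dict just read from the client
    Returns (created, updated, deleted) as sorted lists of contact ids.
    """
    created = sorted(cid for cid in current if cid not in previous)
    deleted = sorted(cid for cid in previous if cid not in current)
    updated = sorted(
        cid for cid in current
        if cid in previous and fingerprint(current[cid]) != previous[cid]
    )
    return created, updated, deleted
```

With this approach the client stores nothing beyond its own address book; all bookkeeping lives in the server's `previous` mapping.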

The connex.io Contact Syncing Protocol (CCSP) is an abstract interface - a thin layer - that directly links our Sync Server with the underlying contacts API of each device or service, which is exposed through a plugin. All the magic, such as pulling changes, conflict resolution, merging, propagating updates, etc., happens on the server.

Our protocol is a RESTful web service and has the following properties and features:

  • Stateless request / response communication (RESTful)
  • No additional information is stored on the client side.
  • Data is serialized as Protocol Buffers and then sent via HTTPS POST.
  • CCSP plugins expose a client's CRUD functionality to the server.
  • Operations are batched and sent in chunks of 64 operations per request.
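The properties above can be sketched roughly as follows. This is an illustration only: the `InMemoryPlugin` class, the tuple format for operations and the helper names are assumptions, and the Protocol Buffers serialization and the HTTPS transport are left out entirely. Only the chunk size of 64 comes from the list above.

```python
BATCH_SIZE = 64  # chunk size stated in the list above


class InMemoryPlugin:
    """Hypothetical stand-in for a client plugin exposing CRUD to the server."""

    def __init__(self):
        self.contacts = {}

    def create(self, cid, data):
        self.contacts[cid] = dict(data)

    def read(self, cid):
        return self.contacts.get(cid)

    def update(self, cid, data):
        self.contacts[cid].update(data)

    def delete(self, cid):
        self.contacts.pop(cid, None)


def chunked(operations, size=BATCH_SIZE):
    """Split a list of operations into consecutive chunks of at most `size`."""
    if size < 1:
        raise ValueError("size must be at least 1")
    return [operations[i:i + size] for i in range(0, len(operations), size)]


def apply_batch(plugin, batch):
    """Apply one chunk of (op, contact_id, data) tuples to a plugin."""
    for op, cid, data in batch:
        if op == "create":
            plugin.create(cid, data)
        elif op == "update":
            plugin.update(cid, data)
        elif op == "delete":
            plugin.delete(cid)
        else:
            raise ValueError("unknown operation: %r" % (op,))
```

Each chunk would correspond to one request, so a sync with 150 pending operations would need three requests.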

In the following we want to quickly introduce the different components of our system.

Sync Server - Tying things together

Our sync server is based on the Tornado Web Server and handles Sync and API requests. All the data is stored in a Cassandra database to allow for scalability.

One additional reason for rolling our own protocol was the flexibility to experiment with new features. Many sync features are implemented as add-ons (which extend server functionality and are not to be confused with the sync client plugins explained in more detail in the next section). By using Python decorators and modules we get this extensibility for free. There is no special interface for add-ons, nor are there callbacks to handle events. Instead, add-ons just decorate the original functions in our core. Although the design is not the most elegant, it has worked great for us so far.
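A minimal sketch of what decorating a core function can look like is shown below. Both `save_contact` and the `normalize_emails` add-on are invented for this example and do not reflect the actual names in our core.

```python
import functools


def normalize_emails(func):
    """Hypothetical add-on: lower-case e-mail addresses before the core stores a contact."""
    @functools.wraps(func)
    def wrapper(store, contact):
        cleaned = dict(contact)
        cleaned["emails"] = [e.strip().lower() for e in contact.get("emails", [])]
        return func(store, cleaned)
    return wrapper


@normalize_emails
def save_contact(store, contact):
    """Hypothetical core function: persists a contact into a dict-based store."""
    store[contact["id"]] = contact
    return contact
```

The core function does not know about the add-on; removing the decorator line restores the original behaviour.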

Sync Plugins - Supporting many different Devices and Services

Another main objective when designing our own protocol was the simplicity of implementing client plugins. The simplicity becomes apparent when looking at our Thunderbird plugin, whose sync code is made up of a total of ~530 lines of JavaScript. The sync logic and transport layer are implemented in only 70 lines of code. The remaining 460 lines are dedicated solely to field mapping - converting Thunderbird's contact structure to connex.io's and back.

The simplicity of implementing the protocol for a new device was proven again when it took only a few hours to develop an initial version of a two-way sync application for iOS-based devices.

De-duplication - Cleaning up

At connex.io we do not only sync address books with each other, we also improve them by aggregating the information from many sources into one address book. To do this in a clean way we need to deduplicate the imported data under less than ideal circumstances. Only then can we provide a usable address book to our users.

When an address book is first connected with connex.io, all the contacts stored within it are indexed based on certain features. Contacts that have anything in common are grouped together. In the next step, all contacts in each of the groups are compared to each other pair by pair. At the moment a relatively simple decision tree decides whether two records can be merged or not. This decision tree will soon be expanded or even replaced by a machine learning algorithm that will improve based on feedback.
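The two steps described above (grouping by shared features, then pairwise comparison) can be sketched as follows. The chosen features (e-mail, phone digits, name) and the rules in `can_merge` are assumptions made for this example; they are a toy stand-in and not our actual decision tree.

```python
from collections import defaultdict
from itertools import combinations


def features(contact):
    """Extract normalized (kind, value) features used for indexing."""
    feats = set()
    for email in contact.get("emails", []):
        feats.add(("email", email.strip().lower()))
    for phone in contact.get("phones", []):
        digits = "".join(ch for ch in phone if ch.isdigit())
        if digits:
            feats.add(("phone", digits))
    name = contact.get("name", "").strip().lower()
    if name:
        feats.add(("name", name))
    return feats


def candidate_pairs(contacts):
    """Group contact ids by shared feature and return all pairs within a group."""
    index = defaultdict(list)
    for cid, contact in contacts.items():
        for feat in features(contact):
            index[feat].append(cid)
    pairs = set()
    for ids in index.values():
        pairs.update(combinations(sorted(ids), 2))
    return pairs


def can_merge(a, b):
    """Toy decision tree, deliberately conservative to favour precision."""
    shared = features(a) & features(b)
    if any(kind == "email" for kind, _ in shared):
        return True
    if any(kind == "phone" for kind, _ in shared):
        # A shared phone only counts if the names do not contradict each other.
        name_a = a.get("name", "").strip().lower()
        name_b = b.get("name", "").strip().lower()
        return not name_a or not name_b or name_a == name_b
    return False  # a shared name alone is treated as too weak


def find_duplicates(contacts):
    """Return the sorted list of id pairs the toy decision tree would merge."""
    return sorted(
        pair for pair in candidate_pairs(contacts)
        if can_merge(contacts[pair[0]], contacts[pair[1]])
    )
```

Indexing first keeps the number of pairwise comparisons small, since only contacts that share at least one feature are ever compared.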

We focus on maintaining a very high precision, as false positives - wrongly merged contacts - are detrimental to the perceived quality of our address books. Improving recall - catching duplicates that were previously missed - is the second priority when improving our algorithm's accuracy.
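For readers less familiar with the two terms, here is a small sketch of how precision and recall could be computed for a set of proposed merges against a hand-labelled set of true duplicates. The function name and the pair-based representation are assumptions for this example.

```python
def precision_recall(predicted, actual):
    """Compute precision and recall for predicted duplicate pairs.

    predicted: iterable of id pairs the algorithm wants to merge
    actual:    iterable of id pairs that really are duplicates
    Empty inputs are treated as a perfect score for the respective metric.
    """
    predicted, actual = set(predicted), set(actual)
    true_positives = len(predicted & actual)
    precision = true_positives / len(predicted) if predicted else 1.0
    recall = true_positives / len(actual) if actual else 1.0
    return precision, recall
```

A wrongly merged pair lowers precision, while a duplicate that was not found lowers recall.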

Work for or with connex.io

We are always looking for full-time hackers to join our team. If you believe you can help us, shoot an email to atamurad@connex.io and we'll talk.

We also plan to support as many devices and services as we can. You can help us get there faster by developing plugins (Blackberry, Windows Phone 7, Symbian etc.) for us on a per-project basis. If you think you can help us, shoot an email to marcus@connex.io.

In a future post we will tell you more about our internal processes - how we develop, test and deploy our solution.

 
