The Simplest PubSub Protocol

What do we need exactly?

You really only need a few write operations:

  • pub - publish message(s) to a channel
  • sub - subscribe to new channel(s)
  • unsub - cancel subscription(s)

and one of the following read operations:

  • receive_pub - read new message (you'll probably do this in a loop)
  • onpublish - callback function to receive published messages

Almost all of the messages will be of the pub variety.
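
To make the operations above concrete, here is a minimal in-memory sketch in Python. The Broker class, its method signatures, and the (channel, message) callback shape are assumptions made for illustration. There is no network involved yet, just the pub/sub/unsub shape plus both read styles.

```python
import queue
from collections import defaultdict
from typing import Callable, DefaultDict, List

# Hypothetical callback signature: onpublish(channel, message)
OnPublish = Callable[[str, str], None]


class Broker:
    """Toy in-process broker; method names mirror the operations listed above."""

    def __init__(self) -> None:
        self._subs: DefaultDict[str, List[OnPublish]] = defaultdict(list)

    def sub(self, channel: str, onpublish: OnPublish) -> None:
        """Subscribe a callback to a channel (duplicates are ignored)."""
        if onpublish not in self._subs[channel]:
            self._subs[channel].append(onpublish)

    def unsub(self, channel: str, onpublish: OnPublish) -> None:
        """Cancel a subscription; unknown subscriptions are ignored."""
        if onpublish in self._subs.get(channel, []):
            self._subs[channel].remove(onpublish)

    def pub(self, channel: str, message: str) -> None:
        """Fire and forget: hand the message to current subscribers."""
        for onpublish in list(self._subs.get(channel, [])):
            onpublish(channel, message)


broker = Broker()

# Callback style (onpublish): the broker calls us.
seen = []


def remember(channel: str, message: str) -> None:
    seen.append((channel, message))


broker.sub("news", remember)

# Loop style (receive_pub): a callback fills a queue, the reader drains it.
inbox = queue.Queue()


def enqueue(channel: str, message: str) -> None:
    inbox.put((channel, message))


broker.sub("news", enqueue)
receive_pub = inbox.get  # blocks until a message is available

broker.pub("news", "hello")
broker.unsub("news", remember)
broker.pub("news", "bye")
```

After this runs, the callback subscriber has seen only "hello" (it unsubscribed before "bye"), while the queue still holds both messages for a receive_pub loop to pick up.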

The real win: moving communication away from the business logic layer into the DevOps layer.

Realistically, you can't actually ignore the details of the network, but you can at least minimize the effects and make it easier for everyone.

How to implement

Let's make some design decisions:

Network messages come with a handful of problems: reliability, ordering (do messages arrive in the right order?), partial messages, aggregated packets, and slow lorises.

Using a raw TCP socket would work, but then we have to do our own framing (short packets, long packets, a slow loris). Not a huge deal, but everyone has to agree.
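
As a rough idea of what "doing our own framing" means, here is a sketch of length-prefixed framing in Python. The 4-byte big-endian length header and the MAX_FRAME limit are assumptions picked for the example; the point is only that both sides have to agree on them. The decoder copes with short reads (a frame arriving in pieces) and aggregated reads (several frames in one chunk).

```python
import struct
from typing import List

HEADER = struct.Struct("!I")   # assumed: 4-byte big-endian payload length
MAX_FRAME = 64 * 1024          # assumed: largest payload we accept


def encode_frame(payload: bytes) -> bytes:
    """Prefix the payload with its length."""
    if len(payload) > MAX_FRAME:
        raise ValueError("frame too large")
    return HEADER.pack(len(payload)) + payload


class FrameDecoder:
    """Accumulates raw bytes from a stream and returns complete frames."""

    def __init__(self) -> None:
        self._buf = bytearray()

    def feed(self, data: bytes) -> List[bytes]:
        self._buf.extend(data)
        frames = []
        while len(self._buf) >= HEADER.size:
            (length,) = HEADER.unpack_from(self._buf)
            if length > MAX_FRAME:
                raise ValueError("frame too large")
            end = HEADER.size + length
            if len(self._buf) < end:
                break  # partial frame: wait for more bytes
            frames.append(bytes(self._buf[HEADER.size:end]))
            del self._buf[:end]
        return frames


stream = encode_frame(b"hello") + encode_frame(b"world!")
decoder = FrameDecoder()
first = decoder.feed(stream[:3])     # not even a full header yet
second = decoder.feed(stream[3:11])  # completes "hello", starts the next frame
third = decoder.feed(stream[11:])    # completes "world!"
```

This does nothing about a slow loris. A peer that trickles one byte at a time will simply sit in the buffer, so a real server would also want a read deadline on the socket.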

UDP: we could use this, but then we also have an ordering issue. If the packets are large, they'll probably get fragmented or dropped by the network. In fact, delivery isn't guaranteed at all.

WebSockets: framing is built in (although we probably want a way to split up large messages so they don't clog up the system). Small note: what about that frame masking thing (client-to-server only)? Is that expensive?

UTF-8 vs binary mode: picking UTF-8 is an attempt to side-step the problem of what-encoding-of-string-am-I (a common cross-platform problem). JSON uses UTF-8 by default, so we'll go with that for now, but JSON has its own problems (binary data must be encoded in a maybe-not-so-efficient format).
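
To put a number on "maybe-not-so-efficient", here is a small Python sketch that pushes binary data through a JSON text message using base64. The field names ("op", "channel", "encoding", "data") are hypothetical, and base64 is just one common choice for this, not something the protocol has settled on.

```python
import base64
import json


def encode_binary_pub(channel: str, blob: bytes) -> bytes:
    """Wrap raw bytes in a UTF-8 JSON message (field names are hypothetical)."""
    message = {
        "op": "pub",
        "channel": channel,
        "encoding": "base64",
        "data": base64.b64encode(blob).decode("ascii"),
    }
    return json.dumps(message).encode("utf-8")


def decode_binary_pub(wire: bytes) -> bytes:
    """Recover the original bytes from the JSON message."""
    message = json.loads(wire.decode("utf-8"))
    return base64.b64decode(message["data"])


blob = bytes(range(256)) + bytes(44)        # 300 arbitrary bytes
wire = encode_binary_pub("images", blob)
encoded_len = len(base64.b64encode(blob))   # 400: 4 characters per 3 bytes
```

So the payload grows by a third before the JSON wrapper is even counted. Probably acceptable for small messages, less so if most of the traffic is binary.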

Maximum message size: probably a good idea (protects the network against greedy clients).

Other things to think about

What about retry? No retry! Why? Do you need it? Does the network really need to tell you "hey, I got this message, but I can't guarantee that whoever's listening downstream will get it"?

Guaranteed delivery: Is it worth it? And should this layer implement it? (Short answer: if it can be easily implemented on top, that's the way to go.)

Over the wire protocol: JSON. Ok, not the most efficient but probably the easiest to write tools for. We'll use JSON for v1.0 of the standard.
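
Here is one way the three write operations might look as JSON on the wire, sketched in Python. Nothing above fixes the field names or the size limit, so "op", "channel", "data" and MAX_MESSAGE_BYTES are all placeholders.

```python
import json
from typing import Any, Optional

MAX_MESSAGE_BYTES = 64 * 1024          # hypothetical maximum message size
KNOWN_OPS = ("pub", "sub", "unsub")


def encode(op: str, channel: str, data: Optional[Any] = None) -> bytes:
    """Build a UTF-8 JSON message; only pub carries a payload."""
    message = {"op": op, "channel": channel}
    if op == "pub":
        message["data"] = data
    wire = json.dumps(message).encode("utf-8")
    if len(wire) > MAX_MESSAGE_BYTES:
        raise ValueError("message too large")
    return wire


def decode(wire: bytes) -> dict:
    """Parse and sanity-check a message; raises ValueError if it is bad."""
    if len(wire) > MAX_MESSAGE_BYTES:
        raise ValueError("message too large")
    message = json.loads(wire.decode("utf-8"))  # ValueError on bad UTF-8/JSON
    if not isinstance(message, dict) or message.get("op") not in KNOWN_OPS:
        raise ValueError("unknown or missing op")
    if not isinstance(message.get("channel"), str):
        raise ValueError("channel must be a string")
    return message


sub_wire = encode("sub", "news")
pub_wire = encode("pub", "news", {"text": "hi"})
```

The upside of JSON shows up right away: you can read sub_wire with your own eyes, and any language with a JSON library can produce or consume it.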

Pattern matching: nice to have. Clients should ideally be able to use pattern matching in subscribe commands.
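
A quick sketch of what pattern subscriptions could look like, using shell-style wildcards from Python's standard fnmatch module. The wildcard syntax and the dotted channel names are assumptions; the protocol hasn't picked a pattern language.

```python
from fnmatch import fnmatchcase
from typing import Iterable, List


def matching_patterns(channel: str, patterns: Iterable[str]) -> List[str]:
    """Return the subscription patterns that match a published channel."""
    return [p for p in patterns if fnmatchcase(channel, p)]


subscriptions = ["sensors.*", "sensors.kitchen.temp", "logs.*"]
hits = matching_patterns("sensors.kitchen.temp", subscriptions)
```

One thing to watch with this particular syntax: "*" matches across dots, so "sensors.*" catches "sensors.kitchen.temp" as well as "sensors.kitchen". If channel segments should be matched one at a time, a different pattern language is needed.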

Exchanges: these allow multiple clients to multiplex listening on a channel so each gets its own messages.

What else: kill switch, backoff, bandwidth limiting, lots of stuff.

PubSub patterns:

Fire and Forget: ok!

  • one-to-one: ok!
  • one-to-many: ok!
  • many-to-one: ok!

Is it a queue? (It's an unreliable queue.)
Does it use exchanges? (Can we do load balancing? Let's sidestep this in v1.0.)

What if you don't have websockets?

  • implement them
  • use some kind of shim (like a long poller)
  • use SSL to fake out a raw socket (you need control of both sides of the connection to do this)

What else: Every protocol should have a simple way to determine "version". If not, you know it was designed by someone who didn't know what they were doing. It should also be at (or close to) the beginning, so the consumer can quickly reject packets it doesn't understand.
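
A small Python sketch of the version idea. The "v" field name, the "1.0" string, and the rule "same major version means compatible" are all assumptions for the example. Python dicts keep insertion order, so putting "v" first in the dict puts it first on the wire.

```python
import json

PROTOCOL_VERSION = "1.0"   # hypothetical version string


def with_version(message: dict) -> bytes:
    """Serialize a message with the version as the first field."""
    return json.dumps({"v": PROTOCOL_VERSION, **message}).encode("utf-8")


def is_supported(wire: bytes) -> bool:
    """Check the version before looking at anything else in the message."""
    try:
        message = json.loads(wire.decode("utf-8"))
    except ValueError:  # covers both bad UTF-8 and bad JSON
        return False
    version = message.get("v") if isinstance(message, dict) else None
    if not isinstance(version, str):
        return False
    # Assumed rule: the major version must match ours.
    return version.split(".")[0] == PROTOCOL_VERSION.split(".")[0]


wire = with_version({"op": "pub", "channel": "news", "data": "hi"})
```

With plain JSON the consumer still has to parse the whole message before it can read the version, so "quickly reject" here mostly means "reject before running any other logic". A binary header could do better, at the cost of the tooling friendliness JSON was picked for.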

So what do we call this thing? I'm going to go with Yotta PubSub Protocol v1.0 (Yotta means super big).

Let's put this into a design document. Not only is the basic protocol established, but

Now that we've established a baseline for communication, let's implement it!