Why not REST?

When writing a README for Chatterbox (my attempt at a new federated chat protocol), I wanted to explain all of my design choices. One that quickly gave me trouble was the API’s format: it uses Protobuf over a single WebSocket connection, instead of a REST API. Why did I do it this way?

The obvious answer is that it’s just interesting to me. I have a lot of trouble choosing boring technology, because I have to keep the project interesting enough for me to stick to it. I reimplemented something like gRPC on top of WebSockets as SockRPC, in order to ensure browser compatibility and avoid the HTTP/2 requirement. While it’s not terribly complicated, it’s still a barrier to understanding the protocol, so I need a reason why it’s necessary.

But, as I tried to give a better explanation, every reason I could find rang hollow:

  • Performance? This is the main one, and WebSockets+Protobuf are almost certainly faster than REST+JSON, but it’s not like I’ve actually profiled anything. How much of a difference is it? Is it enough to matter?

  • Simplicity of sessions and pubkey challenge auth? Chatterbox uses public key authentication, and one WebSocket connection with one auth challenge at the beginning is simpler. But it’s not that hard to have a REST auth endpoint that generates a JWT after completing a challenge.

  • Security? Several open chat and social media protocols, including Matrix, have made the mistake of leaving their uploaded media URLs open to the world, no auth required. Streaming everything over the WebSocket makes this impossible by design. But is this much added complexity (streaming media over a WebSocket in chunks!) really necessary to prevent a simple mistake that could be mentioned in the spec?

  • Succinct documentation? In theory, the entire Chatterbox protocol can be defined in one .proto file, using gRPC’s service definition syntax (since SockRPC reimplements gRPC). But that’s not quite true: there’s SockRPC itself, as well as out-of-band alternate authentication methods like OAuth. With a basic REST API, I could use Swagger, which actually has widespread tooling support.

  • Binary data? The protocol uses binary public keys and UUIDv7s everywhere, not to mention timestamps, and by using Protobuf I’ve been able to avoid committing to a specific string representation of any of these. With JSON (or REST URLs, even without JSON) I’d have to represent all of these as strings.
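
To make the binary data point concrete, here’s a rough sketch of the choice a JSON API would force. The 16-byte ID and 32-byte key are placeholder values (not real Chatterbox data), and the three encodings are just plausible candidates, not anything the protocol specifies.

```python
import base64
import json
import uuid

# Placeholder values: 16 bytes standing in for a UUIDv7 and 32 bytes
# standing in for a public key. Neither is real Chatterbox data.
message_id = bytes(range(16))
public_key = bytes(range(32))


def b64url(raw: bytes) -> str:
    """Unpadded base64url, one of several possible string forms."""
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


# Three plausible JSON spellings of the same 16-byte ID. A spec would
# have to pick one, and every implementation would have to agree.
id_encodings = {
    "canonical": str(uuid.UUID(bytes=message_id)),  # 8-4-4-4-12 hex groups
    "hex": message_id.hex(),
    "base64url": b64url(message_id),
}
id_lengths = {name: len(text) for name, text in id_encodings.items()}

# What a JSON message might look like after picking two of them.
as_json = json.dumps({"id": id_encodings["canonical"], "key": b64url(public_key)})
print(id_lengths)
print(as_json)
```

With Protobuf, both fields would simply be `bytes` and none of these spellings would appear in the spec.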

None of these seems like a big enough deal to justify adding more implementation barriers in the form of a new RPC protocol that will need to be ported to each new language that implements a Chatterbox client or server.

So what’s standing in the way of me making it a boring old REST API?

Event streaming, mostly. Chatterbox, as currently defined, has 3 types of streams: user event streams (one per client), server event streams (one per server the user has joined), and room event streams (one per room, but only one should be open at a time). These will have to be implemented with either WebSockets or Server-Sent Events.

SSE is the simpler approach: just give each type of event its own URL. But, surprisingly, browsers cap the number of simultaneous HTTP/1.1 connections per domain (typically six), and every open SSE stream uses one of them. So this won’t work reliably unless I enforce HTTP/2 (at which point why not just use gRPC?).
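
A back-of-the-envelope sketch of that connection budget, assuming the usual browser cap of six HTTP/1.1 connections per origin and assuming all of a client’s streams come from the same origin. The function names are made up for illustration.

```python
# Assumption: a typical browser cap on simultaneous HTTP/1.1 connections
# to one origin. It's a browser behavior, not part of Chatterbox or SSE.
HTTP1_CONNECTIONS_PER_ORIGIN = 6


def sse_streams_needed(servers_joined: int, room_open: bool = True) -> int:
    """One user stream, one stream per joined server, at most one room stream."""
    return 1 + servers_joined + (1 if room_open else 0)


def fits_in_http1(servers_joined: int, room_open: bool = True) -> bool:
    """True if every stream can be open at once under the cap."""
    return sse_streams_needed(servers_joined, room_open) <= HTTP1_CONNECTIONS_PER_ORIGIN


for joined in (2, 4, 5):
    print(joined, sse_streams_needed(joined), fits_in_http1(joined))
```

Even the cases that fit are optimistic: ordinary REST requests and other open tabs draw from the same pool of connections.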

So we’re back to WebSockets, now as part of the API instead of the whole thing. And this still makes it inelegant: WebSockets can’t be described by Swagger, for example. Certainly it would be much simpler to describe and implement a WebSocket connection that does simple pub/sub on event streams using JSON instead of a full RPC protocol with error handling and 3 types of streaming calls.
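
For a sense of how small the simple pub/sub option could be, here’s a minimal in-memory sketch. The `op`/`stream`/`payload` frame shape, the `PubSubHub` class, and the stream names are all hypothetical, and the actual WebSocket transport is left out: frames are modeled as JSON strings collected in an outbox.

```python
import json
from collections import defaultdict


class PubSubHub:
    """Tracks which connections are subscribed to which event streams."""

    def __init__(self) -> None:
        self._subscribers = defaultdict(set)  # stream name -> connection ids
        self.outbox = defaultdict(list)       # connection id -> frames to send

    def handle_frame(self, conn_id: str, frame: str) -> None:
        """Process one JSON text frame received from a client."""
        msg = json.loads(frame)
        op, stream = msg.get("op"), msg.get("stream")
        if op == "subscribe" and stream:
            self._subscribers[stream].add(conn_id)
        elif op == "unsubscribe" and stream:
            self._subscribers[stream].discard(conn_id)
        else:
            error = {"op": "error", "reason": "bad frame"}
            self.outbox[conn_id].append(json.dumps(error))

    def publish(self, stream: str, payload: dict) -> None:
        """Queue an event frame for every connection subscribed to the stream."""
        frame = json.dumps({"op": "event", "stream": stream, "payload": payload})
        for conn_id in sorted(self._subscribers.get(stream, ())):
            self.outbox[conn_id].append(frame)


hub = PubSubHub()
hub.handle_frame("conn-a", json.dumps({"op": "subscribe", "stream": "room/1"}))
hub.publish("room/1", {"text": "hi"})
print(hub.outbox["conn-a"])
```

That’s two client operations and one server-sent frame type, with no request IDs, status codes, or streaming-call bookkeeping.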

There are a bunch of places to land in this design space and I’m unsure what’s worth prioritizing. Do I use REST with WebSockets just for events, but use Protobuf or CBOR instead of JSON (and accept that that prevents me from describing the API with Swagger)? Maybe I should do REST+JSON for everything but the WebSockets, which can use a binary format? Should I stop trying to force something clever and just make it all JSON?

If you have thoughts or feedback, you can reply on Mastodon.