Matrix: Decentralized, Federated Chat

2021-06-21 10 min read Cool stuff Privacy Tech Teknikal_Domain

Do you like secure chat apps? But actually secure, not like Telegram? And end-to-end encrypted, if selected? And ones that support sending media, and files, and even voice and video calls? And completely decentralized, meaning you don’t need to rely on any one company or any one third-party server?

Well do I have a deal for you: Matrix.

Okay I’m not being completely accurate, Matrix isn’t a chat protocol. Matrix is a generalized communications protocol, that at the low level is just a secured way of federating JSON objects. It’s almost completely up to the clients to interpret those events. Heck, one client can be IM and another can be basically Twitter. That actually exists as a proof of concept right now, the client is called Cerulean. Clients make and send events, servers communicate events with other servers more or less blindly, and then clients receive those events.

For the buzzwords here, Matrix is federated, meaning that multiple instances will sync (“federate”) with each other without any formal agreements or pre-arranged configuration, and it’s decentralized, meaning there’s no central server or central “Matrix Inc.” that you’re dependent on. In theory, anyone is capable of setting up their own local Matrix server (called a “homeserver”), and it will just communicate with everything else with no issues.

The actual requirements for running your own are a little complex, requiring a TLS certificate, some spare and not that slow storage space (you need a SQLite or PostgreSQL database, plus enough space to store all media / files you upload and get sent from others), and a fair amount of RAM (as in like, 8 to 16 GB, currently, see “Running Your Own” below for why), but if you don’t want to do any of that, well, matrix.org operates a public homeserver, meaning you can grab a client (like Element¹) and register on there. You can also just head on over if you want to learn more, they’ve got this all pretty well documented.

All usernames are of the form @user:server, like @teknikal_domain:matrix.tdstoragebay.com, and yes, I do need to shorten that, I’m working on it. But that aside, you’ll see that there’s the name part, which identifies you, and then your server is part of your username, which is how we get federation to work: since the server to contact is in the name itself, you don’t need to rely on some service to direct you.
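To make the name/server split concrete, here’s a minimal sketch of pulling a user ID apart. The function name `parse_user_id` and the `@alice:example.com` ID are made up for illustration, and this does not enforce the full character rules from the spec.

```python
def parse_user_id(user_id: str) -> tuple[str, str]:
    """Split a Matrix user ID of the form @user:server into (user, server)."""
    if not user_id.startswith("@"):
        raise ValueError(f"user IDs start with '@', got {user_id!r}")
    # Split on the FIRST colon only: the server part may itself carry a port.
    localpart, separator, server = user_id[1:].partition(":")
    if not separator or not localpart or not server:
        raise ValueError(f"expected @user:server, got {user_id!r}")
    return localpart, server


# Hypothetical ID, purely for illustration.
user, server = parse_user_id("@alice:example.com")
print(f"to reach {user}, talk to the homeserver for {server}")
```

The only design choice worth noting: splitting on the first colon keeps a server name like `example.com:8448` intact.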

Rooms

All communication on Matrix takes place within a “room,” and rooms have a similar identifier, like #photography:matrix.org. However, in the case of rooms, the domain part is only for namespacing, so that you can have one photography room on matrix.org, and another photography room on some-other-homeserver.net, and the two don’t conflict.²³ Rooms themselves have a set of rules for who can join, who can read the history, and whether encryption is on.

If encryption is enabled (usually enabled for 1-to-1 chats by default), then Matrix uses Olm and its group-chat companion, Megolm, to create and share message encryption keys, meaning even the homeserver(s) cannot read the message data. Otherwise, your communications are still passing through a standard TLS tunnel, but the homeservers themselves can read them.

Encrypted rooms also have another feature: a visual identifier of veracity. You, as a user, can request to “verify” some other user’s current login sessions (“devices”), usually by comparing a set of emoji through some out-of-band communication that you know is trusted. Once this happens, a little green shield next to their name shows that you’ve verified they’re not some impostor. If they ever sign in on a new device, that’ll change to red to indicate there’s something new that you’ve yet to check, and you can’t be 100% sure that session is legit.⁴

Message keys are controlled by a Megolm session, a ratchet whose keys are handed to each device over Olm, which is a double-ratchet system, kinda like Signal. Every so often, these sessions are discarded and re-created, meaning that a good degree of forward secrecy can be achieved, because no amount of data from session A will assist in breaking session B.
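If the word “ratchet” is new to you, here’s a toy one. To be loud about it: this is NOT Olm or Megolm, just a bare-bones hash ratchet I’m using as an illustration, and the `ToyRatchet` class and its labels are invented for this sketch. The idea it shows is that each step derives a fresh message key and then moves the chain forward one-way, so holding the current state doesn’t let you walk back to earlier keys.

```python
import hashlib
import hmac
import os


class ToyRatchet:
    """Illustrative one-way key chain. Not a real Matrix algorithm."""

    def __init__(self, seed: bytes) -> None:
        self._chain_key = seed
        self.index = 0

    def next_message_key(self) -> bytes:
        # Derive a key for exactly one message from the current chain key.
        message_key = hmac.new(self._chain_key, b"message", hashlib.sha256).digest()
        # Advance the chain with a one-way function and forget the old state.
        self._chain_key = hmac.new(self._chain_key, b"advance", hashlib.sha256).digest()
        self.index += 1
        return message_key


# Both sides start from the same shared seed (sharing it securely is the hard part).
seed = os.urandom(32)
sender, receiver = ToyRatchet(seed), ToyRatchet(seed)
assert sender.next_message_key() == receiver.next_message_key()
```

Throwing the whole object away and starting again from a new random seed is the toy equivalent of discarding a session and creating a new one.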

The APIs

Matrix basically has two major REST APIs that control 99% of the work: the client-server API (CS API), and the server-server API (Federation API). These are just defined endpoints to send HTTP requests to, starting with either /_matrix/federation/ or /_matrix/client/.⁵ Client authentication is done pretty simply, using a Bearer token in the Authorization header that you’re given on login, and from there, it’s just exchanging JSON objects. Servers are different: they use a message’s signature, which is generated from the server’s keys. Every homeserver has at least one Ed25519 key pair that it uses to sign messages, meaning that forged or altered messages can be detected.
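Here’s roughly what the client side of that looks like, as a sketch using only the Python standard library. The homeserver URL and token are placeholders, and I’m assuming the `v3` version of the sync endpoint (`/_matrix/client/v3/sync`); older servers expose the same thing under `r0`. The function only builds the request, it doesn’t send anything.

```python
import urllib.request


def build_sync_request(homeserver: str, access_token: str) -> urllib.request.Request:
    """Build an authenticated GET for the client-server sync endpoint."""
    url = homeserver.rstrip("/") + "/_matrix/client/v3/sync"
    return urllib.request.Request(
        url,
        # The token you got back from login goes in a standard Bearer header.
        headers={"Authorization": f"Bearer {access_token}"},
        method="GET",
    )


# Placeholder values, not a real account.
request = build_sync_request("https://matrix.example.com", "secret-token")
print(request.get_method(), request.full_url)
# To actually send it: json.load(urllib.request.urlopen(request))
```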

A client has very little required complexity to be usable, even if it doesn’t implement everything. Realistically, anyone with a few spare hours could probably get enough of the CS API implemented to have a working program, which is neat.

Third-Party Identifiers (3PIDs)

Here’s a cool part: Matrix has support for so-called “3PIDs”, which are other identifiers that you can request be tied to your Matrix user ID. For example, I can associate [email protected] with my name, and that means that, in theory, if you tried to invite [email protected] to a room, an Identity Server would check there’s a binding from that address to a Matrix ID (me), and then send that back, so your client can invite me properly. Currently, it’s got support for emails and phone numbers, but there’s nothing stopping it from having more. It’s a neat concept, not like it hasn’t been attempted before (Telegram accounts are bound to your phone number), but it’s nice that you can ease the transition a little by using a known other contact method, like an email address. And if configured, you can use a 3PID for login, instead of your username. Because chances are, typing in your email and password is probably so far ingrained in your muscle memory that it’s a no-brainer.

P.S.: You can run your own IS too, the plan is to have them be as decentralized as the rest of the network. And they federate, meaning that if IS A has my binding, and you query IS B, it could, with a DNS SRV lookup, know to ask IS A about it. One of the more popular ISes for people that run one, ma1sd, does this.
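For the curious, here’s a sketch of how a client might prepare that lookup. As I understand the v2 Identity Service API, the client doesn’t send the raw address: it hashes `address medium pepper` (space-separated) with SHA-256 and sends the URL-safe, unpadded base64 of that, where the pepper comes from the identity server. Treat the details as my reading of the spec rather than gospel; the function names, the address, and the pepper value here are all made up.

```python
import base64
import hashlib


def hash_3pid(address: str, medium: str, pepper: str) -> str:
    """Hash one third-party identifier for a lookup request."""
    digest = hashlib.sha256(f"{address} {medium} {pepper}".encode("utf-8")).digest()
    # URL-safe base64 with the trailing '=' padding stripped.
    return base64.urlsafe_b64encode(digest).decode("ascii").rstrip("=")


def build_lookup_body(identifiers: list[tuple[str, str]], pepper: str) -> dict:
    """Build the JSON body for a hashed lookup of (address, medium) pairs."""
    return {
        "algorithm": "sha256",
        "pepper": pepper,
        "addresses": [hash_3pid(address, medium, pepper) for address, medium in identifiers],
    }


# Hypothetical address and pepper, for illustration only.
body = build_lookup_body([("alice@example.com", "email")], pepper="example-pepper")
print(body["addresses"])
```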

Messaging

Normally, you’re using Matrix as a chat protocol, and for that, it’s got a lot of cool features. Most clients are probably going to accept Markdown formatting (Element uses CommonMark), which is then sent to your homeserver as a message with formatted HTML. There’s support for images, videos, arbitrary files, and, in the unstable features section of Element Desktop, sending voice messages, a la Telegram. By connecting your client to an Integration Manager (which most have already), you can expand the feature set with things like, well, stickers, again, a la Telegram.
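To show what “a message with formatted HTML” means on the wire, here’s a small sketch of the content of an `m.room.message` event. The helper name `text_message` is mine; the Markdown-to-HTML step is left to whatever renderer the client uses, so the HTML is passed in already rendered.

```python
from typing import Optional


def text_message(body: str, formatted_body: Optional[str] = None) -> dict:
    """Build the content of an m.room.message text event."""
    content = {"msgtype": "m.text", "body": body}
    if formatted_body is not None:
        # The plain body stays as a fallback for clients that ignore HTML.
        content["format"] = "org.matrix.custom.html"
        content["formatted_body"] = formatted_body
    return content


content = text_message("**hello** world", "<strong>hello</strong> world")
print(content)
```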

Messages can be edited after-the-fact, leaving a little (edited) marker (though anyone who looks at the event source can see the original), and even deleted, usually stating nothing more than just Message deleted or similar. You can also “react” to messages with emoji (like Discord), and reply (like Telegram, and Discord!), including the previous message in yours.
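Edits, reactions, and replies are all just events that point at another event. Here’s a sketch of the three content shapes; the helper names and the `$original` event ID are hypothetical. This is also why the original text of an edited message is still visible in the event source: an edit is a new event, not a rewrite of the old one.

```python
def edit_content(original_event_id: str, new_body: str) -> dict:
    """Content of an m.room.message event that replaces an earlier message."""
    return {
        "msgtype": "m.text",
        "body": f"* {new_body}",  # fallback for clients that don't understand edits
        "m.new_content": {"msgtype": "m.text", "body": new_body},
        "m.relates_to": {"rel_type": "m.replace", "event_id": original_event_id},
    }


def reaction_content(target_event_id: str, emoji: str) -> dict:
    """Content of an m.reaction event."""
    return {
        "m.relates_to": {
            "rel_type": "m.annotation",
            "event_id": target_event_id,
            "key": emoji,
        }
    }


def reply_content(target_event_id: str, body: str) -> dict:
    """Content of an m.room.message event that replies to another message."""
    return {
        "msgtype": "m.text",
        "body": body,
        "m.relates_to": {"m.in_reply_to": {"event_id": target_event_id}},
    }


print(reaction_content("$original", "👍"))
```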

VoIP

Matrix itself doesn’t handle VoIP. However, Matrix can be the signaling layer to initiate a VoIP connection. For example, Element has a button for voice call / video call, and in a one-on-one room, this is as simple as sending an event with an SDP payload, and if the other user accepts, reading their SDP payload, probably gathering some ICE candidates, and constructing a two-way WebRTC stream. For group chats, this is a little more complex, and the usual way of handling this is through a central Jitsi server that acts as the mediator, taking in everyone’s streams and re-broadcasting them. This way, each client doesn’t have to have a stream connection to every other client, just the Jitsi instance.
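For the one-on-one case, the signaling boils down to two events, `m.call.invite` and `m.call.answer`, each carrying an SDP blob. Below is a sketch of their content, assuming the original (version 0) call events; the helper names are mine, and the SDP strings are placeholders that a real WebRTC stack would generate.

```python
import uuid


def call_invite(sdp_offer: str, lifetime_ms: int = 60_000) -> dict:
    """Content of an m.call.invite event carrying the caller's SDP offer."""
    return {
        "call_id": uuid.uuid4().hex,  # any ID unique to this call
        "version": 0,
        "lifetime": lifetime_ms,  # how long the invite stays valid
        "offer": {"type": "offer", "sdp": sdp_offer},
    }


def call_answer(call_id: str, sdp_answer: str) -> dict:
    """Content of the m.call.answer event sent back by the callee."""
    return {
        "call_id": call_id,
        "version": 0,
        "answer": {"type": "answer", "sdp": sdp_answer},
    }


invite = call_invite("v=0 ...placeholder offer...")
answer = call_answer(invite["call_id"], "v=0 ...placeholder answer...")
print(invite["call_id"] == answer["call_id"])
```

ICE candidates travel the same way, in their own events tied to the same `call_id`.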

Running Your Own

The requirements are as stated above, but once you have everything you need secured, choose your homeserver. The two reference implementations at the moment are Synapse and Dendrite. Synapse is a Python 3 program, and Dendrite is written in Go. Synapse is a huge memory hog, but Dendrite isn’t feature-complete, or even at feature-parity at the time of writing, meaning that you might be missing a feature or two for a while. There’s also third-party servers, but those I’m not familiar with. I would personally recommend sticking with one of the reference implementations, since the Matrix protocol is constantly changing and getting improved, and they’re likely going to be the first ones to support new functionality. Same thing for using the reference client, Element. It’s likely on the front line for proofing and testing new functionality. But, your preference, choose what you like.

Note that if you do this, DO NOT do what I did and put it on a subdomain! This just makes names stupidly long. What you’re supposed to do is use delegation. That is, you tell the homeserver that it’s to respond to, say, example.com, meaning all your user IDs are @something:example.com, and then you delegate from example.com to whatever. For example, delegate to… matrix.some-other-domain.com. This can be done in two ways: the DNS SRV method, or the .well-known method. The latter is preferred.

`.well-known` Method

This involves the creation of two files on your chosen domain. Using the example above, that means that https://example.com/.well-known/matrix/client should return this JSON:

{
    "m.homeserver": {
        "base_url": "https://matrix.some-other-domain.com"
    }
}

You can also add an additional m.identity_server (looks identical) if you want clients to use a server-specified identity server, not their built-in default. This is so that clients trying to login and send events know which server to talk to, but this doesn’t help federation. For federation, you need to add /.well-known/matrix/server:

{
    "m.server": "matrix.some-other-domain.com:8448"
}

And this will mean that when any server wants to contact what appears to be example.com for federation, it’ll find the correct place. Also note, HTTP redirects aren’t followed well. Make sure it actually responds with a 200 OK.
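On the client side, consuming the `/.well-known/matrix/client` file is about as simple as it sounds. Here’s a sketch that takes the response body as a string and pulls out the homeserver (and, if present, the identity server); the function name is mine, and the fetching and error handling a real client needs are left out.

```python
import json
from typing import Optional


def discover_servers(well_known_body: str) -> tuple[str, Optional[str]]:
    """Return (homeserver base URL, identity server base URL or None)."""
    document = json.loads(well_known_body)
    homeserver = document["m.homeserver"]["base_url"].rstrip("/")
    # m.identity_server is optional and has the same shape as m.homeserver.
    identity = document.get("m.identity_server", {}).get("base_url")
    return homeserver, identity.rstrip("/") if identity else None


body = '{"m.homeserver": {"base_url": "https://matrix.some-other-domain.com"}}'
print(discover_servers(body))
```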

DNS `SRV` Method

If you want to delegate with DNS, you’ll still need the client’s .well-known file, making that method preferred in the first place. However, if a homeserver fails to find a valid .well-known/matrix/server file, it’ll consult DNS, performing an SRV lookup for, using our examples, _matrix._tcp.example.com. This is a normal SRV record, meaning it can specify a port and domain name.
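Putting the two methods together, the order a homeserver tries things in can be sketched as a small pure function. This is a simplification of the real resolution rules: it assumes plain hostnames (no IP literals), treats the `m.server` value as `hostname[:port]`, takes the SRV result as an already-looked-up `(host, port)` pair, and falls back to port 8448 on the original name. The function name and arguments are mine.

```python
from typing import Optional

DEFAULT_FEDERATION_PORT = 8448


def federation_target(
    server_name: str,
    well_known_server: Optional[str] = None,
    srv_record: Optional[tuple[str, int]] = None,
) -> tuple[str, int]:
    """Pick the (host, port) to contact for federation with server_name."""
    # 1. A valid /.well-known/matrix/server file wins.
    if well_known_server is not None:
        host, _, port = well_known_server.partition(":")
        return host, int(port) if port else DEFAULT_FEDERATION_PORT
    # 2. Otherwise use the SRV record for _matrix._tcp.<server_name>, if any.
    if srv_record is not None:
        return srv_record
    # 3. Otherwise talk to the name itself on the default federation port.
    return server_name, DEFAULT_FEDERATION_PORT


print(federation_target("example.com", "matrix.some-other-domain.com:8448"))
```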

So, there’s not much more I have to say on this point. Some people find it finicky (I… don’t), some people say it’s missing critical functionality (like to fully delete a user, which is next to impossible because of the structure), and some people just… don’t want to learn. As with any new project, there’s going to be pushback, and in my opinion, there’s a bit more than is actually deserved by people that seem to want to find excuses to complain.

But, hear me out: I switched from Keybase to Matrix after Keybase was acquired by Zoom. I’ve been using it since, and the only real problems I’ve had were my own fault with running my own server, and those are more than readily solved by people in #synapse:matrix.org. Even if you’re not enough of a tech head to run your own homeserver, grab an account on matrix.org or something. I’ve yet to find a better alternative (maybe Delta / PGP emails is a close second, but, that’s also just emails, it’s a bit less flexible, so I think Matrix holds the top slot.)

¹ Element, formerly Riot / Riot.im, and before that Vector, has… gone through a few name changes. It’s pretty much the de facto Matrix client at this point, but a number of others exist too.
² Technically, it serves as a helper for your homeserver, as an initial point of contact to ask who is available to assist in joining the room, but for our intents and purposes here, it’s unimportant.
³ Rooms are actually identified by an internal ID, like !hLKWHiqSaEvEfGnyNJ:example.com, the # ones are an alias that a server can map alias -> ID.
⁴ Well… devices can sign other devices (“cross-signing”), meaning if the user manually confirms their new session from an existing old session (emoji compare / QR code scan), the old session will cross-sign the new session, meaning it’s now trusted and verified by default.
⁵ I’m leaving out the Identity Service API, the Application Service API, and the Push Gateway API. They’re not important here.