Experiences writing an ActivityPub server in Python with Django
This post is about how I wrote an ActivityPub server using the Django Python framework, to provide a fediverse account on the same domain as a WordPress blog.
My motivation
On The Aperiodical, we do a monthly post collecting bits of maths news that we've seen. This is the compromise we came to as we realised that we're too old and busy to keep up with writing in-depth posts about individual things any more.
When we started doing this, I set up a /news slash command in our Slack channel which would take a URL and some explanatory text, and add it to the current draft post.
Slack insists you give a username to the account that replies to this command, so we have our happy little Aperiodipal secretary.
Since we only publish the news posts once a month, we sometimes miss out on spreading the word about time-limited events, such as deadlines to register for conferences or mathematical holidays. And when there's Big Maths News such as a proof of an old conjecture, it'd be nice to put something out immediately rather than waiting until it's old news.
So I thought it would be a good idea to automatically toot on the fediverse each time one of us adds an item to the news post, as part of the /news Slack command.
Ideally, I'd like the fediverse account to belong to the aperiodical.com domain, instead of a Mastodon instance such as mathstodon.xyz.
That meant I'd have to serve the ActivityPub protocol on aperiodical.com. This is the kind of thing you do when you're really more curious about how the protocol works than about making a pragmatic decision on the best course of action.
Why run an ActivityPub server, instead of making a Mastodon bot or an RSS feed?
The fediverse runs on the ActivityPub protocol: servers send Activity Streams messages to each other to announce events like the creation of a post, and return ActivityPub objects when another server asks for information about objects such as a user account or a post.
In order to interact on the fediverse, your server can't just serve up data in response to requests: it also has to keep a list of followers, and send out notifications to them whenever it creates or updates a post. This is lots more work than syndicating posts through RSS, where you just have to present an XML file of recent posts, and it's up to readers to regularly check it for updates.
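To make that concrete, here is a minimal sketch of the kind of message a server sends out when it creates a post: a Create activity wrapping a Note, built as a plain Python dictionary. The field names come from the Activity Vocabulary, but the helper name build_create_activity, the example URLs and the "/followers" and "/activity" URL patterns are assumptions made for illustration, not something the protocol or my code requires.

```python
import json
from datetime import datetime, timezone

# Special collection meaning "everyone", defined by ActivityPub.
PUBLIC = "https://www.w3.org/ns/activitystreams#Public"


def build_create_activity(actor_url, note_id, content, published=None):
    """Wrap a new Note in a Create activity addressed to the public.

    The followers collection URL is assumed to be actor_url + "/followers".
    """
    published = (published or datetime.now(timezone.utc)).astimezone(timezone.utc)
    timestamp = published.strftime("%Y-%m-%dT%H:%M:%SZ")
    note = {
        "id": note_id,
        "type": "Note",
        "attributedTo": actor_url,
        "content": content,
        "published": timestamp,
        "to": [PUBLIC],
        "cc": [actor_url + "/followers"],
    }
    return {
        "@context": "https://www.w3.org/ns/activitystreams",
        "id": note_id + "/activity",  # hypothetical URL scheme for activities
        "type": "Create",
        "actor": actor_url,
        "published": timestamp,
        "to": note["to"],
        "cc": note["cc"],
        "object": note,
    }


if __name__ == "__main__":
    activity = build_create_activity(
        "https://example.com/activitypub/actor/mathnews",
        "https://example.com/activitypub/notes/1",
        "<p>A new proof has been announced.</p>",
    )
    print(json.dumps(activity, indent=2))
```

The server would then POST this JSON to the inbox of each follower's server, which is the book-keeping that RSS lets you avoid.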
Servers that implement ActivityPub, such as Mastodon, usually also offer their own API for managing your account on that server. So you can avoid all that book-keeping by creating an account on a Mastodon server, and using its API to create posts.
But that means that your account will appear on the fediverse under the Mastodon server's domain.
You can provide a WebFinger service to make it look like there's an account on your domain when people search for it, pointing to an account on a Mastodon server. But ultimately it resolves to the Mastodon server's domain, and that's what people see when they interact with it in the fediverse.
So to have an ActivityPub actor that is genuinely on your domain, you need to handle all the ActivityPub stuff through URLs on that domain.
If you do want to just make a bot account on a Mastodon instance, Terence Eden has written a good guide to building Mastodon bots.
My train of thought, spread over several months
The Aperiodical runs on WordPress, so I looked for plugins that would add ActivityPub support. I didn't like how they worked: they all seemed to work on the assumption that you'd have a fediverse actor for each WordPress user, and an announcement of a new post would only go out through that user's actor. I half-heartedly spent a few hours looking at changing the most popular plugin, ActivityPub for WordPress, to break the link between ActivityPub actors and WordPress users. But I decided I'd have to rewrite so much that it would take a huge amount of work to end up with something that could be merged back into the main plugin rather than living on as a fork.
And if I was going to go to that much effort, I might as well learn how ActivityPub works by implementing it myself!
Aim: make an ActivityPub server that runs on aperiodical.com, serving a bot called @mathnews@aperiodical.com, linked to our news Slack command.
The spec for the protocol is managed by the W3C. It's about as good a spec as I've ever seen: it clearly explains the general idea, and almost all of the technical details you need to know about are easy to find.
There are three separate documents that you need to look at in order to understand all of ActivityPub:
The ActivityPub protocol, which relies on
the ActivityStreams 2.0 data format, which relies on
the Activity Vocabulary.
When I first skimmed the ActivityPub spec, I didn't realise the other two documents existed and thought there was a lot more "up to the implementation" behaviour than there really is. It is quite abstract, leaving a lot of the interpretation of activity objects up to the implementer, but the technical details are pretty much all there, across the three documents.
The hardest bit I was missing was how to sign messages: the spec makes some reference to HTTP Signatures, which are an existing standard, but doesn't insist on them. It wasn't clear which authorisation method different fediverse software uses, or how to find that out. In the end, I've just tested against Mastodon and I'll deal with anything else later.
There's also the WebFinger spec, which gives a means of resolving a username and domain pair, such as @me@mydomain.com, to a canonical URL giving the ActivityPub Actor object.
This is a really simple spec: you just have to respond at the URL /.well-known/webfinger, given an account name as a GET parameter.
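A minimal sketch of that lookup, written as a framework-free function rather than a Django view. The GET parameter is called resource and carries an acct: URI, and the reply is a small JSON document that would be served as application/jrd+json. The function name webfinger_response and the actors mapping are hypothetical, introduced only for this example.

```python
from urllib.parse import parse_qs


def webfinger_response(query_string, domain, actors):
    """Return (status_code, body) for a WebFinger request.

    query_string: the raw query string, e.g. "resource=acct:me@mydomain.com"
    domain:       the domain this server answers for
    actors:       hypothetical mapping of local username -> actor URL
    """
    resource = parse_qs(query_string).get("resource", [""])[0]
    if not resource.startswith("acct:"):
        return 400, None
    username, _, host = resource[len("acct:"):].partition("@")
    if host != domain or username not in actors:
        return 404, None
    body = {
        "subject": resource,
        "links": [
            {
                "rel": "self",
                "type": "application/activity+json",
                "href": actors[username],
            }
        ],
    }
    return 200, body
```

In a Django view, the status and body would be wrapped in a JsonResponse with the content type set accordingly.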
The remainder of this section is stuff I wrote down while writing code.
Remixed Darius Kazemi's debugging tool at https://tinysubversions.com/notes/activitypub-tool/.
Looked at https://socialhub.activitypub.rocks/pub/guide-for-new-activitypub-implementers
I got it to post a message using the template Darius linked to, then made a shortcut to parse a news post of the sort we'd get from the Slack command.
Then, I wondered about profile data. Filled out the actor JSON by looking at the JSON returned by my mathstodon account.
How to mark it as a bot? Bots have "type": "Service" in their actor JSON.
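A minimal sketch of what such an actor document might contain, assuming a URL layout of my own invention (base URL plus /actor/username, with inbox, outbox and followers underneath). The function name build_actor is hypothetical. The publicKey block follows the shape Mastodon serves for its own accounts.

```python
def build_actor(base_url, username, display_name, public_key_pem):
    """Build the JSON-serialisable actor object for a bot account.

    The URL layout here is an assumption for illustration only.
    """
    actor_url = f"{base_url}/actor/{username}"
    return {
        "@context": [
            "https://www.w3.org/ns/activitystreams",
            "https://w3id.org/security/v1",
        ],
        "id": actor_url,
        "type": "Service",  # "Person" for a human, "Service" for a bot
        "preferredUsername": username,
        "name": display_name,
        "inbox": f"{actor_url}/inbox",
        "outbox": f"{actor_url}/outbox",
        "followers": f"{actor_url}/followers",
        "publicKey": {
            "id": f"{actor_url}#main-key",
            "owner": actor_url,
            "publicKeyPem": public_key_pem,
        },
    }
```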
Made it read keys and actor JSON from disk instead of the sqlite database.
How to get mathstodon to read the new profile data?
Send an "Update" activity, with actor and object both the URL of my bot's actor JSON.
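As a sketch, that activity is tiny. The function name build_profile_update is hypothetical, and the activity id is passed in by the caller because how ids are minted is up to the implementation.

```python
def build_profile_update(actor_url, activity_id):
    """Build an Update activity announcing that an actor's profile changed.

    Both "actor" and "object" are the actor's URL, as described above;
    the receiving server is then expected to re-fetch the actor JSON.
    """
    return {
        "@context": "https://www.w3.org/ns/activitystreams",
        "id": activity_id,
        "type": "Update",
        "actor": actor_url,
        "object": actor_url,
    }
```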
Then, want to rewrite in Python.
I tried requests-http-signature, but it doesn't seem to sign requests in the way that Mastodon enjoys (or I couldn't work out how to configure it to do that).
Then found this post giving Python code to sign an activity.
Spent ages wondering why pycryptodome said my key was the wrong length to decode as base64. Turns out that the key generated by Darius's tool contains spaces, which pycryptodome strips out.
Generated a new key using pycryptodome.
It works! But I get a 401 response from mathstodon, even though it does update the profile.
401 response is because the signature wasn't correct. Further down in the thread, I found a link to this working implementation.
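For anyone hitting the same 401, here is a minimal sketch of the part that is easy to get subtly wrong: building the string that gets signed. It assumes the scheme Mastodon documents for POST requests, which is rsa-sha256 over the (request-target), host, date and digest headers. The function names are hypothetical, and the RSA step itself is left to a caller-supplied sign callable (for example one built on pycryptodome's PKCS#1 v1.5 signer with SHA-256), so the sketch only needs the standard library.

```python
import base64
import hashlib
from email.utils import formatdate
from urllib.parse import urlsplit


def body_digest(body: bytes) -> str:
    """Value of the Digest header: base64 of the SHA-256 hash of the body."""
    return "SHA-256=" + base64.b64encode(hashlib.sha256(body).digest()).decode("ascii")


def signed_headers(inbox_url, body, key_id, sign, date=None):
    """Return (headers, signing_string) for a signed POST to inbox_url.

    sign: callable taking the signing string as bytes and returning the raw
          RSA-SHA256 signature bytes (assumed to be supplied by the caller).
    """
    parts = urlsplit(inbox_url)
    target = parts.path + (f"?{parts.query}" if parts.query else "")
    headers = {
        "Host": parts.netloc,
        "Date": date or formatdate(usegmt=True),
        "Digest": body_digest(body),
    }
    # Header names are lower-cased, and the order must match the
    # "headers" list given in the Signature header below.
    signing_string = "\n".join([
        f"(request-target): post {target}",
        f"host: {headers['Host']}",
        f"date: {headers['Date']}",
        f"digest: {headers['Digest']}",
    ])
    signature = base64.b64encode(sign(signing_string.encode("utf-8"))).decode("ascii")
    headers["Signature"] = (
        f'keyId="{key_id}",algorithm="rsa-sha256",'
        f'headers="(request-target) host date digest",signature="{signature}"'
    )
    return headers, signing_string
```

The body passed in has to be byte-for-byte what is sent, since the digest is computed over it; serialising the JSON twice with different options is one easy way to end up with a signature that doesn't verify.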
Unsure if I want to store as much stuff on disk as possible, or in a database.
Darius's tool stores followers in a single TEXT field in the sqlite database. Why not store them in a file?
Well, I implemented making posts and saving them in a file, but then I discovered that resolving mentions also needs caching, so remote account data should be saved in a file too. You identify remote accounts by username and domain.
Except you also need remote account data when distributing to followers, identified by URI.
So maybe accounts should be in a database after all!
And in fact, to produce a list of published notes, it would help to have them in a database so you can filter on visibility.
Several hours in: I have tables for local account, remote account, note, and "remote account following local account". Might need tables for likes and boosts (Announce activities) if I want to expose those properly, but it doesn't look like Mastodon does.
I think it's a mistake to have a separation between remote and local data in the database.
Tried to put both local and remote accounts in one Actor model: there's loads of stuff that you end up marking as "local only". So in fact Mastodon's arrangement, with separate "Account" and "Actor" tables, makes sense: both remote and local actors have entries in the small Actor table, and local actors also have a linked Account table with other stuff.
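A rough sketch of that two-table arrangement, using the standard library's sqlite3 module instead of Django models so that it runs on its own. The table and column names are invented for this example and are not the ones in my code: every actor, local or remote, gets a row in actor, and local actors get an extra row in account holding the local-only data.

```python
import sqlite3

SCHEMA = """
CREATE TABLE actor (
    id INTEGER PRIMARY KEY,
    uri TEXT NOT NULL UNIQUE,          -- used when delivering to followers
    username TEXT NOT NULL,
    domain TEXT NOT NULL,
    inbox_url TEXT NOT NULL,
    UNIQUE (username, domain)          -- used when resolving mentions
);
CREATE TABLE account (
    actor_id INTEGER PRIMARY KEY REFERENCES actor(id),
    private_key_pem TEXT NOT NULL      -- only local actors have one
);
"""


def open_database(path=":memory:"):
    """Open a database and create the two tables."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def add_actor(conn, uri, username, domain, inbox_url, private_key_pem=None):
    """Insert an actor; passing a private key marks it as a local account."""
    cursor = conn.execute(
        "INSERT INTO actor (uri, username, domain, inbox_url) VALUES (?, ?, ?, ?)",
        (uri, username, domain, inbox_url),
    )
    if private_key_pem is not None:
        conn.execute(
            "INSERT INTO account (actor_id, private_key_pem) VALUES (?, ?)",
            (cursor.lastrowid, private_key_pem),
        )
    return cursor.lastrowid


def is_local(conn, uri):
    """True if the actor with this URI has a linked account row."""
    row = conn.execute(
        "SELECT 1 FROM actor JOIN account ON account.actor_id = actor.id "
        "WHERE actor.uri = ?",
        (uri,),
    ).fetchone()
    return row is not None
```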
I want to store mentions of local accounts by remote accounts.
If it's possible for one local account to mention another local account, then it makes no sense to have two separate Mention models.
Maybe the Note model should have local_actor and remote_actor fields, one of which is null.
Updating posts was tricky: it looks like Mastodon rejects updates if the Note object doesn't have an updated field.
This isn't mentioned in the ActivityPub spec, but there's an entry in the Activity Streams spec.
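A minimal sketch of an edit that includes the field, assuming the note is already held as a dictionary. The function name build_note_update is hypothetical, and copying the note's addressing onto the activity is my assumption about what a receiving server will want.

```python
from datetime import datetime, timezone


def build_note_update(note, actor_url, activity_id, updated=None):
    """Build an Update activity for an edited Note.

    A copy of the note is sent with its "updated" timestamp set, since an
    Update without one appears to be ignored by Mastodon.
    """
    updated = (updated or datetime.now(timezone.utc)).astimezone(timezone.utc)
    edited = dict(note, updated=updated.strftime("%Y-%m-%dT%H:%M:%SZ"))
    return {
        "@context": "https://www.w3.org/ns/activitystreams",
        "id": activity_id,
        "type": "Update",
        "actor": actor_url,
        "to": note.get("to", []),
        "cc": note.get("cc", []),
        "object": edited,
    }
```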
To get this to work on aperiodical.com, I added some rules to the nginx configuration to proxy everything under the root URL /activitypub, as well as /.well-known/webfinger, to my Django server.
I tried to minimise the amount of abstraction in my code, so that I didn't waste loads of time implementing bits of the protocol that I don't need or doing stuff awkwardly in order to allow a use case that I don't want.
The only concession to abstraction I've made is that the ActivityPub inbox view has a list of InboxHandler classes that effectively work as filters on the incoming activity object, in order to produce different side-effects.
A separate Django app can register a new inbox handler class, and associate it only with certain actors or domains.
I've used this to add a handler for the aperiodical.com domain which will send a message to our Slack channel when the maths news account is mentioned by someone else. My aim is that people can send us tips for bits of news we haven't seen yet, and replies to the account's posts might give information that we can use in the monthly blog post.
So, as I finish this post, I have @mathnews@aperiodical.com, and it's been discreetly posting bits of news for a couple of weeks.
Re-use my code, if you dare?
I've published this both for my own benefit, and because there's probably some insight in there that someone else will be missing when they come to implementing ActivityPub.
I don't really want to commit to maintaining my code for other people to use: I've got more than enough projects on the go already. But I'm sharing the code as-is, because it doesn't cost anything.
I've put my code (without the Aperiodical-specific bits) on GitHub at github.com/christianp/django-activitypub-bot.
Along the way, I found Helge Krueger's bovine project, which seems to do a lot of the same things but is written in a more modular fashion so should be adaptable to more use cases. Try that first!