Every toot I've ever written, in alphabetical order

In the shower this morning, one of those stupid thoughts formed itself in my mind:

I wonder what the entire corpus of toots looks like, in alphabetical order?

I can sort of answer this: I'm one of the admins of mathstodon.xyz, a big-ish Mastodon instance. I could run a query to get the text from every toot the server is aware of.

The immediate concern is whether this could reveal any private information. I don't think it would, unless I did it twice in quick succession and someone reconstructed a private toot from the difference between the two outputs. Just to be safe, I've only done it on my own posts.
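
To make that worry concrete, here is a minimal sketch of what the difference between two dumps would give away. The function name and the sample strings are made up for illustration. Subtracting character counts recovers only which characters the new post contains, not their order, so whoever did this would still have an anagram to solve.

```python
from collections import Counter

def new_characters(before: str, after: str) -> str:
    """Return the characters that `after` has gained over `before`, sorted."""
    gained = Counter(after) - Counter(before)
    return ''.join(sorted(gained.elements()))

# Hypothetical dumps taken just before and just after one new post, "hi"
before = ''.join(sorted("abc"))
after = ''.join(sorted("abc" + "hi"))
print(new_characters(before, after))  # hi
```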

I ran this query on the postgres database:

sudo -u postgres psql mastodon_production -c "copy (select text from statuses where account_id=1) to '/tmp/cp_statuses.csv' with (format csv);"

(A normal Mastodon user without direct database access could instead request an export of their account data and wait a few minutes.)
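
For anyone going the export route, here is a rough sketch of pulling the post text out of the archive. It assumes the unpacked archive contains an `outbox.json` in ActivityPub format, with an `orderedItems` list whose `Create` activities carry the post as HTML in `object.content`; I haven't checked this against every Mastodon version, so treat the field names as assumptions. The helper names are mine.

```python
import json
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collect only the text nodes of an HTML fragment."""
    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)

def strip_html(html):
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return ''.join(parser.parts)

def texts_from_outbox(outbox):
    """Return the plain text of each post in a parsed outbox.json (assumed layout)."""
    texts = []
    for activity in outbox.get('orderedItems', []):
        obj = activity.get('object')
        # Boosts are assumed to have a URL string as their object, so skip them
        if activity.get('type') == 'Create' and isinstance(obj, dict):
            texts.append(strip_html(obj.get('content') or ''))
    return texts

# Usage, assuming the archive was unpacked into the current directory:
# with open('outbox.json', encoding='utf-8') as f:
#     texts = texts_from_outbox(json.load(f))
```

The resulting list of strings can then be joined and sorted in the same way as the rows of the CSV file.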

That produced a CSV file with one row per post, each containing the post's text.

Then I stripped out the whitespace and sorted every character alphabetically (strictly speaking, by Unicode code point) in Python:

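import csv
import re
from itertools import chain

# newline='' lets the csv module handle newlines inside quoted fields
with open('cp_statuses.csv', newline='', encoding='utf-8') as f:
    rows = list(csv.reader(f))

# Join every field of every row into one long string
# (called all_text so it doesn't shadow the built-in all())
all_text = ''.join(chain.from_iterable(rows))

with open('cp_statuses_sorted.txt', 'w', encoding='utf-8') as f:
    # Drop all whitespace, then sort the remaining characters
    ordered = ''.join(sorted(re.sub(r'\s', '', all_text)))
    # Write the result out in lines of 80 characters
    for i in range(0, len(ordered), 80):
        f.write(ordered[i:i+80] + '\n')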
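
As a quick check of what "alphabetical" means here, the same steps can be run on a couple of made-up posts. Python's `sorted` compares characters by Unicode code point, so punctuation and digits come first, then all the capitals, then the lowercase letters. The helper name `sort_characters` and the sample posts are mine, purely for illustration.

```python
import re

def sort_characters(posts, width=80):
    """Strip whitespace, sort all characters, and wrap into lines of `width`."""
    text = ''.join(posts)
    ordered = ''.join(sorted(re.sub(r'\s', '', text)))
    return [ordered[i:i + width] for i in range(0, len(ordered), width)]

lines = sort_characters(["Hello, world!", "2 + 2 = 4"], width=10)
print(lines)  # ['!+,224=Hde', 'llloorw']
```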

And that produced this: the text of every Mastodon post I've ever written, in alphabetical order.

So there you go!