I Had 4,291 Twitter Bookmarks I Knew Nothing About and Found a Map of My Own Mind

AI LLMs Python Productivity Personal

Twitter is my most used social app. And apparently I have been using it to build a knowledge base for years without realising it.

That is not how I would have described my relationship with Twitter before this project. I would have said I use it to follow AI news, watch the fintech space, and stay connected to what is happening in tech. All of that is true. What I did not account for was the bookmarks.

Every time something caught my eye — a paper, a thread, a tool, a take I wanted to sit with — I saved it. One tap. No system, no folder, no intention to ever organise them. Just a quiet accumulation of things that felt worth pausing for.

I had no idea I had done that 4,291 times.

A tweet from Andrej Karpathy planted the seed: your digital footprint is a more honest portrait of you than anything you would consciously curate. I had years of unexamined bookmarks sitting right there. So I decided to build something to finally look at them.


Step 1 — Getting the data out with ft

Twitter does not make bookmark export easy. The official API is restrictive and the UI is useless for bulk work. I used ft to sync my bookmarks programmatically, pulling everything into a structured JSON format I could actually work with. Unsexy but necessary — you cannot build anything without clean data.
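If you want a starting point for the "structured JSON I could actually work with" part, here is a minimal Python sketch of loading and normalising an export. The file layout and the field names (`id`, `text`, `url`, `media`) are assumptions for illustration, not ft's documented schema, so adjust them to whatever your export actually contains.

```python
import json
from dataclasses import dataclass
from pathlib import Path


@dataclass
class Bookmark:
    id: str
    text: str
    url: str
    has_video: bool


def load_bookmarks(path):
    """Load an exported bookmark file and drop duplicates by id.

    Assumes the file holds a JSON list of objects with the hypothetical keys
    "id", "text", "url" and an optional "media" list of {"type": ...}.
    """
    raw = json.loads(Path(path).read_text(encoding="utf-8"))
    seen = {}
    for item in raw:
        bookmark_id = str(item["id"])
        if bookmark_id in seen:
            continue  # keep the first copy of a repeated bookmark
        media = item.get("media") or []
        seen[bookmark_id] = Bookmark(
            id=bookmark_id,
            text=(item.get("text") or "").strip(),
            url=item.get("url", ""),
            has_video=any(m.get("type") == "video" for m in media),
        )
    return list(seen.values())
```

The `has_video` flag is what decides which bookmarks go through the transcription step next.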

Step 2 — Transcribing video bookmarks with Whisper v3

A meaningful chunk of my bookmarks were videos — AI explainers, conference talks, product demos. Text-only analysis would miss half the picture. So I downloaded those videos and ran them through Whisper v3, converting spoken content into text I could feed into the pipeline alongside tweet text.

This was one of the more satisfying parts of the build. A video of someone explaining a paper became a searchable, categorisable document. Knowledge that was locked in audio became part of the corpus.
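A rough sketch of this step, assuming the videos are already downloaded as `.mp4` files into one folder and that you use the open-source `openai-whisper` package with its `large-v3` checkpoint (both are assumptions; a hosted Whisper endpoint would work just as well). The transcriber is passed in as a function so the loop can be tried out without loading the model.

```python
from pathlib import Path


def whisper_transcriber(model_name="large-v3"):
    """Return a function that transcribes one media file with openai-whisper.

    The import is lazy so the rest of the module works without the package.
    """
    import whisper  # pip install openai-whisper (ffmpeg must be installed too)

    model = whisper.load_model(model_name)

    def transcribe(path):
        return model.transcribe(str(path))["text"].strip()

    return transcribe


def transcribe_videos(video_dir, transcribe, pattern="*.mp4"):
    """Write <name>.txt next to each video and return {video stem: transcript}."""
    transcripts = {}
    for video in sorted(Path(video_dir).glob(pattern)):
        out = video.with_suffix(".txt")
        if out.exists():
            # Resume support: reuse transcripts from an earlier run.
            text = out.read_text(encoding="utf-8")
        else:
            text = transcribe(video)
            out.write_text(text, encoding="utf-8")
        transcripts[video.stem] = text
    return transcripts
```

In practice you would call `transcribe_videos("videos", whisper_transcriber())`, where `videos` is a hypothetical folder name. Caching each transcript to disk matters more than it looks, because a crash halfway through a long run does not cost you the files already done.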

Step 3 — Categorisation with Llama 3.3 via Groq

With clean text for every bookmark, I wrote a Python script to prompt Llama 3.3 70B via Groq to assign each one a category and tags. In the first attempt I ran it sequentially. It was going to take over two hours.

I switched to batch inference. Same job. Three minutes.

I knew batching was supposed to be faster. Experiencing a roughly 40× difference firsthand (over two hours down to three minutes) is something else. If you are building any pipeline that categorises or classifies a large corpus, batch inference is not optional. I also used Claude Code throughout the testing phase. It was instrumental for iterating on the prompt structure and validating outputs without constantly context-switching.
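A minimal sketch of the batching idea, not the exact script. It assumes the simplest form of batching: pack several bookmarks into one prompt and run a handful of requests concurrently. `call_llm` is a hypothetical function you supply that sends a prompt to the model (for example through Groq's chat completion endpoint) and returns the reply as a JSON string. The prompt wording, `batch_size` and `workers` are illustrative values, not tuned ones.

```python
import json
from concurrent.futures import ThreadPoolExecutor

CATEGORY_PROMPT = (
    "For each numbered bookmark, return a JSON list of objects with keys "
    '"id", "category" and "tags".\n\n'
)


def chunked(items, size):
    """Split a list into consecutive chunks of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def build_prompt(batch):
    """Pack several bookmarks into one prompt so one request labels many."""
    # Collapse whitespace so every bookmark stays on a single prompt line.
    lines = [f'{b["id"]}: {" ".join(b["text"].split())}' for b in batch]
    return CATEGORY_PROMPT + "\n".join(lines)


def categorise_all(bookmarks, call_llm, batch_size=20, workers=8):
    """Label every bookmark; `call_llm(prompt) -> JSON string` is supplied by you."""
    batches = chunked(bookmarks, batch_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        replies = list(pool.map(lambda b: call_llm(build_prompt(b)), batches))
    labels = {}
    for reply in replies:
        for row in json.loads(reply):
            labels[str(row["id"])] = {
                "category": row["category"],
                "tags": row["tags"],
            }
    return labels
```

With 4,291 bookmarks and 20 per request, that is 215 requests instead of 4,291, and they no longer wait on each other. A real version would also need retries and a check that every id came back, since models occasionally drop or mangle a row.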

Step 4 — Building the knowledge graph in Obsidian

With every bookmark tagged and categorised, I imported everything into Obsidian. Each bookmark became a note. Each category and tag became a link. The graph view rendered them all as a connected network.

What I saw genuinely surprised me. I expected a mess. What appeared instead were distinct clusters — dense galaxies of interconnected nodes, each one a domain I had been silently gravitating toward. Obsidian’s image preview let me see tweet screenshots and thumbnails inline, which made browsing the graph feel almost archaeological.
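For anyone wondering what "each bookmark became a note" looks like on disk: Obsidian notes are plain Markdown files, and `[[double bracket]]` links are what the graph view draws as edges. Here is a minimal sketch of writing one note per bookmark. The note layout, the file naming scheme and the dictionary keys (`id`, `text`, `url`) are my assumptions for illustration, not a required format.

```python
import re
from pathlib import Path


def slugify(text, max_len=60):
    """Make a filesystem-safe note name from arbitrary text."""
    slug = re.sub(r"[^A-Za-z0-9]+", "-", text).strip("-").lower()
    return slug[:max_len].rstrip("-") or "untitled"


def write_note(vault_dir, bookmark, category, tags):
    """Write one Markdown note whose [[category]] and [[tag]] links form edges."""
    vault = Path(vault_dir)
    vault.mkdir(parents=True, exist_ok=True)
    name = f'{bookmark["id"]}-{slugify(bookmark["text"])}.md'
    links = " ".join(f"[[{t}]]" for t in tags)
    body = (
        f"{bookmark['text']}\n\n"
        f"Source: {bookmark['url']}\n\n"
        f"Category: [[{category}]]\n"
        f"Tags: {links}\n"
    )
    path = vault / name
    path.write_text(body, encoding="utf-8")
    return path
```

Because every note in a category links to the same `[[category]]` target, the clusters fall out of the graph view on their own. No layout work is needed.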


What I found out about myself

The LLM generated 55 categories across all 4,291 bookmarks. The biggest clusters:

Category        Count
humor           787
ai-news         582
tool            544
opinion         484
design          442
entertainment   383
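The six categories in the table add up to 3,222 of the 4,291 bookmarks, which is about 75% of everything I saved. A tally like this is a few lines of Python once each bookmark has a category. The function name and the shape of the input (a flat list of category strings) are assumptions for illustration.

```python
from collections import Counter


def top_categories(categories, n=6):
    """Return the n most common categories and their share of all bookmarks."""
    counts = Counter(categories)
    top = counts.most_common(n)
    share = sum(count for _, count in top) / sum(counts.values())
    return top, share
```

Checking the share of the top few categories is a quick sanity test on the labelling. If the long tail holds most of the bookmarks, the prompt is probably inventing a new category for every tweet.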

A strong thread of product launches, open-source releases, and UI work runs through all of it. These are the things I save instinctively as a developer watching the space.

The surprise was not any single category. It was the connections between them. Ideas that felt unrelated in isolation turned out to share edges. The graph made visible something I had never sat down to articulate: the actual shape of how I think.

Humor being the single largest category — with nearly 800 bookmarks — was not something I consciously knew about myself. The graph made it undeniable.

Your bookmarks do not reflect who you think you are. They reflect who you actually are. There is a difference.


How to build this yourself

  1. Use ft (or the Twitter API) to export your bookmarks as structured data
  2. Download any video bookmarks and transcribe them with Whisper v3
  3. Write a Python script to call an LLM with a categorisation prompt via Groq — use batch requests, it is the single highest-leverage change you can make
  4. Import the tagged output into Obsidian as linked notes
  5. Open the graph view. Then sit with it for a moment.

The whole pipeline took a weekend to prototype. And what I got out the other end was not just organised data — it was a surprisingly honest self-portrait built from years of split-second decisions I had already forgotten making.

If you try this, I would genuinely love to see what your clusters look like.
