Tailor
Tailor is an experiment that makes it seamless to tweak your website, from the web. I build all my websites in code, and while that gives me the flexibility to design arbitrarily complex experiments, it makes tweaking them really hard. Having to context switch to my terminal, find the right tmux session, and make the right edit every time I want to try out a new color for a heading made playing around with these websites unreasonably painful.
What I ended up doing was using devtools or things like Tailscan to experiment in the browser, and then manually applying the edit in code once I got things into a good state. It felt wasteful to have to define my edit twice — once in the output code with devtools, and again in my source code. This was a job for a droid.
Then, I realized, I don’t have to live like this. I can build this droid. Three components make it possible:
- Good sourcemaps give me a really good starting point for where my edit “starts”, directly from the UI component. I don’t have to scour the entire codebase. (See the sketch after this list.)
- Webcontainers let me edit websites in the web. While this is not strictly necessary, having a seamless end-to-end edit flow massively helps with the pain of context switching to my terminal.
- LLMs give me a fuzzy reasoning engine that can map an edit in output HTML to a change in source code. Giving LLMs tools like LSPs makes them even better at making these edits.
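To make the sourcemap point concrete: in dev mode, Astro annotates rendered elements with `data-astro-source-file` and `data-astro-source-loc` attributes, and something like the hypothetical helper below could recover an edit’s starting point from a clicked element. This is a sketch, not Tailor’s actual lookup:

```ts
// Hypothetical helper: given a clicked element in the preview, recover
// the source location an edit should start from. Astro's dev server
// stamps elements with data-astro-source-file / data-astro-source-loc.
interface EditStart {
  file: string;   // e.g. "src/pages/index.astro"
  line: number;   // 1-indexed, as reported by Astro
  column: number;
}

function findEditStart(el: Element): EditStart | null {
  // Walk up the tree in case the clicked node itself isn't annotated.
  const annotated = el.closest<HTMLElement>("[data-astro-source-file]");
  if (!annotated) return null;

  const file = annotated.dataset.astroSourceFile!;
  const [line, column] = (annotated.dataset.astroSourceLoc ?? "1:1")
    .split(":")
    .map(Number);
  return { file, line, column };
}
```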
That’s a bit of a mouthful, so let’s walk through an example.
- You want to edit the CSS class on a heading to `text-rose`. Select Tailor, go to the class list, and edit it.
- Tailor gets to work, starting from the context you gave it and exploring the codebase to find the right place to make the edit. You get feedback on what Tailor is doing the whole time.
- Commit your changes from the browser by hitting the check mark. While this is ungated for the demo site, you can imagine this being protected behind some authentication.
The other part of Tailor that really excites me is the faint possibility it opens up of going back to the old days of the web, when you could actually copy/paste someone’s HTML as a starting point for your website. Tailor as a standalone tool doesn’t let you do that, but I think there are plausible extensions of the idea that could easily get there.
Future Direction
So what’s next for Tailor? I’ve spent way too much time on it and I’m pretty happy with where it’s ended up, so it stays an experiment for now. People I’ve spoken to also aren’t clear about the value add: designers hate the UI, and devs find it too slow and dumb (“I could’ve done this so much quicker”).
There are a couple of possible things I might do in the short-term:
- Actually make the LSP run in the browser.
- Integrate Tailor with my blog so I can use it on a regular basis.
The long-term idea I’m most excited about is a way to improve the speed of Tailor. The idea is to do a lot of the LLM work at compile time, building kernels for common edits. The algorithm would roughly work like this: for every element in the output HTML, and for a range of possible common edits, produce a function `edit(editType, editValue)` that applies the edit to the right part of the source code. There are tons of nuances around this idea because of how general code can be, but I think it’s a promising long-term direction for making this really fast.
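A minimal sketch of what those kernels could look like, assuming a compiler pass that stamps a stable id onto every output element. All of the names here are hypothetical; the only shape the idea pins down is `edit(editType, editValue)`:

```ts
// Hypothetical compile-time kernels: an LLM pass at build time produces
// one kernel per (element, edit type) pair, so applying a tweak at
// runtime is a lookup plus a cheap string splice instead of an LLM call.
type EditType = "class" | "text" | "style";

interface SourceEdit {
  file: string;
  start: number; // 0-indexed character offset into the source file
  end: number;
  replacement: string;
}

// Each kernel knows exactly where in the source its edit lands.
type EditKernel = (editValue: string) => SourceEdit;

// Keyed by a stable element id stamped onto the output HTML at build time.
const kernels = new Map<string, Map<EditType, EditKernel>>();

function edit(elementId: string, editType: EditType, editValue: string): SourceEdit {
  const kernel = kernels.get(elementId)?.get(editType);
  if (!kernel) throw new Error(`no precomputed kernel for ${elementId}/${editType}`);
  return kernel(editValue);
}
```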
Another interesting UI direction is leveraging more generative UI to build tools you need, but this isn’t something I’m too excited about building.
How does it work?
When you click “Get Started”, Tailor navigates to a different subdomain that’s running a different codebase completely. This codebase pulls in the Tailor code and starts up a Webcontainer with that code.
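For reference, booting a container looks roughly like this with the WebContainer API. The file tree and commands below are placeholders, not Tailor’s actual setup:

```ts
import { WebContainer } from "@webcontainer/api";

// Boot a container in the page, mount the site's source tree, and start
// the dev server.
const container = await WebContainer.boot();
await container.mount({
  "package.json": { file: { contents: '{ "name": "demo-site" }' } },
  // ...the rest of the site's files
});

const install = await container.spawn("npm", ["install"]);
await install.exit; // wait for install to finish before starting dev
await container.spawn("npm", ["run", "dev"]);

// The preview URL becomes available once the dev server is listening.
container.on("server-ready", (port, url) => {
  console.log(`preview running at ${url} (port ${port})`);
});
```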
Something that might not be obvious is that the Tailor tool runs in the host application code itself; in fact, you can see it here. The reason for this is that the Astro “sourcemaps” are only accessible from a same-origin `iframe` due to web security restrictions, and the Webcontainers `iframe` code is closed source. So when you make an edit from Tailor, the inner `iframe` sends a `postMessage` to the outer app, which handles making the API calls for the edits, updating the code, and refreshing the webcontainer.
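A rough sketch of that bridge. The message shape and origins here are illustrative, and the two handlers are stubs for whatever the outer app actually does:

```ts
// Inner (same-origin) iframe: forward the user's edit to the host app.
window.parent.postMessage(
  {
    type: "tailor:edit",
    file: "src/pages/index.astro", // illustrative payload
    editType: "class",
    editValue: "text-rose",
  },
  "https://host.example.com" // the outer app's origin
);

// Outer app: receive the edit, apply it, and refresh the preview.
declare function applyEditViaApi(edit: unknown): Promise<void>; // stub
declare function refreshWebcontainer(): Promise<void>; // stub

window.addEventListener("message", async (event) => {
  if (event.origin !== "https://preview.example.com") return; // only trust the inner iframe
  if (event.data?.type !== "tailor:edit") return;

  await applyEditViaApi(event.data);
  await refreshWebcontainer();
});
```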
The complicated UI pieces are all done with Cannon. The backend is just some fancy prompt engineering. GPT-4 failed often enough that I wrote a simple testing setup to fine-tune prompts and rapidly iterate outside of a UI context.
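In spirit, that testing setup was just a loop like the one below. The test cases and the pass check are stand-ins; the point is only that it was a simple harness for iterating on prompts:

```ts
import OpenAI from "openai";

const openai = new OpenAI();

// Stand-in cases: a snippet, a requested edit, and a substring the
// model's output must contain to count as a pass.
const cases = [
  {
    snippet: '<h1 class="text-blue-500">Hello</h1>',
    edit: "Change the heading's class to text-rose-500.",
    expect: "text-rose-500",
  },
];

for (const c of cases) {
  const res = await openai.chat.completions.create({
    model: "gpt-4-0613",
    // Everything goes in the user message; see the notes below on how
    // that worked better than the system message for this task.
    messages: [{ role: "user", content: `${c.edit}\n\n${c.snippet}` }],
  });
  const out = res.choices[0].message.content ?? "";
  console.log(out.includes(c.expect) ? "PASS" : "FAIL", c.edit);
}
```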
Things I learnt
I’m going to try to write longer articles about each of these different pieces, but here is a list of things I learnt:
- Don’t spend so much time building things before you figure out the messaging. I’d heard this a billion times before, but you can’t really tell people anything, and I hope I’ve learned this lesson for good. The best week of development I had on this project was the week I spent a couple of hours sketching things out in a notebook and really ironing out the “MVP” story.
- The nuances of the JSON-RPC format that LSPs accept, and how that interacts with webcontainers. This is a long story, but if you type `astro-lsp --stdio` into your Unix terminal and try to type things according to spec, nothing happens. This is because the LSP spec requires you to end lines with `\r\n`, which is a `CRLF`, but in Unix terminals carriage returns are typically swallowed by your terminal program. You can disable this by telling `stdin` to work in raw mode, but LSPs typically don’t do this (at least not with Volar). This is particularly painful when trying to integrate LSPs with WebContainers, since WebContainers use a TTY for the input of a process. (See the framing sketch after this list.)
- I learnt how to use Figma to make edits to a vector! I really wanted the split-scissor look, but I only had a whole scissor, and I had to do a lot of work to make it split nicely. I understood why vector networks are awesome and a huge improvement over the past.
- Typebox is awesome
- Writing a Vite (rollup) plugin for the first time: it took all the local files I had and packaged them into a JSON file for consumption by Cannon.
- Be really careful with 1 vs 0 indexing issues. Editors like codemirror use a 1-indexed representation, since they deal with line numbers as seen by the user, but nearly every other bit of string manipulation is 0-indexed. This is painful and will cause lots of unnecessary frustration; define the 1- vs 0-indexed boundaries very clearly early on to avoid it.
- UI:
  - It’s not that hard to build a devtools-like UI.
  - Writing codemirror extensions requires a ridiculous amount of two-way sync.
- LLMs:
  - LLMs are really stupid. GPT-4-0613 (last year’s model) worked the best for me, but is really expensive.
  - Making code edits with LLMs requires really thinking through the tool interface for the edit. The interface I settled on was letting the LLM replace text in code snippets that it sees.
  - The user message seems to work much better than the system message for GPT-4.
  - Groq is incredibly quick and, with Llama 3, OK at making edits. It’s much worse than GPT-4, but I’m really excited to see open-source models improve to the point where these edits can be really, really fast with Groq.
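Since the LSP framing issue was the one that bit me hardest, here is what “typing things according to spec” actually means: each JSON-RPC message is preceded by a `Content-Length` header, and the header lines must end in `\r\n`. A minimal sketch against a local language server over piped stdio (the `astro-lsp` command is taken from the anecdote above; the framing itself is the standard LSP base protocol):

```ts
import { spawn } from "node:child_process";

// Spawn the language server over stdio. Piping (rather than a TTY)
// sidesteps the line-discipline problem: nothing swallows our CRLFs.
const lsp = spawn("astro-lsp", ["--stdio"]);

// LSP base protocol framing: a Content-Length header, a blank line,
// then the JSON body. Header lines MUST end with \r\n (CRLF), which is
// exactly the byte an interactive Unix terminal eats.
function send(message: object): void {
  const body = JSON.stringify(message);
  lsp.stdin.write(`Content-Length: ${Buffer.byteLength(body, "utf8")}\r\n\r\n${body}`);
}

send({
  jsonrpc: "2.0",
  id: 1,
  method: "initialize",
  params: { processId: process.pid, rootUri: null, capabilities: {} },
});

lsp.stdout.on("data", (chunk) => console.log(chunk.toString()));
```

This is also why WebContainers are awkward here: they put a TTY in front of a process’s stdin, so the raw `\r` never makes it through unless the server switches stdin to raw mode.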