Not an audio post, just some thoughts on programming hackery, see also my old similar posts On Keyboards and Long Live 16 Color Terminals.
I think we have now entered the “Golden Age” of programming and of using computers in general, all thanks to AI. That’s because programming tasks that once took days can now be finished in a couple of hours. A personal example of this is my experience extending the Emacs editor. This is an ages-old “universal” editor: its original concept was created in 1976 at the MIT AI Lab, while the GNU Emacs program that I use today was created in the mid-80s. However, it is still popular today among programmers and geeks thanks to its infinite possibilities for customization and for embracing new technologies. I use it every day and continue customizing it to my needs.
Customizing Emacs is done by writing some Emacs Lisp code. If your need is simple, like creating a custom action or fixing some annoying behavior, a small code snippet usually suffices. If your demand is more serious, like enabling code highlighting and completion for some obscure programming language, you need to create a relatively big chunk of code, which in the Emacs world is called a “package.” If you are lucky, someone has already faced the same problem and written a package to solve it. In that case, the only thing you need to do is plug this package into your configuration.
If your need is more or less unique for some reason, then you have to write the code yourself. It’s not super hard, but it is tedious, mainly because you need to study the extensive Emacs APIs and figure out how to express the solution in terms of list-structure manipulation. You also need to write code that handles errors, and make sure that your solution works at an acceptable speed. To accomplish all this, you used to have to comb through the Emacs documentation and source code, and, if nothing helped, resort to seeking help on online forums. This is what programming used to be. Since I usually had to deal with this stuff during my work hours, the thought of writing a complex Emacs extension in my spare time sparked no joy.
Enter the Age of AI. Now, if I need to solve some simple Emacs customization task, or fix an annoyance, I can simply ask an LLM, “In Emacs, how do I … ?”, and it comes back with a helpful answer and a snippet of Emacs Lisp code that does the thing I asked about. In this way, I quickly resolved minor Emacs “friction points” that had bugged me for years. One big annoying thing still remained, though: adding support for properly displaying the output of semi-interactive build and install scripts and programs in the “shell” and “compilation” modes of Emacs. To explain this problem, I first need to give a brief lesson in computer history.
The interface between humans and computers has evolved constantly since their inception. At first, humans had to program computers, that is, set them up for solving a particular problem, by patching cables between physical ports or flipping myriad switches (examples from Wikipedia). Computers, in turn, presented the results of their work using arrays of lights. At the next step of interface evolution, people could enter data into computers using manually perforated cards, and the computer could print the calculation results on long rolls of paper, using motorized versions of typewriters. Neither kind of interface was very interactive, and both required a good amount of planning ahead in order to avoid wasting precious compute time.
Interactivity improved when the aforementioned motorized typewriters were combined with a typewriter-like keyboard (see this article about the legendary “Model 33” teletype). Finally, humans could type their commands or code in, and the computer could print the result immediately. You might have noticed that this is already very similar to how we interact with AI assistants on the Web today, except that teletypes were much noisier.
Crucially, even at this early stage, a distinction between “symbol” and “control” characters was already apparent. When you look at a printed page, you only see letters of the alphabet and punctuation: those are symbols. However, to produce that page, the typewriter operator (be that a human or a computer) also needed to use commands that drive the typewriter actions, like advancing the paper by one line, or moving the print head forward or back. Each of these actions arrived at the typewriter over the same line as the printable symbols, encoded as a control character.
Thus, when a computer sends the result to a typewriter terminal, symbol characters are interleaved with control characters in the same “stream” of data. Same thing for the human—although most of the typewriter keys are for inserting symbols, some keys like “carriage return” and “backspace” are for sending commands. The computer also has a bonus command called “bell” which it had adopted from telegraphy (so it actually predates computers). This command originally rang a physical bell inside the teletype machine. Nowadays, the computer just emits a short beep in order to attract the operator’s attention.
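These teletype-era control characters survive unchanged in ASCII and in every modern encoding. Here is a small sketch in Python (the descriptions are mine) showing a few of them, and how interleaving one with printable symbols drives the output device:

```python
# A few control characters inherited from the teletype era,
# with their ASCII codes:
controls = {
    "bell (rang a physical bell, now a beep)": "\a",
    "backspace (move the print head one step back)": "\b",
    "carriage return (move the head to column 0)": "\r",
    "line feed (advance the paper by one line)": "\n",
}

for description, ch in controls.items():
    print(f"{ord(ch):3d}  {description}")

# Control characters travel in the same stream as printable symbols:
# this prints "working...", then carriage return moves the cursor back
# to column 0, and the next print overwrites it.
print("working...", end="\r")
print("done.      ")
```

On a modern terminal emulator this behaves just as it did on a teletype, except that backspace and carriage return move a cursor instead of a mechanical head.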
Despite being quite basic, this typewriter interface had already opened a
way to create interactive text editor programs. The most well known
line-oriented text editor is ed. Its interface was designed to be very
minimalist and terse, in order to save paper. ed was called “the standard
text editor” in the Unix documentation, a phrase that lives on as a hacker’s
joke. In fact, ed still ships with most Unix-descended OSes, including
Linux distributions.
Upon launching ed, you see no prompt; the program simply waits for your
commands. A command is typically just one character, plus a parameter. If
ed does not understand your command, it prints ?, and that’s it. Since the
file you are editing can be lengthy, ed does not display its contents by
itself; instead, you have to explicitly ask it to show a region of lines,
and of course you cannot edit them “in place”: you have to enter a command
for each edit. Also, since a typewriter offers no way to correct typing
mistakes in place, there was no concept of “line editing” with
“cursor left / right” commands; you could only use “backspace” and then
re-type part of your command, or discard the entire command line and
re-type it from scratch. Needless to say, editing text files with these
basic capabilities required very good memory, skills, and a lot of patience.
Nevertheless, it was the best user interface that programmers had at that
time, and in fact the early Unix OS code was written using ed. As Brian
Kernighan recalls in his book “UNIX: A History and a Memoir”,
there were three main
components that allowed developing Unix for the PDP-7 computer on the
computer itself: an editor (and that was ed!), an assembler, and a
kernel.
In the next evolutionary step, paper teletypes were replaced by “glass teletypes.” These used CRT displays instead of paper, and their keyboards started resembling modern ones. Note that these early glass teletypes lacked a scrollback buffer, so lines that scrolled off the screen were gone forever. In some sense, this was a downgrade from paper rolls. On the other hand, since typed characters appeared on a screen, it became possible to make in-place corrections to the typed command text, and even move the cursor left and right to fix typos in the middle of the command. No more retyping!
The capabilities of this new type of terminal spurred
improvements to the ed editor. First, it gained a command for showing a
whole page of text from the file being edited, taking advantage of the
silent nature of video terminals. This version was called
em. Note that em
still had to be conservative in what it displayed, because the connection
line between the terminal and the computer was often painfully
slow. Implementing in-place editing for a whole document (“visual”
editing) was not yet possible.
A lot of standard Unix utilities like bash, cat, du still operate in
a similar line-oriented mode and thus are still technically compatible with
typewriter-style terminals. Emacs exploits this by emulating a
“dumb” terminal (a terminal that only understands the most basic control
characters) in its “shell” mode. But since Emacs runs on a real computer,
its shell mode is much smarter: it can hold the screen history of your
entire session, and you can go back to any previous command, change it as
needed, and send it again.
Returning to our history: similar capabilities also appeared in new generations of terminals that got their own CPU and RAM, and thus could hold in their memory much more than just one screen of text. They also got colors! The companies making them coined the term “smart terminal.” Terminal technology became a hot topic among technology companies (very much like AI these days), and there was a “Cambrian explosion” of terminal models, each with its own set of features.
These new features of smart terminals gave birth to a whole new set of control commands. Since the controlled display area had expanded from a single line into a two-dimensional array, there were now commands to control the cursor position on the screen, and to perform screen clearing and scrolling. For compatibility, and due to technical constraints, these new commands were not single characters anymore (as “backspace” and “carriage return” are), but entire sequences of characters, starting with the “escape” control character.
The terminal control commands were still sent inline with the printable symbols. When a smart terminal saw a command, it processed it immediately. This usually resulted in a change of cursor position, a change of the current color, enabling bold type, or something else (one can see a certain similarity to HTML here, except that unlike HTML tags, control commands have no closing pair). If a symbol character arrived outside of a command, the terminal just printed it. To get a sense of how many terminal models and command dialects there were, take a look at the “terminal information database” here.
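The escape-prefixed commands of this kind that modern terminal emulators still understand are the ANSI (ECMA-48) escape sequences. A small Python sketch with a few widely supported ones (the variable names are my own):

```python
ESC = "\x1b"  # the "escape" control character that opens each sequence

red_on     = ESC + "[31m"  # set foreground color to red
bold_on    = ESC + "[1m"   # enable bold type
reset      = ESC + "[0m"   # reset all colors and attributes
cursor_up2 = ESC + "[2A"   # move the cursor two lines up
erase_eol  = ESC + "[K"    # erase from the cursor to end of line

# Unlike HTML tags, sequences have no closing pair: an attribute
# stays in effect until another sequence (here, reset) changes it.
print(red_on + bold_on + "error:" + reset + " something went wrong")
```

Run this in a terminal and the word “error:” comes out bold and red; pipe it into a file instead, and you see the raw escape bytes, which is exactly the “garbage” problem discussed later.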
Finally, smart terminals created the possibility of visual editing, and
the ex editor was reworked into vi. The first versions of vi were
implemented by relying on ex running in visual mode. You could still
use the same commands that ex had inherited from ed, but you could also
navigate and scroll through the document you were editing. Modern versions
of vi still support these two modes of operation.
Of course, other visual versions of existing UNIX system utilities
started to appear. For example, top is a visual, interactive counterpart
of ps (process listing), and more and less are visual pagers,
offering an alternative to line-oriented cat.
By the way, that terminal information database I mentioned above was not
created just for lessons in computer history. In fact, it solved the
problem of standardizing the “terminal zoo.” When a visual
program runs, it needs to know what the terminal is capable of, and also
the exact control sequence for each terminal command
(remember that there were hundreds of smart terminal models). There was a
library called
curses
which acted as a translator between a program and a
terminal. Unfortunately, a lot of modern command-line scripts and utilities
are unaware of this translation mechanism, and use control sequences
“blindly,” assuming that the terminal can interpret them
correctly. This mostly works because these days we use “terminal
emulator” programs that typically understand the same set of control
sequences. But when that is not the case, the user sees a flurry
of cryptic sequences starting with ^[ (the escape character) in the
program’s output.
Besides the need for standardization, another interesting engineering
aspect that emerged was terminal virtualization. Since, as I mentioned
previously, terminal control characters travel in the same stream as the
program’s output, and cursor control commands travel in the same stream as
the program’s input, the standard UNIX mechanism of I/O stream redirection
made it possible to emulate terminal behavior within a visual
program. Normally a visual
program assumes that it uses the entire screen of a physical terminal (the
terminal provides its dimensions in rows and columns). But if one program
launches another, it can direct the I/O streams of the child process into
itself, and maintain a virtual terminal for it. For example, the parent
program may run several child processes and maintain a virtual screen for
each of them (this is what the utility called screen does). Or it can
allocate a subsection of the terminal (half of a screen, for example) to a
child process, and run two of them side by side. Programs of this kind are
called “terminal multiplexers,” with tmux
being a well-known example.
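The mechanism underneath is the Unix pseudo-terminal: the parent holds the “master” end of a virtual terminal, while the child is attached to the “slave” end and believes it is talking to real hardware. A minimal sketch using Python’s standard pty module (Unix-only; the function name is mine):

```python
import os
import pty

def run_under_pty(argv):
    """Run a command under a virtual terminal and capture its raw
    output, control sequences included."""
    pid, master_fd = pty.fork()
    if pid == 0:
        # Child: its stdin/stdout are now the slave end of the pty,
        # so it behaves as if connected to a real terminal.
        os.execvp(argv[0], argv)
    chunks = []
    while True:
        try:
            data = os.read(master_fd, 1024)
        except OSError:  # raised on Linux when the child closes the pty
            break
        if not data:
            break
        chunks.append(data)
    os.close(master_fd)
    os.waitpid(pid, 0)
    return b"".join(chunks)

out = run_under_pty(["echo", "hello"])
```

A multiplexer like screen does essentially this for every window, feeding each child’s output into its own virtual screen instead of a byte string.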
An interesting question emerges: since there is only one user, with only
one keyboard, if a terminal multiplexer is running two other visual
programs side by side, which one receives the input? The answer: the
input is received by the parent program, which then sends it to the child
process that currently has “focus.” To give the user the ability to switch
focus, or to send any other commands to the parent program, they need to
prefix their input with a “control sequence” (or “escape sequence”). For
example, for screen, the default control sequence is Ctrl-a. When
screen receives it, it knows that it should not relay
what follows to the child process, but rather interpret it
itself.
The virtualization can be arbitrarily nested. For example, you can launch
screen, and inside it launch another instance of it, but you need to be
careful with control sequences. screen has a “send escape”
command: to deliver a control command to the nested screen, you first send
this command to the parent screen, which executes it (and thus passes the
escape further down), and the nested screen then enters its
command-interpretation mode. If you need to send control commands to the
nested screen frequently, you should really change its default “escape
sequence” (to Ctrl-b, for example) so that the parent screen passes it
through directly.
Another problem that virtual terminal programs solve is working around the fact that in Unix, a process can be “bound” to only one terminal. This reflected the normal use case of a user logging into the OS from a terminal; when they log out, all their processes are terminated automatically. The only way for a process to outlive its terminal is to “detach” from it and become a “daemon” process. In fact, most of the OS’s own processes are daemons, so they can run even when no users are logged in. However, users cannot easily interact with a daemon: normally, its output goes into a log file, and commands are sent to it using UNIX signals.
Terminal multiplexers also made persistent user sessions possible. Since
they launch the user’s program under a virtual terminal, it is not bound to
a physical terminal, and can keep running until the next system restart,
like a daemon. However, since the multiplexer also has a user-visible part,
that user program can still interact with the user normally. And in fact,
only the user-visible part of the terminal multiplexer gets terminated if
the user’s physical terminal is disconnected from the OS. When the user
reconnects, they can re-attach to the already running screen session and
continue their work. But this feature makes the implementation of screen
and tmux rather complicated, because the user can reconnect using a
different kind of terminal. Essentially, the terminal multiplexer needs to
translate the control sequences that the user’s visual program sends to the
virtual terminal into equivalent control sequences of the current physical
terminal.
With that history in mind, I can explain the Big Friction Point that I had with running command-line programs under Emacs.
Emacs entered the text editor scene much later than vi, and it was
designed right from the start to be a visual editor, so it does not have
a line-oriented mode like vi does. Moreover, since Emacs pretends to be
an operating system in itself (that’s another popular hacker’s joke), it
offers both dumb terminal emulators (the “shell” and “compilation” modes)
and full smart (visual) terminal emulators. That means you can run vi
inside Emacs if you wish to. The
caveat with visual terminal emulation in Emacs is that it goes against two
important principles of its design.
First, much like the Macintosh OS design, Emacs strives for unified key
bindings across its editing modes, to avoid interrupting the user’s
“mental flow.” However, a vi instance running inside an Emacs virtual
terminal still expects standard vi input, and it assumes that it has
full control of the user’s input and output. So, as discussed above with
the example of screen, the Emacs terminal emulator normally needs to
capture all user input and send it to vi, and to interrupt that, the user
needs to send some “escape” command to Emacs. Thus, visual terminal
emulators in Emacs also need at least two “modes,” and this is
inconvenient, because it breaks the user’s normal key-chord habits. The
built-in emulator called term calls these modes char (in which user
input goes to the child process) and line (in which the child process
receives no input and the user can manipulate its output using normal
Emacs commands). And if you interact with both the nested app and the rest
of Emacs, you need to switch between them frequently.
The second way visual terminal emulation goes against Emacs design is that
a program written for a smart terminal can have only one output view
(recall that in the design of Unix, an interactive program can be bound to
only a single terminal instance). Both tmux and screen do allow
connecting multiple clients to the same session (this is often used for
“live” or pair-programming sessions), but since the program “sees” only one
terminal, namely the virtual terminal that tmux or screen emulates, it
can adjust its view to only one terminal size. The multiplexer has only
two viable choices for the terminal size it reports to the nested app:
either use the size of the smallest of all connected physical terminals,
or report the size of the “current” one and accept a corrupted display on
physical terminals whose size does not match. This problem is somewhat of a
corner case for traditional terminal multiplexers, but since Emacs
naturally allows viewing the same text (abstracted into a “buffer”) in
multiple views simultaneously, the situation arises quite naturally
there. And in this case, a buffer associated with a visual program’s
terminal can be displayed correctly in only a single view.
So, basically, existing terminal emulation solutions for Emacs come in two
kinds. The first kind simulates a dumb terminal (as “shell” mode does),
which makes it possible to transform the program’s output into a normal
Emacs text buffer (with color attributes, thanks to the ansi-color
package). This, in turn, allows the buffer to be manipulated using
standard Emacs editing commands, and also to be displayed simultaneously
in views of different sizes. But if the user runs a utility that relies on
more “advanced” terminal control sequences, its output can go awry. As I
mentioned before, a lot of terminal-based utilities, including build tools
and even the OS’s own tools, do not query the terminal type and just
assume that they can use arbitrary terminal display tricks for their fancy
progress bars.
The second kind of Emacs terminal emulator provides full-screen emulation.
From what I have seen on various forums, a lot of Emacs users sidestep the
problem of garbled output by resorting to this kind of emulation; that is,
they run their shells and builds in full terminal emulators under
Emacs. Some of these emulators, like
eat, try to solve the keyboard
input problem by providing a third “hybrid” mode (eat calls it
semi-char) in which most keys are sent to the child process, but
some are interpreted by Emacs as usual. So the user can perhaps stay in
“semi-char” mode longer while interacting with an app running under Emacs,
but as soon as they need, for example, to copy some of its output, they
have to engage in mode switching, which can disrupt their mental flow.
So, the hybrid solution from eat is applied on the user input side. My
idea was to make a hybrid on the program’s output side instead: to evolve
shell mode into a third kind of terminal emulator, one that still mostly
performs dumb terminal emulation, but allows the child process to use a
subset of control sequences for displaying fancy progress statuses. After
all, once the long action carried out by the app completes, it normally
erases all its intermediate output, and the result looks very much like
the output of a good old line-oriented program. To put it another way, I
don’t need to run vi in my “evolved shell,” but I do need to be able to
run apt-get install and observe a normal-looking progress bar, instead
of the colorful garbage interleaved with control characters that the Emacs
“shell mode” does not understand.
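The core of the idea can be illustrated in a few lines (Python here for illustration only; the real extension is written in Emacs Lisp). Interpreting just a minimal subset of controls, say carriage return and “erase to end of line,” over a plain line buffer is enough to collapse a self-overwriting progress bar into its final text. The subset and the function below are my own sketch, not the actual code:

```python
def render_line(stream: str) -> str:
    """Interpret a minimal subset of terminal controls over one line:
    carriage return moves the cursor to column 0, and ESC [ K erases
    from the cursor to the end of the line; everything else is
    treated as plain text."""
    buf, col, i = [], 0, 0
    while i < len(stream):
        ch = stream[i]
        if ch == "\r":
            col = 0
        elif stream.startswith("\x1b[K", i):
            del buf[col:]          # erase to end of line
            i += 2                 # skip the rest of the sequence
        else:
            if col < len(buf):
                buf[col] = ch      # overwrite in place
            else:
                buf.append(ch)
            col += 1
        i += 1
    return "".join(buf)

# A progress bar that repeatedly overwrites itself collapses
# to its final state:
print(render_line("10%\r20%\r\x1b[Kdone"))  # prints "done"
```

The Emacs buffer then only ever contains the rendered text, which is exactly what makes it editable and displayable like any other buffer.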
By the way, as I read in materials about Plan 9, the planned but
ultimately unsuccessful successor of the Unix OS, its terminal
(called 9term) was built around similar ideas: treating program
output as buffers of text, and dropping support for terminal control
sequences completely. 9term behaved more like the Emacs shell or eshell
modes, representing the program output as a stream of text and making the
terminal basically a text editor that allows the user to work both with
the history of commands and with the program output as if they were a text
file. In Plan 9, if someone needed a visual program, they had to make it
a GUI app; building “TUI” interfaces was considered a thing of the
past.
By then, I had already been using AI in “assistant” mode for quite a while. That means asking questions and then copy-pasting fragments of AI-generated code, basically it’s a replacement for Web and forum searches. But true AI agentic coding (also known as “vibe coding”) is completely different, and I had to start learning it.
Luckily, I had already experienced something similar for a couple of years, thanks to the reliance of modern tech companies on “vendors,” or what is called “outsourcing.” In this model, if a programming task can be delegated to another company, that is preferred to spending the company’s own engineers’ time on it. This is just basic cost reduction. Since outsourcing companies are located in geographical areas with lower labor costs, they can offer a much lower hourly rate.
So, I was already doing a lot of programming tasks in a manner where I just formulated some high-level idea of what should be done and how the produced code should be tested. Then I reviewed the code contributions from my vendors and passed my comments back to them. Does that sound familiar? Right, this is very similar to how “vibe coding” works, with two major differences: the cycle with vendors is typically longer, measured in days, so you don’t get that “vibe” feeling, and the capabilities of human vendor programmers used to be better than those of AI agents. I said “used to be” because with the late-2025 models I noticed a big shift in their programming abilities.
I decided the time was right to unleash the power of those new AI models to
help me finally fix my Big Annoying Thing with Emacs. Since I knew what
I needed to achieve, I did not have to ask the AI to come up with an
“implementation plan.” Instead, I started by writing tests for my new
Emacs mode extension. For this, I still used AI in assistant mode, and it
was really helpful in constructing those pesky ANSI escape sequences for
the scenarios that I cared about. The LLM was also able to analyze the full
output of an apt package installation session to find out which
terminal control sequences it uses, and to create a script that emulates
its output.
With these tests in hand, I established a “continuous integration” (CI)
loop in which I loaded the code of my extension into Emacs (at this point,
there was no code yet), launched those test scripts, and compared the
results with “golden” outputs that I had produced with screen. Time to
unleash fully autonomous AI coding!
This is where the real fun began. I was using Gemini CLI, and at first I made the mistake of letting it use the 2.5 version of the LLM. It really struggled, to the point that it could not even write syntactically correct Lisp code. Lisp syntax is very minimalist and consists of lots of parentheses that need to be balanced. Surprisingly, Gemini 2.5 had big problems with that. It actually broke my CI loop at first, because it wrote code that caused Emacs to fail to load the module, or to hang completely. This was something I had never experienced with vendors (I told you: so far, humans were the better programmers). After I made my CI loop more resilient, the AI entered a loop of its own, endlessly trying to fix the Lisp syntax and never succeeding, eventually falling into a mode where it continuously streamed its looping chain of thought into my terminal. Having wasted a couple of hours on this, I was about to give up and was considering switching back to the assistant mode of coding.
But then I did two things: I switched to the “latest preview” model of Gemini, which was 3 at that time, and, again with the help of AI, I improved the project instructions, specifically insisting that the agent write “Parinfer-compatible” code and verify it thoroughly. This was a night-and-day improvement: the agent finally managed to fix almost all failing tests by writing correct code, and I started feeling good vibes.
Over a week, during my spare time, we finished the implementation to the point where I could really run my build script in an Emacs “compilation” buffer and the output looked exactly as it does on a terminal with full capabilities. During this period, I followed the usual principles of Test-Driven Development: always write the test first, make sure the change fixes it without regressing anything else, then refactor the code and the tests. So it’s like a real engineering cycle, except that I only had to type in plain English. No more coding myself.
I also realized that the AI agent is capable not only of writing code and tests, but of actually investigating problems, and it can even write its own tools for that, like a real human programmer! At that point my feelings towards the agent shifted, and I started to consider it a colleague, or at least a robotic colleague, something like the WALL-E robot, maybe. I still had to help this robot occasionally with Emacs Lisp parenthesis issues, mainly because I wanted to save time, and also money.
Yes, one thing I would like to mention is the cost of this exercise. I ended up spending about $55 on inference, which is of course not a lot, but this wasn’t a big project either. So when I read about huge projects that involve hierarchies of AI agents, I think they can burn a lot of money each day! Besides all the useful work that agents do, when a problem gets really hard for them, they can easily spiral down into confusion, and that is all at your expense. So be careful: I really would not let a swarm of agents work without close human supervision.
If you are interested, the resulting Emacs extension is
here.
I called it comint-9term to indicate that it extends comint (command
interpreter) mode of Emacs, and it delivers the spirit of the 9term
terminal from the Plan 9 OS.
The code is complete now; I’m just planning to keep fixing any edge cases that I encounter. After all, as I explained above, this hybrid terminal scenario is a bit unusual, operating on the boundary between dumb and smart terminals, so some scripts or programs with a super-creative approach to displaying progress may cause issues. But the AI has added a “tracing” sub-mode, so whenever that happens, I can grab a trace and give it to my AI agent for analysis.
From this “vibe coding” experience, and also from reading about the experiences of other people, I think this new mode of human-computer interaction is here to stay. Even if some of the companies currently making “frontier” LLMs collapse for economic reasons, the technology is out there, and people will find ways to make it more economical and efficient.
AI is definitely the new way of writing computer programs, and I think it may change how we treat our phones (or TVs, or cars). Since the appearance of the first iPhone, it has always annoyed me that smartphones and tablets are treated as “embedded” devices, meaning you have to program them using a “real” computer, despite the fact that the CPU power of a modern phone is orders of magnitude greater than that of the supercomputers of the 1980s, let alone the personal computers. Compared to a Z80 (the heart of the ZX Spectrum), a modern phone is like a starship compared to a bicycle.
I know, one big obstacle to using your phone for programming has been the absence of a real keyboard. Since phones do not have a convenient keyboard, writing a program for them in the “traditional” way was a pain. But not anymore! Finally, with AI it’s possible to write a program for your phone using only your phone (in theory, at least), by talking to an agent that builds and debugs the app for you. Thus, personal devices can become something like what home computers were for kids 40 years ago. So, despite all the “AI gloom” about its economic effects, I look into the future with great enthusiasm.