Brain on fire
Have you had that feeling before, that your brain's on fire? I hope so. Everyone should, from time to time.
The first time I can remember feeling it was when I was six, and discovered computers. I've written about this before, briefly: My grandparents left a TRS-80 at our house while going off to travel, and they also left the manual. As a kid I was a precocious little snot, already able to do some reading by the time I was in kindergarten. The computer's manual explained how to program it in BASIC. I picked up the book and started reading it, trying out the examples. Then I started improvising, mixing and matching the pieces, writing entirely new programs.
I don't think my grandparents had any idea what they'd set in motion, quite by accident. I was obsessed. I was enamored of this mysterious machine, which could be given instructions in a language vaguely like English but with much stricter rules, and would dutifully execute them with absolute faithfulness. When I tried to imagine what it could be told to do, I dimly perceived a vast possibility space. It had limits but felt limitless. There was a universe inside this small box, and I could explore it freely.
Mind you, this was a computer whose capabilities would be put to shame by turn-of-the-millennium calculators. Very little memory. External storage was on cassette tapes and barely worked. No networking at all. Nonetheless, the feeling would be best captured by Arthur C. Clarke: "Any sufficiently advanced technology is indistinguishable from magic." Even this very early computer felt like magic to me, and I already knew I wanted more than anything to be a mage.
Over the years, better computers periodically emerged, and their growing capabilities bestowed wonder as each new one expanded the size of the universe it could contain. They ran faster, they produced prettier pictures, they made music (instead of just emitting inelegant beeps and boops), their disk-based storage was faster and sturdier and more capacious, and the programming facilities grew in power and sophistication. Those were all incremental changes. When computers acquired the ability to talk to each other, sending data packets over wires strung through walls and attic spaces, entire new vistas opened up. First it was small networks, then large ones, then the staggering insight that the Internet could connect the entire world. Then Google made it easy to find any piece of information, anywhere. Cell phones evolved into computers in our pockets.
Then ChatGPT smacked the whole world in the face. I experienced this as a momentous arrival with fanfare and trumpets. Wake up, humanity: something new is here.
I've believed for a while that AI would have its day, that it was just a matter of time. (This 2019 post is mostly about religion and specifically agnosticism, but by its end the topic is the prospect of machine superintelligence.) Nonetheless, observing what LLMs could do was startling, to say the least. BASIC was kinda-sorta like English. That's worlds away from being able to give instructions in actual English.
Up to that point, all the changes had been evolutionary. Computers had proven amazingly easy to scale, as captured by the famous Moore's Law: "the number of transistors in an integrated circuit doubles about every two years". Exponential growth, as Ray Kurzweil is fond of pointing out, is not intuitive to us hairless apes. We evolved to understand linear paths, not the effects of consistent doubling. That's given us Apple and Amazon and Microsoft and Google and Netflix and Meta (Facebook) and everything else reshaping our lives; their dominance feels sudden, and we still haven't fully come to grips with it as a society. But all of that is still not what ChatGPT is. There's an important difference between "rapid evolution" and "revolution".
Neural networks are different at a fundamental level, because they can recognize patterns, and because they are trained rather than programmed. No traditional computer works like this.
Traditional computers are fast, but dumb. They have to be told in excruciating detail how to do every little thing, down to the ones and zeros that represent your bank account number. Today's towering tech giants are standing on the shoulders of millions of software developers who have painstakingly scaled a Mount Everest of smaller problems that had to be solved first. (Many of them did so for no money.) And of course, things still go wrong, and security breaches make the headlines. But at least when they do, we can figure out why.
Neural networks were the first to crack the problem of reading handwriting, of recognizing what objects are in a picture, of understanding speech and of speaking fluently, of making a robot move with the agility of a living thing, of solving protein folding, and a lot else besides. It's important to grasp that all these problems proved devilishly difficult to solve with traditional programming, and it's now unlikely that any of them will be solved that way -- unless an AI builds the solutions for us.
I need to underscore this point. Neural networks do something that is more magical than traditional computing. They solve problems that we don't know how to solve. Nobody can fully explain how humans are able to read cursive text, and because nobody can explain it in detail, nobody can formulate the detailed instructions that would tell a traditional computer how to do the job. Neural networks do it similarly to how the brain cells in our heads do it... and that too is an area of ongoing research. When researchers try to understand how a neural net recognizes a cat in an image... well, there's been some progress, but it's still challenging. Never before have we had to work so hard to understand how something works after we've built it.
If that old TRS-80 and all its successors are magic, then neural networks are sorcery. This metaphor plays on the idea that in the fantasy realm, magic is perceived as somewhat rigid and prescriptive, and though it is indeed powerful, it has limits. Sorcery is what you get when you break past those limits. It's more intuitive, it's raw power. It reshapes reality. It can achieve far more, but it's unpredictable, perhaps dangerously so.
LLMs? You throw an obscene number of very fast parallel processors (most built by NVIDIA, which has suddenly become wealthy as a result) into a data center, creating a vast neural network, and force-feed it the contents of the entire Internet, and it learns that certain words tend to follow other words. It turns out that once your LLM gets really, really good at predicting the next word in a sequence, it gains the ability to solve a wide range of problems that we would struggle, really struggle, to solve with traditional programming. It's so good at so many things, it has prompted a lively debate about whether the technique has limits, and how those limits might stack up against those of our own brains.
The "T" in "ChatGPT" stands for "transformer". Very loosely -- because I'm a lifetime traditional software developer and not an AI expert -- the addition of transformers to machine learning systems helped them to focus on a specific task, like a magnifying glass focusing sunlight. That analogy is probably more wrong than right, but I think I'm safe in saying that adding transformers was an evolutionary step that augmented the revolutionary step that was switching from traditional programming to neural-network training.
It remains to be seen whether more such evolutionary steps are forthcoming. Much of the last few years' progress has simply been the result of adding horsepower.
But! This is not an either-or situation! The world remains chock-full of traditional computers and traditional software that is very, very good at solving the specific problems for which it was built. LLMs like OpenAI's, Anthropic's, Google's (and so on) have Application Programming Interfaces (APIs), which you can think of as big plugs into the Internet. Anyone with a bit of programming know-how can tap into these systems and make them part of a traditional system, with its databases and websites and all the rest.
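To make the "big plugs" idea concrete, here's roughly what tapping one of those APIs looks like from an ordinary program. This is a minimal sketch, not code from any of my projects; the endpoint shape is OpenAI-style, and the model name and environment variable are placeholders for whichever provider you prefer:

```ruby
# Minimal sketch: calling an OpenAI-style chat API from plain Ruby.
# The model name and API-key variable are placeholders.
require "net/http"
require "json"
require "uri"

uri = URI("https://api.openai.com/v1/chat/completions")
req = Net::HTTP::Post.new(uri)
req["Authorization"] = "Bearer #{ENV.fetch('OPENAI_API_KEY')}"
req["Content-Type"]  = "application/json"
req.body = {
  model: "gpt-4o-mini",   # placeholder; use whatever model you have access to
  messages: [
    { role: "system", content: "You are a terse, helpful assistant." },
    { role: "user",   content: "In one sentence, what does this error mean? ..." }
  ]
}.to_json

res = Net::HTTP.start(uri.host, uri.port, use_ssl: true) { |http| http.request(req) }
puts JSON.parse(res.body).dig("choices", 0, "message", "content")
```

That's the whole trick: a handful of lines of traditional code, and suddenly natural-language judgment is just another function call your system can make.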
You don't have to choose between magic and sorcery. Instead, you choose how best to combine them.
For the past year or so, I've been playing with the new technology (as if I could resist). I started by giving it mental puzzles and observing its inability to solve them. I chortled at its inability to do math. Then I watched as successive versions got better. Instead of being baffled by my logic puzzles, they were uncovering gaps in my own understanding. I began using the chat systems to double-check my reasoning, as a "second pair of eyes" while working through the perplexing decisions of daily life. At times, it caused me to revise my thinking. I began toying with its ability to write software. These days I'm using Cursor and getting familiar with the more... collaborative development flow, of talking to the IDE, guiding it, using it to catch and fix my mistakes while I seek to find and correct its mistakes. I built a website with a fairly traditional database design that incorporated the LLM to calculate some of its answers.
My decades of hard-earned software engineering experience are telling me that the right magic/sorcery balance goes something like this (with a rough sketch in code after the list):
- If a part of the problem can be solved with traditional software, do that.
- When you encounter an issue that requires something more like human judgment, intuition, creativity, or writing or understanding natural language, have the LLM (or other ML technology such as a vector database) do that part.
- Watch the AI bits like a hawk. If you catch it making mistakes, see if you can break the problem into smaller ones, with traditional programming as the glue.
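Here's that balance as a rough Ruby sketch. The `ask_llm` helper is hypothetical (imagine it wrapping an API call like the one shown earlier); everything around it is plain, deterministic code that parses the input and refuses to trust the model's answer blindly:

```ruby
# Hypothetical sketch of the magic/sorcery split for categorizing an email.
# Magic: deterministic parsing and validation. Sorcery: one fuzzy judgment call.

ALLOWED_CATEGORIES = %w[business job-opportunity notification social other].freeze

def categorize_email(raw_email)
  # Magic: pull out the structured bits with ordinary string handling.
  subject = raw_email[/^Subject: (.+)$/i, 1].to_s.strip
  sender  = raw_email[/^From: (.+)$/i, 1].to_s.strip

  # Sorcery: ask the model only the question that needs judgment.
  # `ask_llm` is a hypothetical wrapper around an LLM API call.
  guess = ask_llm(
    "Categorize this email as one of: #{ALLOWED_CATEGORIES.join(', ')}. " \
    "Reply with the category only.\nFrom: #{sender}\nSubject: #{subject}"
  ).strip.downcase

  # Magic again: watch it like a hawk. If the answer isn't on the list, fall back.
  ALLOWED_CATEGORIES.include?(guess) ? guess : "other"
end
```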
I'm reminded of the advice we were given when programming the older computers that understood BASIC, but could also be programmed in assembly language. Assembly was much faster to execute, but a hell of a lot more difficult to write, read, and debug. The advice was: First, write everything in the higher-level language. Then figure out where you need the extra speed, and rewrite just those parts in assembly. A little goes a long way. You end up with a hybrid program that gets the job done without too high a maintenance cost.
For many applications, I suspect the same basic principle applies to neural network tech: A little goes a long way. For best results, use it just as and where you need it, rather than expecting it to solve the whole problem.
This brings me to my latest project, the one that has most recently set my brain on fire. I'm tinkering with a system based on N8N. That's a short form of "nodemation", a mash-up of "node" and "automation". It's meant to plug lots of systems together, and critically, it includes an AI agent component powered by an LLM. N8N is open source, and not too difficult to set up on an EC2 box (in AWS, naturally -- I'm kind of steeped in the Amazon ecosystem after working there so long). It's even easier to get started now that you can ask Cursor to write the CloudFormation template and the Dockerfile and all that other low-level infrastructure. The stack looks like this, bottom to top:
- AWS account with a CloudFormation template to create the infrastructure.
- EC2 for the base system, running Docker. I could have gone another route, with ECS, or relied on Lambda, but the EC2 route feels like a great way to start small and iterate quickly.
- Postgres database. Caddy for the website. N8N. Each runs in its own container.
- Because I'm a Ruby enthusiast, there's another container running Sinatra to provide a local REST-based API where I can compose more routines to plug into N8N (a bare-bones sketch of this piece follows the list).
- Slack bot for communication with the human user.
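As promised, a bare-bones sketch of that Ruby layer. Sinatra makes it nearly trivial to expose small JSON endpoints that an N8N workflow, or one of the agent's tools, can call; the routes and payloads here are invented for illustration, not lifted from my actual project:

```ruby
# Bare-bones sketch of the Sinatra container: a tiny JSON API for N8N to call.
# Route names and payloads are illustrative only.
require "sinatra"
require "json"
require "time"

set :bind, "0.0.0.0"   # reachable from the other containers

get "/health" do
  content_type :json
  { status: "ok" }.to_json
end

# Example tool route: N8N posts JSON in, gets JSON back.
post "/tools/echo" do
  content_type :json
  payload = JSON.parse(request.body.read)
  { received: payload, handled_at: Time.now.utc.iso8601 }.to_json
end
```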
Some of this is not novel. If you search on YouTube, you can easily find videos of others similarly assembling N8N-hosted agents that communicate via Slack. That doesn't bother me. In a world with so many software developers, I don't expect to be the first to come up with an idea. Rather, the fact that others have independently arrived at the same design serves to validate the choice.
What to do with this system? I couldn't help thinking about the Spike Jonze movie Her (2013, with Joaquin Phoenix and Scarlett Johansson). When Theodore begins conversing with his new AI operating system, the first problem they tackle is sorting through his mountain of email. Well, hell, I have a mountain of email too. These days, so does everyone (except those folks who are way more organized than the rest of us). I set about tying the system into my Gmail account. This required tackling some thorny OAuth stuff, but with the endlessly patient LLMs (a mix of Cursor and ChatGPT conversations) guiding me past the land mines, I got over that hurdle. A conversation with the agent goes something like this:
- Me, in Slack: "Please summarize my most recent email."
- Agent: "You don't appear to be authenticated to Google yet. Let me start that process."
- Agent: "OK, here's a <link to Google>. Click on that, then come back here when you're done."
- Me: (Click the link, sign into my Google account, agree to grant permissions to the agent-website)
- Me, in Slack: "I did that; please proceed."
- Agent: "I see you're authenticated now. Let me scan your most recent emails..."
- Agent: "Here are the subjects of those messages... <List of emails and what they're roughly about> Your inbox shows a mix of business news, job opportunities, service notifications, and social media updates from the past day."
From a developer's perspective, 90% of the time and effort has gone toward building the "magic" part, the traditional software and architecture bit. Once built, that portion is pretty close to 100% reliable. The agent provides the "sorcery" part. The result is a system that I couldn't have built entirely out of traditional software if I had another 20 years, because it incorporates understanding, or at least pattern matching. The sorcery bit is unpredictable and will sometimes make mistakes, but given a limited problem scope it should be reliable enough to be useful.
The Postgres database provides the storage layer, used by N8N and the agent itself to keep context. This is why you can say "please proceed" instead of starting all over or repeating the original request. Google credentials are also stored in Postgres, in a different table, and keyed to the user ID of the Slack account, so each person should see their own email. The Ruby API piece also has a table to hold customer-keyed information. The authentication flow is mediated by the Ruby API and uses a standard Google library.
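The real flow leans on Google's library, but stripped to its essentials the dance looks something like the sketch below: Google redirects the user back to the Ruby API with a one-time code, the API exchanges it for tokens, and the refresh token gets stored keyed to the Slack user. The table and column names here are invented for illustration:

```ruby
# Sketch only: the production flow uses Google's Ruby auth library; this spells
# out the underlying token exchange by hand. Table/column names are made up.
require "sinatra"
require "net/http"
require "json"
require "uri"
require "pg"

DB = PG.connect(ENV.fetch("DATABASE_URL"))

# Google redirects back here after the user grants access.
get "/oauth/callback" do
  code          = params["code"]
  slack_user_id = params["state"]   # the Slack user ID was stuffed into `state`

  # Exchange the one-time authorization code for tokens.
  res = Net::HTTP.post_form(URI("https://oauth2.googleapis.com/token"), {
    "code"          => code,
    "client_id"     => ENV.fetch("GOOGLE_CLIENT_ID"),
    "client_secret" => ENV.fetch("GOOGLE_CLIENT_SECRET"),
    "redirect_uri"  => ENV.fetch("OAUTH_REDIRECT_URI"),
    "grant_type"    => "authorization_code"
  })
  tokens = JSON.parse(res.body)

  # Keyed to the Slack user, so each person sees only their own mail.
  DB.exec_params(
    "INSERT INTO google_credentials (slack_user_id, refresh_token)
     VALUES ($1, $2)
     ON CONFLICT (slack_user_id) DO UPDATE SET refresh_token = EXCLUDED.refresh_token",
    [slack_user_id, tokens["refresh_token"]]
  )

  "All set -- you can head back to Slack now."
end
```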
There are several things about this that I find exciting, and that remind me of how I felt when I was six.
First, this feels like a different way of developing software, because I can converse with the program I'm writing as I'm writing it. An example conversation:
- Me: "Please list what tools you have available."
- Agent: "I see four tools I can use: <describes each>"
- Me: "Please try authenticating me to Google."
- Agent: "I'm receiving an error: <description>"
- Me (some time later): "I checked the logs, found and corrected an issue. Please try again."
- Agent: "Thank you for fixing the issue! I confirmed I can now authenticate. Here is the link:..."
- Me: "I'm authenticated. Please list my recent emails and categorize them for me."
- Agent: "I see three different email threads with <some boring details but no specifics about email subjects>"
- Me: "I'm really looking for categorization of emails into topics based on the senders and subject lines."
- Agent: "I'm sorry, the tool I have available only allows me to see opaque message IDs and not the emails' subjects or message bodies."
- Me [smacking myself, going back and adding another API]: "OK, I've now given you another tool that should allow you to look up the contents of a message by its ID. Can you now try again to summarize my email?"
- Agent: "Yes, I see the new tool. Let me try it out..."
- Agent: "Success! I can now view the subject lines."
That's... pretty wild.
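For what it's worth, the "new tool" I handed the agent mid-conversation is nothing exotic. It boils down to one more small route in the Ruby API, along the lines of this sketch (illustrative route name, error handling omitted), which fetches a message's subject line and snippet from the Gmail API given its ID and an access token:

```ruby
# Illustrative sketch of the message-lookup tool added mid-conversation:
# given a Gmail message ID and an access token, return the subject and snippet.
require "sinatra"
require "net/http"
require "json"
require "uri"

get "/tools/message/:id" do
  content_type :json
  token = request.env["HTTP_AUTHORIZATION"].to_s.sub(/\ABearer /, "")

  uri = URI("https://gmail.googleapis.com/gmail/v1/users/me/messages/#{params[:id]}?format=metadata")
  req = Net::HTTP::Get.new(uri)
  req["Authorization"] = "Bearer #{token}"

  res = Net::HTTP.start(uri.host, uri.port, use_ssl: true) { |http| http.request(req) }
  msg = JSON.parse(res.body)
  subject = msg.dig("payload", "headers")
               &.find { |h| h["name"] == "Subject" }
               &.fetch("value", nil)

  { id: params[:id], subject: subject, snippet: msg["snippet"] }.to_json
end
```

The agent never sees this code; it just sees one more tool it's allowed to call.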
Second, there are occasional flashes of insight from the sorcery that leave me blinking. One of these happened while I was asking a fairly simple question:
- Me: "Please list what tools you have available."
- Agent: "I see four tools I can use:"
- "A tool for authenticating to Google"
- Etc.
- Etc.
- "A tool for sending messages in Slack (like this one)"
I'm not gonna say that LLMs are self-aware. I'm not touching that with a ten-meter cattle prod. I just find myself gobsmacked when the sorcery spits out something like this.
I also got an interesting response when I asked the agent to describe its operating environment. Like all the big LLMs available today, if you ask it "who are you", it'll name itself accurately (in this case "I'm Claude, an AI assistant created by Anthropic"). But beyond that, the agent looked at the list of tools available and made educated guesses -- some correct, some not -- about what its purpose was:
I'm running in a specialized environment that integrates me with various tools and services. In this system, I have access to several functions including Slack messaging capabilities (which I'm using to respond to you now), Gmail integration through a Ruby API backend, and Google OAuth2 authentication management. The system appears to be designed as a customer service or productivity assistant platform where I can help users with email management and communication tasks. ... This suggests I'm operating within a larger business application ecosystem rather than as a standalone chatbot. The environment handles user authentication, API integrations, and maintains state across different services ...
At the moment, this thing primarily serves the purpose of educating me, its developer, on how to assemble an agent-based system and make it reliable and broadly useful. It may, in time, actually prove useful to me personally as a productivity assistant; that's broadly the goal I have in mind for it. "Operating within a larger business application ecosystem"? Intriguing conclusion. Maybe someday? I decline to speculate.
A third reason why this project is fomenting cerebral conflagration: It's hard to see where it would end. The world of software today is so, so different from what it was in 1980, or 2000, or even a few years ago. Nearly every business has an Internet presence, and many offer APIs, and that's an important reason why N8N (and similar automation tools) are now attractive prospects. The systems of corporations and hobbyists alike become Lego™️ bricks that one assembles into magical new projects, with a dab of sorcery in the middle. One of my other projects has been assembling a tool to aid job seekers in their networking efforts. With a bit more glue, I could see asking the agent to periodically scan my inbox for messages from LinkedIn or Glassdoor, rate their suitability as a match for my own resume, add the details to my contact-tracking system, assemble a first draft of a cover letter and customized/targeted resume, then ask me if I'd like to apply. (It could assemble the draft in Gmail, allowing me to do last-minute tweaking before hitting "send".)
What else? It could help keep my to-do list on track, reminding me of long-term goals I've set for myself, but for which I've yet to add anything specific to my calendar. It could become familiar with my writing on this blog, scan recent news articles, and point out recent events that look like fodder for topics I enjoy writing about. It could be given access to the household's recent Amazon order history and suggest ways we could better stick to a budget. It could suggest ways to rebalance my retirement portfolio. Naturally, I could plug text-to-voice and speech recognition into Slack and then talk to it. Perhaps Scarlett Johansson would license her voice.
It could, possibly, and with plenty of paranoid oversight (abundant "human in the loop" checks), be given a bit of introspective knowledge about its own API and the ability to try writing new routines. Rather than writing the Ruby (or whatever language) code myself and then plugging it in, or even enlisting Cursor to write the code, the agent itself could be given one of the broader goals above and asked to plan out how to achieve it. It could deploy the new API, modify and deploy the N8N configuration, then test its new tool, evaluate the results and iterate.
That's ambitious, and shades into the dicey "recursive self-improvement" camp that causes AI doomers to freak out. By nature I'm a cautious guy, so I'm not about to charge in that direction, but neither am I taking it off the table.
There was a universe of possibility inside those early personal computers. There's a multiverse of possibility inside the systems we have today. Not even the sky is the limit.
Weird science
Magic and technology
Voodoo dolls and chants, electricity
We're makin' weird science
Fantasy and microchips
Shooting from the hip, something different
We're making weird science, ooh
My creation, is it real?
It's my creation, I do not know
No hesitation, no heart of gold
Just flesh and blood, I do not know
I do not know
From my heart and from my hand
Why don't people understand
My intentions
- Oingo Boingo