I'm sick and miserable, so instead of doing something productive (or deluding myself that I'm doing something productive), I watched YouTube today. In particular, Freya's beautiful rant. It's a very different take on LLMs from what I normally see in software circles, and it inspired me to finish this post.
When I go to a doctor, I can reasonably expect them to care about my wellbeing. Why?
We, engineers and engineering-adjacent people, are trained to see the world through the lens of useful abstractions. A doctor is a black box that takes test results and produces diagnoses. There are gauges on the box — what fraction of the diagnoses is correct, how long the box takes to arrive at them, and how much the box costs. If we can change the contents of the box (say, to an LLM) and the gauges move the right way, flesh-and-blood doctors are obsolete.
Except there is another, less obvious layer to a doctor visit: duty of care and going to jail.
"Caring" is a seemingly fuzzy concept closely related to the alignment problem. However, it can be both fuzzy and tangible at the same time! When I see a doctor, they are legally bound to have my best interests in mind. We call this "duty of care". Unlike in AI, there is no need to define "care" precisely, it's defined by other people in a turtles-all-the-way-down fashion.
But maybe we can train our LLMs in such a way that people would agree they "care". Maybe we can train on real doctors' decisions, so that LLMs notice unrelated issues (you shake your doctor's hand, they spot a bad-looking mole and advise you to have it checked) and give compassionate care (maybe this terminally ill patient doesn't need cancer treatment, because it would make their remaining life worse). What happens when they are wrong?
What happens when humans are wrong?
The main difference between humans and LLMs is not the dry performance numbers on the black box's gauges. The main difference is that an LLM can't go to jail. It won't even feel bad or be ostracised by its colleagues. For the entirety of human history, professional capability has been bundled with responsibility. Now we see a push to replace humans with LLMs, but because the people making the push are engineering-adjacent, they mostly look at the gauges, not at the much fuzzier concept of responsibility.
Except I lied, dear reader, and we have a precedent for capability and legal responsibility getting unbundled. All automation that you see around you, from elevators to autopilots, has a web of legal responsibility attached to it: some of it is covered by insurance, some by gross-negligence or class-action lawsuits, some by human oversight. One way or another, we have figured out how to deal with consequential, life-threatening decisions being made by machines. This is not new, and LLMs are not particularly different.
An elevator "decides" to stop closing its doors when it detects a hand instead of chopping it off. An autopilot alerts the pilot and disengages if anything goes wrong. A robotic arm shuts off when it detects a human body nearby. All those are decisions, even if crude, and when machines make mistakes, we can still find the legally responsible party and compensate the victims.
It follows that the next step in LLM deployment is not pure capability, but legal frameworks and insurance that can cover, say, an LLM agent draining an entire trading firm's balance after a clever prompt injection. The future of LLM deployment and job losses will be decided not by OpenAI benchmarks but by lawyers and insurance companies.
Except I lied again, this time by omission. If general responsibility for deployed systems' actions is a blind spot for most engineering-adjacent people, personal responsibility is a blind spot within the blind spot.
Most people are empathetic. Given an opportunity, they do right by other people. When they screw up, they feel bad. If they screw up really badly, they might lose their job, or even go to jail. Even if they don't go to jail, the thought of it keeps most people on their toes. Sometimes "screwing up" means simply not being empathetic enough, as when the person has a duty of care. When we interact with professionals, we have an implicit understanding of all this: an expectation of a personal, emotional cost to the counterparty if they are negligent or uncaring.
This is the main, irreducible difference between humans and LLMs. A world full of LLMs in positions of authority is a world full of nepo psychopaths, with no shame, no second thoughts, and no personal consequences for lying, making mistakes, or neglecting their duties. Being a nepo psychopath is the ceiling of what an LLM can achieve: an agent with human intelligence that says it cares, or that it feels bad about its mistakes, but feels neither and bears no responsibility.
This world is entirely avoidable.
To avoid it, we need to recognise the value of human caring. Not just in emotional terms, but as something that professionals bring to the table.
We need to empower people to care, not replace them. This means building tools that assist professionals and let them oversee the work, instead of building black boxes. It's unfortunate that LLMs tend to be black boxes.
We need to start seeing the social side of being a professional — a doctor, a teacher, a tax officer, a bank clerk.
We need to look beyond KPIs. It's ironic that most engineers working to automate professionals away on the strength of benchmarks would, at the same time, admit that their own KPIs are reductive.
We can feel bad, we can go to jail, and we recognise that in each other.
Let's not forget this.