I'm sick and miserable, so instead of doing something productive (or deluding myself that I'm doing something productive), I watched YouTube today. In particular, Freya's beautiful rant. It's a very different take on LLMs from what I normally see in software circles, and it inspired me to finish this post.
When I go to a doctor, I can reasonably expect them to care about my wellbeing. Why?
We, engineers and engineering-adjacent people, are trained to see the world through the lens of useful abstractions. A doctor is a black box that takes test results and produces diagnoses. There are gauges on the box: what fraction of the diagnoses is correct, how long the box takes to arrive at them, and how much the box costs. If we can change the contents of the box (say, to an LLM) and the gauges move the right way, flesh-and-blood doctors are obsolete.
Except there is another, less obvious layer to a doctor visit: duty of care and going to jail.
"Caring" is a fuzzy concept closely related to the alignment problem. However, it can be both fuzzy and tangible at the same time! When I see a doctor, they are legally required to have my best interests in mind. We call this "duty of care". Unlike in AI, there is no need to define "care" precisely. It's defined by other people in a turtles-all-the-way-down fashion.
But maybe we can train LLMs in such a way that people would agree that they "care". Maybe we can train on real doctors' decisions, so that LLMs notice unrelated issues: you shake your doctor's hand, they spot a bad-looking mole and advise you to have it checked; maybe an LLM could behave similarly. Maybe LLMs can administer compassionate care, judging that a terminally ill patient doesn't need cancer treatment because it'll only make their remaining life worse. But what happens when LLMs are wrong?
What happens when humans are wrong?
The main difference between humans and LLMs is not the dry performance numbers that the black box gauges show. The main difference is that an LLM can't go to jail. It won't even feel bad or be ostracised by its colleagues. For the entirety of human history, professional capability has been bundled with responsibility. Now we see a push to replace humans with LLMs, but because the pushers are engineering-adjacent, they are mostly looking at the gauges, not at the much fuzzier concept of responsibility.
Except I just lied, dear reader: we do have a precedent for capability and legal responsibility getting unbundled. All the automation you see around you, from elevators to autopilots, has a web of legal responsibility attached to it: some of it is covered by insurance, some by the threat of gross-negligence or class-action lawsuits, some by human oversight. We have already figured out how to deal with consequential, life-threatening decisions being made by machines. This is not new, and LLMs are not particularly different.
A lift "decides" to stop closing its doors when it detects a hand instead of chopping it off. An autopilot alerts the pilot and disengages if anything goes wrong. A robotic arm shuts off when it detects a human body nearby. All those are decisions, even if crude, and when machines make mistakes, we can still find the legally responsible party and compensate the victims.
It follows that the next step in LLM deployment is not pure capability, but legal frameworks and insurance that can cover, say, an LLM agent accidentally draining a trading firm's entire balance after a clever prompt-injection attack. The future of LLM deployment and job losses will be decided not by OpenAI benchmarks but by lawyers and insurance companies.
Except I lied again, this time by omission. If general responsibility for deployed systems' actions is a blind spot for most engineering-adjacent people, personal responsibility is a blind spot within the blind spot.
Most people are empathetic. Given the opportunity, they do right by other humans. When they screw up, they feel bad. If they screw up really badly, they might lose their job, or even go to jail. Even if they don't go to jail, the thought of going to jail keeps most people on their toes. Sometimes "screwing up" means simply not being empathetic enough, as when the person has a duty of care. When we interact with professionals, we have an implicit understanding of all this, an expectation of a personal, emotional cost to the counterparty if they are negligent or uncaring.
This is the irreducible difference between humans and LLMs. A world full of LLMs in positions of authority is a world full of nepo psychopaths: no shame, no second thoughts, no personal consequences for lying, making mistakes, or being negligent. Being a nepo psychopath is the ceiling of what an LLM can achieve: an agent with human intelligence saying it cares or that it feels bad about its mistakes, while feeling neither and bearing no responsibility.
This world is entirely avoidable.
We need to recognise the value of human caring. Not just as a sentiment, but as something that professionals bring to the table.
We need to take caring into account when judging performance, not just the KPIs. It's ironic that most engineers wishing to automate professionals away based on benchmarks would readily admit that judging their own output by KPIs is reductive.
We can feel bad, we can go to jail, we recognise that in each other.
P.S.: In other words, "A computer can never be held accountable, therefore a computer must never make a management decision."