Amid all the scary and shocking AI headlines, there are at least two big areas where large language models will have obvious and immediate positive impacts on daily life: education and medical care.
Today we got great news on the medical front:
In real-world test, an AI model did better than ER doctors at diagnosing patients (NPR reporting on a study in Science)
When you read the underlying study, it gets even better. The large language model that accomplished this feat was OpenAI's o1-preview, which was released 19 months ago and has already been replaced by much better models from multiple companies.
o1-preview did not just "do a pretty good job" the way early AI image generators produced pictures that looked pretty real but a little off. It crushed the human-doctor-only comparison group.
From the study:
We report the results of a physician evaluation of a large language model (LLM) on challenging clinical cases across five experiments with a baseline of hundreds of physicians. We then report a real-world study comparing human expert and artificial intelligence (AI) second opinions in randomly selected patients in the emergency room of a major tertiary academic medical center.
In all experiments, the LLM outperformed [human] physician baselines and displayed continued improvement from prior generations of AI clinical decision support. Our study suggests that LLMs have eclipsed most benchmarks of clinical reasoning...
From the standpoint of providing the best possible patient care, there is already a very strong case for requiring every physician to run every decision through an AI-powered second opinion. It will take years for that to roll out in real life, but the technology was ready for prime time a year and a half ago. And the current models from all major companies are much better than the one used in this study.
That may seem like a bold statement that's at odds with a lot of AI skepticism and AI concern, so I want to dig deeper today and make a thorough case for the many ways AI will unequivocally benefit society.
The Star Trek holographic AI doctor will see you soon.
The good news about AI has a huge time lag
This study used a model that is 19 months old. That is so old in AI-model-time that it is not even available to the public anymore. And yet this study was literally published yesterday.
By contrast, the bad news and scary things take no time at all to publish. You can go interview Jack Dorsey about why he laid off 4,000 people in the hopes of replacing them with AI and get an immediate viral story. You can ask a bunch of people whether they're scared of AI taking their jobs and have infinite content. And if you really want to feel bad about the state of the world, you can simply read anything about how Elon Musk and Sam Altman behave toward their fellow human beings.
These are all true, salient, immediate concerns, so naturally they overshadow the very-slow-moving and rigorous research proving that AI can directly help doctors save lives in the ER.
This creates a major discrepancy between the vibes of the median news article about AI (fear and loathing) and the actual potential of AI (unlimited extraordinary doctors).
On top of that, many casual AI users are on limited free plans that only give them access to hamstrung versions of older models. These free users often include the people writing the articles (or social media posts) about AI. The lack of familiarity with the cutting-edge models – which cost $20+ per month – compounds the time-lag problem. For example, the New York Times published a cheeky op-ed two weeks ago making fun of the Meta sunglasses and AI in general. A quote:
"Critiquing A.I. these days is like shooting fish in a barrel — and I mean poorly animated fish that keep sprouting human fingers inside a barrel that, as soon as you ask it a question or two, reveals itself to be a Nazi."
I know this is a joke. But it also conveys that the author hasn't used an image-generation model in at least a year (they are flawless now – no more seven-fingered hands and misshapen creatures). The author apparently remembers a series of hate-speech incidents with Elon Musk's Grok model but does not realize (or care to note) that Grok is an embarrassment relative to its competition.
And yet millions of people have likely read that article in earnest... 19 months after AI already outperformed human emergency room doctors!
This is not to say that any of the concerns about AI should be ignored – we take them very seriously and address them continuously in everything we do here at Innovating with AI. But they also need to be understood in the context of a very fast-moving media environment with an insatiable appetite for negativity, and contrasted with very slow-moving research showing how good large language models are getting.
Reasoning models changed everything – o1-preview was the first
The study notes that o1-preview kicked the proverbial butt of its predecessor, GPT-4, in the emergency room experiments. That's because o1-preview was the first reasoning model. Nearly two years later, reasoning is the industry standard.
Reasoning is the innovation that allows the model to iterate on its own output and check its own work. Rather than making a "one-shot" attempt at an answer, a reasoning model is designed to talk to itself for a bit (this is called "chain of thought"), search the web, and compare and contrast ideas and approaches before committing to a response. This is why GPT-5 and Claude Opus 4.7 are leaps and bounds better than the models from 2024 and prior: they hallucinate far less often because they check their own work, cite their sources, and are generally much more robust.
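To make "checking its own work" concrete, here is a minimal sketch of the difference between a one-shot answer and a draft-critique-revise loop, approximated at the prompt layer with the OpenAI Python SDK. The helper name ask_model, the model string, and the prompt wording are illustrative assumptions on my part – real reasoning models run this kind of loop internally during inference, not as separate API calls.

```python
# A toy approximation of "reasoning" at the prompt layer. Hypothetical
# helper names; assumes the OPENAI_API_KEY environment variable is set.
from openai import OpenAI

client = OpenAI()

def ask_model(prompt: str) -> str:
    """Send one prompt and return the model's text reply."""
    resp = client.chat.completions.create(
        model="gpt-4o",  # assumption: any chat-capable model works here
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content

def one_shot(question: str) -> str:
    # Pre-reasoning behavior: a single pass, with no self-review.
    return ask_model(question)

def with_self_check(question: str, rounds: int = 2) -> str:
    # Reasoning-style behavior: draft an answer, critique the draft,
    # then revise the answer using the critique, repeated a few times.
    answer = ask_model(question)
    for _ in range(rounds):
        critique = ask_model(
            f"Question: {question}\nDraft answer: {answer}\n"
            "List any factual errors or gaps in this draft."
        )
        answer = ask_model(
            f"Question: {question}\nDraft answer: {answer}\n"
            f"Critique: {critique}\nWrite an improved final answer."
        )
    return answer
```

The one_shot function is roughly how GPT-4-era models answered; the self-check loop is the intuition behind why reasoning models catch their own mistakes before you ever see them.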
In other words, the models of 2026 are barely recognizable compared to the models of 2024. They are so much better that it would be more accurate to treat them as a different technology altogether than to just call everything "AI."
But it has only been 19 months! Remember, most people aren't AI enthusiasts like us, reading and writing and playing and building every day. They aren't thinking in AI-industry-time where major changes happen every few weeks.
It took 36 months to make the sequel to the Super Mario Bros. movie – so why would people expect GPT-5 to be that much different from GPT-4?
And yet, it is a whole new ballgame, and we have only just begun to see reasoning models in action in everyday life.
When you are saving lives in the ER, the debate about AI feels a lot different
AI needs a lot of computer chips in a lot of data centers, and the biggest tech companies are spending like crazy to build them. Demand is so high that Claude and Gemini are constantly unavailable for power users despite being run by two of the most valuable companies in the world.
An anti-data center protest (Source: NBC News)
When you look at the data center buildout in the context of ideas like "AI is going to cause mass unemployment," it naturally does not make much sense to support big corporations building gigantic server farms. When your expectation is that AI is somewhere between frivolous and harmful, arguments about electricity prices, pollution and water usage hit home. This is why you see a ton of opposition to new data centers from local governments and community groups, who have an outsized role in land-use decisions.
But what if you instead said:
"We are going to build a data center to help doctors make better, faster medical decisions in the ER and save more lives."
No doctor gets laid off. Every patient gets better care. The outcome we get from that data center is entirely positive for everyone.
In my view, this perspective makes it really difficult to make the environmental and anti-capitalist arguments. How many bottles of water would you use to save someone's life in the emergency room? I think most people would say something close to "unlimited." Would you be comfortable fueling the new, incredibly accurate, low-cost, life-saving AI medical assistants with natural gas while we wait to build out more solar capacity? I think many people would be fine with that trade.
Medicine is just one of several high-impact frontiers where AI can improve life for pretty much everyone, without eliminating any jobs. Education is another: the best large language models will be used to create custom tutors to augment every human teacher or professor. Just as AI allows doctors to give more customized, accurate and complete care, AI can allow teachers to give a more customized, accurate and complete learning experience to children and adults alike. No teacher gets laid off; instead, they become sort of like superhero versions of their current selves.
Notice that in both cases, the humans in the loop are providing services that we widely admire and for which demand is very high. Everyone needs education and medical care. There are shortages of human workers in both industries. Anything that helps people do those jobs better and faster is a positive for everyone – there is no loser in the equation.
Of course, not every job is like a doctor or a teacher. Many jobs – web designer, bookkeeper – are basically neutral for society, and there is a limit to the demand for the work. If the books get done 10x faster we're not necessarily going to have 10x more transactions waiting in the wings to be categorized – we as an economy will eventually hire fewer bookkeepers. This is where the layoffs and unemployment happen, and they are genuinely a big problem that we need to figure out how to handle. But they are a separate thing from the major, society-wide, everybody-wins positive results we'll get from AI in fields like medicine and education.