The Most Important AI Paper of The Year Was Just Released
The Weekend Leverage, December 14th
This was one of those weeks where the smaller headlines were the more important ones. It is tempting to read about the big transactions (like IBM buying Confluent for $11 billion this week) or the latest drama on tech Twitter. But you’re a reader of The Leverage and you know better. Today’s half-noticed science is tomorrow’s trillion-dollar outcome. And on that front, this week we feasted. There is incredible science happening right now, and incredible startups being built.
We’ll get to that. But first, today’s issue is brought to you by Radiant.
Here’s the dirty secret I’ve learned about being a founder: half the job is just... writing the same email over and over. The post-meeting recap. The Slack summary. The “per our conversation” follow-up that somehow takes 20 minutes to draft even though the meeting was only 30.
It’s death by a thousand tiny admin tasks.
Radiant is trying to disrupt that. It’s an AI meeting assistant that listens to your calls and actually drafts the follow-up work—the emails, the briefs, the task lists. You review, you send, you move on. No bot awkwardly joining your Zoom and no video recording you’ll never watch.
I’m skeptical of most productivity tools. Most of them just add another dashboard to check. But the pitch here is simple: Radiant does the work you were going to do anyway, just faster. That’s it.
Go beyond just taking notes. Radiant turns your conversations into action with an AI meeting assistant and workspace that gets things done.
MY RESEARCH
Anti-slop list. To fight against the wave of slop in our lives, I recommend seeking out sources of productive friction. This is media that is challenging but gives outsized rewards for your effort. Find 15 books and films to consume over this holiday break here.
WHAT MATTERED THIS WEEK?
AI LABS
We can officially train robots using generated video. The most important thing that happened this week is Google’s latest research paper, which shows that we can use AI-generated video to train robots. Google tested whether a video-based simulator (built on Veo) could predict how often a robot would succeed at real tasks, and it matched reality closely. Across 8 versions of the same robot policy, the simulator’s predicted success rates and the robot’s real-world success rates had a 0.88 correlation (where 1.0 would be a perfect match). In practice, that meant the simulator could reliably tell which versions were better or worse; for example, the best checkpoint hit about 70% real-world success on their test set.
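For intuition on what that 0.88 number means, here is a minimal sketch of the metric itself: Pearson correlation between a simulator’s predicted success rates and the real-world rates. The eight rates below are invented for illustration; they are not Google’s data.

```python
# Toy illustration of the paper's headline metric: Pearson correlation
# between simulated and real-world success rates across policy checkpoints.
# All numbers below are made up for illustration, not from Google's paper.

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# Hypothetical success rates for 8 versions of one robot policy.
sim_predicted = [0.35, 0.42, 0.50, 0.55, 0.61, 0.64, 0.66, 0.70]
real_world    = [0.30, 0.45, 0.48, 0.58, 0.57, 0.66, 0.63, 0.70]

r = pearson(sim_predicted, real_world)
print(f"correlation: {r:.2f}")  # values near 1.0 mean the sim ranks checkpoints like reality does
```

The point of a high correlation is not that the simulator is pixel-perfect; it is that the simulator orders checkpoints the same way reality does, so you can pick winners in simulation and trust the ranking to transfer.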
The implication is that you can do far more training in simulation and have higher confidence those gains will transfer to physical robots. This matters because the biggest bottleneck in robotics has been getting enough training data for robots to do tasks reliably. Previously, you had to have robots practice a job over and over again. Google’s paper shows that companies can run significantly faster deployment cycles with a simulator-plus-data pipeline than by training exclusively via physical practice. It also means that video generation startups like Runway are significantly more valuable than just a “cheaper way to make a video.” We now have scientific evidence that Runway and its ilk are a “cheaper way to make robots,” which is a much bigger (and more important) market.
This is a result that has been whispered about in Silicon Valley for a while now, but this paper is the big proof. It will be full steam ahead now, with every major tech company in the world going after robots. You don’t need more robots to train; you just need more GPUs. For more on this topic, read my research on how robots will change labor and my frameworks for evaluating which startups will win in robotics markets.
New OpenAI model is…fine? The company released GPT 5.2 this week after Sam Altman called a “red alert” in response to competitive pressure from Google and Anthropic. In my personal evals, the model is the best in the world at complex queries and research tasks. However, I’ll continue to use Opus 4.5 as my daily driver because Anthropic’s speed and writing quality remain superior.
I asked 5.2 Pro to remake the chart on the right for me in The Leverage’s chart style. Here is a list of its errors: wrong shade of green, incorrect label on one bar, incorrect orientation, missing 5.2 Thinking, missing crucial data labels, chart title and subhead slightly overlapping, and ignoring my usual brand rule against gridlines. The model took 10 minutes and 42 seconds to make this many errors. We have a ways to go before we are all unemployed because, ironically, this error-riddled chart came from the model that is currently the best in the world at knowledge work tasks.
This all points to a broader concern for foundation model companies: all of these models are pretty much “smart” enough, in that they can do individual tasks at a skill level akin to most knowledge workers. However, the lack of continuous learning makes them unreliable companions for the average user, and enterprise workflows require more integrations and better software than a chatbot can provide. So they are all kinda…stuck. Until some fundamental science questions are resolved for AI (learning, memory, context windows, etc.), foundation models will have to push into application markets like coding agents. For more on this strategy, and why Anthropic has used it to add billions in revenue in the last 12 months, you can read my analysis here.
Chatbots aren’t going to fix education, you dummies. This week Elon Musk and xAI announced a partnership with the government of El Salvador to bring their Grok chatbot to over 5,000 schools. Ignore the sycophancy issues, the times Grok called itself MechaHitler, or the times it argued that Elon was the best in the world at everything (including drinking piss). Put those technical hiccups aside and, still, this partnership is a fundamental misunderstanding of how education works. I am incredibly bullish on the role AI can play in education, but it has to be paired with careful instructional design and motivation systems. Just giving kids a chatbot and saying “go be smart now” is the equivalent of setting off a nuclear bomb of cheating and sloppy thinking. I appreciate the goal of helping children, but everything we know about AI and education says this will fail. For 3,000 words of analysis on how AI actually can save education, read here.
DEAL VIBES
AI-accelerated science is starting to take shape. Two deals this week show what it will look like in practice.
Medra raised a $52 million Series A by betting that the real bottleneck isn’t “more model intelligence” but mechanical intelligence: labs need a physical AI layer that can actually run instruments end-to-end, in addition to a model that proposes experiments, interprets results, and improves protocols in a loop. Conceptually, that’s a deeper-in-the-stack wedge: if you own execution (robotic manipulation, integration, reliability, error recovery), you get the right to standardize how experiments happen, and you turn every action into structured feedback that compounds over time.
Excelsior Sciences is aiming one layer above that, where the scarce resource is not robotic dexterity but legible chemical data. It raised $70M in Series A funding plus a $25M grant from New York’s Empire State Development (a $95M total package) to push its “smart bloccs” approach. These are machine-ready chemical building blocks meant to make small-molecule synthesis more modular and therefore more learnable for AI systems.
Where Medra’s compounding asset is “the lab’s operating layer,” Excelsior’s is “a new chemical language”: a standardized substrate that lets automation generate the datasets AI needs to actually steer discovery (and potentially manufacturing) rather than merely predict it.
Put these together and the vision for AI-assisted science becomes clearer. You use robots to produce more data (Medra) and new types of data (Excelsior), and then a model interprets it all (OpenAI). One thing that isn’t clear to me yet is where the profit pools will appear as the AI improves. In more traditional robotics markets, the value would accrue to the AI creators and the owners of distribution.
However, since so much of AI science will depend on creating new intellectual property, I think profit capture here will be much more esoteric, and the outcomes much murkier.
If anyone has thoughts here (or is building a company in this space), please reach out! I am eager to go deeper into this.
Robot brains are happening. Skild, a Pittsburgh-based robotics company, is rumored to be raising more than $1 billion in a Series C round at a roughly $14 billion valuation. The company—which I’ve covered before—is focused on the left-hand side of the chart above. The team wants to build “Any robot. Any task. One brain.” Results like the Google paper above are going to make this round happen. OpenAI and Anthropic have proven what happens when you give the right researchers a huge amount of capital and time.
TASTEMAKER
You are underestimating how good AI content is getting. I have found myself fooled more and more often this year by AI videos. I’m sure you have too. Now, though, I am starting to see videos that aren’t just slopesque; they are also, dare I say it, enjoyable. Take, for example, this video from an anonymous account on X called “PsyopAnime.” He made the whole thing himself with AI video generation products like Kling.
Now ultra-violent anime may not be your cup of tea, but this is remarkably good! And all it took was one person’s time. It’s worth watching and seriously considering what happens when anyone can do this.
Have a great week,
Evan
Sponsorships
We are now accepting sponsors for Q1 ‘26. If you are interested in reaching my audience of 35K+ founders, investors, and senior tech executives, send me an email at team@gettheleverage.com.