Verticalization is the only way these labs stop hitting the wall. We’ve spent two years pretending that a model that can write a sonnet in the style of a 1920s detective can also map a protein fold. It can’t. The launch of Claude Science is a tacit admission that the “one model to rule them all” dream is dead. Now we are in the era of the specialist.
The optics of the launch are telling. Anthropic didn’t drop this in a random blog post for the masses; they took it to an event specifically for pharmaceutical executives and researchers. This is a pivot toward high-margin, high-stakes enterprise work. The general consumer chat market has become a commodity race to the bottom where everyone is fighting over who can summarize a PDF the fastest for the lowest price.
But in biotech, the value isn’t in the summary; it’s in the accuracy of the hypothesis. If you can shorten the drug discovery cycle by even a few weeks, the ROI is measured in billions, not monthly subscription fees. (And probably a massive bill for the tokens.)
The problem is that general LLMs are probabilistic by nature. They are designed to predict the most likely next token, which is great for poetry but lethal for chemistry. Using a general model for hard science is like hiring a jazz musician to perform open-heart surgery; sure, they both have a sense of rhythm, but you really want the surgeon to follow the manual exactly. Who is actually paying for this if the model still hallucinates a chemical bond because it looked “plausible” in the training data?
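To make the “most likely next token” point concrete, here is a toy sketch of sampling from a softmax distribution. Everything in it is an assumption for illustration: the three candidate tokens, their scores, and the idea that “single bond” is the right answer are invented, and no real model or chemistry library is involved. It only shows that a plausible-but-wrong option keeps a real share of the probability mass.

```python
import math
import random


def softmax(logits, temperature=1.0):
    """Turn raw scores into probabilities that sum to 1."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_token(tokens, logits, rng, temperature=1.0):
    """Draw one token in proportion to its softmax probability."""
    probs = softmax(logits, temperature)
    return rng.choices(tokens, weights=probs, k=1)[0]


# Hypothetical candidates and scores, invented for this example.
TOKENS = ["single bond", "double bond", "triple bond"]
LOGITS = [2.0, 1.5, -1.0]


def wrong_rate(correct="single bond", trials=10_000, seed=0):
    """Fraction of sampled tokens that differ from the assumed-correct one."""
    rng = random.Random(seed)
    wrong = sum(
        sample_token(TOKENS, LOGITS, rng) != correct for _ in range(trials)
    )
    return wrong / trials


if __name__ == "__main__":
    print(softmax(LOGITS))
    print(wrong_rate())
```

With these made-up scores the “correct” token has a probability of about 0.60, so the sampler picks something else roughly four times in ten. Lowering the temperature narrows that gap, but nothing in the sampling step checks whether the chosen token is chemically valid.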
The big question is whether Claude Science is actually a new model or just a curated data wrapper with a different system prompt and a heavy dose of RAG over proprietary journals. If it’s just a fine-tuned version of the existing weights, then it’s a feature, not a product.
It’s a feature, not a product.
For this to be a legitimate “flagship,” Anthropic has to prove they’ve solved the grounding problem. Scientific data is often sparse and locked behind paywalls. If they’ve simply scraped the open web and added a “science mode,” they’re just giving researchers a faster way to be wrong. The real friction here isn’t the UI or the context window—it’s the training data. Most of the high-quality science data is proprietary, meaning Anthropic had to either strike deals with pharma giants or find a way to synthesize data that doesn’t degrade into noise.
I suspect the latter is more likely. We’ve seen this pattern before: a lab claims a “specialized” version of a model, only for it to turn out to be a thin layer of prompt engineering. Or maybe not: if they have actually integrated a symbolic reasoning engine or a formal verifier into the loop, then the game changes. But they haven’t mentioned a verifier. They’ve mentioned a “product.”
The industry is currently obsessed with the “AI-discovered drug” prize. Everyone wants to be the first to claim they found a cure for something using a GPU cluster. But the bottleneck isn’t the AI’s ability to suggest a molecule; it’s the physical lab’s ability to verify it. Until the loop between the LLM and the wet lab is automated, Claude Science is just a very expensive brainstorming tool.
By Q4, we will see the first peer-reviewed paper that lists a specific vertical model as a primary tool for hypothesis generation, and it will trigger a massive crisis in academic publishing regarding what constitutes “authorship.”
Until then, we’re just watching a very expensive game of “guess the molecule” played by a model that is still, at its core, a very sophisticated autocomplete.