Thursday, March 19, 2026
Could data from 100 million species help cure disease? One startup is betting on it

Basecamp Research cofounders Oliver Vince and Glen Gowers.
Photo courtesy of Basecamp Research

Welcome to Eye on AI, with AI reporter Sharon Goldman. In this edition: OpenAI to acquire startup Astral, expanding push into coding…Fourteen Catholic theologians have filed briefs in federal court supporting Anthropic…Rogue AI agent inside Meta triggers security alert…Why AI has not yet upset India’s IT industry.

I just got back from a couple of whirlwind days at SXSW in Austin, Texas, an annual collision of music, food, tech, and cultural hype. It’s the kind of event where live music spills out of every building, the tacos and BBQ never seem to end, and somewhere in between, people are busy debating the future of AI.

I was there to moderate a panel presented by U.K. biotech company Basecamp Research—one I suspected would be especially interesting given that the startup began with a 2019 expedition to the Arctic to discover new species and genes. Cofounders Glen Gowers and Oliver Vince found that two-thirds of the samples they hauled back to a makeshift lab in Iceland had never been recorded before. That experience led them to take a bet on building what they describe as an “internet of biology” for AI models to train on. It was a moonshot—an effort to capture 4.4 billion years of evolution and map the entire tree of life, a goal as ambitious as it sounds.

The ‘Trillion Gene Atlas’
Six years later, Basecamp Research is still staking out highly ambitious territory. This week, the company said it is launching what it calls the “Trillion Gene Atlas,” an initiative aimed at generating and modeling biological data at the trillion-gene scale. According to the company, the project—developed in collaboration with Anthropic, Ultima Genomics and PacBio, and powered by Nvidia’s AI infrastructure—aims to expand what we know about genetic diversity 100-fold by collecting genomic data from more than 100 million species across thousands of sites worldwide. Basecamp, which has raised $85 million in venture capital to date, is comparing this latest initiative to the Human Genome Project—the landmark sequencing effort that took 13 years and cost roughly $3 billion.

The effort builds on Basecamp’s broader AI strategy. Earlier this year, the company introduced its Eden models, which are trained on its growing biological dataset. The idea is to use those models to identify patterns across genes and ecosystems that would be difficult for humans to detect—potentially accelerating discoveries in areas like drug development.

The differing stakes for data in AI
But what really drew me to this story is the role of data in AI. Over the past few years, massive datasets scraped from the internet to train large language models like ChatGPT and Claude have become increasingly controversial—and legally contested. Several dozen lawsuits have been filed in the United States against major AI companies over the unauthorized use of copyrighted content for training, including one just last week in which Encyclopedia Britannica and dictionary publisher Merriam-Webster sued OpenAI, alleging it used their copyrighted material to train its models and generated responses that were “substantially similar” to their work.

The stakes are different here. AI for science is often held up as the clearest example of what “AI for good” could look like. Curing cancer? Bring on the data. New medicines? Here’s some DNA.

But of course, it’s never quite that simple.

The Financial Times, which covered Basecamp Research in a lengthy article last year, noted that as the company sends explorers to places like Cameroon, Costa Rica, the Arctic ice caps and even Point Nemo—the most remote location in the ocean—it has faced criticism that the effort risks echoing a modern form of colonialism, extracting value from communities without adequately sharing it.

That tension has pushed Basecamp to rethink how it compensates countries and communities for their data. Since 2023, the company says it has paid royalties to 60 organizations across 21 countries based on the use of digital sequence information—genetic data that underpins its AI models. To do that, it has built systems to tag and track the origin of each data sample and measure how much it contributes to downstream outputs, allowing payments to be distributed accordingly. In effect, Basecamp is attempting to trace where training data comes from and to pay for it when it creates value. That’s something that the broader AI industry has so far struggled to do, partly because LLMs are typically trained on vast, messy datasets scraped from across the internet, where ownership, consent and individual contributions from millions of sources are nearly impossible to track.

Still, data is ultimately a trade-off: what we are willing to give depends on what we hope to gain. Basecamp Research’s effort suggests people may be far more willing to let their data help advance medicine or scientific discovery than to see it used to generate endless streams of content. In the end, the question is simple: is it worth it? For many, when the goal is curing disease or advancing science, the answer may well be yes.

With that, here’s more AI news.

Sharon Goldman
sharon.goldman@fortune.com
@sharongoldman

AI IN THE NEWS

OpenAI to acquire startup Astral, expanding push into coding. Bloomberg reported that OpenAI is planning to acquire Astral, a startup focused on Python developer tools, as it expands deeper into the fast-growing market for AI-powered coding assistants. The deal, which has not yet closed, would fold Astral’s team and tools into OpenAI’s Codex platform—now used by more than 2 million developers, triple its user base at the start of the year—broadening it from code generation into a more comprehensive suite for building, testing and maintaining software. The move comes amid an intensifying race with rivals including Anthropic, Google, Microsoft and high-flying startups like Cursor to win enterprise developers, and follows a string of recent OpenAI acquisitions aimed at strengthening its position in AI-driven software development.

Fourteen Catholic theologians have filed briefs in federal court supporting Anthropic. According to the Washington Post, a growing dispute between Anthropic and the U.S. government over the military's use of AI has taken an unusual turn, with a group of Catholic theologians filing legal briefs in support of the company’s efforts to limit how its Claude model can be used—particularly for mass surveillance and autonomous weapons. The scholars argue that decisions involving life and human dignity must remain in human hands, warning that AI systems capable of selecting and executing targets fundamentally alter the moral nature of war. The clash comes as tensions escalate between Anthropic and the Defense Department, which has resisted placing restrictions on AI use and recently barred contractors from working with the company, prompting a lawsuit from Anthropic on First Amendment grounds.

Rogue AI agent inside Meta triggers security alert. A rogue AI agent inside Meta triggered a high-severity security incident last week after taking an unapproved action that exposed sensitive company and user data to employees without proper access for nearly two hours, highlighting the growing risks of autonomous AI systems operating inside enterprise environments. According to The Information, the incident began when an engineer used an internal agent tool to analyze a technical question; the agent then posted advice without approval, which another employee followed, setting off a chain reaction that opened access to restricted systems. While Meta says no data was ultimately misused, the episode shows the risks of allowing AI agents to act independently, even as such uses become increasingly popular—as evidenced by the viral success of OpenClaw and the rush of other AI players to launch similar "agentic harness" products.

Why AI has not yet upset India’s IT industry. This is an interesting article from The Economist, digging into how India’s $315 billion IT outsourcing industry is facing growing anxiety over AI coding tools like Anthropic’s Claude, which can generate software far faster and more cheaply than human developers, fueling fears that the sector’s labor-arbitrage model could be disrupted. Yet so far, the impact has been uneven and more muted than expected: while AI is boosting productivity in clean, “greenfield” environments, it remains difficult to deploy across complex legacy systems, meaning companies still rely heavily on outsourced engineers. At the same time, firms like Infosys and TCS are positioning themselves to benefit from the shift by expanding into AI consulting and strategy, with leaders projecting a potential $300–$400 billion market by 2030. Despite market jitters and falling stock prices, revenues continue to grow and hiring remains steady. AI may reshape the industry, but its disruption is proving slower, messier, and less definitive than early predictions suggested.