
Intro:
I recently read an article about a company whose production database and backups were reportedly deleted by an AI coding agent operating through Cursor and powered by Anthropic’s Claude model.
The headline is dramatic.
An AI agent reportedly deleted a company’s data in roughly nine seconds, and the AI later admitted it had violated its instructions: “I violated every principle I was given.”
Oh boy. When you read something like this, you wonder whether these tools can be trusted.
Most people read that story and immediately ask: can AI be trusted?
My Question:
As someone who has spent decades designing networks, backup systems, and disaster recovery solutions, that’s not the first question that came to my mind.
My first question is much simpler:
- How was a nine-second mistake allowed to become a possible Business Extinction Level Event?
- The danger isn’t simply losing data. The danger is losing customer confidence, disrupting operations, and spending months rebuilding information that should never have been lost in the first place.
Because whether the trigger is AI, ransomware, a software bug, a rogue employee, or a tired administrator working at 2:00 AM, the underlying principle never changes.
Every system will eventually experience failure. The follow-up questions are whether a single failure is allowed to destroy the business and, when failures happen, what the tested plan to restore operations is.
There are four takeaways from this article in the Guardian:
The more I thought about this incident, the more I realized this wasn’t one problem—it was several problems stacked on top of each other.
Companies are adopting AI coding tools because they dramatically increase development speed and productivity. That’s the upside, and it is compelling. However, speed without proper safeguards can amplify mistakes just as quickly.
What struck me most about this story is that the database deletion was merely the final event in a chain of decisions.
Whether the trigger was an AI agent, a software bug, ransomware, or simple human error, the lesson remains the same: businesses must balance speed with safeguards:
- Keep testing separate from production
- Limit access through permissions
- Maintain verified and recoverable backups
- Protect the trust their customers place in them
When those layers are working together, mistakes are contained. When they are not, a nine-second action can become a business-threatening event.
Speed Is Valuable Until There Is a Problem:
One of the reasons companies are rapidly adopting AI coding tools is simple: they work.
Tools like Cursor can help developers write code, troubleshoot problems faster, and deploy solutions more quickly.
The productivity gains are huge, and for growing businesses, that speed provides a meaningful competitive advantage. However, every increase in speed should be accompanied by appropriate safeguards.
Development and testing environments, staging systems, code reviews, and deployment approvals exist for a reason. They provide opportunities for mistakes to fail safely before they ever reach customers.
To be clear, the article does not provide enough technical detail to determine exactly how PocketOS’s development and deployment processes were structured. We do not know what testing environments existed, what review procedures were followed, or how changes were promoted into production.
What we do know is the outcome: a destructive action reportedly affected production systems and had significant consequences for the business and its customers.
That outcome raises an important question:
How many opportunities did this change have to fail before it reached production?
Whether the actor is an AI coding agent, software developer, system administrator, or an automated deployment process, every layer between development and production serves as a safety net.
The fewer opportunities a change has to fail safely, the greater the potential risk becomes.
One of the most important lessons organizations can take from this incident is that powerful tools—especially AI-assisted tools—benefit from environments where experimentation, testing, and validation can occur without exposing live business operations to unnecessary risk.
Permissions Matter:
One detail from the article that stood out to me was the report that the AI agent violated the instructions it had been given. That is certainly concerning. However, from an infrastructure perspective, instructions are not controls.
Good architecture assumes mistakes will happen. Humans make mistakes. Scripts make mistakes. Software makes mistakes. AI systems make mistakes. That reality is not controversial. The purpose of infrastructure design is not to eliminate every mistake; it is to limit the damage, the blast radius, when one inevitably occurs.
To be fair, the article does not provide enough technical detail to determine exactly how permissions, segmentation, or access controls were configured within PocketOS’s environment.
We do not know what safeguards were present, which systems were accessible, or how authority was delegated to the coding agent.
What we do know is the outcome. A destructive action reportedly affected production data and accessible backups. That outcome naturally raises an important question:
What was the maximum amount of damage any one process was allowed to cause?
This is precisely why infrastructure professionals rely on concepts such as least-privilege access, network segmentation, approval workflows, and sandboxed environments.
A well-designed sandbox allows experimentation without exposing production systems to unnecessary risk. Segmented networks and separate security boundaries prevent a mistake in one environment from automatically propagating into another. Permission boundaries ensure that a user, script, or automated system can access only the resources necessary to perform its intended task.
In other words, good infrastructure assumes that something will eventually go wrong. The objective is to ensure that when it does, the failure remains contained.
Whether the source is an AI coding agent, a software developer, a compromised account, malware, or a simple configuration error, no single mistake should have the ability to threaten the existence of an entire business.
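The difference between an instruction and a control can be shown with a small sketch. This is only an illustration under stated assumptions, not a description of how PocketOS, Cursor, or any real platform enforces access: the `Actor` type, the `is_allowed` function, and the action and environment names are all hypothetical. The point is that the check runs outside the actor, so it holds regardless of what the actor was told or intended.

```python
from dataclasses import dataclass

# Hypothetical action names, chosen only for illustration.
DESTRUCTIVE_ACTIONS = {"drop_database", "delete_backup", "delete_volume"}


@dataclass(frozen=True)
class Actor:
    name: str
    # Environments this actor may touch at all, e.g. {"dev", "staging"}.
    allowed_environments: frozenset


def is_allowed(actor: Actor, action: str, environment: str,
               human_approved: bool = False) -> bool:
    """Return True only if the request passes every boundary."""
    # Boundary 1: the actor has no access outside its own environments.
    if environment not in actor.allowed_environments:
        return False
    # Boundary 2: destructive actions in production need human approval,
    # no matter who or what is asking.
    if environment == "production" and action in DESTRUCTIVE_ACTIONS:
        return human_approved
    return True


agent = Actor("coding-agent", frozenset({"dev", "staging"}))
operator = Actor("on-call-admin", frozenset({"dev", "staging", "production"}))

print(is_allowed(agent, "drop_database", "staging"))        # True
print(is_allowed(agent, "drop_database", "production"))     # False
print(is_allowed(operator, "drop_database", "production"))  # False
print(is_allowed(operator, "drop_database", "production",
                 human_approved=True))                      # True
```

In a real environment these boundaries would live in the platform itself (separate credentials, separate accounts, role-based permissions) rather than in application code. The sketch only shows the shape of the rule: the coding agent never holds production access, and even a trusted operator cannot destroy production data alone.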
A Backup Is Not a Disaster Recovery or Business Continuity Strategy:
Perhaps the most concerning detail in the article was not that production data was deleted. It was that recovery reportedly depended on an off-site backup that was approximately three months old. Let’s think about that for a moment.
Three months of:
- Reservations.
- Customer records.
- Operational history.
- Business activity.
The company had to reconstruct information using calendars, emails, and other sources. That is not simply a technical recovery problem. That is a business continuity problem.
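One way to make a gap like that visible before a crisis is to compare the age of the newest restorable backup against the amount of data loss the business says it can tolerate, often called a recovery point objective. The sketch below is a minimal illustration, not a monitoring product: the `backup_is_stale` function, the 24-hour target, and the dates are all hypothetical values chosen for the example.

```python
from datetime import datetime, timedelta, timezone


def backup_is_stale(last_good_backup: datetime, now: datetime,
                    max_age: timedelta) -> bool:
    """Return True if the newest restorable backup is older than allowed."""
    return now - last_good_backup > max_age


# Hypothetical numbers: a 24-hour target and a copy that is 90 days old.
now = datetime(2025, 6, 1, tzinfo=timezone.utc)
last_good = now - timedelta(days=90)

print(backup_is_stale(last_good, now, max_age=timedelta(hours=24)))  # True
```

A check like this only helps if `last_good_backup` reflects a copy that has actually been restored successfully, not just a job that reported success.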
To be fair, the article does not provide enough technical detail to determine exactly why the most recent recoverable backup was approximately three months old. We do not know whether newer backups existed, whether they were corrupted, whether they were inaccessible, whether they had been affected by the same event, or whether the three-month copy was simply the last known-good recovery point.
However, the fact that these questions can be asked highlights an important lesson: backups are only one component of a disaster recovery strategy.
One of the most dangerous assumptions in IT is believing that because backups exist, recovery is guaranteed. It is not.
A backup is a theory. A successful restore is the evidence.
Backups should be monitored and verified through periodic restore testing. Most importantly, recoverable copies should exist outside the blast radius of the systems they are intended to protect.
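A restore test can start very simply: restore the backup somewhere safe and confirm that what comes back matches what was saved. The following is a minimal sketch under stated assumptions. The file names and the `restore_test` helper are hypothetical, and a plain file copy stands in for whatever restore procedure a real backup product uses.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash a file in chunks so large files do not need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def restore_test(backup_file: Path, expected_sha256: str) -> bool:
    """Restore the backup into a scratch directory and verify its contents."""
    with tempfile.TemporaryDirectory() as scratch:
        restored = Path(scratch) / backup_file.name
        shutil.copy2(backup_file, restored)  # stand-in for a real restore step
        return sha256_of(restored) == expected_sha256


# Demonstration with a throwaway file standing in for real data.
with tempfile.TemporaryDirectory() as workdir:
    source = Path(workdir) / "reservations.db"
    source.write_bytes(b"example reservation data")
    recorded_hash = sha256_of(source)  # recorded when the backup is taken

    backup = Path(workdir) / "reservations.bak"
    shutil.copy2(source, backup)
    print(restore_test(backup, recorded_hash))  # True

    backup.write_bytes(b"corrupted")  # simulate silent corruption
    print(restore_test(backup, recorded_hash))  # False
```

A matching checksum proves only that the bytes survived. A fuller test would also start the application against the restored data and confirm that people can actually work with it.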
One of the Most Common Misconceptions in IT:
One of the most common misconceptions in IT is that backups and disaster recovery are the same thing. They are not.
A backup is simply a copy of data. Disaster recovery is the ability to restore systems, applications, services, and business operations within an acceptable timeframe. Having a backup is important, but it immediately raises a second question:
What are you going to restore to?
If a server fails, if a virtualization host is encrypted by ransomware, if storage becomes corrupted, or if critical infrastructure is lost, possessing a backup file does not automatically bring the business back online. You still need functioning infrastructure capable of receiving and running that restored data.
This is where many disaster recovery conversations stop too early. Organizations focus on protecting the data—which is important—but spend far less time thinking about how they would actually recover and resume operations if a major failure occurred.
Recovery requires more than data. It requires systems, networking, authentication, applications, documentation, and a tested process for putting everything back together.
The goal of disaster recovery is not simply preserving information. The goal is ensuring that when something goes wrong, employees can continue working, customers can continue being served, and the business can continue operating. A backup is the beginning of that conversation—not the end of it.
A well-designed disaster recovery strategy assumes that production systems can fail. The objective is not merely to create copies of data. The objective is to ensure that when something goes wrong, there is a recoverable copy and a planned restore path, both isolated from the event that caused the failure.
When recovery depends on rebuilding information from emails, calendars, payment processors, and other external records, the conversation has already moved beyond backup technology. At that point, the organization is rebuilding business history.
The Real Damage Was Not the Deletion:
The database was reportedly deleted in seconds. The recovery effort lasted days. The business consequences may take much longer to fully understand.
At first glance, it is tempting to view the deletion itself as the primary problem. After all, that was the event everyone saw. It generated the headlines. It created the immediate disruption. It was the visible failure.
But from an infrastructure and disaster recovery perspective, the deletion was only the catalyst.
The more important question is not what happened during those nine seconds. The more important question is what conditions already existed that allowed those nine seconds to have such far-reaching consequences.
To be fair, the article does not provide enough technical detail for us to know exactly how the environment was designed. We do not know what segmentation existed, what permissions were delegated, what testing environments were available, or what recovery controls were in place.
What we do know is the outcome:
- Production data was reportedly deleted.
- Accessible backups were reportedly affected.
- Recovery depended on a backup approximately three months old.
- Business records reportedly had to be reconstructed from calendars, emails, and other sources.
Those facts suggest that the deletion itself was only part of the story. The incident exposed important questions about:
- Permission boundaries and least-privilege access.
- Recovery planning and business continuity.
- Backup verification, retention, and recoverability.
- Containment, segmentation, and operational resilience.
Had those risks been fully mitigated, the deletion might have become a recoverable operational event rather than a business-threatening one.
One of the most important lessons in disaster recovery is that the triggering event is rarely the entire story.
- Ransomware attacks do not create weak backups.
- Hardware failures do not create a lack of redundancy.
- Software bugs do not create an absence of recovery planning.
Those conditions exist before the event occurs. The event simply reveals them. The deletion happened in seconds, but the underlying risks existed long before those nine seconds began.
Why Resilience Matters:
The objective is not to build systems that never fail. The objective is to build systems that recover gracefully when they do. In the best recoveries, customers never know there was a problem because failover systems, tested procedures, and thoughtful planning carried the day.
Disaster Recovery Matters:
A backup sitting on a disk does not keep a business operating. Recovery means restoring applications, services, access, and productivity. It means getting people back to work and customers back to being served.
Architecture Matters:
Architecture is where resilience begins. It determines what can talk to what, who can access what, where failures stop, and how recovery occurs. Long before a crisis arrives, good architecture is already deciding whether that crisis becomes a footnote in a weekly status meeting—or a headline in the news.
Conclusion:
One of the most difficult challenges infrastructure professionals face is proving the value of something that never happened.
In manufacturing environments, organizations proudly track milestones such as 100 days without an injury or 1,000 days without a workplace accident. Everyone understands those achievements are not accidents. They are the result of planning, training, procedures, vigilance, and a culture that prioritizes safety.
Infrastructure and disaster recovery professionals face a similar challenge, but their successes are often invisible.
Nobody sees:
- The ransomware attack that was prevented.
- The backup failure that was detected and corrected before it became a crisis.
- The permissions mistake that was contained before it reached production.
- The recovery plan that was tested months earlier and quietly worked exactly as designed.
- The monitoring alert that identified a problem before customers were affected.
- The segmentation rule that stopped a mistake from spreading across the organization.
Instead, people simply see that nothing happened, and that is often interpreted as evidence that nothing needed to be done. In reality, the absence of disaster is frequently the result of countless hours spent:
- Monitoring systems.
- Verifying backups.
- Testing recovery procedures.
- Reviewing logs and alerts.
- Asking uncomfortable questions before they become emergencies.
The best infrastructure professionals are not merely reacting to problems. They are actively looking for the next problem before it arrives, and they notice:
- The unusual login.
- The failed backup job.
- The configuration drift.
- The unexpected network traffic.
- The storage warning.
- The permissions change nobody else saw.
Their success is measured not by the disasters they recover from, but by the disasters that never occur. The irony is that the better they perform, the easier it becomes for an organization to assume they are unnecessary. Until the day they are not there.
That day is often when the true value of planning, vigilance, resilience, disaster recovery, and business continuity becomes painfully obvious.
Nothing happened today because somebody was making sure nothing happened today.
Everybody has a disaster recovery plan. The question is whether it works. If you’d like an independent review of your backup, recovery, and business continuity strategy, give me a call. It’s far less expensive to ask hard questions today than to answer them during a crisis.
Rick Arnold – Arnold Consulting