Mind the Gap: Closing the Expectation Divide in Cloud & Data Center Services

Global demand for cloud computing and data center services is growing faster than ever. Hyperscalers are expanding and AI workloads are providing unprecedented growth impetus, and companies of every size depend on providers to keep high-performance infrastructure running. Yet amid all that innovation and growth, one thing has not changed: the gap between what service agreements promise and what customers expect.

Uptime: The One-Way Street of Gratitude

Most professional hosting and cloud agreements guarantee a minimum uptime (generally 99.95% and up) for core network and infrastructure. 99.95% sounds nearly perfect, but in practice it allows for almost 4.5 hours of downtime per year. Here's the paradox: when a provider delivers flawless service for years, no one writes a thank-you note. Silence on success is just "business as usual." But when a five-minute blip happens, still well within the 99.95% commitment, support lines light up and contract clauses get quoted back to the provider. This is not customers being ungrateful; the lesson is that reliability has become invisible. Uptime is simply expected, and any deviation, no matter how slight or contractually permissible, feels like a failure.
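To put that downtime budget in numbers, here is a minimal sketch in Python (illustrative only; real SLAs often measure per month and carve out scheduled maintenance windows) that converts an uptime percentage into allowable downtime:

```python
# Convert an SLA uptime percentage into an allowed-downtime budget.
# Illustrative only: real contracts often measure monthly and exclude
# scheduled maintenance windows.

def downtime_budget_hours(sla_percent: float, hours_per_year: float = 365.25 * 24) -> float:
    """Return the hours of downtime per year still within the SLA."""
    return (1 - sla_percent / 100.0) * hours_per_year

for sla in (99.9, 99.95, 99.99):
    hours = downtime_budget_hours(sla)
    print(f"{sla}% uptime allows ~{hours:.2f} h ({hours * 60:.0f} min) of downtime per year")
```

At 99.95%, the allowance works out to roughly 4.4 hours (about 263 minutes) per year, which is why a five-minute blip usually sits comfortably inside the contract even though it feels like a breach.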

Backups: The Unpaid — and Often Misplaced — Safety Net

Another consistent friction point is backup accountability. Many cloud and bare-metal customers assume their data is automatically backed up simply because it lives in a professional data center. In practice, most standard agreements cover access to the underlying infrastructure, not protection of the data on it. When a virtual machine fails or a dedicated server's disk dies, infrequent but inevitable events, customers without a backup plan often expect the provider to "just recover it." Unless backups were part of the contract (or bought as an add-on), the provider cannot magically restore lost data. The rule is simple and often overlooked: if you don't pay for backup and recovery, you don't have them. Providers can and should educate customers, but responsibility for data integrity ultimately rests with the data owner.

Backups on the Same Server: A Hidden Catch

Even customers who maintain backups can fall into the trap of storing those backups on the same VM or dedicated server they're trying to protect. When the underlying hardware fails, both the live data and the "backup" can disappear in a single stroke. Real resilience means keeping backups offsite, or at least on different physical infrastructure: in another availability zone, on another storage platform, or through a managed backup service. A backup that shares the same failure domain isn't a backup at all; it is simply another copy waiting to fail.
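As a minimal sketch of what a separate failure domain can look like in practice, the snippet below archives a data directory and ships it to S3-compatible object storage in another facility. The directory, bucket name, and endpoint are hypothetical, and it assumes boto3 is installed with credentials already configured; a managed backup service achieves the same result with less effort.

```python
import tarfile
from datetime import datetime, timezone

import boto3  # assumes credentials are configured via environment or config files

# Hypothetical values for illustration only.
SOURCE_DIR = "/var/lib/app-data"
BUCKET = "offsite-backups-example"               # lives in a different facility/region
ENDPOINT = "https://s3.other-region.example.com"

def backup_to_offsite_storage() -> str:
    """Create a timestamped archive and upload it off the local failure domain."""
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    archive_path = f"/tmp/backup-{stamp}.tar.gz"

    # Archive the data locally first.
    with tarfile.open(archive_path, "w:gz") as tar:
        tar.add(SOURCE_DIR, arcname="app-data")

    # Ship the archive to storage that does not share this server's fate.
    s3 = boto3.client("s3", endpoint_url=ENDPOINT)
    key = f"daily/backup-{stamp}.tar.gz"
    s3.upload_file(archive_path, BUCKET, key)
    return key

if __name__ == "__main__":
    print("uploaded", backup_to_offsite_storage())
```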

Planned Maintenance: No Good Deed Goes Unpunished

Even the most reliable infrastructure needs care. Firmware has to be patched, network devices upgraded, and security appliances kept on the latest updates. Nearly every service agreement defines scheduled maintenance windows, and providers generally do the work in the dead of night with ample notice. Yet maintenance notices regularly provoke resistance. Some customers demand zero disruption at any cost, even when the work is needed to prevent future outages. Ironically, the clients who most value stability can be hostile to the very processes that preserve it.

Bridging the Expectation Gap

So how do providers and customers meet in the middle?

Crystal-Clear SLAs

Service Level Agreements need to be written in plain language, specifying uptime objectives, response times, and — crucially — what is not included. Define roles for backups, recovery and data retention.

Proactive Education

Providers should explain the realities of uptime percentages, maintenance requirements, and backup responsibilities during the sales process, not after the fact.

Shared Responsibility Models

Public cloud behemoths such as AWS and Azure made the term shared responsibility famous, but it applies just as well to infrastructure-as-a-service (IaaS) and colocation: the provider maintains the platform, while the customer secures and backs up their data.

Celebrate Reliability

It might seem a little self-congratulatory, but regular reports of "X days of uninterrupted service" remind customers of what they are getting, and they can soften the reaction when an unavoidable incident does occur.

Not a Transaction, a Partnership

At its core, a data center or cloud agreement is a partnership. Providers commit to world-class uptime, redundancy, and security; customers commit to understanding the scope of those services and planning around it. When both sides treat the contract as a living document rather than fine print, there is less room for surprise and fewer panicked calls when the inevitable hiccup occurs.

Takeaway: Nothing about mission-critical infrastructure is ever "set and forget." Transparency is the key to successful customer relationships: explicit SLAs, clearly divided responsibilities, and an understanding that maintenance, backups (stored in a separate location), and occasional downtime make the system stronger. Ultimately, a strong provider is not one that never has a problem, but one that communicates openly, keeps its promises, and works with customers through the moments when the lights go out.

#CloudComputing #DataCenters #SLA #Uptime #Downtime #CloudServices #Infrastructure #DevOps #ITOperations #ServiceLevelAgreement #HighAvailability #CloudReliability #CloudBackup #PlannedMaintenance #BusinessContinuity

https://www.linkedin.com/pulse/mind-gap-closing-expectation-divide-cloud-data-center-andris-gailitis-wo94f

When the Cable Snaps: Why Regional Compute Can’t Be an Afterthought

The internet is a web of glass threads lying on the seabed. Twice, in starkly different seas, those threads were cut.


- Two **subsea cables** in the Baltic Sea were cut within hours of each other in November 2024, reducing capacity on links between Finland, Lithuania, Sweden, and Germany.

- In **September 2025**, multiple cable systems in the Red Sea, one of the world's busiest internet corridors, were damaged, degrading services across Europe, the Middle East, and Asia.

Each event had its own cause, but the net effect for users, enterprises, and cloud providers was the same: latency spikes, rerouting stress, and an unpleasant reminder that our digital lives rely on a handful of physical chokepoints.

## The myth of infinite bandwidth

It is easy to assume “the cloud” will just absorb disruptions. Microsoft and AWS do have very good redundancy, and traffic was rerouted. But physics can’t be abstracted away:

- **Latency increases** when traffic takes a detour thousands of kilometers long.

- **Throughput decreases** when alternative routes absorb the displaced load.

- **Resilience shrinks** when other cables in the same geography break down.

For latency-sensitive services (trading platforms, multiplayer gaming, video collaboration), the difference between 20 ms and 150 ms is the difference between usable and unusable. And when compliance-heavy workloads must reroute through unfamiliar jurisdictions, that carries a very different set of risks of its own.
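A back-of-the-envelope calculation shows why the detour hurts. Light in optical fiber covers roughly 200,000 km per second, about 5 microseconds per kilometer one way, so every extra 1,000 km of path adds on the order of 10 ms of round-trip time before any routing or queuing overhead. The sketch below uses hypothetical distances purely for illustration:

```python
# Estimate added round-trip time when traffic is rerouted around a cut cable.
# Rule of thumb: light in fiber covers ~200,000 km/s, i.e. ~5 us per km one way,
# ~10 us per km round trip. Real paths add queuing, routing and protocol overhead.

FIBER_KM_PER_SEC = 200_000

def added_rtt_ms(normal_path_km: float, detour_path_km: float) -> float:
    """Extra round-trip time, in milliseconds, from taking the longer path."""
    extra_km = detour_path_km - normal_path_km
    return 2 * extra_km / FIBER_KM_PER_SEC * 1000

# Hypothetical example: a 6,000 km corridor replaced by a 13,000 km detour.
print(f"~{added_rtt_ms(6_000, 13_000):.0f} ms of extra RTT")  # roughly 70 ms
```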

## Regional compute is the antidote

The lesson: if enterprises don't want to be exposed to these chokepoints, they need compute capacity closer to both users and data sources. Regional doesn't just mean "somewhere on the same continent." It means operations that can keep running even if a submarine cable is cut and key international routes go offline. Regional compute delivers on three fronts:

1. Continuity of performance – keep mission-critical applications fast and stable even when cross-ocean paths are broken.

2. Risk diversification – reduce dependence on any single corridor (Red Sea, Baltic Sea, English Channel, and so on).

3. Regulatory alignment – in some jurisdictions, including the EU, keeping data within borders also satisfies sovereignty requirements.

## Europe as a case study—sovereignty through resilience

Europe's push for "digital sovereignty" (see NIS2, the EU Data Boundary, AWS' European Sovereign Cloud, and so on) is usually framed in terms of compliance and control. But the cable incidents illustrate a broader principle: keeping capacity local is a resilience measure first and a regulatory checkbox second.

If you operate inside the EU, sovereignty is one factor. In Asia, the reasoning is similar: reduce reliance on Red Sea transit. In North America, resilience might mean investing in diverse east–west terrestrial routes to protect against coastal chokepoints.

## A global problem with regional solutions

Route disruptions, whether caused by natural disasters, dragging ship anchors, or deliberate sabotage, have struck the Atlantic, Pacific, and Indian oceans. Every geography has its weak spots. That's why international organizations are increasingly asking: where can we compute if the corridor collapses?

The answer frequently isn’t another distant hyperscale region. It’s:

- **Regional data centers** embedded in terrestrial backbones.

- **Local edge nodes** for caching and API traffic.

- **Cross-border clusters** with real route diversity, not just carrier diversity.

## Building for the next cut

Here’s what CIOs, CTOs, and infrastructure leaders can do:

1. Map your exposure. Do you know which subsea corridors most of your workload traffic depends on? Most organizations don't. Ask your providers for path transparency.

2. Design for "cable cut mode." Envision what happens if the Baltic or Red Sea corridor goes dark. Test failover, measure latency, and revise the architecture accordingly (a simple latency probe like the sketch after this list is one starting point).

3. Invest regionally, fail over regionally. Don't just replicate data across the ocean. Build failover capacity in your core market where possible.

4. Contract for resilience. Diversity in routes, repair-time commitments, regional availability — build these into your SLAs.

5. Frame it as business continuity. This is not just a network operations issue; it's a boardroom problem. One day of degraded service can exceed the cost of additional regional capacity.
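For step 2, a useful first drill is simply measuring what users would feel if traffic took the long way around. The sketch below (standard-library Python, with hypothetical endpoint names) times TCP handshakes to regional frontends as a crude proxy for round-trip latency during failover tests:

```python
import socket
import time

# Hypothetical endpoints for illustration; substitute your own regional frontends.
ENDPOINTS = {
    "eu-primary": ("app-eu.example.com", 443),
    "asia-failover": ("app-asia.example.com", 443),
    "us-failover": ("app-us.example.com", 443),
}

def tcp_connect_ms(host: str, port: int, timeout: float = 3.0) -> float:
    """Time a TCP handshake as a rough proxy for network round-trip latency."""
    start = time.perf_counter()
    with socket.create_connection((host, port), timeout=timeout):
        pass
    return (time.perf_counter() - start) * 1000

for name, (host, port) in ENDPOINTS.items():
    try:
        print(f"{name:15s} {tcp_connect_ms(host, port):6.1f} ms")
    except OSError as exc:
        print(f"{name:15s} unreachable ({exc})")
```

Run it regularly from the regions where your users actually sit, and compare the numbers during a simulated corridor outage against your latency budget.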

## Beyond sovereignty

Yes, sovereignty rules in Europe are a push factor. But sovereignty alone doesn’t explain why a fintech in Singapore, a SaaS in Toronto, or a hospital network in Nairobi would care about regional compute. They should care because cables are fragile, chokepoints are real, and physics doesn’t negotiate.

## The bottom line

These cable cuts weren't catastrophic in themselves. They were warnings. And the world's dependence on a few narrow subsea corridors is increasing, not decreasing. As AI, streaming, and cloud adoption accelerate, the stakes rise.

Regional compute isn't only about sovereignty. It's about resilience. The organizations that internalize that lesson right now, before the next snap, will be the ones that stay fast, compliant, and reliable while others grind to a halt.

Subscribe & Share now if you are building, operating, and investing in the digital infrastructure of tomorrow.

#SubseaCables #CableCuts #DigitalResilience #RegionalCompute #DataCenters #EdgeComputing #NetworkResilience #CloudInfrastructure #DigitalSovereignty #Latency #BusinessContinuity #NIS2 #CloudComputing #InfrastructureSecurity #DataSovereignty #Connectivity #CriticalInfrastructure #CloudStrategy #TechLeadership #DigitalTransformation

https://www.linkedin.com/pulse/when-cable-snaps-why-regional-compute-cant-andris-gailitis-we9sf

Why Colocation and Private Infrastructure Are Making a Comeback—and Why Cloud Hype Is Wearing Thin


The Myth of Cloud-First—And the Reality of Repatriation.

For nearly a decade, businesses have been sold the idea of “cloud-first” as a golden ticket—unlimited scale, lower costs, effortless agility. But let’s be frank: that narrative wore thin a while ago. Now we’re seeing a smarter reality take shape—cloud repatriation: organizations moving workloads back from public cloud to colocation, private cloud, or on-prem infrastructure.

These Numbers Are Real—and Humbling

Still, let’s be clear: only about 8–9% of companies are planning a full repatriation. Most are just selectively bringing back specific workloads—not abandoning the cloud entirely. (https://newsletter.cote.io/p/that-which-never-moved-can-never)

Why Colo and On-Prem Are Winning Minds

Here’s where the ideology meets reality:

1. Predictable Cost Over Hyperscaler Surprise Billing

Public cloud is flexible—but also notorious for runaway bills. Unplanned spikes, data transfer fees, idle provisioning—it all adds up. Colo or owned servers require upfront investment, sure—but deliver stable, predictable costs. Barclays noted that spending on private cloud is leveling or even increasing in areas like storage and communications (https://www.channelnomics.com/insights/breaking-down-the-83-public-cloud-repatriation-number and https://8198920.fs1.hubspotusercontent-na1.net/hubfs/8198920/Barclays_Cio_Survey_2024-1.pdf).
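As a rough illustration of why predictability matters, the sketch below compares a steady workload on public cloud (including egress fees) against a fixed colocation budget. Every figure is a hypothetical placeholder rather than real pricing; the point is that for steady workloads both sides of the comparison are knowable in advance:

```python
# Hypothetical monthly cost comparison for a steady, predictable workload.
# All figures are placeholders for illustration, not real pricing.

def cloud_monthly(instances: int, hourly_rate: float, egress_tb: float,
                  egress_per_gb: float) -> float:
    compute = instances * hourly_rate * 730      # ~730 hours per month
    egress = egress_tb * 1024 * egress_per_gb    # data transfer out
    return compute + egress

def colo_monthly(rack_rent: float, hardware_capex: float,
                 amortization_months: int, remote_hands: float) -> float:
    return rack_rent + hardware_capex / amortization_months + remote_hands

cloud = cloud_monthly(instances=20, hourly_rate=0.40, egress_tb=50, egress_per_gb=0.08)
colo = colo_monthly(rack_rent=1_500, hardware_capex=120_000,
                    amortization_months=36, remote_hands=500)

print(f"cloud ~ ${cloud:,.0f}/month, colo ~ ${colo:,.0f}/month")
```

The exact numbers will differ for every workload; what the exercise exposes is how much of the cloud bill is driven by egress and sustained utilization, which is exactly where colo tends to win.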

2. Performance, Control, Sovereignty

Sensitive workloads—especially in finance, healthcare, or regulated industries—need tighter oversight. Colocation gives firms direct control over hardware, data residency, and networking. Latency-sensitive applications perform better when they’re not six hops away in someone else’s cloud (https://www.hcltech.com/blogs/the-rise-of-cloud-repatriation-is-the-cloud-losing-its-shine and https://thinkon.com/resources/the-cloud-repatriation-shift).

3. Hybrid Is the Smarter Default

The trend isn’t cloud vs. colo. It’s cloud + colo + private infrastructure—choosing the right tool for the workload. That’s been the path of Dropbox, 37signals, Ahrefs, Backblaze, and others (https://www.unbyte.de/en/2025/05/15/cloud-repatriation-2025-why-more-and-more-companies-are-going-back-to-their-own-data-center).

Case Studies That Talk Dollars

Let’s Be Brutally Honest: Public Cloud Isn’t a Unicorn Factory Anymore

Remember those “cloud-first unicorn” fantasies? They’re wearing off fast. Here’s the cold truth:

  • Cloud costs remain opaque and can bite hard.
  • Security controls and compliance on public clouds are increasingly murky and expensive.
  • Vendor lock-in and lack of control can stifle agility, not enhance it.
  • Real innovation—especially at scale—often comes from owning your infrastructure, not renting someone else’s.

What’s Your Infrastructure Strategy, Really?

Here’s a practical playbook:

  1. Question the hype. Challenge claims about mythical cloud savings.
  2. Audit actual workloads. Which ones are predictable? Latency-sensitive? Handling regulated data? (A simple triage sketch follows this list.)
  3. Favor colo for the dependable, crucial, predictable. Use public cloud for seasonal, experimental, or bursty workloads.
  4. Lock down governance. Owning hardware helps you own data control.
  5. Watch your margins. Infra doesn’t have to be sexy—it just needs to pay off.
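Here is a minimal sketch of the triage logic behind steps 2 and 3; the criteria and the heuristic are illustrative assumptions, not an industry standard:

```python
from dataclasses import dataclass

@dataclass
class Workload:
    name: str
    predictable_load: bool       # steady, forecastable demand
    latency_sensitive: bool      # needs short, stable round trips
    regulated_data: bool         # residency or compliance constraints
    bursty_or_experimental: bool # seasonal spikes, prototypes, unknowns

def suggest_placement(w: Workload) -> str:
    """Very rough placement heuristic, illustrative only."""
    if w.bursty_or_experimental and not w.regulated_data:
        return "public cloud"
    if w.predictable_load or w.latency_sensitive or w.regulated_data:
        return "colo / private infrastructure"
    return "either (decide on cost)"

for w in [
    Workload("core billing DB", True, True, True, False),
    Workload("ML experiments", False, False, False, True),
]:
    print(f"{w.name}: {suggest_placement(w)}")
```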

The Final Thought

Cloud repatriation is real—and overdue. And that’s not a sign of retreat; it’s a sign of maturity. Forward-thinking companies are ditching dreamy catchphrases like “cloud unicorns” and opting for rational hybrids—colocation, private infrastructure, and only selective cloud. It may not be glamorous, but it’s strategic, sovereign, and smart.

Subscribe & Share now if you are building, operating, and investing in the digital infrastructure of tomorrow.

#CloudRepatriation #HybridCloud #DataCenters #Colocation #PrivateCloud #CloudStrategy #CloudCosts #Infrastructure #ITStrategy #DigitalSovereignty #CloudEconomics #ServerRentals #EdgeComputing #TechLeadership #CloudMigration #OnPrem #MultiCloud #ITInfrastructure #CloudSecurity #CloudReality

https://www.linkedin.com/pulse/why-colocation-private-infrastructure-making-cloud-hype-gailitis-bcguf

AI Inside AI: How Data Centers Can Use AI to Run AI Workloads Better


Hosting AI workloads is a high-stakes challenge: dense GPU clusters, hard-to-predict demand, and extreme cooling requirements. Yet the same technology driving those workloads can also help the data center itself run more smoothly, more safely, and more sustainably.

How to use AI to manage the AI data center in 10 steps:

1. Thermal and airflow forecasting. AI models can forecast temperature and airflow changes in near real time and spot emerging hot zones, then direct cooling where it is needed, for example via a contained liquid-cooling unit with its own refrigeration circuit that delivers cooling capacity directly to the components that need it.

2. Predictive maintenance. Use vibration, power draw, and other sensor data from chillers, UPSes, and PDUs to flag equipment that is likely to fail long before it does (a minimal anomaly-detection sketch follows this list).

3. Energy-aware scheduling of AI training jobs. Run workloads when the grid is cleaner and route them to regions with more renewable generation.

4. Workload placement optimization. Spread GPU-heavy jobs across clusters to even out the load, so no one region is overloaded while others sit idle.

5. Adaptive efficiency monitoring. Continuously track PUE, WUE, and carbon intensity and feed real-time recommendations to operations, without chasing marginal efficiency gains that would put reliability at risk.

6. Security anomaly detection. Scan access logs, camera feeds, and network traffic for signs of an attempted break-in, physical or digital.

7. GPU/TPU hardware-health forecasting. Identify symptoms of degradation, such as rising error rates, overheating, or throttling, so hardware can be replaced before training jobs fail outright.

8. Incident simulation and response planning. Run digital "fire drills" to see how the facility would respond if cooling failed, power was lost, or a cyberattack hit.

9. Automated compliance reporting (ISO, SOC, and similar). Pull reliable, consistent, audit-ready reports from the facility's operational logs on demand, which also helps onboard customers faster.

10. Intelligent resource scaling. Power GPU nodes up and down automatically based on actual demand, keeping energy costs down without starving workloads.
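As a toy illustration of step 2, the sketch below flags anomalous power-draw readings from a chiller using a rolling mean and standard deviation. Production systems would use richer models and many more sensors; the readings here are made up:

```python
from collections import deque
from statistics import mean, stdev

def detect_anomalies(readings, window: int = 20, threshold: float = 3.0):
    """Yield (index, value) for readings that deviate strongly from recent history."""
    history = deque(maxlen=window)
    for i, value in enumerate(readings):
        if len(history) == window:
            mu, sigma = mean(history), stdev(history)
            if sigma > 0 and abs(value - mu) > threshold * sigma:
                yield i, value
        history.append(value)

# Made-up chiller power-draw samples (kW): mostly steady, one suspicious spike.
samples = [41.8, 42.1, 41.9, 42.0, 42.2] * 5 + [55.3] + [42.0] * 5
for idx, kw in detect_anomalies(samples):
    print(f"reading #{idx}: {kw} kW looks anomalous; schedule an inspection")
```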

In the end, if your business hosts AI, it should be run with AI too. Given the scale and complexity of modern AI workloads, using machine intelligence to manage power, cooling, and operations is no longer a matter of choice but of necessity, just as it has become routine almost everywhere else.

Subscribe & Share now if you are building, operating, and investing in the digital infrastructure of tomorrow.

#DataCenter #CloudComputing #HostingSolutions #GreenTech #SustainableHosting #AI #ArtificialIntelligence #EcoFriendly #RenewableEnergy #DataStorage #TechForGood #SmartInfrastructure #DigitalTransformation #CloudHosting #GreenDataCenter #EnergyEfficiency #FutureOfTech #Innovation #TechSustainability #AIForGood

https://www.linkedin.com/pulse/ai-inside-how-data-centers-can-use-run-workloads-better-gailitis-1hjzf

Beyond Uptime

Can Yesterday’s Data Centers Handle Tomorrow’s AI?

Industry-wide, thousands of megawatts sit in data centers that were designed before the AI boom. Some are already built, some are still under construction, and nearly all were tailored to workloads that look nothing like today's GPU-rich clusters.

With high-density AI workloads, hybrid cooling requirements, and ever-shorter deployment cycles now defining competitiveness in an AI-driven world, the question of whether these facilities can adapt has become extremely relevant.

1. The AI Workload Shift

Artificial Intelligence is changing the rules of infrastructure.

  • Training clusters – a single AI training rack can draw 30–80 kW, 5–10x more than a traditional enterprise rack.
  • Inference workloads – less centralized, but they still push cooling and networking beyond what legacy architectures were built for.
  • Dynamic loads – GPU clusters can swing from idle to full draw in seconds, stressing both power and cooling systems.

For many facilities, this isn't a "nice to have" upgrade; it's an existential need to adapt and compete for the next generation of customers.

2. Limits of Traditional Design

Most pre-AI data centers (loosely, those built before 2018) were designed for air-cooled racks in the 3–10 kW range.

  • Cooling – CRAC/CRAH units and hot-aisle containment were never designed for 40+ kW racks.
  • Power – UPSes, PDUs, and switchgear sized for lower densities need selective or full replacement.
  • Structural limits – raised floors and rack layouts sized for 5 kW cabinets were not meant for the weight of dense AI racks, raising the risk of floor overload or top-heavy racks tipping.

Some facilities will be able to adapt; others will hit hard physical limits that cap their AI-readiness.

3. Adaptation Strategies

The operators who thrive won't necessarily be the ones with the newest buildings, but the ones who retrofit well.

  • Hybrid cooling – combine air cooling for standard workloads with direct-to-chip liquid cooling or rear-door heat exchangers for AI racks.
  • Modular AI pods – convert dedicated halls or pods for high-density, high-heat AI while the rest of the facility keeps serving standard workloads.
  • Targeted power upgrades – reinforce a limited number of electrical runs to support AI loads without turning the whole facility upside down.
  • Network design – high-throughput, low-latency interconnects between GPU nodes so the cluster runs at full efficiency.

Hybridization avoids the all-or-nothing trap, letting facilities capture AI demand without sacrificing their existing customer base.

4. The Retrofit ROI Question

Not every data center can, or should, become AI-ready.

Retrofitting high-density zones is capex-heavy:

  • Power upgrades alone can run into the millions.
  • Installing liquid cooling systems requires mechanical, plumbing, and floorplan changes.
  • Network upgrades add further cost.

The decision hinges on workload demand, the competitive landscape, and the remaining lifespan of the existing facility.

In some situations it may be more cost-effective to build a greenfield site close to the existing facility than to sink capital into deep retrofits.
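A back-of-the-envelope payback calculation can frame that retrofit-versus-greenfield decision. All figures below are hypothetical placeholders; what matters is the structure of the comparison:

```python
# Hypothetical retrofit payback sketch: placeholder figures, not market data.

def payback_months(capex: float, added_monthly_revenue: float,
                   added_monthly_opex: float) -> float:
    """Months until incremental margin covers the retrofit investment."""
    margin = added_monthly_revenue - added_monthly_opex
    if margin <= 0:
        return float("inf")
    return capex / margin

retrofit_capex = 8_000_000      # power, liquid cooling, network upgrades
new_ai_racks = 40
revenue_per_rack = 9_000        # monthly, fully leased
opex_per_rack = 3_500           # power, cooling, staffing share

months = payback_months(
    retrofit_capex,
    new_ai_racks * revenue_per_rack,
    new_ai_racks * opex_per_rack,
)
print(f"payback in roughly {months:.0f} months")  # ~36 with these placeholders
```

If the payback horizon stretches beyond the realistic remaining life of the building, the greenfield option starts to look better.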

5. The Strategic Outlook

This is the dawn of AI infrastructure expansion. Three likely scenarios are emerging:

  • Dual-use facilities – traditional racks blended with AI-ready pods.
  • Purpose-built AI facilities – designed from scratch for extreme density and liquid cooling.
  • AI/ML clusters – rather than following metro density, these concentrate compute in power-rich, low-latency markets.

The AI era won't wait for the next 20-year build cycle. Operators who adapt now, with clear retrofit strategies in place, will secure first-mover advantage with the next wave of customers.

Closing Thoughts

Running AI is not just "another workload." It is a fundamentally different thermal, power, and interconnect problem. Yesterday's facilities can meet tomorrow's AI needs, but only if operators take a targeted, rational, and accelerated approach to redesign.

Subscribe & Share now if you are building, operating, and investing in the digital infrastructure of tomorrow.

#AI #DataCenters #AIInfrastructure #HighDensityComputing #HybridCooling #LiquidCooling #GPUClusters #CloudComputing #DataCenterRetrofit #EdgeComputing #DigitalInfrastructure #Colocation #AIThermalManagement #PowerUpgrades #NextGenDataCenters

https://www.linkedin.com/pulse/beyond-uptime-andris-gailitis-hiovf
