Multi-cloud Is Inevitable.

Composability Is a Lie (Unless You Build It)

Goldman Sachs calls it Polycloud. And they didn’t choose that term lightly.

Polycloud is more than a buzzword—it recognizes that no single cloud provider can meet an enterprise’s needs across performance, regulatory, availability, and pricing dimensions.

In interviews, Goldman’s CIO Marco Argenti and managing directors from the engineering team have laid out a vision where cloud providers are treated like interchangeable parts in a composable infrastructure stack.

The idea is simple in theory: use the best tool for the job, regardless of which cloud it's on. The execution? Far from simple.

Polycloud isn’t just about choosing between AWS, Azure, and Google Cloud—it’s about being able to run critical workloads across all of them, depending on:

  • Where your data is located (to avoid egress costs and meet regulatory constraints)
  • Where compute horsepower is available (especially high-demand resources like A100s or H100s)
  • Which APIs or higher-order services your use case depends on (Bedrock, Vertex AI, Azure AI Studio)
  • What enterprise policies must be enforced (identity, network boundaries, audit logging)
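
To make those factors concrete, below is a minimal sketch of what that routing decision can look like in code. Every name in it (the capacity table, the region strings, the Workload fields) is illustrative, not a real inventory or vendor API.

    from dataclasses import dataclass

    @dataclass
    class Workload:
        data_region: str              # where the training data lives, e.g. "aws:us-east-1"
        gpu_type: str                 # required accelerator, e.g. "A100"
        allowed_regions: set[str]     # regions approved by compliance
        required_services: set[str]   # higher-order services the job depends on

    # Hypothetical snapshot of what each provider offers right now.
    CAPACITY = {
        "aws:us-east-1":    {"gpus": {"A100"},         "services": {"Bedrock"}},
        "azure:westeurope": {"gpus": {"A100", "H100"}, "services": {"Azure AI Studio"}},
        "gcp:us-central1":  {"gpus": {"H100"},         "services": {"Vertex AI"}},
    }

    def pick_region(w: Workload) -> str | None:
        """Pick a region that satisfies policy, GPU, and service needs,
        preferring the region that already holds the data (no egress)."""
        candidates = [r for r in CAPACITY
                      if r in w.allowed_regions
                      and w.gpu_type in CAPACITY[r]["gpus"]
                      and w.required_services <= CAPACITY[r]["services"]]
        if not candidates:
            return None
        return w.data_region if w.data_region in candidates else candidates[0]

Even this toy version hints at why the execution is hard: the capacity table changes hourly, the policy set changes quarterly, and someone has to keep all of it current across three providers.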

Goldman Sachs built a platform to abstract this complexity for developers. But most enterprises aren’t Goldman. And most analysts and ML researchers certainly aren’t platform engineers.

The Analyst’s Dilemma

From an analyst’s perspective, a machine-learning workflow should look like this:

  1. Load data
  2. Train model
  3. Evaluate and deploy
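
In code, that ideal is only a few lines. Here's a minimal sketch using the Hugging Face Trainer; the public model and dataset names are stand-ins for the analyst's own assets.

    from datasets import load_dataset
    from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                              Trainer, TrainingArguments)

    # 1. Load data
    dataset = load_dataset("imdb")
    tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
    dataset = dataset.map(
        lambda batch: tokenizer(batch["text"], truncation=True, padding="max_length"),
        batched=True,
    )

    # 2. Train model
    model = AutoModelForSequenceClassification.from_pretrained(
        "distilbert-base-uncased", num_labels=2
    )
    trainer = Trainer(
        model=model,
        args=TrainingArguments(output_dir="out", num_train_epochs=1),
        train_dataset=dataset["train"].shuffle(seed=42).select(range(2000)),
        eval_dataset=dataset["test"].select(range(500)),
    )
    trainer.train()

    # 3. Evaluate and deploy
    print(trainer.evaluate())
    model.save_pretrained("out/model")

Nothing in it says which cloud, which region, or which GPU. That's the point.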

But here’s what it actually looks like in a multi-cloud environment:

  1. Submit a ticket to get access to a GPU
  2. Wait for DevOps to sort out permissions across cloud providers
  3. Manually move datasets or mount buckets between regions
  4. Debug authentication failures or scheduler mismatches
  5. Re-run everything after something breaks in step 4
  6. Question your career choices

Multi-cloud promises flexibility. It delivers friction, especially for people who just want to run, fine-tune, or leverage a model.

Diving Into An Example

Let’s take a typical machine learning use case. An analyst at a financial institution wants to train a foundation model for forecasting risk across multiple portfolios. She’s got datasets in S3 on AWS, compliance policies that require execution on Azure in certain regions, and her team’s preferred fine-tuning pipeline relies on Hugging Face's tools—except the model she needs is only pre-optimized in Google’s Model Garden.

To train this model at scale, she’ll need:

  • Access to GPU capacity—which may only be available in one cloud at any given moment.
  • Dataset proximity—to avoid performance issues and data egress costs.
  • Consistent APIs and developer tooling—even though each cloud wraps its ML services with a different UX and naming convention.
  • Security and policy enforcement—because legal, risk, and compliance teams never sleep.
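
Even the first bullet, finding GPU capacity, is its own project. As a sketch of one slice of it, here's how you might check which approved AWS regions even offer an A100-backed instance type with boto3. The region list is an assumption, an offering is not the same thing as available capacity or quota, and each of the other providers needs its own equivalent.

    import boto3

    APPROVED_REGIONS = ["us-east-1", "eu-west-1", "eu-central-1"]  # assumption for illustration
    A100_INSTANCE = "p4d.24xlarge"  # AWS's A100-backed instance type

    def regions_offering(instance_type: str) -> list[str]:
        """Return approved AWS regions that offer the given instance type.
        Offering != capacity or quota, but it narrows the search."""
        hits = []
        for region in APPROVED_REGIONS:
            ec2 = boto3.client("ec2", region_name=region)
            resp = ec2.describe_instance_type_offerings(
                LocationType="region",
                Filters=[{"Name": "instance-type", "Values": [instance_type]}],
            )
            if resp["InstanceTypeOfferings"]:
                hits.append(region)
        return hits

    print(regions_offering(A100_INSTANCE))

Multiply that by three providers, quota requests, and spot-versus-reserved pricing, and the "simple" workflow above starts to disappear under plumbing.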

The Cloud Is Not Getting Simpler

Cloud APIs are diverging, not converging.

Each cloud provider—AWS, Azure, GCP—offers its own managed ML stack. Each stack comes with different primitives, different naming conventions, and different security controls. That’s by design. Cloud vendors don’t want to be interchangeable. They want you locked in.

GPU capacity? It’s bursty, region-bound, and subject to quota drama. Compliance? It’s not just legal anymore—it’s embedded into infrastructure policy. And higher-order services? They don’t talk to each other.

So, where does that leave you if you’re trying to train or fine-tune models across providers? Either you build a lot of glue code and internal tooling… or give up and pick the least lousy cloud for every project forever. Or worse yet, try to aggregate unoptimized Kubernetes clusters.

Because your data? Scattered. Your GPUs? Wherever they're available. Your ML stack? Split across three cloud APIs and ten acronyms.

This is the part where most analysts give up and call the platform team. Or worse: they don’t start at all.

Let’s take a real scenario:

  • You’ve got datasets in S3.
  • Your compliance team says the training job must run in a specific Azure region.
  • Your models are in Hugging Face, but the variant you need is optimized for Google’s TPU stack.
  • You want A100s, but AWS is out. Azure has them—but only in West Europe.

Oh—and your analysts want to launch the job from Jupyter. Not Terraform.
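
Just getting the data where compliance says the job must run already means hand-rolled glue like the sketch below. The bucket and container names are hypothetical; pagination, retries, and large-object handling are all ignored; and every byte of it is billed as egress.

    import os

    import boto3
    from azure.storage.blob import BlobServiceClient

    # Copy a curated S3 prefix into an Azure Blob container (minimal sketch).
    s3 = boto3.client("s3")
    blobs = BlobServiceClient.from_connection_string(
        os.environ["AZURE_STORAGE_CONNECTION_STRING"]
    ).get_container_client("training-data")

    for obj in s3.list_objects_v2(Bucket="risk-portfolios", Prefix="curated/")["Contents"]:
        body = s3.get_object(Bucket="risk-portfolios", Key=obj["Key"])["Body"]
        blobs.upload_blob(name=obj["Key"], data=body, overwrite=True)  # billed as AWS egress

And that's before anyone has touched a GPU, a scheduler, or a line of training code.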

The Solution: A New Layer for Multi-Cloud ML

That’s why we built Project Robbie.

Robbie is a high-performance computing service that makes multi-cloud invisible for ML workloads. Built on top of infrastructure software from Positron Networks, Robbie lets analysts and researchers launch training, fine-tuning, or inference jobs across clouds without having to know—or care—where the job runs.

It works with the tools analysts already use (JupyterLab, VS Code, CLI), and under the hood it handles:

  • GPU scheduling across cloud providers, including AWS, Azure, GCP, and private clusters
  • Data-compute colocation, minimizing egress cost and latency
  • Multi-scheduler execution, including Kubernetes and Slurm
  • Code generation for ML training and fine-tuning, so analysts don’t have to write YAML or Dockerfiles
  • Optimized, AI-powered GPU selection
  • Enterprise policy enforcement, including identity, network, and audit controls
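
As one illustration of what multi-scheduler execution means in practice, here's a sketch of how a single job spec could be rendered for either backend. The spec format and field names are invented for this example; they are not Robbie's actual interface.

    def to_kubernetes(job: dict) -> dict:
        """Render a generic job spec as a minimal Kubernetes Job manifest."""
        return {
            "apiVersion": "batch/v1",
            "kind": "Job",
            "metadata": {"name": job["name"]},
            "spec": {"template": {"spec": {
                "restartPolicy": "Never",
                "containers": [{
                    "name": job["name"],
                    "image": job["image"],
                    "command": job["command"],
                    "resources": {"limits": {"nvidia.com/gpu": job["gpus"]}},
                }],
            }}},
        }

    def to_slurm(job: dict) -> str:
        """Render the same spec as a Slurm batch script."""
        return "\n".join([
            "#!/bin/bash",
            f"#SBATCH --job-name={job['name']}",
            f"#SBATCH --gres=gpu:{job['gpus']}",
            " ".join(job["command"]),
        ])

    spec = {"name": "risk-finetune", "image": "registry.example.com/trainer:latest",
            "command": ["python", "finetune.py"], "gpus": 4}

The point isn't the few lines of translation; it's that the analyst never has to know which of the two schedulers ran the job.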

Robbie is the abstraction layer for people, not just infrastructure.

Your analysts get a fast path to ML results. Your platform team gets a policy-aligned way to scale AI across clouds. Your organization stops reinventing glue code every quarter.


The Enterprise Benefits

For IT, architecture, and security teams, Robbie acts as a control plane for AI workloads:

  • Enforces policy-based routing (e.g., US-only workloads must stay in GovCloud)
  • Logs usage and costs across providers
  • Offers a compliant, secure alternative to shadow infra
  • Reduces the workload on DevOps and platform teams
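
Policy-based routing can sound abstract, so here's a minimal sketch of the kind of rule the first bullet describes. The rule format and region naming are assumptions for illustration, not Robbie's configuration language.

    # Hypothetical policy rules an admin registers with the control plane.
    POLICIES = [
        # (name, predicate on the job, allowed region prefixes)
        ("us-only-workloads-stay-in-govcloud",
         lambda job: job.get("data_classification") == "us-only",
         ("aws-us-gov-",)),
    ]

    def placement_allowed(job: dict, target_region: str) -> bool:
        """Return True only if the chosen region violates no applicable policy."""
        for name, applies, allowed_prefixes in POLICIES:
            if applies(job) and not target_region.startswith(allowed_prefixes):
                print(f"blocked by policy: {name}")   # doubles as the audit-log entry
                return False
        return True

    job = {"data_classification": "us-only"}
    assert not placement_allowed(job, "eu-west-1")
    assert placement_allowed(job, "aws-us-gov-west-1")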

In short, Robbie lets your people move faster—without breaking the rules.

Final Thought

Goldman Sachs is right: the future of enterprise cloud is composable. But composability doesn’t come from cloud providers. It comes from the layers you build.

Project Robbie exists to make multi-cloud ML possible without complexity.

Let’s stop dwelling on what the clouds can’t do together, and start building what we need them to do. Let’s go.