Project Robbie: Rapid, Scalable Model Training for Analysts Innovating With AI/ML

Machine learning is the future. It’s the engine driving breakthroughs in medicine, finance, and technology, but let’s be honest—right now, it’s also an exercise in patience.

Long wait times, overworked GPUs, and the constant battle to secure affordable, high-performance computing mean that some of the most innovative minds in the world spend more time waiting than innovating. That is not acceptable.

If You Think We Love Dogs - We Do

Enter Project Robbie—a research platform that doesn’t ask machine learning scientists to work around bottlenecks but removes them entirely. Need a GPU? You’ve got it. Need a cluster? Done. Need to run massive hyperparameter sweeps without turning your laptop into a space heater? Consider it handled. When you’re pushing the boundaries of AI, you shouldn’t be fighting for resources—you should be focused on building better models faster. And now, you can.

A Real-World Use Case: Training a Computer Vision Model on Robbie

Let’s walk through a real-world example of how a machine learning scientist can use Project Robbie to train a deep learning model for a computer vision task.

This isn’t theoretical. This is happening. A physician at a major Midwestern institution laid it out for us: with a terabyte of oncology radiology images, they believe they can train a model to detect cancer automatically.

Not in the abstract. Not as some distant possibility. Right now. But here’s the problem: this doctor isn’t a machine learning engineer. They aren’t well-versed in Python, PyTorch, TensorFlow, Slurm, Kubernetes, Docker, or containers. What they are is someone with an idea: an idea that could save lives.

And that’s the point. AI shouldn’t be the exclusive domain of a Ph.D. in computer science. It should be accessible to those who need it. With Project Robbie, that doctor isn’t just analyzing data—they’re innovating. They’re taking the first step from theory to reality, from data analyst to data pioneer. When the tools are within reach, the breakthroughs aren’t far behind.

Folks - we're just getting started. This is Positron's first tiny step in transforming data analysts into data innovators with AI.

The Problem: Training a ResNet Model for Image Classification

Imagine you’re working on a computer vision project that involves classifying medical images—say, identifying different types of lung diseases in X-rays. You’ve selected ResNet-50, a deep convolutional neural network (CNN) architecture known for its accuracy and efficiency.

While developing your model, you quickly hit a roadblock. Training ResNet-50 on your local machine is taking forever. Your workstation’s GPU (if you even have one) is already struggling with smaller models, and running the training on a CPU would take days, if not weeks. You could use your institution’s shared HPC cluster, but there’s a long queue, and you don’t have time to wait.

This is where Project Robbie comes in.

Step 1: Setting Up Robbie in Your Jupyter Notebook

Instead of relying on local resources, you can directly leverage Robbie's remote function capabilities within your Jupyter Notebook. Here's how:

Install the Robbie, PyTorch, and torchvision Packages: In the first cell of your notebook, install the necessary packages (torchvision provides the dataset and model utilities used below):

!pip install robbie torch torchvision

Log in to Robbie: In the next cell, import the robbie module and log in:

import robbie
robbie.login()

Copy the Radiology Images to Your Persistent Disk: The model relies on a dataset for training. Robbie gives you a persistent disk for storing data, including training data, model weights, and other content. You can transfer data to and from the persistent disk using the "robbie pd" commands; see the persistent disk documentation at https://docs.robbie.run/robbie-concepts/robbie-persistent-disk for details.
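As a rough illustration of what that upload might look like from a notebook cell (the cp subcommand and path layout below are assumptions for illustration, not documented syntax; the persistent disk docs have the real commands):

# Hypothetical sketch only; check the persistent disk docs for the real subcommands and paths.
!robbie pd --help   # list the available persistent-disk commands (assumes a standard --help flag)
!robbie pd cp ./oncology-radiology-data oncology-radiology-data   # illustrative upload; actual syntax may differ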

Define the Remote Training Function: Now, define your training function and decorate it with Robbie's @remote decorator to specify that it should run on Robbie's infrastructure:

from robbie import remote
import torch
import torchvision
import torchvision.transforms as transforms
from torchvision import models

@remote(chooser_ui=True)
def train_resnet50():
    # Define data transformations
    transform = transforms.Compose([
        transforms.Resize((224, 224)),
        transforms.ToTensor(),
    ])

    # Load dataset
    # Before you run this example, move the data onto your persistent disk.
    # To learn about our persistent disk capability, see:
    # https://docs.robbie.run/robbie-concepts/robbie-persistent-disk
    train_dataset = torchvision.datasets.ImageFolder(
        root='../persistent-disk/oncology-radiology-data/', transform=transform)
    train_loader = torch.utils.data.DataLoader(train_dataset, batch_size=32, shuffle=True)
    num_classes = len(train_dataset.classes)  # One class per image subfolder

    # Load a pretrained ResNet-50 and replace its final layer for our classes
    model = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
    model.fc = torch.nn.Linear(model.fc.in_features, num_classes)

    # Move model to Robbie's GPU
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    model.to(device)

    # Define loss function and optimizer
    criterion = torch.nn.CrossEntropyLoss()
    optimizer = torch.optim.Adam(model.parameters(), lr=0.001)

    # Training loop
    for epoch in range(10):
        for images, labels in train_loader:
            images, labels = images.to(device), labels.to(device)
            optimizer.zero_grad()
            outputs = model(images)
            loss = criterion(outputs, labels)
            loss.backward()
            optimizer.step()
        print(f"Epoch {epoch+1}, Loss: {loss.item()}")

    return model.state_dict()

Step 2: Executing the Remote Training Function

After defining the function, you can execute it remotely on Robbie's infrastructure:

trained_model_state = train_resnet50()

When you run this cell, Robbie will:

  • Display a GUI within the notebook to select the desired hardware (e.g., an NVIDIA V100 GPU-based instance).
  • Serialize your function and its dependencies.
  • Allocate the selected GPU resources in the Robbie cloud.
  • Execute the function remotely.
  • Stream the standard output from the remote machine to your local notebook.
  • Return the result (in this case, the trained model's state dictionary) to your local environment.

For more detailed instructions on using remote functions in Jupyter Notebooks, please refer to Robbie's documentation.

Step 3: Saving and Deploying the Trained Model

Once the training is complete, you can save the trained model locally for evaluation or deployment:

import torch
torch.save(trained_model_state, "resnet50_trained.pth")

This allows you to integrate the trained model into applications, deploy it as a web service, or share it with colleagues for further analysis.
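To evaluate or share the model outside Robbie, a colleague can rebuild the same architecture and load the saved weights. Here is a minimal sketch, assuming the class count matches what the model was trained with and using an illustrative image path:

import torch
from PIL import Image
from torchvision import models, transforms

num_classes = 3  # assumption for illustration; must match the training dataset
model = models.resnet50(weights=None)
model.fc = torch.nn.Linear(model.fc.in_features, num_classes)
model.load_state_dict(torch.load("resnet50_trained.pth", map_location="cpu"))
model.eval()

# Run a single X-ray through the network (the file name is illustrative)
preprocess = transforms.Compose([transforms.Resize((224, 224)), transforms.ToTensor()])
image = preprocess(Image.open("sample_xray.png").convert("RGB")).unsqueeze(0)
with torch.no_grad():
    predicted_class = model(image).argmax(dim=1).item()
print(f"Predicted class index: {predicted_class}")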

Why Machine Learning Scientists Love Robbie

Machine learning scientists face enough challenges—choosing the right model architecture, cleaning data, and debugging training issues. Finding computing power should not be one of them. Here’s why users turn to Robbie:

  • Instant Access to High-Performance GPUs – No wait times, no provisioning headaches, just powerful computing when you need it.
  • Seamless Integration with Existing Tools – Use Jupyter Notebooks, PyTorch, TensorFlow, and other ML frameworks without extra setup.
  • Scalability for Experimentation – Train multiple models in parallel, fine-tune hyperparameters, and run extensive tests quickly (a sketch follows this list).
  • Cost-Effective Pay-As-You-Go Pricing – You pay only for the compute time you use, which is far more efficient than traditional cloud solutions.
  • No IT Overhead – There is no infrastructure or cloud configuration to manage; log in and start training.
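The parallel-experimentation point builds directly on the remote function from earlier. A minimal sketch, assuming train_resnet50 has been refactored to accept a learning-rate argument (the version above hardcodes lr=0.001), and noting that how concurrently Robbie schedules each call depends on the platform:

# Hypothetical sweep: the lr parameter and the scheduling behavior are assumptions,
# not documented Robbie features; see the docs for how to launch jobs in parallel.
results = {}
for lr in (1e-4, 3e-4, 1e-3):
    results[lr] = train_resnet50(lr=lr)  # each call executes on Robbie's infrastructure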

Final Thoughts: Powering the Future of AI Research

AI innovation is moving fast, and machine learning scientists need tools that keep pace. Whether you’re an academic researcher, an independent data scientist, or an industry ML engineer, Project Robbie ensures that computing power is never a bottleneck to discovery.

So, the next time you find yourself stuck waiting for compute resources, ask yourself: What could I accomplish today if I had instant access to cutting-edge GPUs? With Robbie, the answer is simple: more innovation, breakthroughs, and progress.

Get started with Project Robbie today, and train your models the way they were meant to be trained: fast, at scale, and without limits. Sign up here. Read our docs here.