Enter Project Robbie—a research platform that doesn’t ask machine learning scientists to work around bottlenecks but removes them entirely. Need a GPU? You’ve got it. Need a cluster? Done. Need to run massive hyperparameter sweeps without turning your laptop into a space heater? Consider it handled. When you’re pushing the boundaries of AI, you shouldn’t be fighting for resources—you should be focused on building better models faster. And now, you can.
Let’s walk through a real-world example of how a machine learning scientist can use Project Robbie to train a deep learning model for a computer vision task.
This isn’t theoretical. This is happening. A physician at a major Midwestern institution laid it out for us: armed with a terabyte of oncology radiology images, they believe they can train a model to detect cancer automatically.
Not in the abstract. Not as some distant possibility. Right now. But here’s the problem: this doctor isn’t a machine learning engineer. They’re not well-versed in Python, PyTorch, Slurm, Kubernetes, Docker containers, or TensorFlow. What they are is someone with an idea, an idea that could save lives.
And that’s the point. AI shouldn’t be the exclusive domain of a Ph.D. in computer science. It should be accessible to those who need it. With Project Robbie, that doctor isn’t just analyzing data—they’re innovating. They’re taking the first step from theory to reality, from data analyst to data pioneer. When the tools are within reach, the breakthroughs aren’t far behind.
Folks, we're just getting started. This is Positron's first tiny step in transforming data analysts into data innovators with AI.
Imagine you’re working on a computer vision project that involves classifying medical images—say, identifying different types of lung diseases in X-rays. You’ve selected ResNet-50, a deep convolutional neural network (CNN) architecture known for its accuracy and efficiency.
While developing your model, you quickly hit a roadblock. Training ResNet-50 on your local machine is taking forever. Your workstation’s GPU (if you even have one) is already struggling with smaller models, and running the training on a CPU would take days, if not weeks. You could use your institution’s shared HPC cluster, but there’s a long queue, and you don’t have time to wait.
This is where Project Robbie comes in.
Instead of relying on local resources, you can directly leverage Robbie's remote function capabilities within your Jupyter Notebook. Here's how:
Install the Robbie and PyTorch Packages: In the first cell of your notebook, install the necessary packages:
!pip install robbie torch torchvision
Log in to Robbie: In the next cell, import the robbie module and log in:
import robbie
robbie.login()
Copy the Radiology Images to Your Persistent Disk: The model relies on a dataset for training. Robbie gives you a persistent disk for storing data, including training data, model weights, and other content. You transfer data to and from the persistent disk using the "robbie pd" commands; the persistent disk documentation is available at https://docs.robbie.run/robbie-concepts/robbie-persistent-disk.
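To illustrate, a transfer from a notebook cell might look like the sketch below. The cp and ls subcommands, and the remote path, are assumptions for illustration rather than Robbie's confirmed CLI syntax; check the persistent disk documentation linked above for the exact commands:
# Hypothetical "robbie pd" usage; consult the docs for the real subcommands
!robbie pd cp ./oncology-radiology-data persistent-disk:/oncology-radiology-data
# Hypothetical: list the disk's contents to verify the upload landed
!robbie pd ls persistent-disk:/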
Define the Remote Training Function: Now, define your training function and decorate it with Robbie's @remote decorator to specify that it should run on Robbie's infrastructure:
from robbie import remote
import torch
import torchvision
import torchvision.transforms as transforms
from torchvision import models

@remote(chooser_ui=True)
def train_resnet50():
    # Define data transformations: ResNet-50 expects 224x224 RGB input
    transform = transforms.Compose([
        transforms.Resize((224, 224)),
        transforms.ToTensor(),
    ])

    # Load dataset
    # Before you run this example, move the data onto your persistent disk.
    # To learn about our persistent disk capability, see:
    # https://docs.robbie.run/robbie-concepts/robbie-persistent-disk
    train_dataset = torchvision.datasets.ImageFolder(
        root='../persistent-disk/oncology-radiology-data/', transform=transform)
    train_loader = torch.utils.data.DataLoader(train_dataset, batch_size=32, shuffle=True)

    # Load an ImageNet-pretrained ResNet-50 and replace its final layer
    # so the output size matches the number of classes in the dataset
    num_classes = len(train_dataset.classes)
    model = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
    model.fc = torch.nn.Linear(model.fc.in_features, num_classes)

    # Move model to Robbie's GPU
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    model.to(device)

    # Define loss function and optimizer
    criterion = torch.nn.CrossEntropyLoss()
    optimizer = torch.optim.Adam(model.parameters(), lr=0.001)

    # Training loop: report the last batch's loss once per epoch
    for epoch in range(10):
        for images, labels in train_loader:
            images, labels = images.to(device), labels.to(device)
            optimizer.zero_grad()
            outputs = model(images)
            loss = criterion(outputs, labels)
            loss.backward()
            optimizer.step()
        print(f"Epoch {epoch+1}, Loss: {loss.item()}")

    return model.state_dict()
After defining the function, you can execute it remotely on Robbie's infrastructure:
trained_model_state = train_resnet50()
When you run this cell, Robbie provisions remote hardware (the chooser_ui=True option lets you select it), executes train_resnet50 on that infrastructure, and returns the trained model's state dict to your notebook.
For more detailed instructions on using remote functions in Jupyter Notebooks, please refer to Robbie's documentation.
Once the training is complete, you can save the trained model locally for evaluation or deployment:
import torch
torch.save(trained_model_state, "resnet50_trained.pth")
This allows you to integrate the trained model into applications, deploy it as a web service, or share it with colleagues for further analysis.
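As a quick local sanity check before deploying, here is a minimal sketch of reloading the saved weights for single-image inference. It uses only standard PyTorch and torchvision APIs; the file name single_xray.png is a placeholder, and the preprocessing mirrors the training transforms:
import torch
from torchvision import models, transforms
from PIL import Image

# Rebuild the architecture, then restore the trained weights on CPU
state_dict = torch.load("resnet50_trained.pth", map_location="cpu")
num_classes = state_dict["fc.weight"].shape[0]  # recover the class count from the saved head
model = models.resnet50()
model.fc = torch.nn.Linear(model.fc.in_features, num_classes)
model.load_state_dict(state_dict)
model.eval()

# Mirror the preprocessing used during training
transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
])

# "single_xray.png" is a placeholder; substitute a real image path
image = transform(Image.open("single_xray.png").convert("RGB")).unsqueeze(0)
with torch.no_grad():
    predicted_index = model(image).argmax(dim=1).item()
print(f"Predicted class index: {predicted_index}")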
Machine learning scientists face enough challenges: choosing the right model architecture, cleaning data, and debugging training issues. Finding computing power should not be one of them. That's why users turn to Robbie.
AI innovation is moving fast, and machine learning scientists need tools that keep pace. Whether you’re an academic researcher, an independent data scientist, or an industry ML engineer, Project Robbie ensures that computing power is never a bottleneck to discovery.
So, the next time you find yourself stuck waiting for compute resources, ask yourself: What could I accomplish today if I had instant access to cutting-edge GPUs? With Robbie, the answer is simple: more innovation, breakthroughs, and progress.
Get started with Project Robbie today, and train your models the way they were meant to be trained: fast, at scale, and without limits. Sign up here. Read our docs here.