How to Build a Virtual Agent Team for Faster Shipping: A Step-by-Step Guide from Docker's Coding Agent Sandboxes Team

Introduction

Imagine having a team of seven AI agents that test your product, triage issues, post release notes, and even fix bugs—all running autonomously in your CI pipeline. That’s exactly what the Coding Agent Sandboxes team at Docker built with their Fleet. This guide walks you through the same approach so you can create your own virtual agent team. You’ll learn how to move from traditional scripts to autonomous agents that use judgment, not just instructions.

Image source: www.docker.com

What You Need

A microVM isolation tool – like Docker’s sbx (Coding Agent Sandboxes) or any secure sandbox that gives agents full autonomy (their own Docker daemon, network, and filesystem) without exposing the host.
AI coding agent – such as Claude Code, Gemini, Codex, or Docker Agent. The guide assumes Claude Code with skill files.
CI system – GitHub Actions, GitLab CI, or any runner that can invoke your AI agent with a skill file.
Skill files (markdown) – one per agent role. These define the persona, responsibilities, and allowed tools.
Local development environment – your laptop, with the same tooling you use in CI.
Version control – to store skill files and workflow definitions.

Step-by-Step Guide

Step 1: Define Agent Roles and Responsibilities

Before writing any code, decide what roles your virtual team needs. The Fleet uses roles like:

Exploratory Tester (/cli-tester) – exercises CLI commands hands-on, the way a human tester would, finds issues, and reports them.
Build Engineer – handles builds and verifies they succeed across platforms.
Triage Specialist – categorizes and prioritizes incoming issues from the backlog.
Release Manager – posts release notes and checks upgrade paths.

Each role should have a clear persona, a set of responsibilities, and boundaries on what tools it may use. For example, the build engineer can run make and compile binaries but should not modify source code unless authorized.
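How you enforce those boundaries depends on your agent. As one possibility, Claude Code accepts --allowedTools and --disallowedTools flags, so a CI step for the build engineer could look like the sketch below. The skills/build-engineer.md file, the ANTHROPIC_API_KEY secret, and the exact permission patterns are assumptions for illustration. Check your agent's documentation for the rule syntax it supports.

```yaml
# Hypothetical GitHub Actions step for the build engineer role
# (skills/build-engineer.md is an assumed file name).
- name: Run build engineer skill
  shell: bash
  env:
    ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}  # assumes this repository secret exists
  run: |
    # Pre-approve reading files and running make. Explicitly deny the Edit and Write
    # tools so the agent cannot change source code unless the role is widened.
    claude -p "$(cat skills/build-engineer.md)" \
      --allowedTools "Read" "Bash(make:*)" \
      --disallowedTools "Edit" "Write"
```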

Step 2: Create Skill Files for Each Role

A skill file is a markdown document that describes the agent’s role, knowledge, and decision-making process. It is not a step-by-step script; it’s a persona description. For instance:

```markdown
# /cli-tester

You are an exploratory tester for the sbx CLI tool.
You know how to build, install, and run sbx on Linux, macOS, and Windows.
Your job is to test all CLI commands: create, start, stop, remove, configure networking, mount workspaces.
When you find an issue, investigate its cause and write a detailed report.
You may escalate to the build engineer if needed.
```

Store these files in a skills/ directory in your repository. The same file will be invoked both locally and in CI—no separate versions needed.

Step 3: Develop Locally First (Local First, CI Second)

This is the core design principle. Always iterate on your skill files on your laptop before wiring them into CI. Why? Because debugging in CI means commit-push-wait-read-logs cycles that take minutes. Locally, you see the agent think in real time, spot confusion, and fix the skill in seconds.

How to do this:

Install the sandbox tool (sbx) and the AI agent (Claude Code) on your machine.
Create a sandbox environment for the agent to work in.
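Invoke the agent locally with the skill file as its prompt, for example: claude "$(cat skills/cli-tester.md)". This starts an interactive Claude Code session with the skill as the first message.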
Watch the agent build the binaries, run commands, and produce reports.
If the agent misunderstands, edit the skill file and re-invoke. Iterate until the behavior matches your expectations.

Your local machine becomes the primary development environment. CI is just another runtime for the exact same skill.
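Because CI only has to hand the same file to the same agent, a single manually triggered workflow can run any skill by name. The sketch below makes several assumptions: GitHub Actions, a repository secret called ANTHROPIC_API_KEY, and Claude Code installed directly on the runner rather than inside a sandbox. The run-skill.yml name and the skill input are hypothetical.

```yaml
# .github/workflows/run-skill.yml (hypothetical file name)
name: run-skill
on:
  workflow_dispatch:
    inputs:
      skill:
        description: "Skill file under skills/, without the .md extension"
        required: true
        default: cli-tester
jobs:
  run:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Install Claude Code
        run: npm install -g @anthropic-ai/claude-code
      - name: Run the selected skill
        env:
          ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}  # assumed repository secret
          SKILL: ${{ inputs.skill }}  # passed via env, not pasted into the script
        # Same file, same agent as on your laptop; only print mode (-p) differs.
        # The pre-approved tool list is broad here for simplicity; narrow it per role.
        run: claude -p "$(cat "skills/${SKILL}.md")" --allowedTools "Bash" "Read" "Write"
```

The input goes through an environment variable instead of writing ${{ inputs.skill }} directly into the run script. That way, a crafted input value cannot be executed as shell code.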

Step 4: Wire Skills into CI Workflows

Once your skill works locally, add it to your CI system. For example, in a GitHub Actions workflow:

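```yaml
jobs:
  test-cli:
    runs-on: ${{ matrix.os }}
    strategy:
      fail-fast: false  # let every OS finish and report even if one fails
      matrix:
        os: [ubuntu-latest, macos-latest, windows-latest]
    steps:
      - uses: actions/checkout@v4
      - name: Setup environment
        run: ...
      - name: Run CLI tester skill
        shell: bash  # bash on every runner (Git Bash on Windows), so $(cat ...) behaves the same
        env:
          ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}  # assumes this repository secret exists
        # Feed the skill file to the agent as its prompt in non-interactive (print) mode.
        # In print mode only pre-approved tools can run, hence --allowedTools.
        run: claude -p "$(cat skills/cli-tester.md)" --allowedTools "Bash" "Read" "Write"
```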

Notice: the same skills/cli-tester.md file is called. No translation, no “CI version.” The workflow only sets up the environment and checks out code. The agent does the rest.

Make separate workflows for different roles. For example, run the /cli-tester nightly on all three OSes, and the /triage agent after each issue update.
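Both schedules map onto standard GitHub Actions triggers: a schedule with a cron expression for the nightly tester, and the issues event for triage. The fragments below focus on triggers and permissions, with agent installation omitted. The file names, the skills/triage.md skill, the secret name, and the gh permission pattern are all assumptions.

```yaml
# .github/workflows/cli-tester.yml (hypothetical): run the exploratory tester nightly
on:
  schedule:
    - cron: "0 3 * * *"  # every day at 03:00 UTC
  workflow_dispatch: {}  # keep a manual trigger for ad-hoc runs
---
# .github/workflows/triage.yml (hypothetical): run the triage agent when an issue changes
on:
  issues:
    types: [opened, edited, reopened]
permissions:
  contents: read
  issues: write  # lets the agent label and comment on issues using GITHUB_TOKEN
jobs:
  triage:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      # (agent installation and environment setup omitted)
      - name: Run triage skill
        env:
          ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}  # assumed repository secret
          GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}  # read by the gh CLI when the agent calls it
          ISSUE_NUMBER: ${{ github.event.issue.number }}
        run: |
          PROMPT="$(cat skills/triage.md)
          Triage issue #${ISSUE_NUMBER} in this repository."
          # Only read access and gh issue subcommands are pre-approved for this role.
          claude -p "$PROMPT" --allowedTools "Read" "Bash(gh issue:*)"
```

Issue edits, labels, or comments made with GITHUB_TOKEN do not trigger new workflow runs. As a result, the triage agent's own changes will not re-invoke it in a loop.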

Step 5: Run Autonomous Tasks and Review Reports

Now let your Fleet work. Depending on its role, each agent will:

Build the product from source.
Execute exploratory tests on CLI commands (create, start, stop, remove, network, workspace).
Check upgrade paths between versions.
Run sustained load tests to catch resource leaks.
Triage the issue backlog, categorize new issues, and prioritize.
Generate release notes and post them to a Slack channel or GitHub release.
Fix simple bugs by proposing pull requests.

All reports are written to a shared location (GitHub issues, comments, or a dedicated folder). The team reviews them daily.
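If you choose the dedicated-folder option, the workflow can publish whatever the agent wrote as a build artifact. This sketch assumes the prompt asks the agent to write its findings under a hypothetical reports/ directory. The upload uses actions/upload-artifact, the standard GitHub action for this, and the if: always() condition keeps the reports even when the agent step fails.

```yaml
# Hypothetical steps for the cli-tester job; reports/ is an assumed output directory.
steps:
  - uses: actions/checkout@v4
  # (environment and agent setup omitted)
  - name: Run CLI tester skill
    shell: bash
    env:
      ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}  # assumed repository secret
    run: |
      mkdir -p reports
      claude -p "$(cat skills/cli-tester.md)
      Write each finding as a separate markdown file in the reports/ directory." \
        --allowedTools "Bash" "Read" "Write"
  - name: Upload agent reports
    if: always()  # runs even if the agent step failed, so partial reports survive
    uses: actions/upload-artifact@v4
    with:
      name: cli-tester-reports-${{ runner.os }}  # unique per OS when used in a matrix
      path: reports/
      if-no-files-found: warn
      retention-days: 14
```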

Step 6: Iterate and Refine Based on Failures

Autonomous agents will make mistakes. When a test fails unexpectedly, a script simply stops; an agent working in a role investigates instead. Even so, you need to keep improving the skill files over time.

After each CI run, examine the agent’s output:

Did the agent investigate the right things?
Did it misinterpret an instruction?
Did it need a new tool or permission?

Update the corresponding skill file locally, test it, then commit the change. The same iterative cycle applies: local first, then CI.
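To keep "local first, then CI" honest, you can have CI re-run a skill whenever a pull request changes its file. Here is a minimal sketch that assumes GitHub Actions and the same ANTHROPIC_API_KEY secret; the skill-check.yml name is hypothetical. GitHub does not pass repository secrets to workflows triggered by pull requests from forks, so as written this only works for branches in the same repository.

```yaml
# .github/workflows/skill-check.yml (hypothetical): re-run the CLI tester in CI
# whenever a pull request changes its skill file.
on:
  pull_request:
    paths:
      - "skills/cli-tester.md"
jobs:
  check-cli-tester:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      # (agent installation and environment setup omitted)
      - name: Run the updated CLI tester skill
        env:
          ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}  # assumed repository secret
        run: claude -p "$(cat skills/cli-tester.md)" --allowedTools "Bash" "Read" "Write"
```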

Tips for Success

Keep skill files simple – a persona with enough context for the agent to make good decisions, not a rigid checklist.
Use the same runtime – local and CI environments should be as close as possible (same sandbox, same agent version). This prevents “works on my machine” problems.
Start with one role – avoid building the entire fleet at once. Perfect the /cli-tester first, then add the build engineer, then triage, etc.
Log everything – make agents output detailed logs so you can debug failures. The log is your main feedback loop.
Empower agents with limited autonomy – allow them to modify code for bug fixes, but always require a human review before merging.
Monitor costs – AI agents consume tokens and compute. Set limits per run and track usage.
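Several of these tips translate directly into workflow settings. The sketch below assumes GitHub Actions and Claude Code and does four things:

- Caps wall-clock time with timeout-minutes.
- Caps agent turns with Claude Code's --max-turns flag, which bounds but does not directly meter token spend.
- Prevents overlapping runs with a concurrency group.
- Keeps the full agent log as an artifact.

The limits shown are placeholders to tune, not recommendations.

```yaml
# Hypothetical cost and logging guardrails for one agent job.
jobs:
  test-cli:
    runs-on: ubuntu-latest
    timeout-minutes: 30  # hard cap on runner time per run
    concurrency:
      group: cli-tester-${{ github.ref }}
      cancel-in-progress: true  # never pay for two overlapping runs of the same agent
    steps:
      - uses: actions/checkout@v4
      # (agent installation and environment setup omitted)
      - name: Run CLI tester skill with a turn limit and a full log
        shell: bash
        env:
          ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}  # assumed repository secret
        run: |
          set -o pipefail  # a failing claude run fails the step even though output goes through tee
          mkdir -p logs
          claude -p "$(cat skills/cli-tester.md)" --allowedTools "Bash" "Read" "Write" \
            --max-turns 50 --verbose 2>&1 | tee logs/cli-tester.log
      - name: Upload agent log
        if: always()  # keep the log even when the run fails; it is the main feedback loop
        uses: actions/upload-artifact@v4
        with:
          name: cli-tester-log
          path: logs/
```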

Building a virtual agent team is not about replacing engineers—it’s about freeing them from repetitive tasks so they can focus on higher-level problems. Start small, iterate locally, and watch your shipping velocity increase.