7. GitHub Actions agent runtime MVP
Hypothesis
A fully autonomous SDLC pipeline — from issue triage to merge — can be implemented using only GitHub Actions workflows and AI agent CLIs, with the repository itself as the coordinator (no orchestrator agent). The review/fix loop should converge within a practical number of iterations.
Background
Fullsend's core claim is that the repo is the coordinator. This experiment tests whether GitHub Actions' native event system (issue events, PR events, review events) can drive a multi-agent pipeline without a central orchestrator. Each agent runs in its own workflow, triggered by the previous agent's side effects.
The key question is whether the event-driven model provides enough coordination to handle the full lifecycle — including the review/fix loop where agents iterate autonomously until the reviewer approves.
Test repo: nonflux/integration-service (public playground fork)
Setup
Pipeline Architecture
TL;DR — Pipeline in Action

GIF preview — view the full-quality video (webm) for a clearer look.
Every agent input passes through Google Cloud Model Armor scan before processing. Prompt injection detected → blocked + human notified.
The Four Agents
| Agent | Trigger | Model | Tools | Token | Purpose |
|---|---|---|---|---|---|
| Triage | issues.opened | gemini-3-flash-preview | gh, read_file (restricted) | AGENT_TOKEN (PAT) | Classify issues, add labels, determine readiness |
| Implementation | issues.labeled (ready-for-implementation) | gemini-3-pro-preview | Unrestricted shell (git, gh, make, go) | AGENT_TOKEN (PAT) | Create branch, write code, run tests, open PR |
| Review | pull_request.opened/synchronize | Pluggable (see below) | read_file, git only — no gh | Reviewer App token (post step only) | Review code quality and security |
| Fix | pull_request_review.submitted | gemini-3-pro-preview | Unrestricted shell (git, gh, make, go) | Agent App token | Read feedback, fix code, push |
Pluggable Reviewer Architecture
The review agent slot is designed to be filled by any code review tool that can post a GitHub pull_request_review with APPROVE or CHANGES_REQUESTED. The fix agent triggers on the review event regardless of which reviewer posted it.
| Reviewer | Type | Identity | Status |
|---|---|---|---|
| Gemini CLI (built-in) | Gemini via run-gemini-cli action | fullsend-reviewer[bot] (GitHub App) | Verified on PR #7 |
| cicaddy-action | Reusable GitHub Action | fullsend-reviewer[bot] (GitHub App) | Verified on PR #1 |
| Qodo (formerly Codium) | Third-party SaaS | qodo-merge-pro[bot] | Planned |
| CodeRabbit | Third-party SaaS | coderabbitai[bot] | Planned |
For the fix agent to trigger from a reviewer, the fix-agent.yml if: condition must include the reviewer's bot login:
github.event.review.user.login == 'fullsend-reviewer[bot]'
# To add more reviewers:
# || github.event.review.user.login == 'qodo-merge-pro[bot]'
# || github.event.review.user.login == 'coderabbitai[bot]'Third-party reviewers (Qodo, CodeRabbit) don't need identity isolation setup — they already have their own bot identity. The main integration point is the fix agent's trigger filter.
Identity Model
| Component | GitHub Identity | Token Source | Purpose |
|---|---|---|---|
| Triage Agent | fullsend-agent[bot] or PAT | AGENT_TOKEN (PAT) | Label issues, post triage comment |
| Implementation Agent | PAT user identity | AGENT_TOKEN (PAT) | Push branches, open PRs (must trigger review agent) |
| Review Agent | fullsend-reviewer[bot] | GitHub App (REVIEWER_APP_ID) | Post reviews, trigger fix agent |
| Fix Agent | fullsend-agent[bot] | GitHub App (AGENT_APP_ID) | Push commits, post status |
| Cancel stale fixes | github-actions[bot] | GITHUB_TOKEN | Cancel workflows (no event trigger needed) |
| Injection scanner | Service account | GCP_SA_KEY (static key — see 015) | Scan for prompt injection |
Why PAT for Implementation Agent? GITHUB_TOKEN events don't trigger other workflows. A PAT from a different identity breaks this restriction so the Implementation Agent's PR triggers the Review Agent.
Why separate GitHub App for Reviewer? GitHub prevents the same token from authoring AND approving PRs. A separate reviewer identity solves the self-approval problem.
Review Agent Flow
Fix Agent Flow
Concurrency Model
Review Agent group: fullsend-review-pr-{N}
cancel-in-progress: true
→ New push cancels stale review, latest review wins
Fix Agent group: fullsend-fix-pr-{N}
cancel-in-progress: true
→ New fix cancels stale fix of same trigger type
Cross-workflow:
Review Agent step 1 → Cancel stale fix agents via API
(prevents stale fixes from pushing after new code lands)Security Layers
- Model Armor — Scans PR body, diff, review comments, and issue content for prompt injection at every agent entry point
- Fail-closed scanning —
UNKNOWNor error from Model Armor → block (not pass) - Token isolation (Review Agent) — Review agent's LLM has no
ghtool and no write-capable token; write actions only in deterministic post step - Unique file paths —
${RUN_ID}-$(openssl rand -hex 8)prevents artifact hijack between steps - CODEOWNERS — Agent workflow files, scripts, agent config (
GEMINI.md,.gemini/), and CODEOWNERS itself requirefullsend-sigteam approval - Branch protection —
require_code_owner_reviews: trueenforces CODEOWNERS approval onmain;dismiss_stale_reviews: trueinvalidates prior approvals after new pushes - Fork PR blocking — Fix agent refuses to run on fork PRs
- Iteration cap — Hard limit at 45 review-fix cycles, escalates to human
- Identity separation — Reviewer and fixer are different GitHub Apps
- Least-privilege tools — Each agent's tool allowlist is restricted to what it needs
Branch Protection (Recommended Target Repo Config)
The test repo (nonflux/integration-service) uses these branch protection settings on main:
| Setting | Value | Why |
|---|---|---|
| Require pull request reviews | Yes | Agents cannot push directly to main |
| Required approving review count | 1 | At least one approval before merge |
| Require code owner reviews | Yes | Enforces CODEOWNERS rules — agent workflow/config changes need team approval |
| Dismiss stale reviews | Yes | New push (e.g., fix agent commit) invalidates prior approval, forcing re-review |
| Require last push approval | No | Not needed — the review agent re-reviews on every synchronize event |
CODEOWNERS protects agent infrastructure from self-modification:
# Agent workflows — listed individually, not as a glob
/.github/workflows/triage-agent.yml @nonflux/fullsend-sig
/.github/workflows/implementation-agent.yml @nonflux/fullsend-sig
/.github/workflows/review-agent.yml @nonflux/fullsend-sig
/.github/workflows/fix-agent.yml @nonflux/fullsend-sig
# Agent config and scripts
/GEMINI.md @nonflux/fullsend-sig
/.gemini/ @nonflux/fullsend-sig
/.github/scripts/ @nonflux/fullsend-sig
# CODEOWNERS itself — prevents agents from weakening review gates
/.github/CODEOWNERS @nonflux/fullsend-sigThe combination of CODEOWNERS + require_code_owner_reviews means agents can push workflow changes during the review/fix loop, but those changes cannot be merged without human team approval.
In-Place Comment Updates
The review and fix agents each use a single PR comment with an HTML marker (<!-- fullsend:review-agent --> / <!-- fullsend:fix-agent -->), editing it in-place on each new cycle. Previous content is collapsed into a <details> block. GitHub tracks the edit history, keeping the PR timeline clean.
Context Passing Between Steps
Running
Demo 1: First-Pass Approval — Issue #15 → PR #16
Created Issue #15 (gosec G708 warnings fix). The pipeline processed it end-to-end:
- Triage agent (~1 min) — classified as
bug, added labels includingready-for-implementation - Implementation agent (~18 min) — created branch, wrote fix, opened PR #16
- Review agent (~5 min) — APPROVED on first pass
Total: ~24 min, zero human intervention.
Demo 2: Review/Fix Loop — Issue #19 → PR #20
Created Issue #19 (finalizer removal bug). This time the review agent found issues:
| Stage | Agent | Duration | Result |
|---|---|---|---|
| Triage | Triage Agent | ~1 min | Labels: bug, kind/bug, area/controller, area/gitops, priority/high, ready-for-implementation |
| Implementation | Implementation Agent | ~18 min | Created PR #20 |
| Review (1st) | Review Agent | ~7 min | CHANGES_REQUESTED |
| Fix | Fix Agent | ~30 min | Commit bb94e01 — fix: address review feedback |
| Review (2nd) | Review Agent | ~4 min | APPROVED — "LGTM" |
Total: ~60 min, zero human intervention. The review/fix loop converged in one iteration.
Full demo video (58s, webm) — end-to-end pipeline from issue creation to approval.
Review/Fix Loop in PR Timeline

The PR timeline shows the full autonomous loop: review agent requested changes → fix agent committed fix: address review feedback → review agent approved on re-review.
Review/Fix Loop — Verified on PR #7
PR #7 verified the review/fix loop end-to-end with 3 successful bot commits during development of the pipeline itself. This was the primary validation PR where the concurrency model, identity isolation, and comment update patterns were iterated on.
Results
What Works
| Feature | Status | Notes |
|---|---|---|
| Full review→fix→push loop | Working | 3 successful bot commits on PR #7 |
| Full issue→triage→implement→review→merge pipeline | Working | Issue #15 → PR #16 (first-pass), Issue #19 → PR #20 (with fix loop) |
| In-place comment updates | Working | Single comment per agent, collapsed history |
| Deterministic comment posting | Working | Shell step, not LLM — eliminates duplicates |
| Separate bot identities | Working | fullsend-reviewer[bot] reviews, fullsend-agent[bot] pushes |
| Fix agent triggering from review | Working | pull_request_review event from app token |
/fix-agent human command | Partial | Trigger verified (workflow started, +1 reaction posted), but cancelled by incoming commit before full run completed |
workflow_dispatch fallback | Working | For when events are throttled |
| Model Armor integration | Working | Hard block on PR body, filter-and-continue on diff/comments |
| Iteration cap (45) | Working | Escalates to human with CODEOWNERS mention |
| Strategy escalation (>=5 cycles) | Working | Prompt tells agent to try new approach |
| Cancel stale fix agents | Working | Review agent cancels in-progress fixes on new push |
| Unique artifact file paths | Working | Run ID + random hex prevents hijack |
| Token isolation | Working | LLM gets read-only token, write-only in post step |
| Pluggable reviewer slot | Working | Any tool that posts a GitHub review can fill the slot |
Timing Data (from PR #7)
Review Agent (11 successful runs)
| Run | Duration | Notes |
|---|---|---|
| 23645733079 | ~5 min | Early, smaller diff |
| 23646406199 | ~3.5 min | |
| 23646546863 | ~4.5 min | |
| 23646792809 | ~5 min | |
| 23648438805 | ~6.5 min | |
| 23649109721 | ~4.5 min | |
| 23649767441 | ~5 min | |
| 23650887715 | ~10 min | Larger diff after workflow changes |
| 23652775666 | ~8.5 min | |
| 23654379038 | ~4.5 min | |
| 23657972959 | ~5 min | |
| Mean | ~5.5 min |
Fix Agent (3 successful runs)
| Run | Duration | Notes |
|---|---|---|
| 23608344342 | ~7 min | Small fix |
| 23627557889 | ~21 min | Larger fix |
| 23644666720 | ~17 min | After workflow refactoring |
| Mean | ~15 min | Range: 7-21 min |
Full Loop Cycle (review + fix + push)
- Best case: ~12 min (fast review + small fix)
- Typical: ~20-25 min
- Worst case: ~35 min (long review + complex fix)
Fix Agent Telemetry (from gemini-output artifacts)
| Metric | Run 23644666720 (~17 min) | Run 23627557889 (~21 min) |
|---|---|---|
| LLM API time (gemini-3.1-pro) | 283.9s (~4.7 min, 29%) | 341.8s (~5.7 min, 29%) |
| Tool execution time (shell) | 689.5s (~11.5 min, 71%) | 832.9s (~13.9 min, 70%) |
| Loop detector (gemini-3-flash) | 4.3s (<1%) | 12.2s (1%) |
| API requests | 44 (0 errors) | 62 (0 errors) |
| Tool calls | 48 (47 shell, 1 read_file) | 61 (60 shell, 1 read_file) |
| Total tokens | 1.49M (89% cached) | 2.38M (91% cached) |
Key findings:
- ~70% of fix agent time is shell command execution (make test, make lint, git, gh api), not LLM inference
- LLM thinking is only ~30% of total time (~5-6 min per run)
- Token caching is highly effective (89-91% hit rate) — Gemini caches prior context across multi-turn
- The
read_filetool is barely used (1 call per run) — agent prefersrun_shell_commandfor reading files - The loop detector (flash model) adds negligible overhead
- Optimization target is tool execution, not model speed — faster test/lint or parallelized commands would have the most impact
What Failed and Was Reverted
| Approach | Why It Failed |
|---|---|
| Shared concurrency group for review+fix | Review and fix cancel each other. Max queue depth 1. |
cancel-in-progress per workflow in shared group | GitHub evaluates incoming workflow's setting, not existing. Asymmetric behavior. |
| LLM posts comments via prompt instructions | LLM unreliably follows complex shell sequences. Creates duplicates. |
gh auth login in post step with GH_TOKEN env | Conflicts — gh complains when both are set. GH_TOKEN env var is sufficient. |
GITHUB_TOKEN passed to LLM for read-only | GITHUB_TOKEN has pull-requests: write from workflow permissions. Agent can still post as wrong bot. |
workflows: write in YAML permissions block | Causes parse error with workflow_dispatch trigger. App token handles this at the GitHub App settings level. |
| Trusted scripts from base branch checkout | Scripts don't exist on base branch (main) yet — only on PR branch. Breaks the workflow. |
Review content in gh pr review --body | Creates visible review with full content + separate issue comment = duplicate noise. |
Known Issues
| # | Issue | Status |
|---|---|---|
| 001 | Fix agent cycle time slow (7-21 min per iteration) — 70% is shell execution | Open |
| 002 | Fix agent workflow push permission vs guardrails | Resolved |
| 003 | GitHub throttles pull_request_review events after ~20 from same bot on one PR | Open |
| 004 | Concurrent bot + human fixes share concurrency group, causing cancellation | Open |
| 005 | Review verdict entries (--request-changes) accumulate in PR timeline | Open |
| 006 | Prompt injection scan returns match state only, not flagged snippet | Open |
| 007 | Workflows hardcoded per repo — need reusable workflows for multi-repo | Open |
| 009 | Triage agent only triggers on issues.opened — no retry on failure | Open |
| 010 | Fix agent modifying workflow files can break agent pipelines | Open |
| 011 | run-gemini-cli output parsing failure with special characters (upstream) | Open |
| 012 | Prompt injection scan truncates diff at 1000 lines — bypass possible | Open |
| 013 | GITHUB_OUTPUT multi-line value injection via echo "instruction=$X" | Open |
| 014 | Triage agent multi-label race causes implementation agent concurrency cancellation | Open |
| 015 | GCP auth uses static SA key — replace with Workload Identity Federation | Open |
| 016 | Agent telemetry trapped in artifacts — export to GCP Cloud Monitoring | Open |
Security Notes
- All agent inputs are scanned by Google Cloud Model Armor for prompt injection before processing
- Scanner operates in fail-closed mode (unknown results → block)
- The review agent's LLM never has write access to GitHub — all mutations happen in deterministic shell steps
- Each workflow uses unique temp file paths (
${RUN_ID}-$(openssl rand -hex 8)) to prevent artifact hijack between steps - The fix agent has an iteration cap (45 cycles) to prevent infinite loops
- Fork PRs are blocked from triggering the fix agent
- Identity separation: reviewer and fixer are different GitHub Apps — prevents self-approval
- See
experiments/model-armor-vs-claude-triage/for prompt injection scanner effectiveness testing - See
experiments/prompt-injection-defense/for AI agent inherent injection resistance testing
