Unblocking GitHub: 3-Phase Workflow For API Rate Limits
Hey guys! Let's dive into how we can tackle those pesky GitHub API rate limits that have been causing some headaches. We've been running into issues with our multi-agent system hitting these limits, especially when multiple agents are working at the same time. This can lead to PR creation failures, which, as you know, can really block our core workflow. So, let’s break down the problem, the solution, and how we’re going to implement it.
Problem Statement
So, the main issue? GitHub API rate limiting. We're bumping into these limits when multiple agents try to complete their work concurrently. We've seen tasks that finished review but then failed at PR creation because we'd hit the rate limit. This is a big deal because it blocks our core workflow. Our current system has no way to bundle these concurrent completions, and the API pressure is causing delays across the board. With 8-12 agents running at the same time, we're already pushing against the sustainable rate of 6 PRs per hour, which is not ideal.
Immediate Impact
The immediate impact of these rate limits is pretty clear. We've had tasks fail due to rate limiting, which means delays and frustrated agents. The lack of a mechanism to bundle concurrent completions just makes the problem worse. All this API pressure creates a domino effect, causing delays in agent coordination. Basically, it's a mess that we need to clean up ASAP.
Root Cause Analysis
To really nail this, let’s dig into the root cause. Our current setup uses a 2-phase workflow:
- Code → PR: This means each agent makes individual API calls, which adds up quickly.
- PR → Review → Merge: The standard process, but it gets bottlenecked if PRs can't be created.
Rate Limiting Math
Let's talk numbers. Our target is 6 PRs per hour max – that’s the sustainable rate we’re aiming for. Right now, it takes about 4-6 API calls to create a single PR. With 12 agents working simultaneously, we're looking at 48-72 API calls in just minutes. GitHub gives us 5000 requests per hour, but we're approaching that limit during peak times, which is why we're seeing these issues.
Solution: 3-Phase Workflow with Bundle Stage
Okay, so how do we fix this? We’re proposing a 3-phase workflow with a bundle stage. This new workflow looks like this:
- Code → Bundle Signal: Once the code is ready, the agent sends a signal to the bundle stage and is immediately freed up to work on something else. This is crucial for keeping our agents productive.
- Bundle → PR Creation: This is where the magic happens. We bundle 4 or more branches into a single PR, which significantly reduces the number of API calls.
- PR → Review → Merge: The existing process remains the same after the PR is created.
This approach should drastically cut down on the number of API calls and keep us well within the rate limits. Let's dive into the specifics.
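To put rough numbers on that, here's a back-of-the-envelope sketch in Rust. It reuses the 4-6 API calls per PR figure from the rate limiting math and assumes (this part is a guess, not a measurement) that a bundle PR costs about the same number of calls as a regular PR. The function names are made up for illustration.

```rust
/// API calls needed per PR (min, max), per the 4-6 figure quoted earlier.
const CALLS_PER_PR: (u32, u32) = (4, 6);

/// PRs needed when `branches` finished branches are grouped `bundle_size`
/// per PR, rounded up. A bundle size of 1 means no bundling at all.
fn prs_needed(branches: u32, bundle_size: u32) -> u32 {
    let size = bundle_size.max(1); // guard against division by zero
    (branches + size - 1) / size
}

/// (min, max) API calls to open `prs` pull requests.
/// Assumption: a bundle PR costs the same as an individual PR.
fn api_calls(prs: u32) -> (u32, u32) {
    (prs * CALLS_PER_PR.0, prs * CALLS_PER_PR.1)
}

/// Whole-number percentage saved going from `before` calls to `after` calls.
fn reduction_percent(before: u32, after: u32) -> u32 {
    if before == 0 {
        return 0;
    }
    before.saturating_sub(after) * 100 / before
}
```

For 12 finished branches, `api_calls(prs_needed(12, 1))` gives 48-72 calls, while `api_calls(prs_needed(12, 4))` gives 12-18 calls. Under the assumption above, that's a 75% drop.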
Key Implementation Specs
To make this work, we need some solid guidelines. Here are the bundle thresholds we’re setting based on real metrics:
- Rate Target: We’re still aiming for that 6 PRs per hour max, which translates to about 1 PR every 10 minutes.
- Bundle Trigger: We’ll trigger a bundle when we have a minimum of 4 completed branches.
- Time Window: We’ll use a 10-minute bundling window to collect these branches.
- API Pressure: If we’re using more than 70% of our rate limit, we’ll bundle to conserve API calls.
- No Caching: We need a fresh survey on every peek/pop to ensure we’re working with the latest data.
We’re also introducing a new priority level to manage these bundles:
- route:unblocker (200) - Critical system issues
- route:land (100) - Ready for final merge
- route:bundle (50) - Bundle opportunities ← NEW
- route:priority-high (3)
- route:priority-medium (2)
- route:priority-low (1)
This new route:bundle priority will help us manage and prioritize bundle creation effectively.
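Here's a minimal sketch of how those labels could map to numbers and drive ordering. The label names and values come from the list above. Everything else is an assumption for illustration: the function names, the `(issue_number, labels)` pair, and the choice to give unknown labels priority 0.

```rust
use std::cmp::Reverse;

/// Map a routing label to its numeric priority.
/// Unknown labels get 0 (an assumption for this sketch).
fn route_priority(label: &str) -> u32 {
    match label {
        "route:unblocker" => 200,
        "route:land" => 100,
        "route:bundle" => 50,
        "route:priority-high" => 3,
        "route:priority-medium" => 2,
        "route:priority-low" => 1,
        _ => 0,
    }
}

/// Highest priority among an issue's labels (0 if it has none).
fn issue_priority(labels: &[&str]) -> u32 {
    labels.iter().copied().map(route_priority).max().unwrap_or(0)
}

/// Order hypothetical `(issue_number, labels)` pairs from highest to lowest
/// priority. The sort is stable, so ties keep their original order.
fn sort_by_priority(issues: &mut [(u64, Vec<&str>)]) {
    issues.sort_by_key(|(_, labels)| Reverse(issue_priority(labels)));
}
```

With this ordering, a route:bundle task sits below anything ready to land but above all the regular priority-high/medium/low work.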
Auto-Cleanup Mechanism
To keep things tidy, we’ll use an auto-cleanup mechanism. Bundle PRs will use the Fixes #X syntax, which leverages GitHub's built-in issue linking. This means that when the PR is merged, the related issues will be automatically closed. No manual cleanup needed – sweet!
Technical Requirements
Alright, let's get technical. Here are the main components we need to build:
1. Real Rate Limit Monitoring
We need to monitor our GitHub API usage in real-time. This means using GitHub's actual rate limit API. Here’s some Rust code to illustrate:
// Use GitHub's actual rate limit API
async fn get_current_rate_limit_status(&self) -> Result<RateLimitStatus, GitHubError>
async fn check_pr_creation_rate(&self) -> Result<PRCreationRate, GitHubError>
This will give us the data we need to make informed decisions about bundling.
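As a rough sketch of what the status check might boil down to: GitHub reports the budget through its rate limit endpoint and through the x-ratelimit-limit and x-ratelimit-remaining response headers. The struct layout and method names below are assumptions, since the real RateLimitStatus type isn't shown here. The 70% cutoff is the API pressure threshold from the specs above.

```rust
/// Hypothetical shape for `RateLimitStatus`; the real fields are not shown
/// in this write-up.
#[derive(Debug, Clone, Copy, PartialEq)]
struct RateLimitStatus {
    limit: u32,     // value of the `x-ratelimit-limit` header
    remaining: u32, // value of the `x-ratelimit-remaining` header
}

impl RateLimitStatus {
    /// Build a status from raw header values.
    /// Returns `None` if either value is not a valid number.
    fn from_headers(limit: &str, remaining: &str) -> Option<Self> {
        Some(Self {
            limit: limit.trim().parse().ok()?,
            remaining: remaining.trim().parse().ok()?,
        })
    }

    /// Requests already spent in the current window.
    fn used(&self) -> u32 {
        self.limit.saturating_sub(self.remaining)
    }

    /// True when more than 70% of the budget is used.
    /// Integer math avoids floating-point edge cases at the boundary.
    fn under_pressure(&self) -> bool {
        u64::from(self.used()) * 100 > u64::from(self.limit) * 70
    }
}
```

With a 5000-request budget and 1000 remaining, `used()` is 4000 and `under_pressure()` is true. With 4000 remaining, it's false.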
2. Bundle Detection Logic
Next up, we need the logic to detect when we should trigger bundling. This involves checking if our bundle thresholds are met. Again, here's a Rust snippet:
// Trigger bundling when thresholds met
async fn evaluate_bundle_threshold(&self, branches: Vec<BundleReadyBranch>) -> Result<Option<BundleTask>, GitHubError>
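Here's one way the threshold check could look, as a simplified, synchronous sketch. How the thresholds combine isn't spelled out in this write-up, so the sketch assumes that only branches that finished within the 10-minute window count, and that a bundle needs at least 4 of them. The 70% API pressure rule is left out because its interaction with the other two isn't specified. The struct fields are hypothetical stand-ins for the real BundleReadyBranch and BundleTask types.

```rust
use std::time::Duration;

/// Minimum number of completed branches before a bundle is triggered.
const MIN_BUNDLE_BRANCHES: usize = 4;
/// Length of the bundling window.
const BUNDLE_WINDOW: Duration = Duration::from_secs(10 * 60);

/// Hypothetical stand-in for `BundleReadyBranch`.
#[derive(Debug, Clone, PartialEq)]
struct BundleReadyBranch {
    name: String,
    /// How long ago the agent signalled completion.
    waiting: Duration,
}

/// Hypothetical stand-in for `BundleTask`.
#[derive(Debug, PartialEq)]
struct BundleTask {
    branches: Vec<BundleReadyBranch>,
}

/// Return a bundle task if enough branches finished inside the window.
fn evaluate_bundle_threshold(branches: Vec<BundleReadyBranch>) -> Option<BundleTask> {
    // Keep only branches that completed within the bundling window.
    let in_window: Vec<BundleReadyBranch> = branches
        .into_iter()
        .filter(|b| b.waiting <= BUNDLE_WINDOW)
        .collect();

    if in_window.len() >= MIN_BUNDLE_BRANCHES {
        Some(BundleTask { branches: in_window })
    } else {
        None
    }
}
```

Returning `None` here just means "no bundle yet". What happens to branches that age out of the window is a separate decision this sketch doesn't make.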
3. Just-In-Time Survey
To ensure we're always working with the latest repo state, we need a just-in-time survey. This means no caching – we want fresh data on every peek and pop:
// Fresh repo survey on every peek/pop (no caching)
async fn survey_repo_state(&self) -> Result<(), GitHubError>
4. Bundle PR Creation
Finally, we need to create the bundle PR itself. This involves combining multiple agent contributions into a single PR body:
// Single PR with multiple agent contributions
fn create_bundle_pr_body(bundle_branches: &[BundleReadyBranch]) -> String
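Here's a minimal sketch of what that body builder might produce. The BundleReadyBranch fields are assumptions (the real type isn't shown here), and the heading layout is just one option. It does two things from the plan: it gives each agent its own section for traceability, and it writes a separate Fixes #N line per issue so GitHub's closing keywords can pick up each one.

```rust
/// Hypothetical stand-in for `BundleReadyBranch`; the real fields are not
/// shown in this write-up.
struct BundleReadyBranch {
    agent: String,
    branch: String,
    issue_number: u64,
    summary: String,
}

/// Build a PR description with one section per agent contribution.
fn create_bundle_pr_body(bundle_branches: &[BundleReadyBranch]) -> String {
    let mut body = String::from("## Bundled changes\n\n");
    for b in bundle_branches {
        // One section per agent keeps individual work traceable.
        body.push_str(&format!("### {} (`{}`)\n", b.agent, b.branch));
        body.push_str(&format!("{}\n\n", b.summary));
        // Each issue gets its own closing keyword line.
        body.push_str(&format!("Fixes #{}\n\n", b.issue_number));
    }
    body
}
```

Each issue gets its own keyword line because the closing keyword has to be repeated for every issue it should close.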
Implementation Scope
We're going to tackle this in three phases to keep things manageable:
Phase 1: Core Bundle Infrastructure
This phase is all about building the foundation. We’ll:
- Add the route:bundle priority to our routing system.
- Implement real GitHub API rate limit monitoring.
- Add bundle threshold evaluation logic.
- Create bundle issue generation (just-in-time).
- Update fetch_routable_issues() with a fresh survey.
Phase 2: Bundle Execution
In this phase, we’ll focus on executing the bundling process. We’ll:
- Detect when a route:bundle task is assigned.
- Implement the git branch bundling logic.
- Create a unified PR with the Fixes #X auto-cleanup.
- Apply the route:land label to bundled PRs.
Phase 3: Workflow Integration
Finally, we’ll integrate the new workflow into our existing processes. We’ll:
- Update clambake land to detect bundle opportunities.
- Modify clambake peek to show bundle tasks at priority 50.
- Add Phoenix observability for bundle success metrics.
- Conduct comprehensive testing with multiple concurrent agents.
Success Metrics
How will we know if we’ve succeeded? We’re looking at several key metrics:
API Efficiency
- ✅ We want to see a 60-80% reduction in GitHub API calls during concurrent completions.
- ✅ We need to maintain our 6 PR/hour target rate.
- ✅ We should be able to bundle 4+ branches when thresholds are met.
Workflow Quality
- ✅ Agents should be freed immediately after code completion.
- ✅ Bundle tasks should appear in clambake peek at priority 50.
- ✅ Bundled PRs should maintain individual agent work traceability.
- ✅ Our auto-cleanup via GitHub's "fixes #X" mechanism should be working smoothly.
System Reliability
- ✅ We should have no more rate limiting failures during concurrent completions.
- ✅ We’re aiming for a >95% bundle success rate.
- ✅ We need a graceful fallback to individual PRs when conflicts are detected.
Risk Mitigation
Of course, there are risks to consider. Let's look at how we'll mitigate them:
Technical Risks
- Merge conflicts in bundles: We’ll implement robust conflict detection and fallback to individual PRs.
- Bundle PR review complexity: We’ll ensure clear per-agent contribution sections in the PR description.
- API monitoring overhead: We’ll use efficient GitHub rate limit endpoints to minimize overhead.
Operational Risks
- Bundle timing edge cases: We’re using a conservative 4-branch minimum threshold.
- Review workflow disruption: We’ll maintain the existing review process for bundled PRs.
- Rollback scenarios: We’ll preserve individual agent branches for cherry-pick capability.
Implementation Timeline
Here’s our timeline for getting this done:
- Week 1: Core infrastructure (rate monitoring, bundle detection, routing priority)
- Week 2: Bundle execution and PR creation logic
- Week 3: Workflow integration and comprehensive testing
- Week 4: Phoenix observability and production deployment
This unblocker isn't just a quick fix; it's a long-term solution. By implementing this 3-phase workflow, we're resolving the immediate rate limiting crisis and creating a scalable foundation for 12+ concurrent agents. We got this!