Quotas & Limits

Quotas are numeric limits that cap resource usage within a billing period. Unlike feature flags (which are boolean), quotas define how much of a resource a deployment can consume.

Quota Types

AI Monthly Token Limit

Controls the maximum number of AI tokens (input + output) a deployment can use per billing period.

Tier          Monthly Token Limit
Sandbox       0 (AI disabled)
Trial         1,000
Launch        10,000
Growth        50,000
Enterprise    Unlimited

API Rate Limit

Controls the maximum number of API requests per minute across all endpoints.

Tier          Requests/Minute
Sandbox       100
Trial         500
Launch        2,000
Growth        10,000
Enterprise    Unlimited

How Quotas Reset

Monthly Reset

AI token quotas reset at the start of each billing period. The reset is tied to the subscription's billing cycle, not the calendar month.

  • Unused tokens do not roll over to the next period
  • The reset happens automatically at the billing period boundary
  • Purchased credits (via credit packs) are separate and do not reset

Rate Limit Window

API rate limits use a sliding 1-minute window. Once the limit is hit, subsequent requests receive a 429 Too Many Requests response until the window advances.
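A client can avoid most 429 responses by tracking its own request timestamps over the same kind of window. The sketch below is an illustrative client-side counter, not HiveForge's server-side implementation; the class name SlidingWindowLimiter and its methods are hypothetical, and the exact eviction rule at the window boundary is an assumption.

```typescript
// Illustrative client-side sliding-window counter (hypothetical helper,
// not part of the HiveForge SDK).
class SlidingWindowLimiter {
  private readonly limit: number;
  private readonly windowMs: number;
  private timestamps: number[] = [];

  // limit: requests allowed per window, e.g. 2000 for the Launch tier.
  constructor(limit: number, windowMs: number = 60000) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  // Records the request and returns true if it fits in the current window.
  tryAcquire(nowMs: number = Date.now()): boolean {
    this.evict(nowMs);
    if (this.timestamps.length >= this.limit) return false;
    this.timestamps.push(nowMs);
    return true;
  }

  // Milliseconds until the oldest request leaves the window (0 if a slot is free).
  msUntilNextSlot(nowMs: number = Date.now()): number {
    this.evict(nowMs);
    if (this.timestamps.length < this.limit) return 0;
    return this.timestamps[0] + this.windowMs - nowMs;
  }

  // Drops timestamps that are a full window old or older.
  private evict(nowMs: number): void {
    const cutoff = nowMs - this.windowMs;
    while (this.timestamps.length > 0 && this.timestamps[0] <= cutoff) {
      this.timestamps.shift();
    }
  }
}
```

Call tryAcquire() before each request; when it returns false, wait msUntilNextSlot() milliseconds instead of sending a request that would be rejected.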

Exceeded Behavior

AI Token Limit Exceeded

When the AI monthly limit is reached, the API returns:

{
  "error": "quota_exceeded",
  "code": "AI_QUOTA_EXCEEDED",
  "message": "AI token quota exceeded for this billing period",
  "quota_exceeded": true,
  "upgrade_url": "https://app.hiveforge.dev/billing"
}

A deployment that has purchased credits available can continue to use AI after its monthly allocation is exhausted. Usage is deducted in this order:

  1. Monthly allocation (period balance)
  2. Purchased credits
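The deduction order above can be sketched as a pure function. This is a minimal illustration under stated assumptions: both balances are expressed in the same unit (the credits_per_* conversion rates are ignored), and a request that cannot be fully covered is rejected without changing either balance. The names Balances, DeductionResult, and deduct are hypothetical.

```typescript
interface Balances {
  periodRemaining: number;  // what is left of the monthly allocation
  purchasedCredits: number; // credit-pack balance, which does not reset
}

interface DeductionResult {
  ok: boolean;
  fromPeriod: number;
  fromCredits: number;
  balances: Balances;
}

// Draws from the monthly allocation first, then from purchased credits.
function deduct(balances: Balances, cost: number): DeductionResult {
  const fromPeriod = Math.min(balances.periodRemaining, cost);
  const fromCredits = cost - fromPeriod;

  // Assumption: reject the request outright if both sources together fall short.
  if (fromCredits > balances.purchasedCredits) {
    return { ok: false, fromPeriod: 0, fromCredits: 0, balances };
  }

  return {
    ok: true,
    fromPeriod,
    fromCredits,
    balances: {
      periodRemaining: balances.periodRemaining - fromPeriod,
      purchasedCredits: balances.purchasedCredits - fromCredits,
    },
  };
}
```

For example, a cost of 400 against 300 remaining in the period and 500 purchased credits takes 300 from the period and 100 from credits.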

API Rate Limit Exceeded

When the rate limit is hit, the API responds with:

HTTP/1.1 429 Too Many Requests
Retry-After: 12
X-RateLimit-Limit: 2000
X-RateLimit-Remaining: 0
X-RateLimit-Reset: 1711612800

{
  "error": "rate_limit_exceeded",
  "message": "API rate limit exceeded. Retry after 12 seconds."
}
⚠️ Sustained rate limit violations may trigger additional throttling. Design your application to respect Retry-After headers.
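One way to honor these headers is to derive the wait time from the 429 response itself. The helper below is a sketch: retryDelayMs is a hypothetical name, header keys are assumed to be lowercased already, Retry-After is assumed to be in seconds as in the example above, and X-RateLimit-Reset is assumed to be a Unix timestamp in seconds (which the example value suggests but the source does not state).

```typescript
// Returns how long to wait, in milliseconds, before retrying a 429 response.
function retryDelayMs(
  headers: Record<string, string | undefined>,
  nowMs: number = Date.now(),
): number {
  // Prefer Retry-After, assumed to be a whole number of seconds.
  const retryAfter = headers["retry-after"];
  if (retryAfter !== undefined && /^\d+$/.test(retryAfter.trim())) {
    return Number(retryAfter) * 1000;
  }

  // Fall back to X-RateLimit-Reset, assumed to be Unix epoch seconds.
  const reset = headers["x-ratelimit-reset"];
  if (reset !== undefined && /^\d+$/.test(reset.trim())) {
    return Math.max(0, Number(reset) * 1000 - nowMs);
  }

  // Neither header is usable: wait an arbitrary default of one second.
  return 1000;
}
```

With the response shown above, retryDelayMs({ "retry-after": "12" }) yields a 12-second wait.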

Monitoring Usage

Quota Status via Entitlement Check

The entitlement response includes quota information:

{
  "tier": "launch",
  "quotas": {
    "ai_tokens": {
      "used": 4200,
      "limit": 10000,
      "remaining": 5800,
      "resets_at": "2026-04-01T00:00:00Z"
    }
  }
}

AI Quota Endpoint

For detailed AI usage tracking:

curl https://api.hiveforge.dev/api/v1/ai/quota \
  -H "X-Deployment-ID: d9f2a1b4-..." \
  -H "X-Deployment-Secret: sk_live_..."

Response:

{
  "used": 4200,
  "limit": 10000,
  "remaining": 5800,
  "resets_at": "2026-04-01T00:00:00Z",
  "tier": "launch",
  "credits_available": 9500,
  "credits_per_standard": 1,
  "credits_per_advanced": 3,
  "credits_per_premium": 10
}
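The fields in this response are enough to compute how much of the quota has been consumed and how long remains until the reset. The following is a sketch only: the AiQuota type covers just the fields used here, and treating a null limit as "unlimited" is an assumption about how Enterprise deployments are reported. The function names are hypothetical.

```typescript
interface AiQuota {
  used: number;
  limit: number | null; // assumption: null means unlimited
  resets_at: string;    // ISO 8601 timestamp
}

// Fraction of the quota consumed, from 0 to 1, or null when there is no limit.
function usageFraction(quota: AiQuota): number | null {
  if (quota.limit === null) return null;
  if (quota.limit === 0) return 1; // AI disabled: nothing is available
  return Math.min(1, quota.used / quota.limit);
}

// True once usage reaches the given share of the limit (80% by default).
function isNearLimit(quota: AiQuota, threshold: number = 0.8): boolean {
  const fraction = usageFraction(quota);
  return fraction !== null && fraction >= threshold;
}

// Milliseconds until the quota resets, never negative.
function msUntilReset(quota: AiQuota, nowMs: number = Date.now()): number {
  return Math.max(0, Date.parse(quota.resets_at) - nowMs);
}
```

For the response above, usageFraction returns 0.42, so isNearLimit is false at the default threshold.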

SDK Quota Monitoring

import { HiveForgeClient } from "@producthacker/hiveforge-sdk";
 
const client = new HiveForgeClient();
 
// Check AI quota
const quota = await client.ai.getQuota();
console.log(`AI tokens: ${quota.used}/${quota.limit} (${quota.remaining} remaining)`);
console.log(`Resets at: ${quota.resets_at}`);
 
// Check if near limit
if (quota.remaining !== null && quota.remaining < 1000) {
  console.warn("AI token quota is running low");
}

Quota Overrides

Enterprise deployments and special cases can have quota overrides applied by HiveForge administrators:

  • Override AI token limits (higher or lower than tier default)
  • Override API rate limits
  • Set expiration dates on overrides
  • Reset current usage counters

Overrides are applied via the admin API and take priority over tier defaults.

Quota overrides persist across billing periods until they expire or are removed. They do not affect the deployment's tier -- only the specific quota values.
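The precedence rule can be illustrated with a small resolver. This sketch does not reflect the admin API's actual data model: the QuotaOverride shape, the expiresAt field, and the use of null for "unlimited" are all assumptions made for the example.

```typescript
interface QuotaOverride {
  limit: number | null; // assumption: null means unlimited
  expiresAt?: string;   // ISO 8601 timestamp; absent means no expiration
}

// An active override wins over the tier default; an expired one is ignored.
function effectiveLimit(
  tierDefault: number | null,
  override: QuotaOverride | undefined,
  now: Date = new Date(),
): number | null {
  if (override === undefined) return tierDefault;

  const expired =
    override.expiresAt !== undefined &&
    Date.parse(override.expiresAt) <= now.getTime();

  return expired ? tierDefault : override.limit;
}
```

Because the override replaces only the limit value, the deployment's tier is untouched, and the override may be lower than the tier default as well as higher.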

Best Practices

  1. Cache entitlement data -- Respect the next_check_seconds field to avoid unnecessary API calls
  2. Show usage in your UI -- Display quota consumption to end users so they can manage their usage
  3. Handle 429 responses -- Implement exponential backoff with jitter for rate-limited requests
  4. Monitor quota warnings -- Alert when usage approaches 80% of the limit
  5. Consider credit packs -- For deployments that regularly exceed monthly allocations, purchased credits provide overflow capacity
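For item 3, a common approach is exponential backoff with "full jitter", combined with any Retry-After hint from the server. The sketch below shows only the delay calculation; the function names, the 500 ms base, and the 30-second cap are illustrative choices, not values specified by HiveForge.

```typescript
// Exponential backoff with full jitter: a random delay in [0, ceiling),
// where the ceiling doubles on each attempt up to capMs.
function backoffDelayMs(
  attempt: number, // 0 for the first retry
  baseMs: number = 500,
  capMs: number = 30000,
  random: () => number = Math.random,
): number {
  const ceiling = Math.min(capMs, baseMs * Math.pow(2, attempt));
  return Math.floor(random() * ceiling);
}

// Never retry sooner than the server asked for via Retry-After (in seconds).
function nextDelayMs(
  attempt: number,
  retryAfterSeconds?: number,
  random: () => number = Math.random,
): number {
  const backoff = backoffDelayMs(attempt, 500, 30000, random);
  const serverHint = retryAfterSeconds !== undefined ? retryAfterSeconds * 1000 : 0;
  return Math.max(backoff, serverHint);
}
```

The jitter spreads retries from many clients over time so they do not all hit the API again at the same instant when the window advances.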