AI Proxy
The AI proxy lets your deployed application make OpenAI-compatible requests through HiveForge without embedding API keys in your app. Quota enforcement, model access, and usage tracking are handled automatically based on your deployment's tier.
Access the AI proxy via `hiveforge.ai`.
```typescript
import { HiveForgeClient } from '@producthacker/hiveforge-sdk';

const hiveforge = new HiveForgeClient({
  deploymentId: process.env.HIVEFORGE_DEPLOYMENT_ID!,
  deploymentSecret: process.env.HIVEFORGE_DEPLOYMENT_SECRET!,
});

await hiveforge.initialize();

// Check if AI is available for this tier
if (hiveforge.ai.isEnabled()) {
  const response = await hiveforge.ai.complete({
    messages: [{ role: 'user', content: 'Hello!' }],
  });
  console.log(response.content);
}
```

Methods
complete(options)
Create a chat completion. Sends messages to the AI model and returns the full response.
```typescript
const response = await hiveforge.ai.complete({
  messages: [
    { role: 'system', content: 'You are a helpful customer support agent.' },
    { role: 'user', content: 'How do I reset my password?' },
  ],
  model: 'gpt-4o-mini',
  temperature: 0.7,
  max_tokens: 500,
});

console.log(response.content);
console.log(`Tokens used: ${response.tokens_used}`);
console.log(`Model: ${response.model}`);
```

Parameters (AICompletionOptions):
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| `messages` | `ChatMessage[]` | Yes | -- | Array of conversation messages |
| `model` | `string` | No | `'gpt-4o-mini'` | Model to use |
| `max_tokens` | `number` | No | -- | Maximum tokens in the response |
| `temperature` | `number` | No | `0.7` | Sampling temperature (0-2) |
| `top_p` | `number` | No | -- | Nucleus sampling parameter |
| `stream` | `boolean` | No | `false` | Whether to stream (prefer the `stream()` method) |
| `stop` | `string \| string[]` | No | -- | Stop sequences |
| `presence_penalty` | `number` | No | `0` | Presence penalty (-2 to 2) |
| `frequency_penalty` | `number` | No | `0` | Frequency penalty (-2 to 2) |
| `user` | `string` | No | -- | End-user identifier for abuse tracking |
| `metadata` | `Record<string, unknown>` | No | -- | Custom metadata for logging |
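To see how the less common options compose, here is a sketch of an options object for a short, deterministic extraction call. The option names follow the parameter table above; the message contents and the `user` value are purely illustrative.

```typescript
// Illustrative options for a deterministic extraction task.
// Field names match the AICompletionOptions parameter table; values are examples.
const extractionOptions = {
  messages: [
    { role: 'system', content: 'Reply with the city mentioned by the user, and nothing else.' },
    { role: 'user', content: 'I flew into Lisbon on Tuesday.' },
  ],
  model: 'gpt-4o-mini',
  temperature: 0,    // deterministic output suits extraction
  max_tokens: 20,    // a city name is short; cap the spend
  stop: ['\n'],      // cut the response at the first newline
  user: 'user-1234', // hypothetical end-user id for abuse tracking
};
```

An object like this would then be passed straight to `hiveforge.ai.complete(extractionOptions)`.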
ChatMessage type:

| Field | Type | Required | Description |
|---|---|---|---|
| `role` | `'system' \| 'user' \| 'assistant' \| 'function' \| 'tool'` | Yes | Message role |
| `content` | `string \| null` | Yes | Message content |
| `name` | `string` | No | Name of the function/tool |
Returns (AICompletionResponse):

| Field | Type | Description |
|---|---|---|
| `content` | `string` | The generated text response |
| `model` | `string` | Model that was used |
| `tokens_used` | `number` | Total tokens consumed |
| `tokens_input` | `number` | Input/prompt tokens |
| `tokens_output` | `number` | Output/completion tokens |
| `finish_reason` | `string \| null` | Why generation stopped (`'stop'`, `'length'`, etc.) |
| `metadata` | `Record<string, unknown>` | Optional metadata |
Throws: `AIProxyException` if quota is exceeded or AI is not enabled.
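Since completions can fail at runtime, callers often want a graceful degradation path. Below is a minimal, SDK-agnostic sketch: the duck-typed `isQuotaExceeded` check mirrors the flag documented on `AIProxyException`, but the helper itself is not part of the SDK.

```typescript
// Hypothetical helper: run an AI call, falling back to a canned value when
// the proxy reports a quota error. The error check is duck-typed on the
// `isQuotaExceeded` flag documented for AIProxyException.
async function withQuotaFallback<T>(
  call: () => Promise<T>,
  fallback: () => T,
): Promise<T> {
  try {
    return await call();
  } catch (error) {
    const quotaHit = (error as { isQuotaExceeded?: boolean } | null)?.isQuotaExceeded;
    if (quotaHit) return fallback(); // degrade gracefully on quota exhaustion
    throw error; // anything else is a real failure
  }
}
```

A caller might wrap `() => hiveforge.ai.complete({ messages })` with this and return a static "AI is temporarily unavailable" response as the fallback, keeping the UI responsive when the quota runs out.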
Equivalent curl request
```bash
curl -X POST https://api.hiveforge.dev/api/v1/proxy/ai/completions \
  -H "Content-Type: application/json" \
  -H "X-Deployment-ID: d9f2a1b4-7c3e-4f8a-b5d6-1e2f3a4b5c6d" \
  -H "X-Deployment-Secret: sk_live_a1b2c3d4e5f6g7h8i9j0k1l2m3n4o5p6" \
  -d '{
    "messages": [{"role": "user", "content": "Hello!"}],
    "model": "gpt-4o-mini",
    "temperature": 0.7
  }'
```

stream(options)
Create a streaming chat completion. Returns an AsyncGenerator that yields content chunks as they arrive.
```typescript
let fullResponse = '';

for await (const chunk of hiveforge.ai.stream({
  messages: [
    { role: 'user', content: 'Write a short poem about TypeScript.' },
  ],
  model: 'gpt-4o-mini',
})) {
  process.stdout.write(chunk.content);
  fullResponse += chunk.content;

  if (chunk.done) {
    console.log('\n--- Stream complete ---');
  }
}
```

With callbacks:
```typescript
for await (const chunk of hiveforge.ai.stream({
  messages: [{ role: 'user', content: 'Tell me a story' }],
  onChunk: (chunk) => {
    // Called for each chunk
    process.stdout.write(chunk.content);
  },
  onComplete: (fullContent) => {
    // Called when streaming finishes
    console.log(`\nTotal length: ${fullContent.length} chars`);
  },
  onError: (error) => {
    console.error('Stream error:', error.message);
  },
})) {
  // You can also process chunks here
}
```

Parameters (AIStreamOptions):
Same as AICompletionOptions (minus stream), plus:
| Parameter | Type | Required | Description |
|---|---|---|---|
| `onChunk` | `(chunk: AIStreamChunk) => void` | No | Callback for each chunk |
| `onComplete` | `(fullContent: string) => void` | No | Callback when streaming completes |
| `onError` | `(error: Error) => void` | No | Callback for stream errors |
Yields (AIStreamChunk):

| Field | Type | Description |
|---|---|---|
| `content` | `string` | Text content of this chunk |
| `done` | `boolean` | Whether this is the final chunk |
Returns: `AsyncGenerator<AIStreamChunk, string, unknown>` -- the return value is the full concatenated content.
streamToString(options)
Convenience method that consumes the entire stream and returns the full response as a string.
```typescript
const fullResponse = await hiveforge.ai.streamToString({
  messages: [{ role: 'user', content: 'Summarize this document...' }],
  onChunk: (chunk) => process.stdout.write(chunk.content),
});

console.log('Final response:', fullResponse);
```

Parameters: Same as `stream()`.
Returns: `Promise<string>`
embed(options)
Generate vector embeddings for text. Useful for semantic search, clustering, and similarity comparisons.
```typescript
const result = await hiveforge.ai.embed({
  text: 'How do I reset my password?',
  model: 'text-embedding-3-small',
});

console.log(`Dimensions: ${result.dimensions}`);
console.log(`Tokens used: ${result.tokens_used}`);
console.log(`Embedding length: ${result.embeddings[0].length}`);
```

Multiple texts:
```typescript
const result = await hiveforge.ai.embed({
  text: [
    'How do I reset my password?',
    'Where can I update my billing info?',
    'How to enable two-factor authentication',
  ],
});

// result.embeddings is an array of embedding arrays
for (const embedding of result.embeddings) {
  console.log(`Vector with ${embedding.length} dimensions`);
}
```

Parameters (AIEmbeddingOptions):
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| `text` | `string \| string[]` | Yes | -- | Text(s) to embed |
| `model` | `string` | No | `'text-embedding-3-small'` | Embedding model |
| `metadata` | `Record<string, unknown>` | No | -- | Custom metadata |
Returns (AIEmbeddingResponse):

| Field | Type | Description |
|---|---|---|
| `embeddings` | `number[][]` | Array of embedding vectors |
| `model` | `string` | Model that was used |
| `tokens_used` | `number` | Tokens consumed |
| `dimensions` | `number` | Dimensionality of each embedding |
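A common next step with these vectors is ranking by similarity, e.g. comparing a query embedding against stored document embeddings. A minimal cosine-similarity sketch in plain TypeScript (no SDK involvement; assumes equal-length, non-zero vectors):

```typescript
// Cosine similarity between two embedding vectors:
// 1 = same direction, 0 = orthogonal.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}
```

For semantic search, score each entry in `result.embeddings` against the query vector and sort descending.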
Equivalent curl request
```bash
curl -X POST https://api.hiveforge.dev/api/v1/proxy/ai/embeddings \
  -H "Content-Type: application/json" \
  -H "X-Deployment-ID: d9f2a1b4-7c3e-4f8a-b5d6-1e2f3a4b5c6d" \
  -H "X-Deployment-Secret: sk_live_a1b2c3d4e5f6g7h8i9j0k1l2m3n4o5p6" \
  -d '{
    "text": "How do I reset my password?",
    "model": "text-embedding-3-small"
  }'
```

getQuota()
Get the current AI usage quota for your deployment.
```typescript
const quota = await hiveforge.ai.getQuota();

console.log(`Used: ${quota.used} tokens`);
console.log(`Limit: ${quota.limit ?? 'unlimited'}`);
console.log(`Remaining: ${quota.remaining ?? 'unlimited'}`);
console.log(`Resets at: ${quota.resets_at}`);
console.log(`Tier: ${quota.tier}`);
```

Returns (AIQuotaResponse):
| Field | Type | Description |
|---|---|---|
| `used` | `number` | Tokens used in current period |
| `limit` | `number \| null` | Token limit (null = unlimited) |
| `remaining` | `number \| null` | Tokens remaining (null = unlimited) |
| `resets_at` | `string \| null` | ISO timestamp when quota resets |
| `tier` | `string` | Current deployment tier |
| `model_limits` | `Record<string, boolean> \| null` | Which models are accessible |
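One way to act on these fields is to warn users before the quota is actually exhausted. A small sketch under stated assumptions: the field shapes match the table above, and the 0.9 threshold is arbitrary.

```typescript
// Fraction of the period's quota already consumed, or null when unlimited.
function quotaFractionUsed(quota: { used: number; limit: number | null }): number | null {
  if (quota.limit === null || quota.limit === 0) return null;
  return quota.used / quota.limit;
}

// True when usage crosses the warning threshold (default 90%).
function shouldWarn(quota: { used: number; limit: number | null }, threshold = 0.9): boolean {
  const fraction = quotaFractionUsed(quota);
  return fraction !== null && fraction >= threshold;
}
```

Calling `shouldWarn(await hiveforge.ai.getQuota())` on page load is enough to surface an "upgrade soon" banner before requests start failing.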
getModels()
List available models for your deployment's tier.
```typescript
const models = await hiveforge.ai.getModels();

console.log(`Tier: ${models.tier}`);
console.log('Available:', models.available_models);
console.log('All models:', models.all_models);
```

Returns:
| Field | Type | Description |
|---|---|---|
| `tier` | `string` | Current tier |
| `available_models` | `string[]` | Models accessible at your tier |
| `all_models` | `string[]` | All models across all tiers |
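Because higher tiers unlock more models, a common pattern is picking the best available model from an ordered preference list. A sketch; the helper and the model names in the usage line are illustrative, not part of the SDK.

```typescript
// Return the first preferred model the current tier can use, or null if none match.
function pickModel(preferred: string[], available: string[]): string | null {
  for (const model of preferred) {
    if (available.includes(model)) return model;
  }
  return null;
}
```

For example: `const model = pickModel(['gpt-4o', 'gpt-4o-mini'], models.available_models) ?? 'gpt-4o-mini';`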
isEnabled()
Check if AI is enabled for the current tier without making an API call (reads from cached entitlements).
```typescript
if (hiveforge.ai.isEnabled()) {
  // Safe to make AI calls
}
```

Returns: `boolean`
getRemainingQuota()
Get remaining AI token quota from cached entitlements (no API call).
```typescript
const remaining = hiveforge.ai.getRemainingQuota();

if (remaining !== null && remaining < 1000) {
  console.warn('Running low on AI tokens');
}
```

Returns: `number | null` -- null if unlimited or entitlements not loaded.
Full Example: Chat Interface
```typescript
import { HiveForgeClient, AIProxyException } from '@producthacker/hiveforge-sdk';

const hiveforge = new HiveForgeClient({
  deploymentId: process.env.HIVEFORGE_DEPLOYMENT_ID!,
  deploymentSecret: process.env.HIVEFORGE_DEPLOYMENT_SECRET!,
});

await hiveforge.initialize();

async function chat(userMessage: string, history: Array<{ role: string; content: string }>) {
  if (!hiveforge.ai.isEnabled()) {
    throw new Error('AI features are not available on your current plan.');
  }

  const remaining = hiveforge.ai.getRemainingQuota();
  if (remaining !== null && remaining < 100) {
    throw new Error('AI token quota nearly exhausted. Please upgrade your plan.');
  }

  try {
    const messages = [
      { role: 'system' as const, content: 'You are a helpful assistant.' },
      ...history.map(m => ({ role: m.role as 'user' | 'assistant', content: m.content })),
      { role: 'user' as const, content: userMessage },
    ];

    let response = '';
    for await (const chunk of hiveforge.ai.stream({ messages })) {
      response += chunk.content;
      // Update UI with chunk.content
    }
    return response;
  } catch (error) {
    if (error instanceof AIProxyException) {
      if (error.isQuotaExceeded) {
        // Redirect to upgrade page
        window.location.href = error.upgradeUrl ?? '/pricing';
      }
    }
    throw error;
  }
}
```