AI Proxy
The AI proxy lets your deployed application make OpenAI-compatible requests through HiveForge without embedding API keys in your app. Quota enforcement, model access, and usage tracking are handled automatically based on your deployment's tier.
Access the AI proxy via `hiveforge.ai`.
```typescript
import { HiveForgeClient } from '@producthacker/hiveforge-sdk';

const hiveforge = new HiveForgeClient({
  deploymentId: process.env.HIVEFORGE_DEPLOYMENT_ID!,
  deploymentSecret: process.env.HIVEFORGE_DEPLOYMENT_SECRET!,
});

await hiveforge.initialize();

// Check if AI is available for this tier
if (hiveforge.ai.isEnabled()) {
  const response = await hiveforge.ai.complete({
    messages: [{ role: 'user', content: 'Hello!' }],
  });
  console.log(response.content);
}
```

Methods
complete(options)
Create a chat completion. Sends messages to the AI model and returns the full response.
```typescript
const response = await hiveforge.ai.complete({
  messages: [
    { role: 'system', content: 'You are a helpful customer support agent.' },
    { role: 'user', content: 'How do I reset my password?' },
  ],
  model: 'gpt-4o-mini',
  temperature: 0.7,
  max_tokens: 500,
});

console.log(response.content);
console.log(`Tokens used: ${response.tokens_used}`);
console.log(`Model: ${response.model}`);
```

Parameters (AICompletionOptions):
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| `messages` | `ChatMessage[]` | Yes | -- | Array of conversation messages |
| `model` | `string` | No | `'gpt-4o-mini'` | Model to use |
| `max_tokens` | `number` | No | -- | Maximum tokens in the response |
| `temperature` | `number` | No | `0.7` | Sampling temperature (0-2) |
| `top_p` | `number` | No | -- | Nucleus sampling parameter |
| `stream` | `boolean` | No | `false` | Whether to stream (prefer the `stream()` method) |
| `stop` | `string \| string[]` | No | -- | Stop sequences |
| `presence_penalty` | `number` | No | `0` | Presence penalty (-2 to 2) |
| `frequency_penalty` | `number` | No | `0` | Frequency penalty (-2 to 2) |
| `user` | `string` | No | -- | End-user identifier for abuse tracking |
| `metadata` | `Record<string, unknown>` | No | -- | Custom metadata for logging |
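To see how the less common options compose, here is a sketch of an options object for a short, deterministic extraction call. The option names follow the parameter table above; the message contents and the `user` value are purely illustrative.

```typescript
// Illustrative options for a deterministic extraction task.
// Field names match the AICompletionOptions parameter table; values are examples.
const extractionOptions = {
  messages: [
    { role: 'system', content: 'Reply with the city mentioned by the user, and nothing else.' },
    { role: 'user', content: 'I flew into Lisbon on Tuesday.' },
  ],
  model: 'gpt-4o-mini',
  temperature: 0,    // deterministic output suits extraction
  max_tokens: 20,    // a city name is short; cap the spend
  stop: ['\n'],      // cut the response at the first newline
  user: 'user-1234', // hypothetical end-user id for abuse tracking
};
```

An object like this would then be passed straight to `hiveforge.ai.complete(extractionOptions)`.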
ChatMessage type:

| Field | Type | Required | Description |
|---|---|---|---|
| `role` | `'system' \| 'user' \| 'assistant' \| 'function' \| 'tool'` | Yes | Message role |
| `content` | `string \| null` | Yes | Message content |
| `name` | `string` | No | Name of the function/tool |
Returns (AICompletionResponse):

| Field | Type | Description |
|---|---|---|
| `content` | `string` | The generated text response |
| `model` | `string` | Model that was used |
| `tokens_used` | `number` | Total tokens consumed |
| `tokens_input` | `number` | Input/prompt tokens |
| `tokens_output` | `number` | Output/completion tokens |
| `finish_reason` | `string \| null` | Why generation stopped (`'stop'`, `'length'`, etc.) |
| `metadata` | `Record<string, unknown>` | Optional metadata |
Throws: `AIProxyException` if quota is exceeded or AI is not enabled.
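Since completions can fail at runtime, callers often want a graceful degradation path. Below is a minimal, SDK-agnostic sketch: the duck-typed `isQuotaExceeded` check mirrors the flag documented on `AIProxyException`, but the helper itself is not part of the SDK.

```typescript
// Hypothetical helper: run an AI call, falling back to a canned value when
// the proxy reports a quota error. The error check is duck-typed on the
// `isQuotaExceeded` flag documented for AIProxyException.
async function withQuotaFallback<T>(
  call: () => Promise<T>,
  fallback: () => T,
): Promise<T> {
  try {
    return await call();
  } catch (error) {
    const quotaHit = (error as { isQuotaExceeded?: boolean } | null)?.isQuotaExceeded;
    if (quotaHit) return fallback(); // degrade gracefully on quota exhaustion
    throw error; // anything else is a real failure
  }
}
```

A caller might wrap `() => hiveforge.ai.complete({ messages })` with this and return a static "AI is temporarily unavailable" response as the fallback, keeping the UI responsive when the quota runs out.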
Equivalent curl request
```bash
curl -X POST https://api.hiveforge.dev/api/v1/proxy/ai/completions \
  -H "Content-Type: application/json" \
  -H "X-Deployment-ID: d9f2a1b4-7c3e-4f8a-b5d6-1e2f3a4b5c6d" \
  -H "X-Deployment-Secret: sk_live_a1b2c3d4e5f6g7h8i9j0k1l2m3n4o5p6" \
  -d '{
    "messages": [{"role": "user", "content": "Hello!"}],
    "model": "gpt-4o-mini",
    "temperature": 0.7
  }'
```

stream(options)
Create a streaming chat completion. Returns an AsyncGenerator that yields content chunks as they arrive.
```typescript
let fullResponse = '';

for await (const chunk of hiveforge.ai.stream({
  messages: [
    { role: 'user', content: 'Write a short poem about TypeScript.' },
  ],
  model: 'gpt-4o-mini',
})) {
  process.stdout.write(chunk.content);
  fullResponse += chunk.content;

  if (chunk.done) {
    console.log('\n--- Stream complete ---');
  }
}
```

With callbacks:
```typescript
for await (const chunk of hiveforge.ai.stream({
  messages: [{ role: 'user', content: 'Tell me a story' }],
  onChunk: (chunk) => {
    // Called for each chunk
    process.stdout.write(chunk.content);
  },
  onComplete: (fullContent) => {
    // Called when streaming finishes
    console.log(`\nTotal length: ${fullContent.length} chars`);
  },
  onError: (error) => {
    console.error('Stream error:', error.message);
  },
})) {
  // You can also process chunks here
}
```

Parameters (AIStreamOptions):
Same as AICompletionOptions (minus stream), plus:
| Parameter | Type | Required | Description |
|---|---|---|---|
| `onChunk` | `(chunk: AIStreamChunk) => void` | No | Callback for each chunk |
| `onComplete` | `(fullContent: string) => void` | No | Callback when streaming completes |
| `onError` | `(error: Error) => void` | No | Callback for stream errors |
Yields (AIStreamChunk):

| Field | Type | Description |
|---|---|---|
| `content` | `string` | Text content of this chunk |
| `done` | `boolean` | Whether this is the final chunk |
Returns: `AsyncGenerator<AIStreamChunk, string, unknown>` -- the return value is the full concatenated content.
streamToString(options)
Convenience method that consumes the entire stream and returns the full response as a string.
```typescript
const fullResponse = await hiveforge.ai.streamToString({
  messages: [{ role: 'user', content: 'Summarize this document...' }],
  onChunk: (chunk) => process.stdout.write(chunk.content),
});

console.log('Final response:', fullResponse);
```

Parameters: Same as `stream()`.
Returns: `Promise<string>`
embed(options)
Generate vector embeddings for text. Useful for semantic search, clustering, and similarity comparisons.
```typescript
const result = await hiveforge.ai.embed({
  text: 'How do I reset my password?',
  model: 'text-embedding-3-small',
});

console.log(`Dimensions: ${result.dimensions}`);
console.log(`Tokens used: ${result.tokens_used}`);
console.log(`Embedding length: ${result.embeddings[0].length}`);
```

Multiple texts:
```typescript
const result = await hiveforge.ai.embed({
  text: [
    'How do I reset my password?',
    'Where can I update my billing info?',
    'How to enable two-factor authentication',
  ],
});

// result.embeddings is an array of embedding arrays
for (const embedding of result.embeddings) {
  console.log(`Vector with ${embedding.length} dimensions`);
}
```

Parameters (AIEmbeddingOptions):
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| `text` | `string \| string[]` | Yes | -- | Text(s) to embed |
| `model` | `string` | No | `'text-embedding-3-small'` | Embedding model |
| `metadata` | `Record<string, unknown>` | No | -- | Custom metadata |
Returns (AIEmbeddingResponse):

| Field | Type | Description |
|---|---|---|
| `embeddings` | `number[][]` | Array of embedding vectors |
| `model` | `string` | Model that was used |
| `tokens_used` | `number` | Tokens consumed |
| `dimensions` | `number` | Dimensionality of each embedding |
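A common next step with these vectors is ranking by similarity, e.g. comparing a query embedding against stored document embeddings. A minimal cosine-similarity sketch in plain TypeScript (no SDK involvement; assumes equal-length, non-zero vectors):

```typescript
// Cosine similarity between two embedding vectors:
// 1 = same direction, 0 = orthogonal.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}
```

For semantic search, score each entry in `result.embeddings` against the query vector and sort descending.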
Equivalent curl request
```bash
curl -X POST https://api.hiveforge.dev/api/v1/proxy/ai/embeddings \
  -H "Content-Type: application/json" \
  -H "X-Deployment-ID: d9f2a1b4-7c3e-4f8a-b5d6-1e2f3a4b5c6d" \
  -H "X-Deployment-Secret: sk_live_a1b2c3d4e5f6g7h8i9j0k1l2m3n4o5p6" \
  -d '{
    "text": "How do I reset my password?",
    "model": "text-embedding-3-small"
  }'
```

getQuota()
Get the current AI usage quota for your deployment.
```typescript
const quota = await hiveforge.ai.getQuota();

console.log(`Used: ${quota.used} tokens`);
console.log(`Limit: ${quota.limit ?? 'unlimited'}`);
console.log(`Remaining: ${quota.remaining ?? 'unlimited'}`);
console.log(`Resets at: ${quota.resets_at}`);
console.log(`Tier: ${quota.tier}`);
```

Returns (AIQuotaResponse):
| Field | Type | Description |
|---|---|---|
| `used` | `number` | Tokens used in current period |
| `limit` | `number \| null` | Token limit (null = unlimited) |
| `remaining` | `number \| null` | Tokens remaining (null = unlimited) |
| `resets_at` | `string \| null` | ISO timestamp when quota resets |
| `tier` | `string` | Current deployment tier |
| `model_limits` | `Record<string, boolean> \| null` | Which models are accessible |
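One way to act on these fields is to warn users before the quota is actually exhausted. A small sketch under stated assumptions: the field shapes match the table above, and the 0.9 threshold is arbitrary.

```typescript
// Fraction of the period's quota already consumed, or null when unlimited.
function quotaFractionUsed(quota: { used: number; limit: number | null }): number | null {
  if (quota.limit === null || quota.limit === 0) return null;
  return quota.used / quota.limit;
}

// True when usage crosses the warning threshold (default 90%).
function shouldWarn(quota: { used: number; limit: number | null }, threshold = 0.9): boolean {
  const fraction = quotaFractionUsed(quota);
  return fraction !== null && fraction >= threshold;
}
```

Calling `shouldWarn(await hiveforge.ai.getQuota())` on page load is enough to surface an "upgrade soon" banner before requests start failing.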
getModels()
List available models for your deployment's tier.
```typescript
const models = await hiveforge.ai.getModels();

console.log(`Tier: ${models.tier}`);
console.log('Available:', models.available_models);
console.log('All models:', models.all_models);
```

Returns:
| Field | Type | Description |
|---|---|---|
| `tier` | `string` | Current tier |
| `available_models` | `string[]` | Models accessible at your tier |
| `all_models` | `string[]` | All models across all tiers |
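Because higher tiers unlock more models, a common pattern is picking the best available model from an ordered preference list. A sketch; the helper and the model names in the usage line are illustrative, not part of the SDK.

```typescript
// Return the first preferred model the current tier can use, or null if none match.
function pickModel(preferred: string[], available: string[]): string | null {
  for (const model of preferred) {
    if (available.includes(model)) return model;
  }
  return null;
}
```

For example: `const model = pickModel(['gpt-4o', 'gpt-4o-mini'], models.available_models) ?? 'gpt-4o-mini';`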
isEnabled()
Check if AI is enabled for the current tier without making an API call (reads from cached entitlements).
```typescript
if (hiveforge.ai.isEnabled()) {
  // Safe to make AI calls
}
```

Returns: `boolean`
getRemainingQuota()
Get remaining AI token quota from cached entitlements (no API call).
```typescript
const remaining = hiveforge.ai.getRemainingQuota();

if (remaining !== null && remaining < 1000) {
  console.warn('Running low on AI tokens');
}
```

Returns: `number | null` -- null if unlimited or entitlements not loaded.
Full Example: Chat Interface
```typescript
import { HiveForgeClient, AIProxyException } from '@producthacker/hiveforge-sdk';

const hiveforge = new HiveForgeClient({
  deploymentId: process.env.HIVEFORGE_DEPLOYMENT_ID!,
  deploymentSecret: process.env.HIVEFORGE_DEPLOYMENT_SECRET!,
});

await hiveforge.initialize();

async function chat(userMessage: string, history: Array<{ role: string; content: string }>) {
  if (!hiveforge.ai.isEnabled()) {
    throw new Error('AI features are not available on your current plan.');
  }

  const remaining = hiveforge.ai.getRemainingQuota();
  if (remaining !== null && remaining < 100) {
    throw new Error('AI token quota nearly exhausted. Please upgrade your plan.');
  }

  try {
    const messages = [
      { role: 'system' as const, content: 'You are a helpful assistant.' },
      ...history.map(m => ({ role: m.role as 'user' | 'assistant', content: m.content })),
      { role: 'user' as const, content: userMessage },
    ];

    let response = '';
    for await (const chunk of hiveforge.ai.stream({ messages })) {
      response += chunk.content;
      // Update UI with chunk.content
    }
    return response;
  } catch (error) {
    if (error instanceof AIProxyException) {
      if (error.isQuotaExceeded) {
        // Redirect to upgrade page
        window.location.href = error.upgradeUrl ?? '/pricing';
      }
    }
    throw error;
  }
}
```