# AIForge Complete Installation & Setup Guide

A comprehensive, step-by-step guide to installing and configuring AIForge, the AI-powered SaaS boilerplate with OpenAI, Claude, and vector database integration.

Welcome to AIForge! This guide covers everything you need to set up your AI-powered SaaS application with multiple LLM providers, RAG capabilities, and vector databases.
## Table of Contents
- What's Included in AIForge
- Prerequisites
- Repository Access & Installation
- Environment Configuration
- AI Provider Setup
- Vector Database (Qdrant)
- RAG Implementation
- Chat Interface
- Token Tracking & Cost Management
- Deployment
- Best Practices
## What's Included in AIForge {#whats-included}

AIForge includes everything in SaaSForge, plus:

| Feature             | Description                                   |
| ------------------- | --------------------------------------------- |
| Multi-LLM Support   | OpenAI GPT-4, Anthropic Claude, Google Gemini |
| Streaming Responses | Real-time AI response streaming               |
| Vector Database     | Qdrant for semantic search                    |
| RAG System          | Retrieval-Augmented Generation                |
| Document Upload     | PDF, DOCX, TXT processing                     |
| Embeddings          | OpenAI/Cohere embeddings                      |
| Token Tracking      | Usage monitoring per user                     |
| Cost Calculation    | Real-time cost estimates                      |
| Chat History        | Persistent conversation storage               |
| Model Switching     | Switch models mid-conversation                |
## Prerequisites

### Required
- Node.js 18.x or 20.x
- PostgreSQL 14+
- Redis (for queues)
- OpenAI API key (minimum)
### Recommended
- Docker Desktop
- 4GB+ RAM for local development
- Anthropic API key (for Claude)
- Cohere API key (for embeddings)
## Repository Access & Installation {#repository-access}

### Step 1: Accept the GitHub Invitation

After purchase, you'll receive a GitHub invitation email. Accept it to gain access to the AIForge repository.

### Step 2: Clone the Repository

```bash
# Clone the repository
git clone git@github.com:FastSaaSCloud/aiforge.git
cd aiforge

# Install dependencies
npm install
```

### Step 3: Environment Setup

```bash
# Copy the environment template
cp .env.example .env
```
## Environment Configuration

Configure your `.env` file with all necessary variables:

```env
# ========================================
# DATABASE
# ========================================
DATABASE_URL="postgresql://aiforge:aiforge_password@localhost:5432/aiforge"

# ========================================
# AUTHENTICATION
# ========================================
NEXTAUTH_URL="http://localhost:3000"
NEXTAUTH_SECRET="your-super-secret-key-min-32-chars"

# ========================================
# AI PROVIDERS (Configure at least one)
# ========================================

# OpenAI (Required for embeddings)
OPENAI_API_KEY="sk-..."
OPENAI_ORG_ID="" # Optional

# Anthropic Claude (Optional but recommended)
ANTHROPIC_API_KEY="sk-ant-..."

# Google Gemini (Optional)
GOOGLE_AI_API_KEY="..."

# Cohere (Optional - for alternative embeddings)
COHERE_API_KEY="..."

# ========================================
# VECTOR DATABASE
# ========================================
QDRANT_URL="http://localhost:6333"
QDRANT_API_KEY="" # Required for Qdrant Cloud

# ========================================
# AI CONFIGURATION
# ========================================
DEFAULT_AI_MODEL="gpt-4-turbo-preview"
DEFAULT_EMBEDDING_MODEL="text-embedding-3-small"
MAX_TOKENS_PER_REQUEST=4000
ENABLE_STREAMING=true

# ========================================
# RATE LIMITING (requests per hour)
# ========================================
AI_RATE_LIMIT_FREE=10
AI_RATE_LIMIT_PRO=100
AI_RATE_LIMIT_ENTERPRISE=1000

# ========================================
# TOKEN LIMITS (per month)
# ========================================
TOKEN_LIMIT_FREE=10000
TOKEN_LIMIT_PRO=100000
TOKEN_LIMIT_ENTERPRISE=1000000
```
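As a rough illustration of how the monthly token limits above might be enforced, here is a minimal sketch. The `Plan` type and `checkTokenBudget` helper are illustrative names, not part of AIForge's actual API:

```typescript
// Illustrative sketch: enforcing the monthly token limits configured above.
// `Plan` and `checkTokenBudget` are hypothetical names, not AIForge exports.
type Plan = "free" | "pro" | "enterprise";

const TOKEN_LIMITS: Record<Plan, number> = {
  free: Number(process.env.TOKEN_LIMIT_FREE ?? 10_000),
  pro: Number(process.env.TOKEN_LIMIT_PRO ?? 100_000),
  enterprise: Number(process.env.TOKEN_LIMIT_ENTERPRISE ?? 1_000_000),
};

// Returns how many tokens the user may still spend this month,
// or throws if the budget is already exhausted.
function checkTokenBudget(plan: Plan, tokensUsedThisMonth: number): number {
  const remaining = TOKEN_LIMITS[plan] - tokensUsedThisMonth;
  if (remaining <= 0) {
    throw new Error(`Monthly token limit reached for plan "${plan}"`);
  }
  return remaining;
}
```

A check like this would typically run before each AI call, using a monthly sum over the `TokenUsage` table described later in this guide.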
## AI Provider Setup

### OpenAI Setup (Required)

- Go to platform.openai.com
- Navigate to **API Keys**
- Click **Create new secret key**
- Copy the key and add it to `.env`:

```env
OPENAI_API_KEY="sk-proj-..."
```

#### Available OpenAI Models

| Model               | Use Case     | Cost (input)     |
| ------------------- | ------------ | ---------------- |
| gpt-4-turbo-preview | Best quality | $0.01/1K tokens  |
| gpt-4               | High quality | $0.03/1K tokens  |
| gpt-3.5-turbo       | Fast & cheap | $0.001/1K tokens |
### Anthropic Claude Setup

- Go to console.anthropic.com
- Navigate to **API Keys**
- Create a new key and add it to `.env`:

```env
ANTHROPIC_API_KEY="sk-ant-api03-..."
```

#### Available Claude Models

| Model           | Use Case        | Context     |
| --------------- | --------------- | ----------- |
| claude-3-opus   | Highest quality | 200K tokens |
| claude-3-sonnet | Balanced        | 200K tokens |
| claude-3-haiku  | Fastest         | 200K tokens |
### Google Gemini Setup

- Go to makersuite.google.com/app/apikey
- Create an API key and add it to `.env`:

```env
GOOGLE_AI_API_KEY="..."
```
## Vector Database (Qdrant) {#vector-database}

AIForge uses Qdrant for semantic search and RAG capabilities.

### Option 1: Docker (Recommended for Development)

```bash
# Start Qdrant with Docker
docker run -d \
  --name qdrant \
  -p 6333:6333 \
  -v qdrant_storage:/qdrant/storage \
  qdrant/qdrant

# Verify it's running
curl http://localhost:6333/collections
```

```env
QDRANT_URL="http://localhost:6333"
```

### Option 2: Qdrant Cloud (Production)

- Sign up at cloud.qdrant.io
- Create a cluster
- Get the connection URL and API key

```env
QDRANT_URL="https://your-cluster.qdrant.io:6333"
QDRANT_API_KEY="your-api-key"
```
### Creating Collections

AIForge creates collections automatically, but you can also do it manually:

```typescript
// lib/vector/qdrant.ts
import { QdrantClient } from "@qdrant/js-client-rest";

const client = new QdrantClient({
  url: process.env.QDRANT_URL,
  apiKey: process.env.QDRANT_API_KEY,
});

await client.createCollection("documents", {
  vectors: {
    size: 1536, // Dimensionality of OpenAI text-embedding-3-small vectors
    distance: "Cosine",
  },
});
```
## RAG Implementation

### How RAG Works in AIForge

```text
User Question → Generate Embedding → Search Vector DB →
Retrieve Relevant Chunks → Augment Prompt → Send to LLM → Response
```
### Document Upload Flow

```typescript
// 1. User uploads a document
const formData = await request.formData();
const file = formData.get("file") as File;

// 2. Extract text
const text = await extractText(file);

// 3. Chunk the text
const chunks = chunkText(text, { chunkSize: 500, overlap: 50 });

// 4. Generate embeddings
const embeddings = await openai.embeddings.create({
  model: "text-embedding-3-small",
  input: chunks,
});

// 5. Store in Qdrant
await qdrant.upsert("documents", {
  points: chunks.map((chunk, i) => ({
    id: generateId(),
    vector: embeddings.data[i].embedding,
    payload: { text: chunk, documentId, userId },
  })),
});
```
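The chunking step deserves a closer look. As a minimal sketch, a character-based chunker with overlap could look like the following; this is illustrative only, and AIForge's actual `chunkText` may split by tokens or sentence boundaries instead:

```typescript
// Minimal character-based chunker with overlap -- an illustrative sketch,
// not necessarily AIForge's actual implementation.
interface ChunkOptions {
  chunkSize: number; // max characters per chunk
  overlap: number; // characters shared between consecutive chunks
}

function chunkText(text: string, { chunkSize, overlap }: ChunkOptions): string[] {
  const chunks: string[] = [];
  const step = chunkSize - overlap; // advance by chunkSize minus the overlap
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + chunkSize));
    if (start + chunkSize >= text.length) break; // last chunk reached
  }
  return chunks;
}
```

The overlap ensures a sentence split across a chunk boundary still appears intact in at least one chunk, which improves retrieval quality.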
### Querying with RAG

```typescript
// 1. Generate the query embedding
const queryEmbedding = await openai.embeddings.create({
  model: "text-embedding-3-small",
  input: userQuery,
});

// 2. Search for similar document chunks (scoped to the current user)
const results = await qdrant.search("documents", {
  vector: queryEmbedding.data[0].embedding,
  limit: 5,
  filter: { must: [{ key: "userId", match: { value: userId } }] },
});

// 3. Build the context from the retrieved chunks
const context = results.map((r) => r.payload.text).join("\n\n");

// 4. Generate the response
const response = await openai.chat.completions.create({
  model: "gpt-4-turbo-preview",
  messages: [
    { role: "system", content: `Use this context to answer: ${context}` },
    { role: "user", content: userQuery },
  ],
});
```
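Retrieved chunks can overflow the model's context window. A simple guard is to trim the context to a rough token budget before building the prompt. The sketch below uses the common approximation of roughly 4 characters per token; `trimContext` is an illustrative helper, not part of AIForge's API:

```typescript
// Illustrative helper: keep adding chunks until a rough token budget is hit.
// Uses the common approximation of ~4 characters per token.
function trimContext(chunks: string[], maxTokens: number): string {
  const maxChars = maxTokens * 4;
  const kept: string[] = [];
  let used = 0;
  for (const chunk of chunks) {
    if (used + chunk.length > maxChars) break; // budget exhausted
    kept.push(chunk);
    used += chunk.length + 2; // +2 accounts for the "\n\n" separator
  }
  return kept.join("\n\n");
}
```

Because search results come back ordered by similarity, trimming from the end drops the least relevant chunks first. A production system would count real tokens (e.g. with a tokenizer) rather than characters.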
## Chat Interface

AIForge includes a complete chat UI.

### Features
- Real-time streaming responses
- Model selection dropdown
- Chat history sidebar
- Context window display
- Token usage indicator
- Copy/regenerate buttons
### Streaming Implementation

```typescript
// app/api/chat/route.ts
export async function POST(request: Request) {
  const { messages, model } = await request.json();

  const stream = await openai.chat.completions.create({
    model,
    messages,
    stream: true,
  });

  const encoder = new TextEncoder();
  const readable = new ReadableStream({
    async start(controller) {
      for await (const chunk of stream) {
        const text = chunk.choices[0]?.delta?.content || "";
        controller.enqueue(encoder.encode(text));
      }
      controller.close();
    },
  });

  return new Response(readable, {
    headers: { "Content-Type": "text/event-stream" },
  });
}
```
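On the client side, the streamed body can be consumed with a `ReadableStream` reader. The following is a hedged sketch; the `onToken` callback is a hypothetical hook for appending text to the UI as it arrives:

```typescript
// Illustrative client-side consumer for a streaming chat route.
// `onToken` is a hypothetical callback, e.g. appending text to the UI.
async function readChatStream(
  body: ReadableStream<Uint8Array>,
  onToken: (text: string) => void
): Promise<string> {
  const reader = body.getReader();
  const decoder = new TextDecoder();
  let full = "";
  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    const text = decoder.decode(value, { stream: true }); // handles split multibyte chars
    full += text;
    onToken(text);
  }
  full += decoder.decode(); // flush any buffered trailing bytes
  return full;
}
```

Usage would look roughly like `await readChatStream((await fetch("/api/chat", { method: "POST", body })).body!, appendToChat)`.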
## Token Tracking & Cost Management {#token-tracking}

### Database Schema

```prisma
model TokenUsage {
  id               String   @id @default(cuid())
  userId           String
  model            String
  promptTokens     Int
  completionTokens Int
  totalTokens      Int
  cost             Float
  createdAt        DateTime @default(now())

  user User @relation(fields: [userId], references: [id])
}
```
### Tracking Usage

```typescript
// After each AI call
await prisma.tokenUsage.create({
  data: {
    userId: session.user.id,
    model: "gpt-4-turbo-preview",
    promptTokens: response.usage.prompt_tokens,
    completionTokens: response.usage.completion_tokens,
    totalTokens: response.usage.total_tokens,
    cost: calculateCost(response.usage, "gpt-4-turbo-preview"),
  },
});
```
### Cost Calculation

Prices are in USD per 1K tokens, split by input (prompt) and output (completion):

```typescript
const PRICING = {
  "gpt-4-turbo-preview": { input: 0.01, output: 0.03 },
  "gpt-3.5-turbo": { input: 0.0005, output: 0.0015 },
  "claude-3-opus": { input: 0.015, output: 0.075 },
  "claude-3-sonnet": { input: 0.003, output: 0.015 },
};

function calculateCost(usage, model) {
  const prices = PRICING[model];
  return (
    (usage.prompt_tokens / 1000) * prices.input +
    (usage.completion_tokens / 1000) * prices.output
  );
}
```

For example, a gpt-3.5-turbo call with 1,000 prompt tokens and 1,000 completion tokens costs (1000/1000) × 0.0005 + (1000/1000) × 0.0015 = $0.002.
## Deployment

### Production Checklist

- **API Keys**: Ensure all production API keys are set
- **Rate Limiting**: Configure appropriate limits
- **Error Handling**: Set up Sentry for AI errors
- **Caching**: Enable Redis for response caching
- **Monitoring**: Track token usage and costs
### VPS Deployment

```bash
# Install dependencies (dev dependencies are needed for the build step)
npm ci

# Build
npm run build

# Start with PM2
pm2 start npm --name "aiforge" -- start
```

Configure Nginx with increased timeouts for long-running AI responses:

```nginx
location / {
    proxy_pass http://localhost:3000;
    proxy_read_timeout 300; # 5 minutes for AI responses
    proxy_connect_timeout 300;
}
```
### Environment Variables for Production

```env
# Use production API keys
OPENAI_API_KEY="sk-..."
ANTHROPIC_API_KEY="sk-ant-..."

# Production Qdrant
QDRANT_URL="https://your-cluster.qdrant.io:6333"
QDRANT_API_KEY="production-key"

# Enable caching
REDIS_URL="redis://localhost:6379"
ENABLE_RESPONSE_CACHING=true
```
## Best Practices

### 1. API Key Security

- Never commit API keys to Git
- Use environment variables
- Rotate keys regularly

### 2. Cost Control

- Set per-user token limits
- Monitor usage daily
- Implement spending alerts
### 3. Error Handling

```typescript
try {
  const response = await openai.chat.completions.create({...});
} catch (error) {
  if (error.code === "rate_limit_exceeded") {
    // Fall back to a cheaper model
    return useFallbackModel();
  }
  if (error.code === "context_length_exceeded") {
    // Truncate the context and retry
    return retryWithShorterContext();
  }
  throw error;
}
```
### 4. Model Fallbacks

```typescript
const MODEL_FALLBACKS = {
  "gpt-4-turbo-preview": "gpt-3.5-turbo",
  "claude-3-opus": "claude-3-sonnet",
};
```
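One way to use this map is a small wrapper that retries a failed call once on the fallback model. This is a sketch under the assumption that the caller passes in a `call(model)` function; `withModelFallback` is an illustrative name, not an AIForge export:

```typescript
// Illustrative wrapper: try the requested model, retry once on its fallback.
const MODEL_FALLBACKS: Record<string, string> = {
  "gpt-4-turbo-preview": "gpt-3.5-turbo",
  "claude-3-opus": "claude-3-sonnet",
};

async function withModelFallback<T>(
  model: string,
  call: (model: string) => Promise<T>
): Promise<T> {
  try {
    return await call(model);
  } catch (error) {
    const fallback = MODEL_FALLBACKS[model];
    if (!fallback) throw error; // no fallback configured for this model
    return call(fallback);
  }
}
```

In practice you might only fall back on specific errors (e.g. rate limits), and record which model actually served the request so token tracking stays accurate.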
## Getting Help

- Documentation: https://fastsaas.cloud/docs
- Discord: Join our community
- Email: support@fastsaas.cloud
## Next Steps

Happy building with AI! 🤖