AIForge Complete Installation & Setup Guide

Comprehensive step-by-step guide to install and configure AIForge - the AI-powered SaaS boilerplate with OpenAI, Claude, and vector database integration.

December 27, 2024
8 min read
By FastSaaS Team

Welcome to AIForge! This guide covers everything you need to set up your AI-powered SaaS application with multiple LLM providers, RAG capabilities, and vector databases.

Table of Contents

  1. What's Included in AIForge
  2. Prerequisites
  3. Repository Access & Installation
  4. Environment Configuration
  5. AI Provider Setup
  6. Vector Database (Qdrant)
  7. RAG Implementation
  8. Chat Interface
  9. Token Tracking & Cost Management
  10. Deployment
  11. Best Practices

What's Included in AIForge {#whats-included}

AIForge includes everything in SaaSForge PLUS:

| Feature             | Description                                   |
| ------------------- | --------------------------------------------- |
| Multi-LLM Support   | OpenAI GPT-4, Anthropic Claude, Google Gemini |
| Streaming Responses | Real-time AI response streaming               |
| Vector Database     | Qdrant for semantic search                    |
| RAG System          | Retrieval Augmented Generation                |
| Document Upload     | PDF, DOCX, TXT processing                     |
| Embeddings          | OpenAI/Cohere embeddings                      |
| Token Tracking      | Usage monitoring per user                     |
| Cost Calculation    | Real-time cost estimates                      |
| Chat History        | Persistent conversation storage               |
| Model Switching     | Switch models mid-conversation                |


Prerequisites

Required

  • Node.js 18.x or 20.x
  • PostgreSQL 14+
  • Redis (for queues)
  • OpenAI API key (minimum)

Recommended

  • Docker Desktop
  • 4GB+ RAM for local development
  • Anthropic API key (for Claude)
  • Cohere API key (for embeddings)

Repository Access & Installation {#repository-access}

Step 1: Accept GitHub Invitation

After purchase, you'll receive a GitHub invitation email. Accept it to access the AIForge repository.

Step 2: Clone Repository

# Clone the repository
git clone git@github.com:FastSaaSCloud/aiforge.git
cd aiforge

# Install dependencies
npm install

Step 3: Environment Setup

# Copy environment template
cp .env.example .env

Environment Configuration

Configure your .env file with all necessary variables:

# ========================================
# DATABASE
# ========================================
DATABASE_URL="postgresql://aiforge:aiforge_password@localhost:5432/aiforge"

# ========================================
# AUTHENTICATION
# ========================================
NEXTAUTH_URL="http://localhost:3000"
NEXTAUTH_SECRET="your-super-secret-key-min-32-chars"

# ========================================
# AI PROVIDERS (Configure at least one)
# ========================================

# OpenAI (Required for embeddings)
OPENAI_API_KEY="sk-..."
OPENAI_ORG_ID=""  # Optional

# Anthropic Claude (Optional but recommended)
ANTHROPIC_API_KEY="sk-ant-..."

# Google Gemini (Optional)
GOOGLE_AI_API_KEY="..."

# Cohere (Optional - for alternative embeddings)
COHERE_API_KEY="..."

# ========================================
# VECTOR DATABASE
# ========================================
QDRANT_URL="http://localhost:6333"
QDRANT_API_KEY=""  # Required for cloud

# ========================================
# AI CONFIGURATION
# ========================================
DEFAULT_AI_MODEL="gpt-4-turbo-preview"
DEFAULT_EMBEDDING_MODEL="text-embedding-3-small"
MAX_TOKENS_PER_REQUEST=4000
ENABLE_STREAMING=true

# ========================================
# RATE LIMITING
# ========================================
AI_RATE_LIMIT_FREE=10        # requests per hour (free tier)
AI_RATE_LIMIT_PRO=100        # requests per hour (pro tier)
AI_RATE_LIMIT_ENTERPRISE=1000 # requests per hour (enterprise)

# ========================================
# TOKEN LIMITS (per month)
# ========================================
TOKEN_LIMIT_FREE=10000
TOKEN_LIMIT_PRO=100000
TOKEN_LIMIT_ENTERPRISE=1000000
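To show how the tiered rate-limit variables above might be consumed, here is a minimal sketch. The env variable names match the configuration section; the `rateLimitFor` helper itself is an illustrative example, not AIForge's bundled code.

```typescript
// Resolve the per-tier AI request limit from environment variables.
// Falls back to the free-tier default when a variable is missing or malformed.
type Tier = "FREE" | "PRO" | "ENTERPRISE";

function rateLimitFor(
  tier: Tier,
  env: Record<string, string | undefined>,
): number {
  const raw = env[`AI_RATE_LIMIT_${tier}`];
  const parsed = raw ? parseInt(raw, 10) : NaN;
  return Number.isFinite(parsed) ? parsed : 10; // free-tier default
}
```

In a request handler you would call this with `process.env` and the authenticated user's plan to decide whether to serve or reject an AI request.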

AI Provider Setup

OpenAI Setup (Required)

  1. Go to platform.openai.com
  2. Navigate to API Keys
  3. Click Create new secret key
  4. Copy and add to .env:
OPENAI_API_KEY="sk-proj-..."

Available OpenAI Models

| Model               | Use Case     | Cost             |
| ------------------- | ------------ | ---------------- |
| gpt-4-turbo-preview | Best quality | $0.01/1K tokens  |
| gpt-4               | High quality | $0.03/1K tokens  |
| gpt-3.5-turbo       | Fast & cheap | $0.001/1K tokens |

Anthropic Claude Setup

  1. Go to console.anthropic.com
  2. Navigate to API Keys
  3. Create new key
ANTHROPIC_API_KEY="sk-ant-api03-..."

Available Claude Models

| Model           | Use Case        | Context     |
| --------------- | --------------- | ----------- |
| claude-3-opus   | Highest quality | 200K tokens |
| claude-3-sonnet | Balanced        | 200K tokens |
| claude-3-haiku  | Fastest         | 200K tokens |

Google Gemini Setup

  1. Go to makersuite.google.com/app/apikey
  2. Create API key
GOOGLE_AI_API_KEY="..."

Vector Database (Qdrant) {#vector-database}

AIForge uses Qdrant for semantic search and RAG capabilities.

Option 1: Docker (Recommended for Development)

# Start Qdrant with Docker
docker run -d \
  --name qdrant \
  -p 6333:6333 \
  -v qdrant_storage:/qdrant/storage \
  qdrant/qdrant

# Verify it's running
curl http://localhost:6333/collections
QDRANT_URL="http://localhost:6333"

Option 2: Qdrant Cloud (Production)

  1. Sign up at cloud.qdrant.io
  2. Create a cluster
  3. Get connection URL and API key
QDRANT_URL="https://your-cluster.qdrant.io:6333"
QDRANT_API_KEY="your-api-key"

Creating Collections

AIForge automatically creates collections, but you can also do it manually:

// lib/vector/qdrant.ts
import { QdrantClient } from "@qdrant/js-client-rest";

const client = new QdrantClient({
  url: process.env.QDRANT_URL,
  apiKey: process.env.QDRANT_API_KEY,
});

await client.createCollection("documents", {
  vectors: {
    size: 1536, // OpenAI embedding size
    distance: "Cosine",
  },
});

RAG Implementation

How RAG Works in AIForge

User Question → Generate Embedding → Search Vector DB →
Retrieve Relevant Chunks → Augment Prompt → Send to LLM → Response

Document Upload Flow

// 1. User uploads document
const file = await request.formData();

// 2. Extract text
const text = await extractText(file);

// 3. Chunk the text
const chunks = chunkText(text, { chunkSize: 500, overlap: 50 });

// 4. Generate embeddings
const embeddings = await openai.embeddings.create({
  model: "text-embedding-3-small",
  input: chunks,
});

// 5. Store in Qdrant
await qdrant.upsert("documents", {
  points: chunks.map((chunk, i) => ({
    id: generateId(),
    vector: embeddings.data[i].embedding,
    payload: { text: chunk, documentId, userId },
  })),
});
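The `chunkText` helper in step 3 is not shown above; here is a hypothetical sketch using fixed-size character chunks with overlap. AIForge's actual implementation may split on sentences or tokens instead, so treat this as an illustration of the idea.

```typescript
// Split text into overlapping chunks so context isn't lost at boundaries.
interface ChunkOptions {
  chunkSize: number; // max characters per chunk
  overlap: number;   // characters shared between adjacent chunks
}

function chunkText(text: string, { chunkSize, overlap }: ChunkOptions): string[] {
  const chunks: string[] = [];
  const step = chunkSize - overlap; // advance less than chunkSize to overlap
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + chunkSize));
    if (start + chunkSize >= text.length) break; // last chunk reached the end
  }
  return chunks;
}
```

The overlap (50 characters in the upload flow above) ensures a sentence cut at a chunk boundary still appears intact in the neighboring chunk.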

Querying with RAG

// 1. Generate query embedding
const queryEmbedding = await openai.embeddings.create({
  model: "text-embedding-3-small",
  input: userQuery,
});

// 2. Search similar documents
const results = await qdrant.search("documents", {
  vector: queryEmbedding.data[0].embedding,
  limit: 5,
  filter: { must: [{ key: "userId", match: { value: userId } }] },
});

// 3. Build context
const context = results.map((r) => r.payload.text).join("\n\n");

// 4. Generate response
const response = await openai.chat.completions.create({
  model: "gpt-4-turbo-preview",
  messages: [
    { role: "system", content: `Use this context to answer: ${context}` },
    { role: "user", content: userQuery },
  ],
});

Chat Interface

AIForge includes a complete chat UI with:

Features

  • Real-time streaming responses
  • Model selection dropdown
  • Chat history sidebar
  • Context window display
  • Token usage indicator
  • Copy/regenerate buttons

Streaming Implementation

// app/api/chat/route.ts
export async function POST(request: Request) {
  const { messages, model } = await request.json();

  const stream = await openai.chat.completions.create({
    model,
    messages,
    stream: true,
  });

  const encoder = new TextEncoder();
  const readable = new ReadableStream({
    async start(controller) {
      for await (const chunk of stream) {
        const text = chunk.choices[0]?.delta?.content || "";
        controller.enqueue(encoder.encode(text));
      }
      controller.close();
    },
  });

  return new Response(readable, {
    // The stream sends raw text chunks (not SSE-framed "data:" events),
    // so plain text is the accurate content type here
    headers: { "Content-Type": "text/plain; charset=utf-8" },
  });
}
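On the client side, the streamed response can be read incrementally. This is an illustrative sketch, not the bundled AIForge chat component; the `/api/chat` path and request body shape mirror the route above.

```typescript
// Decode a streamed response body chunk by chunk, invoking a callback per chunk.
async function readTextStream(
  body: ReadableStream<Uint8Array>,
  onChunk: (text: string) => void,
): Promise<void> {
  const reader = body.getReader();
  const decoder = new TextDecoder();
  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    // stream: true handles multi-byte characters split across chunks
    onChunk(decoder.decode(value, { stream: true }));
  }
}

// Wire it to the chat route: append each chunk to the UI as it arrives.
async function streamChat(
  messages: object[],
  model: string,
  onChunk: (text: string) => void,
): Promise<void> {
  const res = await fetch("/api/chat", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ messages, model }),
  });
  if (!res.ok || !res.body) throw new Error(`Chat request failed: ${res.status}`);
  await readTextStream(res.body, onChunk);
}
```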

Token Tracking & Cost Management {#token-tracking}

Database Schema

model TokenUsage {
  id               String   @id @default(cuid())
  userId           String
  model            String
  promptTokens     Int
  completionTokens Int
  totalTokens      Int
  cost             Float
  createdAt        DateTime @default(now())

  user User @relation(fields: [userId], references: [id])
}

Tracking Usage

// After each AI call
await prisma.tokenUsage.create({
  data: {
    userId: session.user.id,
    model: "gpt-4-turbo-preview",
    promptTokens: response.usage.prompt_tokens,
    completionTokens: response.usage.completion_tokens,
    totalTokens: response.usage.total_tokens,
    cost: calculateCost(response.usage, "gpt-4-turbo-preview"),
  },
});

Cost Calculation

const PRICING = {
  "gpt-4-turbo-preview": { input: 0.01, output: 0.03 },
  "gpt-3.5-turbo": { input: 0.0005, output: 0.0015 },
  "claude-3-opus": { input: 0.015, output: 0.075 },
  "claude-3-sonnet": { input: 0.003, output: 0.015 },
};

function calculateCost(usage, model) {
  const prices = PRICING[model];
  return (
    (usage.prompt_tokens / 1000) * prices.input +
    (usage.completion_tokens / 1000) * prices.output
  );
}
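Plugging sample numbers into the formula above: a gpt-3.5-turbo call with 1,200 prompt tokens and 800 completion tokens costs (1200/1000) × $0.0005 + (800/1000) × $0.0015 = $0.0006 + $0.0012 = $0.0018. The snippet below repeats the pricing table and function so it runs standalone; double-check current per-1K-token rates with each provider, as they change.

```typescript
// Worked example of the cost calculation, self-contained.
const PRICING: Record<string, { input: number; output: number }> = {
  "gpt-4-turbo-preview": { input: 0.01, output: 0.03 },
  "gpt-3.5-turbo": { input: 0.0005, output: 0.0015 },
};

function calculateCost(
  usage: { prompt_tokens: number; completion_tokens: number },
  model: string,
): number {
  const prices = PRICING[model];
  return (
    (usage.prompt_tokens / 1000) * prices.input +
    (usage.completion_tokens / 1000) * prices.output
  );
}

// (1200/1000) * 0.0005 + (800/1000) * 0.0015 = 0.0018 dollars
const cost = calculateCost(
  { prompt_tokens: 1200, completion_tokens: 800 },
  "gpt-3.5-turbo",
);
```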

Deployment

Production Checklist

  1. API Keys: Ensure all production API keys are set
  2. Rate Limiting: Configure appropriate limits
  3. Error Handling: Set up Sentry for AI errors
  4. Caching: Enable Redis for response caching
  5. Monitoring: Track token usage and costs

VPS Deployment

# Install dependencies
npm install --production

# Build
npm run build

# Start with PM2
pm2 start npm --name "aiforge" -- start

# Configure Nginx with increased timeouts for AI
location / {
    proxy_pass http://localhost:3000;
    proxy_read_timeout 300;  # 5 minutes for AI responses
    proxy_connect_timeout 300;
}

Environment Variables for Production

# Use production API keys
OPENAI_API_KEY="sk-..."
ANTHROPIC_API_KEY="sk-ant-..."

# Production Qdrant
QDRANT_URL="https://your-cluster.qdrant.io:6333"
QDRANT_API_KEY="production-key"

# Enable caching
REDIS_URL="redis://localhost:6379"
ENABLE_RESPONSE_CACHING=true

Best Practices

1. API Key Security

  • Never commit API keys to Git
  • Use environment variables
  • Rotate keys regularly

2. Cost Control

  • Set user token limits
  • Monitor usage daily
  • Implement spending alerts

3. Error Handling

try {
  const response = await openai.chat.completions.create({...});
  return response;
} catch (error) {
  if (error.code === 'rate_limit_exceeded') {
    // Fall back to a cheaper model
    return useFallbackModel();
  }
  if (error.code === 'context_length_exceeded') {
    // Truncate context and retry
    return retryWithShorterContext();
  }
  throw error;
}

4. Model Fallbacks

const MODEL_FALLBACKS = {
  "gpt-4-turbo-preview": "gpt-3.5-turbo",
  "claude-3-opus": "claude-3-sonnet",
};
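One way to act on the fallback map is a small wrapper that retries a rate-limited request once on the cheaper model. `completeWithFallback` and its `run` callback are hypothetical helpers for illustration, not part of the AIForge API.

```typescript
// Map each primary model to a cheaper fallback.
const MODEL_FALLBACKS: Record<string, string> = {
  "gpt-4-turbo-preview": "gpt-3.5-turbo",
  "claude-3-opus": "claude-3-sonnet",
};

// Run a completion; on rate limiting, retry once with the fallback model.
async function completeWithFallback(
  model: string,
  run: (model: string) => Promise<string>,
): Promise<string> {
  try {
    return await run(model);
  } catch (error: any) {
    const fallback = MODEL_FALLBACKS[model];
    if (error?.code === "rate_limit_exceeded" && fallback) {
      return run(fallback); // single retry on the cheaper model
    }
    throw error; // unrelated errors propagate unchanged
  }
}
```

The `run` callback would wrap the actual provider call (e.g. `openai.chat.completions.create`), keeping the fallback logic provider-agnostic.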

Getting Help

  • Documentation: https://fastsaas.cloud/docs
  • Discord: Join our community
  • Email: support@fastsaas.cloud

Next Steps

  1. Multi-Tenancy for AI Apps
  2. Scaling AI Applications
  3. Payment Integration

Happy building with AI! 🤖