AIForge Complete Installation & Setup Guide

Comprehensive step-by-step guide to install and configure AIForge - the AI-powered SaaS boilerplate with OpenAI, Claude, and vector database integration.

December 27, 2024
8 min read
By FastSaaS Team

Welcome to AIForge! This guide covers everything you need to set up your AI-powered SaaS application with multiple LLM providers, RAG capabilities, and vector databases.

Table of Contents

  1. What's Included in AIForge
  2. Prerequisites
  3. Repository Access & Installation
  4. Environment Configuration
  5. AI Provider Setup
  6. Vector Database (Qdrant)
  7. RAG Implementation
  8. Chat Interface
  9. Token Tracking & Cost Management
  10. Deployment
  11. Best Practices

What's Included in AIForge {#whats-included}

AIForge includes everything in SaaSForge PLUS:

| Feature             | Description                                   |
| ------------------- | --------------------------------------------- |
| Multi-LLM Support   | OpenAI GPT-4, Anthropic Claude, Google Gemini |
| Streaming Responses | Real-time AI response streaming               |
| Vector Database     | Qdrant for semantic search                    |
| RAG System          | Retrieval Augmented Generation                |
| Document Upload     | PDF, DOCX, TXT processing                     |
| Embeddings          | OpenAI/Cohere embeddings                      |
| Token Tracking      | Usage monitoring per user                     |
| Cost Calculation    | Real-time cost estimates                      |
| Chat History        | Persistent conversation storage               |
| Model Switching     | Switch models mid-conversation                |


Prerequisites

Required

  • Node.js 18.x or 20.x
  • PostgreSQL 14+
  • Redis (for queues)
  • OpenAI API key (minimum)

Recommended

  • Docker Desktop
  • 4GB+ RAM for local development
  • Anthropic API key (for Claude)
  • Cohere API key (for embeddings)

Repository Access & Installation {#repository-access}

Step 1: Accept GitHub Invitation

After purchase, you'll receive a GitHub invitation email. Accept it to access the AIForge repository.

Step 2: Clone Repository

# Clone the repository
git clone git@github.com:FastSaaSCloud/aiforge.git
cd aiforge

# Install dependencies
npm install

Step 3: Environment Setup

# Copy environment template
cp .env.example .env

Environment Configuration

Configure your .env file with all necessary variables:

# ========================================
# DATABASE
# ========================================
DATABASE_URL="postgresql://aiforge:aiforge_password@localhost:5432/aiforge"

# ========================================
# AUTHENTICATION
# ========================================
NEXTAUTH_URL="http://localhost:3000"
NEXTAUTH_SECRET="your-super-secret-key-min-32-chars"

# ========================================
# AI PROVIDERS (Configure at least one)
# ========================================

# OpenAI (Required for embeddings)
OPENAI_API_KEY="sk-..."
OPENAI_ORG_ID=""  # Optional

# Anthropic Claude (Optional but recommended)
ANTHROPIC_API_KEY="sk-ant-..."

# Google Gemini (Optional)
GOOGLE_AI_API_KEY="..."

# Cohere (Optional - for alternative embeddings)
COHERE_API_KEY="..."

# ========================================
# VECTOR DATABASE
# ========================================
QDRANT_URL="http://localhost:6333"
QDRANT_API_KEY=""  # Required for cloud

# ========================================
# AI CONFIGURATION
# ========================================
DEFAULT_AI_MODEL="gpt-4-turbo-preview"
DEFAULT_EMBEDDING_MODEL="text-embedding-3-small"
MAX_TOKENS_PER_REQUEST=4000
ENABLE_STREAMING=true

# ========================================
# RATE LIMITING
# ========================================
AI_RATE_LIMIT_FREE=10        # requests per hour (free tier)
AI_RATE_LIMIT_PRO=100        # requests per hour (pro tier)
AI_RATE_LIMIT_ENTERPRISE=1000 # requests per hour (enterprise)

# ========================================
# TOKEN LIMITS (per month)
# ========================================
TOKEN_LIMIT_FREE=10000
TOKEN_LIMIT_PRO=100000
TOKEN_LIMIT_ENTERPRISE=1000000
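To show how the tiered rate-limit variables above might be consumed, here is a minimal sketch. The env variable names match the configuration section; the `rateLimitFor` helper itself is an illustrative example, not AIForge's bundled code.

```typescript
// Resolve the per-tier AI request limit from environment variables.
// Falls back to the free-tier default when a variable is missing or malformed.
type Tier = "FREE" | "PRO" | "ENTERPRISE";

function rateLimitFor(
  tier: Tier,
  env: Record<string, string | undefined>,
): number {
  const raw = env[`AI_RATE_LIMIT_${tier}`];
  const parsed = raw ? parseInt(raw, 10) : NaN;
  return Number.isFinite(parsed) ? parsed : 10; // free-tier default
}
```

In a request handler you would call this with `process.env` and the authenticated user's plan to decide whether to serve or reject an AI request.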

AI Provider Setup

OpenAI Setup (Required)

  1. Go to platform.openai.com
  2. Navigate to API Keys
  3. Click Create new secret key
  4. Copy and add to .env:
OPENAI_API_KEY="sk-proj-..."

Available OpenAI Models

| Model               | Use Case     | Cost             |
| ------------------- | ------------ | ---------------- |
| gpt-4-turbo-preview | Best quality | $0.01/1K tokens  |
| gpt-4               | High quality | $0.03/1K tokens  |
| gpt-3.5-turbo       | Fast & cheap | $0.001/1K tokens |

Anthropic Claude Setup

  1. Go to console.anthropic.com
  2. Navigate to API Keys
  3. Create new key
ANTHROPIC_API_KEY="sk-ant-api03-..."

Available Claude Models

| Model           | Use Case        | Context     |
| --------------- | --------------- | ----------- |
| claude-3-opus   | Highest quality | 200K tokens |
| claude-3-sonnet | Balanced        | 200K tokens |
| claude-3-haiku  | Fastest         | 200K tokens |

Google Gemini Setup

  1. Go to makersuite.google.com/app/apikey
  2. Create API key
GOOGLE_AI_API_KEY="..."

Vector Database (Qdrant) {#vector-database}

AIForge uses Qdrant for semantic search and RAG capabilities.

Option 1: Docker (Recommended for Development)

# Start Qdrant with Docker
docker run -d \
  --name qdrant \
  -p 6333:6333 \
  -v qdrant_storage:/qdrant/storage \
  qdrant/qdrant

# Verify it's running
curl http://localhost:6333/collections
QDRANT_URL="http://localhost:6333"

Option 2: Qdrant Cloud (Production)

  1. Sign up at cloud.qdrant.io
  2. Create a cluster
  3. Get connection URL and API key
QDRANT_URL="https://your-cluster.qdrant.io:6333"
QDRANT_API_KEY="your-api-key"

Creating Collections

AIForge automatically creates collections, but you can also do it manually:

// lib/vector/qdrant.ts
import { QdrantClient } from "@qdrant/js-client-rest";

const client = new QdrantClient({
  url: process.env.QDRANT_URL,
  apiKey: process.env.QDRANT_API_KEY,
});

await client.createCollection("documents", {
  vectors: {
    size: 1536, // OpenAI embedding size
    distance: "Cosine",
  },
});

RAG Implementation

How RAG Works in AIForge

User Question → Generate Embedding → Search Vector DB →
Retrieve Relevant Chunks → Augment Prompt → Send to LLM → Response

Document Upload Flow

// 1. User uploads document
const file = await request.formData();

// 2. Extract text
const text = await extractText(file);

// 3. Chunk the text
const chunks = chunkText(text, { chunkSize: 500, overlap: 50 });

// 4. Generate embeddings
const embeddings = await openai.embeddings.create({
  model: "text-embedding-3-small",
  input: chunks,
});

// 5. Store in Qdrant
await qdrant.upsert("documents", {
  points: chunks.map((chunk, i) => ({
    id: generateId(),
    vector: embeddings.data[i].embedding,
    payload: { text: chunk, documentId, userId },
  })),
});
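The `chunkText` helper in step 3 is not shown above; here is a hypothetical sketch using fixed-size character chunks with overlap. AIForge's actual implementation may split on sentences or tokens instead, so treat this as an illustration of the idea.

```typescript
// Split text into overlapping chunks so context isn't lost at boundaries.
interface ChunkOptions {
  chunkSize: number; // max characters per chunk
  overlap: number;   // characters shared between adjacent chunks
}

function chunkText(text: string, { chunkSize, overlap }: ChunkOptions): string[] {
  const chunks: string[] = [];
  const step = chunkSize - overlap; // advance less than chunkSize to overlap
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + chunkSize));
    if (start + chunkSize >= text.length) break; // last chunk reached the end
  }
  return chunks;
}
```

The overlap (50 characters in the upload flow above) ensures a sentence cut at a chunk boundary still appears intact in the neighboring chunk.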

Querying with RAG

// 1. Generate query embedding
const queryEmbedding = await openai.embeddings.create({
  model: "text-embedding-3-small",
  input: userQuery,
});

// 2. Search similar documents
const results = await qdrant.search("documents", {
  vector: queryEmbedding.data[0].embedding,
  limit: 5,
  filter: { must: [{ key: "userId", match: { value: userId } }] },
});

// 3. Build context
const context = results.map((r) => r.payload.text).join("\n\n");

// 4. Generate response
const response = await openai.chat.completions.create({
  model: "gpt-4-turbo-preview",
  messages: [
    { role: "system", content: `Use this context to answer: ${context}` },
    { role: "user", content: userQuery },
  ],
});

Chat Interface

AIForge includes a complete chat UI with:

Features

  • Real-time streaming responses
  • Model selection dropdown
  • Chat history sidebar
  • Context window display
  • Token usage indicator
  • Copy/regenerate buttons

Streaming Implementation

// app/api/chat/route.ts
export async function POST(request: Request) {
  const { messages, model } = await request.json();

  const stream = await openai.chat.completions.create({
    model,
    messages,
    stream: true,
  });

  const encoder = new TextEncoder();
  const readable = new ReadableStream({
    async start(controller) {
      for await (const chunk of stream) {
        const text = chunk.choices[0]?.delta?.content || "";
        controller.enqueue(encoder.encode(text));
      }
      controller.close();
    },
  });

  return new Response(readable, {
    // The stream sends raw text chunks (not SSE-framed "data:" events),
    // so plain text is the accurate content type here
    headers: { "Content-Type": "text/plain; charset=utf-8" },
  });
}
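On the client side, the streamed response can be read incrementally. This is an illustrative sketch, not the bundled AIForge chat component; the `/api/chat` path and request body shape mirror the route above.

```typescript
// Decode a streamed response body chunk by chunk, invoking a callback per chunk.
async function readTextStream(
  body: ReadableStream<Uint8Array>,
  onChunk: (text: string) => void,
): Promise<void> {
  const reader = body.getReader();
  const decoder = new TextDecoder();
  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    // stream: true handles multi-byte characters split across chunks
    onChunk(decoder.decode(value, { stream: true }));
  }
}

// Wire it to the chat route: append each chunk to the UI as it arrives.
async function streamChat(
  messages: object[],
  model: string,
  onChunk: (text: string) => void,
): Promise<void> {
  const res = await fetch("/api/chat", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ messages, model }),
  });
  if (!res.ok || !res.body) throw new Error(`Chat request failed: ${res.status}`);
  await readTextStream(res.body, onChunk);
}
```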

Token Tracking & Cost Management {#token-tracking}

Database Schema

model TokenUsage {
  id               String   @id @default(cuid())
  userId           String
  model            String
  promptTokens     Int
  completionTokens Int
  totalTokens      Int
  cost             Float
  createdAt        DateTime @default(now())

  user User @relation(fields: [userId], references: [id])
}

Tracking Usage

// After each AI call
await prisma.tokenUsage.create({
  data: {
    userId: session.user.id,
    model: "gpt-4-turbo-preview",
    promptTokens: response.usage.prompt_tokens,
    completionTokens: response.usage.completion_tokens,
    totalTokens: response.usage.total_tokens,
    cost: calculateCost(response.usage, "gpt-4-turbo-preview"),
  },
});

Cost Calculation

const PRICING = {
  "gpt-4-turbo-preview": { input: 0.01, output: 0.03 },
  "gpt-3.5-turbo": { input: 0.0005, output: 0.0015 },
  "claude-3-opus": { input: 0.015, output: 0.075 },
  "claude-3-sonnet": { input: 0.003, output: 0.015 },
};

function calculateCost(usage, model) {
  const prices = PRICING[model];
  return (
    (usage.prompt_tokens / 1000) * prices.input +
    (usage.completion_tokens / 1000) * prices.output
  );
}
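Plugging sample numbers into the formula above: a gpt-3.5-turbo call with 1,200 prompt tokens and 800 completion tokens costs (1200/1000) × $0.0005 + (800/1000) × $0.0015 = $0.0006 + $0.0012 = $0.0018. The snippet below repeats the pricing table and function so it runs standalone; double-check current per-1K-token rates with each provider, as they change.

```typescript
// Worked example of the cost calculation, self-contained.
const PRICING: Record<string, { input: number; output: number }> = {
  "gpt-4-turbo-preview": { input: 0.01, output: 0.03 },
  "gpt-3.5-turbo": { input: 0.0005, output: 0.0015 },
};

function calculateCost(
  usage: { prompt_tokens: number; completion_tokens: number },
  model: string,
): number {
  const prices = PRICING[model];
  return (
    (usage.prompt_tokens / 1000) * prices.input +
    (usage.completion_tokens / 1000) * prices.output
  );
}

// (1200/1000) * 0.0005 + (800/1000) * 0.0015 = 0.0018 dollars
const cost = calculateCost(
  { prompt_tokens: 1200, completion_tokens: 800 },
  "gpt-3.5-turbo",
);
```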

Deployment

Production Checklist

  1. API Keys: Ensure all production API keys are set
  2. Rate Limiting: Configure appropriate limits
  3. Error Handling: Set up Sentry for AI errors
  4. Caching: Enable Redis for response caching
  5. Monitoring: Track token usage and costs

VPS Deployment

# Install dependencies
npm install --production

# Build
npm run build

# Start with PM2
pm2 start npm --name "aiforge" -- start

# Configure Nginx with increased timeouts for AI
location / {
    proxy_pass http://localhost:3000;
    proxy_read_timeout 300;  # 5 minutes for AI responses
    proxy_connect_timeout 300;
}

Environment Variables for Production

# Use production API keys
OPENAI_API_KEY="sk-..."
ANTHROPIC_API_KEY="sk-ant-..."

# Production Qdrant
QDRANT_URL="https://your-cluster.qdrant.io:6333"
QDRANT_API_KEY="production-key"

# Enable caching
REDIS_URL="redis://localhost:6379"
ENABLE_RESPONSE_CACHING=true

Best Practices

1. API Key Security

  • Never commit API keys to Git
  • Use environment variables
  • Rotate keys regularly

2. Cost Control

  • Set user token limits
  • Monitor usage daily
  • Implement spending alerts

3. Error Handling

try {
  const response = await openai.chat.completions.create({...});
  return response;
} catch (error) {
  if (error.code === 'rate_limit_exceeded') {
    // Fall back to a cheaper model
    return useFallbackModel();
  }
  if (error.code === 'context_length_exceeded') {
    // Truncate context and retry
    return retryWithShorterContext();
  }
  throw error;
}

4. Model Fallbacks

const MODEL_FALLBACKS = {
  "gpt-4-turbo-preview": "gpt-3.5-turbo",
  "claude-3-opus": "claude-3-sonnet",
};
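One way to act on the fallback map is a small wrapper that retries a rate-limited request once on the cheaper model. `completeWithFallback` and its `run` callback are hypothetical helpers for illustration, not part of the AIForge API.

```typescript
// Map each primary model to a cheaper fallback.
const MODEL_FALLBACKS: Record<string, string> = {
  "gpt-4-turbo-preview": "gpt-3.5-turbo",
  "claude-3-opus": "claude-3-sonnet",
};

// Run a completion; on rate limiting, retry once with the fallback model.
async function completeWithFallback(
  model: string,
  run: (model: string) => Promise<string>,
): Promise<string> {
  try {
    return await run(model);
  } catch (error: any) {
    const fallback = MODEL_FALLBACKS[model];
    if (error?.code === "rate_limit_exceeded" && fallback) {
      return run(fallback); // single retry on the cheaper model
    }
    throw error; // unrelated errors propagate unchanged
  }
}
```

The `run` callback would wrap the actual provider call (e.g. `openai.chat.completions.create`), keeping the fallback logic provider-agnostic.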

Getting Help

  • Documentation: https://fastsaas.cloud/docs
  • Discord: Join our community
  • Email: support@fastsaas.cloud

Next Steps

  1. Multi-Tenancy for AI Apps
  2. Scaling AI Applications
  3. Payment Integration

Happy building with AI! 🤖