AI-Powered

Aurora

AI Content Creation Tool with RAG

An end-to-end content creation tool that combines document processing with retrieval-augmented generation (RAG) for document analysis. It converts multiple source formats (PDF, images via OCR, URLs, DOCX) into semantic embeddings, so that AI-generated content is grounded in the uploaded source material.

Multi-format extraction (PDF, OCR, URLs, DOCX)

384-dim semantic embeddings via Sentence-Transformers

Source tracking with UUID for traceability
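
To illustrate how chunking and UUID-based source tracking can fit together, here is a minimal sketch, not Aurora's actual code. It assumes whitespace-separated words as a stand-in for model tokens, and the names `Chunk` and `chunk_text` are hypothetical.

```python
import uuid
from dataclasses import dataclass


@dataclass
class Chunk:
    chunk_id: str   # unique ID for this chunk
    source_id: str  # UUID of the document the chunk came from
    index: int      # position of the chunk within its source
    text: str


def chunk_text(text: str, source_id: str, max_tokens: int = 500) -> list[Chunk]:
    """Split text into chunks of at most max_tokens words.

    Assumption: words approximate tokens; a real pipeline would
    likely count tokens with the embedding model's tokenizer.
    """
    words = text.split()
    chunks = []
    for start in range(0, len(words), max_tokens):
        piece = " ".join(words[start:start + max_tokens])
        chunks.append(
            Chunk(str(uuid.uuid4()), source_id, start // max_tokens, piece)
        )
    return chunks


# One UUID per uploaded document; every chunk carries it.
source_id = str(uuid.uuid4())
chunks = chunk_text("word " * 1200, source_id)
```

Because each chunk keeps its `source_id`, any passage retrieved later can be traced back to the document it was extracted from.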

Key Features

  • Multi-format document extraction: PDF (pdfplumber), images with OCR (Tesseract + PaddleOCR), URLs, DOCX, plain text
  • Semantic embedding pipeline using Sentence-Transformers (all-MiniLM-L6-v2, 384-dim vectors)
  • Intelligent 500-token text chunking with UUID-based source tracking for traceability
  • URL content extraction with readability filtering (BeautifulSoup4 + readability-lxml)
  • Google Gemini API integration for AI-powered content generation
  • Semantic search across all uploaded documents for RAG retrieval
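
The retrieval step behind semantic search can be sketched as a cosine-similarity ranking over stored chunk vectors. This is an illustrative sketch under stated assumptions: the 3-dimensional toy vectors stand in for the 384-dim embeddings, the chunk IDs are made up, and `cosine_similarity` and `top_k` are hypothetical helpers rather than Aurora's API.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as matching nothing
    return dot / (norm_a * norm_b)


def top_k(query_vec, indexed, k=2):
    """Return the k (chunk_id, score) pairs most similar to the query.

    indexed: list of (chunk_id, vector) pairs.
    """
    scored = [(cid, cosine_similarity(query_vec, vec)) for cid, vec in indexed]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy index; real vectors would come from the embedding model.
index = [
    ("chunk-a", [1.0, 0.0, 0.0]),
    ("chunk-b", [0.0, 1.0, 0.0]),
    ("chunk-c", [0.9, 0.1, 0.0]),
]
results = top_k([1.0, 0.0, 0.0], index, k=2)
```

In a RAG flow, the text of the top-ranked chunks would then be passed to the generation model as context. A linear scan like this is adequate for small collections; larger ones typically rely on a vector index in the database.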

Tech Stack

FastAPI, Nuxt 3, Vue 3, Supabase, Sentence-Transformers, Google Gemini API, pdfplumber, Tesseract OCR, PaddleOCR, BeautifulSoup4