AI-Powered
Aurora
AI Content Creation Tool with RAG
An end-to-end content creation tool that combines document processing with retrieval-augmented generation (RAG). It converts multiple formats (PDF, images via OCR, URLs, DOCX, plain text) into semantic embeddings, so that AI-generated content is grounded in the uploaded source material.
Multi-format extraction (PDF, OCR, URLs, DOCX)
384-dim semantic embeddings via Sentence-Transformers
Source tracking with UUID for traceability
Key Features
- Multi-format document extraction: PDF (pdfplumber), images with OCR (Tesseract + PaddleOCR), URLs, DOCX, plain text
- Semantic embedding pipeline using Sentence-Transformers (all-MiniLM-L6-v2, 384-dim vectors)
- Intelligent 500-token text chunking with UUID-based source tracking for traceability
- URL content extraction with readability filtering (BeautifulSoup4 + readability-lxml)
- Google Gemini API integration for AI-powered content generation
- Semantic search across all uploaded documents for RAG retrieval
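The chunking and retrieval steps listed above could be sketched roughly as follows. This is a minimal illustration, not Aurora's actual code: the names `Chunk`, `chunk_text`, `cosine`, and `top_k` are hypothetical, a "token" is assumed to be a whitespace-separated word (a real pipeline would more likely count model tokens), and the vectors are short stand-ins for the 384-dim embeddings that all-MiniLM-L6-v2 would produce.

```python
import math
import uuid
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class Chunk:
    source_id: str  # UUID of the uploaded document this chunk came from
    index: int      # position of the chunk within that document
    text: str


def chunk_text(text: str, source_id: str, max_tokens: int = 500) -> List[Chunk]:
    """Split text into chunks of at most max_tokens words (assumed token unit)."""
    tokens = text.split()
    return [
        Chunk(source_id, start // max_tokens, " ".join(tokens[start:start + max_tokens]))
        for start in range(0, len(tokens), max_tokens)
    ]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec: Sequence[float],
          indexed: List[Tuple[Chunk, Sequence[float]]],
          k: int = 3) -> List[Chunk]:
    """Return the k chunks whose vectors are most similar to the query vector."""
    ranked = sorted(indexed, key=lambda pair: cosine(query_vec, pair[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]


# Usage: one UUID per uploaded document, shared by all of its chunks.
doc_id = str(uuid.uuid4())
doc_chunks = chunk_text("word " * 1200, doc_id)
```

Because every chunk carries its document's UUID, any passage returned by `top_k` can be traced back to the upload it came from before being passed to the generation step.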
Tech Stack
FastAPI, Nuxt 3, Vue 3, Supabase, Sentence-Transformers, Google Gemini API, pdfplumber, Tesseract OCR, PaddleOCR, BeautifulSoup4