import * as fs from 'fs';
import { createReadStream, writeFile } from 'fs';
import OpenAI from 'openai';
import * as path from 'path';
import { promisify } from 'util';
import * as uuid from 'uuid';
import { filesDirectory, publicDirectory } from '../SocketData';
import { Method } from '../RouteManager';
import ApiManager, { Registration } from './ApiManager';
import axios from 'axios';
import { RAGChunk } from '../../client/views/nodes/chatbot/types/types';
import { UnstructuredClient } from 'unstructured-client';
import { PartitionResponse } from 'unstructured-client/sdk/models/operations';
import { ChunkingStrategy, Strategy } from 'unstructured-client/sdk/models/shared';
import * as cheerio from 'cheerio';
import { google } from 'googleapis';
import * as puppeteer from 'puppeteer';
import { JSDOM } from 'jsdom';
import { Readability } from '@mozilla/readability';

// Enumeration of directories where different file types are stored
export enum Directory {
    parsed_files = 'parsed_files',
    images = 'images',
    videos = 'videos',
    pdfs = 'pdfs',
    text = 'text',
    pdf_thumbnails = 'pdf_thumbnails',
    audio = 'audio',
    csv = 'csv',
    chunk_images = 'chunk_images',
    scrape_images = 'scrape_images',
}

/**
 * Constructs a normalized path to a file in the server's file system.
 * @param directory The directory where the file is stored.
 * @param filename The name of the file.
 * @returns The full normalized path to the file.
 */
export function serverPathToFile(directory: Directory, filename: string) {
    return path.normalize(`${filesDirectory}/${directory}/${filename}`);
}

/**
 * Constructs a normalized path to a directory in the server's file system.
 * @param directory The directory to access.
 * @returns The full normalized path to the directory.
 */
export function pathToDirectory(directory: Directory) {
    return path.normalize(`${filesDirectory}/${directory}`);
}

/**
 * Constructs the client-accessible URL for a file.
 * @param directory The directory where the file is stored.
 * @param filename The name of the file.
 * @returns The URL path to the file.
 */
export function clientPathToFile(directory: Directory, filename: string) {
    return `/files/${directory}/${filename}`;
}

// Promisified versions of filesystem functions
const writeFileAsync = promisify(writeFile);
const readFileAsync = promisify(fs.readFile);
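// Illustrative sketch (not used at runtime): for a file stored under Directory.csv, the two path
// helpers above resolve the on-disk location and the URL a client should request. The filename
// shown here is hypothetical.
//
//   const onDisk = serverPathToFile(Directory.csv, 'report.csv'); // <filesDirectory>/csv/report.csv
//   const url = clientPathToFile(Directory.csv, 'report.csv');    // '/files/csv/report.csv'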
/**
 * Class responsible for handling various API routes related to the Assistant functionality.
 * This class extends `ApiManager` and handles registration of routes and secure request handlers.
 */
export default class AssistantManager extends ApiManager {
    /**
     * Registers all API routes and initializes necessary services like OpenAI and Google Custom Search.
     * @param register The registration method to register routes and handlers.
     */
    protected initialize(register: Registration): void {
        // Initialize OpenAI API with client key
        const openai = new OpenAI({
            apiKey: process.env._CLIENT_OPENAI_KEY,
            dangerouslyAllowBrowser: true,
        });

        // Initialize Google Custom Search API
        const customsearch = google.customsearch('v1');

        // Register Wikipedia summary API route
        register({
            method: Method.POST,
            subscription: '/getWikipediaSummary',
            secureHandler: async ({ req, res }) => {
                const { title } = req.body;
                try {
                    // Fetch summary from Wikipedia using axios
                    const response = await axios.get('https://en.wikipedia.org/w/api.php', {
                        params: {
                            action: 'query',
                            list: 'search',
                            srsearch: title,
                            format: 'json',
                        },
                    });
                    const summary = response.data.query.search[0]?.snippet || 'No article found with that title.';
                    res.send({ text: summary });
                } catch (error: any) {
                    console.error('Error retrieving Wikipedia summary:', error);
                    res.status(500).send({
                        error: 'Error retrieving article summary from Wikipedia.',
                        details: error.message,
                    });
                }
            },
        });

        // Register Google Web Search Results API route
        register({
            method: Method.POST,
            subscription: '/getWebSearchResults',
            secureHandler: async ({ req, res }) => {
                const { query, max_results } = req.body;
                try {
                    // Fetch search results using Google Custom Search API
                    const response = await customsearch.cse.list({
                        q: query,
                        cx: process.env._CLIENT_GOOGLE_SEARCH_ENGINE_ID,
                        key: process.env._CLIENT_GOOGLE_API_KEY,
                        safe: 'active',
                        num: max_results,
                    });
                    const results =
                        response.data.items?.map((item: any) => ({
                            url: item.link,
                            snippet: item.snippet,
                        })) || [];
                    res.send({ results });
                } catch (error: any) {
                    console.error('Error performing web search:', error);
                    res.status(500).send({
                        error: 'Failed to perform web search',
                        details: error.message,
                    });
                }
            },
        });
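        // Hypothetical client-side usage of the two search routes registered above (sketch only;
        // assumes the caller already holds an authenticated session so the secure handlers accept
        // the request, and the title/query values are placeholders):
        //
        //   const { text } = await (await fetch('/getWikipediaSummary', {
        //       method: 'POST',
        //       headers: { 'Content-Type': 'application/json' },
        //       body: JSON.stringify({ title: 'Ada Lovelace' }),
        //   })).json();
        //
        //   const { results } = await (await fetch('/getWebSearchResults', {
        //       method: 'POST',
        //       headers: { 'Content-Type': 'application/json' },
        //       body: JSON.stringify({ query: 'hypermedia systems', max_results: 5 }),
        //   })).json();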
        // Axios instance with custom headers for scraping
        const axiosInstance = axios.create({
            headers: {
                'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36',
            },
        });

        /**
         * Utility function to introduce delay (used for retries).
         * @param ms Delay in milliseconds.
         */
        const delay = (ms: number) => new Promise(resolve => setTimeout(resolve, ms));

        /**
         * Function to fetch a URL with retry logic, handling rate limits.
         * Retries a request if it fails due to rate limits (HTTP status 429).
         * @param url The URL to fetch.
         * @param retries The number of retry attempts.
         * @param backoff Initial backoff time in milliseconds.
         */
        const fetchWithRetry = async (url: string, retries = 3, backoff = 300) => {
            try {
                const response = await axiosInstance.get(url);
                return response.data;
            } catch (error: any) {
                if (retries > 0 && error.response?.status === 429) {
                    console.log(`Rate limited. Retrying in ${backoff}ms...`);
                    await delay(backoff);
                    return fetchWithRetry(url, retries - 1, backoff * 2);
                }
                throw error;
            }
        };

        // Register a proxy fetch API route
        register({
            method: Method.POST,
            subscription: '/proxyFetch',
            secureHandler: async ({ req, res }) => {
                const { url } = req.body;
                if (!url) {
                    res.status(400).send({ error: 'No URL provided' });
                    return;
                }
                try {
                    const data = await fetchWithRetry(url);
                    res.send({ data });
                } catch (error: any) {
                    console.error('Error fetching the URL:', error);
                    res.status(500).send({
                        error: 'Failed to fetch the URL',
                        details: error.message,
                    });
                }
            },
        });

        // Register an API route to scrape website content using Puppeteer and JSDOM
        register({
            method: Method.POST,
            subscription: '/scrapeWebsite',
            secureHandler: async ({ req, res }) => {
                const { url } = req.body;
                try {
                    // Launch Puppeteer browser to navigate to the webpage
                    const browser = await puppeteer.launch({
                        args: ['--no-sandbox', '--disable-setuid-sandbox'],
                    });
                    const page = await browser.newPage();
                    await page.setUserAgent('Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36');
                    await page.goto(url, { waitUntil: 'networkidle2' });

                    // Extract HTML content
                    const htmlContent = await page.content();
                    await browser.close();

                    // Parse HTML content using JSDOM
                    const dom = new JSDOM(htmlContent, { url });

                    // Extract readable content using Mozilla's Readability API
                    const reader = new Readability(dom.window.document);
                    const article = reader.parse();

                    if (article) {
                        const plainText = article.textContent;
                        res.send({ website_plain_text: plainText });
                    } else {
                        res.status(500).send({ error: 'Failed to extract readable content' });
                    }
                } catch (error: any) {
                    console.error('Error scraping website:', error);
                    res.status(500).send({
                        error: 'Failed to scrape website',
                        details: error.message,
                    });
                }
            },
        });

        // Register an API route to create documents by sending files to a chatbot
        register({
            method: Method.POST,
            subscription: '/createDocument',
            secureHandler: async ({ req, res }) => {
                const { file_path } = req.body;
                const public_path = path.join(publicDirectory, file_path); // Resolve the file path in the public directory
                const file_name = path.basename(file_path); // Extract the file name from the path
                try {
                    // Read the file data and encode it as base64
                    const file_data: string = fs.readFileSync(public_path, { encoding: 'base64' });

                    // Send the file data to a local chatbot API for document creation
                    const response = await axios.post(
                        'http://localhost:8080/createDocument',
                        {
                            file_data,
                            file_name,
                        },
                        {
                            headers: {
                                'Content-Type': 'application/json',
                            },
                        }
                    );

                    // Retrieve the job ID from the response
                    const jobId = response.data['job_id'];
                    console.log('Job ID:', jobId);

                    // Send the job ID back to the client
                    res.send({ jobId });
                } catch (error: any) {
                    console.error('Error communicating with chatbot:', error);
                    res.status(500).send({
                        error: 'Failed to communicate with the chatbot',
                        details: error.message,
                    });
                }
            },
        });
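        // Hypothetical end-to-end flow tying /createDocument above to the /getProgress and /getResult
        // routes registered below (sketch only; assumes the chatbot service on localhost:8080 is
        // running, the client is authenticated, and the file path is a placeholder):
        //
        //   const { jobId } = await (await fetch('/createDocument', {
        //       method: 'POST',
        //       headers: { 'Content-Type': 'application/json' },
        //       body: JSON.stringify({ file_path: 'files/pdfs/example.pdf' }),
        //   })).json();
        //
        //   // Poll /getProgress/:jobId until the job completes, then fetch the final result.
        //   const progress = await (await fetch(`/getProgress/${jobId}`)).json(); // { step, progress }
        //   const result = await (await fetch(`/getResult/${jobId}`)).json();     // { chunks, status, ... }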
        // Register an API route to check the progress of a document creation job
        register({
            method: Method.GET,
            subscription: '/getProgress/:jobId',
            secureHandler: async ({ req, res }) => {
                const { jobId } = req.params; // Get the job ID from the URL parameters
                try {
                    // Query the local API to get the progress of the job
                    const progressResponse = await axios.get(`http://localhost:8080/getProgress/${jobId}`);
                    console.log(`Current step: ${progressResponse.data.step}, Progress within step: ${progressResponse.data.progress}%`);
                    res.json(progressResponse.data); // Send the progress data back to the client
                } catch (error) {
                    console.error('Error getting progress:', error);
                    res.status(500).send({
                        error: 'Failed to get progress',
                        details: error,
                    });
                }
            },
        });

        // Register an API route to get the final result of a document creation job
        register({
            method: Method.GET,
            subscription: '/getResult/:jobId',
            secureHandler: async ({ req, res }) => {
                const { jobId } = req.params; // Get the job ID from the URL parameters
                try {
                    // Query the local API to get the final result of the job
                    const finalResponse = await axios.get(`http://localhost:8080/getResult/${jobId}`);
                    console.log('Result:', finalResponse.data);
                    const result = finalResponse.data;

                    // If the result contains image or table chunks, save the base64 data as image files
                    if (result.chunks && Array.isArray(result.chunks)) {
                        for (const chunk of result.chunks) {
                            if (chunk.metadata && (chunk.metadata.type === 'image' || chunk.metadata.type === 'table')) {
                                let files_directory = '/files/chunk_images/';
                                const directory = path.join(publicDirectory, files_directory);

                                // Ensure the directory exists or create it
                                if (!fs.existsSync(directory)) {
                                    fs.mkdirSync(directory);
                                }

                                const fileName = path.basename(chunk.metadata.file_path); // Get the file name from the path
                                const filePath = path.join(directory, fileName); // Create the full file path

                                // Check if the chunk contains base64 encoded data
                                if (chunk.metadata.base64_data) {
                                    // Decode the base64 data and write it to a file
                                    const buffer = Buffer.from(chunk.metadata.base64_data, 'base64');
                                    await fs.promises.writeFile(filePath, buffer);

                                    // Update the file path in the chunk's metadata
                                    chunk.metadata.file_path = path.join(files_directory, fileName);
                                    chunk.metadata.base64_data = undefined; // Remove the base64 data from the metadata
                                } else {
                                    console.warn(`No base64_data found for chunk: ${fileName}`);
                                }
                            }
                        }
                        result['status'] = 'completed';
                    } else {
                        console.warn('Not ready');
                        result.status = 'pending';
                    }
                    res.json(result); // Send the result back to the client
                } catch (error) {
                    console.error('Error getting result:', error);
                    res.status(500).send({
                        error: 'Failed to get result',
                        details: error,
                    });
                }
            },
        });

        // Register an API route to format chunks (e.g., text or image chunks) for display
        register({
            method: Method.POST,
            subscription: '/formatChunks',
            secureHandler: async ({ req, res }) => {
                const { relevantChunks } = req.body; // Get the relevant chunks from the request body

                // Initialize an array to hold the formatted content
                const content: { type: string; text?: string; image_url?: { url: string } }[] = [{ type: 'text', text: '' }];

                for (const chunk of relevantChunks) {
                    // Format each chunk by adding its metadata and content
                    content.push({
                        type: 'text',
                        text: ``,
                    });

                    // If the chunk is an image or table, read the corresponding file and encode it as base64
                    if (chunk.metadata.type === 'image' || chunk.metadata.type === 'table') {
                        try {
                            const filePath = serverPathToFile(Directory.chunk_images, chunk.metadata.file_path); // Get the file path
                            const imageBuffer = await readFileAsync(filePath); // Read the image file
                            const base64Image = imageBuffer.toString('base64'); // Convert the image to base64

                            // Add the base64-encoded image to the content array
                            if (base64Image) {
                                content.push({
                                    type: 'image_url',
                                    image_url: {
                                        url: `data:image/jpeg;base64,${base64Image}`,
                                    },
                                });
                            } else {
                                console.log(`Failed to encode image for chunk ${chunk.id}`);
                            }
                        } catch (error) {
                            console.error(`Error reading image file for chunk ${chunk.id}:`, error);
                        }
                    }

                    // Add the chunk's text content to the formatted content
                    content.push({ type: 'text', text: `${chunk.metadata.text}\n\n` });
                }

                content.push({ type: 'text', text: '' });

                // Send the formatted content back to the client
                res.send({ formattedChunks: content });
            },
        });
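        // For reference, /formatChunks above returns an array of content parts: text entries
        // interleaved with base64 data-URL image entries, suitable for a multimodal chat message.
        // A hypothetical response for a single image chunk (sketch only; values are placeholders):
        //
        //   {
        //       formattedChunks: [
        //           { type: 'text', text: '' },
        //           { type: 'image_url', image_url: { url: 'data:image/jpeg;base64,/9j/4AAQ...' } },
        //           { type: 'text', text: 'Extracted chunk text\n\n' },
        //           { type: 'text', text: '' },
        //       ],
        //   }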
        // Register an API route to create and save a CSV file on the server
        register({
            method: Method.POST,
            subscription: '/createCSV',
            secureHandler: async ({ req, res }) => {
                const { filename, data } = req.body;

                // Validate that both the filename and data are provided
                if (!filename || !data) {
                    res.status(400).send({ error: 'Filename and data fields are required.' });
                    return;
                }

                try {
                    // Generate a UUID for the file to ensure unique naming
                    const uuidv4 = uuid.v4();
                    const fullFilename = `${uuidv4}-${filename}`; // Prefix the file name with the UUID

                    // Get the full server path where the file will be saved
                    const serverFilePath = serverPathToFile(Directory.csv, fullFilename);

                    // Write the CSV data (which is a raw string) to the file
                    await writeFileAsync(serverFilePath, data, 'utf8');

                    // Construct the client-accessible URL for the file
                    const fileUrl = clientPathToFile(Directory.csv, fullFilename);

                    // Send the file URL and UUID back to the client
                    res.send({ fileUrl, id: uuidv4 });
                } catch (error: any) {
                    console.error('Error creating CSV file:', error);
                    res.status(500).send({
                        error: 'Failed to create CSV file.',
                        details: error.message,
                    });
                }
            },
        });
    }
}
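// Hypothetical client-side usage of the /createCSV route (sketch only; assumes an authenticated
// session and that the CSV payload is already serialized to a string; the filename is a placeholder):
//
//   const { fileUrl, id } = await (await fetch('/createCSV', {
//       method: 'POST',
//       headers: { 'Content-Type': 'application/json' },
//       body: JSON.stringify({ filename: 'table.csv', data: 'col1,col2\n1,2\n' }),
//   })).json();
//   // fileUrl -> '/files/csv/<uuid>-table.csv'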