ChatGPT took the world by storm in 2023, but we're now getting to the point where AI-powered chat applications can be installed and run locally, without needing an online connection. Nvidia is bringing offline AI chatting to millions of RTX-powered PCs today with the launch of Chat with RTX.
Chat with RTX was first announced in January. The app uses Retrieval-Augmented Generation (RAG) in conjunction with TensorRT-LLM, and users can customise it by granting access to specified files and folders. Running locally, Chat with RTX can pull information from .txt, .pdf, .doc and .xml files, so you can swiftly have the app retrieve data for you. There is also a YouTube transcript function, so you can paste in a YouTube playlist URL and have the app generate a local transcript. You can then ask the bot questions about the content to sift through what you need.
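To make the idea a little more concrete, here is a minimal sketch (written in Python purely for illustration, not Nvidia's actual code) of the ingestion step: walk a user-chosen folder and collect the text of supported files so it can be indexed for retrieval. Only plain-text formats are read here to keep the example dependency-free; parsing .pdf and .doc files would need additional libraries.

from pathlib import Path

# File types read as plain text in this sketch; the real app also parses
# .pdf and .doc content via its own pipeline.
SUPPORTED = {".txt", ".xml"}

def load_documents(folder):
    # Read every supported file under `folder` into a list of strings.
    docs = []
    for path in Path(folder).rglob("*"):
        if path.suffix.lower() in SUPPORTED:
            docs.append(path.read_text(encoding="utf-8", errors="ignore"))
    return docs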
I've had access to Chat with RTX for a few days already. The initial download is substantial, coming in at over 30GB, and the final install size can be as high as 100GB, depending on which AI models you install. At launch, Chat with RTX has access to two AI models, Mistral and Llama. The former was created by ex-Meta and Google employees, while the latter was created and released by Meta itself.
Installing the app takes some time. On my system with a Ryzen 9 5900X and RTX 4080, the install took around 20 minutes, with the LLM portion taking the longest. Installation can take as much as an hour depending on your internet connection and hardware. Once the installer wraps up, though, the app loads quickly and responds impressively fast considering it is running locally.
As the app runs offline, you don't run the risk of exposing your sensitive data online, and you have greater control over what the AI has access to and can pull from. In its base form, Chat with RTX only has access to the RAG folder it comes with, so it can answer some basic questions regarding Nvidia RTX products and features, but it won't be able to go beyond that. With that in mind, you'll want to point the app to your own dataset folder.
The most obvious use case to me is as an office assistant: quickly bringing up information needed for a task without having to do a manual search. For editors, being able to pull in accurate YouTube transcripts will also save a lot of time. Transcribing a single video takes less than a minute, but as this portion of the app does require contacting servers, the speed may vary depending on how many active users there are, as well as your own internet connection. It will be interesting to see if there is any noticeable slowdown now that the app is widely available to all users.
For local data processing, Chat with RTX is not meant to be a ChatGPT replacement. The app works by breaking down data into chunks, which are then selected based on relevance to your query. That makes it good at quickly finding specific facts within documents, but poor at tasks that require 'reasoning', like summarising a set of documents into key bullet points.
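As a rough illustration of that chunk-and-retrieve approach, the sketch below splits documents into fixed-size chunks and ranks them against a query using simple bag-of-words cosine similarity. It is only a conceptual stand-in; Chat with RTX's actual pipeline, built on TensorRT-LLM and proper embeddings, is far more capable.

import math
import re
from collections import Counter

def chunk(text, size=200):
    # Split a document into fixed-size word chunks.
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

def vectorise(text):
    # Bag-of-words term counts for a piece of text.
    return Counter(re.findall(r"[a-z0-9']+", text.lower()))

def cosine(a, b):
    # Cosine similarity between two term-count vectors.
    common = set(a) & set(b)
    dot = sum(a[t] * b[t] for t in common)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(query, documents, top_k=3):
    # Return the top_k chunks most relevant to the query; these would then
    # be handed to the local LLM as context alongside the user's question.
    chunks = [c for doc in documents for c in chunk(doc)]
    q_vec = vectorise(query)
    return sorted(chunks, key=lambda c: cosine(vectorise(c), q_vec), reverse=True)[:top_k]

This also hints at why summarising a whole folder falls flat: only the handful of chunks judged most relevant ever reach the model, so anything outside them is never seen.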
Chat with RTX also does not currently remember context, so you won't be able to ask follow-up questions that build on your original query. Whether this functionality will change over time remains to be seen, but running larger machine-learning models and massive datasets locally isn't exactly feasible for the majority of users.
Chat with RTX can be downloaded directly from the Nvidia website. You will need an RTX 3000 or RTX 4000 series graphics card to run the application.
KitGuru Says: Chat with RTX is interesting in its current form, but now that it is out in the wild, we will be keen to see what AI enthusiasts come up with as they aim to get the most out of this new tool.