LLM Exploration, Workstation Cooling, and Knowledge Base Plans
This recording, made quite late, reflects my recent deep dive into AI and large language models (LLMs). While it might be a distraction from other tasks, I've learned a great deal about setting up and working with these systems.
LLM Exploration and Tooling
I've been focusing on the practical aspects of LLMs, including quantization—how to quantize models myself and what it entails. I've also looked into fine-tuning, which I haven't done yet, but I see its potential utility in certain scenarios. A significant area of interest has been how models invoke and use tools.
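As a concrete sketch of what quantizing a model involves, here is a typical llama.cpp workflow; the script and binary names follow current llama.cpp releases, and the model paths are illustrative rather than anything I actually ran:

```shell
# 1. Convert a Hugging Face checkpoint to GGUF (full precision):
python convert_hf_to_gguf.py ./my-model --outfile my-model-f16.gguf

# 2. Quantize to 4-bit (Q4_K_M is a common quality/size trade-off):
./llama-quantize my-model-f16.gguf my-model-q4_k_m.gguf Q4_K_M
```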
I'm currently using Open Web UI as an interface to my Ollama instance, keeping all models within Ollama. I've successfully converted some models not natively supported by Ollama, such as a quantized version of GPT-OSS, which I'm eager to test, especially for coding tasks. I anticipate needing different models for different types of work.
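Importing an externally converted model into Ollama comes down to a small Modelfile; the file and model names below are illustrative, not my actual GPT-OSS artifact:

```
# Modelfile (illustrative): point Ollama at a local GGUF file
FROM ./gpt-oss-q4_k_m.gguf
```

The model is then registered and run with `ollama create gpt-oss-q4 -f Modelfile` followed by `ollama run gpt-oss-q4`.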
I also developed a tool to integrate Kagi into Open Web UI, which is now working well after I resolved some type-hinting issues. So far I've added a couple of tools: one for general search (via Kagi) and another for web browsing. I plan to develop a more robust web browser API tool that lets the assistant navigate websites, render pages, interact with the HTML, and click on elements. This capability is exciting but also a bit daunting, since it grants the model more control even though current models aren't always reliable. It could enable more human-like interactions with the web, and while security measures like CAPTCHAs will likely remain prevalent, they may prove ineffective against such browser tools.
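For reference, an Open Web UI tool is just a Python class named `Tools` whose methods carry type hints and docstrings (missing hints were the source of my earlier errors). Below is a minimal sketch of a Kagi search tool; the endpoint and response shape follow Kagi's published Search API, but treat them, and the method name, as assumptions rather than my actual implementation:

```python
"""Sketch of an Open Web UI tool that queries the Kagi Search API."""
import json
import urllib.parse
import urllib.request


class Tools:
    def __init__(self) -> None:
        # In practice this is set through the tool's settings in Open Web UI.
        self.api_key: str = ""

    def search_kagi(self, query: str, limit: int = 5) -> str:
        """Search the web with Kagi and return result titles/URLs as JSON.

        :param query: The search query.
        :param limit: Maximum number of results to return.
        """
        url = "https://kagi.com/api/v0/search?q=" + urllib.parse.quote(query)
        req = urllib.request.Request(
            url, headers={"Authorization": f"Bot {self.api_key}"}
        )
        with urllib.request.urlopen(req) as resp:
            data = json.load(resp)
        results = [
            {"title": item.get("title"), "url": item.get("url")}
            for item in data.get("data", [])[:limit]
        ]
        return json.dumps(results)
```

Open Web UI inspects the class to build the tool's schema, which is why every parameter needs an explicit type hint.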
Workstation Power and Cooling
My computer's new fan controller is functioning well. Under sustained GPU load (100% utilization for 2-3 minutes), the GPU temperature reaches 80-85 degrees Celsius, but it drops back to around 50 degrees within 30 seconds once the load is removed. This rapid cooling is encouraging.
I don't expect prolonged loads like this in normal use, but if I start training models, it could become an issue. The airflow has significantly improved. What I'm unsure about is whether the fan control, which is managed by the motherboard, is aware of the GPU's temperature at all, or whether it reacts only to CPU temperature. This is something I need to investigate, as better coordination could optimize cooling.
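If the motherboard turns out to ignore GPU temperature, one option is a small script that polls the GPU and drives a case-fan PWM channel directly. This is only a sketch: it assumes an NVIDIA GPU (so `nvidia-smi` is available) and a fan exposed through the hwmon sysfs interface, and the hwmon path is a placeholder that would have to be looked up for the specific board.

```python
"""Sketch: drive a case-fan PWM channel from GPU temperature."""
import subprocess
import time

PWM_PATH = "/sys/class/hwmon/hwmon2/pwm3"  # placeholder, board-specific


def gpu_temperature() -> int:
    """Read the GPU core temperature in degrees C via nvidia-smi."""
    out = subprocess.check_output(
        ["nvidia-smi", "--query-gpu=temperature.gpu",
         "--format=csv,noheader,nounits"]
    )
    return int(out.decode().strip())


def fan_duty(temp_c: int) -> int:
    """Map temperature to a PWM duty (0-255): ~30% below 50 C,
    ramping linearly to 100% at 80 C."""
    if temp_c <= 50:
        return 77  # ~30% of 255
    if temp_c >= 80:
        return 255
    return 77 + (temp_c - 50) * (255 - 77) // 30


def main() -> None:
    while True:
        duty = fan_duty(gpu_temperature())
        with open(PWM_PATH, "w") as f:
            f.write(str(duty))
        time.sleep(2)


if __name__ == "__main__":
    main()
```

The curve's breakpoints (50 and 80 degrees) match the temperatures I've actually observed, which is the only part of this that isn't guesswork.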
Knowledge Base and RAG System Development
I've started transferring some of my voice recordings, not just to my website, but also into Open Web UI to build a knowledge base, essentially a RAG (Retrieval Augmented Generation) system. It appears to be working, but it currently limits retrieval to three results per query, which is quite restrictive; presumably that's a top-k setting I can raise.
I'm contemplating building a more sophisticated system. My intention is for it to go beyond retrieving the first relevant document: given a query, it would identify matching documents, extract concepts from them, and then use a graph database to follow edges from those concepts to other pertinent documents, pulling those into the context as well. This more complex pipeline would allow for deeper and more comprehensive retrieval. However, it's a significant undertaking, and for now the current system of recording, transcribing, and storing transcripts is sufficient.
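The graph-expansion idea can be sketched in a few lines. This is a toy illustration, not a design: documents are retrieved by keyword overlap, their concepts are expanded one hop through a concept graph, and documents reached that way are appended. A real version would use embeddings and an actual graph database; all the data here is made up.

```python
"""Toy sketch of graph-expanded retrieval."""

# Illustrative corpus: doc id -> (text, concepts it mentions)
DOCS = {
    "d1": ("notes on quantizing LLMs with llama.cpp", {"quantization", "llm"}),
    "d2": ("fan curves and GPU cooling on my workstation", {"cooling", "gpu"}),
    "d3": ("serving quantized models on a single GPU", {"quantization", "gpu"}),
}

# Concept graph: edges between related concepts.
GRAPH = {
    "quantization": {"llm", "gpu"},
    "llm": {"quantization"},
    "gpu": {"cooling", "quantization"},
    "cooling": {"gpu"},
}


def retrieve(query_terms: set[str]) -> list[str]:
    """Return doc ids: direct keyword hits first, then docs reached by
    expanding the hits' concepts one hop through the graph."""
    direct = [d for d, (text, _) in DOCS.items()
              if query_terms & set(text.split())]
    # Collect the direct hits' concepts, expanded one hop.
    concepts: set[str] = set()
    for d in direct:
        for c in DOCS[d][1]:
            concepts |= {c} | GRAPH.get(c, set())
    expanded = [d for d, (_, cs) in DOCS.items()
                if d not in direct and cs & concepts]
    return direct + expanded
```

A query about cooling would surface the cooling document directly, then pull in the quantization and serving notes through the shared "gpu" concept, which is exactly the kind of sideways hop a flat top-k search misses.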
Personal Workflow for Recordings
My current workflow involves making recordings and then transcribing them. I'm considering an alternative: uploading the recordings directly to Open Web UI to see whether the model can extract the information and store it in the knowledge base, mirroring my manual process. That would streamline the whole pipeline and integrate it more tightly with my LLM tools. It's an interesting possibility worth exploring.
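The upload half of that idea would look roughly like this. The endpoint paths and payloads are assumptions based on my reading of Open Web UI's HTTP API and may differ across versions, so verify them against your instance's docs before relying on this; the base URL, token, and file names are placeholders.

```python
"""Sketch of pushing a transcript into an Open Web UI knowledge base."""
import json
import urllib.request
import uuid

BASE = "http://localhost:8080"  # placeholder Open Web UI address
TOKEN = "sk-..."                # placeholder API key


def multipart_body(filename: str, content: bytes) -> tuple[bytes, str]:
    """Build a multipart/form-data body for a single "file" field."""
    boundary = uuid.uuid4().hex
    body = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        f"Content-Type: application/octet-stream\r\n\r\n"
    ).encode() + content + f"\r\n--{boundary}--\r\n".encode()
    return body, f"multipart/form-data; boundary={boundary}"


def upload_transcript(filename: str, content: bytes, knowledge_id: str) -> None:
    """Upload a file, then attach it to a knowledge base (assumed endpoints)."""
    body, ctype = multipart_body(filename, content)
    req = urllib.request.Request(
        f"{BASE}/api/v1/files/", data=body,
        headers={"Authorization": f"Bearer {TOKEN}", "Content-Type": ctype},
    )
    with urllib.request.urlopen(req) as resp:
        file_id = json.load(resp)["id"]
    req2 = urllib.request.Request(
        f"{BASE}/api/v1/knowledge/{knowledge_id}/file/add",
        data=json.dumps({"file_id": file_id}).encode(),
        headers={"Authorization": f"Bearer {TOKEN}",
                 "Content-Type": "application/json"},
    )
    urllib.request.urlopen(req2).close()
```

With something like this in place, a recording could go straight from transcription to the knowledge base without the manual copy step.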