
Accelerating AI Development with ChainLit OAI Assistants Template (COAT)

· 8 min read
Michael Wade
Founder, Wade Digital

Header image: a digital artwork representing the fusion of AI and development workflows, inspired by The Culture series.

Introduction to COAT

Since December of last year, we've been using ChainLit and OAI's Assistants API to build our LLM-enabled AiFGE app. Given the anticipated demand for boutique app studios in the coming months, we created a template version for easy deployment and scaling for clients. We've rebuilt the repository from scratch and made it public on GitHub.

ChainLit offers an excellent backend/frontend system for managing user and AI agent interactions. It handles user sessions with various OAuth providers and integrates with several popular model providers. Essentially, it's the closest thing to a hostable ChatGPT application we've found. Being model-agnostic, ChainLit allows you to build a local, self-sovereign Agent with thread history, file upload, and support for multiple users. Additionally, the ChainLit team developed Literal AI, a persistence, observability, and feedback platform integrated directly into ChainLit. They have also released an open-source, self-hostable data layer.

While locally hosted, on-device models may eventually become the norm, OAI is currently the best option for small teams. Since we haven't found Product-Market Fit (PMF) yet, it doesn't make sense to worry about the expense of hosting GPU/TPU time for whatever FOSS model we could deploy ourselves. Token costs are low, and we can always pivot later.

We decided to deploy our apps on Google's Cloud Platform as Docker-based CloudRun images, using shared Terraform state to manage our staging and production deployments. These scripts are included as well.

Technical Insights

Over the past year, we've struggled to find a suitable hosting platform for agent configurations and deployments. Sure, there's LangChain, LlamaIndex, and frameworks like Haystack and AutoGen that might fit the bill, but a year ago, there weren't many clear standout options for scalable application delivery. Until November, we were using Steamship's agent deployment platform to interact with OAI's Chat API, but upon seeing the Assistants beta release, we switched completely. I am a glutton for punishment, apparently, but that's the cost of being at the forefront of frontier technologies.

OAI Assistants

The OAI Assistants beta's killer feature for me was the integrated file search. Having all the file ingestion, embedding, and vector database details abstracted away was very attractive initially, but it has become a bit of a type-driven challenge as we've dealt with errors, missing data, and general issues while incorporating proper annotations and citations into the RAG results. The initial implementation was somewhat passable, but keeping in mind that this tech is only going to get better, the vector stores in the V2 release will be very interesting to explore. While V1 retrieval was limited to 20 files per assistant, V2 allows up to 10,000.
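
As a concrete example of the annotation wrangling, here's a rough sketch of how file citations can be walked and turned into footnotes, assuming the openai Python SDK's message and annotation objects; the error handling and missing-data cases (the real pain points) are left out.

```python
# Rough sketch only: turns an assistant message's file citations into numbered footnotes.
# Real code has to cope with missing file_citation data and multiple content blocks.
def resolve_citations(client, message):
    block = message.content[0].text                  # first text block of the assistant message
    text, footnotes = block.value, []
    for i, ann in enumerate(block.annotations, start=1):
        text = text.replace(ann.text, f" [{i}]")     # swap the inline marker for a footnote ref
        citation = getattr(ann, "file_citation", None)
        if citation:
            cited = client.files.retrieve(citation.file_id)
            footnotes.append(f"[{i}] {cited.filename}")
    return text, footnotes
```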

The Assistants API is also interesting because it decouples the agent from the context of a thread. A quick recap on the evolution of OpenAI's endpoints:

  • Completions: A one-shot, stateless call. For a conversation app, you needed to resend the entire message history for each call, which inflated the token count and ran up against the limited window sizes on GPT-3 (recall those measly 4000 token limits only a year ago?).
  • Chat: This is the current model that I imagine most people are using. I believe this API is available to Azure customers and is what most enterprises are building on right now, sans Google Gemini or whatever Amazon calls theirs. It still requires devs to prompt the model and has some tool/function capabilities, and its role-tagged message format saves you from hand-rolling a single prompt string; the developer still resends the conversation history (what the Assistants API later calls threads) on each call, but can otherwise focus on prompt engineering and front-end work. RAG pipelines still need an external provider.
  • Assistants: In addition to managing the thread, the Assistants endpoint lets you save agent configurations, each with its own instructions and defined tools, including file retrieval, the code interpreter, and functions; a minimal sketch of this flow follows the list. It also lets you call different agents into the same thread, enabling some interesting use cases in multi-agent workflows, which we are just starting to explore. (See AgencySwarm for my current favorite implementation of this.)
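
To make that decoupling concrete, here's a minimal sketch using the openai Python SDK's beta Assistants endpoints; the model, instructions, and tool choice are illustrative, not COAT's actual configuration.

```python
# Minimal sketch of the Assistants flow (openai Python SDK, beta endpoints).
# Model, instructions, and tools here are placeholders, not COAT's real config.
import time
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# 1. The agent's instructions and tools live with the assistant, not with any thread.
assistant = client.beta.assistants.create(
    name="demo-assistant",
    instructions="You are a helpful assistant for our demo app.",
    model="gpt-4o",
    tools=[{"type": "file_search"}],  # "retrieval" under the V1 API
)

# 2. The thread holds conversation state server-side; no resending history each call.
thread = client.beta.threads.create()
client.beta.threads.messages.create(
    thread_id=thread.id, role="user", content="Summarize what you know so far."
)

# 3. A run binds an assistant to a thread; another assistant could be run on the same thread.
run = client.beta.threads.runs.create(thread_id=thread.id, assistant_id=assistant.id)
while run.status in ("queued", "in_progress"):   # the poll-driven style the cookbook started with
    time.sleep(1)
    run = client.beta.threads.runs.retrieve(thread_id=thread.id, run_id=run.id)

# 4. Read the assistant's reply back off the thread.
messages = client.beta.threads.messages.list(thread_id=thread.id)
print(messages.data[0].content[0].text.value)
```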

And a note about GPT-4o: if you don't think we're going to incorporate all those multimodal capabilities in the coming months, then just wait.

ChainLit

So, on to the frontend. As someone who's been scared off of FE work since the great Web Standards wars of yore, finding an off-the-shelf front-end system for AI agents was a priority. That's one reason I was building on Discord. We did a little bit of testing with Streamlit examples but quickly saw that ChainLit was the way to go for production deployments. Out of the box, it gives you:

  • Websocket React client components available for custom frontend
  • Bundled React Frontend
  • REST API (FastAPI)
  • Authentication (OAuth, custom)
  • Websocket Session Handling (Starlette)
  • Data Persistence (Literal, self-hosted custom)
  • Discord and Slack integrations

It's crucial to understand ChainLit's event wrappers and context decorators, and how threads, messages, elements (attachments), and run steps (agent invocations) are sent to the UI and updated. I'm not going to get into the how-to of it here, but I have learned a lot from working through their source code to understand how the framework operates.
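
As a taste of that decorator style, here's a bare-bones ChainLit app (not COAT's actual handlers, which also drive the Assistants calls, elements, and run steps):

```python
# Bare-bones ChainLit app showing the event-decorator style; COAT's real handlers
# also create OAI threads, stream run steps, and attach elements.
import chainlit as cl

@cl.on_chat_start
async def start():
    # Runs once per user session; a natural place to create or look up a thread.
    cl.user_session.set("history", [])
    await cl.Message(content="Hi! Ask me anything.").send()

@cl.on_message
async def on_message(message: cl.Message):
    # Runs on every user message; just echoes for the sake of the sketch.
    history = cl.user_session.get("history")
    history.append(message.content)
    await cl.Message(content=f"Message {len(history)} received: {message.content}").send()
```

Run it with `chainlit run app.py -w` and the bundled React frontend, session handling, and data layer hooks come along for free.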

ChainLit has a wonderful cookbook, and this repo started from their Assistants example. We've spent the last two weeks converting it from the poll-driven function into an event-driven one that streams its response. We decided this would be a good milestone to open up the repo and give something back to the ChainLit community, and, yes, to show off our work, warts and all.
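
For the curious, the shape of that poll-to-stream change looks roughly like the sketch below, using the openai SDK's streaming helpers; the ids are placeholders, and the real COAT handler forwards tokens, elements, and run steps into ChainLit rather than printing.

```python
# Rough sketch of the event-driven replacement for the polling loop, using the
# openai SDK's AssistantEventHandler. Ids are placeholders; COAT forwards deltas
# into the ChainLit message (e.g. via stream_token) instead of printing.
from openai import OpenAI, AssistantEventHandler

client = OpenAI()

class StreamToUI(AssistantEventHandler):
    """React to streaming events instead of polling run.status."""

    def on_text_delta(self, delta, snapshot):
        print(delta.value, end="", flush=True)   # stand-in for streaming a token to the UI

with client.beta.threads.runs.stream(
    thread_id="thread_abc123",     # placeholder ids
    assistant_id="asst_abc123",
    event_handler=StreamToUI(),
) as stream:
    stream.until_done()
```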

I am releasing COAT for demonstration purposes: not because I think it's perfect or finished, but because it's a work in progress and I feel the need to learn and build in public. We have a lot we want to do with the Assistants API, and part of our purpose here is recruitment, or 'finding the others' interested in helping out with our little workshop studio project.

Cheers!

Future Visions

A quick aside about the ongoing debate between AI-doomers and the e/acc-Altman types: I'm fully open to criticism of power usage, resource constraints, bias, mass unemployment, ethical concerns around the training process, and the general chaos and disruption that the next 3-5 years are going to bring, but The Times They Are A-Changin' and I honestly don't think there's a way to turn things around without AI. My current take is that humans are so driven by the systems we create around us, consciously or not (h/t Atomic Habits, Principles), that having a 'nudge' in the right direction from an intelligent system is too attractive to pass up. 'Right' in this case might be in the eye of the beholder, but it is apparent that what our genes and evolutionary inheritance want from us short-term and what is best for us long-term are at odds, and we need some sort of interrupt. My Whoop fitness tracker is the ne plus ultra example right now. Of course, I don't want to see OAI or the other 'bigs' win at the cost of more corporate personhood and environmental degradation, but for better or worse, I think the only way out of that future is through it, working like hell to be an optimist. These tools are knowledge-generating things, not just intelligent word calculators, and I think the flywheel of technological advancement still has a lot of kinetic energy in it right now. Try and keep up.

But that topic is for another blog post, and part of an AI-ethics project we are putting together. Today we are just glad to announce this release, hoping it might be helpful for others trying to learn, understand, and build agentic workflow systems. Feel free to reach out with any comments.

Right now, you should be able to fork the project and spin it up for a quick demo. We're hoping that anyone else building on Assistants will help build out additional features. Our current backlog, beyond the public issue board, is to build a multi-assistant system where each assistant has workflow-based tools tied to whatever API we can dream up. I would personally like a setup that can manage my grocery orders for me, but figuring out strategies for incorporating my various libraries of personal writings and downloaded knowledge into a coherent system is what's really driving me these days. I am also very focused on meta-tools that the agents can use to modify source code (Copilot is nice, and Workspaces looks even better); getting tools to the Assistants that let them manage user context and issue boards and save messages and entire threads to their data stores; and building ingestion pipelines to learn from books, PDFs, URLs, audio, video, anything. And then figuring out how to delegate and manage all these tasks and workflows in a way that keeps the human in the loop.

The future is going to be mighty interesting, indeed.

LLM Assisted YouTube Video Pipeline

· 3 min read
Michael Wade
Founder, Wade Digital

We have released v0.1 of our YouTube Video Pipeline repo.

Demo Video

Use and Purpose

This repo is a snapshot of the current workflow we're using to publish impromptu videos to our YouTube channel as part of an exercise in discipline. It's called the 100 Days Challenge, and it's mainly an excuse for me to spitball topics I want to talk about. I'm trying to make the process as frictionless as possible, and this is the result.

The current constraint for the project is that we'll record a short video, about 10-15 minutes, on my iPhone and upload it to YouTube without any edits. I am, however, using a transcript of the video, fed through an LLM, to generate the title, summary, and other metadata for the video.

This repo relies on a separate WhisperASR service to do the transcription, but you can probably use whatever third-party transcription engine you want.
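
To make the transcription and metadata steps concrete, here's a rough sketch of the idea, assuming a self-hosted Whisper ASR webservice with an /asr endpoint and the openai chat API for the metadata pass; the endpoint, parameters, and prompt are assumptions rather than the repo's exact code.

```python
# Illustrative only: the ASR endpoint, its parameters, and the metadata prompt are
# assumptions about a typical Whisper ASR webservice, not the repo's exact code.
import requests
from openai import OpenAI

def transcribe(audio_path: str, asr_url: str = "http://localhost:9000/asr") -> str:
    """Send the stripped audio file to the self-hosted WhisperASR service."""
    with open(audio_path, "rb") as f:
        resp = requests.post(asr_url, params={"output": "txt"}, files={"audio_file": f})
    resp.raise_for_status()
    return resp.text

def video_metadata(transcript: str) -> str:
    """Ask an LLM for a title, summary, and tags based on the transcript."""
    client = OpenAI()
    completion = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": "Given a video transcript, write a YouTube title, a short description, and five tags."},
            {"role": "user", "content": transcript},
        ],
    )
    return completion.choices[0].message.content
```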

Workflow

I had originally been doing the process via a kludge of: AirDrop from my iPhone to my MacBook; the Obsidian transcription plugin; an ASR endpoint in my lab; ChatGPT; and the YouTube web app. I would like to automate the entire process. So far we've written Automator workflows to strip the audio before passing it to the ASR (sending audio only is less bandwidth-intensive) and to return the transcript file. From there we swapped out ChatGPT for an API call using CrewAI, which is the latest Agent-Task management tool we've been playing with. We've also added an upload CLI tool, written in Python, that we've been hacking on today.
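
The audio-strip step itself is simple; here's a minimal sketch of the idea from Python (the codec and bitrate are assumptions, not the Automator workflow verbatim):

```python
# Minimal sketch of the audio-strip step; codec and bitrate choices are assumptions,
# not the exact Automator workflow. Requires ffmpeg on the PATH.
import subprocess
from pathlib import Path

def strip_audio(video_path: str) -> Path:
    """Extract a compressed audio-only track so less data goes over the wire to the ASR."""
    out = Path(video_path).with_suffix(".m4a")
    subprocess.run(
        ["ffmpeg", "-y", "-i", video_path, "-vn", "-c:a", "aac", "-b:a", "64k", str(out)],
        check=True,
    )
    return out
```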

There is still a lot of room for improvement here, and we'll be making additional tweaks as we continue to use it over the next eighty days or so. We're releasing it here as-is, and have no plans to support it or maintain it at all.

We will be prioritizing whatever friction we run into with the model, and will likely be making improvements to the CrewAI loop to make sure it's passing the proper data to the YouTubeUploadTool, so that we can turn it into a proper tool and contribute it back upstream to CrewAI.
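
As a sketch of where that's headed, the upload step might look roughly like this as a CrewAI custom tool; BaseTool comes from crewai_tools, while upload_video is a hypothetical wrapper around the YouTube Data API standing in for our Python upload CLI.

```python
# Rough sketch only: upload_video is a hypothetical wrapper around the YouTube Data API,
# standing in for the repo's Python upload CLI.
from crewai_tools import BaseTool

class YouTubeUploadTool(BaseTool):
    name: str = "YouTube Upload"
    description: str = (
        "Uploads a video file to YouTube using the title and description produced by the crew."
    )

    def _run(self, video_path: str, title: str, description: str) -> str:
        from yt_pipeline.upload import upload_video  # hypothetical module
        video_id = upload_video(video_path, title=title, description=description)
        return f"Uploaded: https://youtu.be/{video_id}"
```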