All right. It’s been a minute since I’ve been chatting with y’all. Obviously the politics stuff is what it is. Feel free to wander around that part of this site or visit Escaping Dystopia if you’re looking for that kind of stuff. This here website is now converting to a developer blog.
Why? Well…I’m a developer, I guess. I write code. Lately I’ve been writing quite a bit of code. With some help.
That journey has been an absolute blast. I’ve had more fun coding in the last few months than I’ve had in years. And been more productive. This blog (now) is about that journey. Some of the issues I’ve had and overcome, some I still face. All the stuff I’ve been learning and still am learning.
Programming around non-deterministic functions is an interesting task. The power, magic, and basic function of a computer is to produce the same output from the same input a billion billion times in a row (per second, if need be). Absolute precision and perfection. That’s what they *do*. And they do it well.
Programming with and around AI is a bit different. Before I go too much further, check out my app:
Haiku the News.
I wanted to share that with you so you can understand that I ain’t just whistling Dixie here, this thing is live. I’ve waited to start this next iteration of my nom-de-website until I had something I could show you.
Something you could see and touch and…maybe…feel. It makes so much more sense with examples, and I’ll be sharing a lot of them with you.
My application as it stands WAS NOT POSSIBLE A YEAR AGO. It wasn’t possible before GPT-4o (the “omni” model) was released, and now Claude 3.5 has similar capabilities.
GPT-4o was released on May 13th, 2024. Sometime that week I began playing with their cookbook examples: https://cookbook.openai.com/
My application was released and (somewhat) functional in a live production environment on June 30, 2024. I’ve been coding my ass off. I’ve had some help. Very little of it human (echoes from Stack Overflow when 4o and Claude get confused). My robot assistants have been whirring.
To be clear, my application is basically just a wrapper around a number of custom functions, not unlike those you’ll find in the cookbook. To also be clear, it’s a freaking blast to play with. And creating and dropping in new functions is a less-than-a-day task. I’ll be adding many more; it’s too fun to avoid.
As you work through the examples here: https://cookbook.openai.com/examples/gpt4o/introduction_to_gpt4o, you’ll see that “multi-modal” really only takes two kinds of input, text and images. Audio is translated to text prior to submission, and video is cut into screenshots. Two inputs, text/image. One output, text or image. This is the key to stacking functions. Any problem that can be broken down into those data types can be processed efficiently.
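To make that concrete, here’s a minimal sketch of what one of those text-plus-image calls looks like with the OpenAI Python SDK. The function name and prompt are my own illustration, not my app’s actual code, and it assumes an API key in your environment:

```python
# Minimal sketch: one text prompt + one image in, text out.
# Illustrative only -- not the Haiku the News production code.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def describe_image(image_url: str, instruction: str) -> str:
    """Send one text instruction and one image URL, get text back."""
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{
            "role": "user",
            "content": [
                {"type": "text", "text": instruction},
                {"type": "image_url", "image_url": {"url": image_url}},
            ],
        }],
    )
    return response.choices[0].message.content
```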
In my case, we are utilizing “screen scraping” or “uploading a picture” capability to provide the image source, and the text prompts are relative to the function being applied. Two inputs, near infinite capabilities, as outputs can be fed back in as new inputs. For example, we are using the output of “write me a poem summarizing this article” as the input for “draw me a picture of this poem.”
To do so, my application is utilizing the drawing capabilities of DALL·E 3. I’ve been using generative AI art applications for a while now, having done several tens of thousands of images and a couple of books with Stable Diffusion, plus one with DALL·E 3 earlier this year (the RobotPirateNinja Coloring Book…to be continued).
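Here’s roughly what that chain looks like, again as a sketch against the OpenAI Python SDK; the function names and exact prompts are illustrative assumptions, not my production code:

```python
# Sketch of chaining: article text -> poem -> DALL-E 3 image.
# Illustrative only; the real app wraps these in its own functions.
from openai import OpenAI

client = OpenAI()

def summarize_as_poem(article_text: str) -> str:
    """Step one: turn an article into a short poem (text in, text out)."""
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{
            "role": "user",
            "content": f"Write me a poem summarizing this article:\n\n{article_text}",
        }],
    )
    return response.choices[0].message.content

def illustrate_poem(poem: str) -> str:
    """Step two: feed that poem straight into DALL-E 3 as the image prompt."""
    image = client.images.generate(
        model="dall-e-3",
        prompt=f"Draw me a picture of this poem:\n\n{poem}",
        size="1024x1024",
        n=1,
    )
    return image.data[0].url

# Usage: poem = summarize_as_poem(article_text); url = illustrate_poem(poem)
```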
I’ve found DALL·E 3 to be a lot of fun, capable of a lot of variety, and fairly responsive to instruction.
Generative art with AI is fun. It can also be tedious and annoying, as you get *soooo close*, but can’t quite get there. Writing prompts is exhausting. Haiku the News takes a slightly different approach. Here, we are using the near infinite possibilities of AI-written text prompts to create relevant but deliciously random art. When you translate an article into emoji, and that emoji translation into art, you get some things, ya know?
BTW, GPT-4o speaks emoji fluently. I realized this while testing, and built a full function around the idea.
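A sketch of what a function built around that idea can look like (names and prompts are mine for illustration, not the version running in the app):

```python
# Sketch: translate an article into emoji only (text in, text out).
# Illustrative assumption of how such a function might be written.
from openai import OpenAI

client = OpenAI()

def article_to_emoji(article_text: str) -> str:
    """Ask GPT-4o to retell the article entirely in emoji."""
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system",
             "content": "You translate news articles into emoji only. No words."},
            {"role": "user", "content": article_text},
        ],
    )
    return response.choices[0].message.content
```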
But I’m not married to *any* of these models. One of the very fun and exciting parts about building an AI application is the potential for swapping out models easily. When built correctly, the call to the backend model is going to be agnostic; it won’t matter to whom you pray for a response, they will hear you (they will, of course, respond at their own discretion).
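A rough sketch of what that agnostic call can look like, using the OpenAI and Anthropic Python SDKs. The `complete()` helper, provider names, and model IDs are illustrative assumptions, not the app’s actual plumbing:

```python
# Sketch: one thin, provider-agnostic entry point; the caller never
# knows (or cares) which model answered. Illustrative only.
from openai import OpenAI
from anthropic import Anthropic

def complete(prompt: str, provider: str = "openai") -> str:
    """Ask whichever model we're praying to today and return plain text."""
    if provider == "openai":
        client = OpenAI()
        response = client.chat.completions.create(
            model="gpt-4o",
            messages=[{"role": "user", "content": prompt}],
        )
        return response.choices[0].message.content
    if provider == "anthropic":
        client = Anthropic()
        response = client.messages.create(
            model="claude-3-5-sonnet-20240620",
            max_tokens=1024,
            messages=[{"role": "user", "content": prompt}],
        )
        return response.content[0].text
    raise ValueError(f"Unknown provider: {provider}")
```

The payoff of a seam like this is that swapping models is a config change, not a rewrite.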
This also allows for the integration of other types of models (e.g., uncensored) running on their own dedicated hardware.
I experimented a good bit with LM Studio early on: https://lmstudio.ai/
I would HIGHLY recommend it for anyone interested in this stuff. While I learned fairly quickly that the local models couldn’t match Omni’s or Claude’s capabilities, they are more than capable for many tasks (including uncensored adult conversations that commercial applications simply will not entertain). I will be sharing a repo for my “Voice GPT” application that I plan to open source soon. It turns your computer into the “computer” from the “futuristic” TV show Star Trek. Talking to a local model running on your own machine in your (and its) own voice is a new human experience. Again, not really possible a few years ago.
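The nice part is that LM Studio can serve whatever local model you’ve loaded over an OpenAI-compatible API (by default on http://localhost:1234/v1), so the exact same client code works against it. A minimal sketch, assuming that default port and a model already loaded in LM Studio:

```python
# Sketch: point the same OpenAI client at LM Studio's local server.
# Assumes LM Studio's server is running on its default port.
from openai import OpenAI

local = OpenAI(
    base_url="http://localhost:1234/v1",
    api_key="lm-studio",  # any non-empty string; the local server doesn't check it
)

response = local.chat.completions.create(
    model="local-model",  # LM Studio answers with whichever model you've loaded
    messages=[{"role": "user", "content": "Computer, status report."}],
)
print(response.choices[0].message.content)
```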
But it should be free to all with the compute power, so I’ll try and help that along a bit.
We will definitely be off-loading some of the easier analysis tasks to specialized “local” models at some point, but for the current roadmap (six months to a year) we are likely to stay with the big dogs. But who knows, maybe a ripping multi-modal Llama 4 will be dropped before anyone knows it.
Ok, this post is getting long and I haven’t even said anything yet, so we’ll cut it off here.
tl;dr: developing with AI is fast and fun. Developing AI applications with AI assistants while building AI assistant applications has levels of recursive enjoyment not possible only a short while ago. AI-driven dev is like freaking mainlining “V” (from either “True Blood” or “The Boys”, same diff…super-hero powers on demand) while pulling off Hugh Jackman-in-Swordfish-level programming challenges. Quite the ride. Talk to you soon.