The past week I have seen two concepts for AI that need more exploration – late noise injection and 360 Facebook images.
AI Concepts
Late noise injection
Flux is (in my opinion) the best image-generation model available at this point. It’s great at creating both realistic-looking and weird fantasy images, though I have seen some people say that it’s bad at copying the styles of famous artists. I believe this is intentional on the part of Black Forest Labs, since AI has received a lot of criticism for “stealing” other people’s work.
How do we improve details in images created with Flux? As I’ve mentioned before, Flux uses the T5-XXL text encoder, which works best with long, detailed prompts written in natural language.
A typical short prompt, of the kind used with CLIP ViT-G/14, might look something like this.
An athletic and beautiful 35 year old steampunk woman caught cheating at a card game.
This short prompt used with Flux dev will result in the following image.
It’s not a bad image, but it’s not very detailed or particularly steampunk.
To get a more detailed image, you have to provide a more detailed prompt. The expanded prompt could be something like this.
In a dimly lit, smoke-filled tavern adorned with intricate brass gears and flickering gas lamps, a strikingly beautiful 35-year-old woman sits at a round table, her athletic frame a sharp contrast to her elegantly tailored steampunk attire. Her rich auburn hair, pulled back with a leather band, showcases her delicate features, highlighted by a smattering of freckles across her sun-kissed cheeks. The air is thick with the musky scent of tobacco and the sweet fragrance of aged whiskey, a testament to the night's revelries.
As she leans forward, a mischievous glint dances in her emerald eyes, her fingers expertly shuffling a deck of ornate cards embellished with mechanical designs. The ambiance is tense but lively, filled with the low hum of conversation and the occasional rattle of dice. Her opponents, a group of rugged individuals dressed in dusty leather and copper accessories, are oblivious to her clever sleight of hand as she slips an extra card into her palm, a smirk playing on her lips.
Suddenly, the tall, broad-shouldered man seated opposite her leans in, his furrowed brow indicating suspicion. The atmosphere shifts from light-hearted competition to charged confrontation, as whispers of dishonesty ripple through the crowd. The soft clinking of coins comes to an abrupt halt, replaced by a prickling tension, as the woman realizes she has been caught in her deceit. The flickering candlelight casts shadowy patterns across her face, revealing a mixture of surprise and defiance as she prepares to defend her actions amidst a storm of accusations. The setting feels alive with anticipation, a silent battle of wits suspended in the air as the stakes rise dramatically in the fading light of the tavern.
I know, it’s not easy to come up with a whole story for every image you generate. Luckily you don’t have to, because there are tools available for that. One example is Flux Image Generator. Just put in whatever short prompt you want to use, and after a few seconds, you’ll have a mini-story ready.
Result from the prompt above:
Flux doesn’t natively support CFG (Classifier-Free Guidance) the way previous models do; it uses its own type of guider instead. However, with custom nodes it is possible to use CFG on the Flux models. Enabling CFG also allows the use of negative prompts (though I’m not sure how effective they are).
Adding CFG enhances both intensity and detail, but the time it takes to generate an image almost doubles, because every step that uses CFG has to run the model twice (once for the prompt and once for the negative or empty prompt). A compromise is to apply CFG only early in the generation, then switch it off and inject noise. This way, generating an image takes only slightly longer than without CFG.
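To make the time tradeoff concrete, here is a minimal Python sketch of the general idea. It is a toy illustration, not the workflow’s actual nodes: the function names, the step count and the plain lists standing in for latents are all assumptions of mine.

```python
import random


def cfg_combine(cond, uncond, scale):
    """Classifier-free guidance: start from the unconditional prediction
    and move toward the conditional one, amplified by `scale`."""
    return [u + scale * (c - u) for c, u in zip(cond, uncond)]


def count_model_calls(total_steps, cfg_steps):
    """Count model evaluations when only the first `cfg_steps` use CFG.
    A CFG step needs two evaluations, any other step needs one."""
    cfg_steps = max(0, min(cfg_steps, total_steps))
    return 2 * cfg_steps + (total_steps - cfg_steps)


def inject_noise(latent, strength, rng):
    """Add Gaussian noise, scaled by `strength`, to every latent value."""
    return [x + strength * rng.gauss(0.0, 1.0) for x in latent]


if __name__ == "__main__":
    steps = 20  # hypothetical step count, purely for illustration
    print("no CFG:          ", count_model_calls(steps, 0))
    print("CFG on all steps:", count_model_calls(steps, steps))
    print("CFG on first 4:  ", count_model_calls(steps, 4))

    # Toy latent with noise injected after the CFG phase has ended.
    rng = random.Random(0)
    print(inject_noise([0.0, 0.5, 1.0], strength=0.1, rng=rng))
```

With these made-up numbers, CFG on every step costs 40 model evaluations instead of 20, while CFG on only the first 4 steps costs 24, which is why the early-only variant is just slightly slower.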
Below you can see a few examples of images that use CFG compared to the same image without CFG. All the images have been created with the same seed and otherwise the same settings, except for adding CFG and late noise injection.
As you can see, CFG and noise injection affect different images in different ways. Playing around with the artificial CFG and the guidance scale can make a larger difference.
You can download the workflow I used for these images here: CFG and noise injection
Generate a 360 panoramic image for Facebook
This method adds metadata to an image that tells Facebook it is a 360 panorama photo, so that Facebook treats it as such. However, Facebook will greatly reduce the quality of the image. There may be a better way to get this done, but none that I know of at the moment.
First, an image with an aspect ratio of 2:1 needs to be generated.
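If you want to pick the resolution programmatically, a small helper like the one below can do it. This is only a sketch: the function name is made up, and the default of rounding to multiples of 16 is my assumption, so check what your model and workflow actually require.

```python
def panorama_size(width, multiple=16):
    """Return (width, height) with an exact 2:1 aspect ratio.

    The requested width is rounded down so that both width and height
    are divisible by `multiple` (an assumed constraint, not a Flux rule).
    """
    step = 2 * multiple  # width must be divisible by this so height stays aligned
    width = max(step, (width // step) * step)
    return width, width // 2


if __name__ == "__main__":
    print(panorama_size(2048))
    print(panorama_size(2000))
```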
You need to download ExifTool or similar software to edit the metadata in the image. I’m using ExifTool, so that is what I will use in my example.
Put your image in the same folder as ExifTool and give it a simple name. Then open a command prompt in that folder.
Copy and paste the following command, replacing “yourimage.png” with whatever you named your image.
.\exiftool -ProjectionType="equirectangular" -UsePanoramaViewer="True" -"PoseHeadingDegrees<$exif:GPSImgDirection" -"CroppedAreaImageWidthPixels<$ImageWidth" -"CroppedAreaImageHeightPixels<$ImageHeight" -"FullPanoWidthPixels<$ImageWidth" -"FullPanoHeightPixels<$ImageHeight" -CroppedAreaLeftPixels="0" -CroppedAreaTopPixels="0" yourimage.png
When you hit Enter, ExifTool writes the new metadata into the image under its original name and keeps the old version as “yourimage.png_original”. When you upload the new image to Facebook, you can look around in it as if it were a 360 panorama photo. I found that the quality is slightly better when viewing the image on a phone held in landscape orientation.
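If you tag images often, the same command can be run from a script. The Python sketch below passes the arguments as a list, which avoids the shell’s quoting rules (some shells treat the $ signs as variables). The function names are my own, and it assumes an executable named exiftool can be found on your PATH or in the current folder; adjust the name if yours is different.

```python
import shutil
import subprocess


def build_exiftool_args(image_path, exiftool="exiftool"):
    """Build the same ExifTool call as the command above, as an argument list."""
    return [
        exiftool,
        "-ProjectionType=equirectangular",
        "-UsePanoramaViewer=True",
        "-PoseHeadingDegrees<$exif:GPSImgDirection",
        "-CroppedAreaImageWidthPixels<$ImageWidth",
        "-CroppedAreaImageHeightPixels<$ImageHeight",
        "-FullPanoWidthPixels<$ImageWidth",
        "-FullPanoHeightPixels<$ImageHeight",
        "-CroppedAreaLeftPixels=0",
        "-CroppedAreaTopPixels=0",
        image_path,
    ]


def tag_panorama(image_path, exiftool="exiftool"):
    """Run ExifTool on the image and return the completed process.

    No shell is involved, so the $ signs reach ExifTool unchanged.
    Inspect `returncode` and `stderr` on the result to see how it went.
    """
    if shutil.which(exiftool) is None:
        raise FileNotFoundError(f"Could not find '{exiftool}'")
    return subprocess.run(
        build_exiftool_args(image_path, exiftool),
        capture_output=True,
        text=True,
    )


if __name__ == "__main__":
    # Only prints the arguments; call tag_panorama("yourimage.png") to run it.
    print(build_exiftool_args("yourimage.png"))
```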