Generative AI has a tendency to generate close-up images of the subject, even if you wanted a wider perspective. And if you do manage to generate a subject a bit further away, the quality suffers greatly; faces in particular tend to get disfigured.
I’m sure most of us have gotten our fair share of these types of images, at least when we were new to generative AI.
Image perspective
I have written about outpainting before, so I will not go into detail here.
One inherent issue is the limit on how large an image can be generated in one go (i.e. without upscaling or using other tools). SDXL generates images at 1024×1024 px, or at variations of that resolution (usually with width and height adding up to 2048 px, in various aspect ratios). If you try to create an image where your subject is a bit further away from the “camera” under these constraints, the face will most likely get distorted in some way.
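To make that constraint concrete, here is a small sketch of the common SDXL dimension pairs whose width and height sum to 2048 px while staying near the native 1024×1024 pixel budget. The list and the helper function are my own illustration, not an official SDXL specification:

```python
# Common SDXL generation resolutions: width + height = 2048 px,
# keeping the total pixel count close to 1024 * 1024.
# (Illustrative list, not an official specification.)
SDXL_RESOLUTIONS = [
    (1024, 1024),  # 1:1 square
    (896, 1152),   # portrait
    (1152, 896),   # landscape
    (832, 1216),   # taller portrait
    (1216, 832),   # wider landscape
]

def check_resolution(width: int, height: int) -> bool:
    """Return True if the resolution stays near SDXL's native pixel budget."""
    budget = 1024 * 1024
    # Allow roughly 5% deviation from the native pixel count.
    return abs(width * height - budget) / budget <= 0.05

for w, h in SDXL_RESOLUTIONS:
    print(w, h, "sum:", w + h, "ok:", check_resolution(w, h))
```

Run this and every listed pair passes the check, while something like 2555×3278 (the outpainted image below) is far outside the budget.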
One solution is outpainting: start by generating the subject in high resolution with good quality, and then build the rest of the image around it. I’m going to use a few examples to visualize the process and why it works.
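The workflow described above, generating the subject first and then growing the canvas around it, comes down to simple geometry: place the existing pixels inside a larger canvas and mark everything outside them for the model to fill in. A minimal sketch, with hypothetical function and parameter names of my own:

```python
def expand_canvas(subject_size, target_size, anchor=(0.5, 0.5)):
    """Compute where the existing image sits inside an enlarged canvas.

    Returns the paste box (left, top, right, bottom): pixels inside it
    are kept as-is; everything outside it is the region the model must
    outpaint.
    """
    sw, sh = subject_size
    tw, th = target_size
    if tw < sw or th < sh:
        raise ValueError("target must be at least as large as the subject")
    # anchor (0.5, 0.5) centers the subject; (0.5, 0.0) pins it to the top.
    ox = round((tw - sw) * anchor[0])
    oy = round((th - sh) * anchor[1])
    return (ox, oy, ox + sw, oy + sh)

# Grow an 896×1152 SDXL render toward 2555×3278, keeping the subject
# in the upper part of the frame. In practice you expand in smaller
# steps so each outpainting pass stays coherent.
print(expand_canvas((896, 1152), (2555, 3278), anchor=(0.5, 0.2)))
```

Each expansion step would then be fed to an inpainting model together with a mask that covers everything outside the returned paste box.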
Take the following image, for instance. It has a resolution of 2555×3278 px (roughly 2.85× the linear dimensions SDXL can generate in one go). The woman is placed relatively far away from the viewer, at least compared to typical SDXL generations with a defined subject. And her face is still clear and well defined, even when you zoom in on it.
The original image looked like this, and had a resolution of 896×1152 px (width and height adding up to 2048 px). But the image is missing context and depth, so to speak. The image above shows an environment and a setting that triggers the imagination; this one, even though the face is pretty, is quite frankly boring. The top image can’t currently be generated in one go, even though rumor has it that PixArt can generate 4K images in a single pass, something I haven’t tried myself yet.
However, if we could generate the top image in one go (with SDXL), we would see distorted and disfigured faces. And creating a miniature image that fits within 1024×1024 px and then upscaling it to 4K brings its own problems. The first would be forcing SDXL to generate the subject far away from the viewer (“far away” as in the large image above). That is very hard, and even if you do manage to make SDXL comply, the face will be distorted.
So the larger image is roughly 2.85× the initial image in linear size. If we equalize the difference and put the faces next to each other, it looks like this.
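The size figures quoted in this post are ratios of linear dimensions (width to width), which is also the factor you scale the smaller crop by when putting the faces side by side. As a quick check:

```python
def linear_scale_pct(original, larger):
    """Size of the larger image relative to the original, in percent,
    measured along the width (a ratio of linear dimensions, not of
    total pixel counts)."""
    return round(larger[0] / original[0] * 100)

# Outpainted image vs. the original 896×1152 SDXL render:
print(linear_scale_pct((896, 1152), (2555, 3278)))  # → 285
```

Comparing total pixel counts instead would give a much larger number (the big image has about eight times as many pixels), which is why it matters to say which ratio is meant.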
And zoomed in, it looks like this. There’s basically no difference.
This is an image I created not long ago; it’s completely unrelated, but it works as an example. You can clearly tell that the faces are deformed, and upscaling alone would not fix any of that.
The image above has a resolution of 1304×1152 px and is 20% larger than its initial image. Equalizing the difference and putting the faces next to each other looks like this.
And zoomed in on the faces, the differences are extreme.
You could argue that the poor-quality face was poor quality from the start, and that’s exactly my point. If you try to generate images with the subject too far away from the viewer, the face will have poor quality. Which leaves us with only one way to create beautiful and creative images (involving people, at least) without a close-up shot of the subject: outpainting.
So if you haven’t tried outpainting yet, you really should. If for no other reason, do it so you won’t be the person who can only generate the same image everyone else generates, over and over.
Below you can see a selection from my page on CivitAI, and as you can see, almost every image containing a person is either a close-up shot of the face, a close-up shot from the waist up, or a full-body shot with the person standing as close as possible to the viewer. There are exceptions, but they are few.
And this is what it looks like on most AI artists’ pages, if they focus on creating images of humans.