Midjourney 6 rendering of Chewbacca relaxing in a hot tub out in winter

AI Image Generators: Midjourney v6 vs DALL-E v3

Dave Gibson

December 28, 2023

In the past few months I’ve been testing AI image generators and have begun using them to create visuals for both fun and marketing. I love this new ability to envision a custom image that complements the topic I’m writing about. And with the recent release of Midjourney’s version 6, AI image generation has taken yet another leap forward. After enjoying a holiday “break” during which I dove headlong into this new update to Midjourney, I thought I’d share my take on where AI image generation now stands, and compare Midjourney to its closest competitor, DALL-E 3.

To set the table, the primary field of players includes Stable Diffusion, Adobe Firefly v2, DALL-E v3, and the new Midjourney v6.

But before I dive into the models, allow me to start with a bit of context. AI image generation blew up this year, and we’re now getting models that are capable of rivaling custom photography. As a photographer, yes, this is disheartening, but as a marketer who creates content, it is very exciting. This is especially good for those who need images of minorities, and especially of people with disabilities, for whom stock photography is quite limited.

Each model combines machine learning and creative expression to create images from written prompts. Each is far from perfect and requires patience… a lot of patience, as you fine-tune results through trial and error to get the visual you want. For anyone just trying it out, this may be a surprise, but I will say that it builds your skills in envisioning a desired image and converting that vision into words.

Midjourney

Midjourney Inc. is a private company based in San Francisco and founded by David Holz. The first version of the model launched in beta in March of 2022 on Discord, which is a separate platform for communication and communities. The version prior to v6 was version 5.2.

Midjourney 6 Alpha

Version 6 continues to be accessible via Discord as of this writing, but access through a more standard website is expected soon. And I can’t wait. There’s nothing fun about using Discord and I look forward to using an effective UI.

Where MJ6 stands out is in rendering amazingly realistic images. It seems to have a much better understanding of composition and of the relationships between objects in an image.

Comparing MJ6 to MJ5.2

Compared with 5.2, Midjourney 6 creates images that are more realistic, crisp, detailed and accurate. The nuance and depth have increased, especially for things like backlighting, reflections, and representing the natural world overall.

Midjourney v6
Midjourney rendering of Chewbacca in hot tub
Midjourney v5.2
Midjourney v5.2 rendering of Chewbacca in hot tub

While the 5.2 image does have fine detail, in version 6, Chewie looks like he’s really chillin’.

Experimenting with Midjourney v6

Over the holiday, I was in a Star Wars mood and had fun hanging with Chewie. It took much patience and trial and error to get these. It often took dozens of prompts to get an image that I really liked.

While the detail of each is impressive, what I think is most interesting is the depth, the tone, the style and composition. These are nuanced images which tell stories.

_{prompt: photo of tiny Pikachu and huge Chewbacca on a couch. Chewbacca trying to take game controller from Pikachu. Dark room}

_{prompt: Little Annie character smoking a fat lit cigar from side of mouth}
Midjourney rendering of orphan annie smoking a cigar

_{prompt: a black 2019 Volkwagen Golf GTI with snow groomer cat tracks instead of wheels. Plowing through snow heading up mountain}

_{prompt: a photo of cyberpunk post apocolyptic roadwarrior woman with shaved head ordering a drink at a dimly lit rough bar. she has abundant tatoos and peircings. leather vest, leather boots}
Midjourney 6 rendering of a female cyberpunk in a bar

_{prompt: a photo of Pikachu meeting the Queen of England}
Midjourney rendering of Pikachu meeting the Queen

Midjourney 6 Pros

For generating photo-real images, it’s the best. Images not only have more detail, but also more natural nuance. It gets nature and seems to lean toward taking its own poetic license.
Artistic - nuanced
Strong community via Discord

Midjourney 6 Cons

Does not follow instructions well.
Not iterative: you can’t respond to it and ask it to make adjustments
Not good at graphic design such as logo creation
Text generation has improved but still has a journey to take
Requires Discord - for now
User controls not as advanced as Firefly

ProTip: Use --style raw to get the most realistic images
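As a rough illustration of that tip: Midjourney parameters go at the end of the prompt text, so a raw-style prompt is just the description followed by the flags. The little helper below is purely hypothetical (the name build_prompt and its arguments are mine, not part of any Midjourney tool); it only assembles the text you would paste after /imagine in Discord. The --style raw and --v flags are the ones I'm assuming you want.

```python
def build_prompt(description: str, *, raw: bool = True, version: str = "6") -> str:
    """Return prompt text with Midjourney parameters appended at the end.

    Hypothetical helper for illustration only: it builds a string and
    does not talk to Midjourney or Discord.
    """
    parts = [description.strip()]
    if raw:
        # --style raw asks Midjourney to apply less of its default styling
        parts.append("--style raw")
    if version:
        # --v selects the model version
        parts.append(f"--v {version}")
    return " ".join(parts)


if __name__ == "__main__":
    print(build_prompt("realistic photo of a brook in a lush forest in New England"))
```

Paste the printed line after /imagine and compare it against the same prompt without --style raw to see how much of the default look gets stripped away.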

OpenAI's DALL-E 3

DALL-E, developed by OpenAI, may not deliver the same level of natural realism, but where it shines is in creating graphic design and illustrations. It can be accessed for free through Bing Image Creator, or through ChatGPT with GPT-4. One thing I really like as a ChatGPT Plus subscriber is the convenience of having the two combined in the same space. What I find really interesting is that if you enter a vague prompt, ChatGPT may take the opportunity to juice it up (see the following example), adding details to get a better image back from DALL-E. I also really like the iterative process, which allows you to make adjustments in subsequent prompts, like adding or removing objects, increasing the styling, or anything else you dream up. This is hopefully the direction that Midjourney will take with its new website.

_{DALL-E enhanced prompt with followup}
Screenshot of DALL-E prompt thread demonstrating how ChatGPT enhances a prompt

DALL-E’s usage restrictions may be too much for some as well. For instance, I wanted it to create an ironic image of Orphan Annie smoking a cigar. DALL-E said no and suggested swapping the cigar for a lollipop. Chewie in a hot tub was also a no-go.

_{DALL-E alternative rendering of Orphan Annie smoking a cigar}
DALL-E rendering of a little girl with red hair holding a lollipop

_{DALL-E alternative rendering of Chewbacca in a hot tub}
Screenshot of the DALL-E UI explaining why it won't render an image of Chewbacca, then showing a pathetic little furry alternative in a tub

DALL-E Pros

It follows instructions better
Better at graphic design
Better for highly saturated eye-catching social media images
Infusion of GPT allows it to enhance the prompt you write to give DALL-E better instructions
Iterative in nature, which allows the user to interact with the image

DALL-E Cons

Simplistic, immature design style
Not quite capable of creating photo-realistic images
Displays two images per prompt, versus Midjourney’s four
Also has issues with text generation
Restrictive usage rules

Output Comparisons: Midjourney 6 vs DALL-E 3

The proof is in the pudding, so let’s see some comparisons, starting with the user interface. Note that we’re expecting Midjourney’s new website UI soon, but I do want to show the difference between the two, and also show Adobe Firefly’s interface, which is the best.

User Interface Comparisons for Prompting

I'm first throwing in Adobe's Firefly UI, because it is just so good and easy to use. I hope to see Midjourney emulate this.

Adobe Firefly UI for prompting

DALL-E offers nothing more than an open prompt field, and outputs two versions:

DALL-E UI for prompting

Midjourney via Discord... augh. Soon to be updated:

Midjourney UI for prompting

Comparing Renderings from DALL-E and Midjourney

First, though, I want to include just one rendering from Adobe Firefly to start us off. The rest compare DALL-E and Midjourney, and I trust you'll see how much better Midjourney is at creating realistic visual spaces, and how DALL-E shines in graphic design.

Comparing Adobe Firefly vs DALL-E 3 vs Midjourney 6
Prompt: realistic photo of a brook in a lush forest in New England. Morning light streaking through the trees. A deer approaching the brook. Ferns.

_{< Firefly >}
Firefly rendering of a brook through a forest with a deer

^{< DALL-E 3 >}
DALL-E 3 rendering of a brook through a forest with a deer

^{< Midjourney 6 >}
Midjourney 6 rendering of a brook through a forest with a deer

Comparing DALL-E 3 vs Midjourney 6
Prompt: realistic closeup photo of an elderly man in an urban setting, leaning against a door opening with his hand to his face smoking a cigar. No emotion. Distant look in eyes. Moody misty.

^{< DALL-E >}
DALL-E rendering of man smoking cigar in doorway

^{< Midjourney >}

Comparing DALL-E 3 vs Midjourney 6
Prompt: a graphic designed logo for a company called Buzz Coffee that sells very strong coffee

< DALL-E >
DALL-E rendering of a logo for fictitious company Buzz Coffee

< Midjourney >
Midjourney rendering of a logo for fictitious company Buzz Coffee

Comparing DALL-E 3 vs Midjourney 6
Prompt: a creative graphic design square poster for an electronic music band from the late 1990s. Flyer art style.

< DALL-E >

< Midjourney >

Final Thoughts

I'm more of a photographer than a graphic designer, so I'm just blown away by the natural realism that Midjourney delivers. It gets mood. It gets color, depth, nuance and composition. It can create visual poetry.

Considering these capabilities were developed in just a year, imagine what 2024 will bring for AI image creation… and video… and music…

Midjourney 6 rendering of David and Chewie

Stay tuned...

and may The Force be with you.

- Dave