Conversational MMGen
In this project, you are tasked with simulating a collaborative workflow
between two roles: a goal-oriented user and a professional editor. The user provides reference
images and detailed prompts outlining specific required adjustments or transformations. The
editor then creates high-quality result images that reflect the user’s instructions, mimicking the
output that an AI model would eventually generate.
The goal of this project is to develop training data for an AI model designed to follow user-
provided instructions and make precise adjustments to images. This AI model will cater to
professional users, such as small business owners or marketing professionals, who need
custom visual assets tailored to their specific needs.
The reference images, user prompts, and result images are all integral parts of the dataset. The
focus is on ensuring the transformations are accurate, visually coherent, and aligned with the
user’s input, enabling the AI to learn how to interpret context and execute instructions in a
way that meets professional standards.
Objective:
Showcase your ability to develop an engaging, goal-oriented conversational prompt sequence
for a hypothetical visual project. Your task is to create a unique scenario, provide visual
inspiration with reference images and create an output based on your idea. This task simulates
a conversation between a client and a designer, and you will take on both roles.
Task Details:
1. Develop a Unique Scenario:
- Design an illustration scenario and a specific type of visual content within your chosen
area. The scenario should be business-related: the kind of task a business owner,
marketing specialist, or similar professional would give to their designer. In the examples
below, this persona is referred to as “User”.
- Clearly define the project’s goal in your scenario, focusing on achieving a specific
outcome.
2. Incorporate Reference Images:
- Include at least 2 reference images in Step 1 that align with your scenario and would
help inspire or guide the hypothetical creator.
- Reference images should convey the style, tone, or key elements you envision for the
final piece. You can use reference images that could be directly or partially used to
create the result image.
3. Write a Multi-Turn Conversation:
- Begin with a detailed prompt that introduces the project, describes your needs, and
includes any stylistic preferences.
- Guide the designer through the process with clear and comprehensive prompts. After
each hypothetical result from the AI, follow up with additional guidance or refinements to
steer it toward the final result.
- Your multi-turn conversation should include at least two back-and-forth exchanges
(Steps).
- Be creative and diverse with your instructions. The more requirements you include in the
instructions, the better.
- Make the result images based on the prompts you’ve written. Result images should
comply with your prompts and closely reflect the instructions in them.
4. Submission Requirements:
- Write your conversation in a simple text format in 2 or 3 steps, as shown in the
examples, so that it reads like an actual conversation you would have with a designer.
- Include links or descriptions for your reference images.
- Ensure that your conversation demonstrates clarity, creativity, and goal-oriented
guidance.
- Please do not use ChatGPT for writing the instructions, as it can easily be detected.
Example 1
Step 1
User: As an illustrator, create an illustration of my dog to serve as a promotional visual for the
design studio. Here is the dog.
User: As a style reference, use the picture of this man with a dark blue beanie and light blue
shirt. Keep the positive attitude.
Assistant: [Creates the image]
Step 2
User: Create 2 characters for this dog. 1. drinking coffee. 2. Working on the laptop. Put these
two illustrations one above the other, on a white background.
Assistant: [Creates the image]
Example 2
Step 1
User: As an illustrator, create a character illustration for a comic sequel. The character should
be a stunning woman with pale skin and dark red lips, resembling the woman in the image.
User: Add a dark-colored Victorian-style dress that complements her look. Position her at a ¾
angle.
Assistant: [Creates the image]
Step 2
User: Illustrate two additional headshots of the character, one from the front and one from the
profile.
Assistant: [Creates the image]
FAQ
1. Are we looking for professional use cases only (marketing, game design, character
design) or also casual use cases (take my photo and make me a superhero, create
mushroom looking buildings, make a random person an astronaut)?
Professional use-cases. They can also be consumer use-cases but they should be
purposeful, not “just playing around”. For example, the “baseball team fundraiser” is
more of a consumer use-case, but it has a purpose beyond play or experimentation.
Making a holiday card or a school project would also count as purposeful. We don’t
want the playful “make me a superhero” or “turn the buildings into mushrooms”
examples in this collection.
2. Does each turn require a visual reference? Can a turn be limited to a text instruction?
Meaning, in the baseball field example, is it ok if the second turn does not include the
image reference of the spooky font and just has text?
The first turn must include at least two, and ideally three, image references. The
second turn doesn’t have to include a reference image. We’d like for some of the
second-turn edits to include an image, but the image doesn’t have to be very
complicated.
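The structural rules in this brief (2 or 3 steps, at least two reference images in Step 1, a user prompt and a result image in every step) can be checked mechanically before submitting. The following is a minimal sketch in Python, not part of the official process. The `Step` class, the `check_submission` function, and the file names are hypothetical and exist only for illustration; the brief itself asks only for a simple text format.

```python
from dataclasses import dataclass, field


@dataclass
class Step:
    """One step of the conversation (hypothetical representation)."""
    user_prompts: list[str]
    reference_images: list[str] = field(default_factory=list)
    result_image: str = ""


def check_submission(steps: list[Step]) -> list[str]:
    """Return a list of rule violations; an empty list means the draft passes."""
    problems = []
    # The brief asks for a conversation written in 2 or 3 steps.
    if not 2 <= len(steps) <= 3:
        problems.append("conversation must have 2 or 3 steps")
    # Step 1 must include at least two reference images.
    if steps and len(steps[0].reference_images) < 2:
        problems.append("Step 1 needs at least two reference images")
    # Every step needs at least one non-empty user prompt and a result image.
    for number, step in enumerate(steps, start=1):
        if not any(prompt.strip() for prompt in step.user_prompts):
            problems.append(f"Step {number} has no user prompt")
        if not step.result_image:
            problems.append(f"Step {number} has no result image")
    return problems


# A draft shaped like Example 1 (file names are placeholders).
draft = [
    Step(
        user_prompts=["Create an illustration of my dog...", "As a style reference..."],
        reference_images=["dog.jpg", "man_with_beanie.jpg"],
        result_image="step1_result.png",
    ),
    Step(
        user_prompts=["Create 2 characters for this dog..."],
        result_image="step2_result.png",
    ),
]
print(check_submission(draft))
```

Later steps are allowed to have no reference images, which matches the FAQ answer above. The check covers only structure; whether a scenario is purposeful or a result image follows its prompt still needs human review.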