Gen AI: Creating Images from Text Descriptions with AI: build a system that
can generate high-quality images from textual prompts.
1. Abstract:
The proposed project focuses on the development of an AI-driven system capable of
generating high-quality images from textual descriptions. Leveraging advancements in
natural language processing (NLP) and generative models, the system will interpret user
prompts and create visually accurate images. The system will be trained on large datasets
comprising both text and corresponding images, ensuring it can understand a wide variety of
descriptions, ranging from simple objects to complex scenes. The model will be designed to
handle various artistic styles, photorealism, and abstract visuals, ensuring flexibility and
creativity in image generation.
The core technology behind this project is a combination of transformer-based NLP models,
such as GPT, and deep generative models like Generative Adversarial Networks (GANs) or
diffusion models. By synthesizing textual input with visual elements, the system will
progressively enhance the realism and quality of generated images. Key features include
customizable style parameters, the ability to refine outputs, and scalability to handle diverse
requests. The system will also focus on efficiency, ensuring high-quality output with
optimized computational resources.
This project has potential applications across various industries such as media, design,
marketing, and education, where creating visual content quickly and accurately is crucial. By
streamlining the creative process, this AI image generation system will empower
professionals and hobbyists alike to produce stunning visuals with minimal effort, lowering
the barriers to high-quality content creation.
2. System Requirements for AI Image Generation from Textual Prompts
2.1. Hardware Requirements:
● GPU (Graphics Processing Unit):
A powerful GPU is essential for training and running deep learning models.
Recommended: NVIDIA A100, V100, or RTX 3080 with at least 12 GB of VRAM.
● CPU (Central Processing Unit):
A high-performance multi-core processor is required for handling complex operations
and data preprocessing.
Recommended: Intel Core i7/i9 or AMD Ryzen 7/9.
● RAM:
Sufficient memory is needed for handling large datasets and processing image
generation efficiently.
Minimum: 32 GB
Recommended: 64 GB or more.
● Storage:
High-speed SSDs are required for storing datasets, models, and generated images.
Minimum: 1 TB SSD
Recommended: 2 TB SSD with additional external storage for backups.
● Power Supply and Cooling:
A robust power supply and efficient cooling system are essential, especially for
extended model training sessions.
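As a quick check that a machine meets these hardware requirements, a short script along the following lines (assuming PyTorch is already installed) reports the visible GPU and its VRAM:

import torch

if torch.cuda.is_available():
    props = torch.cuda.get_device_properties(0)
    print(f"GPU: {props.name}, VRAM: {props.total_memory / 1024**3:.1f} GB")
else:
    print("No CUDA-capable GPU detected; training on CPU is impractical.")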
2.2. Software Requirements:
● Operating System:
Ubuntu 20.04 or later (for compatibility with machine learning frameworks) or
Windows 10/11.
● Python Environment:
Python 3.8 or later, with virtual environment support to isolate dependencies.
● Libraries and Frameworks:
o PyTorch or TensorFlow (for building and training generative models)
o Hugging Face Transformers (for handling NLP tasks and textual prompt
processing)
o CUDA (for GPU acceleration on NVIDIA hardware)
o cuDNN (for optimizing deep learning performance)
o OpenCV or Pillow (for image handling and preprocessing)
● Text-to-Image Models:
Pre-trained models such as DALL-E or Stable Diffusion can be fine-tuned for
improved performance, and CLIP can be used to align text and image representations.
● Development Tools:
o Jupyter Notebook or VS Code for interactive development and debugging.
o Git for version control.
o Docker (optional) for containerized environments and easy deployment.
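A minimal way to confirm that the stack listed above is installed correctly (a sketch, assuming the packages were installed with pip into the Python environment) is to import each library and print its version:

import torch
import diffusers
import transformers
import PIL

print("PyTorch:", torch.__version__)
print("Diffusers:", diffusers.__version__)
print("Transformers:", transformers.__version__)
print("Pillow:", PIL.__version__)
print("CUDA available:", torch.cuda.is_available())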
2.3. Additional Requirements:
● Datasets:
Access to large image-text paired datasets like MS-COCO, OpenAI’s WebImageText,
or other publicly available datasets for training the model.
● Cloud Support (Optional):
For large-scale deployments and training, services like AWS, Google Cloud, or Azure
for GPU/TPU instances may be used for scalability.
● API Integration:
Optional API integration for generating images via web interfaces, which requires
setting up a RESTful API with Flask or FastAPI (a FastAPI sketch is given below).
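A minimal sketch of such an API using FastAPI is shown below. The endpoint name, request schema, and model ID are illustrative assumptions, and fastapi, uvicorn, and diffusers are assumed to be installed:

import io

import torch
from diffusers import StableDiffusionPipeline
from fastapi import FastAPI
from fastapi.responses import StreamingResponse
from pydantic import BaseModel

app = FastAPI()

# Load the pipeline once at startup so each request only runs inference
pipeline = StableDiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4", torch_dtype=torch.float16
).to("cuda")

class GenerationRequest(BaseModel):
    prompt: str

@app.post("/generate")
def generate(request: GenerationRequest):
    # Run the diffusion pipeline and return the result as a PNG
    image = pipeline(prompt=request.prompt).images[0]
    buffer = io.BytesIO()
    image.save(buffer, format="PNG")
    buffer.seek(0)
    return StreamingResponse(buffer, media_type="image/png")

If this file were saved as main.py (an example name), the server could be started with uvicorn main:app and called with a JSON body containing the prompt.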
3. Flow chart:
4. Code implementation for the text-to-image generation project:
!pip install diffusers transformers accelerate torch datasets
from huggingface_hub import notebook_login
# Log in to Hugging Face to access the dataset
notebook_login()
from datasets import load_dataset
# Load the dataset from Hugging Face Hub with the correct split
dataset = load_dataset("anjunhu/naively_captioned_CUB2002011_test", split="train")
import numpy as np
import torch
from diffusers import StableDiffusionPipeline, DDPMScheduler
from transformers import CLIPTokenizer
from torch.optim import AdamW
from accelerate import Accelerator
# Load the Stable Diffusion model
model_id = "CompVis/stable-diffusion-v1-4"
pipeline = StableDiffusionPipeline.from_pretrained(model_id,
torch_dtype=torch.float16).to("cuda")
# Load the scheduler for Stable Diffusion
noise_scheduler = DDPMScheduler.from_pretrained(model_id, subfolder="scheduler")
# Initialize the optimizer on the UNet part of the pipeline
optimizer = AdamW(pipeline.unet.parameters(), lr=5e-5)
# Accelerator handles device placement and the backward pass; the pipeline weights are
# already float16, so no extra gradient scaling is applied (pure fp16 fine-tuning is
# numerically fragile, which contributes to the rough early outputs)
accelerator = Accelerator()
# Load the tokenizer that matches Stable Diffusion's text encoder
tokenizer = CLIPTokenizer.from_pretrained(model_id, subfolder="tokenizer")
# Tokenize the text prompts in the dataset
def preprocess_data(batch):
    text_inputs = tokenizer(batch['text'], padding='max_length', truncation=True,
                            return_tensors='pt')
    return text_inputs
# Apply the preprocessing to the dataset
dataset = dataset.map(preprocess_data, batched=True)
# Set the number of training epochs
num_epochs = 3
# Put the U-Net into training mode; only its parameters are optimized
pipeline.unet.train()
# Training loop
for epoch in range(num_epochs):
    for batch in dataset:
        optimizer.zero_grad()
        # Convert the tokenized caption (a list of ids) to a long tensor and add a batch dimension
        captions = torch.tensor(batch["input_ids"]).long().unsqueeze(0).to(accelerator.device)
        # Load the training image and scale it to [-1, 1] as a (1, 3, 512, 512) tensor
        # (assumes the dataset stores the picture as a PIL image in an "image" column)
        image = np.array(batch["image"].convert("RGB").resize((512, 512)), dtype=np.float32)
        pixel_values = torch.from_numpy(image).permute(2, 0, 1).unsqueeze(0) / 127.5 - 1.0
        pixel_values = pixel_values.to(accelerator.device, dtype=torch.float16)
        # Encode the image into latent space with the frozen VAE
        with torch.no_grad():
            latents = pipeline.vae.encode(pixel_values).latent_dist.sample()
            latents = latents * pipeline.vae.config.scaling_factor
        # Sample random noise and a random diffusion timestep
        noise = torch.randn_like(latents)
        timesteps = torch.randint(0, noise_scheduler.config.num_train_timesteps, (1,),
                                  device=accelerator.device).long()
        # Add noise to the image latents according to the sampled timestep
        noisy_latents = noise_scheduler.add_noise(latents, noise, timesteps)
        # Use the frozen text encoder to get embeddings for the caption
        with torch.no_grad():
            text_embeddings = pipeline.text_encoder(captions).last_hidden_state
        # Forward pass: the U-Net predicts the noise that was added
        model_output = pipeline.unet(noisy_latents, timesteps, text_embeddings).sample
        # Compute the loss (MSE between predicted and true noise)
        loss = torch.nn.functional.mse_loss(model_output.float(), noise.float())
        # Backward pass and parameter update
        accelerator.backward(loss)
        optimizer.step()
    print(f"Epoch {epoch + 1} | Loss: {loss.item()}")
from PIL import Image
# Switch the U-Net back to evaluation mode before sampling
pipeline.unet.eval()
# Generate an image from a caption
text_prompt = "A photo of a crow."
# Use the pipeline to generate an image from the text
image = pipeline(prompt=text_prompt).images[0]
# Display the generated image (in a notebook, display() renders it inline;
# image.show() would try to open an external viewer instead)
display(image)
# Save the generated image to a file
image.save("generated_image.png")
print("Image saved as generated_image.png")
Project Hurdles
Because training was done on a free GPU, only a very small number of epochs could be
tested on the dataset, so the outputs generated so far are not of proper quality. We hope
to optimize this in phase three.
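One option for phase three, to fit more training steps onto a free GPU, is to reduce memory usage with standard diffusers settings; the two calls below are a sketch that has not yet been validated on this project:

# Trade some speed for a much smaller activation-memory footprint during training
pipeline.unet.enable_gradient_checkpointing()
# Compute attention in slices so that inference fits on GPUs with limited VRAM
pipeline.enable_attention_slicing()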
5. OUTPUT