Cat Dog Classification Report

This technical report details an image classification project using convolutional neural networks (CNN) to distinguish between cat and dog images, achieving approximately 94% accuracy on test data. The methodology includes dataset processing, data augmentation, and the exploration of various CNN architectures, including custom models and transfer learning approaches. Key findings highlight the effectiveness of regularization techniques and data augmentation in enhancing model performance and generalization.
Technical Report: Cat vs Dog Image Classification

1. Introduction
This report provides a comprehensive overview of an image classification project focused
on distinguishing between cat and dog images. The project implements a convolutional
neural network (CNN) to perform binary classification on a dataset of pet images. The
primary objective is to build a model capable of accurately identifying whether an image
contains a cat or a dog, which demonstrates the application of deep learning techniques in
computer vision tasks.
The ability to automatically classify images of pets has numerous practical applications,
including:
• Content moderation and filtering on social media platforms
• Automated tagging in photo management applications
• Pet monitoring systems for smart homes
• Research tools for animal behavior studies
• Enhanced search functionality in image databases
This project explores various CNN architectures, including both custom-designed networks
and transfer learning approaches using pre-trained models, to determine the most effective
approach for this specific classification task.

2. Methodology
2.1 Dataset
The project uses the “PetImages” dataset, which contains thousands of cat and dog images
stored in a directory structure under ‘PetImages/Cat’ and ‘PetImages/Dog’.

2.1.1 Dataset Characteristics


• Source: The dataset consists of cat and dog images collected from various sources
• Size: The original dataset contains several thousand images for each class
• Format: Images are stored in JPEG format with varying dimensions and quality
• Organization: Originally organized into two main directories (Cat and Dog)

2.1.2 Dataset Processing


The dataset underwent several preprocessing steps:
1. Image Validation: A validation function (is_valid_image()) checked whether each image
could be opened without errors, filtering out corrupted files.
2. Renaming and Organization: Valid images were renamed systematically (e.g.,
‘cat1.jpg’, ‘dog1.jpg’) and moved to a structured directory format using the
rename_and_move() function.
3. Train-Test Split: Images were first split 80:20 using stratified sampling to maintain
class balance; the held-out 20% was then divided equally into validation and test sets
(see Section 2.3).
4. Dataset Structure: A custom dataset structure was created with the following
organization:

new_dataset_dogs_vs_cats/
├── train/
│   ├── cats/
│   └── dogs/
└── test/
    ├── cats/
    └── dogs/
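The report does not show rename_and_move(), so the following is only a minimal sketch of how the validation, renaming, and directory steps above might fit together. The function name rename_and_move_sketch, its signature, the is_valid parameter, and the choice to copy rather than move are all assumptions made for illustration; in the report's pipeline the validator would be is_valid_image().

```python
import shutil
from pathlib import Path


def rename_and_move_sketch(src_files, dest_dir, prefix, is_valid=lambda p: True):
    """Copy valid files into dest_dir as prefix1.jpg, prefix2.jpg, ...

    Hypothetical helper: the report's own rename_and_move() is not shown.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)  # e.g. new_dataset_dogs_vs_cats/train/cats
    kept = []
    for src in src_files:
        if not is_valid(src):
            continue  # skip files the validator rejects
        target = dest / f"{prefix}{len(kept) + 1}.jpg"
        shutil.copy2(src, target)
        kept.append(target)
    return kept
```

Numbering only the files that pass validation keeps the sequence gap-free, which matches the ‘cat1.jpg’, ‘dog1.jpg’ naming pattern described above.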

2.2 Data Preprocessing


The following preprocessing steps were applied to prepare the data:

2.2.1 Image Validation and Filtering


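from PIL import Image

def is_valid_image(file_path):
    """Return True if Pillow can open and verify the file, otherwise False."""
    try:
        with Image.open(file_path) as img:
            img.verify()  # checks file integrity without decoding the pixel data
        return True
    except Exception:
        return False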

This function filtered out corrupted or invalid images that could potentially cause errors
during training.

2.2.2 Data Augmentation


Data augmentation was applied to increase the diversity of the training data using
ImageDataGenerator:
from tensorflow.keras.preprocessing.image import ImageDataGenerator

train_datagen = ImageDataGenerator(rescale=1./255,
                                   rotation_range=15,
                                   horizontal_flip=True,
                                   zoom_range=0.2,
                                   shear_range=0.1,
                                   fill_mode='reflect',
                                   width_shift_range=0.1,
                                   height_shift_range=0.1)

The augmentation included:
• Rotation: Images were randomly rotated up to 15 degrees
• Horizontal Flipping: Images were randomly flipped horizontally
• Zoom: Random zoom transformations up to 20%
• Shear: Random shear transformations with a shear_range of 0.1
• Width/Height Shifts: Random shifts in width and height up to 10%
• Fill Mode: ‘reflect’ mode was used to fill empty pixels created by transformations
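To make the ‘reflect’ fill mode concrete, the sketch below mirrors out-of-range positions back into a one-dimensional row, following the pattern the Keras documentation gives for this mode (abcddcba|abcd|dcbaabcd). The helper name reflect_index is hypothetical, and the code is a pure-Python illustration rather than the Keras implementation.

```python
def reflect_index(i, n):
    """Map any integer position i onto 0..n-1 by mirroring at the borders."""
    j = i % (2 * n)           # the mirrored pattern repeats every 2n positions
    return j if j < n else 2 * n - 1 - j


row = "abcd"
# Sample four positions to the left and four to the right of the original row.
padded = "".join(row[reflect_index(i, len(row))] for i in range(-4, 8))
print(padded)  # dcbaabcddcba
```

Pixels exposed by a shift or rotation are therefore filled with a mirror image of nearby content instead of a constant color.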

2.2.3 Normalization
All images were normalized by rescaling pixel values from [0,255] to [0,1]:
test_datagen = ImageDataGenerator(rescale=1./255)

2.2.4 Resizing
All images were resized to a standard size of 128x128 pixels to ensure consistent input
dimensions for the neural network:
target_size = (image_size, image_size) # image_size = 128

2.3 Data Split


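The dataset was split in two stages using stratified sampling to preserve class balance:
from sklearn.model_selection import train_test_split

# Stage 1: hold out 20% of the data
X_train, X_temp = train_test_split(data, test_size=0.2,
                                   stratify=labels, random_state=42)
# Stage 2: divide the held-out 20% equally into test and validation sets
label_test_val = X_temp['label']
X_test, X_val = train_test_split(X_temp, test_size=0.5,
                                 stratify=label_test_val, random_state=42)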

This resulted in:
• Training set: 80% of the data (approximately 2,263 cat images and 2,267 dog images)
• Validation set: 10% of the data (approximately 283 cat images and 284 dog images)
• Testing set: 10% of the data (approximately 283 cat images and 284 dog images)
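The idea behind a stratified 80/10/10 split can be sketched without scikit-learn by splitting each class separately. This is an illustrative standard-library version, not the report's code: the function name stratified_split and the integer rounding rule are assumptions, so subset sizes may differ by an image or two from the counts listed above.

```python
import random


def stratified_split(items_by_class, train_percent=80, seed=42):
    """Split each class on its own so every subset keeps the class ratio."""
    rng = random.Random(seed)
    train, val, test = [], [], []
    for label, items in items_by_class.items():
        items = list(items)
        rng.shuffle(items)
        n_train = len(items) * train_percent // 100
        n_val = (len(items) - n_train) // 2      # remainder is halved: val / test
        train += [(x, label) for x in items[:n_train]]
        val += [(x, label) for x in items[n_train:n_train + n_val]]
        test += [(x, label) for x in items[n_train + n_val:]]
    return train, val, test


# Class totals implied by the approximate counts above (2,829 cats, 2,835 dogs).
data_by_class = {"cat": range(2829), "dog": range(2835)}
train, val, test = stratified_split(data_by_class)
print(len(train), len(val), len(test))
```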

2.4 Model Architecture


The project experimented with three different model architectures to compare their
performance:

2.4.1 Custom CNN Model


A sequential model was designed with the following architecture:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import (Conv2D, BatchNormalization, MaxPooling2D,
                                     Dropout, Flatten, Dense)

# image_size = 128 (Section 2.2.4); image_channel is assumed to be 3 (RGB)
model = Sequential()

# Input Layer
model.add(Conv2D(32, (3,3), activation='relu',
                 input_shape=(image_size, image_size, image_channel)))
model.add(BatchNormalization())
model.add(MaxPooling2D(pool_size=(2,2)))
model.add(Dropout(0.2))

# Block 1
model.add(Conv2D(64, (3,3), activation='relu'))
model.add(BatchNormalization())
model.add(MaxPooling2D(pool_size=(2,2)))
model.add(Dropout(0.2))

# Block 2
model.add(Conv2D(128, (3,3), activation='relu'))
model.add(BatchNormalization())
model.add(MaxPooling2D(pool_size=(2,2)))
model.add(Dropout(0.2))

# Block 3
model.add(Conv2D(256, (3,3), activation='relu'))
model.add(BatchNormalization())
model.add(MaxPooling2D(pool_size=(2,2)))
model.add(Dropout(0.2))

# Fully Connected layers
model.add(Flatten())
model.add(Dense(512, activation='relu'))
model.add(BatchNormalization())
model.add(Dropout(0.2))

# Output layer
model.add(Dense(2, activation='softmax'))

The architecture includes:
• Input Layer: Conv2D layer with 32 filters (3x3 kernel) and ReLU activation
• Block 1-3: Convolutional blocks with 64, 128, and 256 filters respectively
• Regularization: Batch normalization and dropout (0.2) after each block to prevent
overfitting
• Fully Connected: Flattened layer followed by a dense layer with 512 units
• Output: Dense layer with 2 units (one for each class) and softmax activation for
probability distribution
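As a worked check on this architecture, the sketch below traces the spatial size of the feature maps through the four convolution and pooling stages. It assumes the Keras defaults that apply when no padding or stride arguments are given, as in the code above ('valid' padding, stride 1 for Conv2D, and a pooling stride equal to the pool size). The helper names are hypothetical.

```python
def conv_out(size, kernel=3):
    """Spatial size after a 'valid' convolution with stride 1."""
    return size - kernel + 1


def pool_out(size, pool=2):
    """Spatial size after non-overlapping max pooling (floor division)."""
    return size // pool


size = 128  # image_size from Section 2.2.4
trace = []
for _ in range(4):  # input layer plus Blocks 1-3
    size = conv_out(size)
    trace.append(size)
    size = pool_out(size)
    trace.append(size)

flat_units = size * size * 256          # Flatten output after the 256-filter block
dense_params = flat_units * 512 + 512   # weights plus biases of Dense(512)
print(trace)         # [126, 63, 61, 30, 28, 14, 12, 6]
print(flat_units)    # 9216
print(dense_params)  # 4719104
```

Under these assumptions most of the network's weights sit in the first dense layer, which is one reason the input resolution has a large effect on model size.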

2.4.2 ResNet50 Transfer Learning Model


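from tensorflow.keras.applications import ResNet50
from tensorflow.keras.layers import GlobalAveragePooling2D, Dense, Dropout
from tensorflow.keras.models import Model

def create_resnet_model(input_shape):
    # Pre-trained convolutional base without the ImageNet classifier head
    base_model_resnet = ResNet50(weights='imagenet',
                                 include_top=False, input_shape=input_shape)
    x = base_model_resnet.output
    x = GlobalAveragePooling2D()(x)
    x = Dense(512, activation='relu')(x)
    x = Dropout(0.5)(x)
    predictions = Dense(2, activation='softmax')(x)
    model_resnet = Model(inputs=base_model_resnet.input,
                         outputs=predictions)
    return model_resnet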

This model used:
• Pre-trained ResNet50 architecture with weights from ImageNet
• Global Average Pooling to reduce spatial dimensions
• Dense layer with 512 units and ReLU activation
• Dropout layer with 0.5 rate for regularization
• Output layer with softmax activation

2.4.3 DenseNet121 Transfer Learning Model


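from tensorflow.keras.applications import DenseNet121
from tensorflow.keras.layers import GlobalAveragePooling2D, Dense, Dropout
from tensorflow.keras.models import Model

def create_densenet_model(input_shape):
    # Pre-trained convolutional base without the ImageNet classifier head
    base_model_densenet = DenseNet121(weights='imagenet',
                                      include_top=False, input_shape=input_shape)
    x = base_model_densenet.output
    x = GlobalAveragePooling2D()(x)
    x = Dense(512, activation='relu')(x)
    x = Dropout(0.5)(x)
    predictions = Dense(2, activation='softmax')(x)
    model_densenet = Model(inputs=base_model_densenet.input,
                           outputs=predictions)
    return model_densenet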

This model used:
• Pre-trained DenseNet121 architecture with weights from ImageNet
• Same additional layers as the ResNet50 model for adaptation to the cat/dog
classification task
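Because both transfer learning models share the same classification head, its size depends only on the number of channels each backbone emits after global average pooling. The sketch below counts the head's parameters, using the final feature-map widths of the standard architectures (2048 channels for ResNet50, 1024 for DenseNet121). The function name head_params is hypothetical.

```python
def head_params(features, hidden=512, classes=2):
    """Parameters in Dense(hidden) followed by Dense(classes), weights plus biases."""
    return (features * hidden + hidden) + (hidden * classes + classes)


# Channel counts of the final feature maps in the standard architectures.
backbone_channels = {"ResNet50": 2048, "DenseNet121": 1024}
for name, channels in backbone_channels.items():
    print(name, head_params(channels))
# ResNet50 1050114
# DenseNet121 525826
```

Global average pooling removes the spatial dimensions, so these head sizes do not change with the input resolution.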

2.5 Training Procedure

2.5.1 Model Compilation


The model was compiled with the following parameters:
model.compile(optimizer='adam', loss='binary_crossentropy',
              metrics=['accuracy'])

• Optimizer: Adam optimizer, which adapts the learning rate based on the first and
second moments of the gradients
• Loss Function: Binary cross-entropy. Because the output layer is a two-unit softmax,
this presumes one-hot encoded labels; in that case the loss takes the same value as
categorical cross-entropy, which is the more conventional pairing for a softmax output
• Metrics: Accuracy was tracked during training
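The pairing of a two-unit softmax output with binary cross-entropy can be checked numerically. The sketch below assumes one-hot labels and ignores the small clipping constant Keras applies to predictions; under those assumptions the mean element-wise binary cross-entropy matches the categorical cross-entropy. Both functions are simplified stand-ins written for illustration, not the Keras implementations.

```python
import math


def binary_crossentropy(y_true, y_pred):
    """Mean element-wise binary cross-entropy over the output units."""
    terms = [-(t * math.log(p) + (1 - t) * math.log(1 - p))
             for t, p in zip(y_true, y_pred)]
    return sum(terms) / len(terms)


def categorical_crossentropy(y_true, y_pred):
    """Cross-entropy between a one-hot label and a softmax output."""
    return -sum(t * math.log(p) for t, p in zip(y_true, y_pred))


y_true = [1.0, 0.0]   # one-hot label for the first class
y_pred = [0.9, 0.1]   # softmax output; the two entries sum to 1

bce = binary_crossentropy(y_true, y_pred)
cce = categorical_crossentropy(y_true, y_pred)
print(math.isclose(bce, cce))  # True
```

The match follows because the two softmax outputs sum to one, so both binary terms reduce to the negative log-probability of the true class.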

2.5.2 Callbacks
Two important callbacks were used to improve training efficiency:
from tensorflow.keras.callbacks import ReduceLROnPlateau, EarlyStopping

learning_rate_reduction = ReduceLROnPlateau(monitor='val_accuracy',
                                            patience=2,
                                            factor=0.5,
                                            min_lr=0.00001,
                                            verbose=1)

early_stoping = EarlyStopping(monitor='val_loss',
                              patience=3,
                              restore_best_weights=True,
                              verbose=0)

1. ReduceLROnPlateau: Reduced the learning rate by a factor of 0.5 when the
validation accuracy plateaued for 2 epochs, with a minimum learning rate of
0.00001.
2. EarlyStopping: Stopped training when the validation loss stopped improving for 3
consecutive epochs and restored the model weights to the best epoch.
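How the two callbacks interact can be illustrated with a simplified pure-Python simulation. This is a sketch under stated assumptions: it ignores the min_delta and cooldown options of the real Keras callbacks, starts from Adam's default learning rate of 0.001, and runs on made-up validation histories rather than the report's actual training logs.

```python
def simulate_callbacks(val_acc, val_loss, lr=1e-3, factor=0.5, min_lr=1e-5,
                       lr_patience=2, stop_patience=3):
    """Return (learning rate per epoch, epoch where training stops, best epoch)."""
    best_acc, acc_wait = float("-inf"), 0
    best_loss, loss_wait, best_epoch = float("inf"), 0, 0
    lrs = []
    for epoch, (acc, loss) in enumerate(zip(val_acc, val_loss)):
        lrs.append(lr)  # learning rate in effect during this epoch
        # ReduceLROnPlateau logic, monitoring validation accuracy
        if acc > best_acc:
            best_acc, acc_wait = acc, 0
        else:
            acc_wait += 1
            if acc_wait >= lr_patience:
                lr = max(lr * factor, min_lr)
                acc_wait = 0
        # EarlyStopping logic, monitoring validation loss
        if loss < best_loss:
            best_loss, loss_wait, best_epoch = loss, 0, epoch
        else:
            loss_wait += 1
            if loss_wait >= stop_patience:
                return lrs, epoch, best_epoch
    return lrs, len(lrs) - 1, best_epoch


# Hypothetical validation histories, for illustration only.
val_acc = [0.70, 0.85, 0.90, 0.90, 0.89, 0.92, 0.91, 0.91, 0.91]
val_loss = [0.60, 0.40, 0.30, 0.31, 0.32, 0.25, 0.26, 0.27, 0.28]
lrs, stopped_epoch, best_epoch = simulate_callbacks(val_acc, val_loss)
print(stopped_epoch, best_epoch)  # 8 5
```

In this made-up run the learning rate is halved twice before training stops at epoch index 8, and restore_best_weights would return the model to its state at epoch index 5.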

2.5.3 Training Configuration


The model was trained with the following configuration:
cat_dog = model.fit(train_generator,
                    validation_data=val_generator,
                    callbacks=[early_stoping,
                               learning_rate_reduction],
                    epochs=30)

• Batch Size: 32 (defined earlier)
• Maximum Epochs: 30, though early stopping could terminate training earlier
• Training Data: Augmented images from the training set
• Validation Data: Non-augmented images from the validation set
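For a sense of scale, the configuration above can be turned into a rough count of weight updates. The sketch uses the approximate training-set counts from Section 2.3 and assumes the generator yields one final partial batch per epoch.

```python
import math

train_images = 2263 + 2267   # approximate training-set counts from Section 2.3
batch_size = 32
max_epochs = 30

steps_per_epoch = math.ceil(train_images / batch_size)  # last batch is partial
max_updates = steps_per_epoch * max_epochs              # upper bound; early stopping may cut this
print(steps_per_epoch, max_updates)  # 142 4260
```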

3. Results
3.1 Model Performance Metrics
The final model (the custom CNN, see Section 4.3.3) performed consistently on the
training, validation, and test sets:

3.1.1 Accuracy Metrics


• Training Accuracy: 94.2%
• Validation Accuracy: 93.8%
• Test Accuracy: 94.0%

3.1.2 Loss Metrics


• Training Loss: 0.158
• Validation Loss: 0.187
• Test Loss: 0.169

3.2 Learning Curves


The learning curves showed a consistent improvement in model performance over the
training epochs:

3.2.1 Accuracy Curves


The accuracy curves demonstrated:
• Rapid increase in the first 3-5 epochs (from ~70% to ~90%)
• Gradual improvement in the subsequent epochs
• Minimal gap between training and validation accuracy, indicating good generalization

3.2.2 Loss Curves


The loss curves showed:
• Sharp decrease in the first few epochs
• Gradual convergence thereafter
• Close tracking between training and validation loss, further confirming good
generalization

3.3 Classification Report


The detailed classification report showed balanced performance across both classes:
              precision    recall  f1-score   support

         Cat       0.94      0.94      0.94       283
         Dog       0.94      0.94      0.94       284

    accuracy                           0.94       567
   macro avg       0.94      0.94      0.94       567
weighted avg       0.94      0.94      0.94       567

• Precision: 0.94 for both cat and dog classes
• Recall: 0.94 for both cat and dog classes
• F1-Score: 0.94 for both cat and dog classes
• Support: Nearly identical for both classes (283 cats and 284 dogs)

3.4 Confusion Matrix


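The confusion matrix gives the detailed classification outcomes on the test set (rows are
true classes, columns are predicted classes, in the order Cat, Dog):
[[266  17]
 [ 17 267]]

• Cats correctly classified as Cat: 266
• Cats misclassified as Dog: 17 (false negatives for the Cat class)
• Dogs misclassified as Cat: 17 (false positives for the Cat class)
• Dogs correctly classified as Dog: 267

These results show a highly balanced model with nearly identical performance across
classes.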
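The figures in the classification report can be recomputed directly from the confusion matrix. The sketch below does this in plain Python, reading rows as true classes and columns as predicted classes. The helper name per_class_metrics is hypothetical.

```python
def per_class_metrics(cm, idx):
    """Precision, recall, and F1 for the class at position idx."""
    tp = cm[idx][idx]
    fn = sum(cm[idx]) - tp                   # rest of the true-class row
    fp = sum(row[idx] for row in cm) - tp    # rest of the predicted-class column
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


cm = [[266, 17],
      [17, 267]]
accuracy = (cm[0][0] + cm[1][1]) / sum(map(sum, cm))
cat = per_class_metrics(cm, 0)
dog = per_class_metrics(cm, 1)
print(round(accuracy, 2))             # 0.94
print([round(v, 2) for v in cat])     # [0.94, 0.94, 0.94]
print([round(v, 2) for v in dog])     # [0.94, 0.94, 0.94]
```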

4. Discussion
4.1 Model Performance Analysis
The model demonstrated strong performance in distinguishing between cats and dogs,
achieving an accuracy of 94% on the test dataset. Several factors contributed to this
success:

4.1.1 Data Augmentation Impact


The application of image augmentation techniques improved the model’s generalization
capabilities. Because the transformations artificially expanded the training dataset, the
model learned to recognize cats and dogs in various positions, orientations, and scales.
The reflection-based fill mode also reduced the border artifacts that shifts and rotations
would otherwise introduce and that could confuse the model.
4.1.2 Regularization Effectiveness
The use of dropout layers (with a rate of 0.2) and batch normalization after each
convolutional block effectively limited overfitting, as evidenced by:
• The small gap between training and validation accuracy (~0.4 percentage points)
• Consistent performance on the test set
• Smooth learning curves without wild fluctuations

4.1.3 Architectural Decisions


The progressive increase in filter numbers (32 → 64 → 128 → 256) allowed the model to
learn hierarchical features effectively:
• Lower layers captured basic features like edges and textures
• Middle layers learned patterns like whiskers, ears, and paws
• Higher layers identified complex features unique to cats and dogs

4.2 Challenges Encountered


During model development, several significant challenges were encountered:

4.2.1 Image Variability


The dataset contained images with:
• Different resolutions (from low-quality to high-resolution)
• Various lighting conditions (indoor, outdoor, bright, dim)
• Different poses and orientations of animals
• Multiple animals in a single image
• Partial visibility of animals
To address this, a combination of resizing and data augmentation was employed to
normalize inputs while preserving important features.

4.2.2 Invalid Images


The original dataset contained corrupted or invalid images that required filtering. The
is_valid_image() function successfully identified and removed problematic files,
preventing training errors. Approximately 5-10% of the original images were found to be
invalid.

4.2.3 Class Balance


Ensuring balanced representation of both classes was crucial for achieving unbiased model
performance. This was accomplished through:
• Stratified sampling during train-test splits
• Monitoring class distributions in the generated batches
• Equal augmentation applied to both classes

4.2.4 Computational Resources


Training deeper models, particularly those using transfer learning, required significant
computational resources. To manage this constraint:
• Image size was limited to 128x128 pixels
• Batch size was set to 32
• Early stopping was implemented to prevent unnecessary epochs
• Learning rate reduction was used to fine-tune convergence without excessive iterations

4.3 Model Improvements Over Time
The model’s performance improved significantly through several iterations:

4.3.1 Architectural Refinements


Initial experiments with simpler architectures (fewer layers, fewer filters) achieved ~85%
accuracy. The introduction of additional convolutional blocks and increased filter counts
improved feature extraction capabilities, resulting in a ~5% accuracy boost.

4.3.2 Hyperparameter Tuning


Systematic hyperparameter optimization led to notable improvements:
• Adjusting dropout rates from 0.5 to 0.2 improved training dynamics
• Batch normalization stabilized learning and accelerated convergence
• Learning rate reduction allowed fine-grained optimization in later epochs

4.3.3 Transfer Learning Comparison


Both ResNet50 and DenseNet121 models were tested as alternatives to the custom CNN.
These pre-trained models showed:
• Faster initial convergence (reaching ~90% accuracy in 2-3 epochs)
• Comparable final accuracy (~94-95%)
• Higher computational requirements
The custom CNN was ultimately chosen for the final model due to its balance of
performance and efficiency.

5. Conclusion
This project successfully implemented a deep learning model for cat and dog image
classification, achieving approximately 94% accuracy on test data. The results demonstrate
the effectiveness of convolutional neural networks for image classification tasks.

5.1 Key Findings


1. CNN Architecture: The custom CNN architecture with progressively increasing
filter counts (32→64→128→256) proved highly effective for pet image classification.
2. Regularization Importance: The combination of dropout (0.2) and batch
normalization was crucial in preventing overfitting, allowing the model to
generalize well to unseen data.
3. Data Augmentation Impact: Image augmentation techniques enhanced model
performance by exposing the model to a wider variety of image transformations.
4. Transfer Learning Viability: While transfer learning models performed well, the
custom CNN achieved comparable results with lower computational requirements.
5. Balanced Performance: The model demonstrated nearly identical precision, recall,
and F1-scores for both cat and dog classes, indicating robust and unbiased
classification.

5.2 Limitations
Despite the strong performance, several limitations should be acknowledged:
1. Image Resolution: The restriction to 128x128 pixels may have limited the model’s
ability to capture fine-grained details.
2. Binary Classification: The model is limited to binary classification (cats vs. dogs)
and would require significant modifications for multi-class scenarios.
3. Background Influence: The model may be influenced by background elements
rather than focusing exclusively on the animals.

5.3 Future Work


Several avenues for future improvement include:
1. Higher Resolution: Training with larger images (e.g., 224x224 or 299x299) could
improve feature extraction at the cost of increased computational requirements.
2. Attention Mechanisms: Implementing attention mechanisms could help the model
focus on the animals rather than background elements.
3. Ensemble Methods: Combining predictions from multiple models (custom CNN,
ResNet, DenseNet) could potentially increase accuracy further.
4. Explainability: Adding visualization techniques like Grad-CAM could provide
insights into which image regions influence classification decisions.
5. Deployment Optimization: Converting to TensorFlow Lite or ONNX format could
facilitate deployment on mobile or edge devices.

This project has demonstrated the practical application of deep learning techniques to a
real-world image classification problem, achieving excellent results that could be valuable
in various pet-related applications.

