
ANALYSING NEAR EARTH OBJECTS &

VISUALIZING HAZARDS

A PROJECT REPORT

Submitted by

Amirthavarshini P M (913120205005)

Swathi K (913120205073)

in partial fulfillment for the award of the degree


of
BACHELOR OF TECHNOLOGY

in

INFORMATION TECHNOLOGY

VELAMMAL COLLEGE OF ENGINEERING AND TECHNOLOGY,


(AUTONOMOUS)
MADURAI-625 009

ANNA UNIVERSITY :: CHENNAI 600 025


MAY 2024

ANNA UNIVERSITY: CHENNAI 600 025

BONAFIDE CERTIFICATE

Certified that this project report “ANALYSING NEAR EARTH OBJECTS & VISUALIZING HAZARDS” is the bonafide work of
“Amirthavarshini P M (913120205005) and Swathi K (913120205073)”
who carried out the project work under my supervision.

SIGNATURE
Dr. R. Kavitha M.E., Ph.D
HEAD OF THE DEPARTMENT
Professor and Head
Department of Information Technology
Velammal College of Engineering and Technology

SIGNATURE
Mrs. A. Vinora
SUPERVISOR
Assistant Professor
Department of Information Technology
Velammal College of Engineering and Technology

Submitted for the Project Viva-Voce Examination held on __________ at Velammal College of Engineering and Technology, Madurai.

INTERNAL EXAMINER EXTERNAL EXAMINER

ACKNOWLEDGEMENT

We thank the almighty for giving us the moral strength to work on the project for the past few months.

We would like to express our thanks to the management for giving us the opportunity to carry out our research work.

Our sincere thanks to our respected Principal Dr. P. ALLI for the encouragement and motivation offered to us during the period of our research study.

Our heartfelt gratitude to Dr. R. KAVITHA, Head of the Department, Information Technology, for her valuable guidance, inspiration and encouragement, which helped us a lot in completing this project in time.

We convey our thanks to our guide, Mrs. A. VINORA, for her innovative suggestions and valuable guidance.

We would also like to extend our sincere gratitude to all faculty members of the Department of Information Technology for their valuable guidance throughout the course of our project. We also thank our parents and friends who provided moral and physical support.

ABSTRACT

This project combines data exploration and machine learning techniques to analyse NASA's Near-Earth Object (NEO) database. With a primary focus on characterizing NEOs, predicting potential hazards, and building intuitive visualizations, the study covers data preprocessing, exploratory data analysis, machine learning model training, hazard prediction, and visual representation of the results. The preprocessing stage applies systematic methods that ensure consistency and expose otherwise hidden structure in the NASA NEO database. The exploratory analysis then reveals patterns and trends, with emphasis on the distribution, orbital characteristics, and temporal variation of NEOs. At the core of the work is machine learning model training, in which the algorithms are tuned using insights drawn from historical data. The trained models are used to predict the potential hazards associated with NEOs, supporting earlier anticipation of celestial threats. Finally, the visualizations present the spatial and temporal distributions of NEOs alongside the hazard predictions.

TABLE OF CONTENT

CHAPTER NO TITLE PAGE NO

ABSTRACT iii

LIST OF FIGURES v

1. INTRODUCTION 1
1.1. Overview 1
1.2. Asteroids and Comets 4
1.3. Introduction to Machine Learning 6

2. LITERATURE SURVEY 9
2.1. Research papers 9
2.2. Bottlenecks of Survey papers 13

3. METHODOLOGY 16
3.1. Dataset 16
3.2. Data Pre-Processing 17
3.3. Proposed System 18
3.4. Software Requirements 23
3.5. Hardware Requirements 23

4. RESULTS & DISCUSSION 24


4.1. Predicting Best Machine 24
Learning Model

5. CONCLUSION 30

APPENDIX A 32
APPENDIX B 50
APPENDIX C 52
REFERENCES 54

LIST OF FIGURES

Figure No Figure name Page No


1.1 Frequency of NEOs by Size, Impact Energy 5

1.2. Asteroid Orbit Types 6

1.3. Random Forest Classifier 8

3.1. Neo dataset 17

3.2 Feature importance for important parameters 22

3.3 Feature importance for all parameters 23

4.1 Accuracy scores 26

4.2 Precision Score For All Parameters 27

4.3 Scatterplot 28

CHAPTER 1

INTRODUCTION

1.1 Introduction and Overview


1.1.1 Purpose
The purpose of this report is to fulfill the requirements outlined in the George E.
Brown Jr. Near-Earth Object Survey Act, which was included in Public Law No.
109-155, the NASA Authorization Act of 2005 (referenced in
Appendix A and Reference [1]). The Act mandates that the Administrator of
NASA initiate a Near-Earth Object (NEO) Survey program with the following
objectives: detection, tracking, cataloging, and characterization of objects larger
than 140 meters in diameter with a perihelion distance of less than 1.3 AU
(astronomical units) from the Sun. The program aims to detect these NEOs to
understand their orbital paths and potential impact risks, systematically catalog
them for ongoing monitoring, and characterize their physical properties such as
size, composition, rotational characteristics, and orbital parameters. This
comprehensive approach is designed to provide early warning and inform
mitigation efforts in case any of these detected NEOs pose a hazard to Earth. The
program is mandated to achieve a 90% completion rate within 15 years from the
date of the Act's enactment, underscoring the urgency and long-term commitment
required to address the NEO threat effectively.

1.1.2. Statement of Need


In the NEO Survey Act, Congress made significant findings regarding Near-Earth
Objects (NEOs) and their potential threat to humanity. Firstly, Congress
acknowledged that NEOs pose a serious and credible threat based on scientific
theories linking a major asteroid or comet impact to the mass extinction event that
wiped out the dinosaurs and many other species approximately 65 million years
ago. This historical context highlights the destructive potential of NEO impacts.

Additionally, Congress noted that similar NEOs have impacted Earth or entered
its atmosphere multiple times throughout history, indicating a recurrent threat that
continues to pose risks in the future. The discovery timeline of certain NEOs, often
identified only days before their closest approach to Earth, highlights the
challenges in timely detection and monitoring of these objects.

Despite recent discoveries of large NEOs, Congress expressed concerns about the
significant number of undetected NEOs, emphasizing the ongoing need for
comprehensive and continuous efforts in NEO detection and tracking.

The findings outlined in the NEO Survey Act highlighted the critical importance of implementing a robust NEO Survey program. Such a program aims to enhance detection, tracking, cataloging, and characterization efforts, ultimately improving preparedness, early warning systems, and potential mitigation strategies to reduce the risks associated with NEO impacts on Earth.

1.1.3. Direction
The NEO Survey Act, as outlined in Appendix A, amended the Space Act of 1958
with specific directives regarding the responsibilities of the National Aeronautics
and Space Administration (NASA) in addressing the threat of near-Earth asteroids
and comets. Congress declared that safeguarding the general welfare and security
of the United States necessitates directing NASA's unique competence towards
detecting, tracking, cataloging, and characterizing these near-Earth objects. The
primary goal is to provide early warning and mitigation strategies for potential
hazards they pose to Earth.

Furthermore, the Act mandates the Administrator of NASA to plan, develop, and implement a comprehensive Near-Earth Object Survey program. This program
focuses on detecting, tracking, cataloging, and characterizing the physical
properties of near-Earth objects that are equal to or greater than 140 meters in
diameter. The objective is to assess the threat posed by such objects to Earth's
safety. A key target of the Survey program is to achieve a 90 percent completion
rate of its near-Earth object catalogue, based on statistically predicted populations
of these objects, within 15 years from the date of the Act's enactment.

Additionally, Congress directed the NASA Administrator to deliver specific outcomes within one year from the law's enactment:

1. Conduct an analysis of possible alternatives that NASA may employ to carry out the Survey program. This includes exploring ground-based and space-based alternatives, accompanied by detailed technical descriptions.
2. Provide a recommended option along with a proposed budget to execute the
Survey program in accordance with the recommended approach.
3. Perform an analysis of potential alternatives that NASA could utilize to divert
an object that is on a likely collision course with Earth, thereby mitigating the
potential impact threat.

1.2. Asteroids and Comets
Asteroids and comets are the primary focus of discussion in this study due to their
potential hazard to Earth. Astronomers differentiate between these celestial bodies
based on their appearance. Asteroids typically appear as star-like points of light
when observed, while comets exhibit a diffuse appearance with visible tails.

The existence of asteroids was not known until roughly 200 years ago when
telescopes advanced enough to detect these small celestial objects. On the other
hand, people have been aware of comets since ancient times due to their distinctive
tails visible to the naked eye.

The differences in appearance also reflect differences in composition. Asteroids are generally rocky or metallic objects lacking atmospheres. In contrast, comets
contain volatile materials such as water ice that vaporize when exposed to heat.
Consequently, comets that are farther from the Sun or have lost most of their
volatiles may resemble asteroids, while those rich in volatiles develop an
atmosphere only upon close approach to the Sun.

Asteroids are further categorized into different groups based on their orbits
relative to Earth:
- Apollos: Orbits cross Earth's orbit with a period exceeding 1 year.
- Atens: Orbits cross Earth's orbit with a period less than 1 year.
- Amors: Orbits entirely outside Earth's orbit.
- Interior Earth Objects (IEOs): Orbits entirely within Earth's orbit.

The distribution of these asteroid categories within the Near-Earth Object (NEO) population is illustrated in Figure 1.2, providing insight into the varying orbits and potential impact risks associated with the different types.

Figure 1.1. Frequency of NEOs by Size, Impact Energy, and Magnitude

1.2.1. Population of Near-Earth Objects

The analysis presented in Figure 1.1 utilizes a constant power law to estimate the number of Near-Earth Objects (NEOs) based on their size. This power-law approximation implies a roughly hundred-fold increase in the number of NEOs for each order-of-magnitude decrease in diameter.
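As a hedged illustration of this scaling (an assumption based only on the hundred-fold-per-decade statement above, i.e. a cumulative count behaving roughly as N(>D) proportional to D^-2, not a value read from the figure), the relative population sizes can be sketched as follows:

# Illustrative sketch only: relative NEO counts under the constant power-law
# approximation described above, where a ten-fold decrease in diameter
# corresponds to a hundred-fold increase in count (N(>D) ~ D**-2).
reference_diameter_km = 1.0                      # assumed reference size class
for diameter_km in [1.0, 0.140, 0.050]:
    relative_count = (reference_diameter_km / diameter_km) ** 2
    print(f"D > {diameter_km * 1000:.0f} m -> roughly {relative_count:,.0f}x the 1 km population")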

Figure 1.1 also provides additional information such as the approximate absolute magnitude (brightness) of the NEOs, their average impact interval, and the
estimated impact energy they would deliver in the event of a collision with Earth.
These factors are crucial in understanding the potential threat posed by NEOs of
varying sizes.

It is important to note that while the estimates presented in Figure 1.1 are valuable,
they may have a margin of error within a factor of two or three in any given size
class. This margin of error is due to limitations in observations, particularly in size
classes where there are not enough data points to form a statistically valid sample.

In this report, the term PHO (Potentially Hazardous Objects) is used to indicate
potential threats among NEOs. It is noted that NEOs smaller than 1 km are
predominantly asteroids, as comets do not significantly contribute to the
population below this size threshold. This distinction helps in categorizing and
understanding the potential hazard levels associated with different sizes of NEOs,
aiding in risk assessment and mitigation planning.

Figure 1.2. Asteroid Orbit Types

1.3 Introduction to Machine Learning


Supervised machine learning algorithms are foundational tools in predictive
modeling, categorizing data, and solving classification or regression problems.
These algorithms are trained using labeled datasets to learn patterns and
relationships within the data, enabling them to make accurate predictions on unseen
or unlabeled data. One of the key distinctions of supervised learning is the
availability of labeled data for training, which guides the algorithm in
understanding the desired output.

1.3.1 Logistic regression (LR)

Logistic regression (LR) is a widely-used supervised classification method that predicts the probability of an instance belonging to a certain class. It's especially
useful for binary classification tasks where the outcome is dichotomous, such as
'yes' or 'no,' 'positive' or 'negative.' LR models the relationship between input
features and the probability of the target class, making it a powerful tool for
probability-based classification.
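
For illustration only (this is a minimal sketch on synthetic stand-in data, not the project's pipeline, which appears in Appendix A), logistic regression can be fitted and queried for class probabilities as follows:

# Minimal sketch: logistic regression returning class probabilities on synthetic data.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, n_features=7, random_state=42)   # stand-in for NEO features
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
lr = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print(lr.predict_proba(X_test[:3]))   # probability of each class for three held-out samples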

1.3.2 Support Vector Machine (SVM)

Support Vector Machine (SVM) is another supervised learning algorithm capable of classifying both linear and non-linear data. SVM works by mapping data points
into a high-dimensional feature space and finding the hyperplane that best separates
different classes while maximizing the margin between them. This margin
maximization helps SVM achieve good generalization performance and handle
complex classification tasks.
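
A brief sketch (again on synthetic stand-in data, as an assumption rather than project code) of how the kernel choice switches between linear and non-linear decision surfaces, with decision_function giving each sample's signed distance from the separating hyperplane:

# Minimal sketch: linear vs RBF kernels; decision_function reports signed margin distances.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_classification(n_samples=500, n_features=7, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
for kernel in ('linear', 'rbf'):
    svm = SVC(kernel=kernel).fit(X_train, y_train)
    print(kernel, svm.score(X_test, y_test), svm.decision_function(X_test[:2]))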

1.3.3 Decision Tree (DT)

Decision Tree (DT) algorithms construct tree-like structures where internal nodes
represent feature tests, and leaf nodes represent class labels. DTs are intuitive and
easy to interpret, making them valuable for decision-making processes. They
recursively partition the data based on feature splits, leading to hierarchical
decision rules that classify instances into distinct classes.
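
Because the learned rules are explicit feature tests, they can be printed and inspected, which is what makes DTs easy to interpret; a minimal sketch on synthetic stand-in data:

# Minimal sketch: fit a shallow decision tree and print its hierarchical decision rules.
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = make_classification(n_samples=500, n_features=4, random_state=1)
tree = DecisionTreeClassifier(max_depth=3, random_state=1).fit(X, y)
print(export_text(tree))   # human-readable feature tests and leaf class labels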

1.3.4 Random Forest (RF)

Random Forest (RF) is an ensemble method that combines multiple decision trees
to improve predictive accuracy and reduce overfitting. Each tree in the forest is trained on a random subset of the training data, and the final prediction is made by
aggregating the predictions of individual trees. RF mitigates the high variance
associated with deep decision trees, making it robust and suitable for various
classification tasks.

Figure 1.3 Random Forest Classifier
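
The ensemble idea in Figure 1.3 can be expressed in a few lines; the per-feature importances extracted here are the same quantity plotted later in Figures 3.2 and 3.3 (a hedged sketch on synthetic stand-in data, not the project code):

# Minimal sketch: many trees trained on random subsets, predictions aggregated by vote;
# feature_importances_ summarizes how much each feature contributed to the splits.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=500, n_features=7, random_state=2)
rf = RandomForestClassifier(n_estimators=200, random_state=2).fit(X, y)
print(rf.feature_importances_)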

1.3.5 K-Nearest Neighbors (KNN)

K-Nearest Neighbors (KNN) is a straightforward classification algorithm that classifies instances based on their similarity to neighboring data points. KNN
assigns the majority class among the K nearest neighbors of an instance, making it
sensitive to local patterns and suitable for simple classification tasks.
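
A small sketch (synthetic stand-in data) of how the choice of K, the only tuning knob discussed above, changes the result:

# Minimal sketch: vary the number of neighbours K and compare test accuracy.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=500, n_features=7, random_state=3)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=3)
for k in (1, 5, 15):
    knn = KNeighborsClassifier(n_neighbors=k).fit(X_train, y_train)
    print(k, knn.score(X_test, y_test))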

1.3.6 Artificial Neural Networks (ANNs)

Artificial Neural Networks (ANNs) mimic the interconnected structure of biological neurons in the brain. ANNs consist of layers of nodes connected by
weighted edges, where each node performs a transformation on incoming data.
Deep learning, a subset of ANNs, involves networks with multiple hidden layers,
enabling them to learn complex hierarchical representations from data. ANNs are
powerful for tasks like image recognition, natural language processing, and pattern
recognition, leveraging their ability to learn intricate patterns and relationships.
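
A minimal Keras sketch of the layered structure described above, mirroring the small network used in Appendix A but run here on synthetic stand-in data:

# Minimal sketch: a fully connected network with one hidden layer for binary classification.
import numpy as np
from keras.models import Sequential
from keras.layers import Dense

X = np.random.rand(500, 8)                        # 8 stand-in input features
y = (X.sum(axis=1) > 4).astype(int)               # synthetic binary labels
model = Sequential()
model.add(Dense(12, input_dim=8, activation='relu'))
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
model.fit(X, y, epochs=5, batch_size=16, verbose=0)
print(model.evaluate(X, y, verbose=0))            # [loss, accuracy]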

CHAPTER 2

LITERATURE SURVEY

2.1 Research papers

1. G. Deva Ram Ganesh, P. Vidyullatha, M. Ravi Krishna, S. T. Prapulla, A. Pavan Saran and P. Ramya, "Machine Vision based Object Detection using Deep Learning Techniques," 2023 3rd International Conference on Smart Data Intelligence (ICSMDI), Trichy, India, 2023.
The authors employed Convolutional Neural Networks (CNNs) to categorize hyperspectral images. Unlike traditional methods that rely solely on spectral features, CNNs consider both spectral and spatial features, leading to enhanced classification accuracy. The CNN architecture processes information within 13x13 patches, allowing it to capture intricate patterns and relationships within the data. Although CNNs require more computational resources than other classifiers, they outperform them significantly, achieving the best hyperspectral image classification accuracy in the reported simulation results.
2. T. Hoffman et al., "Near-Earth Object Surveyor Project Preliminary
Design," 2023 IEEE Aerospace Conference, Big Sky, MT, USA, 2023
In the preliminary design phase of the NEOS Project, T. Hoffman and
collaborators have made substantial progress towards the project's objectives,
which include identifying, characterizing, and cataloging Near-Earth Objects
(NEOs). The project, involving collaborations with the University of Arizona,
NASA's Jet Propulsion Lab, and industry partners, has advanced critical
components and spacecraft design. This progress positions the NEOS Project well
for the upcoming detailed design phase, aligning closely with NASA's planetary
defense goals.

3. H. Lin, C. Sun and Y. Liu, "OBBStacking: An Ensemble Method for
Remote Sensing Object Detection," in IEEE Journal of Selected Topics in
Applied Earth Observations and Remote Sensing
The authors introduce an innovative ensemble method called OBBStacking, designed for remote sensing object detection using oriented bounding boxes (OBBs). OBBStacking effectively combines OBB-based detection results and leverages confidence indicators from deep learning object detectors. It achieved leading results in the 2021 Gaofen Challenge on automated high-resolution Earth observation image interpretation and showed enhanced performance on datasets such as DOTA and FAIR1M.
4. D. Khajuria, A. Sharma, N. Sharma and M. Mangla, "Classification and
Comparative Analysis of Earth's Nearest Objects using Machine Learning
Models," 2023 10th International Conference on Computing for Sustainable
Global Development (INDIACom), New Delhi, India, 2023
This research uses machine learning to classify asteroids (from the NASA Nearest Earth Objects dataset) as hazardous or non-hazardous, aiming to unveil space mysteries. Visualization aids the data analysis, culminating in an efficient prediction model evaluated on accuracy and showcasing machine learning's efficacy in tackling astronomical challenges.
5. L. Z. Ladeira, F. Santos, L. Cléopas, P. Buteneers and L. Villas, "NEO-
NDA: Neo Natural Language Data Augmentation," 2022 IEEE 16th
International Conference on Semantic Computing (ICSC), Laguna Hills, CA,
USA, 2022
The authors introduced NEO-NDA (NEO Natural Language Data Augmentation), a
versatile tool for generating and rebalancing textual datasets. Unlike many existing
approaches, NEO-NDA operates across multiple languages, offering a wide range
of modifications to generate novel data samples. Their system significantly enhances the performance of machine learning models, often doubling their effectiveness compared to original datasets without augmentation.
6. R. N. Ranaweera and T. Fernando, "Prediction of Potentially Hazardous
Asteroids using Deep Learning," 2022 2nd International Conference on
Advanced Research in Computing (ICARC), Belihuloya, Sri Lanka, 2022,
They utilized deep neural networks to analyze the orbital data of Near-Earth
Asteroids (NEAs). By allowing the model to learn intricate patterns and features
within the dataset, the approach efficiently classifies objects as harmful or harmless
based on acquired insights. This enhances the efficiency of identifying and
characterizing celestial bodies, contributing to efforts aimed at preventing potential
impacts on Earth.
7. R. J. Wainscoat, K. Chambers, R. Weryk, Y. Ramanjooloo, M. Huber
and E. Magnier, "Recent Advances in the Pan-STARRS Search for Near-
Earth Objects," 2022 IEEE Aerospace Conference (AERO), Big Sky, MT,
USA, 2022,
They described the Pan-STARRS telescopes' search for near-Earth objects from Maui, with Pan-STARRS2 overcoming early technical challenges like
mirror coating and mount issues. With improved reporting latency and same-night
follow-up, it aids in discovering interstellar objects like 'Oumuamua and is poised
as a potent tool for identifying impacting asteroids.
8. T. Hromakina et al., "NEOROCKS project: surface properties of small
near-Earth asteroids," in Monthly Notices of the Royal Astronomical
Society, vol. 520, no. 2, pp. 3143-3150, Sept. 2022,
They described how the NEOROCKS project, observing from the Haute-Provence Observatory, classified 42 NEOs, adding to a dataset of 93 objects and identifying potential spacecraft-accessible targets such as (163014) 2001 UA5 and 2017 SE19 for planetary defense research and missions.

9. C. Rosu and V. Bacu, "Asteroid Image Classification Using Convolutional
Neural Networks," 2021 IEEE 17th International Conference on Intelligent
Computer Communication and Processing (ICCP), Cluj-Napoca, Romania,
2021,
The authors introduce a convolutional neural network (CNN) model designed specifically for classifying astronomical images to detect asteroids. Trained on original data from the Isaac Newton Telescope in La Palma, the system aims to provide automated, accurate asteroid detection while minimizing false negatives. Trained on single images, the model achieved a 94.01% recall rate, showcasing its potential for precise asteroid detection.
10. V. Bahel, P. Bhongade, J. Sharma, S. Shukla and M. Gaikwad,
"Supervised Classification for Analysis and Detection of Potentially
Hazardous Asteroid," 2021 International Conference on Computational
Intelligence and Computing Applications (ICCICA), Nagpur, India, 2021.
The authors emphasized the growing importance of artificial intelligence (AI) in
space science, particularly for mitigating asteroid-related risks. Their study
involved a meticulous understanding of Potentially Harmful Asteroids (PHAs) and
employed supervised machine learning (ML) to assess asteroid hazard based on
specific parameters. Among various classification algorithms, Random Forest
stood out with an average F1-score of 99.22% and an exceptional 99.9% accuracy,
offering a robust solution for asteroid risk evaluation.
11. M. Veto et al., "Ball Aerospace Spacecraft and Systems for the Near-
Earth Object Surveyor Mission," 2021 IEEE Aerospace Conference (50100),
Big Sky, MT, USA, 2021.
The mission, developed by Ball Aerospace for NASA's NEOSM, aims to detect >140-meter Near-Earth Objects, enhancing planetary defense and informing policy through a comprehensive survey that leverages Ball's spacecraft and infrared expertise from NEOWISE. This collaborative effort combines scientific, engineering, and
institutional heritage to advance understanding of NEOs' physical properties and
their origins in the early solar system.
12. L. Gentile, A. Capria, M. Conti, C. Moscardini and M. Martorella,
"Resident Space Object Passive Bistatic Radar Detection Using DVB-S2
Signals," 2018 19th International Radar Symposium (IRS), Bonn, Germany,
2018,
This work explores the feasibility of a DVB-S2 passive multi-bistatic
radar for tracking Resident Space Objects (RSOs), contrasting its benefits with
active ground-based systems. Unlike optical systems, this radar operates day and
night, regardless of weather conditions, offering a robust monitoring solution for
Earth's orbit.[12]
13. R. A. Mohammed and M. Y. Alradi, "Model of planetary radar for
tracking near-Earth asteroids," 2017 Fifth International Conference on
Aerospace Science & Engineering (ICASE), Islamabad, Pakistan, 2017.
The authors discussed the increasing hazards posed by asteroids in the asteroid belt
located between Mars and Jupiter, within a range of 2.2 to 3.2 astronomical units
(AU). The study highlighted the use of planetary radars for tracking celestial
objects and safeguarding Earth. It detailed a simulation-based approach that
involved analyzing radar cross-sectional data and modeling signal processing using
MATLAB and Simulink. This approach allowed for the examination of moving
asteroids' behavior and contributed to strategies aimed at mitigating asteroid threats.

2.2 Bottlenecks of Survey papers:


The literature survey reveals several bottlenecks in current methodologies for
analyzing celestial objects. Firstly, the use of Convolutional Neural Networks
(CNNs) for hyperspectral image classification, as demonstrated by G. Deva Ram Ganesh et al., presents a challenge due to its high computational resource
requirements. This limitation could hinder real-time applications or deployment in
resource-constrained environments, impacting the practicality and scalability of
the approach. Similarly, T. Hoffman et al.'s NEOS Project, while making
significant progress, may
encounter challenges related to the detailed design phase. The lack of specific
details about algorithms or methodologies used could impede a comprehensive
assessment of technical feasibility, potentially affecting the project's effectiveness
and future scalability.
Furthermore, the effectiveness of data augmentation techniques like NEO-NDA by
Z. Ladeira et al. may be influenced by their ability to handle extremely imbalanced
datasets or rare classes. Generalizability across different scenarios and datasets is
crucial for ensuring the reliability and applicability of these techniques in diverse
contexts.
Additionally, while CNN models like the one introduced by C. Rosu et al. for
asteroid detection showcase impressive recall rates, their generalization to diverse
datasets and environmental conditions may require further validation and
optimization for real-world deployment. Moreover, advancements in geo-spatial
sensing object identification, as proposed by H. Lin's OBBStacking method, may
vary in performance based on the complexity and variability of the data. Thorough
evaluation and benchmarking against diverse datasets are necessary to assess the
method's reliability and effectiveness across different scenarios. Lastly,
simulation-based approaches, as discussed in R. A. Mohammed's work on
planetary radars for asteroid tracking, may face challenges in accurately predicting
real-world scenarios, especially concerning unexpected asteroid behaviors or
events.
In conclusion, our proposed Advanced Analysis and Classification of Near-Earth
Objects (NEOs) project aims to overcome these identified bottlenecks effectively.

By leveraging a comprehensive approach that includes data quality assurance,
robust preprocessing techniques, and advanced machine learning algorithms such
as Random Forest, Neural Networks, and Gradient Boosting, we enhance
predictive capabilities while ensuring scalability, generalizability, and
interpretability. The project's focus on real-world applications, planetary defense
implications, and international collaboration further strengthens its efficacy in
advancing space science and AI for NEO analysis and classification.

CHAPTER 3
METHODOLOGY
3.1. Dataset
NASA's Near-Earth Object (NEO) Program manages a vast and essential dataset
crucial for understanding asteroids and comets that come close to Earth. This
dataset is multifaceted, comprising critical components like the NEO Confirmation
Page and the NEO Earth Close-Approaches Database.

The NEO Confirmation Page is a real-time resource that provides detailed information on confirmed NEOs, including discovery dates, orbits, sizes, and Earth
impact probabilities. This resource aids in tracking newly discovered NEOs and
understanding their characteristics.

The NEO Earth Close-Approaches Database documents instances when NEOs come close to Earth, providing data on close approach dates, distances, velocities,
and physical properties. This database helps assess potential impact risks and plan
observation and mitigation strategies.

NASA collaborates globally with observatories, research institutions, and international partners to enhance data quality and enrich the overall NEO dataset.
This collaboration contributes valuable observational data, aiding impact risk
assessments conducted by NASA's Center for Near-Earth Object Studies
(CNEOS).

In summary, NASA's NEO dataset is a vital tool for studying, monitoring, and
mitigating potential hazards from NEOs, ensuring Earth's safety from potential
impact events.

Figure 3.1. Neo dataset
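
As a hedged sketch of how such data can also be pulled programmatically (this assumes NASA's public NeoWs web service and its 'near_earth_objects' and 'is_potentially_hazardous_asteroid' fields; the project itself works from a CSV export like the one shown in Figure 3.1):

# Sketch only: query one week of close approaches from NASA's NeoWs feed and
# count how many returned objects are flagged as potentially hazardous.
import requests

url = "https://api.nasa.gov/neo/rest/v1/feed"
params = {"start_date": "2024-01-01", "end_date": "2024-01-07", "api_key": "DEMO_KEY"}
feed = requests.get(url, params=params, timeout=30).json()
for date, objects in feed["near_earth_objects"].items():
    hazardous = sum(o["is_potentially_hazardous_asteroid"] for o in objects)
    print(date, len(objects), "objects,", hazardous, "flagged potentially hazardous")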

3.2 Data preprocessing

Data preprocessing for NEO (Near-Earth Object) analysis involves several steps to clean, transform, and prepare the data for analysis in Excel. The process is outlined below.

The data preprocessing of NEO information in Excel begins with data collection
from reliable sources such as NASA's NEO Program database or other
astronomical repositories. The collected data typically includes attributes like NEO
designation, discovery date, observation arc, orbital parameters (e.g., semi-major
axis, eccentricity, inclination), estimated diameter, and potential impact
probabilities.

Once the data is imported into Excel, the first step is data cleaning, which involves
identifying and handling missing values, correcting errors, and removing
duplicates if any. This ensures the dataset's integrity and accuracy for further
analysis.

The next step is data transformation, where certain columns may need to be
formatted or calculated to derive new variables or metrics relevant to NEO
analysis. For example, converting dates to a standardized format, calculating
orbital velocities or impact probabilities based on provided data, or categorizing
NEOs based on size ranges.

Data normalization or standardization may also be performed to bring consistency to the dataset, especially if the data comes from different sources with varying
formats or units. Finally, data validation and verification are crucial steps to ensure
that the preprocessed data is accurate, consistent, and ready for analysis. This
involves checking for logical inconsistencies, cross-referencing with external
sources if necessary, and validating calculated values or derived metrics.

Overall, data preprocessing in Excel for NEO analysis involves a systematic approach to clean, transform, standardize, filter, and validate the data, ensuring its
quality and usability for subsequent analytical tasks and insights.
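
The same cleaning, transformation, and normalization steps can equally be scripted in Python; a hedged sketch (the column names and size thresholds below are assumptions patterned on the dataset used in Appendix A, not a prescribed procedure):

# Sketch only: duplicate removal, missing-value handling, size bucketing,
# and min-max normalization for a NEO table.
import pandas as pd

neo = pd.read_csv('sample_data/dataset.csv')                          # path used in Appendix A
neo = neo.drop_duplicates()
neo['diameter'] = neo['diameter'].fillna(neo['diameter'].mean())      # assumed numeric column
neo['size_class'] = pd.cut(neo['diameter'],
                           bins=[0, 0.14, 1.0, float('inf')],         # illustrative km thresholds
                           labels=['small', 'medium', 'large'])
for col in ['diameter', 'albedo']:                                    # assumed columns to normalize
    neo[col + '_norm'] = (neo[col] - neo[col].min()) / (neo[col].max() - neo[col].min())
neo.to_csv('neo_preprocessed.csv', index=False)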

3.3 Proposed System

The proposed system aims to enhance the accuracy and efficiency of Near-Earth
Object (NEO) classification using advanced machine learning techniques. By
leveraging state-of-the-art algorithms, including Random Forest, Neural Networks
& Gradient Boosting, the system aims to improve the predictive capabilities for
identifying Potentially Harmful Asteroids among NEOs. The system's
methodology involves feature extraction, preprocessing, and training the machine
learning models on a comprehensive dataset containing key NEO attributes. This
approach enables the development of a robust classification system capable of
categorizing NEOs based on various physical and orbital parameters. To address challenges such as missing data and categorical variables, the system employs data preprocessing techniques, ensuring the quality and completeness of the input
dataset. The utilization of machine learning algorithms facilitates more accurate
predictions, contributing to the early identification of PHAs and enhancing
planetary defense capabilities. The proposed system not only advances the
accuracy of NEO classification but also provides a scalable framework for ongoing
research and monitoring of celestial objects. By combining astronomical data with
cutting-edge machine learning methodologies, the system stands to significantly
impact our understanding of NEOs and improve our ability to assess potential
threats from these cosmic bodies.
The Advanced Analysis and Classification of Near-Earth Objects (NEOs) project
is a multidisciplinary endeavor aimed at enhancing our understanding of NEOs,
which are comets & asteroids that come in close proximity to the Earth. The
primary objectives of this project are to:

1. Data Collection and Preparation: The project begins with gathering and
processing a comprehensive dataset of NEOs. This dataset contains crucial
information about these celestial objects, including their names, sizes, albedo,
orbital parameters, and potential threat levels (Potentially Hazardous Asteroids -
PHAs).

2. Exploratory Data Analysis (EDA): To acquire a deeper understanding of the dataset, exploratory data analysis is conducted. EDA involves data visualization
and statistical techniques to identify patterns and correlations among NEO
attributes. This step helps in better understanding the characteristics of NEOs.

3. Classification and Risk Assessment: The project involves utilizing machine learning algorithms to classify NEOs into different categories based on their characteristics. Particular attention is given to identifying PHAs that pose a
potential threat to Earth. This risk assessment is a crucial component of planetary
defense efforts.

4. Deep Learning and Machine Learning Models: The project explores the use of
deep learning and ML models to make predictions and classifications based on the
dataset. Random Forest, Artificial Neural Networks (ANNs) & Gradient Boosting
are employed to enhance the accuracy of risk assessment & classification.

5. Planetary Defense and Space Exploration Implications: Understanding NEOs and identifying potential threats is not only vital for planetary defense but also for
advancing space exploration. By gaining a better understanding of these celestial
bodies, this project contributes to humanity's knowledge of our solar system and
our ability to prepare for and mitigate potential hazards.

6. Safety and Education: The project has implications for safety and education. By
sharing the results and insights with the public, educational institutions, and space
agencies, this initiative contributes to raising awareness about NEOs and the
importance of planetary defense.

7. International Collaboration: Given the global nature of the NEO threat, this
project may involve international collaboration, as organizations like NASA, ESA,
and other space agencies actively monitor NEOs.

Pseudo code:

# Input: NEO Dataset


# Output: Prediction of Potentially Hazardous Asteroids

Step 1: Load Dataset
neo_data = load_neo_dataset()

Step 2: Data Preprocessing


neo_processed = preprocess_data(neo_data)

Step 3: Split Dataset


train_set, test_set = split_dataset(neo_processed)

Step 4: Random Forest Classifier


rf_model = initialize_random_forest_classifier()
rf_train_acc, rf_test_acc = train_and_evaluate(rf_model, train_set, test_set)

Step 5: Gradient Boosting Classifier


gb_model = initialize_gradient_boosting_classifier()
gb_train_acc, gb_test_acc = train_and_evaluate(gb_model, train_set, test_set)

Step 6: Neural Network Model


nn_model = initialize_neural_network_model()
nn_train_acc, nn_test_acc = train_and_evaluate(nn_model, train_set, test_set)

Step 7: Display Results


display_results(rf_train_acc, rf_test_acc, gb_train_acc, gb_test_acc, nn_train_acc,
nn_test_acc)

# Input: NEO Dataset with Hazard Information


# Output: Visualization of Potentially Hazardous Asteroids

Step 1: Load Dataset with Hazard Information
neo_hazard_data = load_neo_dataset_with_hazard_info()

Step 2: Filter Potentially Hazardous Asteroids


pha_data = filter_potentially_hazardous_asteroids(neo_hazard_data)

Step 3: Visualize Potentially Hazardous Asteroids


plot_potentially_hazardous_asteroids(pha_data).
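
A hedged, concrete version of the train_and_evaluate step from the pseudo code above (the helper names in the pseudo code are placeholders; the sklearn calls below are one possible realization consistent with Appendix A, shown here on synthetic stand-in data):

# Sketch only: one possible train_and_evaluate(model, train_set, test_set),
# where each set is an (X, y) pair prepared as in Appendix A.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

def train_and_evaluate(model, train_set, test_set):
    X_train, y_train = train_set
    X_test, y_test = test_set
    model.fit(X_train, y_train)
    return model.score(X_train, y_train), model.score(X_test, y_test)   # train/test accuracy

X, y = make_classification(n_samples=500, n_features=7, random_state=42)  # stand-in data
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=42)
models = {'RandomForest': RandomForestClassifier(),
          'GradientBoosting': GradientBoostingClassifier(),
          'NeuralNetwork': MLPClassifier(max_iter=1000)}
for name, clf in models.items():
    print(name, train_and_evaluate(clf, (X_tr, y_tr), (X_te, y_te)))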

Figure 3.2 feature importance for important parameters

The bar chart in Figure 3.2 above shows the feature importance for the most important parameters: a (semi-major axis), q (perihelion distance), i (inclination in degrees), e (orbital eccentricity), diameter (diameter of the object in km), tp (time of perihelion passage), and albedo (reflectivity).

Figure 3.3 feature importance for all parameters

The bar chart in Figure 3.3 shows the feature importance of all the parameters: a (semi-major axis), q (perihelion distance), e (orbital eccentricity), i (inclination in degrees), om (longitude of the ascending node in degrees), w (argument of perihelion in degrees), ma (mean anomaly in degrees), ad (aphelion distance), n (mean motion in degrees per day), and tp (time of perihelion passage).

3.4 Software Requirements


1) OS : Windows 10
2) Language: Python
3) IDE : Visual Studio, Kaggle Notebook

3.5 Hardware Requirements


1) RAM : At least 4 GB.
2) Storage : At least 1 GB.

CHAPTER 4

RESULTS & DISCUSSIONS

4.1 Predicting Best Machine Learning Model


In predictive modeling, determining the most effective machine learning algorithm
is crucial for achieving accurate and reliable results. One common approach to
evaluating model performance is by assessing their accuracy scores across various
classifiers. Accuracy scores provide insights into how well a model performs in
correctly predicting outcomes compared to the actual data.

Accuracy Score:
Accuracy is a measure of how often a classification model correctly predicts the
outcome. It is calculated as the ratio of the number of correct predictions to the
total number of predictions.
Formula: Accuracy = (Number of Correct Predictions) / (Total Number of
Predictions) * 100%

A higher accuracy score indicates that the model is making more correct
predictions, but it may not provide a complete picture if the dataset is imbalanced
(i.e., one class dominates the dataset).

Precision:
Precision is a measure of the model's ability to correctly identify positive instances
(True Positives) out of all instances predicted as positive (True Positives + False
Positives).
Formula: Precision = True Positives / (True Positives + False Positives)

Precision is essential in scenarios where the cost of false positives is high. A higher
precision indicates fewer false positives, but it may lead to more false negatives if the model becomes too conservative.

Recall (Sensitivity):
Recall, also known as Sensitivity or True Positive Rate, measures the model's
ability to correctly identify positive instances out of all actual positive instances
(True Positives + False Negatives).
Formula: Recall = True Positives / (True Positives + False Negatives)

Recall is crucial when the cost of false negatives is high. A higher recall indicates
fewer false negatives, but it may lead to more false positives if the model becomes
too aggressive in predicting positives.

F1 Score:
The F1 score is the harmonic mean of precision and recall, providing a balance
between the two metrics. It is useful when there is an uneven class distribution or
when both false positives and false negatives are equally important.
Formula: F1 Score = 2 * (Precision * Recall) / (Precision + Recall)

The F1 score ranges from 0 to 1, with 1 being the best possible score. It considers
both false positives and false negatives, making it a comprehensive metric for
model evaluation.
These evaluation metrics play a crucial role in assessing the performance of
classification models and determining which model is best suited for a specific task
based on its ability to balance accuracy, precision, recall, and F1 score.
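
These four metrics can be computed directly from a model's predictions; a small sketch with made-up label vectors (illustrative values only, not project results):

# Sketch only: accuracy, precision, recall and F1 computed from example label vectors.
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

y_true = [1, 0, 1, 1, 0, 0, 1, 0]      # actual hazard labels (illustrative)
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]      # predicted hazard labels (illustrative)
print("Accuracy :", accuracy_score(y_true, y_pred))
print("Precision:", precision_score(y_true, y_pred))
print("Recall   :", recall_score(y_true, y_pred))
print("F1 score :", f1_score(y_true, y_pred))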

Figure 4.1 Accuracy scores

Figure 4.1 presents the accuracy of the Gradient Boosting, AdaBoost, Random Forest, Decision Tree, Logistic Regression, K-Nearest Neighbors, and SVC classifiers. From this comparison, it is clear that the Random Forest classifier achieves higher accuracy than the other algorithms.

Through an evaluation process encompassing accuracy, precision, recall, and F1 score, we compared multiple machine learning models. This analysis revealed Random
Forest as the standout performer among the classifiers assessed. Notably, Random
Forest consistently showcased higher accuracy, demonstrating its adeptness at
making accurate predictions across the dataset. Moreover, its superior precision
and recall rates underscored its proficiency in correctly identifying positive
instances while minimizing false positives and false negatives. The balanced F1
score further reinforced Random Forest's efficacy, indicating its ability to manage
both types of errors adeptly.

Figure 4.2 Precision score for all parameters

The bar chart in Figure 4.2 shows the precision score for all parameters. From it, we observe that the Random Forest classifier has the highest precision score, so we use it for the subsequent analysis.

Figure 4.3 Scatterplot

Figure 4.3 shows the representation of Near-Earth Objects as hazardous or non-hazardous using a scatterplot.

The NEO (Near-Earth Object) project has yielded important results and insights
into the characteristics and dynamics of these celestial bodies. Here's a summary
of the project's key findings and a discussion of their implications:

Potentially Hazardous Asteroids (PHAs):


- The project identified and characterized PHAs among NEOs. These are objects
with the potential to impact Earth. The data collected is crucial for assessing and
mitigating potential threats.

Orbital and Physical Characteristics:


- Detailed information about NEOs' orbital elements, such as perihelion
distance, semi-major axis, eccentricity and physical properties like diameter &
albedo, has been compiled. These parameters provide valuable insights into their
trajectories and behavior.

Epoch Data:
- The inclusion of epoch data allows researchers to track the positions of NEOs
at specific times, which is essential for predicting their future positions and
potential close approaches to Earth.
- The identification of PHAs is crucial for early warning systems and planetary defense strategies. Knowing which NEOs pose a threat helps in devising strategies
to mitigate potential impacts.
- Epoch data enhances the precision of NEO tracking, enabling better
predictions for astronomical observations and potential space missions.

The NEO project is pivotal in enhancing our understanding of Near-Earth Objects (NEOs) and their implications for Earth's safety. By focusing on data accuracy,
classification improvements, and advanced prediction models, we aim to refine
our preparedness for potential NEO encounters. Collaborative efforts with
international partners, increased public awareness, and space missions dedicated
to NEO study are integral components of this ongoing initiative. Through these
concerted actions, we have made significant strides in classifying NEOs,
identifying hazardous asteroids, and gathering crucial data for improved tracking
and forecasting of NEO movements.

CHAPTER 5

CONCLUSION AND FUTURE WORK

The NEO (Near-Earth Object) project represents a significant milestone in our efforts to understand and mitigate potential threats from asteroids and comets that
come close to Earth. Through meticulous classification efforts, the project has
successfully categorized NEOs and pinpointed potentially hazardous asteroids
among them. This classification is based on a range of parameters, including size,
composition, and trajectory, allowing us to identify objects that could pose a risk
to Earth.

One of the key contributions of the NEO project is the comprehensive data it
provides on the orbital and physical characteristics of NEOs. This data is crucial
for enhancing our preparedness and response capabilities in the event of a NEO
encounter. By understanding the precise movements and trajectories of these
objects, we can improve our early warning systems and take proactive measures to
mitigate potential impacts.

The inclusion of epoch data in the NEO project has significantly refined our ability
to predict NEO movements. Epoch data refers to the precise timestamps associated
with observations, allowing for more accurate predictions of an object's future path.
This enhancement is vital for improving our forecasting capabilities and providing
timely alerts about NEOs that may come close to Earth.

Moreover, the NEO project focuses on gathering more detailed and accurate data
on NEOs beyond their basic classification. This includes gathering information
about their size, shape, surface properties, and other relevant characteristics. Such
detailed data is instrumental in developing advanced prediction models that further
improve tracking and forecasting accuracy.

Collaboration is a cornerstone of the NEO project, as it actively engages with
international space agencies and observatories. This collaboration ensures a
broader dataset, combining observations and insights from various sources
worldwide. By pooling resources and expertise, the project can access a wealth of
information, contributing to a more comprehensive understanding of NEOs.

Lastly, the NEO project advocates for funding space missions dedicated to
studying NEOs up close. These
missions provide invaluable insights into the composition, behavior, and potential
threats posed by NEOs. By supporting such missions, the project aims to deepen
our understanding of these objects and strengthen our ability to safeguard Earth
from potential impacts.

APPENDIX A
CODE

import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
sns.set_theme(style = "ticks", color_codes=True)

ad = pd.read_csv('sample_data/dataset.csv')
print(ad.shape)

ad.head()

"""# Exploratory Data Analysis / EDA"""

def extract_neo(df):
    neo = df.loc[df.neo == 'Y', :]
    neo.to_csv('neo.csv')
    return neo

def extract_pha(df):
    pha = df.loc[df.pha == 'Y', :]
    pha.to_csv('pha.csv')
    return pha

def extract_neo_pha(df):
    # Parenthesize each condition: '&' binds more tightly than '=='
    neo_pha = df.loc[(df.pha == 'Y') & (df.neo == 'Y'), :]
    neo_pha.to_csv('neo_pha.csv')
    return neo_pha

def plot_scatter(df, params):
    for i in params:
        g = sns.FacetGrid(df, col = "class", hue = "pha")
        g.map(sns.scatterplot, "moid", i, alpha = .7)
        h = sns.FacetGrid(df, col = "class", hue = "pha")
        h.map(sns.scatterplot, "H", i, alpha = .7)
        g.add_legend()
        h.add_legend()
        plt.show()

neo = extract_neo(ad)
neo.head()

pha = extract_pha(ad)
pha.head()

print(neo.shape)
cols = list(neo.columns)
print(cols)

neo.describe()

new_ad = ad[ad['pha'].notna()]

new_ad = new_ad[new_ad['neo'].notna()]

sns.catplot(x = "neo", y = "pha", data = new_ad)


plt.show()

param_imp = ['diameter', 'albedo', 'e', 'a', 'q', 'i', 'tp']

neo[param_imp].info()

plot_scatter(neo, param_imp)

neo[param_imp].describe()

param_oth = ['om', 'w', 'ma', 'ad', 'n', 'rms']

neo[param_oth].info()

neo[param_oth].describe()

plot_scatter(neo, param_oth)

! pip3 install pyforest

import pyforest
import warnings
warnings.filterwarnings("ignore")
from sklearn import metrics

from sklearn.metrics import accuracy_score

from sklearn.model_selection import train_test_split

def convert_to_bin(df, column):
    df[column] = df[column].map({'Y': int(1), 'N': int(0), 'NaN': int(0)})
    return df

neo_new = convert_to_bin(neo, 'pha')


neo_final = convert_to_bin(neo_new, 'neo')

params_imp = ['neo'] + ['pha'] + param_imp


params_all = params_imp + param_oth

neo_params_imp = neo_final[params_imp]
neo_params_imp

neo_params_all = neo_final[params_all]
neo_params_all

neo_params_imp.isnull().sum()

def fill_nan(df):
    # Missing labels are filled with 0; missing numeric features with the column mean.
    for column in df.columns:
        if column == 'pha' or column == 'neo':
            df[column].fillna(value = int(0), inplace=True)
        else:
            df[column].fillna(int(df[column].mean()), inplace=True)
    return df

neo_params_imp = fill_nan(neo_params_imp)
neo_params_all = fill_nan(neo_params_all)

neo_params_all

neo_params_imp

plt.figure(figsize = (20,20))
sns.heatmap(data = round(neo_params_imp.corr(), 2), annot = True)
plt.show()

plt.figure(figsize = (20,20))
sns.heatmap(data = round(neo_params_all.corr(), 2), annot = True)
plt.show()

"""# Model Selection and Analysis"""

from sklearn.model_selection import train_test_split


from sklearn.preprocessing import StandardScaler

from sklearn.linear_model import LogisticRegression


from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

from sklearn.ensemble import RandomForestClassifier, AdaBoostClassifier, GradientBoostingClassifier

def model_fit_score(models, df):
    # Fit each candidate model on a standardized train split and score it on the held-out test split.
    X = df.drop('pha', axis = 1)
    y = df['pha']
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2, random_state = 42)
    scaler = StandardScaler()
    X_train = scaler.fit_transform(X_train)
    X_test = scaler.transform(X_test)
    np.random.seed(42)
    model_scores = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        model_scores[name] = model.score(X_test, y_test)
    model_scores = pd.DataFrame(model_scores, index = ['Score']).transpose()
    model_scores = model_scores.sort_values('Score')
    return model_scores

models = {'LogisticRegression': LogisticRegression(max_iter = 10000),
          'KNeighborsClassifier': KNeighborsClassifier(),
          'SVC': SVC(),
          'DecisionTreeClassifier': DecisionTreeClassifier(),
          'RandomForestClassifier': RandomForestClassifier(),
          'AdaBoostClassifier': AdaBoostClassifier(),
          'GradientBoostingClassifier': GradientBoostingClassifier()}

model_scores_params_imp = model_fit_score(models, neo_params_imp)
model_scores_params_imp.sort_values('Score', ascending = False)

plt.figure(figsize = (20,10))
sns.barplot(data = model_scores_params_imp.sort_values('Score').T)
plt.title('Model Precision Score For Important Parameters')
plt.xticks(rotation = 90);
plt.show()

model_scores_params_all = model_fit_score(models, neo_params_all)


model_scores_params_all.sort_values('Score', ascending = False)

plt.figure(figsize = (20,10))
sns.barplot(data = model_scores_params_all.sort_values('Score').T)
plt.title('Model Precision Score For All Parameters')
plt.xticks(rotation = 90);
plt.show()

"""## Random Forest Classifier Model"""

from sklearn.metrics import accuracy_score, confusion_matrix, classification_report

def rfc(df):
    # Train a Random Forest and return its accuracy, confusion matrix and classification report.
    X = df.drop('pha', axis = 1)
    y = df['pha']
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2, random_state = 42)
    clf = RandomForestClassifier(n_estimators = 10000)
    clf.fit(X_train, y_train)
    y_pred = clf.predict(X_test)
    acc = accuracy_score(y_test, y_pred)
    cm = confusion_matrix(y_test, y_pred)
    cr = classification_report(y_test, y_pred)
    result = [acc, cm, cr]
    return result

"""### Results of RFC for Important Parameters"""

result_params_imp = rfc(neo_params_imp)
print("Accuracy:", result_params_imp[0], '\n')
print("Confusion Matrix:\n", result_params_imp[1], '\n')
print("Classification Report:\n", result_params_imp[-1], '\n')

"""### Results of RFC for All Parameters"""

result_params_all = rfc(neo_params_all)
print("Accuracy: ", result_params_all[0], '\n')
print("Confusion Matrix:\n", result_params_all[1], '\n')
print("Classification Report:\n", result_params_all[-1], '\n')

"""## Feature Importance Analysis"""

from sklearn.ensemble import RandomForestClassifier


import numpy as np

import matplotlib.pyplot as plt

def fi_plot(df):
    # Fit a Random Forest and plot its feature importances in descending order.
    X = df.drop('pha', axis = 1)
    y = df['pha']
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2, random_state = 42)
    model = RandomForestClassifier()
    model.fit(X_train, y_train)
    importances = model.feature_importances_
    sorted_indices = np.argsort(importances)[::-1]
    plt.title('Feature Importance')
    plt.bar(range(X_train.shape[1]), importances[sorted_indices], align = 'center')
    plt.xticks(range(X_train.shape[1]), X_train.columns[sorted_indices], rotation = 90)
    plt.tight_layout()
    plt.show()

"""### Feature importance for Important Parameters"""

fi_plot(neo_params_imp)

"""### Feature importance for All Parameters"""

fi_plot(neo_params_all)

"""## Gradient Boosting Classifier Model"""

import pandas as pd
import numpy as np
from sklearn.metrics import classification_report
from sklearn.model_selection import KFold
from sklearn.ensemble import GradientBoostingClassifier

lr_list = [0.05, 0.075, 0.1, 0.25, 0.5, 0.75, 1]

X = neo_params_all.drop('pha', axis = 1)
y = neo_params_all['pha']
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size = 0.2, random_state = 42)
for learning_rate in lr_list:
    gb_clf = GradientBoostingClassifier(n_estimators = 20, learning_rate = learning_rate, max_features = 2, max_depth = 2, random_state = 0)
    gb_clf.fit(X_train, y_train)
    print("Learning rate: ", learning_rate)
    print("Accuracy score (training): {0:.3f}".format(gb_clf.score(X_train, y_train)))
    print("Accuracy score (validation): {0:.3f}".format(gb_clf.score(X_val, y_val)))

def gbc(df):
    X = df.drop('pha', axis = 1)
    y = df['pha']
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2, random_state = 42)
    classifier = GradientBoostingClassifier(n_estimators = 20, learning_rate = 0.5, max_features = 2, max_depth = 2, random_state = 0)
    classifier.fit(X_train, y_train)
    y_pred = classifier.predict(X_test)
    cm = confusion_matrix(y_test, y_pred)
    acc = accuracy_score(y_test, y_pred)
    cr = classification_report(y_test, y_pred)
    result = [acc, cm, cr]
    return result

"""### Results of GBC for Important Parameters"""

result_params_imp = gbc(neo_params_imp)
print("Accuracy:", result_params_imp[0], '\n')
print("Confusion Matrix:\n", result_params_imp[1], '\n')
print("Classification Report:\n", result_params_imp[-1], '\n')

result_params_all = gbc(neo_params_all)
print("Accuracy:", result_params_all[0], '\n')
print("Confusion Matrix:\n", result_params_all[1], '\n')
print("Classification Report:\n", result_params_all[-1], '\n')

"""## SVC Model"""

import numpy as np
import pandas as pd
from sklearn.svm import SVC
from sklearn.metrics import confusion_matrix, accuracy_score, classification_report

def svc_lin(df):
    X = df.drop('pha', axis = 1)
    y = df['pha']
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2, random_state = 42)
    classifier = SVC(kernel = 'linear', random_state = 0)
    classifier.fit(X_train, y_train)
    y_pred = classifier.predict(X_test)
    cm = confusion_matrix(y_test, y_pred)
    acc = accuracy_score(y_test, y_pred)
    cr = classification_report(y_test, y_pred)
    result = [acc, cm, cr]
    return result

def svc_rbf(df):
    X = df.drop('pha', axis = 1)
    y = df['pha']
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2, random_state = 42)
    classifier = SVC(kernel='rbf', random_state = 0, gamma = .01, C = 1)
    classifier.fit(X_train, y_train)
    y_pred = classifier.predict(X_test)
    cm = confusion_matrix(y_test, y_pred)
    acc = accuracy_score(y_test, y_pred)
    cr = classification_report(y_test, y_pred)
    result = [acc, cm, cr]
    return result

"""### Results of SVC for Important Parameters"""

result_params_imp = svc_lin(neo_params_imp)
print("Accuracy:", result_params_imp[0], '\n')
print("Confusion Matrix:\n", result_params_imp[1], '\n')
print("Classification Report:\n", result_params_imp[-1], '\n')

result_params_imp = svc_rbf(neo_params_imp)
print("Accuracy:", result_params_imp[0], '\n')
print("Confusion Matrix:\n", result_params_imp[1], '\n')
print("Classification Report:\n", result_params_imp[-1], '\n')

"""### Results of SVC for All Parameters"""

result_params_all = svc_lin(neo_params_all)
print("Accuracy:", result_params_all[0], '\n')
print("Confusion Matrix:\n", result_params_all[1], '\n')
print("Classification Report:\n", result_params_all[-1], '\n')

result_params_all = svc_rbf(neo_params_all)
print("Accuracy:", result_params_all[0], '\n')
print("Confusion Matrix:\n", result_params_all[1], '\n')
print("Classification Report:\n", result_params_all[-1], '\n')

"""## Logistic Regression Model"""

from sklearn.linear_model import LogisticRegression

from sklearn import metrics
from sklearn.metrics import confusion_matrix
from sklearn.metrics import classification_report

def logrm(df):
    # Logistic regression baseline for the 'pha' flag
    X = df.drop('pha', axis=1)
    y = df['pha']
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
                                                        random_state=0)
    logreg = LogisticRegression()
    logreg.fit(X_train, y_train)
    y_pred = logreg.predict(X_test)
    acc = logreg.score(X_test, y_test)
    cm = confusion_matrix(y_test, y_pred)
    cr = classification_report(y_test, y_pred)
    result = [acc, cm, cr]
    return result
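Logistic regression on unscaled orbital parameters may reach the solver's default iteration limit before converging. A common remedy, not applied in the run above, is to standardise the features first; a minimal sketch:

from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression

# Drop-in alternative for the plain LogisticRegression() used in logrm()
scaled_logreg = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))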

"""### Results of LRM for Important Parameters"""

result_params_imp = logrm(neo_params_imp)
print("Accuracy:", result_params_imp[0], '\n')
print("Confusion Matrix:\n", result_params_imp[1], '\n')
print("Classification Report:\n", result_params_imp[-1], '\n')

"""### Results of LRM for All Parameters"""

result_params_all = logrm(neo_params_all)
print("Accuracy:", result_params_all[0], '\n')
print("Confusion Matrix:\n", result_params_all[1], '\n')
print("Classification Report:\n", result_params_all[-1], '\n')

"""# ANN"""



from keras.models import Sequential
from keras.layers import Dense
from sklearn.metrics import confusion_matrix

"""## Running for Important Parameters"""

def set_up_for_ann(df):
    X = df.drop('pha', axis=1)
    y = df['pha']
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
                                                        random_state=42)
    # Small feed-forward network; input_dim=8 must match the number of feature columns in df
    model = Sequential()
    model.add(Dense(12, input_dim=8, activation='relu'))
    model.add(Dense(8, activation='relu'))
    model.add(Dense(1, activation='sigmoid'))
    model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
    model.fit(X_train, y_train, epochs=25, batch_size=16)
    _, accuracy = model.evaluate(X_test, y_test)
    print()
    print('Accuracy of test: %.2f' % (accuracy * 100))

set_up_for_ann(neo_params_imp)

"""## Running for All Parameters"""

def set_up_for_ann2(df):
    X = df.drop('pha', axis=1)
    y = df['pha']
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
                                                        random_state=42)
    # Same architecture as above; the input dimension is inferred from the data at fit time
    model = Sequential()
    model.add(Dense(12, activation='relu'))
    model.add(Dense(8, activation='relu'))
    model.add(Dense(1, activation='sigmoid'))
    model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
    model.fit(X_train, y_train, epochs=25, batch_size=16)
    _, accuracy = model.evaluate(X_test, y_test)
    print()
    print('Accuracy of test: %.2f' % (accuracy * 100))
    return accuracy

result_params_all_ann = set_up_for_ann2(neo_params_all)
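If the feature set changes, the width of the first Dense layer must change with it. The variant below is an illustrative alternative to the hard-coded input_dim=8 used for the important-parameter run; it takes the input width as an argument (for example X_train.shape[1]):

from keras.models import Sequential
from keras.layers import Dense

def build_ann(n_features):
    # n_features should equal the number of feature columns in the frame being modelled
    model = Sequential()
    model.add(Dense(12, input_dim=n_features, activation='relu'))
    model.add(Dense(8, activation='relu'))
    model.add(Dense(1, activation='sigmoid'))
    model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
    return model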

import pandas as pd

df = pd.read_csv(filepath_or_buffer='sample_data/neo_data.csv',
                 parse_dates=['Close Approach Date', 'Close Approach Date (Full)'])
df['year'] = df['Close Approach Date'].dt.year

df.head()

df.info()

df.nunique()

from plotly.express import bar


bar(data_frame=df, x='Limited Name', color='Orbiting Body')

bar(data_frame=df, x='Limited Name', color='year')

from plotly.express import histogram


histogram(data_frame=df[df['year'] < 2024].sort_values(by='Limited Name'),
          x='year', color='Limited Name', nbins=124).show()
histogram(data_frame=df[df['year'] < 2024].sort_values(by='Limited Name'),
          x='year', color='Is Potentially Hazardous', nbins=124).show()

from plotly.express import scatter


scatter(data_frame=df, x='Limited Name', y='year', color='Miss Distance (km)')

scatter(data_frame=df, x='Limited Name', y='year', color='Is Potentially Hazardous')

histogram(data_frame=df, x='Miss Distance (km)', color='Is Potentially Hazardous')

histogram(data_frame=df, x='Relative Velocity (km/s)', color='Is Potentially Hazardous')

scatter(data_frame=df, y='Relative Velocity (km/s)', x='Miss Distance (km)',
        hover_name='Limited Name', size='Min Diameter (km)', color='year',
        log_x=True)

scatter(data_frame=df, y='Relative Velocity (km/s)', x='Miss Distance (km)',
        hover_name='Limited Name', size='Min Diameter (km)',
        color='Is Potentially Hazardous', symbol='Orbiting Body', log_x=True)

scatter(data_frame=df, x='year', y='Miss Distance (km)', hover_name='Limited Name',
        size='Min Diameter (km)', color='Is Potentially Hazardous', log_y=True)

APPENDIX B

Figure B.1 NEO Representation

The output image is a scatter plot with miss distance on the y-axis and year of close approach on the x-axis, showing how near-Earth objects pass Earth over time. Red data points mark potentially hazardous objects, whose close approaches bring them near enough to Earth to warrant monitoring; blue data points mark non-hazardous objects that remain at a comfortably safe distance.

The visualization makes hazardous objects easy to pick out by their proximity to Earth in a given period. Objects whose miss distance falls below the hazard threshold are highlighted in red, flagging potential collision risks, while blue points indicate objects that keep a safe distance and pose no impact concern.
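A minimal sketch of how such a threshold-based view can be produced with plotly, assuming the same df loaded from neo_data.csv earlier; the cut-off THRESHOLD_KM is a hypothetical value chosen purely for illustration:

from plotly.express import scatter

THRESHOLD_KM = 7.5e6  # hypothetical cut-off used only for this illustration
df['Within Threshold'] = df['Miss Distance (km)'] < THRESHOLD_KM
scatter(data_frame=df, x='year', y='Miss Distance (km)',
        color='Within Threshold', hover_name='Limited Name', log_y=True)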

APPENDIX C
PUBLICATION STATUS

