Report II
VISUALIZING HAZARDS
A PROJECT REPORT
Submitted by
Amirthavarshini P M (913120205005)
Swathi K (913120205073)
in
INFORMATION TECHNOLOGY
ANNA UNIVERSITY: CHENNAI 600 025
BONAFIDE CERTIFICATE
SIGNATURE SIGNATURE
ACKNOWLEDGEMENT
We thank the Almighty for giving us the moral strength to work on this project over
the past few months.
Our sincere thanks to our respected Principal, Dr. P. ALLI, for the encouragement
and motivation offered to us during the period of our research study.
We convey our thanks to our guide, Mrs. A.VINORA, for her innovative
suggestions and valuable guidance.
We would also like to extend our sincere gratitude to all faculty members of
the Department of Information Technology for their valuable guidance
throughout the course of our project. We also thank our parents and friends
who provided moral and physical support.
ABSTRACT
TABLE OF CONTENTS
ABSTRACT iii
LIST OF FIGURES v
1. INTRODUCTION 1
1.1. Overview 1
1.2. Asteroids and Comets 4
1.3. Introduction to Machine Learning 6
2. LITERATURE SURVEY 9
2.1. Research papers 9
2.2. Bottlenecks of Survey papers 13
3. METHODOLOGY 16
3.1. Dataset 16
3.2. Data Pre-Processing 17
3.3. Proposed System 18
3.4. Software Requirements 23
3.5. Hardware Requirements 23
5. CONCLUSION 30
APPENDIX A 32
APPENDIX B 50
APPENDIX C 52
REFERENCES 54
LIST OF FIGURES
4.3 Scatterplot 28
CHAPTER 1
INTRODUCTION
Additionally, Congress noted that similar NEOs have impacted Earth or entered
its atmosphere multiple times throughout history, indicating a recurrent threat that
continues to pose risks in the future. The discovery timeline of certain NEOs, often
identified only days before their closest approach to Earth, highlights the
challenges in timely detection and monitoring of these objects.
Despite recent discoveries of large NEOs, Congress expressed concerns about the
significant number of undetected NEOs, emphasizing the ongoing need for
comprehensive and continuous efforts in NEO detection and tracking.
The findings outlined in the NEO Survey Act highlighted the critical importance
of implementing a robust NEO Survey program. Such a program aims to enhance
detection, tracking, cataloging, and characterization efforts, ultimately improving
preparedness, early warning systems, and mitigation strategies to reduce the
potential risks associated with NEO impacts on Earth.
1.1.3. Direction
The NEO Survey Act, as outlined in Appendix A, amended the Space Act of 1958
with specific directives regarding the responsibilities of the National Aeronautics
and Space Administration (NASA) in addressing the threat of near-Earth asteroids
and comets. Congress declared that safeguarding the general welfare and security
of the United States necessitates directing NASA's unique competence towards
detecting, tracking, cataloging, and characterizing these near-Earth objects. The
primary goal is to provide early warning and mitigation strategies for potential
hazards they pose to Earth.
Furthermore, the Act mandates the Administrator of NASA to plan, develop, and
implement a comprehensive Near-Earth Object Survey program. This program
focuses on detecting, tracking, cataloging, and characterizing the physical
properties of near-Earth objects that are equal to or greater than 140 meters in
diameter. The objective is to assess the threat posed by such objects to Earth's
safety. A key target of the Survey program is to achieve a 90 percent completion
rate of its near-Earth object catalogue, based on statistically predicted populations
of these objects, within 15 years from the date of the Act's enactment.
1.2. Asteroids and Comets
Asteroids and comets are the primary focus of discussion in this study due to their
potential hazard to Earth. Astronomers differentiate between these celestial bodies
based on their appearance. Asteroids typically appear as star-like points of light
when observed, while comets exhibit a diffuse appearance with visible tails.
The existence of asteroids was not known until roughly 200 years ago when
telescopes advanced enough to detect these small celestial objects. On the other
hand, people have been aware of comets since ancient times due to their distinctive
tails visible to the naked eye.
Asteroids are further categorized into different groups based on their orbits
relative to Earth:
- Apollos: Orbits cross Earth's orbit with a period exceeding 1 year.
- Atens: Orbits cross Earth's orbit with a period less than 1 year.
- Amors: Orbits entirely outside Earth's orbit.
- Interior Earth Objects (IEOs): Orbits entirely within Earth's orbit.
The distribution of these asteroid categories within the Near-Earth Object (NEO)
population is illustrated in Figure 1, providing insight into the varying orbits and
potential impact risks associated with the different types of NEOs.
Figure 1.1. Frequency of NEOs by Size, Impact Energy, and Magnitude
The analysis presented in Figure 1.1 uses a constant power-law approximation to
estimate the number of Near-Earth Objects (NEOs) as a function of their size. The
approximation shows that the population grows sharply as object size shrinks:
roughly a hundred-fold increase in the number of NEOs for every order-of-magnitude
decrease in diameter.
It is important to note that, while the estimates presented in Figure 1.1 are valuable,
they may carry a margin of error of a factor of two or three in any given size class.
This uncertainty is due to limitations in observations, particularly in size classes
where there are not enough data points to form a statistically valid sample.
In this report, the term PHO (Potentially Hazardous Objects) is used to indicate
potential threats among NEOs. It is noted that NEOs smaller than 1 km are
predominantly asteroids, as comets do not significantly contribute to the
population below this size threshold. This distinction helps in categorizing and
understanding the potential hazard levels associated with different sizes of NEOs,
aiding in risk assessment and mitigation planning.
1.3.1 Logistic regression (LR)
Logistic Regression (LR) is a linear classification model that estimates the
probability of a binary outcome by applying the logistic (sigmoid) function to a
weighted combination of the input features. It is fast to train and easy to interpret,
which makes it a common baseline for binary classification tasks such as labeling
objects as hazardous or non-hazardous.
Decision Tree (DT) algorithms construct tree-like structures where internal nodes
represent feature tests, and leaf nodes represent class labels. DTs are intuitive and
easy to interpret, making them valuable for decision-making processes. They
recursively partition the data based on feature splits, leading to hierarchical
decision rules that classify instances into distinct classes.
Random Forest (RF) is an ensemble method that combines multiple decision trees
to improve predictive accuracy and reduce overfitting. Each tree in the forest is
trained on a random subset of the training data, and the final prediction is made by
aggregating the predictions of individual trees. RF mitigates the high variance
associated with deep decision trees, making it robust and suitable for various
classification tasks.
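As a brief illustration of the difference between a single decision tree and a random forest, the following sketch fits both classifiers on synthetic data (the data, features, and hyperparameters are placeholders, not the NEO dataset used later in this report):

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in data; the actual project uses NEO orbital and physical features.
X, y = make_classification(n_samples=1000, n_features=8, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# A single deep tree is easy to interpret but prone to high variance.
tree = DecisionTreeClassifier(random_state=42).fit(X_train, y_train)

# A forest of trees, each trained on a bootstrap sample, aggregates their predictions.
forest = RandomForestClassifier(n_estimators=100, random_state=42).fit(X_train, y_train)

print('Decision tree accuracy:', tree.score(X_test, y_test))
print('Random forest accuracy:', forest.score(X_test, y_test))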
CHAPTER 2
LITERATURE SURVEY
2.1. Research papers
3. H. Lin, C. Sun and Y. Liu, "OBBStacking: An Ensemble Method for
Remote Sensing Object Detection," in IEEE Journal of Selected Topics in
Applied Earth Observations and Remote Sensing
Introduced an innovative ensemble method called OBBStacking, designed for
remote-sensing object detection using oriented bounding boxes (OBBs).
OBBStacking effectively combines OBB-based detection results and leverages the
confidence indicators produced by deep-learning object detectors. OBBStacking
featured in the winning entry of the 2021 Gaofen Challenge on Automated
High-Resolution Earth Observation Image Interpretation and showed enhanced
performance on datasets such as DOTA and FAIR1M.
4. D. Khajuria, A. Sharma, N. Sharma and M. Mangla, "Classification and
Comparative Analysis of Earth's Nearest Objects using Machine Learning
Models," 2023 10th International Conference on Computing for Sustainable
Global Development (INDIACom), New Delhi, India, 2023
This research uses machine learning to classify asteroids from the NASA
Nearest Earth Objects dataset as hazardous or non-hazardous, aiming to help unveil
space mysteries. Visualization aids the data analysis, culminating in an efficient
prediction model evaluated on accuracy and showcasing machine learning's
efficacy in tackling astronomical challenges.
5. L. Z. Ladeira, F. Santos, L. Cléopas, P. Buteneers and L. Villas, "NEO-
NDA: Neo Natural Language Data Augmentation," 2022 IEEE 16th
International Conference on Semantic Computing (ICSC), Laguna Hills, CA,
USA, 2022
Introduced NEO-NDA (NEO Natural Language Data Augmentation), a
versatile tool for generating and rebalancing textual datasets. Unlike many existing
approaches, NEO-NDA operates across multiple languages, offering a wide range
of modifications to generate novel data samples. Their system significantly
enhances the performance of machine learning models, often doubling their
effectiveness compared to original datasets without augmentation.
6. R. N. Ranaweera and T. Fernando, "Prediction of Potentially Hazardous
Asteroids using Deep Learning," 2022 2nd International Conference on
Advanced Research in Computing (ICARC), Belihuloya, Sri Lanka, 2022,
They utilized deep neural networks to analyze the orbital data of Near-Earth
Asteroids (NEAs). By allowing the model to learn intricate patterns and features
within the dataset, the approach efficiently classifies objects as harmful or harmless
based on acquired insights. This enhances the efficiency of identifying and
characterizing celestial bodies, contributing to efforts aimed at preventing potential
impacts on Earth.
7. R. J. Wainscoat, K. Chambers, R. Weryk, Y. Ramanjooloo, M. Huber
and E. Magnier, "Recent Advances in the Pan-STARRS Search for Near-
Earth Objects," 2022 IEEE Aerospace Conference (AERO), Big Sky, MT,
USA, 2022,
They described the Pan-STARRS telescopes on Maui and their search for near-
Earth objects, with Pan-STARRS2 overcoming early technical challenges such as
mirror-coating and mount issues. With improved reporting latency and same-night
follow-up, the survey aids in discovering interstellar objects like 'Oumuamua and is
poised to be a potent tool for identifying impacting asteroids.
8. T. Hromakina et al., "NEOROCKS project: surface properties of small
near-Earth asteroids," in Monthly Notices of the Royal Astronomical
Society, vol. 520, no. 2, pp. 3143-3150, Sept. 2022,
They described the NEOROCKS project, in which observations at the
Haute-Provence Observatory classified 42 NEOs, adding to a dataset of 93 objects
and identifying potential spacecraft-accessible targets such as (163014) 2001 UA5
and 2017 SE19 for planetary defense research and missions.
9. C. Rosu and V. Bacu, "Asteroid Image Classification Using Convolutional
Neural Networks," 2021 IEEE 17th International Conference on Intelligent
Computer Communication and Processing (ICCP), Cluj-Napoca, Romania,
2021,
Introduced a convolutional neural network (CNN) model designed
specifically for classifying astronomical images to detect asteroids. Trained on
original data from the Isaac Newton Telescope in La Palma, the system aims to
provide automated, accurate asteroid detection while minimizing false negatives.
Trained on single images, the model achieved a notable 94.01% recall rate,
showcasing its potential for precise asteroid detection.
10. V. Bahel, P. Bhongade, J. Sharma, S. Shukla and M. Gaikwad,
"Supervised Classification for Analysis and Detection of Potentially
Hazardous Asteroid," 2021 International Conference on Computational
Intelligence and Computing Applications (ICCICA), Nagpur, India, 2021.
Emphasized the growing importance of artificial intelligence (AI) in
space science, particularly for mitigating asteroid-related risks. Their study
involved a meticulous understanding of Potentially Hazardous Asteroids (PHAs) and
employed supervised machine learning (ML) to assess asteroid hazard based on
specific parameters. Among the various classification algorithms, Random Forest
stood out with an average F1-score of 99.22% and 99.9% accuracy, offering a
robust solution for asteroid risk evaluation.
11. M. Veto et al., "Ball Aerospace Spacecraft and Systems for the Near-
Earth Object Surveyor Mission," 2021 IEEE Aerospace Conference (50100),
Big Sky, MT, USA, 2021.
The mission, with spacecraft and systems developed by Ball Aerospace for
NASA's Near-Earth Object Surveyor Mission (NEOSM), aims to detect near-Earth
objects larger than 140 meters, enhancing planetary defense and informing policy
through a comprehensive survey that leverages Ball's spacecraft and infrared
expertise from NEOWISE. This collaborative effort combines scientific,
engineering, and institutional heritage to advance understanding of NEOs'
physical properties and their origins in the early solar system.
12. L. Gentile, A. Capria, M. Conti, C. Moscardini and M. Martorella,
"Resident Space Object Passive Bistatic Radar Detection Using DVB-S2
Signals," 2018 19th International Radar Symposium (IRS), Bonn, Germany,
2018,
This work explores the feasibility of a DVB-S2 passive multi-bistatic
radar for tracking Resident Space Objects (RSOs), contrasting its benefits with
active ground-based systems. Unlike optical systems, this radar operates day and
night, regardless of weather conditions, offering a robust monitoring solution for
Earth's orbit.
13. R. A. Mohammed and M. Y. Alradi, "Model of planetary radar for
tracking near-Earth asteroids," 2017 Fifth International Conference on
Aerospace Science & Engineering (ICASE), Islamabad, Pakistan, 2017.
Discussed the increasing hazards posed by asteroids in the asteroid belt
located between Mars and Jupiter, within a range of 2.2 to 3.2 astronomical units
(AU). The study highlighted the use of planetary radars for tracking celestial
objects and safeguarding Earth. It detailed a simulation-based approach that
involved analyzing radar cross-sectional data and modeling signal processing using
MATLAB and Simulink. This approach allowed for the examination of moving
asteroids' behavior and contributed to strategies aimed at mitigating asteroid threats.
2.2. Bottlenecks of Survey papers
The approach of Ganesh et al. presents a challenge due to its high computational
resource requirements. This limitation could hinder real-time applications or
deployment in resource-constrained environments, impacting the practicality and
scalability of the approach. Similarly, T. Hoffman et al.'s NEOS Project, while
making significant progress, may encounter challenges related to the detailed
design phase. The lack of specific details about the algorithms or methodologies
used could impede a comprehensive assessment of technical feasibility, potentially
affecting the project's effectiveness and future scalability.
Furthermore, the effectiveness of data augmentation techniques like NEO-NDA by
L. Z. Ladeira et al. may be influenced by their ability to handle extremely imbalanced
datasets or rare classes. Generalizability across different scenarios and datasets is
crucial for ensuring the reliability and applicability of these techniques in diverse
contexts.
Additionally, while CNN models like the one introduced by C. Rosu et al. for
asteroid detection showcase impressive recall rates, their generalization to diverse
datasets and environmental conditions may require further validation and
optimization for real-world deployment. Moreover, advancements in geo-spatial
sensing object identification, such as H. Lin et al.'s OBBStacking method, may
vary in performance based on the complexity and variability of the data. Thorough
evaluation and benchmarking against diverse datasets are necessary to assess the
method's reliability and effectiveness across different scenarios. Lastly,
simulation-based approaches, as discussed in R. A. Mohammed's work on
planetary radars for asteroid tracking, may face challenges in accurately predicting
real-world scenarios, especially concerning unexpected asteroid behaviors or
events.
In conclusion, our proposed Advanced Analysis and Classification of Near-Earth
Objects (NEOs) project aims to overcome these identified bottlenecks effectively.
By leveraging a comprehensive approach that includes data quality assurance,
robust preprocessing techniques, and advanced machine learning algorithms such
as Random Forest, Neural Networks, and Gradient Boosting, we enhance
predictive capabilities while ensuring scalability, generalizability, and
interpretability. The project's focus on real-world applications, planetary defense
implications, and international collaboration further strengthens its efficacy in
advancing space science and AI for NEO analysis and classification.
CHAPTER 3
METHODOLOGY
3.1. Dataset
NASA's Near-Earth Object (NEO) Program manages a vast and essential dataset
crucial for understanding asteroids and comets that come close to Earth. This
dataset is multifaceted, comprising critical components like the NEO Confirmation
Page and the NEO Earth Close-Approaches Database.
In summary, NASA's NEO dataset is a vital tool for studying, monitoring, and
mitigating potential hazards from NEOs, ensuring Earth's safety from potential
impact events.
Figure 3.1. NEO dataset
3.2. Data Pre-Processing
Data preprocessing for NEO (Near-Earth Object) analysis involves several steps to
clean, transform, and prepare the data for analysis in Excel, as outlined below.
The data preprocessing of NEO information in Excel begins with data collection
from reliable sources such as NASA's NEO Program database or other
astronomical repositories. The collected data typically includes attributes like NEO
designation, discovery date, observation arc, orbital parameters (e.g., semi-major
axis, eccentricity, inclination), estimated diameter, and potential impact
probabilities.
Once the data is imported into Excel, the first step is data cleaning, which involves
identifying and handling missing values, correcting errors, and removing
duplicates if any. This ensures the dataset's integrity and accuracy for further
analysis.
The next step is data transformation, where certain columns may need to be
formatted or calculated to derive new variables or metrics relevant to NEO
analysis. For example, converting dates to a standardized format, calculating
orbital velocities or impact probabilities based on provided data, or categorizing
NEOs based on size ranges.
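The same cleaning and transformation steps can be expressed in pandas. The following is a minimal sketch, assuming a file named neo.csv with illustrative column names such as discovery_date and diameter (these names are assumptions, not the project's actual columns):

import pandas as pd

# Load the raw NEO data (file and column names are illustrative).
neo = pd.read_csv('neo.csv')

# Data cleaning: drop exact duplicates and fill missing diameters with the median.
neo = neo.drop_duplicates()
neo['diameter'] = neo['diameter'].fillna(neo['diameter'].median())

# Data transformation: standardize dates and derive a size category
# (bins in km; 0.14 km corresponds to the 140-meter threshold in the Act).
neo['discovery_date'] = pd.to_datetime(neo['discovery_date'], errors='coerce')
neo['size_class'] = pd.cut(neo['diameter'],
                           bins=[0, 0.14, 1.0, float('inf')],
                           labels=['small', 'medium', 'large'])

print(neo[['discovery_date', 'diameter', 'size_class']].head())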
3.3. Proposed System
The proposed system aims to enhance the accuracy and efficiency of Near-Earth
Object (NEO) classification using advanced machine learning techniques. By
leveraging state-of-the-art algorithms, including Random Forest, Neural Networks
& Gradient Boosting, the system aims to improve the predictive capabilities for
identifying Potentially Harmful Asteroids among NEOs. The system's
methodology involves feature extraction, preprocessing, and training the machine
learning models on a comprehensive dataset containing key NEO attributes. This
approach enables the development of a robust classification system capable of
categorizing NEOs based on various physical and orbital parameters. To address
challenges such as missing data and categorical variables, the system employs data
preprocessing techniques, ensuring the quality and completeness of the input
dataset. The utilization of machine learning algorithms facilitates more accurate
predictions, contributing to the early identification of PHAs and enhancing
planetary defense capabilities. The proposed system not only advances the
accuracy of NEO classification but also provides a scalable framework for ongoing
research and monitoring of celestial objects. By combining astronomical data with
cutting-edge machine learning methodologies, the system stands to significantly
impact our understanding of NEOs and improve our ability to assess potential
threats from these cosmic bodies.
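A minimal sketch of such a training pipeline is shown below. It assumes the dataset path and the 'pha' label column used in Appendix A; the feature list and hyperparameters are illustrative only, not the project's final configuration:

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.neural_network import MLPClassifier

# Illustrative feature set; the actual columns come from the NEO dataset.
features = ['a', 'q', 'e', 'i', 'diameter', 'albedo']
df = pd.read_csv('sample_data/dataset.csv')
df = df[df['pha'].notna()]
X = df[features]
y = (df['pha'] == 'Y').astype(int)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

models = {
    'Random Forest': RandomForestClassifier(n_estimators=100, random_state=42),
    'Gradient Boosting': GradientBoostingClassifier(random_state=42),
    'Neural Network': MLPClassifier(hidden_layer_sizes=(32, 16), max_iter=500, random_state=42),
}

for name, model in models.items():
    # Impute missing values and scale features before fitting each model.
    pipe = Pipeline([('impute', SimpleImputer(strategy='mean')),
                     ('scale', StandardScaler()),
                     ('model', model)])
    pipe.fit(X_train, y_train)
    print(name, 'test accuracy:', pipe.score(X_test, y_test))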
The Advanced Analysis and Classification of Near-Earth Objects (NEOs) project
is a multidisciplinary endeavor aimed at enhancing our understanding of NEOs,
which are comets and asteroids that come into close proximity to Earth. The
primary objectives of this project are to:
1. Data Collection and Preparation: The project begins with gathering and
processing a comprehensive dataset of NEOs. This dataset contains crucial
information about these celestial objects, including their names, sizes, albedo,
orbital parameters, and potential threat levels (Potentially Hazardous Asteroids -
PHAs).
characteristics. Particular attention is given to identifying PHAs that pose a
potential threat to Earth. This risk assessment is a crucial component of planetary
defense efforts.
4. Deep Learning and Machine Learning Models: The project explores the use of
deep learning and ML models to make predictions and classifications based on the
dataset. Random Forest, Artificial Neural Networks (ANNs) & Gradient Boosting
are employed to enhance the accuracy of risk assessment & classification.
6. Safety and Education: The project has implications for safety and education. By
sharing the results and insights with the public, educational institutions, and space
agencies, this initiative contributes to raising awareness about NEOs and the
importance of planetary defense.
7. International Collaboration: Given the global nature of the NEO threat, this
project may involve international collaboration, as organizations like NASA, ESA,
and other space agencies actively monitor NEOs.
Pseudo code:
Step 1: Load Dataset
neo_data = load_neo_dataset()
Step 1: Load Dataset with Hazard Information
neo_hazard_data = load_neo_dataset_with_hazard_info()
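A concrete version of these loading steps might look as follows (a sketch only; the CSV path mirrors Appendix A, and the 'pha' column is assumed to carry the hazard flag):

import pandas as pd

def load_neo_dataset(path='sample_data/dataset.csv'):
    # Load the full NEO dataset used throughout the project.
    return pd.read_csv(path)

def load_neo_dataset_with_hazard_info(path='sample_data/dataset.csv'):
    # Keep only the rows whose hazard flag ('pha') is populated.
    df = pd.read_csv(path)
    return df[df['pha'].notna()]

neo_data = load_neo_dataset()
neo_hazard_data = load_neo_dataset_with_hazard_info()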
The bar chart in Figure 3.2 shows the feature importance for the key parameters:
a (semi-major axis), q (perihelion distance), i (inclination, deg), e (orbital
eccentricity), diameter (diameter of the object, km), tp (time of perihelion
passage), and albedo (reflectivity).
Figure 3.3. Feature importance for all parameters
The bar chart in Figure 3.3 shows the feature importance of all the parameters:
a (semi-major axis), q (perihelion distance), e (orbital eccentricity), i (inclination,
deg), om (longitude of the ascending node, deg), w (argument of perihelion, deg),
ma (mean anomaly, deg), ad (aphelion distance), n (mean motion, deg/day), and
tp (time of perihelion passage).
CHAPTER 4
Accuracy Score:
Accuracy is a measure of how often a classification model correctly predicts the
outcome. It is calculated as the ratio of the number of correct predictions to the
total number of predictions.
Formula: Accuracy = (Number of Correct Predictions) / (Total Number of
Predictions) * 100%
A higher accuracy score indicates that the model is making more correct
predictions, but it may not provide a complete picture if the dataset is imbalanced
(i.e., one class dominates the dataset).
Precision:
Precision is a measure of the model's ability to correctly identify positive instances
(True Positives) out of all instances predicted as positive (True Positives + False
Positives).
Formula: Precision = True Positives / (True Positives + False Positives)
Precision is essential in scenarios where the cost of false positives is high. A higher
precision indicates fewer false positives, but it may lead to more false negatives if
the model becomes too conservative.
Recall (Sensitivity):
Recall, also known as Sensitivity or True Positive Rate, measures the model's
ability to correctly identify positive instances out of all actual positive instances
(True Positives + False Negatives).
Formula: Recall = True Positives / (True Positives + False Negatives)
Recall is crucial when the cost of false negatives is high. A higher recall indicates
fewer false negatives, but it may lead to more false positives if the model becomes
too aggressive in predicting positives.
F1 Score:
The F1 score is the harmonic mean of precision and recall, providing a balance
between the two metrics. It is useful when there is an uneven class distribution or
when both false positives and false negatives are equally important.
Formula: F1 Score = 2 * (Precision * Recall) / (Precision + Recall)
The F1 score ranges from 0 to 1, with 1 being the best possible score. It considers
both false positives and false negatives, making it a comprehensive metric for
model evaluation.
These evaluation metrics play a crucial role in assessing the performance of
classification models and determining which model is best suited for a specific task
based on its ability to balance accuracy, precision, recall, and F1 score.
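All four metrics can be computed directly with scikit-learn. The sketch below uses placeholder labels and predictions standing in for the y_test and y_pred arrays produced by the classifiers in Appendix A:

from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

# Placeholder ground truth and predictions (stand-ins for y_test / y_pred).
y_test = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]

print('Accuracy :', accuracy_score(y_test, y_pred))   # correct / total
print('Precision:', precision_score(y_test, y_pred))  # TP / (TP + FP)
print('Recall   :', recall_score(y_test, y_pred))     # TP / (TP + FN)
print('F1 score :', f1_score(y_test, y_pred))         # harmonic mean of precision and recall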
Figure 4.1 Accuracy scores
Figure 4.1 shows the accuracy of the Gradient Boosting, AdaBoost, Random Forest,
Decision Tree, Logistic Regression, K-Neighbors, and SVC classifiers. From this
comparison it is clear that the Random Forest classifier achieves higher accuracy
than the other algorithms.
Figure 4.2 Precision score for all parameters
Figure 4.2 shows the precision score for all parameters. From it we observe that the
Random Forest classifier has the highest precision score, so it is used for the
subsequent analysis.
Figure 4.3 Scatterplot
Figure 4.3 shows Near-Earth objects represented as non-hazardous or hazardous
using a scatterplot.
The NEO (Near-Earth Object) project has yielded important results and insights
into the characteristics and dynamics of these celestial bodies. Here's a summary
of the project's key findings and a discussion of their implications:
Epoch Data:
- The inclusion of epoch data allows researchers to track the positions of NEOs
at specific times, which is essential for predicting their future positions and
potential close approaches to Earth.
- The identification of PHAs is crucial for early warning systems and planetary
defense strategies. Knowing which NEOs pose a threat helps in devising strategies
to mitigate potential impacts.
- Epoch data enhances the precision of NEO tracking, enabling better
predictions for astronomical observations and potential space missions.
CHAPTER 5
CONCLUSION
One of the key contributions of the NEO project is the comprehensive data it
provides on the orbital and physical characteristics of NEOs. This data is crucial
for enhancing our preparedness and response capabilities in the event of a NEO
encounter. By understanding the precise movements and trajectories of these
objects, we can improve our early warning systems and take proactive measures to
mitigate potential impacts.
The inclusion of epoch data in the NEO project has significantly refined our ability
to predict NEO movements. Epoch data refers to the precise timestamps associated
with observations, allowing for more accurate predictions of an object's future path.
This enhancement is vital for improving our forecasting capabilities and providing
timely alerts about NEOs that may come close to Earth.
Moreover, the NEO project focuses on gathering more detailed and accurate data
on NEOs beyond their basic classification. This includes gathering information
about their size, shape, surface properties, and other relevant characteristics. Such
detailed data is instrumental in developing advanced prediction models that further
improve tracking and forecasting accuracy.
Collaboration is a cornerstone of the NEO project, as it actively engages with
international space agencies and observatories. This collaboration ensures a
broader dataset, combining observations and insights from various sources
worldwide. By pooling resources and expertise, the project can access a wealth of
information, contributing to a more comprehensive understanding of NEOs.
Lastly, the NEO project advocates for funding space missions dedicated to
studying NEOs up close. These
missions provide invaluable insights into the composition, behavior, and potential
threats posed by NEOs. By supporting such missions, the project aims to deepen
our understanding of these objects and strengthen our ability to safeguard Earth
from potential impacts.
APPENDIX A
CODE
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
sns.set_theme(style = "ticks", color_codes=True)
ad = pd.read_csv('sample_data/dataset.csv')
print(ad.shape)
ad.head()
def extract_neo(df):
    # Keep only rows flagged as near-Earth objects and save them to CSV.
    neo = df.loc[df.neo == 'Y', :]
    neo.to_csv('neo.csv')
    return neo

def extract_pha(df):
    # Keep only rows flagged as potentially hazardous asteroids and save them to CSV.
    pha = df.loc[df.pha == 'Y', :]
    pha.to_csv('pha.csv')
    return pha

def extract_neo_pha(df):
    # Keep rows flagged as both NEO and PHA; each comparison needs parentheses
    # when combined with the element-wise & operator.
    neo_pha = df.loc[(df.pha == 'Y') & (df.neo == 'Y'), :]
    neo_pha.to_csv('neo_pha.csv')
    return neo_pha
neo = extract_neo(ad)
neo.head()
pha = extract_pha(ad)
pha.head()
print(neo.shape)
cols = list(neo.columns)
print(cols)
neo.describe()
new_ad = ad[ad['pha'].notna()]
new_ad = new_ad[new_ad['neo'].notna()]
neo[param_imp].info()
plot_scatter(neo, param_imp)
neo[param_imp].describe()
neo[param_oth].info()
neo[param_oth].describe()
plot_scatter(neo, param_oth)
import pyforest
import warnings
warnings.filterwarnings("ignore")
from sklearn import metrics
from sklearn.metrics import accuracy_score
neo_params_imp = neo_final[params_imp]
neo_params_imp
neo_params_all = neo_final[params_all]
neo_params_all
neo_params_imp.isnull().sum()
def fill_nan(df):
    # Fill missing label columns with 0 and missing numeric features with the column mean.
    for column in df.columns:
        if column == 'pha' or column == 'neo':
            df[column].fillna(value=int(0), inplace=True)
        else:
            df[column].fillna(int(df[column].mean()), inplace=True)
    return df
neo_params_imp = fill_nan(neo_params_imp)
neo_params_all = fill_nan(neo_params_all)
neo_params_all
neo_params_imp
plt.figure(figsize = (20,20))
sns.heatmap(data = round(neo_params_imp.corr(), 2), annot = True)
plt.show()
plt.figure(figsize = (20,20))
sns.heatmap(data = round(neo_params_all.corr(), 2), annot = True)
plt.show()
from sklearn.ensemble import RandomForestClassifier, AdaBoostClassifier, GradientBoostingClassifier
model_scores_params_imp = model_fit_score(models, neo_params_imp)
model_scores_params_imp.sort_values('Score', ascending = False)
plt.figure(figsize = (20,10))
sns.barplot(data = model_scores_params_imp.sort_values('Score').T)
plt.title('Model Precision Score For Important Parameters')
plt.xticks(rotation = 90);
plt.show()
plt.figure(figsize = (20,10))
sns.barplot(data = model_scores_params_all.sort_values('Score').T)
plt.title('Model Precision Score For All Parameters')
plt.xticks(rotation = 90);
plt.show()
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix, classification_report

def rfc(df):
    # Train a Random Forest on the PHA label and report accuracy, confusion
    # matrix, and classification report on a held-out test set.
    X = df.drop('pha', axis=1)
    y = df['pha']
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
    clf = RandomForestClassifier(n_estimators=10000)
    clf.fit(X_train, y_train)
    y_pred = clf.predict(X_test)
    acc = accuracy_score(y_test, y_pred)
    cm = confusion_matrix(y_test, y_pred)
    cr = classification_report(y_test, y_pred)
    return [acc, cm, cr]
result_params_imp = rfc(neo_params_imp)
print("Accuracy:", result_params_imp[0], '\n')
print("Confusion Matrix:\n", result_params_imp[1], '\n')
print("Classification Report:\n", result_params_imp[-1], '\n')
result_params_all = rfc(neo_params_all)
print("Accuracy: ", result_params_all[0], '\n')
print("Confusion Matrix:\n", result_params_all[1], '\n')
print("Classification Report:\n", result_params_all[-1], '\n')
import matplotlib.pyplot as plt

def fi_plot(df):
    # Fit a Random Forest and plot the feature importances in descending order.
    X = df.drop('pha', axis=1)
    y = df['pha']
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
    model = RandomForestClassifier()
    model.fit(X_train, y_train)
    importances = model.feature_importances_
    sorted_indices = np.argsort(importances)[::-1]
    plt.title('Feature Importance')
    plt.bar(range(X_train.shape[1]), importances[sorted_indices], align='center')
    plt.xticks(range(X_train.shape[1]), X_train.columns[sorted_indices], rotation=90)
    plt.tight_layout()
    plt.show()
fi_plot(neo_params_imp)
fi_plot(neo_params_all)
import pandas as pd
import numpy as np
from sklearn.metrics import classification_report
from sklearn.model_selection import KFold
from sklearn.ensemble import GradientBoostingClassifier
def gbc(df):
    # Train a Gradient Boosting classifier on the PHA label and report test-set metrics.
    X = df.drop('pha', axis=1)
    y = df['pha']
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
    classifier = GradientBoostingClassifier(n_estimators=20, learning_rate=0.5,
                                            max_features=2, max_depth=2, random_state=0)
    classifier.fit(X_train, y_train)
    y_pred = classifier.predict(X_test)
    cm = confusion_matrix(y_test, y_pred)
    acc = accuracy_score(y_test, y_pred)
    cr = classification_report(y_test, y_pred)
    return [acc, cm, cr]
result_params_imp = gbc(neo_params_imp)
print("Accuracy:", result_params_imp[0], '\n')
print("Confusion Matrix:\n", result_params_imp[1], '\n')
print("Classification Report:\n", result_params_imp[-1], '\n')
result_params_all = gbc(neo_params_all)
print("Accuracy:", result_params_all[0], '\n')
print("Confusion Matrix:\n", result_params_all[1], '\n')
print("Classification Report:\n", result_params_all[-1], '\n')
import numpy as np
import pandas as pd
from sklearn.svm import SVC
from sklearn.metrics import confusion_matrix, accuracy_score, classification_report
def svc_lin(df):
    # Support vector classifier with a linear kernel on the PHA label.
    X = df.drop('pha', axis=1)
    y = df['pha']
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
    classifier = SVC(kernel='linear', random_state=0)
    classifier.fit(X_train, y_train)
    y_pred = classifier.predict(X_test)
    cm = confusion_matrix(y_test, y_pred)
    acc = accuracy_score(y_test, y_pred)
    cr = classification_report(y_test, y_pred)
    return [acc, cm, cr]

def svc_rbf(df):
    # Support vector classifier with an RBF kernel on the PHA label.
    X = df.drop('pha', axis=1)
    y = df['pha']
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
    classifier = SVC(kernel='rbf', random_state=0, gamma=.01, C=1)
    classifier.fit(X_train, y_train)
    y_pred = classifier.predict(X_test)
    cm = confusion_matrix(y_test, y_pred)
    acc = accuracy_score(y_test, y_pred)
    cr = classification_report(y_test, y_pred)
    return [acc, cm, cr]
"""### Results of SVC for Important Parameters"""
result_params_imp = svc_lin(neo_params_imp)
print("Accuracy:", result_params_imp[0], '\n')
print("Confusion Matrix:\n", result_params_imp[1], '\n')
print("Classification Report:\n", result_params_imp[-1], '\n')
result_params_imp = svc_rbf(neo_params_imp)
print("Accuracy:", result_params_imp[0], '\n')
print("Confusion Matrix:\n", result_params_imp[1], '\n')
print("Classification Report:\n", result_params_imp[-1], '\n')
result_params_all = svc_lin(neo_params_all)
print("Accuracy:", result_params_all[0], '\n')
print("Confusion Matrix:\n", result_params_all[1], '\n')
print("Classification Report:\n", result_params_all[-1], '\n')
result_params_all = svc_rbf(neo_params_all)
print("Accuracy:", result_params_all[0], '\n')
print("Confusion Matrix:\n", result_params_all[1], '\n')
print("Classification Report:\n", result_params_all[-1], '\n')
from sklearn import metrics
from sklearn.metrics import confusion_matrix, classification_report
from sklearn.linear_model import LogisticRegression

def logrm(df):
    # Fit a logistic regression model on the PHA label and report test-set metrics.
    X = df.drop('pha', axis=1)
    y = df['pha']
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
    logreg = LogisticRegression()
    logreg.fit(X_train, y_train)
    y_pred = logreg.predict(X_test)
    acc = logreg.score(X_test, y_test)
    cm = confusion_matrix(y_test, y_pred)
    cr = classification_report(y_test, y_pred)
    return [acc, cm, cr]
result_params_imp = logrm(neo_params_imp)
print("Accuracy:", result_params_imp[0], '\n')
print("Confusion Matrix:\n", result_params_imp[1], '\n')
print("Classification Report:\n", result_params_imp[-1], '\n')
result_params_all = logrm(neo_params_all)
print("Accuracy:", result_params_all[0], '\n')
print("Confusion Matrix:\n", result_params_all[1], '\n')
print("Classification Report:\n", result_params_all[-1], '\n')
"""# ANN"""
# Keras imports (assumed; the original notebook's import cell is not shown).
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

def set_up_for_ann(df):
    # Train a small feed-forward network on the PHA label (expects 8 input features).
    X = df.drop('pha', axis=1)
    y = df['pha']
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
    model = Sequential()
    model.add(Dense(12, input_dim=8, activation='relu'))
    model.add(Dense(8, activation='relu'))
    model.add(Dense(1, activation='sigmoid'))
    model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
    model.fit(X_train, y_train, epochs=25, batch_size=16)
    _, accuracy = model.evaluate(X_test, y_test)
    print()
    print('Accuracy of test: %.2f' % (accuracy * 100))
set_up_for_ann(neo_params_imp)
def set_up_for_ann2(df):
    # Same network as above, but the input dimension is inferred from the data.
    X = df.drop('pha', axis=1)
    y = df['pha']
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
    model = Sequential()
    model.add(Dense(12, activation='relu'))
    model.add(Dense(8, activation='relu'))
    model.add(Dense(1, activation='sigmoid'))
    model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
    model.fit(X_train, y_train, epochs=25, batch_size=16)
    _, accuracy = model.evaluate(X_test, y_test)
    print()
    print('Accuracy of test: %.2f' % (accuracy * 100))
result_params_all_ann = set_up_for_ann2(neo_params_all)
import pandas as pd
df = pd.read_csv(filepath_or_buffer='sample_data/neo_data.csv',
                 parse_dates=['Close Approach Date', 'Close Approach Date (Full)'])
df['year'] = df['Close Approach Date'].dt.year
df.head()
df.info()
df.nunique()
# 'histogram' is assumed to come from Plotly Express (from plotly.express import histogram).
histogram(data_frame=df, x='Relative Velocity (km/s)', color='Is Potentially Hazardous')
APPENDIX B
The output image depicts a scatterplot with miss distance on the y-axis and year on
the x-axis, showcasing the trajectory of various celestial objects over time. The
hazards, represented by red data points, indicate objects that have a close approach
to Earth, potentially posing a threat. Conversely, non-hazardous objects are
denoted by blue data points, indicating a safer distance from Earth.
The visualization allows for easy identification of hazardous objects based on their
proximity to Earth during specific time periods. Objects with a miss distance below
a certain threshold are highlighted in red, alerting observers to potential collision
risks. In contrast, blue data points signify objects that maintain a safe distance from
Earth, alleviating concerns of impact.
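A minimal sketch of how such a scatterplot could be produced is given below; it reuses the close-approach file from Appendix A, and the miss-distance and hazard-flag column names are assumptions:

import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv('sample_data/neo_data.csv', parse_dates=['Close Approach Date'])
df['year'] = df['Close Approach Date'].dt.year

# Treat the hazard flag as boolean-like regardless of how it is stored.
hazardous = df['Is Potentially Hazardous'].astype(str).str.lower().isin(['true', '1', 'y', 'yes'])

# Blue points are non-hazardous objects; red points are potentially hazardous ones.
plt.scatter(df.loc[~hazardous, 'year'], df.loc[~hazardous, 'Miss Dist.(kilometers)'],
            c='blue', s=10, alpha=0.5, label='Non-hazardous')
plt.scatter(df.loc[hazardous, 'year'], df.loc[hazardous, 'Miss Dist.(kilometers)'],
            c='red', s=10, alpha=0.5, label='Hazardous')
plt.xlabel('Year')
plt.ylabel('Miss distance (km)')
plt.legend()
plt.show()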
APPENDIX C
PUBLICATION STATUS
REFERENCES
doi: 10.1109/AERO53065.2022.9843625.
11. M. Veto et al., "Ball Aerospace Spacecraft and Systems for the Near-Earth
Object Surveyor Mission," 2021 IEEE Aerospace Conference (50100), Big Sky,
MT, USA, 2021, pp. 1-14, doi: 10.1109/AERO50100.2021.9438216.
14. C. P. Yang et al., "An architecture for mitigating near earth object's impact to
the earth," 2017 IEEE Aerospace Conference, Big Sky, MT, USA, 2017, pp. 1-13,
doi: 10.1109/AERO.2017.7943594.
15. N. Rowell, S. Parkes and M. Dunstan, "Image Processing for Near Earth Object
Optical Guidance Systems," in IEEE Transactions on Aerospace and Electronic
Systems, vol. 49, no. 2, pp. 1057-1072, April 2013, doi: 10.1109/TAES.2013.6494399.