Report II
VISUALIZING HAZARDS
A PROJECT REPORT
Submitted by
Amirthavarshini P M (913120205005)
Swathi K (913120205073)
in
INFORMATION TECHNOLOGY
ANNA UNIVERSITY: CHENNAI 600 025
BONAFIDE CERTIFICATE
SIGNATURE SIGNATURE
ACKNOWLEDGEMENT
We thank the Almighty for giving us the moral strength to work on this project over
the past few months.
Our sincere thanks to our respected Principal, Dr. P. ALLI, for the encouragement
and motivation offered to us during the period of our research study.
We convey our thanks to our guide, Mrs. A.VINORA, for her innovative
suggestions and valuable guidance.
We would also like to extend our sincere gratitude to all faculty members of
the Department of Information Technology for their valuable guidance
throughout the course of our project. We also thank our parents and friends
who provided moral and physical support.
ABSTRACT
TABLE OF CONTENTS
ABSTRACT iii
LIST OF FIGURES v
1. INTRODUCTION 1
1.1. Overview 1
1.2. Asteroids and Comets 4
1.3. Introduction to Machine Learning 6
2. LITERATURE SURVEY 9
2.1. Research papers 9
2.2. Bottlenecks of Survey papers 13
3. METHODOLOGY 16
3.1. Dataset 16
3.2. Data Pre-Processing 17
3.3. Proposed System 18
3.4. Software Requirements 23
3.5. Hardware Requirements 23
5. CONCLUSION 30
APPENDIX A 32
APPENDIX B 50
APPENDIX C 52
REFERENCES 54
LIST OF FIGURES
4.3 Scatterplot 28
CHAPTER 1
INTRODUCTION
Additionally, Congress noted that similar NEOs have impacted Earth or entered
its atmosphere multiple times throughout history, indicating a recurrent threat that
continues to pose risks in the future. The discovery timeline of certain NEOs, often
identified only days before their closest approach to Earth, highlights the
challenges in timely detection and monitoring of these objects.
Despite recent discoveries of large NEOs, Congress expressed concerns about the
significant number of undetected NEOs, emphasizing the ongoing need for
comprehensive and continuous efforts in NEO detection and tracking.
The findings outlined in the NEO Survey Act highlighted the critical importance
of implementing a robust NEO Survey program. Such a program aims to enhance
detection, tracking, cataloging, and characterization efforts, ultimately improving
preparedness, early warning systems, and mitigation strategies to reduce the
potential risks associated with NEO impacts on Earth.
1.1.3. Direction
The NEO Survey Act, as outlined in Appendix A, amended the Space Act of 1958
with specific directives regarding the responsibilities of the National Aeronautics
and Space Administration (NASA) in addressing the threat of near-Earth asteroids
and comets. Congress declared that safeguarding the general welfare and security
of the United States necessitates directing NASA's unique competence towards
detecting, tracking, cataloging, and characterizing these near-Earth objects. The
primary goal is to provide early warning and mitigation strategies for potential
hazards they pose to Earth.
Furthermore, the Act mandates the Administrator of NASA to plan, develop, and
implement a comprehensive Near-Earth Object Survey program. This program
focuses on detecting, tracking, cataloging, and characterizing the physical
properties of near-Earth objects that are equal to or greater than 140 meters in
diameter. The objective is to assess the threat posed by such objects to Earth's
safety. A key target of the Survey program is to achieve a 90 percent completion
rate of its near-Earth object catalogue, based on statistically predicted populations
of these objects, within 15 years from the date of the Act's enactment.
1.2. Asteroids and Comets
Asteroids and comets are the primary focus of discussion in this study due to their
potential hazard to Earth. Astronomers differentiate between these celestial bodies
based on their appearance. Asteroids typically appear as star-like points of light
when observed, while comets exhibit a diffuse appearance with visible tails.
The existence of asteroids was not known until roughly 200 years ago when
telescopes advanced enough to detect these small celestial objects. On the other
hand, people have been aware of comets since ancient times due to their distinctive
tails visible to the naked eye.
Asteroids are further categorized into different groups based on their orbits
relative to Earth:
- Apollos: Orbits cross Earth's orbit with a period exceeding 1 year.
- Atens: Orbits cross Earth's orbit with a period less than 1 year.
- Amors: Orbits entirely outside Earth's orbit.
- Interior Earth Objects (IEOs): Orbits entirely within Earth's orbit.
The distribution of these asteroid categories within the Near-Earth Object (NEO)
population is illustrated in Figure 1, providing insight into the varying orbits and
potential impact risks associated with the different types of NEOs.
Figure 1.1. Frequency of NEOs by Size, Impact Energy, and Magnitude
The analysis presented in Figure 1.1 uses a constant power-law approximation to
estimate the number of Near-Earth Objects (NEOs) as a function of their size. The
approximation shows that the population grows sharply as object size shrinks:
roughly a hundred-fold increase in the number of NEOs for every order-of-magnitude
decrease in diameter.
It is important to note that, while the estimates presented in Figure 1.1 are valuable,
they may carry a margin of error of a factor of two or three in any given size class.
This uncertainty is due to limitations in observations, particularly in size classes
where there are not enough data points to form a statistically valid sample.
In this report, the term PHO (Potentially Hazardous Objects) is used to indicate
potential threats among NEOs. It is noted that NEOs smaller than 1 km are
predominantly asteroids, as comets do not significantly contribute to the
population below this size threshold. This distinction helps in categorizing and
understanding the potential hazard levels associated with different sizes of NEOs,
aiding in risk assessment and mitigation planning.
1.3.1 Logistic regression (LR)
Logistic Regression (LR) is a linear classification model that estimates the
probability of a binary outcome by applying the logistic (sigmoid) function to a
weighted combination of the input features. It is fast to train and easy to interpret,
which makes it a common baseline for binary classification tasks such as labeling
objects as hazardous or non-hazardous.
Decision Tree (DT) algorithms construct tree-like structures where internal nodes
represent feature tests, and leaf nodes represent class labels. DTs are intuitive and
easy to interpret, making them valuable for decision-making processes. They
recursively partition the data based on feature splits, leading to hierarchical
decision rules that classify instances into distinct classes.
Random Forest (RF) is an ensemble method that combines multiple decision trees
to improve predictive accuracy and reduce overfitting. Each tree in the forest is
trained on a random subset of the training data, and the final prediction is made by
aggregating the predictions of individual trees. RF mitigates the high variance
associated with deep decision trees, making it robust and suitable for various
classification tasks.
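As a brief illustration of the difference between a single decision tree and a random forest, the following sketch fits both classifiers on synthetic data (the data, features, and hyperparameters are placeholders, not the NEO dataset used later in this report):

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in data; the actual project uses NEO orbital and physical features.
X, y = make_classification(n_samples=1000, n_features=8, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# A single deep tree is easy to interpret but prone to high variance.
tree = DecisionTreeClassifier(random_state=42).fit(X_train, y_train)

# A forest of trees, each trained on a bootstrap sample, aggregates their predictions.
forest = RandomForestClassifier(n_estimators=100, random_state=42).fit(X_train, y_train)

print('Decision tree accuracy:', tree.score(X_test, y_test))
print('Random forest accuracy:', forest.score(X_test, y_test))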
CHAPTER 2
LITERATURE SURVEY
2.1. Research papers
3. H. Lin, C. Sun and Y. Liu, "OBBStacking: An Ensemble Method for
Remote Sensing Object Detection," in IEEE Journal of Selected Topics in
Applied Earth Observations and Remote Sensing
Introduced an innovative ensemble method called OBBStacking, designed for
remote-sensing object detection using oriented bounding boxes (OBBs).
OBBStacking effectively combines OBB-based detection results and leverages the
confidence indicators produced by deep-learning object detectors. OBBStacking
featured in the winning entry of the 2021 Gaofen Challenge on Automated
High-Resolution Earth Observation Image Interpretation and showed enhanced
performance on datasets such as DOTA and FAIR1M.
4. D. Khajuria, A. Sharma, N. Sharma and M. Mangla, "Classification and
Comparative Analysis of Earth's Nearest Objects using Machine Learning
Models," 2023 10th International Conference on Computing for Sustainable
Global Development (INDIACom), New Delhi, India, 2023
This research uses machine learning to classify asteroids from the NASA
Nearest Earth Objects dataset as hazardous or non-hazardous, aiming to help unveil
space mysteries. Visualization aids the data analysis, culminating in an efficient
prediction model evaluated on accuracy and showcasing machine learning's
efficacy in tackling astronomical challenges.
5. L. Z. Ladeira, F. Santos, L. Cléopas, P. Buteneers and L. Villas, "NEO-
NDA: Neo Natural Language Data Augmentation," 2022 IEEE 16th
International Conference on Semantic Computing (ICSC), Laguna Hills, CA,
USA, 2022
Introduced NEO-NDA (NEO Natural Language Data Augmentation), a
versatile tool for generating and rebalancing textual datasets. Unlike many existing
approaches, NEO-NDA operates across multiple languages, offering a wide range
of modifications to generate novel data samples. Their system significantly
enhances the performance of machine learning models, often doubling their
effectiveness compared to original datasets without augmentation.
6. R. N. Ranaweera and T. Fernando, "Prediction of Potentially Hazardous
Asteroids using Deep Learning," 2022 2nd International Conference on
Advanced Research in Computing (ICARC), Belihuloya, Sri Lanka, 2022,
They utilized deep neural networks to analyze the orbital data of Near-Earth
Asteroids (NEAs). By allowing the model to learn intricate patterns and features
within the dataset, the approach efficiently classifies objects as harmful or harmless
based on acquired insights. This enhances the efficiency of identifying and
characterizing celestial bodies, contributing to efforts aimed at preventing potential
impacts on Earth.
7. R. J. Wainscoat, K. Chambers, R. Weryk, Y. Ramanjooloo, M. Huber
and E. Magnier, "Recent Advances in the Pan-STARRS Search for Near-
Earth Objects," 2022 IEEE Aerospace Conference (AERO), Big Sky, MT,
USA, 2022,
They described the Pan-STARRS telescopes on Maui and their search for near-
Earth objects, with Pan-STARRS2 overcoming early technical challenges such as
mirror-coating and mount issues. With improved reporting latency and same-night
follow-up, the survey aids in discovering interstellar objects like 'Oumuamua and is
poised to be a potent tool for identifying impacting asteroids.
8. T. Hromakina et al., "NEOROCKS project: surface properties of small
near-Earth asteroids," in Monthly Notices of the Royal Astronomical
Society, vol. 520, no. 2, pp. 3143-3150, Sept. 2022,
They described the NEOROCKS project, in which observations at the
Haute-Provence Observatory classified 42 NEOs, adding to a dataset of 93 objects
and identifying potential spacecraft-accessible targets such as (163014) 2001 UA5
and 2017 SE19 for planetary defense research and missions.
9. C. Rosu and V. Bacu, "Asteroid Image Classification Using Convolutional
Neural Networks," 2021 IEEE 17th International Conference on Intelligent
Computer Communication and Processing (ICCP), Cluj-Napoca, Romania,
2021,
Introduced a convolutional neural network (CNN) model designed
specifically for classifying astronomical images to detect asteroids. Trained on
original data from the Isaac Newton Telescope in La Palma, the system aims to
provide automated, accurate asteroid detection while minimizing false negatives.
Trained on single images, the model achieved a notable 94.01% recall rate,
showcasing its potential for precise asteroid detection.
10. V. Bahel, P. Bhongade, J. Sharma, S. Shukla and M. Gaikwad,
"Supervised Classification for Analysis and Detection of Potentially
Hazardous Asteroid," 2021 International Conference on Computational
Intelligence and Computing Applications (ICCICA), Nagpur, India, 2021.
Emphasized the growing importance of artificial intelligence (AI) in
space science, particularly for mitigating asteroid-related risks. Their study
involved a meticulous understanding of Potentially Hazardous Asteroids (PHAs) and
employed supervised machine learning (ML) to assess asteroid hazard based on
specific parameters. Among the various classification algorithms, Random Forest
stood out with an average F1-score of 99.22% and 99.9% accuracy, offering a
robust solution for asteroid risk evaluation.
11. M. Veto et al., "Ball Aerospace Spacecraft and Systems for the Near-
Earth Object Surveyor Mission," 2021 IEEE Aerospace Conference (50100),
Big Sky, MT, USA, 2021.
The mission, with spacecraft and systems developed by Ball Aerospace for
NASA's Near-Earth Object Surveyor Mission (NEOSM), aims to detect near-Earth
objects larger than 140 meters, enhancing planetary defense and informing policy
through a comprehensive survey that leverages Ball's spacecraft and infrared
expertise from NEOWISE. This collaborative effort combines scientific,
engineering, and institutional heritage to advance understanding of NEOs'
physical properties and their origins in the early solar system.
12. L. Gentile, A. Capria, M. Conti, C. Moscardini and M. Martorella,
"Resident Space Object Passive Bistatic Radar Detection Using DVB-S2
Signals," 2018 19th International Radar Symposium (IRS), Bonn, Germany,
2018,
This work explores the feasibility of a DVB-S2 passive multi-bistatic
radar for tracking Resident Space Objects (RSOs), contrasting its benefits with
active ground-based systems. Unlike optical systems, this radar operates day and
night, regardless of weather conditions, offering a robust monitoring solution for
Earth's orbit.
13. R. A. Mohammed and M. Y. Alradi, "Model of planetary radar for
tracking near-Earth asteroids," 2017 Fifth International Conference on
Aerospace Science & Engineering (ICASE), Islamabad, Pakistan, 2017.
Discussed the increasing hazards posed by asteroids in the asteroid belt
located between Mars and Jupiter, within a range of 2.2 to 3.2 astronomical units
(AU). The study highlighted the use of planetary radars for tracking celestial
objects and safeguarding Earth. It detailed a simulation-based approach that
involved analyzing radar cross-sectional data and modeling signal processing using
MATLAB and Simulink. This approach allowed for the examination of moving
asteroids' behavior and contributed to strategies aimed at mitigating asteroid threats.
2.2. Bottlenecks of Survey papers
The approach of Ganesh et al. presents a challenge due to its high computational
resource requirements. This limitation could hinder real-time applications or
deployment in resource-constrained environments, impacting the practicality and
scalability of the approach. Similarly, T. Hoffman et al.'s NEOS Project, while
making significant progress, may encounter challenges related to the detailed
design phase. The lack of specific details about the algorithms or methodologies
used could impede a comprehensive assessment of technical feasibility, potentially
affecting the project's effectiveness and future scalability.
Furthermore, the effectiveness of data augmentation techniques like NEO-NDA by
L. Z. Ladeira et al. may be influenced by their ability to handle extremely imbalanced
datasets or rare classes. Generalizability across different scenarios and datasets is
crucial for ensuring the reliability and applicability of these techniques in diverse
contexts.
Additionally, while CNN models like the one introduced by C. Rosu et al. for
asteroid detection showcase impressive recall rates, their generalization to diverse
datasets and environmental conditions may require further validation and
optimization for real-world deployment. Moreover, advancements in geo-spatial
sensing object identification, such as H. Lin et al.'s OBBStacking method, may
vary in performance based on the complexity and variability of the data. Thorough
evaluation and benchmarking against diverse datasets are necessary to assess the
method's reliability and effectiveness across different scenarios. Lastly,
simulation-based approaches, as discussed in R. A. Mohammed's work on
planetary radars for asteroid tracking, may face challenges in accurately predicting
real-world scenarios, especially concerning unexpected asteroid behaviors or
events.
In conclusion, our proposed Advanced Analysis and Classification of Near-Earth
Objects (NEOs) project aims to overcome these identified bottlenecks effectively.
By leveraging a comprehensive approach that includes data quality assurance,
robust preprocessing techniques, and advanced machine learning algorithms such
as Random Forest, Neural Networks, and Gradient Boosting, we enhance
predictive capabilities while ensuring scalability, generalizability, and
interpretability. The project's focus on real-world applications, planetary defense
implications, and international collaboration further strengthens its efficacy in
advancing space science and AI for NEO analysis and classification.
CHAPTER 3
METHODOLOGY
3.1. Dataset
NASA's Near-Earth Object (NEO) Program manages a vast and essential dataset
crucial for understanding asteroids and comets that come close to Earth. This
dataset is multifaceted, comprising critical components like the NEO Confirmation
Page and the NEO Earth Close-Approaches Database.
In summary, NASA's NEO dataset is a vital tool for studying, monitoring, and
mitigating potential hazards from NEOs, ensuring Earth's safety from potential
impact events.
Figure 3.1. NEO dataset
3.2. Data Pre-Processing
Data preprocessing for NEO (Near-Earth Object) analysis involves several steps to
clean, transform, and prepare the data for analysis in Excel, as outlined below.
The data preprocessing of NEO information in Excel begins with data collection
from reliable sources such as NASA's NEO Program database or other
astronomical repositories. The collected data typically includes attributes like NEO
designation, discovery date, observation arc, orbital parameters (e.g., semi-major
axis, eccentricity, inclination), estimated diameter, and potential impact
probabilities.
Once the data is imported into Excel, the first step is data cleaning, which involves
identifying and handling missing values, correcting errors, and removing
duplicates if any. This ensures the dataset's integrity and accuracy for further
analysis.
The next step is data transformation, where certain columns may need to be
formatted or calculated to derive new variables or metrics relevant to NEO
analysis. For example, converting dates to a standardized format, calculating
orbital velocities or impact probabilities based on provided data, or categorizing
NEOs based on size ranges.
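The same cleaning and transformation steps can be expressed in pandas. The following is a minimal sketch, assuming a file named neo.csv with illustrative column names such as discovery_date and diameter (these names are assumptions, not the project's actual columns):

import pandas as pd

# Load the raw NEO data (file and column names are illustrative).
neo = pd.read_csv('neo.csv')

# Data cleaning: drop exact duplicates and fill missing diameters with the median.
neo = neo.drop_duplicates()
neo['diameter'] = neo['diameter'].fillna(neo['diameter'].median())

# Data transformation: standardize dates and derive a size category
# (bins in km; 0.14 km corresponds to the 140-meter threshold in the Act).
neo['discovery_date'] = pd.to_datetime(neo['discovery_date'], errors='coerce')
neo['size_class'] = pd.cut(neo['diameter'],
                           bins=[0, 0.14, 1.0, float('inf')],
                           labels=['small', 'medium', 'large'])

print(neo[['discovery_date', 'diameter', 'size_class']].head())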
3.3. Proposed System
The proposed system aims to enhance the accuracy and efficiency of Near-Earth
Object (NEO) classification using advanced machine learning techniques. By
leveraging state-of-the-art algorithms, including Random Forest, Neural Networks
& Gradient Boosting, the system aims to improve the predictive capabilities for
identifying Potentially Harmful Asteroids among NEOs. The system's
methodology involves feature extraction, preprocessing, and training the machine
learning models on a comprehensive dataset containing key NEO attributes. This
approach enables the development of a robust classification system capable of
categorizing NEOs based on various physical and orbital parameters. To address
challenges such as missing data and categorical variables, the system employs data
preprocessing techniques, ensuring the quality and completeness of the input
dataset. The utilization of machine learning algorithms facilitates more accurate
predictions, contributing to the early identification of PHAs and enhancing
planetary defense capabilities. The proposed system not only advances the
accuracy of NEO classification but also provides a scalable framework for ongoing
research and monitoring of celestial objects. By combining astronomical data with
cutting-edge machine learning methodologies, the system stands to significantly
impact our understanding of NEOs and improve our ability to assess potential
threats from these cosmic bodies.
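A minimal sketch of such a training pipeline is shown below. It assumes the dataset path and the 'pha' label column used in Appendix A; the feature list and hyperparameters are illustrative only, not the project's final configuration:

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.neural_network import MLPClassifier

# Illustrative feature set; the actual columns come from the NEO dataset.
features = ['a', 'q', 'e', 'i', 'diameter', 'albedo']
df = pd.read_csv('sample_data/dataset.csv')
df = df[df['pha'].notna()]
X = df[features]
y = (df['pha'] == 'Y').astype(int)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

models = {
    'Random Forest': RandomForestClassifier(n_estimators=100, random_state=42),
    'Gradient Boosting': GradientBoostingClassifier(random_state=42),
    'Neural Network': MLPClassifier(hidden_layer_sizes=(32, 16), max_iter=500, random_state=42),
}

for name, model in models.items():
    # Impute missing values and scale features before fitting each model.
    pipe = Pipeline([('impute', SimpleImputer(strategy='mean')),
                     ('scale', StandardScaler()),
                     ('model', model)])
    pipe.fit(X_train, y_train)
    print(name, 'test accuracy:', pipe.score(X_test, y_test))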
The Advanced Analysis and Classification of Near-Earth Objects (NEOs) project
is a multidisciplinary endeavor aimed at enhancing our understanding of NEOs,
which are comets and asteroids that come into close proximity to Earth. The
primary objectives of this project are to:
1. Data Collection and Preparation: The project begins with gathering and
processing a comprehensive dataset of NEOs. This dataset contains crucial
information about these celestial objects, including their names, sizes, albedo,
orbital parameters, and potential threat levels (Potentially Hazardous Asteroids -
PHAs).
characteristics. Particular attention is given to identifying PHAs that pose a
potential threat to Earth. This risk assessment is a crucial component of planetary
defense efforts.
4. Deep Learning and Machine Learning Models: The project explores the use of
deep learning and ML models to make predictions and classifications based on the
dataset. Random Forest, Artificial Neural Networks (ANNs) & Gradient Boosting
are employed to enhance the accuracy of risk assessment & classification.
6. Safety and Education: The project has implications for safety and education. By
sharing the results and insights with the public, educational institutions, and space
agencies, this initiative contributes to raising awareness about NEOs and the
importance of planetary defense.
7. International Collaboration: Given the global nature of the NEO threat, this
project may involve international collaboration, as organizations like NASA, ESA,
and other space agencies actively monitor NEOs.
Pseudo code:
Step 1: Load Dataset
neo_data = load_neo_dataset()
Step 1: Load Dataset with Hazard Information
neo_hazard_data = load_neo_dataset_with_hazard_info()
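A concrete version of these loading steps might look as follows (a sketch only; the CSV path mirrors Appendix A, and the 'pha' column is assumed to carry the hazard flag):

import pandas as pd

def load_neo_dataset(path='sample_data/dataset.csv'):
    # Load the full NEO dataset used throughout the project.
    return pd.read_csv(path)

def load_neo_dataset_with_hazard_info(path='sample_data/dataset.csv'):
    # Keep only the rows whose hazard flag ('pha') is populated.
    df = pd.read_csv(path)
    return df[df['pha'].notna()]

neo_data = load_neo_dataset()
neo_hazard_data = load_neo_dataset_with_hazard_info()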
The bar chart in Figure 3.2 shows the feature importance for the key parameters:
a (semi-major axis), q (perihelion distance), i (inclination, deg), e (orbital
eccentricity), diameter (diameter of the object, km), tp (time of perihelion
passage), and albedo (reflectivity).
Figure 3.3. Feature importance for all parameters
The bar chart in Figure 3.3 shows the feature importance of all the parameters:
a (semi-major axis), q (perihelion distance), e (orbital eccentricity), i (inclination,
deg), om (longitude of the ascending node, deg), w (argument of perihelion, deg),
ma (mean anomaly, deg), ad (aphelion distance), n (mean motion, deg/day), and
tp (time of perihelion passage).
CHAPTER 4
Accuracy Score:
Accuracy is a measure of how often a classification model correctly predicts the
outcome. It is calculated as the ratio of the number of correct predictions to the
total number of predictions.
Formula: Accuracy = (Number of Correct Predictions) / (Total Number of
Predictions) * 100%
A higher accuracy score indicates that the model is making more correct
predictions, but it may not provide a complete picture if the dataset is imbalanced
(i.e., one class dominates the dataset).
Precision:
Precision is a measure of the model's ability to correctly identify positive instances
(True Positives) out of all instances predicted as positive (True Positives + False
Positives).
Formula: Precision = True Positives / (True Positives + False Positives)
Precision is essential in scenarios where the cost of false positives is high. A higher
precision indicates fewer false positives, but it may lead to more false negatives if
the model becomes too conservative.
Recall (Sensitivity):
Recall, also known as Sensitivity or True Positive Rate, measures the model's
ability to correctly identify positive instances out of all actual positive instances
(True Positives + False Negatives).
Formula: Recall = True Positives / (True Positives + False Negatives)
Recall is crucial when the cost of false negatives is high. A higher recall indicates
fewer false negatives, but it may lead to more false positives if the model becomes
too aggressive in predicting positives.
F1 Score:
The F1 score is the harmonic mean of precision and recall, providing a balance
between the two metrics. It is useful when there is an uneven class distribution or
when both false positives and false negatives are equally important.
Formula: F1 Score = 2 * (Precision * Recall) / (Precision + Recall)
The F1 score ranges from 0 to 1, with 1 being the best possible score. It considers
both false positives and false negatives, making it a comprehensive metric for
model evaluation.
These evaluation metrics play a crucial role in assessing the performance of
classification models and determining which model is best suited for a specific task
based on its ability to balance accuracy, precision, recall, and F1 score.
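All four metrics can be computed directly with scikit-learn. The sketch below uses placeholder labels and predictions standing in for the y_test and y_pred arrays produced by the classifiers in Appendix A:

from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

# Placeholder ground truth and predictions (stand-ins for y_test / y_pred).
y_test = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]

print('Accuracy :', accuracy_score(y_test, y_pred))   # correct / total
print('Precision:', precision_score(y_test, y_pred))  # TP / (TP + FP)
print('Recall   :', recall_score(y_test, y_pred))     # TP / (TP + FN)
print('F1 score :', f1_score(y_test, y_pred))         # harmonic mean of precision and recall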
Figure 4.1 Accuracy scores
Figure 4.1 shows the accuracy of the Gradient Boosting, AdaBoost, Random Forest,
Decision Tree, Logistic Regression, K-Neighbors, and SVC classifiers. From this
comparison it is clear that the Random Forest classifier achieves higher accuracy
than the other algorithms.
Figure 4.2 Precision score for all parameters
Figure 4.2 shows the precision score for all parameters. From it we observe that the
Random Forest classifier has the highest precision score, so it is used for the
subsequent analysis.
Figure 4.3 Scatterplot
Figure 4.3 shows Near-Earth objects represented as non-hazardous or hazardous
using a scatterplot.
The NEO (Near-Earth Object) project has yielded important results and insights
into the characteristics and dynamics of these celestial bodies. Here's a summary
of the project's key findings and a discussion of their implications:
Epoch Data:
- The inclusion of epoch data allows researchers to track the positions of NEOs
at specific times, which is essential for predicting their future positions and
potential close approaches to Earth.
- The identification of PHAs is crucial for early warning systems and planetary
defense strategies. Knowing which NEOs pose a threat helps in devising strategies
to mitigate potential impacts.
- Epoch data enhances the precision of NEO tracking, enabling better
predictions for astronomical observations and potential space missions.
CHAPTER 5
CONCLUSION
One of the key contributions of the NEO project is the comprehensive data it
provides on the orbital and physical characteristics of NEOs. This data is crucial
for enhancing our preparedness and response capabilities in the event of a NEO
encounter. By understanding the precise movements and trajectories of these
objects, we can improve our early warning systems and take proactive measures to
mitigate potential impacts.
The inclusion of epoch data in the NEO project has significantly refined our ability
to predict NEO movements. Epoch data refers to the precise timestamps associated
with observations, allowing for more accurate predictions of an object's future path.
This enhancement is vital for improving our forecasting capabilities and providing
timely alerts about NEOs that may come close to Earth.
Moreover, the NEO project focuses on gathering more detailed and accurate data
on NEOs beyond their basic classification. This includes gathering information
about their size, shape, surface properties, and other relevant characteristics. Such
detailed data is instrumental in developing advanced prediction models that further
improve tracking and forecasting accuracy.
Collaboration is a cornerstone of the NEO project, as it actively engages with
international space agencies and observatories. This collaboration ensures a
broader dataset, combining observations and insights from various sources
worldwide. By pooling resources and expertise, the project can access a wealth of
information, contributing to a more comprehensive understanding of NEOs.
Lastly, the NEO project advocates for funding space missions dedicated to
studying NEOs up close. These
missions provide invaluable insights into the composition, behavior, and potential
threats posed by NEOs. By supporting such missions, the project aims to deepen
our understanding of these objects and strengthen our ability to safeguard Earth
from potential impacts.
APPENDIX A
CODE
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
sns.set_theme(style = "ticks", color_codes=True)
ad = pd.read_csv('sample_data/dataset.csv')
print(ad.shape)
ad.head()
def extract_neo(df):
    # Keep only rows flagged as near-Earth objects and save them to CSV.
    neo = df.loc[df.neo == 'Y', :]
    neo.to_csv('neo.csv')
    return neo

def extract_pha(df):
    # Keep only rows flagged as potentially hazardous asteroids and save them to CSV.
    pha = df.loc[df.pha == 'Y', :]
    pha.to_csv('pha.csv')
    return pha

def extract_neo_pha(df):
    # Keep rows flagged as both NEO and PHA; each comparison needs parentheses
    # when combined with the element-wise & operator.
    neo_pha = df.loc[(df.pha == 'Y') & (df.neo == 'Y'), :]
    neo_pha.to_csv('neo_pha.csv')
    return neo_pha
neo = extract_neo(ad)
neo.head()
pha = extract_pha(ad)
pha.head()
print(neo.shape)
cols = list(neo.columns)
print(cols)
neo.describe()
new_ad = ad[ad['pha'].notna()]
new_ad = new_ad[new_ad['neo'].notna()]
neo[param_imp].info()
plot_scatter(neo, param_imp)
neo[param_imp].describe()
neo[param_oth].info()
neo[param_oth].describe()
plot_scatter(neo, param_oth)
import pyforest
import warnings
warnings.filterwarnings("ignore")
from sklearn import metrics
from sklearn.metrics import accuracy_score
neo_params_imp = neo_final[params_imp]
neo_params_imp
neo_params_all = neo_final[params_all]
neo_params_all
neo_params_imp.isnull().sum()
def fill_nan(df):
    # Fill missing label columns with 0 and missing numeric features with the column mean.
    for column in df.columns:
        if column == 'pha' or column == 'neo':
            df[column].fillna(value=int(0), inplace=True)
        else:
            df[column].fillna(int(df[column].mean()), inplace=True)
    return df
neo_params_imp = fill_nan(neo_params_imp)
neo_params_all = fill_nan(neo_params_all)
neo_params_all
neo_params_imp
plt.figure(figsize = (20,20))
sns.heatmap(data = round(neo_params_imp.corr(), 2), annot = True)
plt.show()
plt.figure(figsize = (20,20))
sns.heatmap(data = round(neo_params_all.corr(), 2), annot = True)
plt.show()
from sklearn.ensemble import RandomForestClassifier, AdaBoostClassifier, GradientBoostingClassifier
model_scores_params_imp = model_fit_score(models, neo_params_imp)
model_scores_params_imp.sort_values('Score', ascending = False)
plt.figure(figsize = (20,10))
sns.barplot(data = model_scores_params_imp.sort_values('Score').T)
plt.title('Model Precision Score For Important Parameters')
plt.xticks(rotation = 90);
plt.show()
plt.figure(figsize = (20,10))
sns.barplot(data = model_scores_params_all.sort_values('Score').T)
plt.title('Model Precision Score For All Parameters')
plt.xticks(rotation = 90);
plt.show()
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix, classification_report

def rfc(df):
    # Train a Random Forest on the PHA label and report accuracy, confusion
    # matrix, and classification report on a held-out test set.
    X = df.drop('pha', axis=1)
    y = df['pha']
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
    clf = RandomForestClassifier(n_estimators=10000)
    clf.fit(X_train, y_train)
    y_pred = clf.predict(X_test)
    acc = accuracy_score(y_test, y_pred)
    cm = confusion_matrix(y_test, y_pred)
    cr = classification_report(y_test, y_pred)
    return [acc, cm, cr]
result_params_imp = rfc(neo_params_imp)
print("Accuracy:", result_params_imp[0], '\n')
print("Confusion Matrix:\n", result_params_imp[1], '\n')
print("Classification Report:\n", result_params_imp[-1], '\n')
result_params_all = rfc(neo_params_all)
print("Accuracy: ", result_params_all[0], '\n')
print("Confusion Matrix:\n", result_params_all[1], '\n')
print("Classification Report:\n", result_params_all[-1], '\n')
import matplotlib.pyplot as plt

def fi_plot(df):
    # Fit a Random Forest and plot the feature importances in descending order.
    X = df.drop('pha', axis=1)
    y = df['pha']
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
    model = RandomForestClassifier()
    model.fit(X_train, y_train)
    importances = model.feature_importances_
    sorted_indices = np.argsort(importances)[::-1]
    plt.title('Feature Importance')
    plt.bar(range(X_train.shape[1]), importances[sorted_indices], align='center')
    plt.xticks(range(X_train.shape[1]), X_train.columns[sorted_indices], rotation=90)
    plt.tight_layout()
    plt.show()
fi_plot(neo_params_imp)
fi_plot(neo_params_all)
import pandas as pd
import numpy as np
from sklearn.metrics import classification_report
from sklearn.model_selection import KFold
from sklearn.ensemble import GradientBoostingClassifier
def gbc(df):
    # Train a Gradient Boosting classifier on the PHA label and report test-set metrics.
    X = df.drop('pha', axis=1)
    y = df['pha']
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
    classifier = GradientBoostingClassifier(n_estimators=20, learning_rate=0.5,
                                            max_features=2, max_depth=2, random_state=0)
    classifier.fit(X_train, y_train)
    y_pred = classifier.predict(X_test)
    cm = confusion_matrix(y_test, y_pred)
    acc = accuracy_score(y_test, y_pred)
    cr = classification_report(y_test, y_pred)
    return [acc, cm, cr]
result_params_imp = gbc(neo_params_imp)
print("Accuracy:", result_params_imp[0], '\n')
print("Confusion Matrix:\n", result_params_imp[1], '\n')
print("Classification Report:\n", result_params_imp[-1], '\n')
result_params_all = gbc(neo_params_all)
print("Accuracy:", result_params_all[0], '\n')
print("Confusion Matrix:\n", result_params_all[1], '\n')
print("Classification Report:\n", result_params_all[-1], '\n')
import numpy as np
import pandas as pd
from sklearn.svm import SVC
from sklearn.metrics import confusion_matrix, accuracy_score, classification_report
def svc_lin(df):
    # Support vector classifier with a linear kernel on the PHA label.
    X = df.drop('pha', axis=1)
    y = df['pha']
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
    classifier = SVC(kernel='linear', random_state=0)
    classifier.fit(X_train, y_train)
    y_pred = classifier.predict(X_test)
    cm = confusion_matrix(y_test, y_pred)
    acc = accuracy_score(y_test, y_pred)
    cr = classification_report(y_test, y_pred)
    return [acc, cm, cr]

def svc_rbf(df):
    # Support vector classifier with an RBF kernel on the PHA label.
    X = df.drop('pha', axis=1)
    y = df['pha']
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
    classifier = SVC(kernel='rbf', random_state=0, gamma=.01, C=1)
    classifier.fit(X_train, y_train)
    y_pred = classifier.predict(X_test)
    cm = confusion_matrix(y_test, y_pred)
    acc = accuracy_score(y_test, y_pred)
    cr = classification_report(y_test, y_pred)
    return [acc, cm, cr]
"""### Results of SVC for Important Parameters"""
result_params_imp = svc_lin(neo_params_imp)
print("Accuracy:", result_params_imp[0], '\n')
print("Confusion Matrix:\n", result_params_imp[1], '\n')
print("Classification Report:\n", result_params_imp[-1], '\n')
result_params_imp = svc_rbf(neo_params_imp)
print("Accuracy:", result_params_imp[0], '\n')
print("Confusion Matrix:\n", result_params_imp[1], '\n')
print("Classification Report:\n", result_params_imp[-1], '\n')
result_params_all = svc_lin(neo_params_all)
print("Accuracy:", result_params_all[0], '\n')
print("Confusion Matrix:\n", result_params_all[1], '\n')
print("Classification Report:\n", result_params_all[-1], '\n')
result_params_all = svc_rbf(neo_params_all)
print("Accuracy:", result_params_all[0], '\n')
print("Confusion Matrix:\n", result_params_all[1], '\n')
print("Classification Report:\n", result_params_all[-1], '\n')
from sklearn import metrics
from sklearn.metrics import confusion_matrix, classification_report
from sklearn.linear_model import LogisticRegression

def logrm(df):
    # Fit a logistic regression model on the PHA label and report test-set metrics.
    X = df.drop('pha', axis=1)
    y = df['pha']
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
    logreg = LogisticRegression()
    logreg.fit(X_train, y_train)
    y_pred = logreg.predict(X_test)
    acc = logreg.score(X_test, y_test)
    cm = confusion_matrix(y_test, y_pred)
    cr = classification_report(y_test, y_pred)
    return [acc, cm, cr]
result_params_imp = logrm(neo_params_imp)
print("Accuracy:", result_params_imp[0], '\n')
print("Confusion Matrix:\n", result_params_imp[1], '\n')
print("Classification Report:\n", result_params_imp[-1], '\n')
result_params_all = logrm(neo_params_all)
print("Accuracy:", result_params_all[0], '\n')
print("Confusion Matrix:\n", result_params_all[1], '\n')
print("Classification Report:\n", result_params_all[-1], '\n')
"""# ANN"""
# Keras imports (assumed; the original notebook's import cell is not shown).
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

def set_up_for_ann(df):
    # Train a small feed-forward network on the PHA label (expects 8 input features).
    X = df.drop('pha', axis=1)
    y = df['pha']
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
    model = Sequential()
    model.add(Dense(12, input_dim=8, activation='relu'))
    model.add(Dense(8, activation='relu'))
    model.add(Dense(1, activation='sigmoid'))
    model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
    model.fit(X_train, y_train, epochs=25, batch_size=16)
    _, accuracy = model.evaluate(X_test, y_test)
    print()
    print('Accuracy of test: %.2f' % (accuracy * 100))
set_up_for_ann(neo_params_imp)
def set_up_for_ann2(df):
    # Same network as above, but the input dimension is inferred from the data.
    X = df.drop('pha', axis=1)
    y = df['pha']
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
    model = Sequential()
    model.add(Dense(12, activation='relu'))
    model.add(Dense(8, activation='relu'))
    model.add(Dense(1, activation='sigmoid'))
    model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
    model.fit(X_train, y_train, epochs=25, batch_size=16)
    _, accuracy = model.evaluate(X_test, y_test)
    print()
    print('Accuracy of test: %.2f' % (accuracy * 100))
result_params_all_ann = set_up_for_ann2(neo_params_all)
import pandas as pd
df = pd.read_csv(filepath_or_buffer='sample_data/neo_data.csv',
                 parse_dates=['Close Approach Date', 'Close Approach Date (Full)'])
df['year'] = df['Close Approach Date'].dt.year
df.head()
df.info()
df.nunique()
# 'histogram' is assumed to come from Plotly Express (from plotly.express import histogram).
histogram(data_frame=df, x='Relative Velocity (km/s)', color='Is Potentially Hazardous')
APPENDIX B
The output image depicts a scatterplot with miss distance on the y-axis and year on
the x-axis, showcasing the trajectory of various celestial objects over time. The
hazards, represented by red data points, indicate objects that have a close approach
to Earth, potentially posing a threat. Conversely, non-hazardous objects are
denoted by blue data points, indicating a safer distance from Earth.
The visualization allows for easy identification of hazardous objects based on their
proximity to Earth during specific time periods. Objects with a miss distance below
a certain threshold are highlighted in red, alerting observers to potential collision
risks. In contrast, blue data points signify objects that maintain a safe distance from
Earth, alleviating concerns of impact.
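A minimal sketch of how such a scatterplot could be produced is given below; it reuses the close-approach file from Appendix A, and the miss-distance and hazard-flag column names are assumptions:

import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv('sample_data/neo_data.csv', parse_dates=['Close Approach Date'])
df['year'] = df['Close Approach Date'].dt.year

# Treat the hazard flag as boolean-like regardless of how it is stored.
hazardous = df['Is Potentially Hazardous'].astype(str).str.lower().isin(['true', '1', 'y', 'yes'])

# Blue points are non-hazardous objects; red points are potentially hazardous ones.
plt.scatter(df.loc[~hazardous, 'year'], df.loc[~hazardous, 'Miss Dist.(kilometers)'],
            c='blue', s=10, alpha=0.5, label='Non-hazardous')
plt.scatter(df.loc[hazardous, 'year'], df.loc[hazardous, 'Miss Dist.(kilometers)'],
            c='red', s=10, alpha=0.5, label='Hazardous')
plt.xlabel('Year')
plt.ylabel('Miss distance (km)')
plt.legend()
plt.show()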
APPENDIX C
PUBLICATION STATUS
REFERENCES
doi: 10.1109/AERO53065.2022.9843625.
11. M. Veto et al., "Ball Aerospace Spacecraft and Systems for the Near-Earth
Object Surveyor Mission," 2021 IEEE Aerospace Conference (50100), Big Sky,
MT, USA, 2021, pp. 1-14, doi: 10.1109/AERO50100.2021.9438216.
14. C. P. Yang et al., "An architecture for mitigating near earth object's impact to
the earth," 2017 IEEE Aerospace Conference, Big Sky, MT, USA, 2017, pp. 1-13,
doi: 10.1109/AERO.2017.7943594.
15. N. Rowell, S. Parkes and M. Dunstan, "Image Processing for Near Earth Object
Optical Guidance Systems," in IEEE Transactions on Aerospace and Electronic
Systems, vol. 49, no. 2, pp. 1057-1072, April 2013, doi: 10.1109/TAES.2013.6494399.