Chapter 1 New
INTRODUCTION
The pendulum of educational training has swung from traditional, lecture-style fact
memorization to exploratory, hands-on, guided learning, and back again, throughout the past
decade. As we advance, learning has taken on a more project-based and problem-based (PjBL and
PBL) approach, with kinesthetic learning for students. Not only are many schools diving into
PjBL/PBL, but there has also been an increase in incorporating digital technologies.
The PjBL/PBL and technological shift has challenged the current structure of several education
programs and professional development to focus more on ‘doing’. This has led to the integration
of makerspaces into the conventional educational system and has enabled the advancement of
STEM education.
In recent years, the emphasis on Science, Technology, Engineering, and Mathematics (STEM)
education worldwide has had a tremendous impact on the influx of enthusiasts into the field.
This has led to an almost universal preoccupation with STEM education as a way to shape
innovation and development. In the USA, the 2013 report from the Committee on STEM Education
stressed that
"The jobs of the future are STEM jobs," with STEM competencies increasingly required not only
many education systems, fueled by the actual shortages in the current and future STEM
workforce. It is common knowledge that STEM is densely populated by a majority gender and
people group. Descriptive statistics show that a smaller percentage of women and minorities
persist in a STEM field major as compared to male and nonminority students. This has limited the
employment shortages and STEM education in general and advancing global development.
Before the introduction of the current acronym, “STEM”, the National Science Foundation
(NSF) used the acronym “SMET”, which referred to four distinct fields: Science,
Mathematics, Engineering and Technology. In recent years, STEM has become a buzzword among
stakeholders.
Despite its buzzword status, ambiguity exists in the definition of STEM (Madden, Beyers, &
O’Brien, 2016). This ambiguity has led to different definitions and occupational applications
among stakeholders across the United States (Ntemngwa & Oliver, 2018), because several
programs within various scientific communities have used the term. Thus, the definition differs
STEM and STEM education have gained considerable ground in this century, being adopted
among individuals, people groups and institutions. Despite this wide use, “STEM education” is
often used interchangeably with the term “STEM” in the literature. However, STEM and STEM
education are two different terms with two different meanings, because STEM education
means a lot more than the four-letter acronym STEM. As a result of this no standard definition
STEM is a curriculum based on the idea of educating students in four specific disciplines
The combining of the disciplines was a strategic decision made by scientists, technologists,
engineers, and mathematicians to join forces and create a stronger political voice (Kocabas,
Ozfidan, & Burlbaw, 2019). While STEM education according to
where rigorous academic concepts are coupled with real-world lessons as students apply Science,
Technology, Engineering and Mathematics in contexts that make the connection between school,
community, work and the global enterprise establishing the development of STEM literacy and
One of the issues for researchers and curriculum developers lies in the different interpretations of
STEM education and STEM integration. There are variants of STEM, which include but are not
limited to SMET, STEAM, METALS, STREAM, STEEM, THAMES, and MINT. Although this
major this is especially the case. Women and minority groups are less likely to persist in a
STEM field major during college than their peers (National Science Board). It is believed that a
strong STEM workforce is important for future development. Thus, it is essential to understand
Over the last few decades, representation of women and minorities in STEM fields post-college
has increased, but gaps still remain (National Center for Education Statistics, NCES). Much
of this may be due to supply, as there are fewer women and minorities receiving bachelor’s
degrees in STEM fields. This is for two reasons: both groups are less likely to pick a STEM
major initially and, if they do, are less likely to remain in that major (NCES).
This research is carried out in response to this with a view of increasing the STEM pipeline
should be a core goal of every stakeholder. The factors that affect persistence in STEM
majors during college are still not well understood; hence this research is carried out.
The aim of the study is to develop a hybrid chain model that combines statistical models to
predict patterns in STEM Education among Minorities and to illustrate the results using
visualization techniques.
1. To appraise and select from a cross section of statistical models the best-fit models for
2. To combine the selected models into a hybrid chain model, validating each using standard
validation techniques and applying the new hybrid model to the data set.
3. To visualize the result of the data analysis performed with the hybrid chain model with
Scientific and technological innovations have become increasingly important as we face the
benefits and challenges of both globalization and a knowledge-based economy. The recruitment
of women to STEM fields has been a difficult battle historically with “pipeline” methods. In
order to accelerate global growth and development it is important to highlight the areas of
To achieve this, the study has scientifically explained the need for an increase in the population
of STEM professionals, not just among the minority populace but also among the majority, as this will lead to
It is strongly believed that the more the minority and underrepresented populace migrate towards
a field of study, the more attractive and encouraging it becomes to the general public.
Specifically, this study extends the current frameworks of gender and people groups by exploring
first sample data on professionals and undergraduates on the subtopic of women and minorities
The scope of the system involves cleaning data acquired in various file formats such as csv, dat,
db, sql, mbd, ddf, dta, and oid, to name a few, and also includes tests for granularity and analysis
using different statistical models and techniques. The analysis covers a wide range, from
regression and association to correlation analysis. Information derived from this stage is then
communicated and presented to the target audience in the form of graphs, area charts, scatterplots,
This research includes, but is not limited to, all patterns concerned with visualization and
storytelling with data, including model appraisal and elimination. It also includes the
However, due to the availability of data, this research is formally conducted using data from
Lagos State Polytechnic (Full Time) and Lagos State University. It is also assumed that the target
study.
Association: a relationship between two random variables which makes them statistically
dependent. It refers to a rather general relationship, without the specifics of the relationship
being mentioned.
Correlation: Correlation is a statistical measure that indicates the extent to which two or more
variables fluctuate together. A positive correlation indicates the extent to which those variables
increase or decrease in parallel; a negative correlation indicates the extent to which one variable
K-12 education: short for kindergarten to 12th grade; an American expression that indicates the
range of years of supported primary and secondary education found in the United States, which
is similar to publicly supported school grades prior to college in several other countries, such as
Afghanistan, Australia, Canada, Ecuador, China, Egypt, India, Iran, Philippines, South Korea,
and Turkey.
public/private facility for making, learning, exploring and sharing that uses high tech to no tech
tools.
Model: a model is a representation of an idea, an object or even a process or a system that is
disciplines that attempts to determine the strength of the relationship between one dependent
variable (usually denoted by Y) and a series of other changing variables (known as independent
variables).
SMET: Science, Technology, Engineering and Mathematics (STEM), previously Science, Math,
Engineering and Technology (SMET), is a term used to group together these academic
disciplines.
STEM: Science, technology, engineering, and math. An interdisciplinary form of all subjects,
not excluding other subjects like Art that also play a large role (we chose to use the acronym
academic concepts are coupled with real-world lessons as students apply science, technology,
engineering, and mathematics in contexts that make connections between school, community,
work, and the global enterprise enabling the development of STEM literacy and with it the
STEM integration: an effort to combine some or all of the four disciplines of science,
technology, engineering, and mathematics into one class, unit, or lesson that is based on
STEM jobs: careers in which STEM workers use their knowledge of science, technology,
engineering, or math to try to understand how the world works and to solve problems.
STEM pipeline: the educational pathway for students in the fields of science, technology,
STEM workforce: the STEM workforce includes 74 occupations including computer and
mathematical occupations, engineers and architects, physical scientists, life scientists, and health-
related jobs such as healthcare practitioners and technicians (but not health care support workers
such as nursing aides and medical assistants). It includes workers with associate degrees and
Problem-Based Learning (PBL): Student-driven and teacher-guided in-depth inquiry that
Project-Based Learning (PjBL): Student-driven and teacher-guided in-depth inquiry that
Kinesthetic learning: or tactile learning is a learning style in which learning takes place by the
students carrying out physical activities, rather than listening to a lecture or watching
demonstrations.
holds a smaller percentage within a significant subgroup than the subset holds in the general
population. Specific characteristics of an underrepresented group vary depending on the
communicate a message. Visualization through visual imagery has been an effective way to
communicate both abstract and concrete ideas since the dawn of humanity.
CHAPTER TWO
LITERATURE REVIEW
Data exist in various forms and numerous capacities. The availability and adoption of newer,
more powerful devices, coupled with ubiquitous access to global networks, has driven the creation
of new sources of data which can be independently managed, searched and analyzed. Data can
be broadly classified into three categories based on the nature of its properties and structure:
structured data, semi-structured data and unstructured data. These stratified
Structured data is data that adheres to a predefined data model, usually stored in a traditional
tabular format with relationships between the different rows and columns. Common examples of
structured data are Excel files and SQL databases. Unstructured data does not have a predefined
data model and cannot be understood by typical data storage programs. Unstructured
simply means that the datasets (typically large collections of files) are not stored in a
structured database format. Unstructured data has an internal structure, but it's not predefined
Semi-structured data is a form of structured data that does not conform to the formal structure
of data models associated with relational databases. It can be stored in several ways: in
2.1.2 Data Management
Managing and analyzing data have always offered the greatest benefits and the greatest
challenges for organizations of all sizes and across all industries. The convergence of emerging
technologies and reduction in costs for everything from storage to compute cycles have
transformed the data landscape and made new opportunities possible. As all these technology
factors converge, they are transforming the way we manage and leverage data (Hurwitz, Nugent,
Halper, & Kaufman, 2013). Data management is an administrative process that includes acquiring,
validating, storing, protecting, and processing required data to ensure the accessibility, reliability,
and timeliness of data. It comprises all disciplines related to managing data as a valuable
resource.
Data Analysis is the systematic application of statistical and logical techniques to describe the
data scope, modularize the data structure, condense the data representation, illustrate via images,
tables, and graphs, and evaluate statistical inclinations, probability data, to derive meaningful
conclusions (Simran 2020). These procedures enable inferences to be drawn while eliminating
redundancy and chaos and ensuring integrity in the continual process of data generation.
It is a process of inspecting, cleansing, transforming and modeling data with the goal of
(Wikipedia 2020). Data analysis uses analytical and logical reasoning to gain information from
the data and find meaning in it, so that the derived knowledge can be used to make informed
decisions.
2.2 Statistical Data Analysis
Statistics is a form of mathematical analysis that uses quantified models, representations and
synopses for a given set of experimental data or real-life studies. Statistics is basically a science
that involves data collection, data interpretation and finally, data validation. Statistical data
quantitative research, which seeks to quantify the data and typically applies some form of
statistical analysis. Quantitative data basically involves descriptive data, such as survey data and
observational data.
Statistical data analysis generally involves some form of statistical tool. The most well-known
statistical tools are the mean (the arithmetical average of the numbers), median, mode, range,
dispersion, standard deviation, interquartile range, coefficient of variation, and regression.
Although these tools span a wide variety of applications, each comes with its own
pitfalls.
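As a minimal illustration of the tools listed above, the following Python sketch computes several of them using only the standard library. The function name `describe` and the sample values are assumptions introduced for illustration; they are not taken from the study's data.

```python
import statistics

def describe(values):
    """Return common descriptive statistics for a list of numbers."""
    data = sorted(values)
    mean = statistics.mean(data)
    stdev = statistics.stdev(data)               # sample standard deviation (n - 1)
    q1, _, q3 = statistics.quantiles(data, n=4)  # quartile cut points
    return {
        "mean": mean,
        "median": statistics.median(data),
        "mode": statistics.mode(data),
        "range": data[-1] - data[0],
        "stdev": stdev,
        "iqr": q3 - q1,                          # interquartile range
        "cv": stdev / mean,                      # coefficient of variation
    }

# Hypothetical enrollment counts, invented for illustration only.
summary = describe([12, 15, 15, 18, 20, 22, 30])
for name, value in summary.items():
    print(f"{name}: {value:.3f}")
```

Each measure summarizes a different aspect of the data (centre, spread, or relative spread), which is one reason they are usually reported together rather than singly.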
Data in statistical data analysis consists of variable(s). Sometimes the data is univariate or
multivariate. Univariate data is a type of data which consists of observations on only a single
characteristic or attribute. The analysis of univariate data is thus the simplest form of analysis
since the information deals with only one quantity that changes. It does not deal with causes or
relationships and the main purpose of the analysis is to describe the data and find patterns that
Bivariate data is a type of data that involves two different variables. The analysis of this type
of data deals with causes and relationships, and is done to find out the relationship between
the two variables. Multivariate data involves more than two variables. It is like
bivariate data but contains more than one dependent variable. The ways to perform analysis on
this data depend on the goals to be achieved. Some of the techniques are regression analysis,
abstraction that is like visual perception. Visualization has many definitions, but the most
common one found in the literature is the use of computer-supported, interactive, visual
representations of data to amplify cognition. Cognition means the power of human perception or
the acquisition or use of knowledge. Visualization is the graphical representation that best
conveys complicated ideas clearly, precisely, and efficiently. These graphical depictions are
easily understood and interpreted effectively. The main goal of visualization is to analyze,
The process of visualization can be broken into six steps which include: Mapping, Selection,
Presentation, Interactivity, Human Factors and Evaluation. These steps provide a systematic
pattern and structure for the representation of Data and information. Visualization uses various
techniques to represent data and information. These techniques are the major categories in which
1. Data Visualization: includes standard quantitative methods such as tables, pie charts,
area charts, line graphs, etc. They are visual representations of quantitative data in
schematic form (either with or without axes); they are all-purpose and mainly used for
2. Information Visualization: such as semantic networks or tree maps, entity-relationship
diagrams, flow charts, Venn diagrams, dataflow diagrams. It is defined as the use of
interactive visual representation of data to amplify cognition. This means that the data is
3. Concept Visualization: like a concept map or Gantt chart; these are methods to elaborate
(mostly) qualitative concepts, ideas, plans, and analysis through the help of rule-guided
4. Metaphor Visualization: like a metro map or story template; these are effective and simple
templates to convey complex insights. Visual metaphors fulfil a dual function: first, they
position information graphically to organize and structure it. They convey insight of
6. Compound Visualization: consists of two or more of the formats. They can be complex
A. Persistence of women and minorities in STEM field majors: is it the school that matters?
Persistence in any of the STEM majors is much lower for women and minorities, suggesting that
this may be a leaky joint in the STEM pipeline for these two groups of students. Descriptive
analysis statistics show that a smaller percentage of women and minorities persist in a STEM
field major as compared to male and non-minority students. Regression analysis shows that the
differences in preparation and educational experiences of these students explain much of the
ii. There is little evidence that having a larger percentage of female STEM
STEM major.
rates.
Underrepresented minorities do not integrate into the STEM academic community at the same
rate as their non-minority counterparts. This research longitudinally examined the integration of
underrepresented minorities into the STEM community by using growth-curve analysis to
measure the development of TIMSI's key variables (science efficacy, identity, and values) from
junior year through the postbaccalaureate year. From the research it was clearly shown that
quality mentorship and research experience occurring in the junior and senior years were
positively related to student science efficacy, identity and values during the same time period.
CHAPTER THREE
RESEARCH METHODOLOGY
Before advancing into the research and analysis of the problem space, adequate research was
carried out on material containing research conducted on the problem space in various countries
This is a result of the initial lack of evidence to support the claims of previous studies on the
research project within the specified country. After a careful feasibility study, it was discovered
that notable scholars have worked on similar research projects on the problem space. These
research works were done using sample populations or premade data, or were even based purely on
speculation. This is the rationale behind this study, as the previous ones were not adequate to
One of the most comprehensive studies done on the problem space was performed by Ugwuanyi
and Mathematics Education in Nigeria: A Case of Structural Equation Modeling, using a sample of
255 undergraduate students in universities in Enugu State, Nigeria. The study used tools such as
questionnaires and an interest scale for data collection; a structural equation modeling
approach was used to analyze the data, and the result was tested using the Root Mean Square
Error of Approximation (RMSEA) and the Comparative Fit Index (CFI). In his recommendation he
stated that the support of parents for academic influence greatly improves
Another study was conducted by Simon Walter Umoh on the problems and prospects of effective
at the educational practices in Nigeria today and how delivery in our K-12 educational
system has impeded the performance and interest of students in the STEM courses pipeline. He
clearly stated the problems of STEM delivery in schools, such as a deficient curriculum, poor
teacher supply, unavailability of teaching facilities and an overloaded syllabus, to name a few. He
then recommended that only professional teachers be allowed to take STEM courses and
O.A. Akinsowon and F.Y. Osisanwo, in a study on Enhancing Interest in Science
Technology and Mathematics for Nigerian Female folk, showed the general statistics of the influx
and educational level of the female gender as compared to their male counterparts. They also gave
the rationale behind the need to close the gap, noting the factors causing the lack of interest, such
as the individual interest factor, the teachers' factors, curriculum development and the home
factor. The study then gave recommendations on how to close the gap to increase the gender representation of
According to the first study performed by Ugwuanyi Okeke on the Determinants of University
Careful analysis showed that the study did not consider the limitations of the statistical model
used in the analysis of the data gathered. Also, the student population data used was biased, and
there was no consideration of the role of professionals and mentors in influencing STEM
Simon Walter Umoh conducted a study on the problems and prospects of effective Science
Technology Engineering Mathematics Education Delivery in Nigeria. This study was also a
milestone towards the education of stakeholders regarding the problem space. After careful
consideration it was discovered that the study was conducted on pure speculation without
Finally, the study conducted by O.A. Akinsowon and F.Y. Osisanwo on Enhancing
Interest in Science Technology and Mathematics for Nigerian Female folk seemed the closest to
the research topic due to its direct similarity to the problem space. The research was conducted
with premade data from the National Bureau of Statistics and there was little statistical data
analysis done. Also, the data available was from 2013, so there is a need for a new analysis to
be performed.
From the analysis of the pre-existing system, it can be deduced that little scientific research has
been done on the analysis of gender roles in the STEM pipeline. There has also been no information on
This study is planned towards a detailed scientific analysis of the problem space, adequately
modelling and simulating a view of the result through visualization techniques. Using the
many-model approach, this study aims to combine three different analytical modeling techniques.
This will make the result less biased and compensate for the pitfalls of each individual model.
3.6 Research Methods
i. Tools: The software that will be used during this research includes R Studio, GNU
Octave, Excel, and Tableau. The model proposed is the many-model approach and will
ii. Source materials: The research materials were obtained from the internet through
various sources such as Google Scholar, Z-Library, and Wikipedia, to name a
few.
This research uses a primary source of data obtained from the Lagos State Polytechnic
information communication center, which will also be supported by pre-made data from
The first step is data gathering: the data will be obtained from the institutional repository of
Lagos State Polytechnic and the internet. After this, the data is inspected and cleaned. This stage
eliminates chaos, anomalies and redundancy, and normalizes the data based on the first to third
normal forms.
Then the data is queried using the research questions prepared beforehand, and the results are
collected and analyzed using statistical analysis techniques, applying various models to
After correlation of the result of the analysis, the information is then displayed using 3D
The new system is to be implemented on student data from Lagos State Polytechnic Full Time over
a 3-year period, showing the correlation patterns, gender roles and the effect of role model
The new system uses statistical techniques to derive information from the data gathered; such
techniques are used to derive patterns and show relationships between collected data. Then
3.9 Activity Diagram
Activity diagrams are graphical representations of the workflow activities. The diagram below
3.10 Flowchart Diagram
The diagram below shows a diagrammatic representation of the processes involved in the
research.
CHAPTER FOUR
Data Analysis is the science of evaluating data using statistical analytical tools to discover
6. Validation of model.
Data Gathering and Sorting: This was carried out using Excel and other Spreadsheet packages.
Data from the National Bureau of Statistics was organized in tables but there were still anomalies
associated with it. Also, data from the school database was raw and it had to be sorted,
Data Cleaning and Inspection: This was done in a spreadsheet environment, where the data
was normalized and placed into categories. Also, all missing data were replaced with a non-null
value of zero and then categorized with the rest of the data.
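The cleaning step described above can be sketched as follows. This is a minimal Python illustration, not the spreadsheet procedure actually used: the column names, the sample rows, and the helper `clean_rows` are all assumptions introduced for the example.

```python
import csv
import io

# Hypothetical raw extract; column names and values are invented for illustration.
RAW = """department,male,female
Computer Science,120,45
Civil Engineering,,12
Accountancy,80,
"""

def clean_rows(text):
    """Parse CSV text and replace missing counts with zero."""
    rows = []
    for record in csv.DictReader(io.StringIO(text)):
        rows.append({
            "department": record["department"].strip(),
            "male": int(record["male"] or 0),      # empty cell becomes 0
            "female": int(record["female"] or 0),  # empty cell becomes 0
        })
    return rows

cleaned = clean_rows(RAW)
for row in cleaned:
    print(row)
```

Zero-filling keeps every row in the analysis, at the cost of treating an unrecorded count as an actual count of zero.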
Model Elimination and Appraisal: Four models were initially selected, namely Markov Chain
Monte Carlo, Linear Regression Model, Logistic Regression Model and Correlation Model.
a. Markov Chain Monte Carlo Model: This model was discarded after careful study, due to its
characteristics.
b. Linear Regression Model: This model was selected based on its assumptions and its ability
to predict.
c. Logistic Regression Model: This model was also selected based on its assumptions and its
d. Correlation Model: This model was selected based on its ability to determine the
Model Selection and Statistical Methods: After the appraisal of the models, three out of the
four models were selected based on their individual characteristics, assumptions and parameters
as the best fit for the problem space. The models were implemented in the following order.
a. Correlation model: It defines the degree of similarity between two variables and monitors
their variation. This research employed three types of correlation model. The Pearson,
rxy = Pearson r correlation coefficient between x and y
n = number of observations
Pearson correlation assumes that both variables are normally distributed, that is, that each
follows a bell-shaped curve.
ii. Spearman correlation: This is used to calculate the degree of association between
variables. It does not hold any assumption about the distribution of the variable. The
formula is
n = number of observations
iii. Kendall Correlation: is used in the test of statistical association based on the ranks of
Nc = number of concordant pairs
b. Linear Regression Model: models the relationship between two variables by fitting a linear
equation to the data. A linear regression model uses an equation of a line Y = a + bX. Y is
the dependent variable while X is the independent variable. Linear Regression model
c. Logistic Regression Model: is used to model the probability of a certain event occurring.
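For reference, the models named in items a to c are commonly written in the standard textbook forms below. The symbols r_xy, n, N_c, Y, X, a and b follow the definitions given above; x_i and y_i (the paired observations), d_i (the difference between the ranks of the i-th pair) and N_d (the number of discordant pairs) are notation assumed here for illustration.

```latex
\begin{align*}
r_{xy} &= \frac{n\sum x_i y_i - \sum x_i \sum y_i}
               {\sqrt{\left[n\sum x_i^2 - \left(\sum x_i\right)^2\right]
                      \left[n\sum y_i^2 - \left(\sum y_i\right)^2\right]}}
  && \text{(Pearson)} \\
\rho &= 1 - \frac{6\sum d_i^2}{n\,(n^2 - 1)}
  && \text{(Spearman, no tied ranks)} \\
\tau &= \frac{N_c - N_d}{\tfrac{1}{2}\,n\,(n - 1)}
  && \text{(Kendall)} \\
Y &= a + bX
  && \text{(linear regression)} \\
P(Y = 1 \mid X) &= \frac{1}{1 + e^{-(a + bX)}}
  && \text{(logistic regression)}
\end{align*}
```

All three correlation coefficients lie between -1 and 1, which makes their values directly comparable across the data sets.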
Hybrid Chain Model: This is the final model derived from the pipeline of the three models used
in the analysis. This involved creating a data pipeline in which the output of one
process is the input of the next. The data set for each institution is first run through a
correlation analysis test of its individual parameters to determine the most correlated variables
and the level of relationship. Once the final result is obtained, it is validated using the AIC
test to determine the accuracy of the test. Then the selected parameters are run through a Linear
Statistical Methods: The Statistical Methods used for the research include Skewness, Mean
Validation of model: Each model was validated to confirm its accuracy and performance. After
validation, only two models performed satisfactorily: the Correlation model and the Linear
Regression model. The Akaike information criterion (AIC) was used to test the parameters of the
model. It involves a comparative analysis between different candidate models to determine the best
fit for the data. The AIC is calculated from the number of independent variables used to build the
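In its standard form, the criterion balances the number of estimated parameters against how well the model fits the data. The symbols below are notation assumed here: k is the number of estimated parameters, \hat{L} is the maximized value of the model's likelihood, n is the number of observations, and RSS is the residual sum of squares.

```latex
\begin{align*}
\mathrm{AIC} &= 2k - 2\ln\hat{L} \\
\mathrm{AIC} &= n\,\ln\!\left(\frac{\mathrm{RSS}}{n}\right) + 2k + \text{const}
  && \text{(least-squares fit with normally distributed errors)}
\end{align*}
```

Among candidate models fitted to the same data, the one with the lowest AIC is preferred; the absolute value of the AIC carries no meaning on its own.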
Coding and Debugging: This was carried out using the R Studio IDE. The code for each model
was written in the R programming language, run with R 4.0.3, and debugged based on the syntax
and semantics of the language.
This research uses data from two sources. The first is the Lagos State Polytechnic database,
which contains raw data on student enrollment. The source contains data from a five-year period
among students across departments. Data was then collected on the number of male and female
students enrolled in the different departments of the polytechnic, dividing the departments into
The second source of data was the National Bureau of Statistics, which contained
longitudinal data on Lagos State University for a period of three years. This data surveyed
students in their various departments and the staff of the institution. The data was placed in
The gender composition of the students in their various departments is measured as the average
percentage of STEM undergraduate majors. The percentage of STEM undergraduate majors that
are female is normalized, to avoid simply measuring a general trend at the university towards
more or fewer women. To measure the female faculty members available to serve as mentors for
undergraduate students interested in STEM, one would ideally want the faculty gender
composition of the STEM field departments. A higher percentage of faculty members that are
female would therefore mean more opportunity for female students to identify with female role
models in the field. However, data on faculty gender composition by department is not readily
available for all years needed here. This led to the exclusion of some parts of the data set
available.
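The exact normalization applied to the female share of STEM majors is not spelled out above. One common choice, assumed here purely for illustration, is to divide the female share among STEM majors by the female share of all students at the institution, so that a campus-wide shift in enrollment does not register as a STEM-specific change. The function names and figures below are hypothetical.

```python
def female_share(female, total):
    """Fraction of a group that is female."""
    return female / total

def normalized_stem_share(stem_female, stem_total, all_female, all_total):
    """Female share among STEM majors relative to the campus-wide female share.

    A value of 1.0 means women are represented in STEM in the same proportion
    as in the institution overall; values below 1.0 indicate underrepresentation.
    This ratio is an assumed normalization, not the study's documented method.
    """
    return female_share(stem_female, stem_total) / female_share(all_female, all_total)

# Hypothetical counts, invented for illustration only.
ratio = normalized_stem_share(stem_female=30, stem_total=120,
                              all_female=400, all_total=1000)
print(f"normalized female STEM share: {ratio:.3f}")
```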
Descriptive statistics for the two institutions of study are shown in Tables 1 and 2. Table 1 shows
the distribution of the Lagos State Polytechnic male and female student population together with a
summary of the table. The summary shows the standard deviation of the population and a
skewness of 1.09, which indicates a highly skewed distribution. Panel
A of the table shows the distribution of the sum of male and female students across all
departments, with the total number of female students exceeding that of male students and no
correlation between the sums. Chart A of the table shows the distribution of the departments and
the total number of students, also showing the summary of the male and female values.
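The skewness figures reported in the table summaries can be read against a common rule of thumb. The Python sketch below uses the moment coefficient of skewness and thresholds of 0.5 and 1; both the choice of estimator and the thresholds are assumptions for illustration, since the formula used to produce the reported values is not stated.

```python
def sample_skewness(values):
    """Moment coefficient of skewness: g1 = m3 / m2 ** 1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # second central moment
    m3 = sum((v - mean) ** 3 for v in values) / n  # third central moment
    return m3 / m2 ** 1.5

def describe_skew(g1):
    """Label a skewness value using a common rule of thumb."""
    size = abs(g1)
    if size < 0.5:
        return "approximately symmetric"
    if size <= 1:
        return "moderately skewed"
    return "highly skewed"

# Hypothetical department sizes, invented for illustration only.
counts = [20, 25, 30, 35, 200]
g1 = sample_skewness(counts)
print(f"skewness = {g1:.2f} ({describe_skew(g1)})")
```

Under this rule of thumb a value of 1.09 is labelled highly skewed and a value of 0.45 approximately symmetric.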
Table 2 shows the distribution of the Lagos State University male and female student
population and its summary. The summary shows a skewness of 0.45, which is approximately symmetrical.
Panel A of the distribution shows the total number of students in the distribution; male
students outnumber their female counterparts in this distribution. Chart A of the table shows
the distribution of the departments and the total number of students, also showing the summary of
Statistics show that departments like Engineering have a higher male percentage of students
compared to Management Sciences. Tables 3 and 4 show the classification of the students based
on categories and departments, illustrating which departments fall under the STEM classification
and visualizing the tables' data alongside the faculty members of each institution.
4.2.1 Statistical Methods and Models
The descriptive statistics indicate a variance in population density across departments. The data
show that the average number of females in managerial courses is considerably higher than that
in traditional STEM courses. To effectively predict the student population in the categories, a
hybrid chain model was employed, which combines the following statistical models.
The correlation coefficient was measured to determine the relationship between the variables and
their relative movement. Pearson, Spearman and Kendall rank correlations were used in the
analysis of the data. Fig 4.1 and 4.1.1 show the correlation analysis of the student population
in the institutions. The result of this analysis was then validated using an AIC test to
determine the best-fit model parameters, which were then used in the regression model. A linear
regression model was used to predict future trends in the population.
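
The AIC comparison and the regression step can be sketched as follows. This is a minimal
illustration under stated assumptions, not the exact pipeline of the study: the data frame is
simulated, the column names (total_male, total_female, category) and the three candidate
formulas are hypothetical, and the 80/20 split and the two accuracy measures follow the
conventions visible in Appendix B.

```r
set.seed(100)

# Hypothetical department-level records (illustrative only, not the study data)
n <- 40
dat <- data.frame(
  total_male = round(runif(n, 50, 400)),
  category   = factor(rep(c("STEM", "Non-STEM"), each = n / 2))
)
dat$total_female <- round(
  60 + 0.6 * dat$total_male +
    ifelse(dat$category == "Non-STEM", 40, 0) +
    rnorm(n, sd = 15)
)

# Candidate linear models, compared by AIC (lower is better)
candidates <- list(
  category_only = lm(total_female ~ category, data = dat),
  male_only     = lm(total_female ~ total_male, data = dat),
  combined      = lm(total_female ~ total_male + category, data = dat)
)
aic_values <- sapply(candidates, AIC)
best_name  <- names(which.min(aic_values))

# Refit the selected formula on an 80% training split and predict the rest
train_idx <- sample(seq_len(nrow(dat)), size = floor(0.8 * nrow(dat)))
train <- dat[train_idx, ]
test  <- dat[-train_idx, ]
fit   <- lm(formula(candidates[[best_name]]), data = train)
pred  <- predict(fit, newdata = test)

# Accuracy measures on the held-out rows
actual_preds <- data.frame(actuals = test$total_female, predicteds = pred)
min_max_accuracy <- mean(
  pmin(actual_preds$actuals, actual_preds$predicteds) /
    pmax(actual_preds$actuals, actual_preds$predicteds)
)
mape <- mean(
  abs(actual_preds$predicteds - actual_preds$actuals) / actual_preds$actuals
)

print(round(aic_values, 1))
print(c(min_max_accuracy = min_max_accuracy, mape = mape))
```

Min-max accuracy is the mean ratio of the smaller to the larger of each actual and predicted
pair, so a value near 1 means the predictions track the actual counts closely. MAPE is the mean
absolute error expressed as a fraction of the actual value.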
The correlation analysis was done using three correlation coefficients: Pearson, Spearman and
Kendall. The data sets showed different results for the correlation tests. For the Lagos State
Polytechnic data set, the Pearson correlation showed a positive relationship between the female
and male students, a weak positive relationship between the male academic staff and the female
students, and a weak negative relationship between the male non-academic staff and the female
students. For the Spearman and Kendall correlations, both the female academic staff and the male
non-academic staff showed correlation, although the Spearman coefficients were stronger in both
the positive and negative directions.
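
The three coefficients can be computed with the base R cor() function, as Appendix B does. The
sketch below uses hypothetical counts and simplified column names, not the study data, and adds
a small monotone but non-linear example to show why the rank-based coefficients and Pearson can
disagree on the same variables.

```r
# Hypothetical department-level counts (illustrative only, not the study data)
counts <- data.frame(
  total_female        = c(120, 95, 80, 60, 45, 40, 30, 310),
  total_male          = c(100, 110, 70, 90, 85, 130, 60, 280),
  academic_staff_male = c(14, 9, 12, 20, 7, 16, 11, 25)
)

methods <- c("pearson", "spearman", "kendall")

# Correlation of each variable with the female student count, per method
cor_with_female <- sapply(methods, function(m) {
  c(
    total_male = cor(counts$total_male, counts$total_female, method = m),
    academic_staff_male = cor(counts$academic_staff_male,
                              counts$total_female, method = m)
  )
})
print(round(cor_with_female, 3))

# A monotone but non-linear relationship: the rank-based coefficients reach 1,
# while Pearson, which measures linear association, stays below 1
x <- 1:10
y <- x^3
rank_vs_linear <- sapply(methods, function(m) cor(x, y, method = m))
print(round(rank_vs_linear, 3))
```

Pearson measures linear association on the raw values, whereas Spearman and Kendall work on
ranks, so differing results across the three tests are expected when the relationship is
monotone but not linear or when a few departments are much larger than the rest.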
The data set was then passed to a validation criterion, the Akaike Information Criterion (AIC),
which determines the best-fit model parameters for the regression model. For the Lagos State
Polytechnic data set, the AIC test determined the criteria that best predict the data. The
result gave a min-max accuracy of 52.4%. The Lagos State University data set showed a different
behaviour. All three correlation coefficients showed a strong positive relationship between the
male and female students and a very weak correlation between the staff and the minority
students. The best-fit AIC test for the data set was a combination of parameters. The
4.3.1 Inference
The distribution of female minority students in STEM and Non-STEM fields of study is highly
dependent on
ii. The departments and categories of the fields of study (STEM and Non-STEM).
Also, the distribution of male and female staff members does not necessarily impede the influx
4.3.2 Assumptions
i. The data set is assumed to represent the population of students and staff in higher
The relationship between the parameters that show a positive correlation is illustrated using 3D
scatter plots in Graphs 3 and 4. Graphs 5 and 6 show box plots of the parameters in view, and
Graphs 7 and 8 show density plots, which describe the level of skewness of the parameters.
Graphs 9 and 10 illustrate the relationship between parameters in the data set for both
institutions.
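
Box plots and density plots of this kind can be produced in base R along the following lines.
The counts are hypothetical and are not the study data. The call to pdf(NULL) is only there so
that the sketch runs without opening a graphics window; it can be dropped in an interactive
session.

```r
# Hypothetical department-level counts (illustrative only, not the study data)
total_male   <- c(100, 110, 70, 90, 85, 130, 60, 280)
total_female <- c(120, 95, 80, 60, 45, 40, 30, 310)

# Points beyond 1.5 times the hinge spread are reported as outliers
outliers_male   <- boxplot.stats(total_male)$out
outliers_female <- boxplot.stats(total_female)$out

pdf(NULL)  # null device: lets the sketch run without a window or file
par(mfrow = c(1, 2))
boxplot(total_male, main = "Total male",
        sub = paste("Outliers:", paste(outliers_male, collapse = " ")))
boxplot(total_female, main = "Total female",
        sub = paste("Outliers:", paste(outliers_female, collapse = " ")))

# Kernel density estimates; a long right tail corresponds to positive skewness
dens_male   <- density(total_male)
dens_female <- density(total_female)
plot(dens_male, main = "Density plot: male", ylab = "Density")
polygon(dens_male, col = "grey")
plot(dens_female, main = "Density plot: female", ylab = "Density")
polygon(dens_female, col = "grey")
invisible(dev.off())
```

The box plot flags unusually large departments as outliers, and the density plot shows the same
departments as a long right tail.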
CHAPTER FIVE
5.1 Summary
At the end of the project the data sets were explored and analyzed. The analyzed data was then
The first chapter introduced the concepts of STEM and STEM Education. It described the aim
The next chapter described Data Analyses, Data Visualization and Statistical Modeling,
explaining how the concepts fit together in the research work. It also described the work of two
The third chapter gave extensive detail on the methods used in the research, explaining the
limitations of past research and the justification of this research work. It also showed the detail of
The next chapter explained in detail the process of the research work, describing each step and
showing the results of the research with reference to the diagrams and figures.
The last chapter, which is the present chapter, summarizes the previous chapters, concluding the
5.2 Conclusion
This research examines the relationship between female students in STEM fields of study, and
other variables in the data set. The paper examines cross-sectional and longitudinal data on
students in two institutions across three years of study, carefully appraising different models to
determine the best-fit model. The data set is passed through a Hybrid Chain model pipeline
Results show a positive correlation between the male and female students. The AIC was used to
determine the best-fit linear regression model before the prediction analysis of the data set.
The results were then visualized using techniques such as density plots, box plots and 3D
scatter plots.
REFERENCES
Akinsowon, O. A., & Osisanwo, F. Y. (2014). Enhancing interest in sciences, technology and
mathematics (STEM) for the Nigerian female folk. International Journal of Information
Science, 4(1), 8-12.
Griffith, A. L. (2010). Persistence of women and minorities in STEM field majors: Is it the
school that matters? Economics of Education Review, 29(6), 911-922.
Hackman, S. T., Zhang, D., & He, J. (2021). Secondary school science teachers’ attitudes
towards STEM education in Liberia. International Journal of Science Education, 1-24.
Hurwitz, J. S., Nugent, A., Halper, F., & Kaufman, M. (2013). Big data for dummies. John Wiley
& Sons.
Knaflic, C. N. (2015). Storytelling with data: A data visualization guide for business
professionals. John Wiley & Sons.
Kocabas, S., Ozfidan, B., & Burlbaw, L. M. (2019). American STEM Education in Its Global,
National, and Linguistic Contexts. EURASIA Journal of Mathematics, Science and
Technology Education, 16(1), em1810.
Madden, L., Beyers, J., & O'Brien, S. (2016). The importance of STEM education in the
elementary grades: Learning from pre-service and novice teachers’ perspectives. The
Electronic Journal for Research in Science & Mathematics Education, 20(5).
Ntemngwa, C., & Oliver, S. (2018). The implementation of integrated science technology,
engineering and mathematics (STEM) instruction using robotics in the middle school
science classroom. International Journal of Education in Mathematics, Science and
Technology, 6(1), 12-40.
Page, S. E. (2018). The model thinker: what you need to know to make data work for you.
Hachette UK.
Ralph, R., MacDowell, P., Lee, Y. L., & Ng, D. (2020). STEM Education for Girls: Perspectives
of Teachers During a Makeathon. In Overcoming Current Challenges in the P-12
Teaching Profession (pp. 73-95). IGI Global.
Sainz Sujet, P. (2020). Review of Data Visualisation: A Handbook for Data Driven Design, by
Andy Kirk, LA, Sage Publications, 2019, 312 pp., $106.93 (hardcover), ISBN:
978-1-5264-6892-5. Structural Equation Modeling: A Multidisciplinary Journal, 1-3.
Shanshan, H. (2020). Inspiration from STEM Education Research in American Primary and
Secondary Schools (No. 2063). EasyChair.
Simran, K. (2020, May 14). What is Data Analysis? Methods, Techniques & Tools. Hackr.io
blog. https://hackr.io/blog/what-is-data-analysis-methods-techniques-tools.
Tsupros, N., Kohler, R., & Hallinen, J. (2009). STEM education in Southwestern Pennsylvania:
Report of a project to identify the missing components. Unpublished Report. Pittsburgh,
PA: Intermediate Unit, 1.
Ugwuanyi, C. S., & Okeke, C. I. (2020). Determinants of university students’ interest in science,
technology, engineering and mathematics education in Nigeria: A case of a structural
equation modeling. International Journal of Mechanical and Production Engineering
Research and Development, 10(3), 6209–6218.
http://dx.doi.org/10.24247/ijmperdjun2020590
Umoh, S. W. (2016). Problems and prospects of effective science, technology, engineering
and mathematics (STEM) education delivery in Nigeria. Knowledge Review, 35(1), 2-13.
Zykina, A., Kaneva, O., & Sharun, I. (2020, January). Application of the descriptor approach for
clustering entities from education sector. In Journal of Physics: Conference Series (Vol.
1441, No. 1, p. 012184). IOP Publishing.
APPENDIX A
Table 1
Panel A of Table 1
Chart A of Table 1
Table 2
Panel A of Table 2
Chart A of Table 2
Table 3
Panel B
Table 4
Chart B
Fig 7.1 Lagos State Polytechnic Correlation Results
Fig 7.3 Lagos State Polytechnic Linear Regression model
Lagos State University Data Analysis Results
Fig 7.3.3 Lagos State University Linear Regression model
Graph 1 Lagos State Polytechnic Logistic Result
Graph 3 Lagos State Polytechnic 3D Scatter plot
Graph 5 Lagos State Polytechnic Box Plot
Graph 7 Lagos State Polytechnic Density Plot
Graph 9 Lagos State Polytechnic Scatter Plot Matrix
APPENDIX B
library(car)
library(readr)
library(e1071)
library(AICcmodavg)
setwd("C:/Users/Ogunbekun/Desktop/ProjectApp/app")
View(laspotechfinal)
mydata
# Correlation Analysis
# Pearson Correlation
cor(mydata$`ACADEMIC STAFF FEMALE`, mydata$`TOTAL FEMALE`)
cor(mydata$`NON-ACADEMIC
cor(mydata$`NON-ACADEMIC
# Spearman Correlation
c("spearman"))
cor(mydata$`ACADEMIC STAFF
cor(mydata$`NON-ACADEMIC
cor(mydata$`NON-ACADEMIC
# Kendall Correlation
cor(mydata$`ACADEMIC STAFF MALE`, mydata$`TOTAL FEMALE`, method =
c("kendall"))
cor(mydata$`ACADEMIC STAFF
cor(mydata$`NON-ACADEMIC
cor(mydata$`NON-ACADEMIC
mydata)
# Refer to columns by name so that predict() can later be used with new data
fecat.mod <- lm(`TOTAL FEMALE` ~ CATEGORIES, data = mydata)
mydata$`ACADEMIC STAFF
MALE` + mydata$`NON-ACADEMIC
fecat.mod, fesch.mod,
"combine1.mod", "combine2.mod", "combine3.mod", "combine4.mod")
fecat.mod
summary(lm(fecat.mod))
set.seed(100)
summary(fecat.mod)
AIC (fecat.mod)
predicteds=femalepred))
head(actual_preds)
actual_preds
mape <- mean(abs((actual_preds$predicteds - actual_preds$actuals))/actual_preds$actuals)
min_max_accuracy
mape
correlation_accuracy
# Visualization of data
par(mfrow=c(1,2))
boxplot.stats(mydata$`TOTAL MALE`)$out))
boxplot.stats(mydata$`TOTAL FEMALE`)$out))
par(mfrow=c(1,2))
boxplot.stats(mydata$`TOTAL MALE`)$out))
boxplot.stats(mydata$`TOTAL FEMALE`)$out))
par(mfrow=c(1,2))
plot(density(mydata$`TOTAL MALE`, main="Density plot: male", ylab="Frequency",
# Scatter Matrix
STAFF
# 3D Scatter Plot
FEMALE`
# Lagos State University Data Analysis
library(car)
library(readr)
library(e1071)
library(AICcmodavg)
setwd("C:/Users/Ogunbekun/Desktop/ProjectApp/app")
View(lasufinalcopy)
# Correlation Analysis
# Pearson Correlation
# Spearman Correlation
c("spearman"))
cor(lasufinalcopy$`TOTAL FEMALE`, lasufinalcopy$`NON-ACADEMIC STAFF MALE`,
method = c("spearman"))
method = c("spearman"))
# Kendall Correlation
method = c("kendall"))
method = c("kendall"))
lasufinalcopy)
lasufinalcopy)
# Refer to columns by name so that predict() can later be used with new data
fesch.mod <- lm(`TOTAL FEMALE` ~ DEPARTMENTS, data = lasufinalcopy)
"fesch.mod",
set.seed(100)
trainingrowindex <- sample(1:nrow(lasufinalcopy), 0.8*nrow(lasufinalcopy))
summary(combine2.mod)
AIC (combine2.mod)
predicteds=femalepred))
head(actual_preds)
min_max_accuracy
mape
correlation_accuracy
# Visualization of data
# Scatter Plot
scatter.smooth(x = lasufinalcopy$`TOTAL MALE`, y = lasufinalcopy$`TOTAL FEMALE`,
# Box Plot
par(mfrow=c(1,2))
boxplot.stats(lasufinalcopy$`TOTAL MALE`)$out))
boxplot.stats(lasufinalcopy$`TOTAL FEMALE`)$out))
par(mfrow=c(1,2))
boxplot.stats(lasufinalcopy$`TOTAL MALE`)$out))
boxplot.stats(lasufinalcopy$`TOTAL FEMALE`)$out))
# Density plot
par(mfrow=c(1,2))
plot(density(lasufinalcopy$`TOTAL FEMALE`, main="Density plot: female",
FEMALE`), 2))))
# Scatter Matrix
# 3D Scatter Plot
setwd("C:/Users/Ogunbekun/Desktop/PROJECTDATACOLLECTION")
library(readr)
View(lasufinalcopy)
table(lasufinalcopy$ACTION)
set.seed(100)
library(smbinning)
factor_vars <- c("DEPARTMENT", "CATEGORIES", "YEAR", "ACTION")
if(class(smb) !="character"){
if(class(smb) !="character"){
iv_df
= "logit"))
library(InformationValue)
summary(logitMod)
vif(logitMod)
library(plotROC)
plotROC(testdatab$ACTION, predicted)
sensitivity(testdatab$ACTION, predicted, threshold = optCutOff)