
Document

The document outlines the history and evolution of big data from early data collection methods to modern machine learning applications. It explains the data science life cycle, the importance of data warehousing, and differentiates between OLAP and OLTP systems. Additionally, it discusses the applications of data analysis, machine learning, and text mining in various fields such as business, healthcare, and finance.

Uploaded by

refixcare.in
Copyright
© All Rights Reserved

PHASE ONE:

1. Write about the history and evolution of big data.

History and Evolution of Big Data

A. Early Days (up to 1970): Data collection began with mainframe
computers using punch cards and magnetic tapes. The relational
database model (1970) then revolutionized structured data storage.
B. Data Warehousing (1980s-1990s): Businesses began centralizing data
for analysis. Tools like SQL enabled querying large datasets, and the
first data warehouses were developed for decision-making.
C. Internet Era (1990s-2000s): The rise of the internet led to exponential
growth in unstructured data (emails, websites). Search engines like
Google introduced scalable storage and retrieval methods.

D. Big Data Revolution (2000s): Frameworks like Hadoop (2006) and
MapReduce enabled processing of massive datasets. Cloud computing
provided scalable, cost-effective storage.

E. Modern Era (2010s-Present): Machine learning, IoT, and real-time
analytics advanced Big Data capabilities. AI-driven tools now process,
analyze, and derive insights from diverse datasets efficiently.

2. Explain the data science life cycle.

The data science life cycle involves the following key stages:

A. Define the Problem: Identify the objective, research questions, and
success criteria.

B. Data Collection: Gather relevant data from various sources.

C. Data Preparation: Clean, preprocess, and transform data for analysis.

D. Exploratory Data Analysis (EDA): Analyze data to discover patterns,
trends, and relationships.

E. Modeling: Build and train predictive or analytical models.

F. Evaluation and Deployment: Test model performance using metrics,
deploy to production, and monitor for ongoing effectiveness.

This iterative process ensures actionable insights and robust solutions.
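
The stages above can be sketched end to end in a few lines of Python. This is only a minimal illustration, not a prescribed workflow: the (ad spend, sales) records are made up, and the least-squares line is just one possible model choice. Deployment is left out because it depends on the environment.

```python
from statistics import mean

# Data Collection: hypothetical (ad_spend, sales) records; None marks a missing value.
raw = [(1.0, 3.1), (2.0, 4.9), (None, 7.0), (3.0, 7.2),
       (4.0, 8.8), (5.0, 11.1), (6.0, None)]

# Data Preparation: drop incomplete records.
clean = [(x, y) for x, y in raw if x is not None and y is not None]

# Exploratory Data Analysis: simple summary statistics.
summary = {
    "n": len(clean),
    "mean_x": mean(x for x, _ in clean),
    "mean_y": mean(y for _, y in clean),
}

# Modeling: fit y = slope * x + intercept by least squares on a training split.
train, test = clean[:4], clean[4:]

def fit_line(points):
    mx = mean(x for x, _ in points)
    my = mean(y for _, y in points)
    sxx = sum((x - mx) ** 2 for x, _ in points)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    slope = sxy / sxx
    return slope, my - slope * mx

slope, intercept = fit_line(train)

# Evaluation: mean absolute error on the held-out records.
mae = mean(abs((slope * x + intercept) - y) for x, y in test)

print(summary)
print(f"slope={slope:.2f} intercept={intercept:.2f} MAE={mae:.2f}")
```

If the error were too high, the natural next step would be to go back to an earlier stage (more data, better cleaning, a different model), which is what makes the process iterative.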

3. Explain the applications of data analysis.

Applications of Data Analysis

A. Business Decision-Making: Improves strategies by analyzing sales,
customer behavior, and market trends.

B. Healthcare: Enhances diagnostics, personalizes treatments, and aids in
disease prediction.

C. Marketing: Provides insights into customer behavior for targeted
advertising and retention strategies.

D. Finance: Detects fraud, manages risks, and analyzes market trends.

E. Government: Helps design policies, allocate resources, and manage
crises effectively.

F. Supply Chain: Optimizes inventory, reduces costs, and improves
delivery efficiency.

PHASE TWO:

1. Differentiate between OLAP and OLTP.

A. Purpose:

OLAP: For analysis and decision-making.

OLTP: For managing daily transactions.

B. Data:

OLAP: Historical and aggregated data.


OLTP: Real-time, operational data.

C. Queries:

OLAP: Complex, read-intensive queries.

OLTP: Simple, write-intensive queries.

D. Schema:

OLAP: Multidimensional (e.g., star schema).

OLTP: Normalized relational schema.

E. Users:

OLAP: Analysts and decision-makers.

OLTP: Operational staff and customers.

F. Examples:

OLAP: Data warehouses, BI tools.


OLTP: Banking, e-commerce systems.
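
The difference in query shape can be illustrated with Python's built-in sqlite3 module. This is a rough sketch under simplifying assumptions: the orders table and its values are hypothetical, and a single in-memory table stands in for both systems, whereas a real OLAP workload would normally run against a separate warehouse with a multidimensional schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, region TEXT, amount REAL)"
)

# OLTP-style work: many small writes, each committed as its own transaction.
new_orders = [("north", 120.0), ("south", 80.0), ("north", 45.5), ("south", 30.0)]
for region, amount in new_orders:
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO orders (region, amount) VALUES (?, ?)", (region, amount)
        )

# OLAP-style work: one read-only query that aggregates over the whole history.
totals = conn.execute(
    "SELECT region, COUNT(*), SUM(amount) "
    "FROM orders GROUP BY region ORDER BY region"
).fetchall()

print(totals)
conn.close()
```

The inserts touch one row each and must be fast and safe under concurrency, while the aggregate scans every row to answer a single analytical question.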

2. Explain data warehousing and its types.

Data Warehousing is the process of collecting, storing, and managing data
from multiple sources in a central repository for analysis and
decision-making. A data warehouse supports business intelligence by
organizing data for easy querying, reporting, and insights.

Characteristics:

A. Subject-Oriented: Focused on key business areas like sales or
customers.

B. Integrated: Combines data from various sources into a unified format.

C. Non-Volatile: Once loaded, data is stable; it is read and periodically
appended rather than updated or deleted.

D. Time-Variant: Stores historical data for trend analysis.

Types of Data Warehouses:

A. Enterprise Data Warehouse (EDW): A centralized repository for the
entire organization.

B. Operational Data Store (ODS): A store of current, near-real-time
operational data used for short-term decision-making.

C. Data Mart: A smaller, department-specific subset of a data warehouse.

3. Differentiate between descriptive analysis and predictive business
analysis.

A. Definition:

Descriptive Analysis: Summarizes historical data to understand past trends
and patterns.

Predictive Analysis: Uses historical data and models to predict future
outcomes.

B. Objective:

Descriptive: Answers “What happened?”

Predictive: Answers “What is likely to happen?”


C. Techniques:

Descriptive: Data aggregation, visualization (charts, graphs).

Predictive: Statistical modeling, machine learning, forecasting.

D. Output:

Descriptive: Insights into past performance.

Predictive: Probabilities and future scenarios.

E. Use Cases:

Descriptive: Sales reports, website analytics.

Predictive: Customer churn prediction, demand forecasting.

F. Dependency:

Descriptive: Relies solely on historical data.

Predictive: Combines historical data with algorithms for forecasting.
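
The contrast can be shown on a single small series. The sketch below uses made-up monthly sales figures. The descriptive part only summarizes them, while the predictive part applies a deliberately naive forecast (last value plus the average month-over-month change); real forecasting would typically use a statistical or machine learning model instead.

```python
from statistics import mean

# Hypothetical sales for months 1 to 6.
months = [1, 2, 3, 4, 5, 6]
sales = [100.0, 112.0, 119.0, 131.0, 138.0, 150.0]

# Descriptive analysis: "What happened?"
average_sales = mean(sales)
best_month = months[sales.index(max(sales))]

# Predictive analysis: "What is likely to happen?"
# Naive forecast: last observed value plus the average monthly change.
changes = [later - earlier for earlier, later in zip(sales, sales[1:])]
forecast_month_7 = sales[-1] + mean(changes)

print(f"average={average_sales}, best month={best_month}")
print(f"forecast for month 7={forecast_month_7}")
```

Both parts read the same historical data; only the predictive part adds a rule for extending it into the future.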


PHASE THREE:

1. Why is data warehousing important?

A. Centralized Data Storage: It consolidates data from multiple sources,
ensuring all data is stored in one place for easy access and
management.

B. Enhanced Decision-Making: By providing historical and current data in
a structured format, it supports accurate and timely decision-making
for business strategies.

C. Improved Data Quality: Data warehousing involves data cleaning,
integration, and standardization, which enhances the quality and
consistency of data.

D. Faster Data Retrieval: It is optimized for quick querying and analysis,
saving time compared to transactional systems.

E. Support for Business Intelligence: Data warehouses provide the
foundation for advanced analytics, reporting, and visualization tools
that drive insights.

F. Scalability: They are designed to handle growing volumes of data,
making them suitable for businesses of all sizes.

G. Historical Analysis: A data warehouse stores historical data, enabling
trend analysis, forecasting, and performance evaluation over time.

H. Operational Efficiency: By offloading analytical queries from
transactional systems, data warehousing improves the efficiency of
operational databases.

Together, these benefits make data warehousing central to organizational
success.

2. What is machine learning?

Machine Learning (ML) is a subset of artificial intelligence that enables
systems to learn and improve from experience without being explicitly
programmed. It uses algorithms to identify patterns in data and make
decisions or predictions based on it.

Key Points:

A. Definition: Machine learning involves developing algorithms that allow
computers to learn from and act on data.

B. Purpose: It aims to improve the system’s performance over time as it
processes more data.

C. Types of Learning:

Supervised Learning: Learning from labeled data.

Unsupervised Learning: Identifying patterns in unlabeled data.

Reinforcement Learning: Learning through rewards and penalties.

D. Applications: Examples include spam email filtering, recommendation
systems, and speech recognition.
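
As a small illustration of supervised learning, the sketch below classifies messages as spam or not using a 1-nearest-neighbor rule. The two features (number of links, number of exclamation marks) and the labeled examples are hypothetical, and nearest neighbor is only one of many possible algorithms; it is used here because it fits in a few lines.

```python
import math

# Labeled training data (hypothetical):
# (number of links, number of exclamation marks) -> label.
training = [
    ((0, 0), "ham"),
    ((1, 0), "ham"),
    ((0, 1), "ham"),
    ((5, 4), "spam"),
    ((6, 3), "spam"),
    ((4, 6), "spam"),
]

def predict(features):
    """Return the label of the closest training example (1-nearest neighbor)."""
    _, label = min(training, key=lambda item: math.dist(item[0], features))
    return label

print(predict((5, 5)))  # a message with many links and exclamation marks
print(predict((1, 1)))  # a message with few of either
```

Nothing in predict is written specifically about spam; the behavior comes entirely from the labeled examples, so adding more examples changes the predictions without changing the code.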

2. What are the applications of machine learning?

A. Understand the Question: Carefully read the question to determine
what is being asked. Identify keywords and concepts.

B. Plan Your Answer: Break it into logical parts: introduction, main body,
and conclusion. Ensure all parts directly address the question.

C. Content: Provide:

Key points: Highlight 4-6 major ideas or arguments (depending on the depth
needed).
Detail: Explain each point with relevant examples, facts, or reasoning.

Relevance: Stick to the topic and avoid unnecessary information.

D. Structure:

Introduction: Briefly state your understanding of the question.

Body: Each point should have a clear explanation and evidence or examples.

Conclusion: Summarize your key arguments or findings.

E. Clarity and Conciseness: Write clearly, avoiding overly complex
language.

3. What is text mining?

Text mining, also known as text analytics, is the process of extracting
meaningful information, patterns, and insights from unstructured text data. It
combines techniques from fields such as natural language processing (NLP),
machine learning, and statistics to analyze and interpret large amounts of
text.

Key Components of Text Mining:

A. Text Preprocessing: Cleaning and preparing text through tokenization,
stopword removal, stemming, and lemmatization.

B. Information Extraction: Identifying entities, relationships, or specific
information from text.

C. Text Classification: Categorizing text into predefined groups or classes.

D. Sentiment Analysis: Determining the sentiment or emotion conveyed in
the text.

E. Topic Modeling: Discovering hidden themes or topics in text data.

F. Clustering: Grouping similar texts together based on patterns.

Applications:

Customer Feedback Analysis: Understanding opinions in reviews or surveys.

Fraud Detection: Analyzing documents or messages for suspicious content.


Healthcare: Extracting insights from clinical notes or research papers.

Business Intelligence: Analyzing social media for brand perception.

By transforming unstructured text into structured formats, text mining helps
organizations make data-driven decisions.
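
A few of these components can be sketched with the Python standard library alone. The example below is a minimal illustration, not a production approach: the reviews are invented, and the stopword and sentiment word lists are tiny hypothetical stand-ins for the much larger resources that NLP libraries provide. It covers preprocessing (tokenization and stopword removal), term counting, and a simple word-list sentiment score.

```python
import re
from collections import Counter

# Tiny illustrative word lists; real systems use much larger resources.
STOPWORDS = {"the", "is", "and", "a", "was", "but", "it", "to"}
POSITIVE = {"great", "fast", "love"}
NEGATIVE = {"slow", "broken", "bad"}

def preprocess(text):
    """Lowercase the text, split it into word tokens, and drop stopwords."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return [t for t in tokens if t not in STOPWORDS]

def sentiment(tokens):
    """Count positive minus negative words and map the score to a label."""
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

reviews = [
    "The delivery was fast and the product is great",
    "The app is slow and the login is broken",
]

# Unstructured text becomes structured data: token lists, counts, labels.
tokenized = [preprocess(review) for review in reviews]
term_counts = Counter(t for tokens in tokenized for t in tokens)
labels = [sentiment(tokens) for tokens in tokenized]

print(tokenized)
print(term_counts.most_common(3))
print(labels)
```

The outputs (token lists, term counts, and one label per review) are the kind of structured format that can then be stored, aggregated, and reported on.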
