Data Science Overview
answer
Here are the answers to the questions shown in the image, each suitable for a 5-mark
answer:
1. Explain the term Data Science. Write down applications of it.
Answer:
Data Science is an interdisciplinary field that uses scientific methods, algorithms, and
systems to extract knowledge and insights from structured and unstructured data. It
combines statistics, computer science, and domain expertise.
Applications:
Predictive analytics in healthcare (disease prediction)
Customer behavior analysis in marketing
Fraud detection in finance
Recommendation systems (e.g., Netflix, Amazon)
Autonomous driving systems
2. Explain the life cycle of a Data Science project.
Answer:
The Data Science life cycle includes the following stages:
1. Problem Definition: Understand the business problem.
2. Data Collection: Gather relevant data from various sources.
3. Data Cleaning: Handle missing values, outliers, and inconsistencies.
4. Data Exploration: Analyze patterns, trends, and relationships.
5. Feature Engineering: Create relevant variables for modeling.
6. Model Building: Apply machine learning algorithms.
7. Evaluation: Assess model performance using metrics.
8. Deployment: Implement the model in production.
9. Monitoring and Maintenance: Ensure model accuracy over time.
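As a rough illustration, stages 2 to 7 above can be sketched in base R. This is only a minimal sketch under stated assumptions: it uses the built-in mtcars dataset, the feature hp_per_wt and the 24/8 train/test split are hypothetical choices made for the example, and deployment and monitoring are not shown.

```r
# Stage 2: Data collection (mtcars ships with base R; assumed here as the data source)
cars <- mtcars

# Stage 3: Data cleaning - drop rows with missing values (mtcars happens to have none)
cars <- na.omit(cars)

# Stage 4: Data exploration - how strongly is weight related to fuel economy?
wt_mpg_cor <- cor(cars$wt, cars$mpg)

# Stage 5: Feature engineering - hypothetical power-to-weight feature
cars$hp_per_wt <- cars$hp / cars$wt

# Stage 6: Model building - linear regression fitted on a training split
set.seed(42)
train_idx <- sample(nrow(cars), size = 24)
train <- cars[train_idx, ]
test <- cars[-train_idx, ]
model <- lm(mpg ~ wt + hp_per_wt, data = train)

# Stage 7: Evaluation - root mean squared error on the held-out rows
pred <- predict(model, newdata = test)
rmse <- sqrt(mean((test$mpg - pred)^2))

cat("cor(wt, mpg):", round(wt_mpg_cor, 2), "\n")
cat("Test RMSE:", round(rmse, 2), "\n")
```

In a real project each stage would be far more involved; the point here is only the order in which the stages feed into each other.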
3. Describe web scraping and web crawling.
Answer:
Web Scraping is the process of extracting data from websites using tools or scripts. It helps
gather specific information like product prices, reviews, etc.
Web Crawling is the automated process of browsing the web to index content from
websites, typically used by search engines.
Difference:
Scraping focuses on data extraction, while crawling focuses on link traversal.
Scraping targets specific pages; crawling follows links across many pages and sites.
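To make the difference concrete, here is a small base-R sketch that runs without network access. The "website" is a hypothetical in-memory list of pages, and the helper names (extract_links, crawl, scrape_prices) are made up for this example. Regular expressions are used only because the toy HTML is so simple; real pages are better parsed with a package such as rvest.

```r
# Hypothetical in-memory "website": page name -> HTML text
site <- list(
  home  = '<a href="shop">Shop</a> <a href="about">About</a>',
  shop  = '<a href="item1">Item 1</a> <a href="home">Home</a> <span class="price">19.99</span>',
  about = '<a href="home">Home</a>',
  item1 = '<span class="price">4.50</span>'
)

# Helper: pull the href targets out of one page
extract_links <- function(html) {
  m <- regmatches(html, gregexpr('href="[^"]+"', html))[[1]]
  gsub('href="|"', "", m)
}

# Crawling: breadth-first link traversal that records every page reached
crawl <- function(site, start) {
  visited <- character(0)
  queue <- start
  while (length(queue) > 0) {
    page <- queue[1]
    queue <- queue[-1]
    # Skip pages already seen and links that point outside the toy site
    if (page %in% visited || is.null(site[[page]])) next
    visited <- c(visited, page)
    queue <- c(queue, extract_links(site[[page]]))
  }
  visited
}

# Scraping: extract one specific kind of data (prices) from a page
scrape_prices <- function(html) {
  m <- regmatches(html, gregexpr('class="price">[^<]+', html))[[1]]
  as.numeric(sub('class="price">', "", m, fixed = TRUE))
}

pages <- crawl(site, "home")                           # which pages exist?
prices <- unlist(lapply(site[pages], scrape_prices))   # what data do they hold?
print(pages)
print(prices)
```

The crawler only discovers pages; the scraper only reads data from a page it is given. Search engines mostly need the former, while price monitoring needs both.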
4. Write an R program to explain web scraping.
Answer:
# Load required libraries
library(rvest)
# URL of the webpage
url <- "https://example.com"
# Read HTML content
webpage <- read_html(url)
# Extract the text of every <h2> heading
# (html_elements() supersedes html_nodes() in rvest 1.0 and later)
headings <- html_text2(html_elements(webpage, "h2"))
# Print extracted headings
print(headings)
5. Write an R program to collect data from a table in a MySQL database.
Answer:
# Load required package
library(RMySQL)
# Connect to MySQL database
conn <- dbConnect(MySQL(), user='root', password='password', dbname='your_db',
host='localhost')
# Fetch data from table
data <- dbGetQuery(conn, "SELECT * FROM your_table")
# Print data
print(data)
# Disconnect
dbDisconnect(conn)
6. Write an R program to create a table and insert data into it in a MySQL database.
Answer:
# Load package
library(RMySQL)
# Connect to database
conn <- dbConnect(MySQL(), user='root', password='password', dbname='your_db',
host='localhost')
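# Create table (dbExecute() runs a statement and leaves no open result set,
# unlike dbSendQuery(), whose result must be cleared with dbClearResult())
dbExecute(conn, "CREATE TABLE IF NOT EXISTS students (id INT, name VARCHAR(50))")
# Insert data
dbExecute(conn, "INSERT INTO students VALUES (1, 'John Doe'), (2, 'Jane Doe')")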
# Disconnect
dbDisconnect(conn)
7. Write a short note on text mining.
Answer:
Text mining is the process of deriving meaningful information from unstructured text. It
typically begins with preprocessing (tokenization, stop-word removal, and stemming) and
then applies NLP techniques to extract patterns. Applications include spam detection,
sentiment analysis, and document categorization.
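The preprocessing steps mentioned above can be sketched in base R as follows. This is a simplified illustration: the two sample documents, the short stop-word list, and the crude_stem helper are all assumptions made for the example, and crude_stem is a deliberately naive suffix stripper standing in for a real stemmer (such as the Porter stemmer offered by text-mining packages).

```r
# Hypothetical sample documents
docs <- c("The cats are chasing the mice.",
          "Dogs chased a cat, and the cat escaped!")

# Small illustrative stop-word list (an assumption, not a standard list)
stop_words <- c("the", "a", "an", "and", "are", "is", "of", "to")

# Naive suffix stripper used in place of a real stemming algorithm
crude_stem <- function(w) sub("(ing|ed|s)$", "", w)

preprocess <- function(text) {
  text <- tolower(text)                      # normalize case
  text <- gsub("[^a-z ]", " ", text)         # strip punctuation and digits
  tokens <- strsplit(text, " +")[[1]]        # tokenization on whitespace
  tokens <- tokens[nzchar(tokens)]           # drop empty tokens
  tokens <- tokens[!tokens %in% stop_words]  # stop-word removal
  crude_stem(tokens)                         # stemming
}

tokens_per_doc <- lapply(docs, preprocess)

# Pattern extraction at its simplest: term frequencies across all documents
term_freq <- sort(table(unlist(tokens_per_doc)), decreasing = TRUE)
print(term_freq)
```

Note that the naive stemmer produces stems like "chas" that are not real words; that is normal for stemming, whose goal is only to map related word forms to a common key.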
8. Write a short note on sentiment analysis.
Answer:
Sentiment analysis is a technique in text mining that determines the emotional tone behind
a body of text. It classifies text as positive, negative, or neutral. It's widely used in social
media monitoring, product review analysis, and customer feedback evaluation.
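One simple way to do this is a lexicon-based approach, sketched below in base R. The word lists, the sample reviews, and the classify_sentiment function are hypothetical and far too small for real use; practical systems rely on much larger lexicons or trained models.

```r
# Tiny illustrative lexicons (hypothetical, for demonstration only)
positive_words <- c("good", "great", "love", "excellent", "happy")
negative_words <- c("bad", "poor", "hate", "terrible", "slow")

# Score = number of positive words minus number of negative words
classify_sentiment <- function(text) {
  cleaned <- tolower(gsub("[^A-Za-z ]", " ", text))
  tokens <- strsplit(cleaned, " +")[[1]]
  score <- sum(tokens %in% positive_words) - sum(tokens %in% negative_words)
  if (score > 0) "positive" else if (score < 0) "negative" else "neutral"
}

# Hypothetical sample reviews
reviews <- c("Great phone, I love the camera!",
             "Terrible battery and slow charging.",
             "The package arrived on Tuesday.")

labels <- vapply(reviews, classify_sentiment, character(1), USE.NAMES = FALSE)
print(data.frame(review = reviews, sentiment = labels))
```

A word-counting approach like this ignores negation and sarcasm (for example, "not good" would count as positive), which is one reason machine-learning classifiers are often preferred.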
Let me know if you need these in a document format or as a printable PDF!