1) What is Big Data? Explain the types of Big Data. Discuss the challenges for
Big Data Analytics.
A) Big Data refers to the massive volumes of structured and unstructured data
generated from various sources at high velocity. This data is often too large and
complex for traditional data processing systems to manage effectively.
Types of Big Data
Big Data can be categorized into three primary types:
   1. Structured Data:
          o   This is data that is organized in a predefined manner, usually in
              rows and columns, making it easily searchable and analyzable.
          o   Examples: Databases, spreadsheets, and any data that can be easily
              entered into relational databases.
   Advantages:
      Easy to Analyze
      High Accuracy
   Disadvantages:
      Limited Flexibility
      Inability to Capture Rich Information
   2. Unstructured Data:
          o   This type of data lacks a predefined format or structure, making it
              more complex to analyze.
          o   Examples: Text documents, images, videos, social media posts, and
              emails.
   Advantages:
      Rich Insights
      Flexibility
   Disadvantages:
      Difficult to Analyze
      Data Quality Issues
   3. Semi-Structured Data:
          o   This type contains elements of both structured and unstructured
              data. While it may have organizational properties to separate data
              elements, it does not fit into a strict schema.
          o   Examples: JSON, XML, and NoSQL databases.
   Advantages:
      Balance of Structure and Flexibility
      Ease of Data Integration
   Disadvantages:
      Complexity in Analysis
      Inconsistent Formats
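The three types can be illustrated with a short Python sketch. The sample
records and field names below are hypothetical and exist only for illustration;
the sketch uses only the standard library.

```python
import csv
import io
import json

# Structured: fixed columns, every row follows the same schema.
structured_csv = "id,name,amount\n1,Asha,250\n2,Ravi,400\n"
rows = list(csv.DictReader(io.StringIO(structured_csv)))
total_amount = sum(int(row["amount"]) for row in rows)

# Semi-structured: self-describing keys, but records may differ in shape.
semi_structured = '[{"id": 1, "tags": ["new"]}, {"id": 2, "email": "r@example.com"}]'
records = json.loads(semi_structured)
all_keys = sorted({key for record in records for key in record})

# Unstructured: free text with no fields; analysis starts from raw tokens.
unstructured = "Loved the product, but delivery was late."
word_count = len(unstructured.split())

print(total_amount)
print(all_keys)
print(word_count)
```

The structured rows can be summed directly by column name, the semi-structured
records must first be inspected to learn which keys exist, and the unstructured
text offers nothing beyond raw words until further processing is applied.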
Challenges for Big Data Analytics
Despite its potential, Big Data analytics faces several challenges:
   1. Data Quality and Cleansing:
          o   Ensuring that the data is accurate, consistent, and cleaned is critical.
              Poor data quality can lead to incorrect insights and decision-
              making.
   2. Data Integration:
          o   Combining data from different sources (structured and unstructured)
              can be difficult, especially when these sources use various formats
              and protocols.
   3. Storage and Management:
          o   Storing vast amounts of data efficiently while maintaining
              performance is a significant challenge. This includes choosing the
              right technology stack and managing the costs associated with
              storage.
   4. Scalability:
          o   As the volume of data grows, systems must be able to scale
              effectively without sacrificing performance. This requires robust
              architecture and planning.
   5. Data Privacy and Security:
         o   Protecting sensitive data and ensuring compliance with regulations
             (like GDPR) represents a major challenge, particularly with the
             increase in data breaches.
   6. Skill Gap:
         o   There is often a shortage of skilled professionals who can analyze
             Big Data effectively. This includes data scientists, analysts, and
             engineers familiar with Big Data technologies.
   7. Real-time Processing:
         o   Analyzing streaming data in real-time poses technical challenges,
             as traditional data processing tools may not be able to handle high-
             velocity data streams effectively.
   8. Interpreting Data:
         o   Deriving actionable insights from complex datasets can be
             daunting, especially when visualizing the data or when decision-
             makers lack data literacy.
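The first challenge, data quality and cleansing, can be sketched in a few lines
of Python. The records, the validation rules (an age between 0 and 120), and the
`clean` helper are assumptions made for illustration, not a standard procedure.

```python
from collections import Counter

# Hypothetical raw records; None and duplicate ids model common quality issues.
raw_records = [
    {"id": 1, "city": " Pune ", "age": 34},
    {"id": 2, "city": "pune", "age": None},
    {"id": 2, "city": "pune", "age": None},   # repeated id
    {"id": 3, "city": "Delhi", "age": -5},    # out-of-range value
]

def clean(records):
    """Deduplicate by id, normalize text, and drop rows that fail validation."""
    seen, cleaned, rejected = set(), [], Counter()
    for record in records:
        if record["id"] in seen:
            rejected["duplicate"] += 1
            continue
        seen.add(record["id"])
        age = record["age"]
        if age is None:
            rejected["missing_age"] += 1
            continue
        if not 0 <= age <= 120:
            rejected["invalid_age"] += 1
            continue
        # Normalize whitespace and capitalization so values compare consistently.
        cleaned.append({**record, "city": record["city"].strip().title()})
    return cleaned, dict(rejected)

cleaned, rejected = clean(raw_records)
print(cleaned)
print(rejected)
```

Counting the rejected rows by reason, rather than silently dropping them, makes
it easier to see which quality problem dominates in a given source.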
2) Define Business Intelligence and explain how business intelligence
systems are implemented.
A) Business intelligence (BI) is a set of practices for collecting, structuring,
and analyzing raw data to turn it into actionable business insights. BI comprises
methods and tools that transform unstructured data sets and compile them into
easy-to-grasp reports or information dashboards. The main purpose of BI is to
support data-driven decision-making.
Business intelligence process: How does BI work?
The whole process of business intelligence can be divided into five main stages.
   1. Data gathering involves collecting information from a variety of sources,
      either external (e.g., market data providers, industry analytics, etc.) or
      internal (Google Analytics, CRM, ERP, etc.).
   2. Data cleaning/standardization means preparing collected data for
      analysis by validating data quality, ensuring its consistency, and so on.
   3. Data storage refers to loading data in the data warehouse and storing it
      for further usage.
   4. Data analysis is the automated process of turning raw data into
      valuable, actionable information by applying various quantitative and
      qualitative analytical techniques.
   5. Reporting involves generating dashboards, graphical imagery, or other
      forms of readable visual representation of analytics results that users can
      interact with or extract actionable insights from.
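The five stages above can be sketched as a small pipeline. This is a minimal
illustration, not a real BI implementation: the sales rows, the function names,
and the use of an in-memory SQLite database as a stand-in for a data warehouse
are all assumptions.

```python
import sqlite3

def gather():
    # Stage 1: hypothetical rows standing in for CRM/ERP exports.
    return [("north", "120.50"), ("south", " 80 "), ("north", "bad"), ("south", "19.5")]

def clean(rows):
    # Stage 2: keep only rows whose amount parses as a number.
    cleaned = []
    for region, amount in rows:
        try:
            cleaned.append((region.strip().lower(), float(amount)))
        except ValueError:
            continue
    return cleaned

def store(rows):
    # Stage 3: load into a warehouse table (in-memory SQLite here).
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
    conn.commit()
    return conn

def analyze(conn):
    # Stage 4: aggregate revenue per region.
    query = "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
    return conn.execute(query).fetchall()

def report(results):
    # Stage 5: a plain-text stand-in for a dashboard.
    return "\n".join(f"{region:<6} {total:>8.2f}" for region, total in results)

results = analyze(store(clean(gather())))
print(report(results))
```

Keeping each stage as a separate function mirrors the separation in the list:
a stage can be replaced (for example, a different storage backend) without
touching the others.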
   Advantages of BI:
         Data driven decision making
         Improved efficiency
         Enhanced visualization
         Data mining
         Real time analytics
   Disadvantages of BI:
         High Costs
         Complexity
         Data Overload
         Dependency on IT
         Security and Privacy concerns
3) What are the advantages and disadvantages of Big Data Analytics?
A) Advantages:
1. Smarter Decisions: By analyzing large amounts of data, companies can
make more informed choices, leading to better strategies and outcomes.
2. Personalized Experiences: Understanding customer preferences allows
businesses to tailor products and services to individual needs, enhancing
satisfaction.
3. Boosted Efficiency: Big data helps identify areas where operations can be
streamlined, saving time and resources.
4. Competitive Edge: Access to comprehensive data insights enables
companies to stay ahead in the market by quickly adapting to trends.
5. Innovation Opportunities: Analyzing data can reveal gaps in the market,
inspiring the development of new products or services.
Disadvantages:
1. Privacy Concerns: Collecting vast amounts of personal data can lead to
security risks and potential misuse if not handled properly.
2. High Costs: Implementing and maintaining big data systems can be
expensive, requiring significant investment in technology and talent.
3. Data Overload: With so much information available, it can be challenging to
filter out irrelevant data and focus on what's important.
4. Quality Issues: Not all data collected is accurate or useful; relying on poor-
quality data can lead to faulty conclusions.
5. Complex Analysis: Interpreting big data requires specialized skills and tools,
which may not be readily available to all organizations.
4) Describe the characteristics of Big Data, or the 5 V's of Big Data.
A) Characteristics of Big Data:
   o   Volume
   o   Veracity
   o   Variety
   o   Value
   o   Velocity
   Volume
   o   The name Big Data itself refers to enormous size. Big Data consists of vast
       volumes of data generated daily from many sources, such as business
       processes, machines, social media platforms, networks, human interactions,
       and many more.
   o   For example, Facebook is reported to generate approximately a billion
       messages, record about 4.5 billion clicks of the "Like" button, and receive
       more than 350 million new posts each day. Big Data technologies are
       designed to handle data at this scale.
   Variety
   o   Big Data can be structured, unstructured, or semi-structured, and it is
       collected from many different sources. In the past, data was collected only
       from databases and spreadsheets, but today it arrives in an array of forms,
       such as PDFs, emails, audio, social media posts, photos, and videos.
The data is categorized as below:
Structured data: Structured data follows a defined schema with all the required
columns and is in tabular form. It is stored in a relational database management
system, i.e., in relations (tables). OLTP (Online Transaction Processing) systems
are built to work with structured data.
Semi-structured data: In semi-structured data, the schema is not strictly
defined, e.g., JSON, XML, CSV, TSV, and email.
Unstructured data: Unstructured files such as log files, audio files, and image
files are included in unstructured data. Some organizations have a great deal of
data available but do not know how to derive value from it because the data is
raw.
Quasi-structured data: Textual data with inconsistent formats that can be
formatted with effort, time, and suitable tools.
Veracity
Veracity refers to how reliable the data is. Because data can be filtered or
translated in many ways, veracity also covers the ability to handle and manage
data efficiently, which matters when Big Data is used for business development.
For example, Facebook posts with hashtags can vary widely in reliability.
Value
Value is an essential characteristic of Big Data. What matters is not merely
that data is processed or stored, but that the data we store, process, and
analyze is valuable and reliable.
Velocity
Velocity plays an important role alongside the other characteristics. It refers
to the speed at which data is created, often in real time. It covers the speed
of incoming data sets, the rate of change, and bursts of activity. A primary
goal of Big Data systems is to make in-demand data available rapidly.
Big Data velocity deals with the speed at which data flows from sources such as
application logs, business processes, networks, social media sites, sensors, and
mobile devices.
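Incoming speed and activity bursts can be made concrete with a small sketch. The
timestamps, the one-second buckets, and the burst threshold (1.5 times the mean
rate) are assumptions chosen for illustration only.

```python
from collections import Counter

# Hypothetical event timestamps in seconds since the start of a stream.
timestamps = [0.1, 0.4, 0.9, 1.2, 2.0, 2.1, 2.2, 2.3, 2.8, 3.5]

def events_per_second(stamps):
    """Count events in each one-second bucket."""
    return dict(sorted(Counter(int(t) for t in stamps).items()))

def bursts(rates, factor=1.5):
    """Return buckets whose rate exceeds `factor` times the mean rate."""
    mean_rate = sum(rates.values()) / len(rates)
    return [second for second, rate in rates.items() if rate > factor * mean_rate]

rates = events_per_second(timestamps)
burst_seconds = bursts(rates)
print(rates)
print(burst_seconds)
```

A production system would compute such rates continuously over a live stream
rather than over a fixed list, but the measured quantities are the same.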
5) Discuss the advantages and disadvantages of Business Intelligence.
A) Advantages of Business Intelligence
   1. Data-Driven Decision Making:
           o   BI provides access to valuable data and insights, enabling
               companies to make informed decisions based on facts rather than
               intuition.
   2. Improved Efficiency:
           o   BI tools automate data analysis and reporting, saving time and
               allowing employees to focus on more important tasks, which
               boosts productivity and overall performance.
   3. Enhanced Visualization:
           o   BI tools create easy-to-read charts, graphs, and dashboards that
               help businesses quickly understand their performance and identify
               trends.
   4. Data Mining:
         o   BI systems analyze large datasets to uncover hidden patterns and
             insights, helping businesses make proactive decisions and stay
             competitive.
  5. Real-Time Analytics:
         o   With BI, companies can access up-to-date information instantly,
             allowing for quick responses to market changes and timely
             decision-making.
Disadvantages of Business Intelligence
  1. High Costs:
         o   Implementing BI systems can be expensive, including the costs of
             software and training, which might be a burden for smaller
             businesses.
  2. Complexity:
         o   BI tools can be complicated to set up and use, especially for those
             who are not tech-savvy. Proper training and support are essential.
  3. Data Overload:
         o   Access to vast amounts of data can lead to confusion. Companies
             need to focus on quality and relevance to avoid being overwhelmed
             by unnecessary information.
  4. Dependency on IT:
         o   BI systems often require technical IT support for implementation
             and maintenance, which can cause delays and create bottlenecks in
             accessing data.
  5. Security and Privacy Concerns:
         o   Storing sensitive data in central databases raises security risks,
             making it important for companies to implement strong measures
             to protect against breaches.
6) Discuss the evolution of Big Data.
A) 1. Early Days of Data Management (1950s - 1970s)
      Data Collection Begins: This period involved the early use of computers
      to collect and manage data. Businesses started using basic databases and
      mainframes to store information.
2. The Rise of the Internet and Data Explosion (1990s - 2000s)
      Internet Growth: The internet became popular, leading to a rapid
      increase in data from websites, emails, and online transactions. More data
      was generated than ever before.
3. Emergence of Big Data Technologies (2000s - 2010s)
      Big Data Defined: This period saw the introduction of technologies that
      could handle large amounts of diverse data (like Hadoop), making it
      easier to store and process Big Data efficiently.
4. Advancements in Data Analytics and Machine Learning (2010s - 2020s)
      Sophisticated Analysis: Companies began using advanced techniques
      like machine learning and predictive analytics to gain deeper insights
      from their data, allowing for better decision-making.
5. Current Trends and Future Directions (2020s and Beyond)
      Future Focus: Businesses are now concentrating on real-time data
      analysis, artificial intelligence, and how to manage data effectively to stay
      ahead in the market.
7) Differentiate between structured, unstructured and semi-structured data.
A)
Structured Data            Unstructured Data         Semi-structured Data
1) Organized in a          1) Lacks a specific       1) Contains elements of
predefined format, often   format or structure,      both structured and
in rows and columns.       making it more            unstructured data; has
                           complex.                  some organization but
                                                     not a strict schema.
2) Databases (e.g.,        2) Text files, emails,    2) JSON, XML, NoSQL
SQL), spreadsheets,        social media posts,       databases.
CSV files.                 images, videos.
3) Stored in tabular       3) Requires more          3) Can be stored in both
forms (e.g., relational    flexible storage          database systems and
databases).                solutions (e.g., file     file formats, depending
                           systems).                 on the structure.
4) Easy to analyze using   4) More challenging to    4) Easier to analyze than
traditional tools (like    analyze; requires         unstructured data, but
SQL).                      advanced tools and        may need specialized
                           techniques (like NLP).    tools for complete
                                                     analysis.
5) Low complexity;         5) High complexity due    5) Moderate complexity;
straightforward data       to vast variability and   some organization helps
management.                lack of format.           but may still require
                                                     parsing.
6) Easily searchable       6) Harder to search;      6) Can be searched more
using standard query       often needs indexing or   easily than unstructured
languages.                 advanced searching        data, especially if
                           tools.                    properly tagged.
7) Financial data,         7) Social media           7) Log files, web data
transaction records,       analysis, customer        feeds, and data from
customer databases are     feedback, multimedia      APIs are some of the
some of the uses.          content are some of the   uses.
                           uses.