Data Literacy

data

Uploaded by

debopriyab073
Copyright
© © All Rights Reserved
1.1 INTRODUCTION
The modern world is the world of technology, AI and data. In fact, data rules the world. In this data-ruled world, it becomes important for everyone to understand and befriend data. That is where 'data literacy' comes into the picture. In this session, you will learn about data literacy, its impact, importance and role in AI.

1.2 WHAT IS DATA LITERACY ?
In today's digital world, data is everywhere, and it's essential to know how to work with it. Data literacy is the ability to understand, analyze, and communicate with data effectively. Data literacy is important because it helps you think critically, make better choices, and understand the world of data.

Data literacy involves these three things (Fig. 1.1) :
1. Reading Data. It refers to the ability to read, analyse and understand data, which may be available in various formats such as numbers, charts etc.
2. Working with Data. It refers to the process of collecting and managing data; checking facts and spotting misleading information in it; storing and transmitting it in appropriate formats.
3. Communicating with Data. It refers to the process of understanding and interpreting data; spotting trends and patterns in the data; tabulating, reporting and presenting data in different and diverse formats.

Figure 1.1 Data Literacy Components (Reading Data, Working with Data, Communicating with Data)

Data literacy is one of the most important skills needed today. It opens doors to endless possibilities. The impact and importance of data literacy can be summarized through the following benefits it provides.
1. Spot Data Trends. Data literacy helps us analyze situations, spot trends, and predict outcomes.
For instance, by examining historical data and spotting the data trends, we can understand the causes of significant events like economic recessions.
2. Foster Critical Thinking. With data literacy, instead of accepting information at face value, we learn to question its source, reliability, and potential biases. This saves us from misinformation.
3. Make Informed Decisions. With data skills, we can gather and analyse relevant information. This helps us make choices based on facts, and not on assumptions or guesses.
4. Communicate Effectively. Data literacy empowers us with factual power. We can use data to support our arguments and ideas. With the help of data visualization, we can present information more clearly.
5. Solve Complex Problems. Many modern problems involve large amounts of data. Data literacy makes it easier to break down bulk data, spot patterns in it, identify trends and solve challenging problems.
Overall, data literacy empowers us to navigate our data-driven world confidently.

1.4 HOW TO BECOME DATA LITERATE
Becoming proficient in data literacy involves following a structured process called the Data Literacy Process Framework. It involves majorly the following three steps :

Step 1 : Identify
In this step, all the data and information collection happens, whether it's numbers, words, or pictures, e.g., if we're studying the weather, the data might include temperature readings, rainfall amounts, wind speeds and so on.
So, to become data literate, as part of the Identify step, you need to learn to :
• Understand data types (numerical, categorical, etc.).
• Get familiar with common data visualizations. Data visualisation is the graphical or visual representation of information/data, such as graphs or charts, e.g., bar charts, scatter plots etc.
• Practice reading and interpreting data from various sources.

Step 2 : Analyse
In this step, the data is studied to uncover [...], e.g., analysing sales [...] to see which ones need improvements.
So, to become data literate, as part of the Analyse step, you need to learn to :
• Collect data from different sources.
• Question the data source.
• Clean the data by removing redundancies and errors.
• Look for potential biases and missing information, and handle them.
• Explore and create visualizations of the data.

Step 3 : Interpret
In this step, the curated and visualised data goes through the lens of questions (e.g., "what do these patterns tell us?", "what can we learn from this data?"). For instance, if the data are test scores, you might interpret the data to identify areas where students need extra help.
So, to become data literate, as part of the Interpret step, you need to learn to :
• Determine what insights can be drawn from data.
• Share findings about the data, such as how the information can be applied in real scenarios.
• Consider alternative interpretations or explanations.
Remember, practice makes perfect! The more you engage with data, the more proficient you'll become at interpreting and leveraging its insights.

Figure 1.2 Data Literacy Process Framework

With data all around us and data governing the decisions affecting our lives, data security and data privacy become super important. These are twin pillars of safeguarding our [...].

1.5.1 Data Security
Data security refers to protecting data from unauthorized access, theft, or damage ([...] to anyone without the decryption key). Thus, data security involves :
• Taking measures like encryption, access controls, firewalls and network security to prevent data breaches.

Data Security. Data Security refers to taking steps for safeguarding data and information from unauthorized access, theft, or alteration, and involves preventive measures to stop data breaches.

1.5.2 Data Privacy
Data privacy refers to controlling the access to our personal information and its use. It's like setting boundaries to protect our digital identity from exploitation or misuse. Consider the data you share on social media platforms. By adjusting privacy settings, you can control who sees your posts, photos, and personal details, ensuring they're only accessible to trusted individuals.
Data Privacy involves :
• Concerns for safeguarding personal or sensitive information from misuse.
• Practices like data minimization and user consent.
• Control over how one's personal data and digital footprint is collected and used.

Data Privacy. Data Privacy refers to controlling the access to our personal information and its use.

1.5.3 Data Security, Privacy and AI
AI systems need vast amounts of data to learn and make decisions. Whether it's training data for machine learning algorithms or personal information used for targeted advertising, maintaining robust data security and privacy measures is crucial to prevent unauthorized access or misuse of data.
So, we can say that the relationship between AI and data security/privacy is complex (Fig. 1.3).

AI raises Data Privacy concerns
AI's requirement and use of data raises concerns about data privacy :
• AI systems require large amounts of data for training and operation; data may include personal or sensitive information.
• This raises privacy concerns about how this data is collected and used.
• AI systems may have vulnerabilities that can be [...].

AI provides enhanced Data Security
But AI can also enhance security measures for data :
• Machine learning models can detect cyber threats or anomalies.
• AI can help identify and protect against data breaches.

Figure 1.3 Relationship between AI and Data Security/Privacy. (Other labels recoverable from the figure: individual rights; governance and ethical AI principles; impact on security and privacy landscapes.)

Data security and privacy have a mixed relation with AI. Data used to train AI should be secure and from authentic sources. Compromised data can impact the performance of AI. At the same time, AI can play an important role in detecting and protecting against data breaches.

While AI presents new challenges, there are established best practices for protecting data security and privacy. Here are some best practices for cyber security :
1. Use Strong Passwords
Use unique, complex passwords for each of your accounts.
Avoid using easily guessable [...].
2. Keep Systems and Software Updated
Keep software and systems updated with the latest security patches.
3. Be Cautious with Emails and Links
Be cautious with emails and links from unknown sources.
4. Back Up Data Regularly
It is important to take backup of your work and data regularly. Always maintain offline backups of important data. This protects against ransomware or data loss incidents.
5. Use Encryption for Stored Sensitive Data
Always encrypt sensitive data stored on devices or in transit.
6. Visit Secure Sites and Use Secure Connections
While browsing online, always make sure to visit secure sites over secure connections. Secure sites' URLs start with HTTPS and have a padlock sign on the status bar to depict it. Also, make sure not to use free public Wi-Fi unless you are sure about its security measures. Preferably use VPNs when possible.
7. Limit Access and Sharing of Sensitive Data
Only share personal or confidential information when necessary. Always make sure to restrict access to data on a "need-to-know" basis.
8. Educate Yourself and Others
Always stay updated on the latest cyber threats and best practices. It is equally important to raise awareness about data security and privacy for ourselves and our community.
By following these guidelines and prioritizing data security/privacy, one can protect oneself and one's information in our increasingly digital world.
With this we have come to the end of this session. Let us quickly revise what we have learnt so far.

Check Point
Multiple Choice Questions
1. Data literacy refers to the ability to
(a) Write computer code (b) Understand and analyze data (c) Create databases (d) Operate software programs
2. Which of the following is NOT an example of how data literacy can be beneficial ?
(a) Fact-checking claims and spotting misinformation (b) Understanding trends and patterns in your community (c) Making decisions about your health based on data (d) Writing computer programs
3. [...] in the Data Literacy Process Framework is :
[...] Analyze and Communicate [...] Analyzing data for insights [...]
4. Which of the following is a benefit of data literacy ?
(a) Helps in making informed decisions (b) Enhances critical thinking skills (c) Enables effective communication with data (d) All of these
[...] (a) Interpret data and draw insights [...]
7. Which of the following is NOT a step in the Data Literacy Process Framework ?
(a) Import and Tidy Data (b) Explore and Visualize (c) Analyze and Communicate (d) Build Machine Learning Models
8. What does it mean to "tidy" data ?
(a) Remove any errors or inconsistencies (b) Organize data in a structured format (c) Visualize data in charts or graphs
9. Which of the following is a measure for ensuring data security ?
(a) Obtaining user consent (b) Data minimization (c) Encryption (d) Anonymization
10. Data privacy aims to :
(a) Prevent unauthorized access to data (b) Give individuals control over their personal data (c) Protect data from being lost or damaged (d) Detect and respond to cyber threats
11. Which of the following is NOT a potential security risk associated with AI systems ?
(a) Vulnerabilities that can be exploited (b) Manipulated data (c) Challenges to existing privacy regulations (d) Improved detection of cyber threats
12. What does "Privacy by Design" refer to ?
(a) Developing AI systems with privacy in mind from the start (b) Implementing strict access controls for sensitive data (c) Obtaining explicit consent from users for data collection (d) Regularly updating privacy policies and terms of service
13. Which of the following is NOT a best practice for cyber security ?
(a) Using strong passwords and multi-factor authentication (b) Keeping software and systems updated with latest security patches (c) Being cautious with emails and links from unknown sources (d) [...] information freely over unsecured channels
[...] (a) Unencrypted email (b) Public Wi-Fi network [...]
16. What is the purpose of limiting access to sensitive data ?
(a) To enhance data security (b) To comply with privacy regulations (c) To facilitate data sharing and collaboration (d) Both (a) and (b)
17. Which of the following is NOT a way to protect data privacy ?
(a) Using data minimization techniques (b) Obtaining user consent for data collection (c) Implementing strong encryption measures (d) Sharing personal data freely with third parties
18. What does it mean to "fact-check" information using data literacy skills ?
(a) Verifying the accuracy of claims or information using data (b) Creating visualizations to present data effectively (c) Organizing and cleaning data before analysis (d) Applying statistical techniques to analyze data
19. Which of the following is NOT a critical question to ask when practicing data literacy ?
(a) What is the source of the data ? (b) Are there any potential biases or errors in the data ? (c) How much is the storage cost of data ? (d) Are there alternative interpretations or explanations for the data ?
20. What does it mean to "explore" data in the Data Literacy Process Framework ?
(a) Gather and organize data from various sources (b) Examine patterns and trends within the data (c) Draw insights and communicate findings from the data (d) Validate the accuracy and completeness of the data
21. Which of the following is NOT a potential risk associated with data breaches ?
(a) Financial losses (b) Reputational damage (c) Legal consequences (d) Improved cybersecurity measures
22. What is the purpose of maintaining offline backups of important data ?
(a) To protect against ransomware or data loss incidents [...]

Fill in the Blanks
[...] with data effectively.
[...] helps you ____ [...] and spot misleading information.
[...] Framework guides you through working with data effectively.
[...] concerns safeguarding ____ from misuse.
[...] involves measures like encryption, access controls, and network security.
[...] goal of data literacy is to enable ____ decision-making.
____ and ____ are two key steps in the Data Literacy Process Framework.
[...] helps prevent ____.
"____ by Design" refers to developing AI systems with security and privacy in mind from ____.
[...] ____ passwords is one of the best practices for cyber security.
Installing ____ is important for maintaining system security.
Be cautious with emails and links from ____ sources.
____ to protect sensitive data stored on devices or in transit.
Connect to ____ websites and use ____ when possible for secure connections.
Stay updated on the latest ____ and best practices for cyber security.
____ about data security and privacy in your community.
Data literacy empowers you to navigate our ____ world confidently.

True / False
1. Data literacy is only important for data scientists and analysts.
2. Developing data literacy is a one-time process.
3. Data literacy can help you make better decisions based on facts.
4. Data security and data privacy are the same thing.
5. AI systems do not require any personal or sensitive data.
6. Data literacy only involves reading and interpreting charts and graphs.
7. Being data literate can help you make better decisions based on data.
8. The Data Literacy Process Framework does not include any steps related to data visualization.
9. Data security and data privacy are completely unrelated concepts.
10. AI systems do not require any data for training or operation.
11. AI systems do not pose any challenges to existing data privacy regulations.
12. Using strong passwords is not an important practice for cyber security.
13. [...] information over unsecured email or websites.
14. [...] backups of data can protect against ransomware attacks.
15. [...] personal data freely with third parties.
16. Fact-checking claims using data is an example of applying data literacy skills.
17. [...] potential biases or errors in data is not a critical question in data literacy.
18. [...] involves examining patterns and trends within the data.
19. Data breaches can only result in financial losses, but not legal or reputational consequences.
20. Educating oneself and others about cyber security is important for increasing awareness.
Competency Based Questions
1. Your local community organization is planning a fundraising event, and they have collected data on past attendance, donation amounts, and demographics of supporters. Apply steps to analyze this data and provide actionable insights to the organization. Your analysis should help them determine the optimal event format, target audience, and marketing strategies to maximize participation and fundraising efforts. How would you accomplish this ?
(a) Ignore the data and plan the event based on personal preferences.
(b) Analyze the data but provide vague or irrelevant insights.
(c) Apply the Data Literacy Process Framework, analyze the data, and provide actionable insights for event planning.
(d) Outsource the data analysis to a third party without providing any guidance.
2. Investigate the use of AI in various industries (e.g., healthcare, finance, transportation). What would be the best approach to use this report ?
(a) Ignore the potential concerns and focus only on the benefits of AI.
(b) Investigate the use of AI but fail to provide any guidelines or recommendations.
(c) Develop a set of guidelines or recommendations without investigating the use of AI or potential concerns.
(d) Investigate the use of AI, identify potential concerns, and develop guidelines or recommendations to address them.
3. Examine a real-world dataset related to a topic of your interest (e.g., environmental issues, social trends, global health). Which of the following approaches would you follow to present your analysis of the same, highlighting the importance of critical questioning and data literacy skills ?
(a) Examine the dataset without any critical analysis or alternative interpretations.
(b) Propose alternative interpretations without examining or analyzing the dataset.
(c) Critically analyze the data, identify biases/limitations, propose alternative interpretations, and present findings.
(d) Ignore the dataset and present personal opinions on the topic.
4. To investigate the role of data literacy in combating misinformation and fake news, which of the following approaches could have been used to identify and counter the spread of misinformation ?
[...] Investigate the role of data literacy without any real-world case study.
[...] Analyze a real-world case study [...] general way.
[...] Investigate the role of data literacy, analyze a real-world case study, and present findings and [...].

Assertion and Reason Questions
1. Assertion (A). Data literacy involves understanding and analyzing various types of data, [...] graphs, and charts.
Reason (R). Data literacy is the ability to read and interpret data in only numerical formats.
2. Assertion (A). The Data Literacy Process Framework is a linear process with no interconnected [...].
Reason (R). The Data Literacy Process Framework includes steps for exploring, visualizing and communicating data.
3. Assertion (A). Data security measures, such as encryption and access controls, help prevent unauthorized access to data.
Reason (R). Data security and data privacy are interchangeable terms that refer to the same concept.
4. Assertion (A). AI systems can enhance certain security measures by detecting cyber threats and anomalies.
Reason (R). AI systems themselves cannot introduce any security risks or vulnerabilities.
5. Assertion (A). Implementing "Privacy by Design" principles involves developing AI systems with privacy considerations from the outset.
Reason (R). Privacy by Design is a concept that is useful in establishing robust data security and privacy protections and controls.
6. Assertion (A). Using strong passwords and enabling multi-factor authentication are best practices for cyber security.
Reason (R). Regularly updating software and systems with the latest security [...].
7. Assertion (A). Educating oneself and others about cyber threats and best practices can [...] awareness and promote better security practices.
Reason (R). Raising awareness about data security and privacy is only relevant for IT professionals and not the general public.
Let Us Revise
• Data literacy is the ability to understand, analyze, and communicate with data effectively.
• Data literacy involves three things : Reading Data, Working with Data, and Communicating with Data.
• The benefits of data literacy include : spotting data trends, fostering critical thinking, helping in [...] communication, and helping in solving complex problems.
• Becoming data literate involves following a structured process called the Data Literacy Process Framework.
• Data security refers to taking steps for safeguarding data and information from unauthorized access, [...] alteration, and involves preventive measures to stop data breaches.
• Data privacy refers to controlling the access to our personal information and its use.
• Data security and privacy have a mixed relation with AI. Data used to train AI should be secure and from [...] sources. Compromised data can impact the performance of AI. At the same time, AI can play an important role in detecting and protecting against data breaches.
• Best practices for cyber security include : using strong passwords, keeping systems and software updated, being cautious with emails and links, backing up data regularly, using encryption for stored sensitive data, visiting secure sites and using secure connections, limiting access and sharing of data, and educating oneself about the latest threats and protection measures.

1. What is data literacy ?
Ans. Data literacy is the ability to understand, analyze, and communicate with data effectively.
2. Why is data literacy important in today's world ?
Ans. Data literacy is important because it helps individuals think critically, make better decisions based on facts, communicate effectively using data, and solve complex problems involving large amounts of data.
3. Describe the three main steps of the Data Literacy Process Framework.
Ans. The three main steps are : (i) Import and Tidy Data (gather and organize data), (ii) Explore and Visualize (examine patterns and create visualizations), (iii) Analyze and Communicate (draw insights and share findings).
4.
Differentiate between data security and data privacy.
Ans. Data security refers to protecting data from unauthorized access, theft, or damage. Data privacy concerns safeguarding personal or sensitive information from misuse and giving individuals control over how their data is collected and used.
5. How can AI enhance data security measures ?
Ans. AI can enhance security measures by using machine learning models to detect cyber threats, anomalies, or potential data breaches.
6. What are some potential security risks associated with AI systems ?
Ans. AI systems may have vulnerabilities that can be exploited, be susceptible to adversarial attacks with manipulated data, and challenge existing data privacy regulations.
7. What is the purpose of "Privacy by Design" ?
Ans. Privacy by Design refers to developing AI systems with privacy considerations and protections in mind from the very beginning, rather than as an afterthought.
8. Why is it important to use strong passwords and enable multi-factor authentication ?
Ans. Using strong, unique passwords and enabling multi-factor authentication [...] as they help prevent unauthorized access to [...].
9. [...] keeping software and systems updated with the latest security patches ?
Ans. Installing software updates and security patches helps address [...] and protect against potential cyber threats or attacks.
10. [...] of data encryption in ensuring data security ?
Ans. Encryption is used to [...] unauthorized access to sensitive data by [...] only be [...] with the correct decryption key.
11. Why is it recommended to limit access to sensitive data on a "need-to-know" basis ?
Ans. Limiting access to sensitive data on a need-to-know basis enhances data security and [...] with data privacy regulations by restricting access to only those who require it for their work.
12. How can educating oneself and others about cyber security promote better practices ?
Ans.
Educating oneself and others about cyber threats, best practices, and the importance of data security and privacy can increase awareness and encourage the adoption of safer practices.

[...] analyze, and communicate with data effectively.
[...] to our personal information and its use.
[...] data and information from unauthorized access.
[...] representation of information and data.

1. Which of the following is NOT a benefit of data literacy ?
(a) Helps in making evidence-based decisions (b) Enhances problem-solving skills (c) Enables effective storytelling with data (d) Improves artistic abilities
2. What is the first step in the Data Literacy Process Framework ?
(a) Explore and Visualize (b) Import and Tidy Data (c) Analyze and Communicate (d) Interpret and Conclude
3. Data security aims to protect data from :
(a) Being accessed by authorized individuals [...]

[...] empowers individuals to make ____ based on facts and data. (decisions)
[...] the Data Literacy Process Framework involves organizing and structuring [...] (F)
Data security and data privacy are interchangeable terms that refer to the same concept. (F)
[...] do not require any data for training or operation. (F)
[...] strong encryption measures is a best practice for protecting data privacy. (T)
[...] by Design principles focus solely on implementing strict access controls for sensitive data. (F)
[...] unique and complex passwords for all accounts is not an important cyber security practice. (F)
[...] public Wi-Fi networks for convenience. (F)

[...] literate and how it can impact various aspects of life.
Describe the process of applying the Data Literacy Process Framework to analyze a dataset.
Discuss the relationship between AI and data privacy concerns, providing specific examples.
Discuss two potential issues related to AI's relation with data security and privacy concerns.
What measures can individuals and organizations take to enhance data security ?
Why is it important to be cautious with emails and links from unknown sources ?
Describe the purpose and importance of maintaining offline backups of important data.
How can data literacy skills help individuals fact-check claims and spot misinformation ?
Discuss the role of critical questioning in developing data literacy.
Explain the potential consequences of data breaches for individuals and organizations.
How can education and awareness-raising efforts contribute to better cyber security practices ?
Describe the steps you would take to protect your personal data privacy online.

In this session : Types of Data; Data Acquisition; Best Practices for Acquiring Data; Features of Data and Data Preprocessing

2.1 INTRODUCTION
We learnt in the previous session that today's world is a data-driven world. [...] types of data is essential for making informed decisions and driving progress. From the moment data is generated, to its interpretation and application, every step in the data journey plays a crucial role.
In this session, we shall talk about various types of data, data acquisition, data processing, and the importance of effective data interpretation.

2.2 TYPES OF DATA
Data comes in many forms, each serving a distinct purpose in analysis and decision-making. Although data is available in multiple forms, broadly it can be divided into two primary types of data.

1. Qualitative Data
Qualitative data provides insights into the characteristics, attributes, and qualities of phenomena. Qualitative data is often subjective and descriptive. For instance, following are all examples of qualitative data :
• Customer feedback, such as comments, reviews, and testimonials, provides qualitative insights into customer satisfaction, preferences, and experiences.
• Interview transcripts, such as conversations with individuals or focus groups, yield qualitative data, offering perspectives, opinions, and personal stories.
• Social media sentiment, such as posts, comments, and discussions on social media platforms, reveals qualitative insights into public opinion, trends, and sentiment.
Qualitative data describes qualities or characteristics of phenomena.

2. Quantitative Data
[...] For instance, following are all examples of quantitative data :
• Sales data, such as transactional records, revenue figures, and purchase history, provides quantitative insights into sales performance, trends, and patterns.
• Sensor readings, such as measurements of temperature, pressure, and humidity, collected by sensors for monitoring environmental conditions.
• Financial metrics, such as stock prices, market indices, and financial statements, furnish quantitative insights into economic indicators, investment performance, and financial health.

Qualitative data is also known as Categorical data and Quantitative data is known as Numerical data. You have already read about these in section 4.2.1, pages 119-120.

Quantitative and Qualitative Data. Quantitative Data is data involving numbers and measuring variables, and Qualitative Data is data involving descriptive, non-numerical information.

Qualitative Data
• It is also known as Categorical data.
• It cannot be measured or [...]; e.g., [...] opinions, feelings, descriptions.
• It is collected through observations, [...] questions.

Quantitative Data
• It is known as Numerical data.
• It represents quantities or numeric values.
• It can be measured and expressed numerically, e.g., counts, measurements, scores, ratings.
• It is collected through structured data collection methods like surveys, sensors.
• It is objective and statistical analysis is possible.
• It provides precise, measurable, and testable data.
• It is analysed using statistical and computational methods.
• It answers "how many" and "how much" types of questions.
• Quantitative data types : number, percentages, [...]

2.3 DATA ACQUISITION
Data Acquisition refers to processes, methods or systems that are used to collect, [...] or analyse information related to a certain theme or phenomenon. Acquiring data involves collecting information from diverse sources using
various methods tailored to specific objectives and contexts.
You have already read about common approaches to data acquisition in section 5.3, page 136. Let us quickly recall some of the common data acquisition methods here :
1. Surveys. [...] structured questions [...] administered to individuals [...] opinions, preferences, behaviors, and demographics.
2. [...] conversations between researchers and participants [...] questions and probing inquiries.
3. Observational studies involve systematic observation and recording of behaviors, interactions, and phenomena in [...] or controlled settings.
There are many other ways of acquiring data. Please refer to section 5.3, page 136 for the same.

2.4 BEST PRACTICES FOR ACQUIRING DATA
Acquiring high-quality data requires using the best practices to ensure validity, reliability, and ethical integrity. For effective data acquisition, it is recommended to use the following essential guidelines :

Figure 2.1 Best Practices for Data Acquisition. (Labels recoverable from the figure: Select appropriate methods; Design robust instruments; Ensure data quality; Maintain ethical [...].)

1. Define Clear Objectives. Clearly state the purpose, goals, and research questions (as per expected outcomes) that should give clear guidance to the data acquisition process.
2. Select Appropriate Methods. Choose data collection methods and techniques that are suitable for the context of data collection, objectives of data collection, and the target audience from which the data is being collected. For example, to get data from a younger audience, the questions and language [...] older or professional people.
3. Design Robust Instruments. Develop survey questionnaires, interview guides, observation protocols, and experimental procedures with careful attention [...]
Session 2 : ACQUIRING DATA, PROCESSING, AND INTERPRETING DATA
4. Ensure Data Quality. Implement measures to validate, verify, and clean the data to minimize errors, inconsistencies, and biases.
5. [...] Privacy. Respect confidentiality, anonymity, and informed consent to safeguard the rights and privacy of research participants.
6. Maintain Ethical [...]. Adhere to ethical guidelines, codes of conduct and [...] governing research conduct, integrity, and transparency.

Note : High-quality data is valid, reliable, and of ethical integrity.

2.5 FEATURES OF DATA AND DATA PREPROCESSING
Raw data often contains errors, inconsistencies, and missing values that require preprocessing to enhance quality, accuracy, and usability. Thus, data is preprocessed to clean it and make it appropriate for use. Refer to section 4.2, page 119, which has already talked about data features and ways to preprocess data.

Data Preprocessing. Data Preprocessing refers to the process of making data appropriate for use by removing discrepancies in it.

Some common data preprocessing techniques are :
(i) Data Cleaning (ii) Data Reduction (iii) Data Transformation (iv) Data Integration

1. Data Cleaning
Data cleaning involves identifying and correcting errors, inconsistencies, and anomalies in raw data so that it is easier to understand and work with, and becomes more accurate and reliable, e.g., imagine you have a list of student names and ages, but some ages are missing. You'd fill in the missing ages or remove the incomplete entries to make sure your data is complete, with every student entry having an age.

Data Cleaning. Data Cleaning is a process of identifying and correcting [...] anomalies in raw data.

There are multiple data cleaning methods, such as :
• Removing duplicate records. Identifying and eliminating duplicate entries or observations to prevent redundancy and maintain data integrity.
• Handling missing values.
Imputing missing values or deleting incomplete records to mitigate the impact of data gaps on analysis and interpretation. ® Standardizing formats. Converting data into consistent formats, units, and structures to facilitate comparison, aggregation, and analysis. 2 Dota Reduction : Data reduction is a technique used to reduce the size of a dataset while still preserving the most important information. aes i table format or repre: . ‘transformation involves converting raw data into a suital S eoreqeaene ‘or modeling. There are multiple ways of doing so, cee are: @ Normalization. This refers to scalin¢ distribution to remove differences in magi 4 Encoding categorical variables. This refers to converting categorical variables ing for computational analysis and modeling, 9 of numerical data to 2 common range nitude and facilitate comparison, ‘numerical or binary representations @ Feature engineering. This refers to creating new Data Transformation ‘features or variables from existing data to capture Data Transformation ref relevant patterns, Jonships, or trends for to the process of converting aa _— data into a suitable format. eee representation for a Det visualization, or modeling. — Data integration involves combining data from multiple sources or formats into a unified Gataset for analysis, reporting, or decision-making. There are multiple ways of integrating data, some of these are: © Merging datasets, which is combining datasets with common identifiers or keys te ‘enrich data with additional attributes or variables, eg., if you have sales data in one “spreadsheet and customer data in another, you'd merge to create a single dataset with both sales and customer information. ° Joining tables, which is linking relational databases or tables based on shared fields ‘or relationships to consolidate related information for analysis. 
• Concatenating files, which is appending files with similar structures or formats to create a comprehensive dataset for analysis.

Figure 2.2 Data Preprocessing.

Data Analysis

Data analysis is like examining clues to solve a mystery. It involves studying your data to find patterns, trends, or answers to questions. For instance, suppose you have a dataset of student test scores. You might analyze the data to see if there is a relationship between study time and test performance by comparing the scores of students who studied a lot with those of students who studied less.

Data Analysis. Data analysis is a process of applying many techniques on data to find or extract trends, correlations, outliers, and variations that convey a meaning or point to a specific result.

Data analysis takes place in many forms:

(i) Descriptive Analysis. Descriptive analysis shows what happened. It describes data, e.g., analysing sales data to get sales numbers for each employee.
• Descriptive analysis answers: What happened?

(ii) Diagnostic Analysis. Diagnostic analysis finds why something happened, e.g., hospitals suddenly start having an increased number of patients. Diagnostic analysis may find that many hospital patients had the same virus symptoms, so the virus caused the patient increase.
• Diagnostic analysis answers: Why did it happen?

(iii) Predictive Analysis. Predictive analysis predicts what might happen in the future based on data patterns, e.g., a product sells best in September and October each year, so high sales are predicted for those months next year.
• Predictive analysis answers: What might happen in the future?

(iv) Prescriptive Analysis. Prescriptive analysis provides recommendations based on the other analysis types.
• Prescriptive analysis answers: What should we do about it?

Data Interpretation

Data interpretation is about making sense of the analysed data and understanding the story behind the numbers. For example, in the test-score dataset, you might interpret the result as suggesting a positive relationship between study time and academic achievement. Data interpretation is essential for extracting insights from your data and using them to inform decision-making and problem-solving.

Data Interpretation. Data interpretation is making sense of the analysed data, drawing conclusions, and deriving actionable insights to inform decision-making and problem-solving.

Data interpretation takes place in various ways:
(i) Exploratory data analysis (EDA). EDA involves analysing and interpreting data to find patterns and insights without preconceived hypotheses.
(ii) Confirmatory data analysis (CDA). CDA focuses on testing hypotheses and confirming existing knowledge using statistical methods and inferential techniques.
(iii) Predictive analysis. Predictive analysis aims to forecast future trends and behaviours based on historical data patterns and predictive modeling techniques.
(iv) Diagnostic data analysis. Diagnostic analysis focuses on understanding the underlying causes, drivers, or factors contributing to observed patterns, anomalies, or outcomes in the data.

Importance of Data Interpretation

Effective data interpretation is important for extracting actionable insights, informing decision-making, and driving organizational success. It offers many benefits, such as:
• Informed decision-making. Decisions are based on data instead of relying on gut feelings alone.
• Problem-solving. Identifying patterns, trends, and issues to solve problems and optimize processes.
• Performance assessment. Assessing the performance of employees, products, strategies etc. to foster improvement.
• Risk management. Managing risk by pinpointing potential risks and developing plans.
• Cost reduction. Studying past data to find areas to reduce expenses.
• Anticipating future trends. Forecasting based on historical data trends for better preparation.

In short, data interpretation provides factual intelligence for smarter decision-making, forecasting, and optimization.

After studying the handling and processing of both data forms, we can summarise it as given in the table below.

Qualitative vs Quantitative Data Handling and Processing

Qualitative Data | Quantitative Data
Non-numerical or categorical information, such as descriptions, opinions, observations etc. | Numerical information that can be measured or counted.
Subjective or qualitative aspects of a phenomenon. | Objective or quantitative aspects of a phenomenon.
Words, texts, images, or codes; can be organized into categories, themes, or patterns. | Numbers or numerical values; can be organized into tables, graphs, charts, or statistical summaries.
Interviews, focus groups, observations, or open-ended survey questions. | Surveys, experiments, or structured observations.
Identifying patterns, themes, or commonalities. | Focuses on numerical relationships, patterns, or trends.
Coding, content analysis, or discourse analysis. | Computations, statistical tests, modeling.
In-depth explanations, rich descriptions, and contextual insights. | Numerical measurements, statistical relationships, and quantifiable results.
Not easily generalizable to a larger population (findings may be specific to the studied context). | Generalizable for a larger population.

Checkpoint

Multiple Choice Questions

1. Qualitative data deals with:
(a) Descriptions and qualities (b) Numeric values and quantities
(c) Mathematical calculations (d) Statistical analysis
2. Quantitative data deals with:
(a) Descriptions and qualities (b) Numeric values and quantities
(c) [...] (d) Colors and characteristics
3. [...]
(a) [...] (b) Test scores (c) Focus [...] (d) Content [...]
4. [...]
(a) [...] (b) Measurable (c) [...] (d) Testable
5. [...]
(a) Objective (b) [...] (c) Descriptive (d) Exploratory
6. Which of the following is NOT a method of data acquisition?
(a) Sensors (b) [...] (c) Observations (d) Data storytelling
7. [...] the data analysis process?
(a) [...] (b) Data acquisition (c) Data interpretation (d) Data processing
8. [...]
(a) [...] (b) [...] (c) Interpreting the data (d) Making decisions based on data
9. [...] data preprocessing?
(a) [...] (b) [...] (c) Data cleaning (d) Data storytelling
10. [...]
(a) [...] (b) Performing operations on the data (c) Visualizing the data (d) Interpreting the data
11. Which of the following is NOT a data processing operation?
(a) Data transformation (b) Data modeling (c) [...] (d) Data integration
12. Which of the following is NOT a reason why data interpretation is important?
(a) [...] communication (b) Quality control (c) [...] collection (d) Anticipation of future trends
13. Which type of analysis describes or summarizes data using statistics?
(a) [...] analysis (b) Predictive analysis (c) [...] analysis (d) Prescriptive analysis
14. Which type of analysis determines the reason behind an occurrence?
(a) [...] analysis (b) Descriptive analysis (c) [...] analysis (d) Prescriptive analysis
15. Which type of analysis uses data to make projections about the future?
(a) [...] analysis (b) [...] (c) Diagnostic analysis (d) Prescriptive analysis
16. [...] analysis answers which question?
(a) What should we do about it? (b) What happened?
(c) What might happen in the future? (d) Why did it happen?
17. Which type of analysis provides recommendations based on other analysis types?
(a) [...] (b) Diagnostic analysis (c) [...] (d) Prescriptive analysis
18. Which of the following is an example of a diagnostic analysis question?
(a) "What happened?" (b) "Why did sales decrease last quarter?"
(c) "What should we do about it?" (d) "What are the projected sales for next year?"
19. Predictive analysis is commonly used in which industry?
(a) Healthcare (b) Finance (c) Manufacturing (d) All of these
20. Which type of analysis provides recommendations for future actions?
(a) Descriptive analysis (b) Diagnostic analysis (c) Predictive analysis (d) Prescriptive analysis
21. Which of the following is NOT a benefit of data interpretation?
(a) Informed decision-making (b) Risk management (c) Data acquisition (d) Performance assessment

Fill in the Blanks

1. ______ data deals with descriptions and qualities.
2. ______ data deals with numeric values and quantities.
3. Qualitative data is ______ in nature.
4. Quantitative data is ______ in nature.
5. Data preprocessing involves ______ the data for processing.
6. Data cleaning is a step in data ______.
7. The first step in the data analysis process is data ______.
8. Sensors are a method of ______.
9. Data normalization is [...].
10. The purpose of ______ is to remove errors and inconsistencies.
11. ______ is the process of combining data from multiple sources.
12. ______ analysis describes or summarizes data using statistics.
13. ______ analysis aims to determine the reason behind an occurrence.
14. ______ analysis uses data to make projections about the future.
15. ______ analysis provides recommendations based on other analysis types.
16. ______ analysis is used to identify the root cause of a problem.

True / False

1. Qualitative data is useful for exploring new phenomena.
2. Quantitative data is useful for testing hypotheses and generalizing findings.
3. [...] data is objective in nature.
4. [...] data is subjective in nature.
5. Data processing involves performing operations on the data.
6. Data interpretation is not important for quality control processes.
7. [...] analysis aims to determine the reason behind an occurrence.
8. Diagnostic analysis is used to identify the root cause of a problem.
9. [...] data is collected through structured surveys.
10. [...] data is collected through methods like observations and interviews.
11. [...] data helps answer questions like "how many?" and "how much?"
12. Examples of quantitative data include numbers and percentages.
13. [...] data is useful for testing hypotheses and generalizing findings.
14. Data processing involves analyzing the data.
15. Data cleaning is a step in data processing.
16. Data aggregation involves performing operations on the data.
17. Data visualization is an example of a data processing operation.
18. Data interpretation is important for data acquisition.

Competency Based Questions

1. Your school has collected data on the academic performance of students across different [...] and grade levels. Which of the following approaches can help teachers and administrators make informed decisions about curriculum planning and student support?
(a) Create a presentation with text-based explanations of the data.
(b) Create visualizations without any analysis or interpretation.
(c) Analyze the data and create informative visualizations, explaining how they can aid [...].
(d) Ignore the data and rely on personal experiences and opinions.
2. A company is analyzing customer feedback to improve its products. They collect data from online [...], social media comments, and customer support interactions. Which of the following methods would be most effective for acquiring this diverse dataset?
(a) Conducting in-person interviews with a select group of customers
(b) Scraping data from random websites for a broader perspective
(c) Integrating data from various sources including surveys, social media APIs, and CRM systems
(d) Ignoring social media comments as they might not be relevant
3. A research team is studying the effects of climate change on biodiversity in a particular region. They have collected raw data from weather stations, satellite images, and field surveys. What is the first step they should take in preprocessing this data?
(a) Removing outliers and anomalies
(b) Converting satellite images into numerical data
(c) Aggregating data from different sources into a single dataset
(d) Conducting statistical analysis on the raw data
4. A [...] team wants to analyze the purchasing behaviour of customers. They have a [...] timestamps of transactions. What preprocessing [...] amounts to a standardized range [...]

Assertion & Reasoning Questions

In the following questions, a statement of assertion (A) is followed by a statement of reason (R). Mark the correct choice as:
(a) Both A and R are true and R is the correct explanation of A.
(b) Both A and R are true but R is not the correct explanation of A.
(c) A is true but R is false (or partly true).
(d) A is false (or partly true) but R is true.
(e) Both A and R are false or not fully true.

1. Assertion (A). Descriptive analysis summarizes data using statistics.
Reason (R). Descriptive analysis helps to determine the conclusion about the distribution of data.
2. Assertion (A). Qualitative data deals with numeric values and quantities.
Reason (R). Quantitative data deals with descriptions and qualities.
3. Assertion (A). Structured surveys are a method of collecting qualitative data.
Reason (R). Qualitative data is subjective in nature.
4. Assertion (A). Qualitative data helps answer questions like "how many?" and "how much?"
Reason (R). Quantitative data helps answer questions like "why?" and "how?"
5. Assertion (A). Quantitative data is useful for testing hypotheses and generalizing findings.
Reason (R). Qualitative data is useful for exploring new phenomena.
6. Assertion (A). Data cleaning is a step in data preprocessing.
Reason (R). Data preprocessing involves analyzing the data.
7. Assertion (A). Data aggregation is an example of a data preprocessing operation.
Reason (R). Data processing involves performing operations on the data.
8. Assertion (A). Data interpretation is important for deriving outcomes.
Reason (R). Data visualization is useful for communicating insights.

LET US REVISE

• Data can be divided into two primary types: Qualitative data and Quantitative data.
• Qualitative data describes qualities or characteristics of some entity or phenomena.
• Quantitative data represents information about something through numerical values.
• Qualitative data is also known as categorical data and quantitative data is known as numerical data.
• Common data acquisition methods include surveys, questionnaires, interviews, observations and many others.
• High quality data is valid, reliable, and of ethical integrity.
• Data preprocessing refers to the process of making data appropriate for use by removing discrepancies in it.
• Some common data preprocessing techniques are: Data Cleaning, Data Reduction, Data Transformation, and Data Integration.
• Data cleaning is a process of identifying and correcting errors, inconsistencies, and anomalies in raw data.
• Data reduction is a process or a set of techniques used to reduce the size of a dataset while still preserving the most important information.
• Data transformation refers to the process of converting raw data into a suitable format or representation for analysis, visualization, or modeling.
• Data processing refers to manipulating, analyzing, and interpreting data to extract meaning and derive insights. It involves Data Analysis and Data Interpretation.
• Data analysis is a process of applying many techniques on data to find or extract trends, correlations, outliers, and variations that convey a meaning or point to a specific result.
• Data analysis can be descriptive, diagnostic, predictive, and prescriptive.
• Descriptive analysis answers: What happened? Diagnostic analysis answers: Why did it happen? Predictive analysis answers: What might happen in the future? Prescriptive analysis answers: What should we do about it?
• Data interpretation involves making sense of the analyzed data, drawing conclusions, and deriving actionable insights to inform decision-making and problem-solving.
• Effective data interpretation is important for extracting actionable insights, informing decision-making, and driving organizational success.

1. Differentiate between qualitative and quantitative data. How are these alternatively known?
Ans. Qualitative data deals with descriptions and qualities, while quantitative data deals with numeric values and quantities that can be measured. Qualitative data is also known as Categorical Data and quantitative data is known as Numerical Data.
2. Provide an example of qualitative data.
Ans. Customer reviews, interview transcripts, and case studies are examples of qualitative data.
3. How is quantitative data collected?
Ans. Quantitative data is collected through structured methods like surveys, sensors, and data collection instruments.
4. Give an example of a predictive analysis question.
Ans. An example of a predictive analysis question is "What are the projected sales for the next quarter based on historical data trends?"
5. What is the purpose of data cleaning in data preprocessing?
Ans. The purpose of data cleaning in data preprocessing is to remove errors, inconsistencies, and inaccuracies from the data.
6. Give an example of a data processing operation.
Ans. An example of a data processing operation is data aggregation, which involves summarizing or rolling up data.
7. Why is data interpretation important?
Ans. Data interpretation is important for informed decision-making, risk management, quality control, performance assessment, and anticipating future trends based on data insights.
8. Mention a method of data acquisition.
Ans. A method of data acquisition is using sensors to collect data, such as sensor readings or Internet of Things (IoT) data.
9. Give some examples of various types of data analysis.
Ans. Some examples of various types of data analysis are:
• Descriptive: The most popular ice cream flavors based on total sales.
• Diagnostic: Customers buy more when there are promotions or discounts.
• Predictive: Mall traffic is expected to increase by 10% based on local population growth.
• Prescriptive: Open a new store location in the fastest growing neighborhood.
10. What is normalization in data preprocessing?
Ans. Normalization is a data preprocessing step that involves scaling or transforming data to a common range or format for consistent analysis.

Practical Session

1. Trend Analysis: Analyzing Cafeteria Food Sales Trends

The aim is to figure out the trend of sales of food items in the school canteen or cafeteria. Students can collect data on the daily or weekly sales of different food items in the school cafeteria over a period of time (e.g., one semester or one year). They should then analyze the data to identify trends, such as:
(i) Which food items are most popular and least popular?
(ii) How do sales vary across different days of the week or months?
(iii) Is there an increasing or decreasing trend in the sales of certain items over time?
(iv) How do sales trends correlate with factors like weather, school events, or holidays?

Solution.

Sample Data: The table below shows the daily sales (in units) of three food items (Pizza, Burgers, and Salads) in the school cafeteria for one month (20 school days).

Day | Pizza | Burgers | Salads
1 | 45 | 62 | 18
2 | 52 | 58 | 22
3 | 48 | 65 | 20
... | ... | ... | ...
20 | 38 | 72 | 5

Interpretation and Results
(i) The most popular item is Burgers, with the highest average daily sales of around 62 units.
(ii) Salads have the lowest average daily sales of around 20 units, indicating they are the least popular item.
(iii) There is a slightly increasing trend in Salad sales over the month, possibly due to health awareness campaigns or seasonal changes.
(iv) Burger sales peak on Fridays, while Pizza sales are relatively consistent throughout the week.
(v) Overall, there is a cyclical weekly pattern in sales, with higher sales on Fridays and lower sales on Mondays.

Glossary

Data Cleaning: A process of identifying and correcting errors, inconsistencies, and anomalies in raw data.
Data Integration: The process of combining data from multiple sources or formats into a unified dataset for analysis, reporting, or decision-making.
Data Reduction: A process or a set of techniques used to reduce the size of a dataset while still preserving the most important information.
Data Transformation: The process of converting raw data into a suitable format or representation for analysis, visualization, or modeling.
Qualitative Data: Data describing qualities or characteristics of some entity or phenomena.
Quantitative Data: Data representing information about something through numerical values.

[...]
(a) How many? and How much? (b) What are the characteristics? [...]

[...]
(a) Text, audio, video, images [...] (d) Measurements and scores

[...]
(a) Numbers, percentages, frequencies [...] (d) Audio recordings

[...] operation?
(a) Data collection (b) Data visualization [...]

[...]
[...] (b) Compliance and reporting [...] (d) All of these

[...]?
(a) What might happen in the future? (b) What happened? [...] (d) What should we do about it?

Diagnostic analysis aims to answer which question?
(a) What happened? (b) What might happen in the future?
(c) What should we do about it? (d) Why did it happen?

Data acquired from sensors is an example of:
(a) Qualitative data (b) Quantitative data (c) Descriptive data (d) Prescriptive data

Which of the following is a step in data preprocessing?
[...] Data integration [...] Data normalization

The purpose of data cleaning in the preprocessing stage is to:
(a) Remove errors and inconsistencies [...] Visualize the data [...]

[...] used to identify the root cause of a problem?
[...] (b) Predictive analysis [...] (d) Diagnostic analysis

Qualitative data helps answer questions like ______ and ______. ("why?", "how?")
Quantitative data helps answer questions like ______ and ______. ("how many?", "how much?")
[...] is an example of ______ data. (qualitative)
[...] is an example of ______ data. (quantitative)
[...] is an example of ______ data. (quantitative)
