From the course: Excel with Copilot: AI-Driven Data Analysis

Text analysis with advanced analysis in Copilot

- [Instructor] Excel's text processing capabilities, particularly for tasks like natural language processing, have traditionally been limited, but the introduction of Python in Excel significantly expands these capabilities. And now with Copilot and advanced analysis, gaining insights from text data becomes much more accessible. To follow along with this demo, you can open the advanced-analysis-text-analysis-start workbook. We're going to be working with a well-known movie review data set, which contains 50,000 movie reviews, preclassified as either negative or positive in sentiment. We'll start by using Copilot to pass our data into advanced analysis and see what we can uncover. Let's head to Copilot and we will go to advanced analysis. We'll see the same messages here. Let's start our advanced analysis. And again, what you see here might be completely different than what I get. So let's roll the dice and see what happens. Now, almost every instance of advanced analysis, or really every instance I've seen, starts with just reading the data in from the table. That's done up here with the xl() function, as you can see in this first cell of the report. You can learn more about how Python in Excel gets data from Excel with the Python in Excel courses from the LinkedIn library. We do see a preview of the data set here, just to make sure that this data has been read properly into Python in Excel. And now it looks like it is going to attempt to perform some exploratory data analysis on the reviews. Okay, so it looks like Copilot has completed our analysis here. And again, you could continue with some of these suggested prompts. Let's take a look at the output log, as it were, here. We see a bit of information about what Copilot gave us. Visualizations, here are the visualizations. We'll go over those in just a second, but I do want to look at something here first.
You'll notice that there actually was an error that Python threw when executing the code generated by Copilot. So Copilot generated some code, Python ran it, Python encountered an error, and Copilot is actually smart enough to take that error, rewrite the code, and tell you what the error was. And you probably don't care, right, 'cause it's fixed. But that just goes to show how intelligent these tools are, right? They're able to fix their own mistakes here. Let's go ahead and move down here to the resulting plots. So the first one, we get a bar plot. It's showing us the distribution of positive versus negative reviews. It displays it as a proportion. So we're looking at like a 50/50 split of positive versus negative reviews. That could be really good to know if we're trying to build any predictive models, because we want to have a sense of how common each category is. So it looks like it's about 50/50 there. Okay, review length distribution. So this is pretty interesting. It shows us the overall distribution of how long each review is. Maybe we want to break this down further and ask, does a longer review tend to be positive or negative in nature, right? So there are some different things we could do as it relates to the review length. The last thing we have down here is a word cloud. Now what this is trying to do is visualize how commonly certain words show up in the data here, and it makes sense. Things like film, movie, and so forth are common. I think it might be easier just to plot this as like a bar chart or something. People often criticize word clouds as being kind of hard to read and not really conveying that much actionable information. They look kind of cool, but it's really hard to make heads or tails of them sometimes. So it's interesting that in this case, Copilot gave us plots, right? Whereas in the last example we had to ask for those specifically.
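The two checks described here, the share of each sentiment and the review length broken down by sentiment, can be sketched in plain Python. This is only a minimal illustration, not the code Copilot generated in the demo: the four-row `reviews` sample, the function names, and the choice of word count as the length measure are all assumptions, and in Python in Excel the data would normally arrive as a pandas DataFrame rather than a list of tuples.

```python
from collections import Counter, defaultdict
from statistics import mean

# Hypothetical stand-in for the review table: (review text, sentiment label).
reviews = [
    ("A great film with a good cast", "positive"),
    ("Good idea but a boring and far too long movie", "negative"),
    ("Loved it", "positive"),
    ("Bad acting and a bad script", "negative"),
]

def sentiment_proportions(rows):
    """Return the share of rows carrying each sentiment label."""
    counts = Counter(label for _, label in rows)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}

def mean_length_by_sentiment(rows):
    """Return the mean review length, in words, for each sentiment label."""
    lengths = defaultdict(list)
    for text, label in rows:
        lengths[label].append(len(text.split()))
    return {label: mean(vals) for label, vals in lengths.items()}

print(sentiment_proportions(reviews))
print(mean_length_by_sentiment(reviews))
```

A large gap between the two mean lengths would be a hint that review length carries some signal about sentiment, which is the follow-up question raised above.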
You want to make sure that you're using visualizations and tables and analyses together with Copilot and Python. So again, it's good to understand the data and have good questions in mind already for what might be important. So one thing that I'm going to do is specify something a little further. I'm going to ask Copilot to provide the 15 most common words for each sentiment, positive and negative. (keyboard clacking) Okay, now it looks like in this case we do get an error that Copilot is not able to fix itself. I am heading over to cell A40 and it looks like we have a timeout error. So what happened here is this Python code is being executed in the cloud, the results are being sent back to our workbook, and it looks like there was just some kind of timeout. This is a larger data set, so we may need to just try again. I'm going to go ahead and just ask this again and let's see if we can get lucky. This is a bigger data set, so I'm not too surprised that we got a timeout here. I was able to run this earlier, so let's see what happens this time. Okay, now it looks like we were able to get a result here. However, the preview is not exactly in a very readable format right now. We could try to get a preview of it, but all we're seeing is this data frame value here. So what I'm going to try to do in this case is ask Copilot to visualize this. Let's see if we can get a nice plot of these results. (keyboard clacking) Okay, so it looks like this works. Let's zoom in here a bit and you will see the top 10, or I'm sorry, the top 15 most common words in positive reviews, right? And the top 15 most common words in negative reviews, to be more specific. You can take a look at these. This does go to show how tricky it can be to analyze text and understand the meaning of text data. As you're going to see here, things like good and great do show up in the positive reviews; however, good actually shows up in the negative reviews as well.
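The prompt above, the 15 most common words for each sentiment, amounts to a per-label word count. Here is a rough sketch of that idea using only the standard library. It is not the code Copilot produced: the sample `reviews`, the `top_words` function, the simple regular-expression tokenizer, and the tiny stop-word list are all assumptions made for illustration.

```python
import re
from collections import Counter

# Hypothetical stand-in for the review table: (review text, sentiment label).
reviews = [
    ("A great film with a good cast", "positive"),
    ("Good idea but a boring and far too long movie", "negative"),
    ("Loved it", "positive"),
    ("Bad acting and a bad script", "negative"),
]

# A deliberately small, assumed stop-word list; a real analysis would use a fuller one.
STOP_WORDS = {"a", "an", "and", "the", "but", "it", "with", "too", "far"}

def top_words(rows, sentiment, n=15):
    """Return up to n (word, count) pairs for reviews with the given sentiment."""
    counts = Counter()
    for text, label in rows:
        if label == sentiment:
            # Lowercase the text and keep runs of letters and apostrophes as words.
            words = re.findall(r"[a-z']+", text.lower())
            counts.update(w for w in words if w not in STOP_WORDS)
    return counts.most_common(n)

for sentiment in ("positive", "negative"):
    print(sentiment, top_words(reviews, sentiment))
```

Even in this toy sample, "good" appears under both labels, which is the same ambiguity the plot in the demo shows.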
So we'll leave it at that, and you can explore this further if you want to. Perhaps ask Copilot to build a predictive model: can it predict whether a movie review is classified as positive or negative? You could try that, but as you're seeing here, you really need to be able to roll with the punches and ask different things of Copilot, whether you're getting a timeout or whether the output isn't quite in a format that's easy to understand, right? Having that inquisitive mindset and really knowing the analyst's toolbox of different ways to format the data, different ways to interpret the data, and different diagnostics that you might want to run before you take an analysis as gospel is going to be really important as you work with Copilot and Python, 'cause there are so many directions that this tool can take. So you just want to make sure that you can get exactly what you want via these prompts.
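As one example of a diagnostic to run before trusting a predictive model, here is a small sketch of a majority-class baseline: the accuracy you would get by always guessing the most frequent label. The function name and the sample labels are assumptions for illustration; nothing like this appears in the demo itself.

```python
from collections import Counter

def majority_baseline_accuracy(labels):
    """Accuracy obtained by always predicting the most frequent label."""
    if not labels:
        raise ValueError("labels must not be empty")
    most_common_count = Counter(labels).most_common(1)[0][1]
    return most_common_count / len(labels)

# Hypothetical labels mirroring the roughly 50/50 split seen in the demo.
labels = ["positive", "negative"] * 5
print(majority_baseline_accuracy(labels))
```

With an evenly split data set the baseline sits at about 50 percent, so a model that Copilot builds is only interesting if its accuracy on held-out reviews is clearly above that.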
