POWER BI
Task of data analyst
1) Prepare
2) Model
3) Visualise
4) Analyse
5) Manage
Implications of using direct query
Benefits:
Frequency changing data
Need near real time
Setup an automatic schedule also
Large Data Volumes (not large people , data)
Multi dimensional data
Limitations:
Dependent on data source performance
Security between source and destination
Limited modelling capabilities
Limited transformation features
Power Query
Finds abnormality in data
Cardinality
Uniqueness of values in your column
More unique more cardinality
Unique does not repeat itself in data
Data import errors
Timeout expired : loads of time to load
Query timeout :
Couldn’t find data formatted as table : convert it into a table and pull it
Could not find the file: location changed , (ex pull from desktop)
Data type errors
Data Profiling
FINDING DATA ANOMALIES is important to keep the distribution of the data
uncomplicated.
Easy to work
Column quality: valid error and empty text then number wrong
Column profile: value distribution
Column distribution:cardinality (uniqueness)
Cleaning and combining queries
Properties- query name
- Changes in properties side reflect on the queries side too (eg changing
name)
- Naming query properly to understand is vital the name needs to make
sense to u at least
Applied steps:- u can see what changes u have made and these will be made
when future data is also loaded
Common transformations
(transform tab)
- Pivot and unpivot columns are most imp
- Replace values:- eg can be used in the warehouse warehouse error
(add column tab)
Used to add columns
Diff types can be seen under it
(Home tab )
Remove rows and columns
Remove other columns- leaving selected columns all other columns are removed
Data types
The abc and 123 on column names based on first 1000 rows
Combining queries
Append- one on top of other u will add the tables
Eg:- table 1 has 300 rows and table 2 has 100 the total rows u see will be 400
Problem :- duplicates are not eliminated
Used only when columns are matching
Merge- similar data will be merged together
Solves the problems that occur in append
types :- check wa group for venn dig representation
1) left outer :- all the data in left column is there but only matching data of right
column is added (left table is given priority)
6) Right outer:- same thing but instead of left table right table is given priority
7) Full outer :- combining both the tables with matching rows matched and other
rows as null (rarely used as its confusing) and remaining are stacked down
we can see all the data but matching ones can be seen more clearly
8) inner :- only matching columns are seen
9) Left anti:-only unmatched columns from left table can be seen
10) Right anti:-only unmatched columns from right table can be seen
Pivot and unpivot
pivot:-
Add or remove table structure to meet your aggregation needs (addition
subtraction)
Summarization of data
2 types:- aggregation of count
Aggregation of sum or avg
Unpivot :-
does ulta of what pivot does basically expats the summarised data
Makes column data into rows form (i think)
Converting 2d to flat- rows and columns (two colours wala)
Pivot Columns Aggregation on Count-
Counts the colour of blocks
Aggregation on Sum/ average-
Adds values corresponding to the blocks or finds the average
Query Folding
Pushing data transformations to the source for better performance and
efficiency
(sends steps to source before it loads so it doesn't slow up the power query )
Pulling data [ ideal time to do query folding ]
Does Not support
1. Adding index column
2. Merging and appending (do before this step)
3. Changing data type
Modelling data in Power BI -
process of designing data flows
2 types of data tables-:- facts and dimension tables
Fact tables:- activities or events happening in an organisation its the actual data it has
observational data
Dimensions table:- the tables that provide the details they answer the questions that
may arise when reading data in fact tables
Star schemas:- fact table in centre and related dimension tables around it (ideal)
Relationships
Making hierarchy, u can make many but there should not be any ambiguity
A single thick line implies that its an active relationship and at one time only one can be
active
One to many and other relationships are officially known as cardinality too
Here many is denoted by * and one as 1
Ideally preferred:- one to many where one ke side par dimension and many side par fact
and single relationship not both ways
Circular relationship:- (not ideal)
Relationship dependencies:- eg sales[total cost] = sales[quantity]*sales[price]
Here total cost is dependant on qty and price
The data should be connected in a proper relationship so as to make it work
Hierarchy:- order in which u place data
creates subdivisions and makes it easier and more connected as we can go to other levels
of hierarchy through one graph only and go deepest level of dataset
It makes it grouped and now u dont have to make multiple graphs a u make 1 hierarchy
and just dive deeper in it
Summary
1. Power BI Tasks: Prepare, model, visualise, analyse, and manage data.
2. Direct Query: Provides real-time data and handles large volumes but is limited by data
source performance and security.
3. Power Query: Identifies data anomalies, with key aspects including column quality
and distribution.
4. Cardinality: Measures the uniqueness of values in a column.
5. Data Transformation: Includes appending (stacking tables) and merging (combining
tables with various join types).
6. Pivot/Unpivot: Pivoting turns rows into columns for aggregation, while unpivoting
does the reverse.
7. Query Folding: Enhances performance by performing transformations at the data
source but has limitations on certain transformations.
8. Data Modeling: Involves creating fact tables (quantitative data) and dimension tables
(contextual data), ideally using a star schema.
9. Relationships: Define connections between tables, with preferred one-to-many
relationships.
10. Hierarchy: Organises data in levels, facilitating detailed exploration and simplified
reporting.
DAX:
1. Data analysis expressions
2. It's a library of functions and operators and a list of functions , like in excel we
have vlookup, sum , etc. More powerful than excel in power Bi
3. To build formulas
4. Create calculated measures columns and tables
5. Creates calculated columns,measures and calculations for same data in diff
context
6. Measures of 3 types:- implicit, explicit and quick measure
New table : create a date table
Quick Measure : basic column , when yk what to do but not
how to do its like a guide for beginners
Explicit : created by default , eg: sales
Implicit : it is to be created , eg: calc near profit
Calculated columns vs measures:-
calculated column:-
1) makes values for each row in tables
2) Store values in the .pbix files
3) Consumes space and is visible in table view
4) Increases model size
5) Row by row calculation can impact performance
6) Must be referenced with measures for reuse
Measures:-
1) calculates on demand
2) its not visible in report view it only comes into action when u put it in visual
3) Calculated based on filters
4) Does Not affect model size
5) DAX expression may still be suboptimal
6) Can reference other measures directly for reuse
Advance dax concepts:-
Filter:- putting a condition/ criteria , grouping, measure is split
Measures are contextually different or dynamic depending on filter
Max 255 filters can be added (recommended to use less than 10 like 3 per measure)
calculate() function:- u will define what to calculate
Adjusts how measure interprets data fillers enabling context control , filter in this will
override visual filter
Userelationship() function :- to make measures with inactive relationships
Semi additive measures:- context dependent as in result vary based on dimensions
Visuals
Visuals:
1. Type- Different combinations of visuals
2. Number- Amount of visuals on the desktop file, limit to 3-4 visuals for one
report. The lesser the better
3. Position- Placement of the visuals. Based on the focus of the audience on
different parts of the screen
4. Size- The adjustments of size in the visual based on the importance
5. Interaction- How audience interacts with the data
6. Hierarchies- Levels of data to be shown
Effective Visualisation:
The visualisation of your data depends on the personality and perception of the audience
that is viewing it rather than the creator.
Types of Visuals:
● Table vs matrix visual- Table shows logical rows and columns for each data
value, it shows repetitive aggregate data as well whereas, a matrix shows it in the
form of a hierarchy avoiding redundancy.
● Bar Charts- Stacked or clustered bar diagrams. Stacked- Sub divided bar diagram
and clustered- double column bar diagram.
● Pie and Donut Chart- ykwim. Ignore pie charts as much as possible, it is pretty
(useless) because it only recognises percentages. Donut charts are the same as pie
charts except it's empty in the centre.
● Treemaps- Displayed as nested rectangles summing to a total. They fall in the
same category as pie and donut diagrams.
● Combo Charts- Combo charts show the correlation between two data sets
● Card- Displays a single value in a card.
● Funnel Chart- Displays a linear process where the data starts with leads and is
filtered downwards
● Waterfall- Shows data in waterfalls matalb upar neeche hota hai. Helps tracking
growth, profits, sales, etc. using a concept similar to candlesticks in shares.
Importing vitals from AppSource
● Some are certified visuals while some may be uncertified. Ensure you only import
certified visuals as uncertified visuals may lead to your computer getting hacked.
Certified visuals have a blue tick
● DIY Visuals- create your own visuals (chatgpt prompt can create visuals if you
enter the correct prompt in java script)
Format and configure visualisations
cross highlighting:- when u select a bar from one graph and the data related to it is
highlighted in other graphs and the non corresponding data is dimmed
cross filtering:- when u click on 1 column the data which is not related will get hidden
(slicer does this usually)
what to keep in mind while formatting
● what info do they require
● what are their preference in colour schemes
● the person's character/personality
Other Tools:
● Text Box
● Button
● Book Mark
● Blank
● Shapes
● Images
Bookmark
preserves the slicer settings of that page to revisit later
it captures
● current page
● Filter (already set with bookmarks)
● Sort order from A-Z
● Change the slicer
● Drill Location
Layer Order and Tab Order
Selecting the order in which layers will apply
Selecting the order in which elements will be highlighted when we click on tab
Drill Through
Leverage of AI
BINNING OF DATA : grouping data into categories or fields
Outliers :
1. Points away from the bunch.
2. To identify the outliers for scatter chart is called clustering,
3. making groupings , similar data points in groups
4. U dont need to click on on bar charts
5. It automatically group them in 3 parts , low , medium and high
KPI :
1. Key performance indicator
2. Its showcases the goal that is been set
3. You will find on visualisation pane
4. When you hit on kpi visual
5. Trend axis is behind the number
6. At the last there is target
CUSTOM VISUALS: Box , whiskers chart meow
Deploy and Manage