Tell story of your data using Power BI

Dayana Benny
Analytics Vidhya
Published in
9 min readApr 18, 2021

--

Learn how to develop your skills to build simple data models and reports within Power BI desktop: Beginner’s Guide — Part 1

In this article, dataset containing the percentage of children who received the 2nd dose of measles-containing vaccine (UNICEF Data Warehouse data) combined with countries dataset obtained from a EdX course for a simple Data Analysis…

Image credits : Comfreak, https://pixabay.com/

In many Data Analytics jobs, we can see some sort of requirement as mentioned below:

“Knowledge of statistical software and data visualization software (Power BI) is desirable.”

So, let’s get to the point…

What is Power BI? Is it just a piece of software? No, It is a big collection of services to model, analyze, and visualize the data you have. Moreover, BI is an acronym for “Business Intelligence” which includes the methods of collection, storage, processes, analysis. Any domain which need to perform decision making can utilize the Power BI’s “power” to transform the raw data into knowledge and then to inform and support decisions.

What kind of data modeling is done there? One of the main task we need take care is that our data has to be consistent and accurate as much as possible to avoid getting wrong results when we analyze the data later. For that purpose, we may need to remove some non-informative data or change the structure of the data.

Power BI is employed by fetching data from a data source and then load the data into a data model. Then that model make you able to tell the story of your data, i.e. build data visualizations. Those reports can be saved as PBIX files.

Power BI Desktop is a free application from Microsoft and you can download it here.

“Go from data to insight to action with Power BI Desktop. Create rich, interactive reports with visual analytics at your fingertips for free ” — Microsoft

What about importing and shaping data? Is there any tool for the same? Yes, Power Query enables the connection to one or more data sources and do all of the data preparation steps for building the data model.

Let’s see how it works…

Power Query import the data, shape it utilizing your data preparation stages and load into the final data model. This data model is kept in Power BI inside x-velocity in-memory analysis engine. It is a database which acts as a tool inside Power BI. The x-velocity in-memory analysis engine , not only stores you data model, but also aids in great data compression. Thus it is very useful for dealing with big data models as well as guaranteeing good performance.

After we built our data model, to build the visualizations of this securely stored compressed data, Report Editor in Power BI Desktop is utilized.

Why data modeling is important?

credits: z0man, https://pixabay.com/

As I said before, “don't underestimate the power of Power BI”… Data visualization capabilities (like the attractive gems and icing on the cake) comprise of only 25 percentage of the Power BI’s “power”..! The below part is the real cake where we can see the combination of raw ingredients, the connection between the layers and the structure of the cake. Only the data analyst is seeing this underneath part (the structure of the cake) and it is the data model.

The three-step process of data modeling in Power BI:

Data modeling, image created by me

Step 1: Power BI has many connectors (like Power Query) that allow you to connect to the data sources

Step 2: Transformation of the data is possible by data cleaning, joining any data sources together, aggregating and restructuring the data.

Step 3: Load data into Power BI model and after that we are creating our visualizations.

So let’s build a basic report now…

The dataset I used here for the demonstration is the data I extracted from UNICEF Data Warehouse for the indicator: “Percentage of children who received the 2nd dose of measles-containing vaccine, as per administered in the national schedule”.

Preview of dataset extracted from UNICEF Data Warehouse

Get the data from source and you can take a sneak peek at your data. You can do loading or transforming or cancel data at this point. If you wish to do no transformations, you can do load data right away. But most of the cases, we need to do some data cleaning and other preprocessing for our data… So you need to choose Transform option while getting data.

Once transform option is clicked, Power Query will be opened. Right now, we just executed the first query that connect to the data source. You can see the tools for transforming the data in Power Query. You have the options to manipulate datasets using Python or R. Once you make a transformation to your dataset, you can also view how the dataset looked prior to that transformation.

Choose columns to keep in the dataset

You can remove unwanted columns by using the menu in the top left corner (highlighted in yellow). You can unselect all columns if you need to minimize the size of the data model and can choose which all columns to keep in the dataset. In order to improve the performance of the report, you should shrink your data as much as possible. In right side of the window, you can see the Applied Steps and if you need to undone, it is possible. Suppose, you think you need to add more columns later, you can also do it by clicking on the gear icon near the applied step.

Another way to edit the dataset is using Formula Bar (view tab > check formula bar). You can view the code for each Applied Step in formula bar (in M language).

I am also analyzing another dataset of countries that I obtained from EdX platform, as part of a very interesting course on Analyzing and Visualizing Data with Power BI. If you are a beginner in Power BI, this course is very useful to kick start your journey to become a Data Analyst.

DavidsonX’s Analyzing and Visualizing Data with Power BI ‘s first week of course in EdX explains in detail how to obtain a Report similar to one that I created as given in the figure below.

Report created in Power BI Desktop

Data from different sources can be combined using “Manage Relationships” option in Power BI. More details are available here.

Add new columns after performing Divide operation on two columns

We can rename the column names in Power Query just clicking on the column name and replace the text there. To perform adding of new column with some derived values there is an option in Power Query. To divide two columns, here, the order of selection of columns is important. So first Population column is selected and then Area_SqKm. After that, Divide option is selected. Then new column is created with the values obtained after performing the division operation (This way (Implicit Measure)is not recommended and it is mentioned here for just familiarizing with the capabilities of Power Query).

Instead of calculating this kind of results here, we can use Measures (dynamically calculated fields in the report) option or Calculated columns in POWER BI , which are the preferred and recommended ways of doing for not to add more data into the model and to keep the dataset size as lean as possible.

After performing the transformations, you can click on Close and Apply option in Home of Power Query. Then we will be in Report Editor of Power BI Desktop. When analyzing complex models later, if you want to go back to Power Query, to transform data, it is possible by clicking on Transform Data tab in Ribbon of the Power BI.

Transform : changing data type in Power Query

Modeling tab extends the option for adding new Measure and Parameters into the dataset. The theme of the report can be tweaked using the options in the View tab.

The white space on our report page where we visualize our data is the Report Canvas. Existing visualizations can be modified using Visualizations pane. All tables, columns and measures can be seen in Fields pane. Measures are created using DAX (Data Analysis Expressions) language. These flexible code aids not only in data analysis, but also in fast calculations.

Drag and drop the fields from Fields Pane into the center of the Report Canvas or you can just select the fields to be in report by clicking on them. You can determine an axis and make others as legend. If more subjects are there, you can use stacked bar chart by changing the chart type from choosing that from Visualizations pane. The size of visible area of chart can be increased by stretching the corners of the visual section.

In Power BI, by default many visualizations can be interactive without taking any extra efforts for it. If you already kept a chart selected and try to add a new one, it will alter the already selected chart into the new chart you have clicked. To prevent this, it is essential to UN-SELECT the chart you already selected prior to adding a new chart to the report canvas.

I renamed Population column to Pop and to add new measure, you can select corresponding table in Field pane and click on New measure option from Measure Tools. You can write DAX formula in formula bar now:

Population = SUM(countries[Pop])

where countries is the table name. Similarly,

Area = SUM(countries[Area_SqKm])

Then by clicking on Quick Measures option (A wizard that generates a measure by writing DAX instead of manual writing of code in formula bar) from Measure Tools, Population density can be calculated as given in the below figure.

Quick Measure — Power BI

Suppose divide by zero scenario occurs, it can be tweaked by using optional arguments (put 0 in this case) in formula bar. So Explicit measure that is created by writing a DAX formula in formula bar is always the preferred one.

If you want to filter out the outliers, Filtering option in Filters pane can be used and it is of two types : Page level filters (control everything on the page) and Report level filters (control entire report) . Another kind of filter is Visual level filter (only control one visual). The option Top N filter is used to filter a visual to the top or bottom-ranked values depending on a particular measure (example: 50 countries that are least densely populated).

Another powerful tool in Power BI is a Slicer which puts a filter directly on the report page and it is not utilizing the Filters pane where you are allowed to configure them so that they only have an effect on some visuals on a page.

Some part of the story that I need to tell you now:

When I explored the data, I just found that only 4 % of children in Mali (African Continent) got 2nd dose of measles-containing vaccine in 2019.

2019 Data : Regions were only less than 50 % of children who vaccinated with the 2nd dose of measles-containing vaccine

In this article, authors demonstrate that there is a significant relationship between Measles-Mumps-Rubella (MMR) Titers and COVID-19 severity (Inverse correlation). So as a next step, I am planning to analyze such relationships in future by considering these regions in the data analysis. So stay tuned… Bye for now :)

--

--