This is the fifth part of a six-part series of Pandas tutorials.
|1.Creating Pandas data structures|
|3.Indexing and Selecting data|
|4.Merge and Concat|
|6.Grouping and Summarizing|
While doing exploratory data analysis (EDA), we often work with derived metrics. For example, if we have profit and price, we may want to see the profit percentage. If we have cricket matches won and lost, we might want to look at the total matches played. These kinds of metrics require arithmetic operations on columns.
Here, I created two simple datasets with electronic product sales; let us see how we can derive metrics from these dataframes.
The data is the total sale of 5 important electronic categories in an outlet for the first 2 weeks of a month.
Note: Observe that we have set 2 columns as the index (the row labels) of each dataframe.
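For readers following along without the original data, here is a minimal sketch of how such a dataframe could be built. The numbers are made up, and the index column names (‘Category’ and ‘Brand’) are assumptions for illustration; only the three value column names come from this tutorial.

```python
import pandas as pd

# Made-up week 1 figures; "Category" and "Brand" are assumed index columns.
week1 = pd.DataFrame({
    "Category": ["Mobile", "Laptop", "Television", "Camera", "Tablet"],
    "Brand": ["A", "B", "C", "D", "E"],
    "Total Sale in Lakhs": [50.0, 40.0, 30.0, 10.0, 20.0],
    "Profit in lakhs": [10.0, 8.0, 3.0, 1.0, 4.0],
    "Quantity": [100, 40, 20, 10, 50],
})

# Set 2 columns as the row labels, which gives a two-level index.
df1 = week1.set_index(["Category", "Brand"])
print(df1)
```

A second dataframe for week 2 can be built the same way, with the same index values so that the rows line up when the two are combined.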
Suppose we want to calculate the total sales in 2 weeks. We can simply add the 2 dataframes using ‘df1.add(df2, fill_value = 0)’.
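This operation works like the element-wise addition of 2 NumPy arrays, except that Pandas first aligns the two dataframes on their row and column labels. The ‘fill_value’ argument handles ‘NaN’ values: where a value is missing in one of the dataframes, it is replaced with 0 before adding (if it is missing in both, the result stays ‘NaN’). In our case, there are no such values.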
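To see what ‘fill_value’ changes, here is a small sketch with made-up numbers in which one week 2 value is deliberately missing. The single-level index is a simplification for brevity.

```python
import numpy as np
import pandas as pd

# Made-up numbers; the week 2 laptop sale is deliberately missing.
df1 = pd.DataFrame(
    {"Total Sale in Lakhs": [50.0, 40.0], "Quantity": [100.0, 40.0]},
    index=["Mobile", "Laptop"],
)
df2 = pd.DataFrame(
    {"Total Sale in Lakhs": [60.0, np.nan], "Quantity": [120.0, 30.0]},
    index=["Mobile", "Laptop"],
)

plain = df1 + df2                      # the missing value propagates as NaN
filled = df1.add(df2, fill_value=0)    # the missing value is treated as 0

print(plain)
print(filled)
```

With the plain ‘+’ operator the laptop sale comes out as ‘NaN’, while ‘add()’ with ‘fill_value = 0’ keeps the week 1 figure.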
div() operator: Suppose we want to calculate the profit per quantity.
Here, there are 2 new things to learn. First, we did a column-wise division: we picked individual columns from a dataframe and divided one by the other. Second, we added a new column (‘Profit per quantity’) to the existing dataframe while performing the operation.
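A minimal sketch of this step, using made-up numbers and a simplified single-level index, could look like this:

```python
import pandas as pd

# Made-up figures for two categories.
df1 = pd.DataFrame(
    {"Profit in lakhs": [10.0, 8.0], "Quantity": [100, 40]},
    index=["Mobile", "Laptop"],
)

# Divide one column by another and store the result as a new column.
df1["Profit per quantity"] = df1["Profit in lakhs"].div(df1["Quantity"])
print(df1)
```

Assigning to a column name that does not exist yet is what creates the new column.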
We can also perform such operations on columns from different dataframes. Suppose we want to see each week's profit as a percentage of the total profit:
In this example, we used data from 3 different dataframes to compute profit percentages. Try to add this data as additional columns to ‘df_sales’ without creating a new dataframe.
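One way this computation might be written is sketched below. It assumes that ‘df_sales’ is the sum of the two weekly dataframes; the numbers and the result column names are made up for illustration.

```python
import pandas as pd

categories = ["Mobile", "Laptop"]
df1 = pd.DataFrame({"Profit in lakhs": [10.0, 8.0]}, index=categories)  # week 1
df2 = pd.DataFrame({"Profit in lakhs": [30.0, 8.0]}, index=categories)  # week 2

# Assumption: df_sales holds the two-week totals.
df_sales = df1.add(df2, fill_value=0)

# Each week's profit as a percentage of the two-week profit.
total_profit = df_sales["Profit in lakhs"]
df_pct = pd.DataFrame({
    "Week 1 profit %": df1["Profit in lakhs"].div(total_profit).mul(100),
    "Week 2 profit %": df2["Profit in lakhs"].div(total_profit).mul(100),
})
print(df_pct)
```

Because all three dataframes share the same index, Pandas lines the rows up by label before dividing.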
Likewise, try to calculate the following:
- Difference in sale from week 1 to 2 in each category. Use ‘Total Sale in Lakhs’ column.
- What is the cost price per quantity? Use ‘Total Sale in Lakhs’, ‘Profit in lakhs’ and ‘Quantity’ columns.
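If you get stuck, here is one possible approach, shown with made-up numbers. It assumes that the cost price is the total sale minus the profit, and it uses a simplified single-level index.

```python
import pandas as pd

categories = ["Mobile", "Laptop"]
# Made-up figures for week 1 (df1) and week 2 (df2).
df1 = pd.DataFrame(
    {"Total Sale in Lakhs": [50.0, 40.0], "Profit in lakhs": [10.0, 8.0], "Quantity": [100, 40]},
    index=categories,
)
df2 = pd.DataFrame(
    {"Total Sale in Lakhs": [60.0, 35.0], "Profit in lakhs": [12.0, 7.0], "Quantity": [120, 35]},
    index=categories,
)

# Change in sale from week 1 to week 2 for each category.
sale_change = df2["Total Sale in Lakhs"].sub(df1["Total Sale in Lakhs"])

# Assumption: cost price = total sale - profit; then divide by the quantity sold.
cost_per_quantity = (
    df1["Total Sale in Lakhs"].sub(df1["Profit in lakhs"]).div(df1["Quantity"])
)
print(sale_change)
print(cost_per_quantity)
```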
While working on machine learning algorithms, these kinds of measurable and meaningful properties are derived as ‘features’ to make the algorithms work efficiently.
Apart from add() and div(), there are other operator-equivalent mathematical functions that you can use on dataframes. Below are some more of these functions, each with the operator it corresponds to.
- sub(): -
- mul(): *
- floordiv(): //
- mod(): %
- pow(): **
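As a quick check, the following sketch compares each function in the list above with its operator on two small made-up dataframes (the names ‘a’ and ‘b’ are placeholders):

```python
import pandas as pd

a = pd.DataFrame({"x": [7, 9], "y": [10, 4]})
b = pd.DataFrame({"x": [2, 4], "y": [3, 2]})

# Each method should give the same result as its operator.
checks = {
    "sub": a.sub(b).equals(a - b),
    "mul": a.mul(b).equals(a * b),
    "floordiv": a.floordiv(b).equals(a // b),
    "mod": a.mod(b).equals(a % b),
    "pow": a.pow(b).equals(a ** b),
}
print(checks)
```

The method form is useful when you need extra arguments such as ‘fill_value’, which the plain operators cannot take.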
In summary, we have seen how arithmetic operations are performed on dataframes and their columns. Next, let us go a bit deeper and see how to do categorical analysis using group by, and summarization using simple built-in functions.
Next! Sixth tutorial: Grouping and Summarizing (Part 6)