In an ideal world, we would always work with perfect data sets. In practice, this is never the case. When working with quantitative data, you will often need to drop or modify missing values. We will explore strategies for handling this throughout this lesson.
The DataFrame We'll Be Using In This Lesson
We will be using the np.nan attribute to generate NaN values throughout this lesson.
```python
import numpy as np

np.nan  # Returns nan
```
In this lesson, we will make use of the following DataFrame:
```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.array([[1, 5, 1], [2, np.nan, 2], [np.nan, np.nan, 3]]))
df.columns = ['A', 'B', 'C']
df
```
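Before dropping or filling anything, it can help to confirm where the NaN values sit. The sketch below assumes pandas and NumPy are installed, rebuilds the same df so it runs on its own, and uses the DataFrame.isna method to count missing values per column; the names mask and missing_per_column are illustrative only.

```python
import numpy as np
import pandas as pd

# Rebuild the lesson's DataFrame so this snippet runs on its own.
df = pd.DataFrame(np.array([[1, 5, 1], [2, np.nan, 2], [np.nan, np.nan, 3]]))
df.columns = ['A', 'B', 'C']

# isna() returns a Boolean DataFrame that is True wherever a value is NaN.
mask = df.isna()

# Summing the Booleans column by column counts the missing values in each column:
# A has 1, B has 2, and C has none.
missing_per_column = mask.sum()
print(missing_per_column)
```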
Pandas has a built-in method called dropna. When applied to a DataFrame, the dropna method removes any rows that contain a NaN value.
Let's apply the dropna method to our df DataFrame as an example:
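A minimal sketch of that call, assuming the df built above (recreated here so the snippet is self-contained). By default dropna uses how='any', so a single NaN is enough to remove a row.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.array([[1, 5, 1], [2, np.nan, 2], [np.nan, np.nan, 3]]))
df.columns = ['A', 'B', 'C']

# Remove every row that contains at least one NaN (how='any' is the default).
no_missing_rows = df.dropna()

# Rows 1 and 2 each contain a NaN, so only row 0 (1.0, 5.0, 1.0) remains.
print(no_missing_rows)
```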
Note that, like the other DataFrame operations we have explored, dropna does not modify the original DataFrame unless you either (1) reassign the result using the = assignment operator or (2) specify
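The sketch below contrasts two ways of keeping the result, assuming the same df. The second approach uses pandas' inplace=True keyword on dropna, which modifies the DataFrame directly and returns None. The copies df_reassigned and df_inplace are illustrative names, used so that the original df stays intact for comparison.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.array([[1, 5, 1], [2, np.nan, 2], [np.nan, np.nan, 3]]))
df.columns = ['A', 'B', 'C']

# On its own, dropna returns a new DataFrame and leaves df untouched.
result = df.dropna()
print(df.shape, result.shape)  # (3, 3) (1, 3)

# Approach 1: reassign the returned DataFrame with the = operator.
df_reassigned = df.copy()
df_reassigned = df_reassigned.dropna()

# Approach 2: pass inplace=True; pandas mutates the object and returns None.
df_inplace = df.copy()
returned = df_inplace.dropna(inplace=True)

# Both approaches end up with the same single remaining row.
print(df_reassigned.equals(df_inplace), returned)  # True None
```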
We can also drop any columns that have missing values by passing the axis=1 argument to the dropna method, like this:
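A minimal sketch of the column-wise version, again assuming the lesson's df. With axis=1, dropna checks each column instead of each row.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.array([[1, 5, 1], [2, np.nan, 2], [np.nan, np.nan, 3]]))
df.columns = ['A', 'B', 'C']

# axis=1 tells dropna to remove columns (rather than rows) that contain a NaN.
no_missing_cols = df.dropna(axis=1)

# Columns A and B each contain at least one NaN, so only column C remains.
print(list(no_missing_cols.columns))  # ['C']
```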
In many cases, you will want to replace missing values in a pandas DataFrame instead of dropping them completely. The fillna method is designed for this.
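As a sketch of the basic call, the example below fills every NaN in the lesson's df with a single scalar. The value 0 is an arbitrary placeholder chosen for illustration, not a recommendation. fillna also accepts a dict that maps column names to per-column fill values; the -1 used for column B is equally arbitrary.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.array([[1, 5, 1], [2, np.nan, 2], [np.nan, np.nan, 3]]))
df.columns = ['A', 'B', 'C']

# Replace every NaN with the same scalar; df itself is left unchanged.
filled_zero = df.fillna(0)

# Alternatively, pass a dict to choose a different fill value for each column.
filled_per_column = df.fillna({'A': 0, 'B': -1})

print(filled_zero)
print(filled_per_column)
```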
As an example, let's fill every missing value in our DataFrame with the
In practice, there is almost no situation where we would want to replace missing data with an emoji; this was simply an amusing example.
Instead, we will more commonly replace a missing value with either:
- The average value of the entire DataFrame
- The average value of that column of the DataFrame
We will demonstrate both below.
To fill missing values with the average value across the entire DataFrame, use the following code:
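One way to do this, sketched under the assumption that "average value of the entire DataFrame" means the mean of every non-missing value: divide the sum of all values by the count of all non-missing values, then pass that single number to fillna. The commonly seen df.fillna(df.mean()) behaves differently, because df.mean() returns one mean per column, so each column is filled with its own average. Both are shown for comparison.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.array([[1, 5, 1], [2, np.nan, 2], [np.nan, np.nan, 3]]))
df.columns = ['A', 'B', 'C']

# sum() and count() skip NaN by default, so this is the mean of the six
# non-missing values: (1 + 5 + 1 + 2 + 2 + 3) / 6 = 14 / 6.
overall_mean = df.sum().sum() / df.count().sum()

# Fill every NaN with that single DataFrame-wide average.
filled_overall = df.fillna(overall_mean)

# For contrast: df.mean() is a Series of column means (A: 1.5, B: 5.0, C: 2.0),
# so this fills each column with its own mean instead.
filled_by_column_mean = df.fillna(df.mean())

print(overall_mean)
print(filled_overall)
print(filled_by_column_mean)
```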
To fill the missing values within a particular column with the average value from that column, use the following code (this is for column
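A minimal sketch for a single column, using column A of the lesson's df purely as an illustration. Reassigning the filled column back into df is used here instead of calling fillna with inplace=True on df['A'], a chained pattern that recent pandas versions warn about.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.array([[1, 5, 1], [2, np.nan, 2], [np.nan, np.nan, 3]]))
df.columns = ['A', 'B', 'C']

# Column A's mean ignores the NaN: (1 + 2) / 2 = 1.5.
col_a_mean = df['A'].mean()

# Fill only column A and write the result back; other columns are untouched.
df['A'] = df['A'].fillna(col_a_mean)

print(df)
```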
In this lesson, we explored the dropna and fillna methods for dealing with missing data in pandas. After working through some practice problems, we will next discuss how to group a DataFrame's elements according to a certain characteristic.