How To Filter A Dataframe In Python

DataFrame is a collection of heterogeneous or different types of data element which is represented in tabular form that is captioned with rows and columns. Dataframe is mutable or changeable in size and it can be created for any list or dictionary or array or file data via importing pandas library. Following syntax is used to create a dataframe for a dictionary.

DataFrame Creation Code:

# Python program for the dataframe creation
import pandas as pd
StudentData={ 'Name':['Akash','Akshay','Bhawna', 'Chetan','Deepak'], 
             'Branch':['IT','IT','CSE','CSE','Mechanical'],
             'Gender':['Male','Male','Female','Male','Male'],
             'Age':[24,32,20,38,27]}
dataframe=pd.DataFrame(StudentData)
print(dataframe)

Output:

    Name     Branch   Gender   Age
0  Akash         IT        Male      24
1  Akshay         IT       Male       32
2  Bhawna        CSE    Female    20
3  Chetan        CSE     Male        38
4  Deepak  Mechanical   Male      27

 Code Explanation:

  • Single line comment tells the goal of the program to programmers.
  • Imports the pandas library.
  • A dictionary named StudentData is created which contains the name, branch, gender and age of students.
  • Therefore, a dataframe is created for the dictionary using dataframe=pd.DataFrame(StudentData).
  • Data of dataframe is displayed.

Approaches: How to filter Dataframe

We can filter DataFrame based on column or row name as well as values, using several approaches that are listed below.

  • Approach-1: Using Filter Function
  • Approach-2: Using Query Function 
  • Approach-3: Using loc and iloc Function 
  • Approach-4: Using DataFrame method

Approach-1: Using Filter Function

DataFrame data can be filtered using the Filter function of DataFrame class. With the help of Filter function data may be processed or filtered in many ways.

Selecting column data by name: Following code demonstrates filtering of dataframe column by its name.

# Python program for filtering dataframe using Filter Function
import pandas as pd
StudentData={ 'Name':['Akash','Akshay','Bhawna', 'Chetan','Deepak'], 
             'Branch':['IT','IT','CSE','CSE','Mechanical'],
             'Gender':['Male','Male','Female','Male','Male'],
             'Age':[24,32,20,38,27]}
dataframe=pd.DataFrame(StudentData)
print(dataframe.filter(items=['Name', 'Age']))

Output:

Name    Age
0   Akash   24
1  Akshay   32
2  Bhawna   20
3  Chetan   38
4  Deepak   27

 Code Explanation:

  • Single line comment tells the goal of the program to programmers.
  • Imports the pandas library.
  • A dictionary named StudentData is created which contains the name, branch, gender and age of students.
  • Therefore, a dataframe is created for the dictionary using dataframe=pd.DataFrame(StudentData).
  • Data of dataframe is displayed using item option of filter function.

Selecting column data by regular expression: Following code demonstrates filtering of dataframe column by regular expression.

# Python program for filtering dataframe using Filter Function
import pandas as pd
StudentData={ 'Name':['Akash','Akshay','Bhawna', 'Chetan','Deepak'], 
             'Branch':['IT','IT','CSE','CSE','Mechanical'],
             'Gender':['Male','Male','Female','Male','Male'],
             'Age':[24,32,20,38,27]}
dataframe=pd.DataFrame(StudentData)
print(dataframe.filter(like='Na', axis=1))

Output:

Branch
0         IT
1         IT
2        CSE
3        CSE
4  Mechanical

 Code Explanation:

  • Single line comment tells the goal of the program to programmers.
  • Imports the pandas library.
  • A dictionary named StudentData is created which contains the name, branch, gender and age of students.
  • Therefore, a dataframe is created for the dictionary using dataframe=pd.DataFrame(StudentData).
  • Data of dataframe is displayed using regular expression option of filter function.

Selecting row data by string: Following code demonstrates filtering of dataframe rows by regular expression.

# Python program for filtering dataframe using Filter Function
import pandas as pd
StudentData={ 'Name':['Akash','Akshay','Bhawna', 'Chetan','Deepak'], 
             'Branch':['IT','IT','CSE','CSE','Mechanical'],
             'Gender':['Male','Male','Female','Male','Male'],
             'Age':[24,32,20,38,27]}
dataframe=pd.DataFrame(StudentData)
print(dataframe.filter(like='Na', axis=1))

Output:

Name
0   Akash
1  Akshay
2  Bhawna
3  Chetan
4  Deepak

 Code Explanation:

  • Single line comment tells the goal of the program to programmers.
  • Imports the pandas library.
  • A dictionary named StudentData is created which contains the name, branch, gender and age of students.
  • Therefore, a dataframe is created for the dictionary using dataframe=pd.DataFrame(StudentData).
  • Data of dataframe is displayed using like option of filter function.

Approach-2: Using Query Function

With the help of the Query function, we can filter the dataframe on the basis of many constraints such as specific column value.  Python code to demonstrate dataframe filtering using the query function is shown below.

# Python program for filtering dataframe using Filter Function
import pandas as pd
StudentData={ 'Name':['Akash','Akshay','Bhawna', 'Chetan','Deepak'], 
             'Branch':['IT','IT','CSE','CSE','Mechanical'],
             'Gender':['Male','Male','Female','Male','Male'],
             'Age':[24,32,20,38,27]}
dataframe=pd.DataFrame(StudentData)
print( dataframe.query('Name == "Akshay" & Age ==32 '))

Output:

Name Branch Gender  Age
1  Akshay     IT   Male   32

 Code Explanation:

  • Single line comment tells the goal of the program to programmers.
  • Imports the pandas library.
  • A dictionary named StudentData is created which contains the name, branch, gender and age of students.
  • Therefore, a dataframe is created for the dictionary using dataframe=pd.DataFrame(StudentData).
  • Data of dataframe is displayed using using query function. Query function process or filter data on the basis of student name and age i.e. Akshay and 32.

Approach-3: Using loc and iloc Function

With the help of loc and iloc functions, we can filter dataframe data. Loc function filters dataframe rows on the basis of rows id or name. iloc function filter dataframe rows on the basis of row index or range of row index:

Python code for loc: To demonstrate loc function following python code is written.

# Python program for filtering dataframe using Filter Function
import pandas as pd
StudentData={ 'Name':['Akash','Akshay','Bhawna', 'Chetan','Deepak'], 
             'Branch':['IT','IT','CSE','CSE','Mechanical'],
             'Gender':['Male','Male','Female','Male','Male'],
             'Age':[24,32,20,38,27]}
dataframe=pd.DataFrame(StudentData)
print( dataframe.loc[0]) # filters dataframe through row id 0

print( dataframe.loc[0:2]) # filters dataframe through range of rows with all columns

Output 1:

Name      Akash
Branch       IT
Gender     Male
Age          24
Name: 0, dtype: object

Output 2:

Name Branch  Gender  Age
0   Akash     IT    Male   24
1  Akshay     IT    Male   32
2  Bhawna    CSE  Female   20

 Code Explanation:

  • Single line comment tells the goal of the program to programmers.
  • Imports the pandas library.
  • A dictionary named StudentData is created which contains the name, branch, gender and age of students.
  • Therefore, a dataframe is created for the dictionary using dataframe=pd.DataFrame(StudentData).
  • Data of dataframe is displayed using  loc functions. Loc function process or filter data on the basis row id or name.  

Python code for iloc: To demonstrate iloc function following python code is written.

# Python program for filtering dataframe using Filter Function
import pandas as pd
StudentData={ 'Name':['Akash','Akshay','Bhawna', 'Chetan','Deepak'],
         	'Branch':['IT','IT','CSE','CSE','Mechnical'],
         	'Gender':['Male','Male','Female','Male','Male'],
         	'Age':[24,32,20,38,27]}
dataframe=pd.DataFrame(StudentData)
# It filters dataframe through row index 0 to 1 where 2 is excluded.
print( dataframe.loc[0:2])
# It filters the dataframe through a range of rows.
print( dataframe.loc[:4])
# It filters the dataframe through a range of rows with a range of columns.
print( dataframe.iloc[0:2,0:2])

Output 1:

Name Branch Gender  Age
0   Akash     IT   Male   24
1  Akshay     IT   Male   32

Output 2:

Name Branch  Gender  Age
0   Akash     IT    Male   24
1  Akshay     IT    Male   32
2  Bhawna    CSE  Female   20
3  Chetan    CSE    Male   38

Output 3:

Name Branch
0   Akash     IT
1  Akshay     IT

 Code Explanation:

  • Single line comment tells the goal of the program to programmers.
  • Imports the pandas library.
  • A dictionary named StudentData is created which contains the name, branch, gender and age of students.
  • Therefore, a dataframe is created for the dictionary using dataframe=pd.DataFrame(StudentData).
  • Data of dataframe is displayed using  iloc functions. iloc function process or filter data on the basis row index 

Approach-4: Using DataFrame method

With the help of the DataFrame method, we can filter data via specifying column names. The python code to demonstrate the dataframe method is given below.

# Python program for filtering dataframe using Filter Function
import pandas as pd
StudentData={ 'Name':['Akash','Akshay','Bhawna', 'Chetan','Deepak'], 
             'Branch':['IT','IT','CSE','CSE','Mechanical'],
             'Gender':['Male','Male','Female','Male','Male'],
             'Age':[24,32,20,38,27]}
dataframe=pd.DataFrame(StudentData)
# It filter dataframe through according to column name.
print( dataframe[['Name']]) 
# It filter dataframe through according to column name i.e. Name and Age where age > 25.
print( dataframe[['Name', 'Age']][dataframe['Age']>25]) 
# It filters the dataframe through column value and display average of students.
print( 'Average Age of Student is: ',dataframe['Age'].sum()/dataframe['Age'].count())

Output 1:

Name
0   Akash
1  Akshay
2  Bhawna
3  Chetan
4  Deepak

Output 2:

Name  Age
1  Akshay   32
3  Chetan   38
4  Deepak   27

Output 3:

Average Age of Student is:  28.2

Code Explanation:

  • Single line comment tells the goal of the program to programmers.
  • Imports the pandas library.
  • A dictionary named StudentData is created which contains the name, branch, gender and age of students.
  • Therefore, a dataframe is created for the dictionary using dataframe=pd.DataFrame(StudentData).
  • The first print statement  filter dataframe through according to column name i.e. Name. It displays the names of all students.
  • The second print statement  filter dataframe through according to column name i.e. Name and Age. It displays the name and age of students whose age is greater than 25.
  • The last print statement filters the dataframe through column value and displays the average of students.

Best Approach- Using DataFrame method

In the context of numerical computations on data, the DataFrame method is convenient and effective. It filters data via specifying column name. It also enables various numerical functions such as sum(), count(), len() etc.

Sample problem related to filtering dataframe in Python

Sample Problem-1: Using Approach-1

Problem Definition: Create a python code that demonstrates creation of a dictionary which contains data of Product as product name, company name, quantity and price. Load dictionary data into dataframe using pandas library and answer the following question.

Question- To  display the quantity and price of each product.

# Python program for the dataframe filtering using approach-1
import pandas as pd
Product={ 'ProductName':['Shirt','Jeans','Cap', 'SunGlass','Shoes','Perfume'], 
             'CompanyName':['PeterEngland','Levis','Lee Cooper','RayBan','Bata','WildStone'],
             'Quantity':[20,10,3,4,6,2],
             'Price':[1200,3000,200,3500,4500,500]}
dataframe=pd.DataFrame(Product)
print(dataframe.filter(items=['Quantity', 'Price']))
Quantity  Price
0        20   1200
1        10   3000
2         3    200
3         4   3500
4         6   4500
5         2    500

Code Explanation:

  • Single line comment tells the goal of the program to programmers.
  • Imports the pandas library.
  • A dictionary named Product is created which contains the product name, company name, quantity and price of products.
  • Therefore, a dataframe is created for the dictionary using dataframe=pd.DataFrame(Product).
  • Data of dataframe is displayed using  the item option of filter function.

Sample Problem-2:Using Approach-2

Consider sample problem-1 and answer the following question.

Question: Find the details of products whose quantity is greater than 5 and price is greater than 1500.

# Python program for the dataframe filtering using approach-1
import pandas as pd
Product={ 'ProductName':['Shirt','Jeans','Cap', 'SunGlass','Shoes','Perfume'], 
             'CompanyName':['PeterEngland','Levis','Lee Cooper','RayBan','Bata','WildStone'],
             'Quantity':[20,10,3,4,6,2],
             'Price':[1200,3000,200,3500,4500,500]}
dataframe=pd.DataFrame(Product)
print( dataframe.query('Quantity>5 &  Price>1500 '))

 Output:

ProductName CompanyName  Quantity  Price
1       Jeans       Levis        10   3000
4       Shoes        Bata         6   4500

 Code Explanation:

  • Single line comment tells the goal of the program to programmers.
  • Imports the pandas library.
  • A dictionary named Product is created which contains the product name, company name, quantity and price of products.
  • Therefore, a dataframe is created for the dictionary using dataframe=pd.DataFrame(Product).
  • Data of dataframe is displayed using   using the query function. Query function filter data of datagrame on the basis of quantity and price. It displays product details whose quantity is greater than 5 and price is greater than 1500.

Sample Problem Using Approach-3

Consider sample problem-1 and answer the following question.

Question: Filter products in a given range i.e start from 0 to end with 2 indexes of row of dataframe.

# Python program for the dataframe filtering using approach-1
import pandas as pd
Product={ 'ProductName':['Shirt','Jeans','Cap', 'SunGlass','Shoes','Perfume'], 
             'CompanyName':['PeterEngland','Levis','Lee Cooper','RayBan','Bata','WildStone'],
             'Quantity':[20,10,3,4,6,2],
             'Price':[1200,3000,200,3500,4500,500]}
dataframe=pd.DataFrame(Product)
print( dataframe.loc[0:2]) # It filters dataframe through range of rows i.e. row id 0 to 2 (inclusive)with all columns

Output:

ProductName   CompanyName  Quantity  Price
0       Shirt  PeterEngland        20   1200
1       Jeans         Levis        10   3000
2         Cap    Lee Cooper         3    200

 Code Explanation:

  • Single line comment tells the goal of the program to programmers.
  • Imports the pandas library.
  • A dictionary named Product is created which contains the product name, company name, quantity and price of products.
  • Therefore, a dataframe is created for the dictionary using dataframe=pd.DataFrame(Product).
  • Data of dataframe is displayed using the loc function.

Sample Problem Using Approach-4

Consider sample problem-1 and answer the following question.

Question: Find average price of all products using column name of dataframe

# Python program for the dataframe filtering using approach-4
import pandas as pd
Product={ 'ProductName':['Shirt','Jeans','Cap', 'SunGlass','Shoes','Perfume'], 
             'CompanyName':['Peterengland','Levis','Lee Cooper','RayBan','Bata','WildStone'],
             'Quantity':[20,10,3,4,6,2],
             'Price':[1200,3000,200,3500,4500,500]}
dataframe=pd.DataFrame(Product)
# It filters the dataframe through column value and display average price of all products
print( 'Average Price of All products is: ',dataframe['Price'].sum()/dataframe['Price'].count()) 

Output:

Average Price of All products is:  2150.0

Code Explanation:

  • Single line comment tells the goal of the program to programmers.
  • Imports the pandas library.
  • A dictionary named Product is created which contains the product name, company name, quantity and price of products.
  • Therefore, a dataframe is created for the dictionary using dataframe=pd.DataFrame(Product).
  • Data of dataframe is displayed using dataframe column name. It displays the average price of a product.

Conclusion

Pandas dataframe class is very powerful to process or filter data of different objects such as list, directory, file, tuple etc. It offers many approaches to filter data such as using filter, query, loc, iloc and DataFrame method. Among all, filter data through the DataFrame method is a convenient and effective approach for numerical analysis. In this, all approaches of data filtering have presented with code, its explanation and output.