DataFrame is a collection of heterogeneous or different types of data element which is represented in tabular form that is captioned with rows and columns. Dataframe is mutable or changeable in size and it can be created for any list or dictionary or array or file data via importing pandas library. Following syntax is used to create a dataframe for a dictionary.
DataFrame Creation Code:
# Python program for the dataframe creation
import pandas as pd
StudentData={ 'Name':['Akash','Akshay','Bhawna', 'Chetan','Deepak'],
'Branch':['IT','IT','CSE','CSE','Mechanical'],
'Gender':['Male','Male','Female','Male','Male'],
'Age':[24,32,20,38,27]}
dataframe=pd.DataFrame(StudentData)
print(dataframe)
Output:
Name Branch Gender Age
0 Akash IT Male 24
1 Akshay IT Male 32
2 Bhawna CSE Female 20
3 Chetan CSE Male 38
4 Deepak Mechanical Male 27
Code Explanation:
- Single line comment tells the goal of the program to programmers.
- Imports the pandas library.
- A dictionary named StudentData is created which contains the name, branch, gender and age of students.
- Therefore, a dataframe is created for the dictionary using dataframe=pd.DataFrame(StudentData).
- Data of dataframe is displayed.
Approaches: How to filter Dataframe
We can filter DataFrame based on column or row name as well as values, using several approaches that are listed below.
- Approach-1: Using Filter Function
- Approach-2: Using Query Function
- Approach-3: Using loc and iloc Function
- Approach-4: Using DataFrame method
Approach-1: Using Filter Function
DataFrame data can be filtered using the Filter function of DataFrame class. With the help of Filter function data may be processed or filtered in many ways.
Selecting column data by name: Following code demonstrates filtering of dataframe column by its name.
# Python program for filtering dataframe using Filter Function
import pandas as pd
StudentData={ 'Name':['Akash','Akshay','Bhawna', 'Chetan','Deepak'],
'Branch':['IT','IT','CSE','CSE','Mechanical'],
'Gender':['Male','Male','Female','Male','Male'],
'Age':[24,32,20,38,27]}
dataframe=pd.DataFrame(StudentData)
print(dataframe.filter(items=['Name', 'Age']))
Output:
Name Age
0 Akash 24
1 Akshay 32
2 Bhawna 20
3 Chetan 38
4 Deepak 27
Code Explanation:
- Single line comment tells the goal of the program to programmers.
- Imports the pandas library.
- A dictionary named StudentData is created which contains the name, branch, gender and age of students.
- Therefore, a dataframe is created for the dictionary using dataframe=pd.DataFrame(StudentData).
- Data of dataframe is displayed using item option of filter function.
Selecting column data by regular expression: Following code demonstrates filtering of dataframe column by regular expression.
# Python program for filtering dataframe using Filter Function
import pandas as pd
StudentData={ 'Name':['Akash','Akshay','Bhawna', 'Chetan','Deepak'],
'Branch':['IT','IT','CSE','CSE','Mechanical'],
'Gender':['Male','Male','Female','Male','Male'],
'Age':[24,32,20,38,27]}
dataframe=pd.DataFrame(StudentData)
print(dataframe.filter(like='Na', axis=1))
Output:
Branch
0 IT
1 IT
2 CSE
3 CSE
4 Mechanical
Code Explanation:
- Single line comment tells the goal of the program to programmers.
- Imports the pandas library.
- A dictionary named StudentData is created which contains the name, branch, gender and age of students.
- Therefore, a dataframe is created for the dictionary using dataframe=pd.DataFrame(StudentData).
- Data of dataframe is displayed using regular expression option of filter function.
Selecting row data by string: Following code demonstrates filtering of dataframe rows by regular expression.
# Python program for filtering dataframe using Filter Function
import pandas as pd
StudentData={ 'Name':['Akash','Akshay','Bhawna', 'Chetan','Deepak'],
'Branch':['IT','IT','CSE','CSE','Mechanical'],
'Gender':['Male','Male','Female','Male','Male'],
'Age':[24,32,20,38,27]}
dataframe=pd.DataFrame(StudentData)
print(dataframe.filter(like='Na', axis=1))
Output:
Name
0 Akash
1 Akshay
2 Bhawna
3 Chetan
4 Deepak
Code Explanation:
- Single line comment tells the goal of the program to programmers.
- Imports the pandas library.
- A dictionary named StudentData is created which contains the name, branch, gender and age of students.
- Therefore, a dataframe is created for the dictionary using dataframe=pd.DataFrame(StudentData).
- Data of dataframe is displayed using like option of filter function.
Approach-2: Using Query Function
With the help of the Query function, we can filter the dataframe on the basis of many constraints such as specific column value. Python code to demonstrate dataframe filtering using the query function is shown below.
# Python program for filtering dataframe using Filter Function
import pandas as pd
StudentData={ 'Name':['Akash','Akshay','Bhawna', 'Chetan','Deepak'],
'Branch':['IT','IT','CSE','CSE','Mechanical'],
'Gender':['Male','Male','Female','Male','Male'],
'Age':[24,32,20,38,27]}
dataframe=pd.DataFrame(StudentData)
print( dataframe.query('Name == "Akshay" & Age ==32 '))
Output:
Name Branch Gender Age
1 Akshay IT Male 32
Code Explanation:
- Single line comment tells the goal of the program to programmers.
- Imports the pandas library.
- A dictionary named StudentData is created which contains the name, branch, gender and age of students.
- Therefore, a dataframe is created for the dictionary using dataframe=pd.DataFrame(StudentData).
- Data of dataframe is displayed using using query function. Query function process or filter data on the basis of student name and age i.e. Akshay and 32.
Approach-3: Using loc and iloc Function
With the help of loc and iloc functions, we can filter dataframe data. Loc function filters dataframe rows on the basis of rows id or name. iloc function filter dataframe rows on the basis of row index or range of row index:
Python code for loc: To demonstrate loc function following python code is written.
# Python program for filtering dataframe using Filter Function
import pandas as pd
StudentData={ 'Name':['Akash','Akshay','Bhawna', 'Chetan','Deepak'],
'Branch':['IT','IT','CSE','CSE','Mechanical'],
'Gender':['Male','Male','Female','Male','Male'],
'Age':[24,32,20,38,27]}
dataframe=pd.DataFrame(StudentData)
print( dataframe.loc[0]) # filters dataframe through row id 0
print( dataframe.loc[0:2]) # filters dataframe through range of rows with all columns
Output 1:
Name Akash
Branch IT
Gender Male
Age 24
Name: 0, dtype: object
Output 2:
Name Branch Gender Age
0 Akash IT Male 24
1 Akshay IT Male 32
2 Bhawna CSE Female 20
Code Explanation:
- Single line comment tells the goal of the program to programmers.
- Imports the pandas library.
- A dictionary named StudentData is created which contains the name, branch, gender and age of students.
- Therefore, a dataframe is created for the dictionary using dataframe=pd.DataFrame(StudentData).
- Data of dataframe is displayed using loc functions. Loc function process or filter data on the basis row id or name.
Python code for iloc: To demonstrate iloc function following python code is written.
# Python program for filtering dataframe using Filter Function
import pandas as pd
StudentData={ 'Name':['Akash','Akshay','Bhawna', 'Chetan','Deepak'],
'Branch':['IT','IT','CSE','CSE','Mechnical'],
'Gender':['Male','Male','Female','Male','Male'],
'Age':[24,32,20,38,27]}
dataframe=pd.DataFrame(StudentData)
# It filters dataframe through row index 0 to 1 where 2 is excluded.
print( dataframe.loc[0:2])
# It filters the dataframe through a range of rows.
print( dataframe.loc[:4])
# It filters the dataframe through a range of rows with a range of columns.
print( dataframe.iloc[0:2,0:2])
Output 1:
Name Branch Gender Age
0 Akash IT Male 24
1 Akshay IT Male 32
Output 2:
Name Branch Gender Age
0 Akash IT Male 24
1 Akshay IT Male 32
2 Bhawna CSE Female 20
3 Chetan CSE Male 38
Output 3:
Name Branch
0 Akash IT
1 Akshay IT
Code Explanation:
- Single line comment tells the goal of the program to programmers.
- Imports the pandas library.
- A dictionary named StudentData is created which contains the name, branch, gender and age of students.
- Therefore, a dataframe is created for the dictionary using dataframe=pd.DataFrame(StudentData).
- Data of dataframe is displayed using iloc functions. iloc function process or filter data on the basis row index
Approach-4: Using DataFrame method
With the help of the DataFrame method, we can filter data via specifying column names. The python code to demonstrate the dataframe method is given below.
# Python program for filtering dataframe using Filter Function
import pandas as pd
StudentData={ 'Name':['Akash','Akshay','Bhawna', 'Chetan','Deepak'],
'Branch':['IT','IT','CSE','CSE','Mechanical'],
'Gender':['Male','Male','Female','Male','Male'],
'Age':[24,32,20,38,27]}
dataframe=pd.DataFrame(StudentData)
# It filter dataframe through according to column name.
print( dataframe[['Name']])
# It filter dataframe through according to column name i.e. Name and Age where age > 25.
print( dataframe[['Name', 'Age']][dataframe['Age']>25])
# It filters the dataframe through column value and display average of students.
print( 'Average Age of Student is: ',dataframe['Age'].sum()/dataframe['Age'].count())
Output 1:
Name
0 Akash
1 Akshay
2 Bhawna
3 Chetan
4 Deepak
Output 2:
Name Age
1 Akshay 32
3 Chetan 38
4 Deepak 27
Output 3:
Average Age of Student is: 28.2
Code Explanation:
- Single line comment tells the goal of the program to programmers.
- Imports the pandas library.
- A dictionary named StudentData is created which contains the name, branch, gender and age of students.
- Therefore, a dataframe is created for the dictionary using dataframe=pd.DataFrame(StudentData).
- The first print statement filter dataframe through according to column name i.e. Name. It displays the names of all students.
- The second print statement filter dataframe through according to column name i.e. Name and Age. It displays the name and age of students whose age is greater than 25.
- The last print statement filters the dataframe through column value and displays the average of students.
Best Approach- Using DataFrame method
In the context of numerical computations on data, the DataFrame method is convenient and effective. It filters data via specifying column name. It also enables various numerical functions such as sum(), count(), len() etc.
Sample problem related to filtering dataframe in Python
Sample Problem-1: Using Approach-1
Problem Definition: Create a python code that demonstrates creation of a dictionary which contains data of Product as product name, company name, quantity and price. Load dictionary data into dataframe using pandas library and answer the following question.
Question- To display the quantity and price of each product.
# Python program for the dataframe filtering using approach-1
import pandas as pd
Product={ 'ProductName':['Shirt','Jeans','Cap', 'SunGlass','Shoes','Perfume'],
'CompanyName':['PeterEngland','Levis','Lee Cooper','RayBan','Bata','WildStone'],
'Quantity':[20,10,3,4,6,2],
'Price':[1200,3000,200,3500,4500,500]}
dataframe=pd.DataFrame(Product)
print(dataframe.filter(items=['Quantity', 'Price']))
Quantity Price
0 20 1200
1 10 3000
2 3 200
3 4 3500
4 6 4500
5 2 500
Code Explanation:
- Single line comment tells the goal of the program to programmers.
- Imports the pandas library.
- A dictionary named Product is created which contains the product name, company name, quantity and price of products.
- Therefore, a dataframe is created for the dictionary using dataframe=pd.DataFrame(Product).
- Data of dataframe is displayed using the item option of filter function.
Sample Problem-2:Using Approach-2
Consider sample problem-1 and answer the following question.
Question: Find the details of products whose quantity is greater than 5 and price is greater than 1500.
# Python program for the dataframe filtering using approach-1
import pandas as pd
Product={ 'ProductName':['Shirt','Jeans','Cap', 'SunGlass','Shoes','Perfume'],
'CompanyName':['PeterEngland','Levis','Lee Cooper','RayBan','Bata','WildStone'],
'Quantity':[20,10,3,4,6,2],
'Price':[1200,3000,200,3500,4500,500]}
dataframe=pd.DataFrame(Product)
print( dataframe.query('Quantity>5 & Price>1500 '))
Output:
ProductName CompanyName Quantity Price
1 Jeans Levis 10 3000
4 Shoes Bata 6 4500
Code Explanation:
- Single line comment tells the goal of the program to programmers.
- Imports the pandas library.
- A dictionary named Product is created which contains the product name, company name, quantity and price of products.
- Therefore, a dataframe is created for the dictionary using dataframe=pd.DataFrame(Product).
- Data of dataframe is displayed using using the query function. Query function filter data of datagrame on the basis of quantity and price. It displays product details whose quantity is greater than 5 and price is greater than 1500.
Sample Problem Using Approach-3
Consider sample problem-1 and answer the following question.
Question: Filter products in a given range i.e start from 0 to end with 2 indexes of row of dataframe.
# Python program for the dataframe filtering using approach-1
import pandas as pd
Product={ 'ProductName':['Shirt','Jeans','Cap', 'SunGlass','Shoes','Perfume'],
'CompanyName':['PeterEngland','Levis','Lee Cooper','RayBan','Bata','WildStone'],
'Quantity':[20,10,3,4,6,2],
'Price':[1200,3000,200,3500,4500,500]}
dataframe=pd.DataFrame(Product)
print( dataframe.loc[0:2]) # It filters dataframe through range of rows i.e. row id 0 to 2 (inclusive)with all columns
Output:
ProductName CompanyName Quantity Price
0 Shirt PeterEngland 20 1200
1 Jeans Levis 10 3000
2 Cap Lee Cooper 3 200
Code Explanation:
- Single line comment tells the goal of the program to programmers.
- Imports the pandas library.
- A dictionary named Product is created which contains the product name, company name, quantity and price of products.
- Therefore, a dataframe is created for the dictionary using dataframe=pd.DataFrame(Product).
- Data of dataframe is displayed using the loc function.
Sample Problem Using Approach-4
Consider sample problem-1 and answer the following question.
Question: Find average price of all products using column name of dataframe
# Python program for the dataframe filtering using approach-4
import pandas as pd
Product={ 'ProductName':['Shirt','Jeans','Cap', 'SunGlass','Shoes','Perfume'],
'CompanyName':['Peterengland','Levis','Lee Cooper','RayBan','Bata','WildStone'],
'Quantity':[20,10,3,4,6,2],
'Price':[1200,3000,200,3500,4500,500]}
dataframe=pd.DataFrame(Product)
# It filters the dataframe through column value and display average price of all products
print( 'Average Price of All products is: ',dataframe['Price'].sum()/dataframe['Price'].count())
Output:
Average Price of All products is: 2150.0
Code Explanation:
- Single line comment tells the goal of the program to programmers.
- Imports the pandas library.
- A dictionary named Product is created which contains the product name, company name, quantity and price of products.
- Therefore, a dataframe is created for the dictionary using dataframe=pd.DataFrame(Product).
- Data of dataframe is displayed using dataframe column name. It displays the average price of a product.
Conclusion
Pandas dataframe class is very powerful to process or filter data of different objects such as list, directory, file, tuple etc. It offers many approaches to filter data such as using filter, query, loc, iloc and DataFrame method. Among all, filter data through the DataFrame method is a convenient and effective approach for numerical analysis. In this, all approaches of data filtering have presented with code, its explanation and output.