Please follow the task description. This is a data analytics and transformation task. There is no need to strictly follow the page requirements or to include any referencing, as long as the task is completed.

IT IS VERY IMPORTANT TO READ THE INSTRUCTIONS!!! THIS IS DOCTORAL WORK. Turnitin and Waypoint are being used to check for plagiarism, and please use APA format. The instructions need to be read thoroughly and followed. Plagiarism is not tolerated: please keep it at 10% or lower. Make sure to use in-text citations demonstrating that I am citing my references. Please do not use fake references; this instructor will check. Let’s make sure all questions are covered and answered.


Answered 3 days After Sep 02, 2024

Solution

Pratibha answered on Sep 05 2024
Business Analytics Foundation
In this case study, the focus is on exploring and processing student data from the assessment
(PAT, NAPLAN and A-E), attendance, school and student datasets so that the data is suitable
for analysis. The primary goal is to support teachers in identifying areas for improvement and
tracking student performance, with particular emphasis on gender balance. The aim is to
identify and resolve any data quality issues and to build a pivot table that meets the client's
specific need for insight into student performance and gender representation.
Tasks/Objectives
1. Identifying and addressing the dataset’s data quality issues: You need to explore the
dataset provided by Aginic and identify as many data quality issues as you can, then
perform transformations to address the issues you have identified.
(10 marks + 5 marks for Python coding)
2. Justifying your transformed dataset as fit for use: Consider the information needs of
the various stakeholders and the goals specified above. In light of these, provide
reasoning that clarifies why your new dataset is fit for use. (10 marks)
3. Performing knowledge discovery using the transformed data: Make a pivot table
showing the percentage of each gender enrolled at 'Kirimpika School' in secondary
year levels (8, 9, 10) for the year 2020. (5 marks)
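One possible shape for the Task 3 pivot table is sketched below. It is an illustration under stated assumptions, not the submitted solution: the column names are the ones that appear in the data previews in this document, the rows are invented toy data, and gender_share is a hypothetical helper name. It also assumes the gender labels have already been standardised.

```python
import pandas as pd

# Toy stand-ins for the cleaned tables; every value here is invented.
students = pd.DataFrame({
    "student_id": [0, 1, 2, 3, 4, 5, 6],
    "gender": ["F", "F", "M", "M", "F", "M", "F"],
})
enrolments = pd.DataFrame({
    "student_id": [0, 1, 2, 3, 4, 5, 6],
    "year_of_enrolment": [2020, 2020, 2020, 2020, 2020, 2020, 2021],
    "enrolled_year_level": [8, 8, 8, 9, 9, 10, 8],
    "enrolled_school_id": [4, 4, 4, 4, 4, 4, 4],
})
schools = pd.DataFrame({"school_id": [4], "name": ["Kirimpika School"]})


def gender_share(students, enrolments, schools,
                 school_name="Kirimpika School", year=2020,
                 year_levels=(8, 9, 10)):
    """Percentage of each gender per year level (hypothetical helper)."""
    school_ids = schools.loc[schools["name"] == school_name, "school_id"]
    mask = (
        enrolments["enrolled_school_id"].isin(school_ids)
        & (enrolments["year_of_enrolment"] == year)
        & enrolments["enrolled_year_level"].isin(year_levels)
    )
    # Count each student at most once per year level.
    subset = (enrolments.loc[mask, ["student_id", "enrolled_year_level"]]
              .drop_duplicates())
    merged = subset.merge(students[["student_id", "gender"]],
                          on="student_id", how="left")
    counts = merged.pivot_table(index="enrolled_year_level",
                                columns="gender",
                                values="student_id",
                                aggfunc="nunique",
                                fill_value=0)
    # Convert counts to row-wise percentages.
    return counts.div(counts.sum(axis=1), axis=0).mul(100).round(1)


pivot = gender_share(students, enrolments, schools)
print(pivot)
```

Dividing by the row totals makes each year level sum to 100%, which is one reading of "percentage of different genders"; dividing by the grand total instead would give each cell's share of all secondary enrolments.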
Load the Dataset
import os
import warnings

import pandas as pd

warnings.filterwarnings('ignore')

# Specify the folder for data
data_folder = 'Data'
# Dictionary to hold the DataFrames, keyed by file name without extension
dataframes = {}
# Load all CSV files from the folder (sorted so the order is stable)
for file in sorted(os.listdir(data_folder)):
    if file.endswith('.csv'):
        file_path = os.path.join(data_folder, file)
        # Remove the .csv extension for the variable name
        var_name = os.path.splitext(file)[0]
        dataframes[var_name] = pd.read_csv(file_path, low_memory=False)
# Iterate through the DataFrames and display the first five rows of each.
for name, df in dataframes.items():
    print(f"DataFrame for {name}:")
    print(df.head(), "\n")
DataFrame for ae_assessment:
   id assessment  assessment_id  student_id  school_id  \
0   0        A-E              1           0          0
1   1        A-E              2           0          0
2   2        A-E              3           0          0
3   3        A-E              4           0          0
4   4        A-E              1           0          0

   year_level_when_assessed  year_when_assessed  term_when_assessed result
0                         8                2021                   2      C
1                         8                2021                   2      C
2                         8                2021                   2      B
3                         8                2021                   2      B
4                         8                2021                   1      A
DataFrame for ae_assessment_mapping:
   ae_assessment_id                     assessment
0                 1                        English
1                 2                    Mathematics
2                 3                        Science
3                 4  Health and Physical Education
DataFrame for attendance:
  enrolment_id attendance_date_id day_of_week  session_no  \
0     0_0_2021        2021_2_10_4    Thursday           1
1     0_0_2021        2021_2_10_4    Thursday           2
2     0_0_2021        2021_2_10_3   Wednesday           1
3     0_0_2021        2021_2_10_3   Wednesday           2
4     0_0_2021        2021_2_10_2     Tuesday           1

   year_level_when_attended  school_id_when_attended class_when_attended  \
0                         8                        0             Class_A
1                         8                        0             Class_A
2                         8                        0             Class_A
3                         8                        0             Class_A
4                         8                        0             Class_A

   participation_code
0                   1
1                   1
2                   1
3                   1
4                   1
DataFrame for enrolments:
  enrolment_id  student_id  year_of_enrolment  enrolled_year_level  \
0     0_0_2021           0               2021                    8
1     1_0_2021           1               2021                    2
2     2_0_2021           2               2021                   12
3     3_0_2021           3               2021                    3
4     4_0_2021           4               2021                    3

   enrolled_school_id enrolled_class  flag_current_enrolment
0                   0        Class_A                       1
1                   0        Class_B                       1
2                   0        Class_B                       1
3                   0        Class_C                       1
4                   0        Class_B                       1
DataFrame for naplan_assessment:
   id assessment_name  domain_id  student_id  school_id  grade_level  \
0   0          NAPLAN          1           3          0            3
1   1          NAPLAN          2           3          0            3
2   2          NAPLAN          3           3          0            3
3   3          NAPLAN          4           3          0            3
4   4          NAPLAN          5           3          0            3

   year_assessed  score
0           2021    588
1           2021    671
2           2021    599
3           2021    611
4           2021    603
DataFrame for naplan_mapping:
  assessment  domain_id    domain
0     NAPLAN          1  Numeracy
1     NAPLAN          2   Grammar
2     NAPLAN          3  Spelling
3     NAPLAN          4   Writing
4     NAPLAN          5   Reading
DataFrame for pat_assessment:
   id assessment_code  domain_id assessment_name  student_id  school_id  \
0   0             PAT          1     Mathematics           0          0
1   1             PAT          2         Reading           0          0
2   2             PAT          1     Mathematics           1          0
3   3             PAT          2         Reading           1          0
4   4             PAT          1     Mathematics           3          0

   year_level_when_assessed    year  scale_score
0                         8  2021.0           43
1                         8  2021.0           50
2                         2  2021.0           50
3                         2  2021.0           25
4                         3  2021.0           23
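The year column in pat_assessment prints as a float (2021.0), while the year columns in the other tables print as integers. In pandas this often, though not always, means the column contains missing values. A minimal sketch of checking for missing years and converting to a nullable integer type follows; the toy rows are invented and do not come from the real file.

```python
import pandas as pd

# Toy stand-in for pat_assessment; the values are invented for illustration.
pat = pd.DataFrame({"id": [0, 1, 2], "year": [2021.0, None, 2020.0]})

# Count missing years before converting, so the gap is documented.
missing_years = int(pat["year"].isna().sum())

# Nullable Int64 stores whole-number years and keeps gaps as <NA>
# instead of forcing the whole column to float.
pat["year"] = pd.to_numeric(pat["year"], errors="coerce").astype("Int64")

print(f"Missing years: {missing_years}")
print(pat.dtypes)
```

Whether the missing years should then be dropped or filled in from another table is a separate decision that depends on what the real data supports.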
DataFrame for pat_mapping:
  assessment  domain_id       domain
0        PAT          1  Mathematics
1        PAT          2      Reading
DataFrame for schools:
   school_id                  name            location  ASGS-RA_2016
0          0    Yirrikipayi School  Parrearra QLD 4575             1
1          1   Jarrakarlani School  Longreach QLD 4730             5
2          2  Marntuwunyini School     Stokes QLD 4823             5
3          3  Kirluwarringa School    Wairuna QLD 4872             4
4          4      Kirimpika School   Consuelo QLD 4702             4
DataFrame for students:
   student_id student_name  gender
0           0    Student_0  female
1           1    Student_1       F
2           2    Student_2       M
3           3    Student_3       F
4           4    Student_4       F
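The students preview above mixes 'female' with 'F' and 'M' in the gender column, so the labels need standardising before any gender percentages are computed. The sketch below shows one way this could be done. GENDER_MAP and standardise_gender are hypothetical names, the mapping is an assumption that should be extended after inspecting the unique values in the real column, and the toy rows are invented.

```python
import pandas as pd

# Assumed mapping; extend it after checking students["gender"].unique()
# on the real data.
GENDER_MAP = {"f": "F", "female": "F", "m": "M", "male": "M"}


def standardise_gender(series):
    """Map free-text gender labels to 'F' or 'M' (hypothetical helper)."""
    cleaned = series.astype(str).str.strip().str.lower()
    # Labels that are not in the mapping become NaN so they can be reviewed.
    return cleaned.map(GENDER_MAP)


# Toy stand-in for the students table.
students = pd.DataFrame({
    "student_id": [0, 1, 2, 3, 4],
    "gender": ["female", "F", "M", " f ", "unknown"],
})
students["gender_clean"] = standardise_gender(students["gender"])

print(students)
print(students["gender_clean"].value_counts(dropna=False))
```

Leaving unmapped labels as missing, rather than guessing, keeps them visible in the value counts for a manual decision.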
DataFrame for year_level_mapping:
   year_level_id year_level
0              0  Preschool
1              1     Year 1
2              2     Year 2
3              3     Year 3
4              4     Year 4
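Previewing the first five rows only reveals issues that happen to fall in those rows. As a rough sketch of a broader scan for Task 1, the hypothetical helper profile below counts rows, missing cells and fully duplicated rows for every table in a dictionary shaped like dataframes. The demo tables are invented toy data.

```python
import pandas as pd


def profile(frames):
    """Summarise basic quality indicators per table (hypothetical helper)."""
    rows = []
    for name, df in frames.items():
        rows.append({
            "table": name,
            "rows": len(df),
            "columns": df.shape[1],
            "missing_cells": int(df.isna().sum().sum()),
            "duplicate_rows": int(df.duplicated().sum()),
        })
    return pd.DataFrame(rows).set_index("table")


# Toy stand-in for the dictionary of loaded tables.
demo = {
    "students": pd.DataFrame({"student_id": [0, 1, 1],
                              "gender": ["female", "F", "F"]}),
    "pat_assessment": pd.DataFrame({"id": [0, 1],
                                    "year": [2021.0, None]}),
}
report = profile(demo)
print(report)
```

On the real data this could be called as profile(dataframes). It does not replace column-level checks such as inconsistent labels or broken keys between tables, which need their own tests.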
# Display the first 5 rows of each dataframe to explore the dataset
for name, df in dataframes.items():
    print(f"DataFrame: {name}")
    print(df.head(), "\n")