Please follow the task description. This is a data analytics and transformation task. No need to strictly follow the page requirements or need any referencing at all, as long as the task can be done.

Assessment 3: 3A and 3B
Assessment type: Group project - Presentation and Report (2,000 words)
Purpose: The purpose of this assessment is that students will learn about different topics relevant to data analytics by carrying out research and also by listening to presentations made by their peers. This assessment contributes to learning outcomes b, c and d.
Value: Total 60% (Presentation 25%, Report 35%)
Due Date: Weeks 11 – 12
Assessment topic: Students to select a data source and suggest a topic of analysis for that data source. Tutors to approve the topic before students proceed with further data preparation and analytics.

Students will work in groups (minimum 3 and maximum 4 students in each group). The first step will be to identify a data set from one of the publicly available data sets and present the summary of some of the target models in the Descriptive and Predictive Analytic layers to the tutor for approval. Once approved by their tutor, they will further define the research questions and prepare the data using consolidation and reduction if needed. Next, they will select one of the analytical tools (e.g. Excel, Tableau, Rapid Miner) and apply analytical methods to generate novel findings and draw insights from this data set. These outcomes need to be presented using visualisation models and also need to be explained in a detailed report.

Students will present their findings as a group during tutorial sessions in week 11 for a duration of 10-15 mins per group. Tutors will provide feedback on their findings and students will then need to update their findings to reflect this feedback in their group report. Submission of a group report will be due in Week 10. This will be a 2,000-word report excluding references and executive summary.
Answered 3 days After Sep 02, 2024

Solution

Pratibha answered on Sep 05 2024
Business Analytics Foundation
In this case study, the focus is on exploring and processing student data related to assessments
(PAT, NAPLAN and AE assessment), attendance, school, and student datasets to ensure the
dataset is suitable for analysis. The primary goal is to support teachers in identifying areas for
improvement and tracking student performance, with particular emphasis on gender balance. The
aim is to identify and resolve any data quality issues and build a pivot table that meets the
client's specific needs for insights into student performance and gender representation.
Tasks/Objectives
1. Identifying and addressing the dataset’s data quality issues: You need to explore the
dataset provided by Aginic and identify as many of the possible data quality issues
as you can, then perform transformations to address the issues you have identified.
(10 marks + 5 marks for Python coding)
2. Justifying your transformed dataset as fit for use: Consider the information needs of
the various stakeholders and the goals specified above. In light of these, provide
reasoning that clarifies why your new dataset is fit for use. (10 marks)
3. Performing knowledge discovery using the transformed data: Make a Pivot Table
showing percentage of different genders enrolled at 'Kirimpika School' in secondary
year levels (8, 9, 10) for the year 2020. (5 marks)
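As an illustration of Task 3, the following is a minimal sketch of one way such a pivot table could be built with pandas. It is not the author's solution. The column names follow the DataFrame previews in this document, but the rows are invented toy data, the helper name `gender_percentage_pivot` is hypothetical, and the sketch assumes gender labels have already been standardised to `F`/`M`.

```python
import pandas as pd

# Toy stand-ins for the students, schools and enrolments tables.
# Column names follow the previews in this document; the rows are invented.
students = pd.DataFrame({
    "student_id": [0, 1, 2, 3, 4, 5, 6],
    "gender": ["F", "F", "M", "M", "F", "M", "F"],
})
schools = pd.DataFrame({
    "school_id": [0, 4],
    "name": ["Other School", "Kirimpika School"],
})
enrolments = pd.DataFrame({
    "student_id": [0, 1, 2, 3, 4, 5, 6],
    "year_of_enrolment": [2020] * 7,
    "enrolled_year_level": [8, 8, 8, 9, 9, 10, 8],
    "enrolled_school_id": [4, 4, 4, 4, 4, 4, 0],
})


def gender_percentage_pivot(enrolments, students, schools,
                            school_name="Kirimpika School",
                            year=2020, year_levels=(8, 9, 10)):
    """Hypothetical helper: percentage of each gender per year level."""
    # Attach gender and school name to every enrolment row.
    merged = (
        enrolments
        .merge(students, on="student_id", how="left")
        .merge(schools, left_on="enrolled_school_id",
               right_on="school_id", how="left")
    )
    # Keep only the requested school, year and year levels.
    mask = (
        (merged["name"] == school_name)
        & (merged["year_of_enrolment"] == year)
        & (merged["enrolled_year_level"].isin(year_levels))
    )
    subset = merged.loc[mask]
    # Count distinct students per year level and gender.
    counts = subset.pivot_table(
        index="enrolled_year_level",
        columns="gender",
        values="student_id",
        aggfunc="nunique",
        fill_value=0,
    )
    # Convert counts to row-wise percentages.
    return counts.div(counts.sum(axis=1), axis=0).mul(100).round(1)


pivot = gender_percentage_pivot(enrolments, students, schools)
print(pivot)
```

Counting distinct `student_id` values rather than rows is a design choice that avoids double-counting a student who has more than one enrolment record in the same year level.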
Load the Dataset
import os
import warnings

import pandas as pd

warnings.filterwarnings('ignore')

# Specify the folder for data
data_folder = 'Data'

# Dictionary to hold the DataFrames
dataframes = {}

# Load all CSV files from the folder
for file in os.listdir(data_folder):
    if file.endswith('.csv'):
        file_path = os.path.join(data_folder, file)
        # Remove the .csv extension for the variable name
        var_name = os.path.splitext(file)[0]
        dataframes[var_name] = pd.read_csv(file_path, low_memory=False)

# Iterate through the DataFrames and display the first five rows of each
for name, df in dataframes.items():
    print(f"DataFrame for {name}:")
    print(df.head(), "\n")
DataFrame for ae_assessment:
   id  assessment  assessment_id  student_id  school_id  \
0   0         A-E              1           0          0
1   1         A-E              2           0          0
2   2         A-E              3           0          0
3   3         A-E              4           0          0
4   4         A-E              1           0          0

   year_level_when_assessed  year_when_assessed  term_when_assessed  result
0                         8                2021                   2       C
1                         8                2021                   2       C
2                         8                2021                   2       B
3                         8                2021                   2       B
4                         8                2021                   1       A

DataFrame for ae_assessment_mapping:
   ae_assessment_id                     assessment
0                 1                        English
1                 2                    Mathematics
2                 3                        Science
3                 4  Health and Physical Education

DataFrame for attendance:
   enrolment_id  attendance_date_id  day_of_week  session_no  \
0      0_0_2021         2021_2_10_4     Thursday           1
1      0_0_2021         2021_2_10_4     Thursday           2
2      0_0_2021         2021_2_10_3    Wednesday           1
3      0_0_2021         2021_2_10_3    Wednesday           2
4      0_0_2021         2021_2_10_2      Tuesday           1

   year_level_when_attended  school_id_when_attended  class_when_attended  \
0                         8                        0              Class_A
1                         8                        0              Class_A
2                         8                        0              Class_A
3                         8                        0              Class_A
4                         8                        0              Class_A

   participation_code
0                   1
1                   1
2                   1
3                   1
4                   1

DataFrame for enrolments:
   enrolment_id  student_id  year_of_enrolment  enrolled_year_level  \
0      0_0_2021           0               2021                    8
1      1_0_2021           1               2021                    2
2      2_0_2021           2               2021                   12
3      3_0_2021           3               2021                    3
4      4_0_2021           4               2021                    3

   enrolled_school_id  enrolled_class  flag_current_enrolment
0                   0         Class_A                       1
1                   0         Class_B                       1
2                   0         Class_B                       1
3                   0         Class_C                       1
4                   0         Class_B                       1

DataFrame for naplan_assessment:
   id  assessment_name  domain_id  student_id  school_id  grade_level  \
0   0           NAPLAN          1           3          0            3
1   1           NAPLAN          2           3          0            3
2   2           NAPLAN          3           3          0            3
3   3           NAPLAN          4           3          0            3
4   4           NAPLAN          5           3          0            3

   year_assessed  score
0           2021    588
1           2021    671
2           2021    599
3           2021    611
4           2021    603

DataFrame for naplan_mapping:
   assessment  domain_id    domain
0      NAPLAN          1  Numeracy
1      NAPLAN          2   Grammar
2      NAPLAN          3  Spelling
3      NAPLAN          4   Writing
4      NAPLAN          5   Reading

DataFrame for pat_assessment:
   id  assessment_code  domain_id  assessment_name  student_id  school_id  \
0   0              PAT          1      Mathematics           0          0
1   1              PAT          2          Reading           0          0
2   2              PAT          1      Mathematics           1          0
3   3              PAT          2          Reading           1          0
4   4              PAT          1      Mathematics           3          0

   year_level_when_assessed    year  scale_score
0                         8  2021.0           43
1                         8  2021.0           50
2                         2  2021.0           50
3                         2  2021.0           25
4                         3  2021.0           23

DataFrame for pat_mapping:
   assessment  domain_id       domain
0         PAT          1  Mathematics
1         PAT          2      Reading

DataFrame for schools:
   school_id                  name            location  ASGS-RA_2016
0          0    Yirrikipayi School  Parrearra QLD 4575             1
1          1   Jarrakarlani School  Longreach QLD 4730             5
2          2  Marntuwunyini School     Stokes QLD 4823             5
3          3  Kirluwarringa School    Wairuna QLD 4872             4
4          4      Kirimpika School   Consuelo QLD 4702             4

DataFrame for students:
   student_id  student_name  gender
0           0     Student_0  female
1           1     Student_1       F
2           2     Student_2       M
3           3     Student_3       F
4           4     Student_4       F

DataFrame for year_level_mapping:
   year_level_id  year_level
0              0   Preschool
1              1      Year 1
2              2      Year 2
3              3      Year 3
4              4      Year 4

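The previews above already hint at two of the data quality issues Task 1 asks about: the `gender` column in `students` mixes `female` with `F`, and `year` in `pat_assessment` is stored as a float (`2021.0`). The sketch below shows one possible way to address both and to summarise each table, assuming pandas. The toy rows, the `GENDER_MAP` dictionary (including the `male` variant, which does not appear in the preview) and the helper names are illustrative assumptions, not part of the original solution.

```python
import pandas as pd

# Toy rows mirroring the issues visible in the previews; values are invented.
students = pd.DataFrame({
    "student_id": [0, 1, 2, 3],
    "student_name": ["Student_0", "Student_1", "Student_2", "Student_3"],
    "gender": ["female", "F", "M", " male "],
})
pat_assessment = pd.DataFrame({
    "id": [0, 1, 2],
    "year": [2021.0, 2021.0, None],
})

# Hypothetical mapping from observed spellings to a single code per gender.
GENDER_MAP = {"f": "F", "female": "F", "m": "M", "male": "M"}


def standardise_gender(series):
    """Trim whitespace, lower-case, then map to 'F'/'M' (unknown values become NaN)."""
    return series.str.strip().str.lower().map(GENDER_MAP)


def quality_summary(df):
    """Per-column dtype, missing-value count and number of distinct values."""
    return pd.DataFrame({
        "dtype": df.dtypes.astype(str),
        "missing": df.isna().sum(),
        "unique": df.nunique(),
    })


students["gender"] = standardise_gender(students["gender"])

# Nullable integer dtype keeps missing years as <NA> instead of forcing floats.
pat_assessment["year"] = pat_assessment["year"].astype("Int64")

print(quality_summary(students))
print(quality_summary(pat_assessment))
print("Duplicate student rows:", students.duplicated().sum())
```

Mapping unknown labels to NaN rather than guessing keeps unexpected values visible in the `missing` column of the summary, so they can be reviewed by hand.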