Solution
Shubham answered on
Jun 04 2024
Learning Activities and Challenge Activities-WK6
[WLO: 2] [CLO: 2]
Prior to beginning work on this learning activity, review the video included in the Development Workspace along with the supporting resources.
The learning activities are designed to provide an immersive, interactive experience that encourages active participation and engagement. These activities are integrated into your zyBooks interactive textbook, Chapter 6 Data Science Programming, and consist of two main components:
Participation Activities: These are interactive exercises woven into textbook content that encourage active reading and understanding. As you progress through the chapters, you'll encounter various activities that require your input or decision-making, ensuring that you are comprehending and engaging with the material, rather than passively reading.
Challenge Exercises: Beyond comprehension, these activities aim to test your application of the knowledge acquired. Embedded within the textbook content, these exercises will prompt you to solve problems or answer more complex questions related to the topics covered. These challenges will push you to think critically and apply the theories and concepts you've learned in a practical context.
Both activities are designed to enhance your understanding and retention of the course material, while also providing a more engaging and interactive learning experience.
1. PARTICIPATION ACTIVITY 6.1.1: Comparing data science, computer science, and statistics.
import matplotlib.pyplot as plt
from matplotlib_venn import venn2, venn2_circles
from matplotlib.animation import FuncAnimation

# Initialize the figure
fig, ax = plt.subplots()

def update(frame):
    ax.clear()
    if frame == 0:
        # Step 1: Computer Scientists
        venn2(subsets=(1, 0, 0), set_labels=('Computer Scientists', 'Statisticians'), ax=ax)
        ax.annotate('Design websites\nDevelop software\nProtect from hackers\nAnalyze data with algorithms\nBuild data storage tools',
                    xy=(0.2, 0.5), xycoords='axes fraction', fontsize=10, ha='center', color='blue')
        ax.set_title('Computer scientists use programming to design new software and websites, protect computer systems from hackers, implement algorithms, and store data.')
    elif frame == 1:
        # Step 2: Statisticians
        venn2(subsets=(0, 1, 0), set_labels=('Computer Scientists', 'Statisticians'), ax=ax)
        ax.annotate('Design experiments\nDerive new models\nAnalyze data using models\nInterpret results',
                    xy=(0.8, 0.5), xycoords='axes fraction', fontsize=10, ha='center', color='red')
        ax.set_title('Statisticians design experiments and apply models to discover trends and patterns in a dataset. Statisticians also derive new models using mathematical techniques.')
    elif frame == 2:
        # Step 3: Data Scientists
        v = venn2(subsets=(1, 1, 1), set_labels=('Computer Scientists', 'Statisticians'), ax=ax)
        venn2_circles(subsets=(1, 1, 1), ax=ax)
        v.get_label_by_id('10').set_text('Design websites\nDevelop software\nProtect from hackers')
        v.get_label_by_id('01').set_text('Design experiments\nDerive new models')
        v.get_label_by_id('11').set_text('Analyze data with algorithms\nBuild data storage tools\nAnalyze data using models\nInterpret results')
        ax.annotate('Modify and format datasets\nCreate dynamic plots and graphs',
                    xy=(0.5, 0.2), xycoords='axes fraction', fontsize=10, ha='center', color='green')
        ax.set_title('Data scientists use programming to transform data into meaningful information using graphs, algorithms, and models.')

ani = FuncAnimation(fig, update, frames=3, repeat=False, interval=3000)
plt.show()
2. PARTICIPATION ACTIVITY 6.1.2: Data science.
1. Data scientist
2. Computer scientist
3. Statistician
3. PARTICIPATION ACTIVITY 6.1.3: Features and instances.
import matplotlib.pyplot as plt
import pandas as pd
from matplotlib.animation import FuncAnimation

# Sample data
data = {
    'Species': ['Adelie', 'Chinstrap', 'Gentoo'],
    'Bill Length (mm)': [39.1, 48.7, 50.0],
    'Body Mass (g)': [3750, 3800, 5000],
    'Sex': ['Male', 'Female', 'Male']
}

# Convert to DataFrame
df = pd.DataFrame(data)

# Initialize the figure and axis
fig, ax = plt.subplots()
ax.axis('off')  # Turn off the axis

def update(frame):
    ax.clear()
    ax.axis('off')
    if frame == 0:
        # Step 1: The first column of a data table is shown
        ax.table(cellText=[[df.iloc[i, 0]] for i in range(len(df))],
                 colLabels=[df.columns[0]], cellLoc='center', loc='center')
        ax.set_title('Researchers at the Palmer Archipelago in the Antarctic collected data on three local penguin species: Adelie, Chinstrap, and Gentoo.')
    elif frame <= len(df):
        # Step 2: Rows are revealed one by one, without column headers.
        # Each row holds one cell per DataFrame column.
        ax.table(cellText=df.iloc[:frame].values.tolist(),
                 cellLoc='center', loc='center')
        ax.set_title('Each individual penguin in this dataset is an instance. In the dataset, each row represents a different instance.')
    elif frame <= len(df) + 1:
        # Step 3: Column headers are revealed
        ax.table(cellText=df.values,
                 colLabels=df.columns, cellLoc='center', loc='center')
        ax.set_title('Each characteristic of a penguin, such as bill length, body mass, and sex, is a feature. In the dataset, each column represents a different feature.')

ani = FuncAnimation(fig, update, frames=len(df) + 2, repeat=False, interval=2000)
plt.show()
4. PARTICIPATION ACTIVITY 6.1.4: Features and instances.
1. Transactions: Instance
2. Sales amount: Feature
3. Type of store: Feature
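The answers above can be illustrated with a tiny dataset. This is a minimal sketch with made-up transaction records (the values and the names `sales_amount` and `store_type` are assumptions for illustration, not data from the textbook): each record is an instance, and each key is a feature.

```python
# Hypothetical transaction records: each dict is one instance (a row),
# and each key is one feature (a column).
transactions = [
    {"sales_amount": 19.99, "store_type": "grocery"},
    {"sales_amount": 5.49, "store_type": "grocery"},
    {"sales_amount": 102.00, "store_type": "electronics"},
    {"sales_amount": 42.50, "store_type": "clothing"},
]

# Instances are counted by rows; features are the distinct column names.
n_instances = len(transactions)
feature_names = sorted({key for row in transactions for key in row})

print(f"instances: {n_instances}")
print(f"features: {feature_names}")
```

Adding another transaction would add an instance, while recording something new about every transaction (for example, a payment method) would add a feature.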
5. PARTICIPATION ACTIVITY 6.1.5: Big data at Twitter.
import matplotlib.pyplot as plt
from matplotlib.animation import FuncAnimation
import matplotlib.image as mpimg

# Initialize the figure and axis
fig, ax = plt.subplots()

def update(frame):
    ax.clear()
    ax.axis('off')
    if frame == 0:
        # Step 1: The Twitter logo appears
        img = mpimg.imread('twitter_logo.png')  # Ensure you have the Twitter logo image
        ax.imshow(img)
        ax.set_title('In early 2022, Twitter had 200 million active daily users and about 1.3 billion accounts. These accounts generated big data.', fontsize=10)
    elif frame == 1:
        # Step 2: The header Volume appears
        ax.text(0.5, 0.9, 'Volume', ha='center', va='center', fontsize=14, color='blue')
        ax.text(0.5, 0.7, 'Daily and annual tweet volume is displayed under the Volume header.', ha='center', va='center', fontsize=10)
        ax.text(0.5, 0.5, 'Daily usage: 12 TB (about 6 MacBook Pros)', ha='center', va='center', fontsize=10)
        ax.text(0.5, 0.3, 'Annual usage: 4.3 PB (about 2,100 MacBook Pros)', ha='center', va='center', fontsize=10)
        for i in range(6):
            ax.text(0.2 + i * 0.1, 0.2, '?', ha='center', va='center', fontsize=20)
        ax.set_title('Storing new tweets takes about 12 terabytes (TB) per day, or the storage of about six MacBook Pros. Storing all tweets posted in a year takes 4.3 petabytes (PB), or about 2,100 MacBook Pros.', fontsize=10)
    elif frame == 2:
        # Step 3: The header Variety appears
        ax.text(0.5, 0.9, 'Variety', ha='center', va='center', fontsize=14, color='green')
        ax.text(0.5, 0.7, 'A sketch of a tweet appears under the Variety header, with each icon revealed one at a time.', ha='center', va='center', fontsize=10)
        ax.text(0.5, 0.5, '? Tweet\n#️⃣ Hashtag\n? Image\n❤️ Like\n? Share\n➕ Follow\n? Block\n? Re-tweet\n? Comment', ha='center', va='center', fontsize=10)
        ax.set_title('Twitter users do more than tweet. User events on Twitter include tweets, hashtags, images, likes, shares, follows, blocks, re-tweets, and comments.', fontsize=10)
    elif frame == 3:
        # Step 4: The header Velocity appears
        ax.text(0.5, 0.9, 'Velocity', ha='center', va='center', fontsize=14, color='red')
        ax.text(0.5, 0.7, 'A large number of comment, heart, and retweet icons appear under the Velocity heading.', ha='center', va='center', fontsize=10)
        for i in range(10):
            ax.text(0.1 + i * 0.1, 0.5, '?', ha='center', va='center', fontsize=15)
            ax.text(0.1 + i * 0.1, 0.4, '❤️', ha='center', va='center', fontsize=15)
            ax.text(0.1 + i * 0.1, 0.3, '?', ha='center', va='center', fontsize=15)
        ax.set_title('Twitter users create approximately 400 billion events per day.', fontsize=10)

ani = FuncAnimation(fig, update, frames=4, repeat=False, interval=3000)
plt.show()
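The volume figures quoted in the animation can be sanity-checked with a little arithmetic. The sketch below assumes decimal prefixes (1 PB = 1,000 TB) and a 365-day year; the per-laptop capacity is not stated in the captions and is inferred here from "12 TB is about six MacBook Pros".

```python
TB_PER_PB = 1000          # assumption: decimal (SI) prefixes
DAYS_PER_YEAR = 365

daily_tb = 12             # from the caption: about 12 TB of new tweets per day
laptops_per_day = 6       # from the caption: about six MacBook Pros

# Annual storage implied by the daily figure.
annual_pb = daily_tb * DAYS_PER_YEAR / TB_PER_PB

# Capacity per laptop implied by the daily comparison.
laptop_tb = daily_tb / laptops_per_day

# Laptops needed for the quoted 4.3 PB per year at that capacity.
laptops_per_year = 4.3 * TB_PER_PB / laptop_tb

print(f"annual volume: {annual_pb:.2f} PB")
print(f"implied laptop capacity: {laptop_tb:.1f} TB")
print(f"laptops per year: {laptops_per_year:.0f}")
```

Under these assumptions the annual volume comes out near the quoted 4.3 PB, and the laptop count lands close to the quoted "about 2,100", so the figures are consistent as rounded estimates.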
6. PARTICIPATION ACTIVITY 6.1.6: Big data in healthcare.
Electronic health records contain data on patient measurements, test results, medical history, image scans, and other characteristics.
Variety
Wearable devices like a smartwatch can track a patient's exercise, heart rate, and sleeping habits. Data from these devices are sent to a patient's doctor or care team in real time.
Velocity
UnitedHealth Group provides insurance for nearly 50 million customers. As part of providing insurance coverage, UnitedHealth Group manages medical records and claims data for each individual customer.
Volume
7. PARTICIPATION ACTIVITY 6.1.7: How big is big data?
import matplotlib.pyplot as plt
from matplotlib.animation import FuncAnimation

# Data for the animation
years = [1986, 1993, 2000, 2007, 2020]
data_sizes = [2.6, 15.8, 54.5, 295, 6800]  # in exabytes (EB)
captions = [
    "Big data is really, really big. In 1986, the total estimated data in the world was 2.6 exabytes (EB).",
    "One exabyte (EB) equals 1 million TB. Most laptops come with 1 TB storage at most.",
    "By 1993, the total estimated data in the world had grown to 15.8 EB.",
    "In 2000, the total estimated data in the world had reached 54.5 EB.",
    "In 2007, the total estimated data in the world was 295 EB.",
    "By 2020, the total estimated data had increased to 6800 EB, or 6.8 zettabytes (ZB) - the equivalent of roughly 7 billion laptop computers with 1 TB of storage each."
]
# Number of bars visible with each caption. The second caption only
# explains the units, so it does not add a new bar.
bars_shown = [1, 1, 2, 3, 4, 5]

# Initialize the figure and axis
fig, ax = plt.subplots()

def update(frame):
    n = bars_shown[frame]
    ax.clear()
    ax.set_xlim(1985, 2025)
    ax.set_ylim(0, 7500)  # leave room for the label above the tallest bar
    ax.set_xlabel('Year')
    ax.set_ylabel('Data Size (Exabytes)')
    ax.set_title(captions[frame])
    ax.bar(years[:n], data_sizes[:n], width=2, color='blue')
    for year, size in zip(years[:n], data_sizes[:n]):
        ax.text(year, size + 200, f"{size} EB", ha='center')

ani = FuncAnimation(fig, update, frames=len(captions), repeat=False, interval=3000)
plt.show()
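The unit relationships behind these figures can be worked through directly. This is a small sketch assuming decimal prefixes (1 EB = 1,000,000 TB and 1 ZB = 1,000 EB) and a hypothetical laptop with exactly 1 TB of storage; the yearly totals are the ones listed in the animation data.

```python
TB_PER_EB = 1_000_000   # "One exabyte (EB) equals 1 million TB"
EB_PER_ZB = 1_000       # assumption: decimal (SI) prefixes
LAPTOP_TB = 1           # hypothetical laptop with 1 TB of storage

# Total estimated world data in exabytes, by year.
world_data_eb = {1986: 2.6, 1993: 15.8, 2000: 54.5, 2007: 295, 2020: 6800}

# Express the 2020 total in zettabytes and in 1 TB laptops.
total_2020_zb = world_data_eb[2020] / EB_PER_ZB
laptops_2020 = world_data_eb[2020] * TB_PER_EB // LAPTOP_TB

# How many times larger the 2020 total is than the 1986 total.
growth = world_data_eb[2020] / world_data_eb[1986]

print(f"2020 total: {total_2020_zb} ZB")
print(f"1 TB laptops needed: {laptops_2020:,}")
print(f"growth since 1986: about {growth:,.0f}x")
```

With decimal prefixes the 2020 total corresponds to several billion 1 TB laptops; using binary prefixes instead would shift the count by a few percent.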
8. PARTICIPATION ACTIVITY 6.1.8: Big data volume.
1. Datasets may be too large to store on a personal computer.
2. Spreading portions of datasets across different locations and servers.
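The idea of spreading portions of a dataset across servers can be sketched in a few lines. This is only an illustration under simple assumptions: the `partition` helper and the round-robin scheme are hypothetical, and real distributed storage systems use more elaborate placement and replication strategies.

```python
def partition(records, n_servers):
    """Assign records to servers round-robin; return one list per server."""
    shards = [[] for _ in range(n_servers)]
    for i, record in enumerate(records):
        shards[i % n_servers].append(record)
    return shards

# Hypothetical dataset of 10 record IDs spread across 3 servers.
records = list(range(10))
shards = partition(records, 3)

for server_id, shard in enumerate(shards):
    print(f"server {server_id}: {shard}")
```

Each server holds only a portion of the records, yet the full dataset can still be recovered by combining the shards.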
9. PARTICIPATION ACTIVITY 6.1.9: Reproducible analysis.
import matplotlib.pyplot as plt
from matplotlib.offsetbox import TextArea, DrawingArea, OffsetImage, AnnotationBbox
from matplotlib.animation import FuncAnimation
# Initialize the figure and axis
fig, ax = plt.subplots()
ax.set_xlim(0, 6)
ax.set_ylim(0, 1)
ax.axis('off')
# Load brain icon image
brain_icon = plt.imread('brain_icon.png')  # Ensure you have the brain icon image
# Define the coordinates for the brain icons
brain_coords = [(0.2 + i * 0.4, 0.5) for i in range(10)]
# Define the captions for each step
captions = [
    "Two data scientists are building models to classify brain tumors as benign or malignant. Both data scientists start with a set of 10 brain scans.",
    "One data scientist uses a programming language, such as Python, to write code to fit the model.",
    "Another data scientist uses software instead of coding to fit the model.",
    "Later, new brain scans arrive.",
"The...