Waddle We Do Without Penguins: A Data Science Exploration Into Antarctica's Penguins

By: Daisy Liao

penguin%20image.jpeg

Image source: https://www.cabq.gov/artsculture/biopark/news/10-cool-facts-about-penguins

Hello and welcome to my data science project about penguins!

All relevant files and figures in this notebook (i.e. the .ipynb and .html file for this notebook, as well as the graphs) can be found in my public github repo, linked here: https://github.com/dliao1/dliao1.github.io

Onto the project!

Introduction

In today's constantly evolving society, climate change is becoming a growing problem. As carbon dioxide levels rise, so does the rate at which coral reefs die off the oceans. Society continues to grow and produce, sometimes at the cost of the wildlife habitats. Greenhouse gas emmissions and pollution continue to rise with the advent of cars and factories, and global temperatures have also reflected this trend. Many people have also turned their attention to the Earth's North and Southern poles. Every year the Earth's natural glaciers melt at an alarming rate, which further results in the loss of habitat for animals that inhabit these environments.

One such animal consists of penguins. Antarctica is home to many penguin species, such as emperor penguins (as seen in Happy Feet!), king penguins, and macaroni penguins, among others. Although these penguins' habitats have become endangered, conservation efforts have been made to protect the wildlife in Antarctica, such as giving endangered animals homes in zoos and aquariums where they can be studied and nurtured.

One such dataset that exists is called Palmer's Penguins, which contains valuable data of approximately 344 penguins. In this dataset, a number of observations have been recorded about these penguins, such as their island of origin, species, bill length and depth, body mass in grams, and gender. This exploration aims to analyze these differences between three species of penguins: Adelie, Gentoo, and Chinstrap. In it, we will explore how bill lengths, bill depths, flipper lengths, and body mass differ between different penguin species and different genders. Furthermore, we will take a machine learning approach and attempt to use decision trees, k-NN, and random forest classification methods to attempt to classify penguins into different species based upon their demographic attributes, as listed earlier.

This project's ultimate goal to analyze these differences between penguin species and see if machine learning can be applied in this context to identify penguin species' based on their characteristics, which could ultimately lead to more effective penguin identification and conservation in the future. Furthermore, since this dataset includes quantitave measurements like a penguin's bill length/depth, flipper length, and body mass, analyzing these characteristics could lead to more insight into how each penguin's attributes could affect susceptibility to certain penguin-related diseases (E. coli, Avian viruses, etc.), as well how conservations and zoologists could better tailor medical devices to different penguin species.

Data Collection

Here we import all the modules needed for analysis of the data:

Here we read in the data into a dataframe using pandas. The dataset's source is from Kaggle, linked here: https://www.kaggle.com/parulpandey/palmer-archipelago-antarctica-penguin-data?select=penguins_size.csv

As an overview of the variables contained in the dataset, they are as follows:

Next, we need to clean the data! Since this dataset is already from Kaggle, most of the work has already been done for us. However, some of the column names aren't exactly intuitive and we'll be renaming those (i.e. culmen_length instead of bill length). Furthermore, there are some missing values in the dataset, so all rows with missing values will be dropped. To make it easier to create linear regression equations involving a penguin's sex, we will be converted the Sex column to numbers: 0 will represent a male, and 1 will represent a female. We'll also drop the island column since we don't plan on using that in our analysis.

Now that the data's been cleaned sufficiently, we can move onto analysis!

Exploratory Data Analysis

For this data analysis, first we are interested in grouping each penguin by its species and then creating histograms of the penguins' different attributes, split by gender.

Intraspecies Gender Observations

First, let's separate the data into male and female categories, based upon their species. We can make four overlapped histograms per species and see if there's an overall trend in the bill lengths, bill depths, flipper lengths, and body mass of penguins. The male distribution will be plotted in blue and the female distribution will be plotted in pink.

Adelie Penguins

First, let's start with Adelie penguins:

As suspected, it seems that the histograms for male attributes versus female attributes for Adelie penguins differ somewhat. For all eight distributions (four per gender), it seems that the data is more or less unimodal.

Bill Length

Looking at the first histogram comparing male and female Adelie penguin bill lengths, both distributions are unimodal, but the male distribution seems somewhat symmetric, if not slightly left skewed. In contrast, the female distribution seems more right skewed. The peak of male bill lengths is at 41 mm, while the peak of female bill lengths is at around 36 mm. The distance between these peaks suggest that on average, male Adelie penguin bill lengths tend to be longer than female bill lengths.

Bill depth

Similarly, for bill depths, both distributions are unimodal. However, male bill depths tend to more symmetrically distributed while female bill depths are right skewed. The female distribution is centered at 17 mm, while the male distribution is centered at around 19 mm. Similar to the trends we observed in bill length, this also suggests that male bill depth tends to be larger than female bill depths.

Flipper Length

Concerning flipper length, it appears that the male distribution seems to be more left skewed, while the female distribution seems to be more right skewed. The most common male flipper length is 195 mm, whilst the most commonly observed female flipper length is 185 mm. Consistent with the previous observations, this suggests that male flipper length tends to be longer than female flipper lengths. It also seems that male Adelie penguins have a slightly larger range of flipper lengths when compared to their female counterparts.

Body Mass

Lastly, looking at overall body mass in grams, both the male and female distribution seem more or less symmetrically distributed. For the female distribution, there is an overall peak at 3500 grams, while for males, there is a peak at around 4000 grams. This also suggests that male Adelie penguins tend to have more body mass than female Adelie penguins as well.

Chinstrap Penguins

Next, let's look at Chinstrap penguins:

These histograms look very similar to those produced by Adelie penguins, but still differ somewhat.

Bill length

Regarding Chinstrap penguin bill lengths, it looks as if there seems to be an outlier in the female data, with one female penguin having a bill length of 57.5 mm. Aside from that, both distributions look somewhat symmetric. The peak in the female distribution occurs at 47.5 mm while the peak in male bill length occurs at around 50 mm. Just like in Adelie penguins, male Chinstrap penguins seem to have longer bill lengths than female Chinstrap penguins.

Bill depth

For bill depths, the female distribution looks somewhat bimodal, with two peaks at 16 mm and 18 mm, respectively. On the other hand, the male distribution looks more unimodal, with a peak at 20 mm. Given these peaks, it follows that it appears that on average, male bill depths tend to be longer than female bill depths, similar to what was seen in the Adelie penguins.

Flipper length

For flipper length, interestingly enough, both the male and female distributions appear to have peaks at the same measurement, at around 195 mm. However, the female distribution seems to be bimodal, with peaks at 185 mm and 195 mm, while the male distrbution appears more unimodal and centered at 195 mm. Stil, although both distributions have overlapping peaks at 195 mm, the ranges for the female and male distributions still appear to differ: the female flipper lengths range from 180 to 205 mm, while male flipper lengths range from 185 to 210 mm - thus, it still seems that on average, male flipper lengths tend to be longer than those of females.

Body mass

Lastly, regarding body mass, similar to flipper length distributions, the male and female peak body mass in grams appear to occur in similar places, with the most common female body mass occuring at around 3700 grams, while the most common male body mass was around 3800 grams. Both male and female distributions appears to also be unimodal, with one peak. Therefore, it also appears that male penguins tended to have larger body masses than female penguins.

Gentoo Penguins

Next, let's look at gender breakdowns of Gentoo penguins:

Looking at all the graphs together, it seems that Gentoo penguins may have the most differences between the male and female penguins. As seen in Adelie and Chinstrap penguins' graphs, there was a lot of marked overlap between the ranges of the male and female penguins' attributes, but in the case of the Gentoo penguin graphs, it seems that the ranges of the male and female attributes' obersevations seem to have significantly less overlap when compared to the other two species of penguin.

Bill length

For the histogram of bill lengths, it appears that the female distribution is unimodal and symmetric, peaking at 45 mm, while the male distribution appears unimodal and left skewed, with a peak at around 48 mm. It also look as though the range of male bill lengths is much larger than that of females. Thus, we can conclude that the dataset suggests that for Gentoo penguins, male bill length appears to be, on average, longer than that of the females.

Bill depth

For bill depth, both the male and female distributions look more or less symmetric around their peaks. However, whilst the male distrubution is unimodal around 15.7 mm, it appears that the female stribution is somewhat bimodal, with peaks around 14 and 15 mm. This difference in peaks still suggest that male bill depths are on average longer than female bill depths.

Flipper length

Regarding flipper length, the female distribution seems to be unimodal and symmetric, with a peak at around 212 mm, while the male distribution seems to be unimodal and centered at 220 mm. Interestingly enough, the male distribution seems to be very left skewed, with a much larger range of flipper lengths than their female counterparts. This overlapped histogram seems to suggest that on average, male Gentoo penguin flipper lengths are both longer and more varied than those of female Gentoo penguins.

Body mass

Finally, for the histogram displaying male and female body mass in grams, it seems that this histogram demonstrates the least amount of overlap, with both distributions occupying distinct regions of the graph. Namely, the range of female body mass is from 4000 g to 5000 g, whilst the range of male body mass spans 4750 g to 6000 g. However, both distributions look fairly symmetrically distributed, with one peak. The peak on the female distribution occurs at around 4750 g, and the peak for males is at around 5500 g, which suggests that on average, male Gentoo penguins tend to have a larger body mass than female Gentoo penguins.

Interspecies Observations

Now that we've sufficiently analyzed differences between male and female penguins within their own respective species, let's take a look at how each species differs from each other overall. We can already see some differences between histograms (i.e Adelie male penguins had a typical bill length of 41 mm, while Gentoo male penguins have a typical bill length of 48 mm), but creating violin plots comparing each attribute of penguins (bill length/depth, flipper length, body mass) between species will also help illustrate this difference more clearly.

Bill length

Looking at the violin plot, it appears that the Adelie penguin distribution is more unimodal and symmetric around its center, whilst the Chinstrap and Gentoo penguin distrbutions resemble bimodal distributions. Gentoo bill length distribution also looks somewhat right skewed, to a degree. These bimodal qualities could likely be attributed to the differences in center noted when constructing histograms comparing male versus female data in between each species in the previous section. Overall, it seems that Adelie penguins tended to have shorter bills than Chinstrap and Gentoo penguins, with Chinstrap penguins on average having the longest bill lengths. Furthermore, looking at the overall lengths of the violin plots, the graph seems to suggest that Chinstrap and Gentoo penguins had more variation in bill length when compared to Adelie penguins.

Bill depth

This graph seem to be opposite of the violin plot of bill lengths. The Adelie and Chinstrap violin plots seem to be unimodally distributed and somewhat symmetric, while the Gentoo plot seems somewhat bimodal. Furthermore, it appears that Adelie and Chinstrap penguins tended to have larger bill depths (with medians of around 18.5 mm) compared to Gentoo penguins. It also looks like Adelie penguins tended to have the most variation in bill length, being the longest violin plot plotted. It is interesting that, taking into account the bill length plots, Adelie penguins tended to have shorter bill lengths but also longer bill depths, which could suggest a negative relationship between the two. We'll go more into detail about this in the next section.

Flipper length

Both Adelie and Chinstrap penguins have symmetric, unimodal distributions centered from 190 - 200 mm flipper lengths. In contrast, the Gentoo penguin distribution appears to be bimodal and centered at 220 mm. This suggets that Adelie and Chinstrap penguins tended to have shorter flippers, while Gentoo penguins had longer flippers. The lengths of the violin plots also suggest more variation in Adelie and Chinstrap flipper lengths when compared to Gentoo flipper lengths.

Body mass

This violin plot looks quite similar to that of flipper lengths, which suggests that there could be a possible relationship beween the two. Again, Adelie and Chinstrap distrbutions look mostly symmetric and unimodal. Penguins belonging to either species tended to have lower body masses, at around 4000 grams. On the other hand, the Gentoo penguin distribution looks bimodal. Gentoo penguins tended to have higher body masses, at around 5000 grams. On an interesting note, it seems as though the consistently bimodal distributions for Gentoo penguins present in all four of these violin plots are likely due to differents between male and female Gentoo penguins, as observed earlier.

Heatmap of Variables

We've already analyzed gender differences in each species and interspecies differences in attributes. It was noted that a possible positive correlation could exist between certain attributes of penguins, such as between flipper length and body mass, given the similar violin plots. Thus, we've created a heatmap displaying possible relationships variables could have with each other, generated using seaborn.

This heatmap gives high values for a relationship between flipper length and body mass, which confirms earlier suspicions. The heatmap also gives a high negative correlation value between bill depth and flipper length, which is also consistent with the earlier observation that the violin plots for those two attributes seem to be opposites. It is also worth noting that the Sex variable seems to have negative correlations with every noted measurement (i.e. Bill length, depth, etc.). Recall earlier that in the dataset, female penguins were noted with a 1 in the column and males were noted with a 0. Furthermore, in all of the generated gender-specific histograms, females had lower measurements across the board when compared to their male counterparts. Thus, this negative correlation could be due to a general trend where, if a 1 was encountered (i.e. a female), measurements of bill length/depth, flipper length, and body mass seemed to decrease: hence, the noted negative correlation in the heatmap.

Moving on, since the heatmap notes a high correlation value of 0.87 between flipper length and body mass, it's worth conducting a linear regression analysis to attempt to visualize the data and how it could be linearly related. Furthermore, the heatmap also noted fairly high correlation values for flipper length and bill length, and also body mass and bill length, which I've also decided to investigate.

First, let's make scatterplots of all three relationships that we've decided to focus on:

Body Mass vs. Flipper Length

Looking at this graph, most of the data for all three penguins seems to follow a positive linear relationship. That is, for all three species of penguin, as body mass increases, so does flipper length. As observed earlier, there seems to be a lot of similarity in measurements for Adelie and Chinstrap data for these two particular observations. However, there seems to be a clear separation between observations for Adelie/Chinstrap penguins and Gentoo penguins.

Body Mass vs. Bill Length

All three species of penguin data still appear to follow a positive linear relationship. As body mass increases, bill length tends to increase. However, it appears that when compared to their Adelie and Gentoo counterparts, Chinstrap penguins at the same level of body mass tended to have higher bill lengths (shown by the green cluster placed above the blue and red clusters) Similarly, in the first graph, Adelie and Chinstrap data seemed to be interspersed with each other, but in this scatterplot, each species' data seems to occupy separate parts of the graph. When conducting a linear regression analysis, it looks like including an interaction term between species and body mass could lead to better results.

Bill Length vs. Flipper Length

Observing overall trends, as bill length increased, flipper length also tended to increase. Similar to what was observed in the scatterplot showing body mass vs. bill length, there are three marked clusters of data, and Chinstrap penguins seem to occupy a separate area of the scatterplot entirely. Chinstrap penguins tended to have lower flipper lengths when compared to Adelie and Gentoo penguins with the same bill lengths.

Linear Regression Analysis

Body Mass vs Flipper Length

We've chosen to include an interaction term between species and body mass, and due to the nature of statsmodels, the Adelie penguin column has been dropped. This is due to the way statsmodels deals with categorical variables. However, this means that we can interpret the coefficient of body mass (0.0066) as the interaction term between Adelie penguins and body mass.

Putting it all together, when the species is Chinstrap, the flipper length increases 0.0053 times as much as compared to when the species is Adelie. Likewise, when the species is Gentoo, the flipper length increases 0.0027 times as much as compared to when the species is Adelie. The interaction term between Gentoo penguins and body mass being much smaller than that of Chinstrap penguins makes sense - as seen in the scatter plot, Gentoo data appeared to be interspersed with Adelie data, which is consistent with this smaller coefficient.

Looking at the P > t column of the statsmodel results summary, the only two coefficients that have p-values greater than an alpha of 0.05 are Q('Species')[T.Chinstrap] and Q('Species')[T.Gentoo]. However, overall, most of the coefficients in the model are signifacntly different from 0.

From the results, it appears that the R^2 value for this linear regression model is 0.856, which is close to 1 and suggests that there is evidence of a strong linear relationship between flipper length and body mass.

To confirm this, let's look at the violin plot residuals from this model:

Violin Plot of Residuals

This violin plot of residuals by penguin species matches assumptions of the linear regression model well. Each of the violin plots is unimodal, and also clustered around the center, where the median is roughly 0. The ranges of the residuals are also quite large, as seen by the larger range of Adelie penguins. Thus, we can say that this violin plot of residuals matches assumptions of the linear regression model well.

Let's move onto our next relationship between bill length and flipper length:

Bill Length vs Flipper Length

Like what we saw with our first linear regression, the Adelie penguin column has been dropped due to how statsmodel works. Thus we can interpret the coefficient of bill length (0.8139) as the interaction term between Adelie penguins and bill length.

Looking at all our interaction terms together, when the species is Chinstrap, the flipper length increases 0.1934 times as much as compared to when the species is Adelie. Likewise, when the species is Gentoo, the flipper length increases 0.5943 times as much as compared to when the species is Adelie.

Looking at the P > t column of the statsmodel results summary, there are three coefficients that have p-values greater than an alpha of 0.05: Q('Species')[T.Chinstrap] , Q('Species')[T.Gentoo] , and Q('Species')[T.Chinstrap]:Q('Bill length (mm)'). However, all other coefficients are significantly different from 0.

From the results, it appears that the R^2 value for this linear regression model is 0.831, which is close to 1 and suggests that there is evidence of a strong linear relationship between flipper length and bill length.

To confirm this, let's look at the violin plot residuals from this model:

Violin Plot of Residuals

Just like the first plot of residuals, this one looks similar. All of the plots are unimodal and centered around zero, affirming our previous idea that a linear regression model would be appropriate for the data.

Let's move onto our next relationship between bill length and flipper length:

Body Mass vs Bill Length

Thus we can interpret the coefficient of body mass length (0.0032) as the interaction term between Adelie penguins and bill length.

Looking at all our interaction terms together, when the species is Chinstrap, the bill length increases 0.0013 times as much as compared to when the species is Adelie. Likewise, when the species is Gentoo, the flipper length increases 0.0010 times as much as compared to when the species is Adelie.

Looking at the P > t column of the statsmodel results summary, there are many coefficients that have p-values greater than an alpha of 0.05. This could suggest that most of the data follows a general overall trend and an interaction term between Species and Body Mass wasn't necessarily needed.

The R^2 value for this linear regression model is 0.808, which is close to 1 and suggests that there is evidence of a strong linear relationship between body mass and bill length.

To confirm this, let's look at the violin plot residuals from this model:

All of the plots are unimodal and centered around zero, affirming our previous idea that a linear regression model would be appropriate for the data.

Final Remarks

After including an interaction term between a penguin's species and another independent variable, it was found that the data supported evidence of a strong linear relationship between bill length vs. flipper length, body mass vs. flipper length, and body mass vs. bill length, as shown by relatively high R^2 values around 0.8 when we conducted a linear regression analysis. To that end, we can conclude that increases in bill length are correlated with increases in flipper length, and likewise similar positive correlations with the other two relationships investigated.

It is interesting that although the heatmap gave different correlation values for all three relationships, most of the R^2 values generated were around the same number. This is likely due to the inclusion of the interaction term included for Species. In the scatterplots shown earlier, some species of penguin had observations in a different part of the graph, and so by accounting for these species differences with the interaction term, the linear regression models were able to attain similar levels of accuracy.

Next, let's move onto machine learning with our dataset!

Machine Learning

We can use powerful machine learning algorithms to analyze our penguin dataset. In particular, this project will be investigating how three machine learning algorithms for classification: k-NN, Decision Trees, and Random Forests perform on our dataset. We will be using accuracy to score the three previously identified algorithms. This is because we are particularly interested in the percentage of correct predictions of penguin species made based on our predictors.

k-NN Classification

k-NN Setup

The hyperparameters chosen to focus on were n_neighbors and weights.

First a grid search was performed on n_neighbors/weights due to the large impact both can have our model - one of the most important thing for k-Nearest Neighbors is determining the k (i.e. how many nearest neighbors there can be), and the weights parameter can influence whether points closer to the query point have less or greater influence, which would also have a large impact on the model. After performing grid search, it was determined that the k should be set to 1, and weights should be uniform.

Results

After performing grid search and 10-fold cross validation, our k-NN classification has a mean 0.86 accuracy with a standard deviation of 0.05. Dividing our standard deviation by sqrt(10) to find the standard error would give a standard error of 0.0158. This means that on average, our k-NN classification algorithm correctly classified penguins into their species with an accuracy rate of 86%. This number isn't low, but let's run the rest of our classfication algorithms to get a better idea of their accuracy rates as well:

Decision Tree

Decision Tree Setup

In this case, we chose to focus on criterion, splitter, max_features, and max_depth as hyperparameters.

First, criterion could be calculated with either gini or entropy. Since entropy and information gain were explored in class, we wanted to examine if the criterion for calculating how "good" a split was had a large impact on the model's accuracy, thus, we chose to include these in the dictionary of hyperparameters to see if calculating a split based off of gini would be better than using entropy, or vice versa.

The choice of splitter was also considered, as choosing the best split versus choosing the split randomly could have an effect on the accuracy of the model.

Max_depth and max_features were also focused on due to their potential for overfitting (and thus, the subsequent use of GridSearchCV to minimize overfitting).

After running a grid search on these parameters, it was determined that the entropy criterion should be used to determine the best subtrees for splitting, the best split would be chosen, sqrt(n_features) would be considered when splitting, and the max_depth would be 4.

The decision tree has also been printed above.

Results

After performing grid search and 10-fold cross validation, it was caclulated that the decision tree had a mean 0.95 accuracy with a standard deviation of 0.02. Dividing our standard deviation by sqrt(10) to find the standard error would give a standard error of 0.00632. This means that on average, our decision tree classification algorithm correctly classified penguins into their species with an accuracy rate of 95%. This number is quite high and close to 100%, which increases our confidence in using this decision tree to classify penguins in the future.

However, let's proceed with random forest classification to see if we can get a higher accuracy rate:

Random Forest

Setup

In this case, we chose to focus on criterion, max_depth, and max_features as hyperparameters, similar to what was done for our Decision Tree classfier.

First, as mentioned before, criterion was included in our dictionary of hyperparameters to see if calculating a split based off of gini would be better than using entropy, or vice versa.

Max_depth and max_features were also included in the grid search due to their potential for overfitting (as large values could lead to poorer accuracy).

Thus, after running a grid search on these parameters, it was determined that the entropy criterion should be used to determine the best subtrees for splitting, sqrt(n_features) would be considered when splitting, and the max_depth would be 7. This is interesting as the max_depth calculated for our decision tree was 4 instead, but this likely makes sense due to how decision trees work: i.e. it generates many decision trees based off randomly selected attributes and aggregates them together for a final result.

Results

After performing grid search and 10-fold cross validation, it was calculated that the decision tree had a mean 0.98 accuracy with a standard deviation of 0.02. Dividing our standard deviation by sqrt(10) to find the standard error would give a standard error of 0.00632. This means that on average, our random forest classification algorithm correctly classified penguins into their species with an accuracy rate of 98%. This number is quite high and close to 100%, which increases our confidence in using a random forest to classify penguins in the future.

Overall Remarks on Machine Learning

After running three machine learning classification algorithms on the penguin dataset, our k-NN algorithm had an accuracy rate of 0.86 with a standard error of 0.0158, the decision tree had an accuracy of 0.95 with a standard error of 0.00632, and the random forest had an accuracy of 0.98 with a standard error of 0.00632. Among all three algorithms we ran, the random forest seems to have the highest accuracy rate and lowest standard error. This higher accuracy rate is likely due to how random forests are calculated, as instead of creating one decision tree (as seen in our decision tree algorithm), multiple are created and then aggregated.

Conclusion

After doing our exploratory data analysis and machine learning, we can conclude quite a lot from this penguin dataset. From our intraspecies gender analysis using overlapping histograms, it was found that across all three species of penguin, females tended to have shorter bill lengths, bill depths, flipper lengths, and body mass when compared to males. Furthermore, after lookinng at the interspecies violin plots of attributes, it was noted that Adelie penguins tended to have shorter bill lengths and flipper lengths when compared to Chinstrap and Gentoo penguins. Conversely, Gentoo penguins appeared to have the longest bill lengths, flipper lengths, and body mass when compared to Adelie and Chinstrap penguins.

After generating scatterplots and conducting linear regression analysis, it was found that strong evidence of a linear relationship exists for the following relationships: bill length and flipper length, body mass and bill length, and body mass and flipper length. These linear regression lines were calculated using an interaction term between a penguin's species and the other independent variable, as it was noted on the scatterplots that Chinstrap penguins tended to have longer bill lengths at the body mass level of Adelie and Gentoo penguins.

Finally, after conducting machine learning algorithms on the data, it was found that after using k-NN, Decision Trees, and Random Forests, random forests had the highest accuracy rate, at around 0.98.

This project was truly a valuable exercise in learning how to use datasets, clean them, and analyze them for valuable solutions. It highlighted how important data science can be to all areas of life, but in this case, especially towards animal conservation efforts and animal scientists. Studying data on animals and drawing conclusions about correlations between different animal attributes could lead scientists to discover more effective ways to treat animals in cases of illness. Furthermore, such knowledge could even help aid wildlife conservation efforts in the future. For future investigations, it would be interesting to study data on penguin-related disease infection rates and attempt to draw further conclusions using even more data.

Further Reading

If you're interested in reading up more on topics related to this notebook, here are some resources I've found helpful: