In our class discussion today, we explored the concept of the p-value. To summarize my understanding, a p-value is the probability of observing a result at least as extreme as the one actually observed, assuming the null hypothesis is true. In practical terms, a small p-value (commonly below a threshold such as 0.05) is evidence against the null hypothesis, so we reject it. A larger p-value means the data do not give us enough evidence to reject the null hypothesis, which is not the same as showing that the null hypothesis is true.
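To make the definition concrete for myself, here is a minimal sketch of a permutation test, which estimates a p-value directly as "how often does random shuffling produce a difference at least as large as the one observed". The function name `permutation_p_value` and the two small groups of numbers are made up for illustration and are not from our project data.

```python
import random

def permutation_p_value(group_a, group_b, n_permutations=10_000, seed=0):
    """Estimate a two-sided p-value for the difference in group means.

    Null hypothesis: the group labels do not matter, so shuffling the
    labels should produce differences as large as the observed one
    fairly often.
    """
    rng = random.Random(seed)
    n_a = len(group_a)

    def mean_diff(values):
        first, second = values[:n_a], values[n_a:]
        return abs(sum(first) / len(first) - sum(second) / len(second))

    pooled = list(group_a) + list(group_b)
    observed = mean_diff(pooled)

    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        # Small tolerance so floating-point rounding does not hide ties.
        if mean_diff(pooled) >= observed - 1e-12:
            extreme += 1

    # Add 1 to both counts so the estimate is never exactly zero.
    return (extreme + 1) / (n_permutations + 1)

# Hypothetical values, purely for illustration.
rural = [34.1, 35.6, 33.8, 36.2, 35.0]
urban = [30.2, 31.5, 29.8, 32.0, 30.9]
print(f"estimated p-value: {permutation_p_value(rural, urban):.4f}")
```

A small result here would mean that a gap this large rarely appears when the labels are shuffled at random, which is exactly the sense in which a small p-value counts as evidence against the null hypothesis.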
Regarding the project, digging deeper into the data left me with a lot of questions. Our group is currently focused on identifying health disparities between rural and urban populations. This led me to consider which factors, aside from ‘overall SVI’ (Social Vulnerability Index), are crucial for our analysis.
After discussing this with one of my fellow group members, we decided to work with datasets covering various social determinants of health, broken down by the urban or rural status of each county. Specifically, I took on the task of comparing food access percentages with obesity data in urban and rural counties and checking how the two correlate.
I started by formatting the datasets to suit this goal, which included merging the two datasets containing food access and obesity data using Python. After fitting a linear regression, I found an R-squared value of 0.0295. This means the independent variables in our model explain only about 3% of the variance in the dependent variable, so they have little to no explanatory power and the model’s fit is far from ideal.
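The merge-then-regress workflow can be sketched roughly as follows. This is not my actual project code: the county codes and values are invented, the helper names `merge_on_key` and `simple_linear_regression` are hypothetical, and I am assuming food access is the predictor and obesity the outcome. It uses only the standard library; a library such as pandas would normally handle the merge on real data.

```python
from statistics import mean

# Hypothetical county-level values keyed by a county code (made-up data).
food_access = {"01001": 12.5, "01003": 8.0, "01005": 20.0, "01007": 15.0}
obesity = {"01001": 31.0, "01003": 29.0, "01005": 36.0, "01009": 33.0}

def merge_on_key(left, right):
    """Inner join: keep only the counties present in both datasets."""
    shared = sorted(left.keys() & right.keys())
    return [(key, left[key], right[key]) for key in shared]

def simple_linear_regression(x, y):
    """Ordinary least squares for y = intercept + slope * x.

    Returns (slope, intercept, r_squared).
    """
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    # R^2 = 1 - (residual sum of squares) / (total sum of squares)
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    return slope, intercept, 1 - ss_res / ss_tot

merged = merge_on_key(food_access, obesity)
x = [row[1] for row in merged]  # assumed predictor: food access
y = [row[2] for row in merged]  # assumed outcome: obesity
slope, intercept, r_squared = simple_linear_regression(x, y)
print(f"counties={len(merged)} slope={slope:.3f} R^2={r_squared:.3f}")
```

Note that the inner join silently drops counties that appear in only one dataset, so it is worth comparing the row counts before and after the merge.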
As part of my ongoing efforts, I plan to improve the model’s performance. This will involve removing outliers from the dataset and adding further factors to the model. The aim is to refine our analysis and gain more meaningful insights into the health disparities between rural and urban populations.
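I have not settled on how to define an outlier yet. One common option is the 1.5 × IQR rule, sketched below as an assumption rather than a final choice. The helper names `iqr_bounds` and `drop_outlier_rows`, the column name, and the sample rows are all hypothetical.

```python
from statistics import quantiles

def iqr_bounds(values, k=1.5):
    """Return (low, high) cutoffs using the k * IQR rule."""
    q1, _, q3 = quantiles(values, n=4)  # quartiles
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def drop_outlier_rows(rows, column, k=1.5):
    """Keep only rows whose value in `column` lies within the IQR bounds."""
    low, high = iqr_bounds([row[column] for row in rows], k)
    return [row for row in rows if low <= row[column] <= high]

# Made-up rows for illustration; the last value is deliberately extreme.
rows = [{"obesity_pct": v} for v in [30, 31, 29, 32, 30, 31, 95]]
cleaned = drop_outlier_rows(rows, "obesity_pct")
print(f"kept {len(cleaned)} of {len(rows)} rows")
```

Before dropping anything from the real data, I would want to check whether a flagged county is a data error or a genuinely unusual county, since the latter may be exactly the kind of disparity we are looking for.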