Explored data and tried analyzing patterns– The dataset includes three main variables: inactivity, obesity, and diabetes. I examined the data in detail alongside different indicators and social determinants such as economics, food access, healthcare, and the Social Vulnerability Index (SVI), a tool used by the CDC to spatially identify “at-risk” populations. I also attempted to correlate these data points and identify patterns among them, including the SVI.
In class, we discussed relating diabetes to inactivity, and through this example I learned more statistical terms, such as “kurtosis”. Kurtosis is a measure of the tailedness of a distribution, the tails being its tapering ends. The second important term discussed today was “heteroskedasticity”. In layman’s terms, it can be explained as the “fanning out” of data: the spread of the residuals changes across the range of the predictor. The stronger the heteroskedasticity, the less reliable the model's standard errors, and therefore its inferences, become. A test used to detect heteroskedasticity in a linear regression model is the Breusch-Pagan test.
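To make the two terms concrete, here is a minimal sketch in plain Python, not the method used in class. It computes excess kurtosis (the fourth standardized moment minus 3) and the studentized (Koenker) form of the Breusch-Pagan statistic for a single predictor. The function names and the synthetic "fanning" and "flat" series are my own assumptions for illustration; they are not values from the CDC dataset.

```python
import math


def excess_kurtosis(values):
    """Population excess kurtosis: fourth standardized moment minus 3."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return m4 / m2 ** 2 - 3.0


def fit_line(x, y):
    """Ordinary least squares for y = a + b*x; returns (a, b, r_squared)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = my - b * mx
    ss_tot = sum((yi - my) ** 2 for yi in y)
    ss_res = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    # Clamp at 0 so rounding error can never produce a negative R^2.
    r2 = max(0.0, 1.0 - ss_res / ss_tot) if ss_tot > 0 else 0.0
    return a, b, r2


def breusch_pagan(x, y):
    """Studentized (Koenker) Breusch-Pagan test with one predictor.

    Regress y on x, then regress the squared residuals on x.
    LM = n * R^2 of that auxiliary regression, compared against a
    chi-square distribution with 1 degree of freedom.
    """
    a, b, _ = fit_line(x, y)
    sq_resid = [(yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y)]
    _, _, r2_aux = fit_line(x, sq_resid)
    lm = len(x) * r2_aux
    # Chi-square(1) survival function: P(Z^2 > lm) = erfc(sqrt(lm / 2)).
    p_value = math.erfc(math.sqrt(lm / 2.0))
    return lm, p_value


# Synthetic illustration only (assumed data, not the class dataset).
x = [float(i) for i in range(1, 41)]
y_fan = [2.0 * xi + 0.5 * xi * (-1) ** i for i, xi in enumerate(x, start=1)]
y_flat = [2.0 * xi + 3.0 * (-1) ** i for i, xi in enumerate(x, start=1)]

print("excess kurtosis of x:", excess_kurtosis(x))
print("fanning series (LM, p):", breusch_pagan(x, y_fan))
print("flat series (LM, p):", breusch_pagan(x, y_flat))
```

A small p-value suggests the residual spread depends on the predictor, which is the "fanning out" described above. For real analysis with several predictors, a library routine such as `het_breuschpagan` in statsmodels would be the more usual choice than this hand-rolled version.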
Group meeting– Today, after discussing the possibilities of the dataset with the group, we collectively decided to concentrate on examining the prevalence and underlying factors of inactivity, obesity, and diabetes among rural and urban populations.
I brushed up on my Python skills and used Python to display the data for urban and rural populations by two factors, food access and SVI, and plotted graphs of the results.
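The urban versus rural comparison can be sketched roughly as follows. This is not my actual notebook code: the field names (`area`, `svi`, `food_access`, `diabetes_pct`) and every number in `records` are placeholders I made up for illustration, and the text bars stand in for a real plot.

```python
from collections import defaultdict
from statistics import mean

# Placeholder rows for illustration only, NOT values from the CDC dataset.
records = [
    {"area": "urban", "svi": 0.25, "food_access": 0.10, "diabetes_pct": 8.0},
    {"area": "urban", "svi": 0.75, "food_access": 0.20, "diabetes_pct": 10.0},
    {"area": "rural", "svi": 0.50, "food_access": 0.40, "diabetes_pct": 11.0},
    {"area": "rural", "svi": 1.00, "food_access": 0.60, "diabetes_pct": 13.0},
]


def group_means(rows, group_key, value_keys):
    """Average each value column separately within each group."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row[group_key]].append(row)
    return {
        group: {key: mean(r[key] for r in members) for key in value_keys}
        for group, members in grouped.items()
    }


summary = group_means(records, "area", ["svi", "food_access", "diabetes_pct"])

# Crude text bar chart of mean diabetes prevalence per group.
for area, stats in sorted(summary.items()):
    bar = "#" * round(stats["diabetes_pct"])
    print(f"{area:<6} SVI={stats['svi']:.2f} "
          f"food_access={stats['food_access']:.2f} {bar}")
```

The same `summary` dictionary could be handed to a plotting library to draw grouped bar charts instead of text bars.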
Next, I plan to discuss my findings with my group and instructor.