COURSE: CSCE 155T

Introduction to Computer Science I: Informatics

Designing Methods to Estimate US Poverty Data Using Spatial Interpolation and Ethical Considerations

Course Description:

This 16-week introductory course offers a foundation in computer science and data science, equipping students with essential programming, algorithmic thinking, and problem-solving skills. Using Python as the primary programming language, the course covers fundamental programming constructs, data structures, file handling, and an introduction to data visualization techniques. By applying these skills to real-world datasets and problems, students gain practical experience and develop interdisciplinary connections. Designed for students without prior programming experience, the course aims to create a supportive learning environment that inspires students to further hone their computational abilities and critical thinking skills.


Module Topic: Designing Methods to Estimate US Poverty Data Using Spatial Interpolation and Ethical Considerations


Note: This module focuses on the technical aspects and hands-on exercises related to spatial interpolation and data-driven decision-making. It builds upon the foundation established in previous lectures, where a guest speaker from the Department of Philosophy provided an introduction to ethical considerations in computer science applications.

Acknowledgment: This module was co-developed by Colton Harper and Zachariah Wrublewski.

Module Overview:

In this module, students work with an interactive Python notebook that guides them through the process of designing, coding, and evaluating a method to estimate US poverty data using spatial interpolation techniques. The primary focus is on exploring the ethical considerations involved in designing and implementing data-driven solutions, as well as understanding the implications of these decisions on various stakeholders.


Learning Objectives:

Through hands-on coding exercises, collaborative problem-solving, and reflective discussions, students will:
  1. Deepen their understanding of the ethical considerations involved in using spatial interpolation methods to estimate poverty data in real-world scenarios.
  2. Think critically about the potential consequences of their methodological choices for various stakeholders, both direct and indirect.
  3. Reinforce their ability to apply spatial interpolation techniques while considering the ethical dimensions of their work.

Key Questions:

  • What ethical considerations should be taken into account when using spatial interpolation methods to estimate poverty data?
  • How do the choices made in designing a method impact various stakeholders, both directly and indirectly?
  • How can students balance the need for accurate estimations with the ethical implications of their work?

Key Concepts:

  • Ethical dimensions in spatial interpolation and data-driven decision-making
  • Identifying and addressing the needs of direct and indirect stakeholders
  • Evaluating trade-offs and potential consequences of methodological choices in a design/development context
  • Balancing accuracy, fairness, and transparency in data-driven solutions
  • Developing a responsible and informed approach to the design and implementation of algorithms and data analysis methods
  • Navigating ethical dilemmas and complexities in computer science applications


Interpolation of Census Data¶

Interpolation content and illustrations adapted from: https://gisgeography.com/inverse-distance-weighting-idw-interpolation/

Utility Functions¶

All the utility functions and classes used in this notebook have been organized and stored in a separate Python file called poverty_interpolation_utils.py. This file contains the following classes: DataDownloader, CountyData, CountyPlotter, CensusData, SamplingMethods, IDWInterpolation, ErrorCalculator, and ErrorVisualizer.

You can find this file in the same directory as this notebook. If you need to modify any of the existing classes or functions, you can do so directly in the poverty_interpolation_utils.py file. Make sure to save your changes in the file before running the notebook again.

Alternatively, if you prefer to create new functions or modify existing ones directly in the notebook, you can do so by adding new code cells and defining your functions there. Remember to import any necessary modules or classes in the notebook as needed.

Keeping the utility functions and classes in a separate file keeps the notebook a bit more organized, clean, and focused.

In [1]:
import pandas as pd
import plotly.express as px
import numpy as np
import math
import requests
from urllib.request import urlopen
import json
import os
import matplotlib.pyplot as plt

from poverty_interpolation_utils import DataDownloader, CountyData, CountyPlotter, CensusData, SamplingMethods, IDWInterpolation, ErrorCalculator, ErrorVisualizer
censusVar = 'DP03_0120PE'


Scenario:¶


Background:

  • The year is 2018, and you have been hired as a consultant for the US Census Bureau. The American Community Survey (ACS), which typically comes out every five years, is facing budget constraints that prevent a full-scale survey. As a result, the Census Bureau can only survey 50% of US counties for poverty data.
  • Your task is to provide the Census Bureau with suggestions on how they should conduct the survey and recommend a method to supplement the data to best estimate the percent of families in poverty in the counties that were not selected to be surveyed.

Approach Overview:

  1. Determine the poverty indicator to be used in the survey.
  2. Select the counties to be surveyed (limited to 50% of the total counties).
  3. Use spatial interpolation methods to estimate the poverty rate in the unsampled counties.
  4. Design a performance measure to evaluate the accuracy of the estimations and the implications of the chosen method.
  5. Optimize the method to improve performance while considering ethical implications.

Use Case/Implications:

  • The US Census Bureau will disseminate the results to donors and government officials, who will use the information to develop policies to aid areas with higher poverty rates. The Census Bureau has not provided any additional information regarding the use of the results.

Approach Overview¶

Our Goal: Design a method to sample 50% of Nebraska counties to obtain the percentage of households in poverty. Then, use spatial interpolation methods to estimate the poverty rate in the remaining (unsampled) 50% of Nebraska counties.

Call to Action: As developers, we need to understand the elements of our design that can impact the performance of our method. Let's explore some essential design considerations and their tradeoffs.

General Approach:

  1. Identify the poverty indicator for the survey
  2. Select the counties to survey (50% of total counties)
  3. Estimate poverty values for unsampled counties using spatial interpolation
  4. Design a performance measure to evaluate our estimations
  5. Optimize our method to improve performance

1. Identify the poverty indicator for the survey¶


Though there are numerous poverty indicators, we'll focus on a single measure for the scope of this project.

We will use the following variable:

  • DP03_0120PE represents the county-level data for the percentage of families and people whose income in the past 12 months is below the poverty level. Specifically, it corresponds to impoverished households with related children of the householder under 18 years.
    • Because it is restricted to households with related children under 18, this variable tracks child poverty in Nebraska more closely than an overall poverty measure would.
    • More information about this variable can be found here.

1.b) Obtain Census Data for the Given Variable¶

Let's start by importing the census data.

We will fetch the ACS data for 2015 and 2020.

  • We will sample from the ACS 2020 data once we decide on a sampling method to test. Then, we can compare the interpolated values with the actual values.
  • We can use the ACS 2015 data to inform some of our design decisions, such as choosing a more informed sampling method.
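The DataDownloader and CensusData classes hide the details of the fetch. As a rough sketch of what a call like getCensusPovertyDataByYear_ne may do under the hood, the public Census ACS 5-year Data Profile endpoint can be queried for all Nebraska counties and the JSON response loaded into a DataFrame. Note that build_acs_url and rows_to_dataframe are hypothetical helper names introduced here for illustration; the actual utility implementation may differ:

```python
import pandas as pd

CENSUS_VAR = "DP03_0120PE"   # % of families with related children under 18 below poverty
NE_STATE_FIPS = "31"         # Nebraska's state FIPS code

def build_acs_url(year, variable=CENSUS_VAR, state=NE_STATE_FIPS):
    """Construct an ACS Data Profile query for all counties in one state."""
    base = f"https://api.census.gov/data/{year}/acs/acs5/profile"
    return f"{base}?get={variable},NAME&for=county:*&in=state:{state}"

def rows_to_dataframe(rows):
    """The API returns a header row followed by data rows; convert to a DataFrame."""
    df = pd.DataFrame(rows[1:], columns=rows[0])
    df[CENSUS_VAR] = pd.to_numeric(df[CENSUS_VAR])
    return df

# Example with a mocked response (a live call would use requests.get(url).json()):
mock_response = [
    [CENSUS_VAR, "NAME", "state", "county"],
    ["3.2", "Wayne County, Nebraska", "31", "179"],
    ["3.4", "Hamilton County, Nebraska", "31", "081"],
]
df = rows_to_dataframe(mock_response)
print(build_acs_url("2020"))
print(df)
```

The utility classes additionally merge in county centroid coordinates (Latitude/Longitude), which the interpolation step needs for distance calculations.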
In [2]:
# Instantiate the Classes
data_downloader = DataDownloader()
county_data = CountyData(data_downloader)
census_data = CensusData(county_data)
county_plotter = CountyPlotter()
In [3]:
# Download the necessary data using the instances:
data_downloader.download_fips_data()
data_downloader.download_county_centers()
county_plotter.download_geojson()
In [4]:
# Get Census Datasets
censusData2015 = census_data.getCensusPovertyDataByYear_ne("2015")
censusData2020 = census_data.getCensusPovertyDataByYear_ne("2020")
censusData2020.head()
Out[4]:
DP03_0120PE state county fips_id Latitude Longitude countyName stateName
0 3.2 31 179 31179 42.210746 -97.126243 Wayne County Nebraska
1 3.2 31 089 31089 42.459287 -98.784766 Holt County Nebraska
2 3.4 31 081 31081 40.877145 -98.021943 Hamilton County Nebraska
3 3.6 31 039 31039 41.915865 -96.788517 Cuming County Nebraska
4 3.9 31 165 31165 42.483806 -103.742605 Sioux County Nebraska

Visualizing Nebraska's 2015 Poverty Data¶

Let's take a quick look at Nebraska's poverty data for 2015.

We will visualize the data as a heatmap, where counties with high poverty rates (25%+) will appear in bright yellow-green or green. In contrast, counties with low poverty rates (5% and below) will be represented in dark blue.

In [5]:
import plotly.express as px
import plotly.io as pio
from IPython.display import display, Image

print("Percent of Houses Under the Poverty Line Across Counties in Nebraska")
fig = county_plotter.plotCountyData_ne(censusData2015)
img_bytes = pio.to_image(fig, format="png")  # Convert the figure to an image in memory (PNG format)
Image(img_bytes) 
Percent of Houses Under the Poverty Line Across Counties in Nebraska
Out[5]:

2. Sampling: Determine which counties to survey¶

Remember, we can only survey (sample from) 50% of Nebraskan counties. Which ones should we sample? Let's consider a few approaches.

Method #1¶

  • Random Sampling
    • This method is very straightforward. We can consider sampling any random 50% of counties in Nebraska.

Method #2¶

  • Can we do better than random sampling? Well, we'll have to think about it and test it out.
  • We have the 2015 data, which seems like it might correlate somewhat with the 2020 data. Perhaps we could reference the 2015 data to make an informed decision on which counties to sample.

Sample the 50% of counties that previously had the highest poverty rates in the last survey

  • The data from our methods will be used to distribute resources to help people in poverty. So, what if we identify the counties from 2015 that had the highest poverty rates and use those counties as our samples?

Method #3¶

Representative Sample

  • Let's also try sampling in a way where we try to get counties that have the highest poverty rates and the lowest poverty rates.
  • Based on the 2015 data, we can sample 25% of the counties that have the highest poverty rates.
  • We can also sample the 25% of counties that have the lowest poverty rates.

Other¶

There are many more ways to sample the data, many of which will likely lead to better results than the above sampling methods. We encourage you to think of some additional sampling methods.

  • Can you identify any better sampling methods?
  • Can you identify other sampling methods you should clearly steer clear of?
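To make the three strategies concrete, here is a minimal sketch of how each might be implemented over a DataFrame with fips_id and DP03_0120PE columns. These are illustrative functions only; the actual SamplingMethods class in poverty_interpolation_utils.py may differ in detail:

```python
import pandas as pd

def sample_random_half(df, seed=None):
    """Method #1: randomly sample 50% of counties."""
    return df.sample(frac=0.5, random_state=seed)

def sample_highest_poverty_half(df_prior, df_current, var="DP03_0120PE"):
    """Method #2: survey the counties with the highest poverty in the prior ACS."""
    n = len(df_prior) // 2
    top_fips = df_prior.nlargest(n, var)["fips_id"]
    return df_current[df_current["fips_id"].isin(top_fips)]

def sample_top_and_bottom_quarters(df_prior, df_current, var="DP03_0120PE"):
    """Method #3: 25% highest-poverty plus 25% lowest-poverty counties."""
    q = len(df_prior) // 4
    fips = pd.concat([df_prior.nlargest(q, var), df_prior.nsmallest(q, var)])["fips_id"]
    return df_current[df_current["fips_id"].isin(fips)]
```

Note that Methods #2 and #3 choose counties using the prior (2015) data but return rows from the current (2020) data, mirroring how the utility class takes both datasets as arguments.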
In [6]:
# Instantiate the `SamplingMethods` class:
sampling_methods = SamplingMethods()

# Modify the code to use the class instance and its methods:
sample_1 = sampling_methods.getHalfCounties_random(censusData2020)
sample_2 = sampling_methods.getHalf_highestPovCounties(censusData2015, censusData2020)
sample_3 = sampling_methods.get25PercLowestPov_25PercHighestPov(censusData2015, censusData2020)

Visualize Samples¶

Let's take a look at one of the sample set values plotted on a map.

Here, we'll consider sample_1, our random sample:

In [7]:
from IPython.display import Image, display

fig = county_plotter.plotCountyData_ne(sample_1)
img_bytes = pio.to_image(fig, format="png")  # Convert the figure to an image in memory (PNG format)
Image(img_bytes) 
Out[7]:

Task: Change the above code from sample_1 to one of the other two samples.¶

3.a) Interpolation Background: Use a method to estimate the poverty value of the counties we don't have samples for¶

We have sampled 50% of the counties in Nebraska and must estimate values for the remaining 50%. We can do so (with varying degrees of error) using interpolation methods.

Interpolation

  • When you are given a set of known values, interpolation helps you estimate the unknown values.
  • You may find an illustration of a simple case of "linear" interpolation below, where the red dots are the known values, and we are trying to estimate a value in between.
In [8]:
from IPython.display import Image, display
print("Linear Interpolation")
display(Image("https://gisgeography.com/wp-content/uploads/2016/05/Linear-Interpolation-2.png"))
Linear Interpolation

Spatial Interpolation¶

Spatial interpolation applies the same idea to data distributed in space (two or more dimensions).

Examples where it makes sense to apply spatial interpolation methods include:

  • Estimating the rainfall in various neighborhoods when you have some rainfall data of the surrounding neighborhoods.
    • It makes sense to apply spatial interpolation here because the data is spatially correlated. That is, rainfall 1 meter away is likely to be much more similar to rainfall here than rainfall 500 meters away.

Inverse Distance Weighting (IDW)¶

There are many spatial interpolation algorithms to choose from, e.g., IDW, kriging, spline, etc.

IDW is a simple spatial interpolation algorithm that is often used. We will only consider IDW for this lab.

How IDW Works:

IDW allows you to estimate one point by conducting a weighted average of the neighboring points around it. The farther away a point is, the less it contributes to the estimate.

There are two main settings to IDW that you can change to improve your estimations.

  • Number of Neighbors: the number of neighboring points to consider in your estimation. The default tends to be about 5 neighbors.
  • Power: governs how much neighboring points at different distances contribute to the estimated value.
    • A lower power lets farther points have a greater impact on the estimate
    • A higher power reduces the impact of farther neighbors
    • The default power tends to be 2
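The weighted-average idea, along with both settings, can be sketched in a few lines of Python. This is a simplified illustration of the IDW technique on (x, y) points, not the actual IDWInterpolation.interpolate implementation (which operates on county centroids in a DataFrame):

```python
import math

def idw_estimate(unknown_xy, known_points, power=2, num_neighbors=5):
    """Estimate a value at unknown_xy as the inverse-distance-weighted
    average of its num_neighbors nearest known points.

    known_points: list of ((x, y), value) pairs.
    """
    def dist(p):
        (x, y), _ = p
        return math.hypot(x - unknown_xy[0], y - unknown_xy[1])

    # Keep only the nearest neighbors, sorted by distance to the target.
    nearest = sorted(known_points, key=dist)[:num_neighbors]

    num, den = 0.0, 0.0
    for point in nearest:
        (_, _), value = point
        d = dist(point)
        if d == 0:             # target coincides with a known point
            return value
        w = 1.0 / d ** power   # higher power => farther points weigh less
        num += w * value
        den += w
    return num / den

# Two equidistant neighbors contribute equally, so the estimate is their mean:
known = [((1.0, 0.0), 10.0), ((0.0, 1.0), 20.0), ((0.0, 2.0), 30.0)]
print(idw_estimate((0.0, 0.0), known, power=2, num_neighbors=2))
```

Try raising power with num_neighbors=3: the far point at (0.0, 2.0) will matter less and the estimate will move toward the two closer values.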

The image below illustrates a plane with four known values (in red) and a value we want to interpolate (in purple). In this illustration, the estimate of the unknown point will be some weighted average of 3 of its closest neighbors. We can vary the number of neighbors we consider in pursuit of better results.

In [9]:
print("Inverse Distance Weighting with 3 points")
display(Image("https://gisgeography.com/wp-content/uploads/2016/05/IDW-3Points.png"))
Inverse Distance Weighting with 3 points

Power Setting: 1

Immediately below, you'll find an illustration where an unknown point is estimated using a 3-neighbor IDW with a power of 1. The image that follows illustrates the same data and settings, except with the power set to 2.

While the difference isn't large in this case, you can see that with power = 1 the estimated value is lower because the farther points have more impact.

In the latter illustration, with power = 2, the estimated value is greater because the closest point contributes more to the weighted average.

In [10]:
print("Illustration of spatial interpolation with a power of 1")
display(Image("https://gisgeography.com/wp-content/uploads/2016/05/IDW-Power1.png"))

print("Illustration of spatial interpolation with a power of 2")
display(Image("https://gisgeography.com/wp-content/uploads/2016/05/IDW-Power2.png"))

print("Inverse Distance Weighting formula")
display(Image("https://gisgeography.com/wp-content/uploads/2016/05/idw-formula.png"))
Illustration of spatial interpolation with a power of 1
Illustration of spatial interpolation with a power of 2
Inverse Distance Weighting formula

Formula: You only need to understand this at a conceptual level, so feel free to ignore this formula. If you would like to develop a stronger mathematical intuition, here is the general formula for IDW.

3.b) Apply Interpolation to our Dataset¶

Notice that we use a custom method, interpolate(), to which we provide the censusData, our sampled data, the IDW power, and the number of neighbors.

In this example, our sample is sample_1, a random sample of 50% of the counties in Nebraska. Our power is set to the value of 2, and the number of neighbors, numNeighbors, is set to consider the nearest 5 neighbors.

In [12]:
# Instantiate the `IDWInterpolation` class:
idw_interpolation = IDWInterpolation()

# Interpolate the values
sample_1_interp = idw_interpolation.interpolate(censusData2020, sample_1, power=2, numNeighbors=5)

# Plot a sampled and interpolated data
fig = idw_interpolation.plotSampleWithInterp(sample_1, sample_1_interp, county_plotter)

img_bytes = pio.to_image(fig, format="png")  # Convert the figure to an image in memory (PNG format)
Image(img_bytes) 
Out[12]:

Performance Measure¶

How do we measure the performance of our model?¶

There is no straightforward answer to this. We can consider some common measures, but we may want to develop a measure more customized for our context. In general, our performance measure is going to be some function of the error between our poverty rate estimates and the actual value.

Recall the General Goal: To develop a map with estimated values that may eventually be used to inform how funds are distributed to people in poverty.

We want our performance measure to align with this aim. We can change elements of our method design, and check the performance using our performance measure. We can iteratively adapt the design and optimize to find what we think will be the 'best' design.

Performance Measures We'll Consider:

  1. Average Percent Error
  • In general, the error of an estimate is just: actualValue - estimatedValue
  • The average percent error is just the average of these errors
  • Limitation: we can report a 0% average error even when the individual errors are large. If we overestimate one value by 10% and underestimate another by 10%, the average error of those two estimates is 0%.
  2. Average Absolute Error
  • Average absolute error takes the average of the absolute value of each percent error. This way, we can see the average magnitude of error.
  • Limitation: large errors are weighted only in proportion to their size, so a few very bad estimates can hide among many good ones
  3. Mean Squared Error
  • The mean squared error squares each of the errors and then averages them.
  • Since each error is squared before averaging, large errors penalize the performance score much more than small errors do.
  • Limitation: the mean squared error is a relative measure and is mainly meaningful when comparing two models.
  4. Root Mean Squared Error
  • This measure is the square root of the mean squared error, which puts the score back in the original units (percentage points)
  • It is often used in machine learning
  5. Binary Classification of Error Below a Threshold
  • This measure calculates the percentage of estimations whose absolute error is less than a certain value (e.g., what percent of our estimations have less than a 3% error?)
  6. Measures 1-5 by Poverty Quartile (lowest poverty, "middle-low" poverty, "middle-high" poverty, and highest poverty)
  • We want high-accuracy estimates particularly for counties in high poverty, since resource distribution decisions may be made on that basis.
  • It may make sense to look at the error measures among subgroups. This measure computes the measures above separately for each poverty quartile.
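The error measures above (average, absolute, squared, root-mean-squared, and threshold-based) can be sketched as a single function over paired actual/estimated values. This is an illustration of the definitions, not the actual ErrorCalculator code, which may differ in detail:

```python
import math

def error_measures(actual, estimated, threshold=3.0):
    """Compute the basic performance measures for paired actual/estimated
    poverty rates (both in percentage points)."""
    errors = [a - e for a, e in zip(actual, estimated)]
    n = len(errors)
    mse = sum(err ** 2 for err in errors) / n
    return {
        "Average Error": sum(errors) / n,                        # signed errors can cancel out
        "Average Absolute Error": sum(abs(e) for e in errors) / n,
        "Mean Squared Error": mse,                               # penalizes large errors heavily
        "Root Mean Squared Error": math.sqrt(mse),               # back in percentage points
        "Percent Under Error Threshold":
            100.0 * sum(abs(e) < threshold for e in errors) / n,
    }

# One +1 and one -2 percentage-point error: the signed average nearly cancels,
# while the absolute and squared measures do not.
print(error_measures([10.0, 10.0], [9.0, 12.0]))
```

The quartile variant (measure 6) simply splits the counties into four groups by actual poverty rate and applies this same function to each group.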
In [13]:
# Get the real values that correspond to the interpolated values
sample_1_interp_withActual = idw_interpolation.getRealvaluesGivenInterpolated(censusData2020, sample_1_interp)

# Calculate the error
error_calculator = ErrorCalculator()
percentBound = 3
errors = error_calculator.getErrors(sample_1_interp_withActual, percentBound, printErrors=True)
Average Error: -1.4350797291601014
Average Absolute Error: 4.832253826027726
Mean Squared Error: 39.79716386439208
Root Mean Squared Error: 6.308499335372247
Percent Predicted With Smaller than a 3% error: 36.17%


In [14]:
result = error_calculator.getErrorsByQuartile(sample_1_interp_withActual, percentBound, printErrors=True)
quartile_errors = result['quartile_errors']
poverty_ranges = result['poverty_ranges']
quartile_errors
ERROR FOR QUARTILE #1 -- 3.2% poverty to 6.7% poverty:

ERROR FOR QUARTILE #2 -- 6.8% poverty to 10.8% poverty:

ERROR FOR QUARTILE #3 -- 10.9% poverty to 15.3% poverty:

ERROR FOR QUARTILE #4 -- 15.4% poverty to 36.0% poverty:
Out[14]:
{'Quartile 1 Errors': {'Average Error': 5.036136112119926,
  'Average Absolute Error': 5.036136112119926,
  'Mean Squared Error': 26.95467124810375,
  'Root Mean Squared Error': 5.191788829305729,
  'Percent Under Error Threshold': '0.0%'},
 'Quartile 2 Errors': {'Average Error': 0.40718513287355773,
  'Average Absolute Error': 2.4841513297567284,
  'Mean Squared Error': 9.767625895584974,
  'Root Mean Squared Error': 3.125320126896599,
  'Percent Under Error Threshold': '80.0%'},
 'Quartile 3 Errors': {'Average Error': -2.2133597558727853,
  'Average Absolute Error': 2.973863697072576,
  'Mean Squared Error': 12.979338252317518,
  'Root Mean Squared Error': 3.6026848671952307,
  'Percent Under Error Threshold': '61.538%'},
 'Quartile 4 Errors': {'Average Error': -8.598379593196103,
  'Average Absolute Error': 8.598379593196103,
  'Mean Squared Error': 106.7169158677671,
  'Root Mean Squared Error': 10.330387982441275,
  'Percent Under Error Threshold': '8.333%'}}
In [15]:
error_visualizer = ErrorVisualizer()  # Create an instance of ErrorVisualizer class
error_visualizer.plot_error_barchart(quartile_errors, poverty_ranges)

Example Interpretation¶

Based on example quartile errors provided below, we can interpret the performance of the interpolation method and sampling techniques on the poverty data. Here's an example interpretation of the errors and insights that can help you with the exercises:

Quartile Errors and Performance¶

  1. Quartile 1 (3.2% to 7.6% poverty): High errors, indicating lower accuracy for counties with lower poverty rates.

    • Average Absolute Error: 6.73%
    • Root Mean Squared Error: 7.14%
    • Predictions within 3% error threshold: 8.33%
  2. Quartile 2 (8.5% to 10.9% poverty): Lower errors, suggesting better performance for counties with mid-range poverty rates.

    • Average Absolute Error: 2.24%
    • Root Mean Squared Error: 2.85%
    • Predictions within 3% error threshold: 81.82%
  3. Quartile 3 (11.0% to 13.4% poverty): Low errors, indicating good performance for counties with mid-range poverty rates.

    • Average Absolute Error: 1.79%
    • Root Mean Squared Error: 2.3%
    • Predictions within 3% error threshold: 83.33%
  4. Quartile 4 (13.6% to 36.0% poverty): High errors, suggesting lower accuracy for counties with higher poverty rates.

    • Average Absolute Error: 6.34%
    • Root Mean Squared Error: 8.89%
    • Predictions within 3% error threshold: 41.67%

Methodology and Implications¶

  • Sampling Method: Random selection (getHalfCounties_random function). May not be representative of the entire dataset, leading to uneven distribution and affecting interpolation accuracy.
  • Interpolation Method: Inverse Distance Weighting (IDW) technique (standard_idw function). The choice of power and the number of nearest neighbors can influence accuracy.

The interpolation method performs well for mid-range poverty rates (Quartiles 2 and 3) but has higher errors for lower and higher poverty rates (Quartiles 1 and 4). Possible reasons:

  1. Random sampling method might not provide a representative sample across poverty levels, leading to insufficient data for accurate interpolation.
  2. The choice of power and the number of nearest neighbors in IDW method might not be optimal for all poverty levels, causing bias in estimates.

These design decisions have implications:

  1. Inaccurate poverty estimates can impact resource allocation and support for people in need, causing disparities in assistance.
  2. Random sampling method might not capture spatial patterns of poverty effectively, leading to less accurate interpolation.
  3. The choice of power and the number of nearest neighbors in IDW method can influence trade-offs between accuracy, fairness, and transparency, with ethical implications.

Optimize and Reflect: Interpolation, Sampling, and Ethics¶

In this section, you will work on improving the interpolation method settings and the sampling techniques. You will also discuss and analyze the ethical implications, development aspects, and stakeholder considerations of your approach.

Exercise 1: Interpolation Methods and Sampling Techniques

  1. Modify the code below and try different interpolation method settings on different samples. Record your results for comparison.
  2. Experiment with various sampling methods and analyze their performance using the performance measures discussed earlier.

Exercise 2: Identifying Stakeholders and Evaluating Impacts

  1. Identify the direct and indirect stakeholders impacted by your methodological choices.
  2. Analyze the potential consequences of your choices on these stakeholders, considering issues like accuracy, fairness, and transparency.

Exercise 3: Ethical Considerations and Development Aspects

  1. Discuss the ethical implications of using different sampling methods and interpolation techniques when estimating poverty rates.
  2. Reflect on the potential consequences of inaccurate estimates on the distribution of resources to people in poverty, and the implications for sustainable development goals.

Exercise 4: Balancing Trade-offs and Navigating Dilemmas

  1. Evaluate the trade-offs between accuracy, fairness, and transparency in your methodological choices.
  2. Explore potential solutions to ethical dilemmas you encounter in designing your methods, considering the needs and concerns of various stakeholders.

Collaborative Activity: Group Discussion and Presentation

  1. Form small groups to discuss your findings from the exercises above. Share your insights and debate the pros and cons of different models, sampling methods, and performance measures in the context of ethical considerations and development impacts.
  2. Each group will present their most promising model, sampling method, and performance measure, along with their reasoning and a discussion of ethical considerations, stakeholder implications, and development aspects.
In [18]:
# Interpolate the values
power = 2         # consider changing this value and observe/interpret any changes in performance
numNeighbors = 5  # consider changing this value and observe/interpret any changes in performance
sample = sample_2 # consider changing this value and observe/interpret any changes in performance
sample_2_interp = idw_interpolation.interpolate(censusData2020, sample, power, numNeighbors)

# Plot a sampled and interpolated data
fig = idw_interpolation.plotSampleWithInterp(sample_2, sample_2_interp, county_plotter)
img_bytes = pio.to_image(fig, format="png")  # Convert the figure to an image in memory (PNG format)
Image(img_bytes) 
Out[18]:
In [19]:
# Get the real values that correspond to the interpolated values
sample_2_interp_withActual = idw_interpolation.getRealvaluesGivenInterpolated(censusData2020, sample_2_interp)

# Calculate the error
percentBound = 3
overallErrors = pd.DataFrame.from_dict(error_calculator.getErrors(sample_2_interp_withActual, percentBound), orient="index", columns=["Sample Results"])

errors_by_quartile_result = error_calculator.getErrorsByQuartile(sample_2_interp_withActual, percentBound)
errorsByQuartile = pd.DataFrame.from_dict({(key, sub_key): value for key, sub_dict in errors_by_quartile_result['quartile_errors'].items() for sub_key, value in sub_dict.items()}, orient="index").unstack().droplevel(0, axis=0)
In [20]:
overallErrors
Out[20]:
Sample Results
Average Error 1.561561
Average Absolute Error 4.43028
Mean Squared Error 29.053239
Root Mean Squared Error 5.390106
Percent Under Error Threshold 36.956%
In [21]:
errorsByQuartile
Out[21]:
(Quartile 1 Errors, Average Error)                     7.468757
(Quartile 1 Errors, Average Absolute Error)            7.468757
(Quartile 1 Errors, Mean Squared Error)               61.144233
(Quartile 1 Errors, Root Mean Squared Error)           7.819478
(Quartile 1 Errors, Percent Under Error Threshold)         0.0%
(Quartile 2 Errors, Average Error)                     2.739632
(Quartile 2 Errors, Average Absolute Error)            3.076405
(Quartile 2 Errors, Mean Squared Error)               14.758129
(Quartile 2 Errors, Root Mean Squared Error)           3.841631
(Quartile 2 Errors, Percent Under Error Threshold)      58.333%
(Quartile 3 Errors, Average Error)                    -0.502834
(Quartile 3 Errors, Average Absolute Error)            2.377943
(Quartile 3 Errors, Mean Squared Error)               11.941773
(Quartile 3 Errors, Root Mean Squared Error)           3.455687
(Quartile 3 Errors, Percent Under Error Threshold)      72.727%
(Quartile 4 Errors, Average Error)                    -3.139077
(Quartile 4 Errors, Average Absolute Error)            4.880194
(Quartile 4 Errors, Mean Squared Error)               29.617114
(Quartile 4 Errors, Root Mean Squared Error)           5.442161
(Quartile 4 Errors, Percent Under Error Threshold)      16.666%
dtype: object

Assignment: Census Bureau Brief and Recommendations

  1. In your lab groups, review the existing data and findings from your spatial interpolation exercises for estimating poverty rates in Nebraska. Given the scope of this lab, we didn't extend it to the United States, but use your findings from Nebraska for this assignment.

  2. Write a brief 'policy brief' (1-2 pages) that:

    a. Summarizes your findings from the exercises, focusing on the methods used, the ethical considerations, and the development implications.

    b. Discusses the challenges and limitations of using spatial interpolation methods for estimating poverty rates in Nebraska.

    c. Presents recommendations for using spatial interpolation methods, sampling techniques, and performance measures in Nebraska, along with justifications for their suitability.

    d. Reflects on the trade-offs, dilemmas, and potential consequences of implementing your recommendations in the context of Nebraska.

  3. Include a one-page executive summary of your policy brief at the beginning, summarizing the key points and recommendations.

Submit your policy brief along with any code you modified or developed during the exercises.

In [ ]: