I queried businesses and restaurants falling into one of the following Yelp categories: healthmarkets, grocery, salad, farmersmarkets, organic_stores, restaurants. Because Yelp categories are user-tagged, they do not yield very precise results, and many of the venues are misclassified or irrelevant. The Yelp data set was then split into two subsets: one of businesses containing the words “health”, “salad”, “farm”, “organic”, “natural”, “gourmet”, “fresh”, “vegan”, or “vegetarian” (healthy subset), and one of businesses containing the words “fried”, “wok”, “china”, “chinese”, “wings”, “ihop”, “mcdonalds”, “wendys”, “popeyes”, “kfc”, “dominos”, “burger”, “sbarro”, “nathans”, “panda-express”, or “white-castle” (unhealthy subset). These keywords were chosen after a cursory exploration of common alias terms and yielded ~14,000 unhealthy venues and ~1,200 healthy venues.
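A minimal sketch of this keyword split is given below. It assumes that matching is a case-insensitive substring test against each business's Yelp alias; the text does not specify the matching rule or how a venue matching both lists is handled, so the sketch simply lets such a venue appear in both subsets. The names `matches_any` and `split_venues` are hypothetical.

```python
HEALTHY_TERMS = (
    "health", "salad", "farm", "organic", "natural",
    "gourmet", "fresh", "vegan", "vegetarian",
)
UNHEALTHY_TERMS = (
    "fried", "wok", "china", "chinese", "wings", "ihop", "mcdonalds",
    "wendys", "popeyes", "kfc", "dominos", "burger", "sbarro", "nathans",
    "panda-express", "white-castle",
)


def matches_any(alias, terms):
    """Return True if any keyword occurs as a substring of the alias."""
    alias = alias.lower()
    return any(term in alias for term in terms)


def split_venues(aliases):
    """Split aliases into (healthy, unhealthy) lists by keyword match.

    Assumption: an alias matching both keyword lists is kept in both
    subsets; aliases matching neither list are dropped.
    """
    healthy = [a for a in aliases if matches_any(a, HEALTHY_TERMS)]
    unhealthy = [a for a in aliases if matches_any(a, UNHEALTHY_TERMS)]
    return healthy, unhealthy
```

Substring matching is deliberately loose, which fits the caveat above that the resulting subsets are imprecise: a keyword such as “farm” or “china” will match any alias that contains it, whatever the venue actually sells.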
The number of healthy venues per person in each PUMA is tabulated by simple aggregation. To calculate the average minimum distance to a healthy venue in each PUMA, I compute the distance from each randomly sampled point within the PUMA to its nearest healthy venue and then take the mean; the same process is applied to unhealthy venues. Because the points are sampled at random within each PUMA, this mean is a rough proxy for what I call the “average minimum distance” in that PUMA. Note that for any particular point, the nearest venue need not lie within that PUMA.
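The distance computation can be sketched as follows. This is an illustration under stated assumptions rather than the exact procedure used: the text does not name a distance metric, so the sketch assumes great-circle (haversine) distance in kilometres on (latitude, longitude) pairs, and it uses a brute-force search over all venues. The function names are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an assumption of this sketch


def haversine_km(p, q):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1 = math.radians(p[0]), math.radians(p[1])
    lat2, lon2 = math.radians(q[0]), math.radians(q[1])
    dlat, dlon = lat2 - lat1, lon2 - lon1
    a = (math.sin(dlat / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def average_min_distance(sample_points, venues):
    """Mean, over the sampled points, of the distance to the nearest venue.

    `venues` should hold every venue in the data set, not only those
    inside the PUMA, since the nearest venue may lie outside it.
    """
    nearest = [min(haversine_km(p, v) for v in venues) for p in sample_points]
    return sum(nearest) / len(nearest)
```

Calling `average_min_distance` once per PUMA with that PUMA's sampled points, first against the healthy venues and then against the unhealthy ones, gives the two per-PUMA averages. For data sets of the size reported here, a spatial index would be faster than the brute-force inner loop, but the result is the same.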