Assignment 4

In this assignment, you'll combine the assignment 3 data set with nutrition data from the USDA Food Composition Databases. The CSV file fresh.csv contains the fresh fruits and vegetables data you extracted in assignment 3.

The USDA Food Composition Databases have a documented web API that returns data in JSON format . You need a key in order to use the API. Only 1000 requests are allowed per hour, so it would be a good idea to use caching.

Sign up for an API key here. The key will work with any Data.gov API. You may need the key again later in the quarter, so make sure you save it.

These modules may be useful:

Exercise 1.1. Read the search request documentation, then write a function called ndb_search() that makes a search request. The function should accept the search term as an argument. The function should return the search result items as a list (for 0 items, return an empty list).

Note that the search url is: https://api.nal.usda.gov/ndb/search

As an example, a search for "quail eggs" should return this list:

[{u'ds': u'BL',
  u'group': u'Branded Food Products Database',
  u'name': u'CHAOKOH, QUAIL EGG IN BRINE, UPC: 044738074186',
  u'ndbno': u'45094707',
  u'offset': 0},
 {u'ds': u'BL',
  u'group': u'Branded Food Products Database',
  u'name': u'L&W, QUAIL EGGS, UPC: 024072000256',
  u'ndbno': u'45094890',
  u'offset': 1},
 {u'ds': u'BL',
  u'group': u'Branded Food Products Database',
  u'name': u'BUDDHA, QUAIL EGGS IN BRINE, UPC: 761934535098',
  u'ndbno': u'45099560',
  u'offset': 2},
 {u'ds': u'BL',
  u'group': u'Branded Food Products Database',
  u'name': u'GRAN SABANA, QUAIL EGGS, UPC: 819140010103',
  u'ndbno': u'45169279',
  u'offset': 3},
 {u'ds': u'BL',
  u'group': u'Branded Food Products Database',
  u'name': u"D'ARTAGNAN, QUAIL EGGS, UPC: 736622102630",
  u'ndbno': u'45178254',
  u'offset': 4},
 {u'ds': u'SR',
  u'group': u'Dairy and Egg Products',
  u'name': u'Egg, quail, whole, fresh, raw',
  u'ndbno': u'01140',
  u'offset': 5}]

As usual, make sure you document and test your function.

In [446]:
from urllib2 import Request, urlopen
from urlparse import urlparse, urlunparse
import requests
import requests_cache
import pandas as pd
import json
import os
from matplotlib import pyplot as plt
plt.style.use('ggplot')
%matplotlib inline
requests_cache.install_cache('usda_cache')

key = 'O6dynzQFxqlFUFzc9XoqGxr0froELHWUtxhW3aVR'

def ndb_search(search_term):
    """
    This queries the USDA Food composite database to make a search request
    Input: search_term (string of terms to search for)
    Output: json object (of search results)
    """
    urlbase = "https://api.nal.usda.gov/ndb/search"
    try: 
        usdareq = requests.get(urlbase + "?format=json" + "&q={}".format(search_term) + "&api_key={}".format(key))
        usda_json = usdareq.json()
        usda_json = usda_json['list']['item']
        return usda_json
    except (KeyError): 
        pass

ndb_search('potatoes raw')
Out[446]:
[{u'ds': u'SR',
  u'group': u'Vegetables and Vegetable Products',
  u'name': u'Potatoes, raw, skin',
  u'ndbno': u'11362',
  u'offset': 0},
 {u'ds': u'SR',
  u'group': u'Vegetables and Vegetable Products',
  u'name': u'Sweet potato, raw, unprepared',
  u'ndbno': u'11507',
  u'offset': 1},
 {u'ds': u'SR',
  u'group': u'Vegetables and Vegetable Products',
  u'name': u'Sweet potato leaves, raw',
  u'ndbno': u'11505',
  u'offset': 2},
 {u'ds': u'BL',
  u'group': u'Branded Food Products Database',
  u'name': u"BRAD'S RAW CHIPS, RAW CHIPS, SWEET POTATO, UPC: 854615002023",
  u'ndbno': u'45017998',
  u'offset': 3},
 {u'ds': u'SR',
  u'group': u'Vegetables and Vegetable Products',
  u'name': u'Potatoes, flesh and skin, raw',
  u'ndbno': u'11352',
  u'offset': 4},
 {u'ds': u'BL',
  u'group': u'Branded Food Products Database',
  u'name': u'RAW SWEET POTATO FRIES, UPC: 087738172100',
  u'ndbno': u'45184166',
  u'offset': 5},
 {u'ds': u'SR',
  u'group': u'Vegetables and Vegetable Products',
  u'name': u'Potatoes, russet, flesh and skin, raw',
  u'ndbno': u'11353',
  u'offset': 6},
 {u'ds': u'SR',
  u'group': u'Vegetables and Vegetable Products',
  u'name': u'Potatoes, white, flesh and skin, raw',
  u'ndbno': u'11354',
  u'offset': 7},
 {u'ds': u'SR',
  u'group': u'Vegetables and Vegetable Products',
  u'name': u'Potatoes, red, flesh and skin, raw',
  u'ndbno': u'11355',
  u'offset': 8},
 {u'ds': u'BL',
  u'group': u'Branded Food Products Database',
  u'name': u'RAW STUFFED CHICKEN BREASTS LOADED BAKED POTATO, UPC: 073461000056',
  u'ndbno': u'45218813',
  u'offset': 9}]

Exercise 1.2. Use your search function to get NDB numbers for the foods in the fresh.csv file. It's okay if you don't get an NDB number for every food, but try to come up with a strategy that gets most of them. Discuss your strategy in a short paragraph.

Hints:

  • The foods are all raw and unbranded.
  • You can test search terms with the online search page.
  • You can convert the output of ndb_search() to a data frame with pd.DataFrame().
  • The string methods for Python and Pandas are useful here. It's okay if you use simple regular expressions in the Pandas methods, although this exercise can be solved without them.
  • You can merge data frames that have a column in common with pd.merge().
In [456]:
path = os.path.join(os.getcwd(), 'fresh.csv')
fresh = pd.read_csv(path)
fresh['food'] = fresh['food'].str.replace("_", " ")
fresh_df = pd.DataFrame()
for i in fresh['food']:
    search = i + ' raw'
    j = i.replace(" ", "|")
    try: 
        results = pd.DataFrame(ndb_search(search))
        results = results.loc[results.ds.str.startswith("SR"), :]
        results = results.loc[results.group.str.startswith(("Fruits", "Vegetables")), :]
        if results.shape == (1,5):
            results["offset"] = i
            fresh_df = fresh_df.append(results)
        else:
            results = results[results['name'].str.contains(j, case = False)]
            try: 
                results["offset"] = i
                fresh_df = fresh_df.append(results.iloc[0])
            except (IndexError):
                pass

    except (AttributeError):
        pass
In [458]:
fresh_df = fresh_df.drop_duplicates()
fresh_df = fresh_df.rename(columns = {'offset':'food'})
fresh_df = fresh_df.merge(fresh)
fresh_df
Out[458]:
ds group name ndbno food form price_per_lb yield lb_per_cup price_per_cup type
0 SR Fruits and Fruit Juices Watermelon, raw 09326 watermelon Fresh1 0.333412 0.520000 0.330693 0.212033 fruit
1 SR Fruits and Fruit Juices Melons, cantaloupe, raw 09181 cantaloupe Fresh1 0.535874 0.510000 0.374786 0.393800 fruit
2 SR Fruits and Fruit Juices Tangerines, (mandarin oranges), raw 09218 tangerines Fresh1 1.377962 0.740000 0.407855 0.759471 fruit
3 SR Fruits and Fruit Juices Strawberries, raw 09316 strawberries Fresh1 2.358808 0.940000 0.319670 0.802171 fruit
4 SR Fruits and Fruit Juices Plums, raw 09279 plums Fresh1 1.827416 0.940000 0.363763 0.707176 fruit
5 SR Fruits and Fruit Juices Oranges, raw, California, valencias 09201 oranges Fresh1 1.035173 0.730000 0.407855 0.578357 fruit
6 SR Fruits and Fruit Juices Raspberries, raw 09302 raspberries Fresh1 6.975811 0.960000 0.319670 2.322874 fruit
7 SR Fruits and Fruit Juices Pomegranates, raw 09286 pomegranate Fresh1 2.173590 0.560000 0.341717 1.326342 fruit
8 SR Fruits and Fruit Juices Pineapple, raw, all varieties 09266 pineapple Fresh1 0.627662 0.510000 0.363763 0.447686 fruit
9 SR Fruits and Fruit Juices Apricots, raw 09021 apricots Fresh1 3.040072 0.930000 0.363763 1.189102 fruit
10 SR Fruits and Fruit Juices Melons, honeydew, raw 09184 honeydew Fresh1 0.796656 0.460000 0.374786 0.649077 fruit
11 SR Fruits and Fruit Juices Papayas, raw 09226 papaya Fresh1 1.298012 0.620000 0.308647 0.646174 fruit
12 SR Fruits and Fruit Juices Cherries, sweet, raw 09070 cherries Fresh1 3.592990 0.920000 0.341717 1.334548 fruit
13 SR Fruits and Fruit Juices Bananas, raw 09040 bananas Fresh1 0.566983 0.640000 0.330693 0.292965 fruit
14 SR Fruits and Fruit Juices Apples, raw, with skin 09003 apples Fresh1 1.567515 0.900000 0.242508 0.422373 fruit
15 SR Fruits and Fruit Juices Peaches, yellow, raw 09236 peaches Fresh1 1.591187 0.960000 0.341717 0.566390 fruit
16 SR Fruits and Fruit Juices Nectarines, raw 09191 nectarines Fresh1 1.761148 0.910000 0.319670 0.618667 fruit
17 SR Fruits and Fruit Juices Pears, raw 09252 pears Fresh1 1.461575 0.900000 0.363763 0.590740 fruit
18 SR Fruits and Fruit Juices Grapefruit, raw, white, California 09117 grapefruit Fresh1 0.897802 0.490000 0.462971 0.848278 fruit
19 SR Fruits and Fruit Juices Blackberries, raw 09042 blackberries Fresh1 5.774708 0.960000 0.319670 1.922919 fruit
20 SR Fruits and Fruit Juices Grapes, muscadine, raw 09129 grapes Fresh1 2.093827 0.960000 0.330693 0.721266 fruit
21 SR Fruits and Fruit Juices Blueberries, raw 09050 blueberries Fresh1 4.734622 0.950000 0.319670 1.593177 fruit
22 SR Fruits and Fruit Juices Mangos, raw 09176 mangoes Fresh1 1.377563 0.710000 0.363763 0.705783 fruit
23 SR Vegetables and Vegetable Products Asparagus, raw 11011 asparagus Fresh1 3.213494 0.493835 0.396832 2.582272 vegetables
24 SR Vegetables and Vegetable Products Lettuce, iceberg (includes crisphead types), raw 11252 lettuce iceberg Fresh1 1.213039 0.950000 0.242508 0.309655 vegetables
25 SR Vegetables and Vegetable Products Onions, raw 11282 onions Fresh1 1.038107 0.900000 0.352740 0.406868 vegetables
26 SR Vegetables and Vegetable Products Turnip greens, raw 11568 turnip greens Fresh1 2.471749 0.750000 0.319670 1.053526 vegetables
27 SR Vegetables and Vegetable Products Mustard greens, raw 11270 mustard greens Fresh1 2.569235 0.840000 0.308647 0.944032 vegetables
28 SR Vegetables and Vegetable Products Potatoes, raw, skin 11362 potatoes Fresh1 0.564320 0.811301 0.264555 0.184017 vegetables
29 SR Vegetables and Vegetable Products Collards, raw 11161 collard greens Fresh1 2.630838 1.160000 0.286601 0.650001 vegetables
30 SR Vegetables and Vegetable Products Beans, snap, green, raw 11052 green beans Fresh1 2.139972 0.846575 0.275578 0.696606 vegetables
31 SR Vegetables and Vegetable Products Squash, winter, acorn, raw 11482 acorn squash Fresh1 1.172248 0.458554 0.451948 1.155360 vegetables
32 SR Vegetables and Vegetable Products Peppers, sweet, red, raw 11821 red peppers Fresh1 2.277940 0.820000 0.264555 0.734926 vegetables
33 SR Vegetables and Vegetable Products Swamp cabbage, (skunk cabbage), raw 11503 cabbage Fresh green cabbage1 0.579208 0.778797 0.330693 0.245944 vegetables
34 SR Vegetables and Vegetable Products Swamp cabbage, (skunk cabbage), raw 11503 cabbage Fresh red cabbage1 1.056450 0.779107 0.330693 0.448412 vegetables
35 SR Vegetables and Vegetable Products Sweet potato, raw, unprepared 11507 sweet potatoes Fresh1 0.918897 0.811301 0.440925 0.499400 vegetables
36 SR Vegetables and Vegetable Products Squash, summer, scallop, raw 11475 summer squash Fresh1 1.639477 0.769500 0.396832 0.845480 vegetables
37 SR Vegetables and Vegetable Products Radishes, raw 11429 radish Fresh1 1.311629 0.900000 0.275578 0.401618 vegetables
38 SR Vegetables and Vegetable Products Squash, winter, butternut, raw 11485 butternut squash Fresh1 1.244737 0.714000 0.451948 0.787893 vegetables
39 SR Fruits and Fruit Juices Avocados, raw, California 09038 avocados Fresh1 2.235874 0.740753 0.319670 0.964886 vegetables
40 SR Vegetables and Vegetable Products Kale, raw 11233 kale Fresh1 2.807302 1.050000 0.286601 0.766262 vegetables
41 SR Vegetables and Vegetable Products Jerusalem-artichokes, raw 11226 artichoke Fresh1 2.213050 0.375309 0.385809 2.274967 vegetables
42 SR Vegetables and Vegetable Products Okra, raw 11278 okra Fresh1 3.213552 0.769474 0.352740 1.473146 vegetables
43 SR Vegetables and Vegetable Products Peppers, sweet, green, raw 11333 green peppers Fresh1 1.410363 0.820000 0.264555 0.455022 vegetables
44 SR Vegetables and Vegetable Products Brussels sprouts, raw 11098 brussels sprouts Fresh1 2.763553 1.060000 0.341717 0.890898 vegetables
45 SR Vegetables and Vegetable Products Corn, sweet, white, raw 11900 corn sweet Fresh1 2.690623 0.540000 0.363763 1.812497 vegetables
In [406]:
ndbno = [i for i in fresh_df['ndbno']]
nd = ['09279']

My strategy for tackling this problem started with replacing the '_' in all food names with spaces then adding ' raw' in order to use the ndb_search function to search for that term. Then I further narrowed the results DataFrame down by eliminating all rows without 'SR', which meant that no branded foods would be included. Then, I narrowed the results to those that were in the Fruits or Vegetables categories, since I was getting some results in other categories. Finally, I further narrowed the results to keep all results that contained the key words of the food I was searching for. In order to do this well, I replaced all spaces in the food types with the '|' (or) symbol. Because, I still had 300+ results, I simply just took the first entry of each search and appended that into the fresh_df DataFrame. In the end, I ended up with most of the correct rows that I wanted. The cucumber rows were a sad consequence, since the keyword given here was "cucumbers" and the keyword in the results DataFrame was "cucumber". Besides that, the search for "kiwi raw" did not give any results, since in the USDA database it is under "Kiwifruit, raw". Finally, I ended up with the wrong cabbage, artichoke, and potato types. Besides those 5 mistakes, I ended up with the correct fruits and vegetables for each food type.

Exercise 1.3. Read the food reports V2 documentation, then write a function called ndb_report() that requests a basic food report. The function should accept the NDB number as an argument and return the list of nutrients for the food.

Note that the report url is: https://api.nal.usda.gov/ndb/V2/reports

For example, for "09279" (raw plums) the first element of the returned list should be:

{u'group': u'Proximates',
 u'measures': [{u'eqv': 165.0,
   u'eunit': u'g',
   u'label': u'cup, sliced',
   u'qty': 1.0,
   u'value': u'143.93'},
  {u'eqv': 66.0,
   u'eunit': u'g',
   u'label': u'fruit (2-1/8" dia)',
   u'qty': 1.0,
   u'value': u'57.57'},
  {u'eqv': 151.0,
   u'eunit': u'g',
   u'label': u'NLEA serving',
   u'qty': 1.0,
   u'value': u'131.72'}],
 u'name': u'Water',
 u'nutrient_id': u'255',
 u'unit': u'g',
 u'value': u'87.23'}

Be sure to document and test your function.

In [459]:
def ndb_report(ndbno):
    """
    This queries the USDA Food composite database to make a search request for reports based on an NDB number
    Input: ndbno(takes in one number)
    Output: json object(of nutrient values of the NDB numbers provided)
    """
    urlbase = "https://api.nal.usda.gov/ndb/V2/reports?"
    try: 
        usdareq = requests.get(urlbase + 'ndbno={}&'.format("".join(ndbno)) + "type=b&"+ "format=json&" + "api_key={}".format(key))
        usda_json = usdareq.json()
        usda_json = usda_json['foods'][0]['food']['nutrients']
        return usda_json
    except (KeyError): 
        pass

ndb_report(nd)
Out[459]:
[{u'group': u'Proximates',
  u'measures': [{u'eqv': 165.0,
    u'eunit': u'g',
    u'label': u'cup, sliced',
    u'qty': 1.0,
    u'value': u'143.93'},
   {u'eqv': 66.0,
    u'eunit': u'g',
    u'label': u'fruit (2-1/8" dia)',
    u'qty': 1.0,
    u'value': u'57.57'},
   {u'eqv': 151.0,
    u'eunit': u'g',
    u'label': u'NLEA serving',
    u'qty': 1.0,
    u'value': u'131.72'}],
  u'name': u'Water',
  u'nutrient_id': u'255',
  u'unit': u'g',
  u'value': u'87.23'},
 {u'group': u'Proximates',
  u'measures': [{u'eqv': 165.0,
    u'eunit': u'g',
    u'label': u'cup, sliced',
    u'qty': 1.0,
    u'value': u'76'},
   {u'eqv': 66.0,
    u'eunit': u'g',
    u'label': u'fruit (2-1/8" dia)',
    u'qty': 1.0,
    u'value': u'30'},
   {u'eqv': 151.0,
    u'eunit': u'g',
    u'label': u'NLEA serving',
    u'qty': 1.0,
    u'value': u'69'}],
  u'name': u'Energy',
  u'nutrient_id': u'208',
  u'unit': u'kcal',
  u'value': u'46'},
 {u'group': u'Proximates',
  u'measures': [{u'eqv': 165.0,
    u'eunit': u'g',
    u'label': u'cup, sliced',
    u'qty': 1.0,
    u'value': u'1.15'},
   {u'eqv': 66.0,
    u'eunit': u'g',
    u'label': u'fruit (2-1/8" dia)',
    u'qty': 1.0,
    u'value': u'0.46'},
   {u'eqv': 151.0,
    u'eunit': u'g',
    u'label': u'NLEA serving',
    u'qty': 1.0,
    u'value': u'1.06'}],
  u'name': u'Protein',
  u'nutrient_id': u'203',
  u'unit': u'g',
  u'value': u'0.70'},
 {u'group': u'Proximates',
  u'measures': [{u'eqv': 165.0,
    u'eunit': u'g',
    u'label': u'cup, sliced',
    u'qty': 1.0,
    u'value': u'0.46'},
   {u'eqv': 66.0,
    u'eunit': u'g',
    u'label': u'fruit (2-1/8" dia)',
    u'qty': 1.0,
    u'value': u'0.18'},
   {u'eqv': 151.0,
    u'eunit': u'g',
    u'label': u'NLEA serving',
    u'qty': 1.0,
    u'value': u'0.42'}],
  u'name': u'Total lipid (fat)',
  u'nutrient_id': u'204',
  u'unit': u'g',
  u'value': u'0.28'},
 {u'group': u'Proximates',
  u'measures': [{u'eqv': 165.0,
    u'eunit': u'g',
    u'label': u'cup, sliced',
    u'qty': 1.0,
    u'value': u'18.84'},
   {u'eqv': 66.0,
    u'eunit': u'g',
    u'label': u'fruit (2-1/8" dia)',
    u'qty': 1.0,
    u'value': u'7.54'},
   {u'eqv': 151.0,
    u'eunit': u'g',
    u'label': u'NLEA serving',
    u'qty': 1.0,
    u'value': u'17.24'}],
  u'name': u'Carbohydrate, by difference',
  u'nutrient_id': u'205',
  u'unit': u'g',
  u'value': u'11.42'},
 {u'group': u'Proximates',
  u'measures': [{u'eqv': 165.0,
    u'eunit': u'g',
    u'label': u'cup, sliced',
    u'qty': 1.0,
    u'value': u'2.3'},
   {u'eqv': 66.0,
    u'eunit': u'g',
    u'label': u'fruit (2-1/8" dia)',
    u'qty': 1.0,
    u'value': u'0.9'},
   {u'eqv': 151.0,
    u'eunit': u'g',
    u'label': u'NLEA serving',
    u'qty': 1.0,
    u'value': u'2.1'}],
  u'name': u'Fiber, total dietary',
  u'nutrient_id': u'291',
  u'unit': u'g',
  u'value': u'1.4'},
 {u'group': u'Proximates',
  u'measures': [{u'eqv': 165.0,
    u'eunit': u'g',
    u'label': u'cup, sliced',
    u'qty': 1.0,
    u'value': u'16.37'},
   {u'eqv': 66.0,
    u'eunit': u'g',
    u'label': u'fruit (2-1/8" dia)',
    u'qty': 1.0,
    u'value': u'6.55'},
   {u'eqv': 151.0,
    u'eunit': u'g',
    u'label': u'NLEA serving',
    u'qty': 1.0,
    u'value': u'14.98'}],
  u'name': u'Sugars, total',
  u'nutrient_id': u'269',
  u'unit': u'g',
  u'value': u'9.92'},
 {u'group': u'Minerals',
  u'measures': [{u'eqv': 165.0,
    u'eunit': u'g',
    u'label': u'cup, sliced',
    u'qty': 1.0,
    u'value': u'10'},
   {u'eqv': 66.0,
    u'eunit': u'g',
    u'label': u'fruit (2-1/8" dia)',
    u'qty': 1.0,
    u'value': u'4'},
   {u'eqv': 151.0,
    u'eunit': u'g',
    u'label': u'NLEA serving',
    u'qty': 1.0,
    u'value': u'9'}],
  u'name': u'Calcium, Ca',
  u'nutrient_id': u'301',
  u'unit': u'mg',
  u'value': u'6'},
 {u'group': u'Minerals',
  u'measures': [{u'eqv': 165.0,
    u'eunit': u'g',
    u'label': u'cup, sliced',
    u'qty': 1.0,
    u'value': u'0.28'},
   {u'eqv': 66.0,
    u'eunit': u'g',
    u'label': u'fruit (2-1/8" dia)',
    u'qty': 1.0,
    u'value': u'0.11'},
   {u'eqv': 151.0,
    u'eunit': u'g',
    u'label': u'NLEA serving',
    u'qty': 1.0,
    u'value': u'0.26'}],
  u'name': u'Iron, Fe',
  u'nutrient_id': u'303',
  u'unit': u'mg',
  u'value': u'0.17'},
 {u'group': u'Minerals',
  u'measures': [{u'eqv': 165.0,
    u'eunit': u'g',
    u'label': u'cup, sliced',
    u'qty': 1.0,
    u'value': u'12'},
   {u'eqv': 66.0,
    u'eunit': u'g',
    u'label': u'fruit (2-1/8" dia)',
    u'qty': 1.0,
    u'value': u'5'},
   {u'eqv': 151.0,
    u'eunit': u'g',
    u'label': u'NLEA serving',
    u'qty': 1.0,
    u'value': u'11'}],
  u'name': u'Magnesium, Mg',
  u'nutrient_id': u'304',
  u'unit': u'mg',
  u'value': u'7'},
 {u'group': u'Minerals',
  u'measures': [{u'eqv': 165.0,
    u'eunit': u'g',
    u'label': u'cup, sliced',
    u'qty': 1.0,
    u'value': u'26'},
   {u'eqv': 66.0,
    u'eunit': u'g',
    u'label': u'fruit (2-1/8" dia)',
    u'qty': 1.0,
    u'value': u'11'},
   {u'eqv': 151.0,
    u'eunit': u'g',
    u'label': u'NLEA serving',
    u'qty': 1.0,
    u'value': u'24'}],
  u'name': u'Phosphorus, P',
  u'nutrient_id': u'305',
  u'unit': u'mg',
  u'value': u'16'},
 {u'group': u'Minerals',
  u'measures': [{u'eqv': 165.0,
    u'eunit': u'g',
    u'label': u'cup, sliced',
    u'qty': 1.0,
    u'value': u'259'},
   {u'eqv': 66.0,
    u'eunit': u'g',
    u'label': u'fruit (2-1/8" dia)',
    u'qty': 1.0,
    u'value': u'104'},
   {u'eqv': 151.0,
    u'eunit': u'g',
    u'label': u'NLEA serving',
    u'qty': 1.0,
    u'value': u'237'}],
  u'name': u'Potassium, K',
  u'nutrient_id': u'306',
  u'unit': u'mg',
  u'value': u'157'},
 {u'group': u'Minerals',
  u'measures': [{u'eqv': 165.0,
    u'eunit': u'g',
    u'label': u'cup, sliced',
    u'qty': 1.0,
    u'value': u'0'},
   {u'eqv': 66.0,
    u'eunit': u'g',
    u'label': u'fruit (2-1/8" dia)',
    u'qty': 1.0,
    u'value': u'0'},
   {u'eqv': 151.0,
    u'eunit': u'g',
    u'label': u'NLEA serving',
    u'qty': 1.0,
    u'value': u'0'}],
  u'name': u'Sodium, Na',
  u'nutrient_id': u'307',
  u'unit': u'mg',
  u'value': u'0'},
 {u'group': u'Minerals',
  u'measures': [{u'eqv': 165.0,
    u'eunit': u'g',
    u'label': u'cup, sliced',
    u'qty': 1.0,
    u'value': u'0.17'},
   {u'eqv': 66.0,
    u'eunit': u'g',
    u'label': u'fruit (2-1/8" dia)',
    u'qty': 1.0,
    u'value': u'0.07'},
   {u'eqv': 151.0,
    u'eunit': u'g',
    u'label': u'NLEA serving',
    u'qty': 1.0,
    u'value': u'0.15'}],
  u'name': u'Zinc, Zn',
  u'nutrient_id': u'309',
  u'unit': u'mg',
  u'value': u'0.10'},
 {u'group': u'Vitamins',
  u'measures': [{u'eqv': 165.0,
    u'eunit': u'g',
    u'label': u'cup, sliced',
    u'qty': 1.0,
    u'value': u'15.7'},
   {u'eqv': 66.0,
    u'eunit': u'g',
    u'label': u'fruit (2-1/8" dia)',
    u'qty': 1.0,
    u'value': u'6.3'},
   {u'eqv': 151.0,
    u'eunit': u'g',
    u'label': u'NLEA serving',
    u'qty': 1.0,
    u'value': u'14.3'}],
  u'name': u'Vitamin C, total ascorbic acid',
  u'nutrient_id': u'401',
  u'unit': u'mg',
  u'value': u'9.5'},
 {u'group': u'Vitamins',
  u'measures': [{u'eqv': 165.0,
    u'eunit': u'g',
    u'label': u'cup, sliced',
    u'qty': 1.0,
    u'value': u'0.046'},
   {u'eqv': 66.0,
    u'eunit': u'g',
    u'label': u'fruit (2-1/8" dia)',
    u'qty': 1.0,
    u'value': u'0.018'},
   {u'eqv': 151.0,
    u'eunit': u'g',
    u'label': u'NLEA serving',
    u'qty': 1.0,
    u'value': u'0.042'}],
  u'name': u'Thiamin',
  u'nutrient_id': u'404',
  u'unit': u'mg',
  u'value': u'0.028'},
 {u'group': u'Vitamins',
  u'measures': [{u'eqv': 165.0,
    u'eunit': u'g',
    u'label': u'cup, sliced',
    u'qty': 1.0,
    u'value': u'0.043'},
   {u'eqv': 66.0,
    u'eunit': u'g',
    u'label': u'fruit (2-1/8" dia)',
    u'qty': 1.0,
    u'value': u'0.017'},
   {u'eqv': 151.0,
    u'eunit': u'g',
    u'label': u'NLEA serving',
    u'qty': 1.0,
    u'value': u'0.039'}],
  u'name': u'Riboflavin',
  u'nutrient_id': u'405',
  u'unit': u'mg',
  u'value': u'0.026'},
 {u'group': u'Vitamins',
  u'measures': [{u'eqv': 165.0,
    u'eunit': u'g',
    u'label': u'cup, sliced',
    u'qty': 1.0,
    u'value': u'0.688'},
   {u'eqv': 66.0,
    u'eunit': u'g',
    u'label': u'fruit (2-1/8" dia)',
    u'qty': 1.0,
    u'value': u'0.275'},
   {u'eqv': 151.0,
    u'eunit': u'g',
    u'label': u'NLEA serving',
    u'qty': 1.0,
    u'value': u'0.630'}],
  u'name': u'Niacin',
  u'nutrient_id': u'406',
  u'unit': u'mg',
  u'value': u'0.417'},
 {u'group': u'Vitamins',
  u'measures': [{u'eqv': 165.0,
    u'eunit': u'g',
    u'label': u'cup, sliced',
    u'qty': 1.0,
    u'value': u'0.048'},
   {u'eqv': 66.0,
    u'eunit': u'g',
    u'label': u'fruit (2-1/8" dia)',
    u'qty': 1.0,
    u'value': u'0.019'},
   {u'eqv': 151.0,
    u'eunit': u'g',
    u'label': u'NLEA serving',
    u'qty': 1.0,
    u'value': u'0.044'}],
  u'name': u'Vitamin B-6',
  u'nutrient_id': u'415',
  u'unit': u'mg',
  u'value': u'0.029'},
 {u'group': u'Vitamins',
  u'measures': [{u'eqv': 165.0,
    u'eunit': u'g',
    u'label': u'cup, sliced',
    u'qty': 1.0,
    u'value': u'8'},
   {u'eqv': 66.0,
    u'eunit': u'g',
    u'label': u'fruit (2-1/8" dia)',
    u'qty': 1.0,
    u'value': u'3'},
   {u'eqv': 151.0,
    u'eunit': u'g',
    u'label': u'NLEA serving',
    u'qty': 1.0,
    u'value': u'8'}],
  u'name': u'Folate, DFE',
  u'nutrient_id': u'435',
  u'unit': u'\xb5g',
  u'value': u'5'},
 {u'group': u'Vitamins',
  u'measures': [{u'eqv': 165.0,
    u'eunit': u'g',
    u'label': u'cup, sliced',
    u'qty': 1.0,
    u'value': u'0.00'},
   {u'eqv': 66.0,
    u'eunit': u'g',
    u'label': u'fruit (2-1/8" dia)',
    u'qty': 1.0,
    u'value': u'0.00'},
   {u'eqv': 151.0,
    u'eunit': u'g',
    u'label': u'NLEA serving',
    u'qty': 1.0,
    u'value': u'0.00'}],
  u'name': u'Vitamin B-12',
  u'nutrient_id': u'418',
  u'unit': u'\xb5g',
  u'value': u'0.00'},
 {u'group': u'Vitamins',
  u'measures': [{u'eqv': 165.0,
    u'eunit': u'g',
    u'label': u'cup, sliced',
    u'qty': 1.0,
    u'value': u'28'},
   {u'eqv': 66.0,
    u'eunit': u'g',
    u'label': u'fruit (2-1/8" dia)',
    u'qty': 1.0,
    u'value': u'11'},
   {u'eqv': 151.0,
    u'eunit': u'g',
    u'label': u'NLEA serving',
    u'qty': 1.0,
    u'value': u'26'}],
  u'name': u'Vitamin A, RAE',
  u'nutrient_id': u'320',
  u'unit': u'\xb5g',
  u'value': u'17'},
 {u'group': u'Vitamins',
  u'measures': [{u'eqv': 165.0,
    u'eunit': u'g',
    u'label': u'cup, sliced',
    u'qty': 1.0,
    u'value': u'569'},
   {u'eqv': 66.0,
    u'eunit': u'g',
    u'label': u'fruit (2-1/8" dia)',
    u'qty': 1.0,
    u'value': u'228'},
   {u'eqv': 151.0,
    u'eunit': u'g',
    u'label': u'NLEA serving',
    u'qty': 1.0,
    u'value': u'521'}],
  u'name': u'Vitamin A, IU',
  u'nutrient_id': u'318',
  u'unit': u'IU',
  u'value': u'345'},
 {u'group': u'Vitamins',
  u'measures': [{u'eqv': 165.0,
    u'eunit': u'g',
    u'label': u'cup, sliced',
    u'qty': 1.0,
    u'value': u'0.43'},
   {u'eqv': 66.0,
    u'eunit': u'g',
    u'label': u'fruit (2-1/8" dia)',
    u'qty': 1.0,
    u'value': u'0.17'},
   {u'eqv': 151.0,
    u'eunit': u'g',
    u'label': u'NLEA serving',
    u'qty': 1.0,
    u'value': u'0.39'}],
  u'name': u'Vitamin E (alpha-tocopherol)',
  u'nutrient_id': u'323',
  u'unit': u'mg',
  u'value': u'0.26'},
 {u'group': u'Vitamins',
  u'measures': [{u'eqv': 165.0,
    u'eunit': u'g',
    u'label': u'cup, sliced',
    u'qty': 1.0,
    u'value': u'0.0'},
   {u'eqv': 66.0,
    u'eunit': u'g',
    u'label': u'fruit (2-1/8" dia)',
    u'qty': 1.0,
    u'value': u'0.0'},
   {u'eqv': 151.0,
    u'eunit': u'g',
    u'label': u'NLEA serving',
    u'qty': 1.0,
    u'value': u'0.0'}],
  u'name': u'Vitamin D (D2 + D3)',
  u'nutrient_id': u'328',
  u'unit': u'\xb5g',
  u'value': u'0.0'},
 {u'group': u'Vitamins',
  u'measures': [{u'eqv': 165.0,
    u'eunit': u'g',
    u'label': u'cup, sliced',
    u'qty': 1.0,
    u'value': u'0'},
   {u'eqv': 66.0,
    u'eunit': u'g',
    u'label': u'fruit (2-1/8" dia)',
    u'qty': 1.0,
    u'value': u'0'},
   {u'eqv': 151.0,
    u'eunit': u'g',
    u'label': u'NLEA serving',
    u'qty': 1.0,
    u'value': u'0'}],
  u'name': u'Vitamin D',
  u'nutrient_id': u'324',
  u'unit': u'IU',
  u'value': u'0'},
 {u'group': u'Vitamins',
  u'measures': [{u'eqv': 165.0,
    u'eunit': u'g',
    u'label': u'cup, sliced',
    u'qty': 1.0,
    u'value': u'10.6'},
   {u'eqv': 66.0,
    u'eunit': u'g',
    u'label': u'fruit (2-1/8" dia)',
    u'qty': 1.0,
    u'value': u'4.2'},
   {u'eqv': 151.0,
    u'eunit': u'g',
    u'label': u'NLEA serving',
    u'qty': 1.0,
    u'value': u'9.7'}],
  u'name': u'Vitamin K (phylloquinone)',
  u'nutrient_id': u'430',
  u'unit': u'\xb5g',
  u'value': u'6.4'},
 {u'group': u'Lipids',
  u'measures': [{u'eqv': 165.0,
    u'eunit': u'g',
    u'label': u'cup, sliced',
    u'qty': 1.0,
    u'value': u'0.028'},
   {u'eqv': 66.0,
    u'eunit': u'g',
    u'label': u'fruit (2-1/8" dia)',
    u'qty': 1.0,
    u'value': u'0.011'},
   {u'eqv': 151.0,
    u'eunit': u'g',
    u'label': u'NLEA serving',
    u'qty': 1.0,
    u'value': u'0.026'}],
  u'name': u'Fatty acids, total saturated',
  u'nutrient_id': u'606',
  u'unit': u'g',
  u'value': u'0.017'},
 {u'group': u'Lipids',
  u'measures': [{u'eqv': 165.0,
    u'eunit': u'g',
    u'label': u'cup, sliced',
    u'qty': 1.0,
    u'value': u'0.221'},
   {u'eqv': 66.0,
    u'eunit': u'g',
    u'label': u'fruit (2-1/8" dia)',
    u'qty': 1.0,
    u'value': u'0.088'},
   {u'eqv': 151.0,
    u'eunit': u'g',
    u'label': u'NLEA serving',
    u'qty': 1.0,
    u'value': u'0.202'}],
  u'name': u'Fatty acids, total monounsaturated',
  u'nutrient_id': u'645',
  u'unit': u'g',
  u'value': u'0.134'},
 {u'group': u'Lipids',
  u'measures': [{u'eqv': 165.0,
    u'eunit': u'g',
    u'label': u'cup, sliced',
    u'qty': 1.0,
    u'value': u'0.073'},
   {u'eqv': 66.0,
    u'eunit': u'g',
    u'label': u'fruit (2-1/8" dia)',
    u'qty': 1.0,
    u'value': u'0.029'},
   {u'eqv': 151.0,
    u'eunit': u'g',
    u'label': u'NLEA serving',
    u'qty': 1.0,
    u'value': u'0.066'}],
  u'name': u'Fatty acids, total polyunsaturated',
  u'nutrient_id': u'646',
  u'unit': u'g',
  u'value': u'0.044'},
 {u'group': u'Lipids',
  u'measures': [{u'eqv': 165.0,
    u'eunit': u'g',
    u'label': u'cup, sliced',
    u'qty': 1.0,
    u'value': u'0.000'},
   {u'eqv': 66.0,
    u'eunit': u'g',
    u'label': u'fruit (2-1/8" dia)',
    u'qty': 1.0,
    u'value': u'0.000'},
   {u'eqv': 151.0,
    u'eunit': u'g',
    u'label': u'NLEA serving',
    u'qty': 1.0,
    u'value': u'0.000'}],
  u'name': u'Fatty acids, total trans',
  u'nutrient_id': u'605',
  u'unit': u'g',
  u'value': u'0.000'},
 {u'group': u'Lipids',
  u'measures': [{u'eqv': 165.0,
    u'eunit': u'g',
    u'label': u'cup, sliced',
    u'qty': 1.0,
    u'value': u'0'},
   {u'eqv': 66.0,
    u'eunit': u'g',
    u'label': u'fruit (2-1/8" dia)',
    u'qty': 1.0,
    u'value': u'0'},
   {u'eqv': 151.0,
    u'eunit': u'g',
    u'label': u'NLEA serving',
    u'qty': 1.0,
    u'value': u'0'}],
  u'name': u'Cholesterol',
  u'nutrient_id': u'601',
  u'unit': u'mg',
  u'value': u'0'},
 {u'group': u'Other',
  u'measures': [{u'eqv': 165.0,
    u'eunit': u'g',
    u'label': u'cup, sliced',
    u'qty': 1.0,
    u'value': u'0'},
   {u'eqv': 66.0,
    u'eunit': u'g',
    u'label': u'fruit (2-1/8" dia)',
    u'qty': 1.0,
    u'value': u'0'},
   {u'eqv': 151.0,
    u'eunit': u'g',
    u'label': u'NLEA serving',
    u'qty': 1.0,
    u'value': u'0'}],
  u'name': u'Caffeine',
  u'nutrient_id': u'262',
  u'unit': u'mg',
  u'value': u'0'}]
In [460]:
#list of all foods
usda_list = [ndb_report(i) for i in ndbno]

kcal_list = [i[1]['value'] for i in usda_list]
kcal_list = pd.Series(map(int, kcal_list))
fresh_df['nutrition'] = kcal_list.values
usda_list[37][1]
Out[460]:
{u'group': u'Proximates',
 u'measures': [{u'eqv': 116.0,
   u'eunit': u'g',
   u'label': u'cup slices',
   u'qty': 1.0,
   u'value': u'19'},
  {u'eqv': 9.0,
   u'eunit': u'g',
   u'label': u'large (1" to 1-1/4" dia)',
   u'qty': 1.0,
   u'value': u'1'},
  {u'eqv': 4.5,
   u'eunit': u'g',
   u'label': u'medium (3/4" to 1" dia)',
   u'qty': 1.0,
   u'value': u'1'},
  {u'eqv': 1.0,
   u'eunit': u'g',
   u'label': u'slice',
   u'qty': 1.0,
   u'value': u'0'},
  {u'eqv': 2.0,
   u'eunit': u'g',
   u'label': u'small',
   u'qty': 1.0,
   u'value': u'0'},
  {u'eqv': 58.0,
   u'eunit': u'g',
   u'label': u'cup slices',
   u'qty': 0.5,
   u'value': u'9'}],
 u'name': u'Energy',
 u'nutrient_id': u'208',
 u'unit': u'kcal',
 u'value': u'16'}

Exercise 1.4. Which foods provide the best combination of price, yield, and nutrition? You can use kilocalories as a measure of "nutrition" here, but more a detailed analysis is better. Use plots to support your analysis.

In [461]:
import seaborn as sns

sns.regplot('yield', 'price_per_lb', data = fresh_df, fit_reg = False)
plt.title('Yield vs. Price per Pound')
plt.xlabel('Yield')
plt.ylabel('Price per Pound')
Out[461]:
<matplotlib.text.Text at 0x120d85b0>
In [462]:
sns.regplot('nutrition', 'price_per_lb', data = fresh_df, fit_reg = False)
plt.title('Nutrition vs. Price per Pound')
plt.xlabel('Nutrition in KCal')
plt.ylabel('Price per Pound')
Out[462]:
<matplotlib.text.Text at 0x11ec9470>

It seems as though, the best combination of Nutrition, Price per Pound, and Yield can be calculated by this equation: (Nutrition * Yield)/(Price per Lb)

In [463]:
fresh_df['score'] = fresh_df['nutrition']*fresh_df['yield']/fresh_df['price_per_lb']
best = fresh_df.ix[fresh_df['score'] > 50]
best
Out[463]:
ds group name ndbno food form price_per_lb yield lb_per_cup price_per_cup type nutrition score
13 SR Fruits and Fruit Juices Bananas, raw 09040 bananas Fresh1 0.566983 0.640000 0.330693 0.292965 fruit 89 100.461492
28 SR Vegetables and Vegetable Products Potatoes, raw, skin 11362 potatoes Fresh1 0.564320 0.811301 0.264555 0.184017 vegetables 58 83.384393
35 SR Vegetables and Vegetable Products Sweet potato, raw, unprepared 11507 sweet potatoes Fresh1 0.918897 0.811301 0.440925 0.499400 vegetables 86 75.930089
39 SR Fruits and Fruit Juices Avocados, raw, California 09038 avocados Fresh1 2.235874 0.740753 0.319670 0.964886 vegetables 167 55.327699

For this analysis I used the amount of KCals in 100 grams of each food (nutrition), the Price per Pound, and the yield percentage. By multiplying the yield and nutrition, we get the amount of KCals that survived the growing and picking process of the fruit or vegetables per 100 grams. This eliminates many low yield and low energy foods. Next by dividing by the price per pound, instead of price per cup, we make sure that we still using weight instead of volume as our comparison tool. By diving by pricer per pound, we allow lower priced foods with high energy per yield to stand out and get higher scores.

In the end, excluding the wrong potato type, I ended up with bananas being the best combination of nutrition, price, and yield, with sweet potatoes and avocados coming in 2nd and 3rd place.