# CME 193 - Pandas Exercise Supplement

In this extended exercise, you'll load and explore CO2 data collected at the Mauna Loa Observatory over the last 60 years.

* NOAA Website: https://www.esrl.noaa.gov/gmd/ccgg/trends/full.html
* NOAA data: https://www.esrl.noaa.gov/gmd/ccgg/trends/data.html

The monthly data can be found at this [link](ftp://aftp.cmdl.noaa.gov/products/trends/co2/co2_mm_mlo.txt).

In [1]:
import numpy as np
import scipy
import pandas as pd
import matplotlib.pyplot as plt

Read the data directly from the FTP server.

In [None]:
# sep=r'\s+' splits on runs of whitespace; it replaces the deprecated delim_whitespace=True.
df = pd.read_csv('ftp://aftp.cmdl.noaa.gov/products/trends/co2/co2_mm_mlo.txt',
                 sep=r'\s+',
                 comment='#',
                 names=["year", "month", "decdate", "co2", "co2interp", "trend", "days"],
                 index_col=False)
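
If the FTP server is unreachable, the same parsing options can be tried on an in-memory string. The sketch below is only an illustration: the text in `sample_text` is made up to mimic the file layout (comment lines starting with `#`, then whitespace-separated columns), and its numbers are not real measurements.

```python
import io

import pandas as pd

# Made-up text in the same layout as the NOAA file (values are NOT real data).
sample_text = """\
# hypothetical header line
# year month decdate co2 co2interp trend days
2000   1   2000.042   300.00   300.00   299.50   30
2000   2   2000.125   301.50   301.50   300.00   28
2000   3   2000.208   -99.99   302.00   300.50   -1
"""

sample = pd.read_csv(
    io.StringIO(sample_text),   # any file-like object works in place of a URL
    sep=r"\s+",                 # split on runs of whitespace
    comment="#",                # skip everything after a '#'
    names=["year", "month", "decdate", "co2", "co2interp", "trend", "days"],
    index_col=False,
)
print(sample)
```

The `names` argument supplies the column labels because the commented-out header is skipped along with the other `#` lines.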

In [None]:
pd.set_option('display.max_rows', 10)
df

In [None]:
# Keep a copy of the original data so we can return to it later.
orig = df.copy()

## Part 1 - Normalize the Date

1. Create a new column in the dataframe called 'day' that is set to 1 in every row.

In [None]:
# your code here

2. The dataframe now has columns for 'day', 'month', and 'year'.  Use `pd.to_datetime()` to create a new series of dates:

`dates = pd.to_datetime(...)`

In [None]:
# your code here
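
As a rough illustration of steps 1 and 2, here is a sketch on a small toy frame (the frame `toy` and its values are hypothetical, not the CO2 data). It relies on `pd.to_datetime()` accepting a dataframe whose columns are named `year`, `month`, and `day`.

```python
import pandas as pd

# Hypothetical toy frame with only year and month information.
toy = pd.DataFrame({"year": [1958, 1958, 1959], "month": [3, 4, 1]})

# Assigning a scalar broadcasts it to every row.
toy["day"] = 1

# A frame with 'year', 'month', 'day' columns is assembled row by row into timestamps.
toy_dates = pd.to_datetime(toy[["year", "month", "day"]])
print(toy_dates)
```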

3. Add a new column to the dataframe to hold this series.  Call the column `'date'`.

In [None]:
# your code here

4. Set the index of the dataframe to be the `'date'` column using the `set_index()` method.

In [None]:
# your code here

5. Now let's remove the old columns with date information.  Use the `drop()` method to remove the 'day', 'month', 'year', and 'decdate' columns.  Hint: `df.drop(..., axis=1, inplace=True)`

5a. Go ahead and drop the 'days' column as well, since we're not going to use it.

In [None]:
# your code here
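
The indexing and dropping steps can be sketched on a toy frame as follows. The frame and its `value` column are hypothetical. Both `set_index()` and `drop()` return a new frame by default, so the sketch reassigns the result instead of using `inplace=True`.

```python
import pandas as pd

# Hypothetical toy frame; 'value' stands in for a measurement column.
toy = pd.DataFrame({
    "year": [2000, 2000],
    "month": [1, 2],
    "day": [1, 1],
    "value": [10.0, 11.0],
})
toy["date"] = pd.to_datetime(toy[["year", "month", "day"]])

# Move the 'date' column into the index.
toy = toy.set_index("date")

# drop(columns=[...]) is equivalent to drop([...], axis=1).
toy = toy.drop(columns=["year", "month", "day"])
print(toy)
```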

## Part 2 - Deal with Missing Values

1. First, use the `plot()` method to visualize the contents of your dataframe.  What do you see?

In [None]:
# your code here

If you read the header of the file we used to load the dataframe, you'll see that missing values are recorded as -99.99.

2. Set values that are `-99.99` to `None` (this indicates a missing value in pandas).

Hint: use the `applymap()` method (renamed `map()` in pandas 2.1 and later) with the lambda function
```python
lambda x: None if x == -99.99 else x
```
If you're familiar with [ternary operators](https://en.wikipedia.org/wiki/%3F:), this is the equivalent of
```
x == -99.99 ? None : x
```
Note that `applymap()` returns a new dataframe rather than modifying the original, so you need to assign the result, e.g., `df = df.applymap(...)`.

In [None]:
# your code here
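
For comparison, here is a sketch of two ways to turn a sentinel value into a missing value, shown on a hypothetical toy frame. The first follows the elementwise hint and picks `DataFrame.map` when it exists (pandas 2.1 and later), falling back to `applymap` otherwise. The second uses `DataFrame.replace`, an alternative the exercise does not require.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame that uses -99.99 as a missing-value sentinel.
toy = pd.DataFrame({"a": [1.5, -99.99, 3.0], "b": [-99.99, 2.0, 4.0]})

# Elementwise route from the hint; choose whichever method this pandas version provides.
elementwise = toy.map if hasattr(toy, "map") else toy.applymap
via_lambda = elementwise(lambda x: None if x == -99.99 else x)

# Alternative (not part of the exercise): swap the sentinel for NaN in one call.
via_replace = toy.replace(-99.99, np.nan)

# Count the missing entries per column in each result.
print(via_lambda.isna().sum())
print(via_replace.isna().sum())
```

Both results leave `toy` itself unchanged, which is why the exercise suggests reassigning `df`.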

3. Plot your dataframe again.  What do you see now?

3a. Try plotting just the 'co2' series.  What do you see?

In [None]:
# your code here

## Part 3 - Create New DataFrames with Rows That Meet Conditions

1. Create a new dataframe called `recent` that contains all rows of the previous dataframe from 2007 onward.  Plot it.

In [None]:
# your code here

2. Create a new dataframe called `old` that contains all rows of the dataframe before 1990.  Plot it.

In [None]:
# your code here
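
Selecting rows by date can be sketched with a boolean mask on a datetime index. The toy frame and the cutoff below are hypothetical and chosen only to make the result easy to check. The `.loc["2007":]` form is a label slice that works when the index is a sorted `DatetimeIndex`.

```python
import pandas as pd

# Hypothetical toy frame indexed by date.
toy = pd.DataFrame(
    {"value": [1.0, 2.0, 3.0, 4.0]},
    index=pd.to_datetime(["2005-06-01", "2006-06-01", "2007-01-01", "2008-06-01"]),
)

cutoff = pd.Timestamp("2007-01-01")

# Boolean masks built from the index keep only the rows where the condition holds.
toy_recent = toy[toy.index >= cutoff]
toy_old = toy[toy.index < cutoff]

# Equivalent label-based slice on a sorted DatetimeIndex.
toy_recent_slice = toy.loc["2007":]

print(toy_recent)
print(toy_old)
```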

##### At this point, by inspection, you might be convinced there is further analysis to be done

In [None]:
np.var(old['trend']), np.var(recent['trend'])

## Part 4 - Create Some Groups

Let's go back to the original data that we loaded.

In [None]:
df = orig.copy()  # work on a copy so that `orig` stays untouched
df

Suppose that we want to look at co2 averages by year instead of by month.

1. Drop rows with missing values.

1a. Apply the map that sends -99.99 to `None`.

1b. Use the `dropna()` method to remove rows with missing values: `df = df.dropna()`

In [None]:
# your code here
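
A short sketch of what `dropna()` does, on a hypothetical toy frame: by default it removes every row that has a missing value in any column, and the `subset` argument restricts the check to the named columns.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame with missing entries in two different columns.
toy = pd.DataFrame({
    "year": [2000, 2000, 2001],
    "value": [1.0, np.nan, 3.0],
    "other": [np.nan, 5.0, 6.0],
})

# Default: drop a row if ANY column is missing.
complete = toy.dropna()

# Only consider the 'value' column when deciding which rows to drop.
value_present = toy.dropna(subset=["value"])

print(complete)
print(value_present)
```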

2. Create a group for each year (use the key 'year').

In [None]:
# your code here

3. Aggregate the groups into a new dataframe, `df2`, using `np.mean`.

3a. You can drop all the columns except `'co2'` if you'd like.

In [None]:
# your code here
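
Grouping and aggregating can be sketched on a toy frame like this. The data are hypothetical, and the sketch calls the built-in `mean()` on the groups, which is assumed here to give the same averages as passing `np.mean` to an aggregation.

```python
import pandas as pd

# Hypothetical monthly toy data spanning two years.
toy = pd.DataFrame({
    "year": [2000, 2000, 2001, 2001],
    "month": [1, 2, 1, 2],
    "value": [10.0, 12.0, 20.0, 24.0],
})

# One group per distinct value of 'year'.
groups = toy.groupby("year")

# Average every remaining column within each group; 'year' becomes the index.
yearly = groups.mean()

# Keep only the column of interest.
yearly_value = yearly[["value"]]
print(yearly_value)
```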

4. Make a plot of the `'co2'` series.

In [None]:
# your code here