# Choosing a Good Model Function

We would like to know the "best" answer to the question "What is the net change in CO2 concentration over a 24-hour time period?" Before we can get very far with this, we need to identify some of the relevant issues.

## Experiment 1

Use each of the five models to approximate how much CO2 accumulated in the San Marcos River during a 24-hour period using "Data Set 1." This data set gives CO2 rate measurements over the time interval [0,24]. To do this, select a Model Function (such as "Piecewise Linear" or "Cubic Spline") and press the button labelled "Integrate the model function." The integral of the model over the selected interval of integration will be displayed under the graph of the model function.
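To see what "integrating a model function" means for the two simplest models, here is a minimal sketch. It assumes the piecewise constant model holds each measured rate until the next sample and the piecewise linear model joins consecutive samples with straight lines; the lab software may use different conventions. The sample values are made up for illustration and are not Data Set 1.

```python
def integrate_piecewise_constant(times, rates):
    """Hold each measured rate constant until the next sample (left-endpoint sum)."""
    return sum(rates[i] * (times[i + 1] - times[i])
               for i in range(len(times) - 1))


def integrate_piecewise_linear(times, rates):
    """Join consecutive samples with straight lines (trapezoid rule)."""
    return sum(0.5 * (rates[i] + rates[i + 1]) * (times[i + 1] - times[i])
               for i in range(len(times) - 1))


# Hypothetical samples every 6 hours on [0, 24]; NOT the lab's Data Set 1.
times = [0.0, 6.0, 12.0, 18.0, 24.0]
rates = [1.0, -2.0, -3.0, 0.5, 2.0]

net_constant = integrate_piecewise_constant(times, rates)
net_linear = integrate_piecewise_linear(times, rates)

print("piecewise constant:", net_constant)
print("piecewise linear:  ", net_linear)
```

Even on the same five samples the two models disagree about the net change, which is why the choice of model function matters.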

### Question 7

One way of determining whether a particular model is valid for a given set of data is to compare the properties of the model functions with the properties of the system being modeled. Argue whether you think CO2 production by plants and animals in the river is a continuous function of time. Is it differentiable? Which models have these properties?

### Question 8

Another important issue is whether the way the data was collected will interact in some unexpected way with a given kind of model function.

• Our data appears cyclic. What is the period? Which of the models should be better at modeling periodic data? Why?

• Think about the piecewise linear and piecewise constant models. What would those models look like had we only sampled twice a day at sunrise and noon?

• Suppose a biologist pays a pair of student assistants to measure the rate of change of CO2 in the San Marcos River for a month. The students are spending so much time on calculus that they decide to only measure data once a day, at 9:00 in the evening. After a week's worth of data, the students report to the biologist that the river is accumulating CO2 at an alarming rate!

Write a short memo from the biologist to the students explaining why more data is needed before becoming alarmed. Be sure to include a mathematical and a biological reason why the students' data is skewed.

• Aberrant results can occur with any model if the period at which data is taken is about the same as the period of the function underlying the data. Try to create a guideline for sampling periodic data: suppose we know a function has a period P. How often would you want to sample to get a representative data set?

• Extra Credit: Look up the term "Nyquist frequency" in a book on numerical analysis (possibly in a chapter on the "Fourier Transform"). Explain the "official ruling" on how often one needs to sample periodic data.
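The once-a-day sampling scenario above can be tried numerically. The sketch below is only an illustration: the function `rate` is a hypothetical stand-in with a 24-hour period and zero net change per day, not a model of the San Marcos River, and the piecewise constant total simply holds each sample for the time until the next one.

```python
import math

PERIOD = 24.0  # hours; assumed period of the hypothetical rate function


def rate(t):
    """Hypothetical CO2 rate: negative around noon, positive at night,
    and averaging to zero over each 24-hour cycle."""
    return -math.cos(2 * math.pi * (t - 12.0) / PERIOD)


def piecewise_constant_total(sample_times, spacing):
    """Hold each sampled rate for `spacing` hours and add up the pieces."""
    return sum(rate(t) * spacing for t in sample_times)


# One week of samples taken once a day at 9:00 in the evening (t = 21, 45, ...).
daily_times = [21.0 + 24.0 * day for day in range(7)]
# The same week sampled every hour.
hourly_times = [float(h) for h in range(7 * 24)]

daily_samples = [rate(t) for t in daily_times]
daily_estimate = piecewise_constant_total(daily_times, 24.0)
hourly_estimate = piecewise_constant_total(hourly_times, 1.0)

print("daily samples:  ", [round(s, 4) for s in daily_samples])
print("daily estimate: ", daily_estimate)
print("hourly estimate:", hourly_estimate)
```

Because the sampling period equals the period of `rate`, every daily sample lands on the same point of the cycle, so the daily estimate grows steadily while the hourly estimate stays near zero. Try changing the sampling time or spacing to see how the estimate responds.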

## Experiment 2

We know from the definition of the integral as a limit of approximating sums that if we collect enough data, at least the piecewise constant models will get closer and closer to the true value of the integral (provided the function we are modeling is continuous). However, because of the time and expense of collecting and processing data, researchers usually prefer to use as few data points as possible to get good results.
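The convergence claim can be checked on a function whose integral is known exactly. This sketch assumes a hypothetical rate f(t) = sin(2πt/24) on the interval [0, 6], whose exact integral is 12/π, and uses left-endpoint sums as the piecewise constant model; none of this comes from the lab's data sets.

```python
import math


def f(t):
    """Hypothetical continuous rate function with a 24-hour period."""
    return math.sin(2 * math.pi * t / 24.0)


def left_sum(func, a, b, n):
    """Piecewise constant approximation of the integral using n equal pieces."""
    h = (b - a) / n
    return sum(func(a + i * h) for i in range(n)) * h


exact = 12.0 / math.pi  # integral of f over [0, 6]
counts = (3, 6, 12, 24, 48)
errors = [abs(left_sum(f, 0.0, 6.0, n) - exact) for n in counts]

for n, err in zip(counts, errors):
    print(f"n = {n:2d}  error = {err:.4f}")
```

Each doubling of the number of data points roughly halves the error here, which illustrates the trade-off between accuracy and the cost of collecting more data.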

Integrate each of the five models a second time using "Data Set 2." Data Set 2 consists of the CO2 rate measurements taken every other hour during the day. Again, record your answers in the table you were given. This data will be used in the next question.

### Question 9

• Which model functions seem to behave the worst with fewer data points? Explain your observations.

• Which model functions would be the most sensible to use if data were taken every few seconds? What practical considerations start becoming important with very, very large data sets?

• What kinds of things would you have to know about the underlying phenomenon in order to have much faith in a model function based on relatively few data points?
