Eureqa Desktop Tutorials

Modeling Time Series Data in Eureqa

Time series data is a sequence of data points obtained at successive times. It typically contains a date or time column along with other columns representing values at that time. With time series data you may want to allow older values to predict newer ones. This tutorial describes strategies for using Eureqa to model time series data.

Entering Time Series Data

There are a few keys to preparing your time series data for analysis in Eureqa:

  1. Review the tips for entering date and time variables into Eureqa
    For more details, see the section Dates, Dollars, Times, and Percents on the Enter Data documentation page.

  2. Enter your data in time order
    Eureqa's history building blocks operate on data a specified number of rows back in the data set. This means that your data must be in time order for the row offsets to accurately reflect a time delay.

  3. Use blank rows between multiple time series or across discontinuous time gaps
    Some time series data may include subsets of data that should be treated separately. For example, if you are modeling sales performance for two different products, you don't want the sales numbers for one product to be used when modeling sales for the other. To indicate a break in the data set, such as the break between data for two products (or another type of partition), simply insert a blank row between the data subsets. The data within each partition should be in time order.

    Similarly, if your data is typically recorded at 1-minute intervals but has some much larger gaps, you may not want Eureqa to treat it as one continuous data set. In this case you can again use a blank row to mark the discontinuity, which prevents Eureqa from treating the rows before the blank row as part of a continuous history for the rows after it.

    For more information, see the section Partitioned or Discontinuous Data on the Enter Data documentation page.
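If your raw data arrives unsorted or with irregular gaps, a small preprocessing step before loading it into Eureqa can apply the tips above. The Python sketch below is not part of Eureqa: prepare_series and split_partitions are hypothetical helpers, None stands in for a blank row, and the max_gap threshold is an assumption you would choose for your own data.

```python
from datetime import datetime, timedelta
from typing import Optional

Sample = tuple[datetime, float]  # (timestamp, value)
Row = Optional[Sample]           # None stands in for a blank row


def prepare_series(samples: list[Sample], max_gap: timedelta) -> list[Row]:
    """Sort samples into time order and insert a blank row (None)
    wherever consecutive timestamps are more than max_gap apart."""
    ordered = sorted(samples, key=lambda s: s[0])
    rows: list[Row] = []
    for i, sample in enumerate(ordered):
        if i > 0 and sample[0] - ordered[i - 1][0] > max_gap:
            rows.append(None)
        rows.append(sample)
    return rows


def split_partitions(rows: list[Row]) -> list[list[Sample]]:
    """Split at blank rows so that no partition borrows history from another."""
    parts: list[list[Sample]] = []
    current: list[Sample] = []
    for row in rows:
        if row is None:
            if current:
                parts.append(current)
            current = []
        else:
            current.append(row)
    if current:
        parts.append(current)
    return parts


# Toy 1-minute data, entered out of order, with one 28-minute gap.
t0 = datetime(2024, 1, 1, 9, 0)
samples = [(t0 + timedelta(minutes=2), 3.0), (t0, 1.0),
           (t0 + timedelta(minutes=1), 2.0), (t0 + timedelta(minutes=30), 4.0)]
rows = prepare_series(samples, max_gap=timedelta(minutes=5))
parts = split_partitions(rows)
print([[v for _, v in part] for part in parts])  # [[1.0, 2.0, 3.0], [4.0]]
```

Writing the result back out with an empty line wherever prepare_series produced None gives Eureqa the blank-row partitions described above.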

History Building Blocks

Eureqa provides several "history" building blocks designed specifically for time series data. These building blocks are briefly explained below.

Delayed Variable

Eureqa provides the delay(x, c) building block to represent an arbitrary time delay, where x can be any expression and the time delay is expressed as a number of rows in the input data. The expression delay(x, c) returns the value of x at c rows in the past. When delay is enabled as a building block, Eureqa can automatically optimize both which expression or variable has a delayed effect (x) and the amount of time delay (c).

The figure to the right plots an arbitrary variable x and its delayed value delay(x, 1). The delayed version equals the value of x one row in the past.
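To make the row-offset semantics concrete, here is a Python sketch of what delay(x, c) computes on a single column. It mimics the behavior described above and is not Eureqa's implementation; representing rows that lack enough history as None is an assumption of the sketch.

```python
from typing import Optional, Sequence


def delay(x: Sequence[float], c: int) -> list[Optional[float]]:
    """Return, for each row i, the value of x at row i - c.
    Rows with fewer than c earlier rows have no value (None)."""
    if c < 0:
        raise ValueError("delay must be non-negative")
    return [x[i - c] if i >= c else None for i in range(len(x))]


x = [10.0, 11.0, 13.0, 12.0, 15.0]
print(delay(x, 1))  # [None, 10.0, 11.0, 13.0, 12.0]
```

The leading None matches the missing point at the left edge of the delayed curve in the figure.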

Moving Average and Moving Median Building Blocks

The remaining history building blocks compute moving averages and moving medians. An example is the simple moving average building block, sma(x, c), which represents the average value of the last c rows of the x variable. All of these building blocks represent a smoothed effect over a number of rows.

For more information on all of the available moving average and moving median building blocks, see the Model Building Blocks Reference page.
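The Python sketch below illustrates the smoothing idea behind sma(x, c) and a moving median. It assumes the window of c rows includes the current row, which this page does not specify, so check the reference page for the exact convention; moving_median is a hypothetical name, not a Eureqa building block.

```python
from statistics import median
from typing import Optional, Sequence


def sma(x: Sequence[float], c: int) -> list[Optional[float]]:
    """Simple moving average over the last c rows, ASSUMED here to include
    the current row. Rows with fewer than c values available get None."""
    if c < 1:
        raise ValueError("window must be at least 1 row")
    return [sum(x[i - c + 1:i + 1]) / c if i >= c - 1 else None
            for i in range(len(x))]


def moving_median(x: Sequence[float], c: int) -> list[Optional[float]]:
    """Hypothetical moving-median counterpart using the same window convention."""
    if c < 1:
        raise ValueError("window must be at least 1 row")
    return [median(x[i - c + 1:i + 1]) if i >= c - 1 else None
            for i in range(len(x))]


x = [1.0, 2.0, 9.0, 4.0, 5.0]
print(sma(x, 3))            # [None, None, 4.0, 5.0, 6.0]
print(moving_median(x, 3))  # [None, None, 2.0, 4.0, 5.0]
```

Note how the single spike at 9.0 pulls the average up for three rows but barely moves the median, which is one reason both kinds of smoothing are offered.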

Configuring History Building Blocks

The history building blocks can be found in the "Formula building blocks" area of the "Set Target" tab. If you scroll down to the "History" section, you will see the relevant building blocks.

You can use the check boxes to select one or more of the history building blocks that you want to include in your Eureqa search. You may also consider lowering the complexity value of history building blocks if you expect your data to have delayed or smoothed effects and want to encourage their use in your time series models.

Setting Limits on Time Delay

In some cases, you may want to set limits on the time delays that appear in your models.

Fixed Time Delays

If you want to allow only a variable with a fixed time delay to appear in your models, you can enter the variable with the fixed time delay directly into your target expression on the Set Target tab. For example:

        x = f(delay(x,5), delay(x,10))

This target expression tells Eureqa to find an equation to model the value of x as a function of its value at 5 and 10 rows in the past.

Note that if you truly want only delays of the sizes specified in the target expression, you should disable the "delayed variable" building block to prevent Eureqa from automatically adding additional delay building blocks.
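To see what the target expression x = f(delay(x,5), delay(x,10)) asks Eureqa to fit, the Python sketch below builds the (inputs, target) pair for each usable row. lagged_rows is a hypothetical helper and the series is toy data; it only illustrates the row alignment, not how Eureqa searches for f.

```python
from typing import Sequence


def lagged_rows(x: Sequence[float],
                lags: Sequence[int]) -> list[tuple[list[float], float]]:
    """Pair each target x[t] with its fixed delayed inputs x[t - lag].
    Rows without enough history for the largest lag are skipped."""
    start = max(lags)
    return [([x[t - lag] for lag in lags], x[t]) for t in range(start, len(x))]


x = [float(v) for v in range(12)]  # toy series: 0.0, 1.0, ..., 11.0
rows = lagged_rows(x, lags=[5, 10])
print(rows)  # [([5.0, 0.0], 10.0), ([6.0, 1.0], 11.0)]
```

Only two of the twelve rows have both a 5-row and a 10-row history, which is why long fixed delays shrink the usable data.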

Minimum Time Delays

You may also want to specify a minimum time delay offset. A common example is if you want to model a particular variable based on the past performance of that same variable. If you entered a target expression such as x = f(x), Eureqa would find the trivial answer f(x) = x. More likely, you want a model of x as a function of x at least some amount of time in the past.

The way to do this is to again use a fixed delay, such as:

        x = f(delay(x,5))

This time, be sure to enable the "delayed variable" building block. Here, 5 rows becomes the minimum time delay. Then, Eureqa may choose to delay this delayed input further if necessary. For example, Eureqa may find a solution that uses the term delay(delay(x,5),2) which would simplify to delay(x,7).
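The simplification delay(delay(x,5),2) = delay(x,7) follows directly from the row-offset definition: each delay shifts the column back further, so the offsets add and can never drop below the 5 rows fixed in the target. The Python sketch below checks this on a toy series; its delay function is an illustrative stand-in, not Eureqa's code.

```python
from typing import Optional, Sequence


def delay(x: Sequence[Optional[float]], c: int) -> list[Optional[float]]:
    """Value of x at c rows in the past; None where that history is missing."""
    return [x[i - c] if i >= c else None for i in range(len(x))]


x = [float(v * v) for v in range(10)]  # toy series
nested = delay(delay(x, 5), 2)
direct = delay(x, 7)
print(nested == direct)  # True
```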

Another common reason to set a minimum time delay is if you want to be able to use your time series model to predict a future value based on currently available values. Suppose, for example, that you have one row per week and you want to be able to predict next week's sales based only on this week's data. In this case you might set up a target expression with a minimum delay for all input variables, for example:

        sales = f(delay(x,1), delay(y,1), delay(z,1))

A model produced with this target expression allows the value of sales to be computed entirely from the previous week's data. A larger minimum delay could be set if you wanted to make predictions further ahead, for example 4 weeks in advance.
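The Python sketch below shows why a minimum delay of k rows on every input lets you forecast k rows ahead: the training rows pair sales at row t with inputs from row t - k, so every input needed for the next k predictions has already been observed. training_pairs and forecast_inputs are hypothetical helpers, and the two-column weekly data is made up for illustration.

```python
from typing import Sequence

Columns = dict[str, Sequence[float]]


def training_pairs(inputs: Columns, target: Sequence[float],
                   k: int) -> list[tuple[dict[str, float], float]]:
    """Align target[t] with every input column at row t - k (minimum delay k)."""
    return [({name: col[t - k] for name, col in inputs.items()}, target[t])
            for t in range(k, len(target))]


def forecast_inputs(inputs: Columns, k: int) -> list[dict[str, float]]:
    """Inputs already observed for each of the k rows after the last row.
    Entry j - 1 feeds the prediction for row n - 1 + j, read from row n - 1 + j - k."""
    n = len(next(iter(inputs.values())))
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and the number of rows")
    return [{name: col[n - 1 + j - k] for name, col in inputs.items()}
            for j in range(1, k + 1)]


# Toy weekly data: one row per week.
inputs = {"x": [1.0, 2.0, 3.0, 4.0], "y": [10.0, 20.0, 30.0, 40.0]}
sales = [5.0, 6.0, 7.0, 8.0]

print(training_pairs(inputs, sales, k=1)[0])  # ({'x': 1.0, 'y': 10.0}, 6.0)
print(forecast_inputs(inputs, k=1))           # [{'x': 4.0, 'y': 40.0}]
```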

Maximum History Data

Notice that the delayed output plotted above does not have values on the left side of the graph for the first few time points. This is because these points require previous values of x that lie before the first point in our data set. Eureqa will automatically ignore these data points when calculating errors. The more historical data that is used in delayed variable or moving average/median building blocks, the more early data points will be ignored for training and validation.
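As a rough guide to how many leading rows are ignored, the Python sketch below takes the deepest reach back in time across a model's history terms. It assumes delay(x, c) reaches back c rows and that a moving window of c rows includes the current row (reaching back c - 1 rows); for nested terms such as delay(delay(x,5),2), pass the combined offset (7). history_needed and usable_rows are hypothetical helpers, not Eureqa settings.

```python
from typing import Sequence


def history_needed(delays: Sequence[int] = (), windows: Sequence[int] = ()) -> int:
    """Number of earlier rows a model needs before it can be evaluated.
    delay(x, c) reaches back c rows. A moving window of c rows is ASSUMED to
    include the current row, so it reaches back c - 1 rows."""
    reach = list(delays) + [c - 1 for c in windows]
    return max(reach, default=0)


def usable_rows(n_rows: int, delays: Sequence[int] = (),
                windows: Sequence[int] = ()) -> int:
    """Rows left for training and validation once the leading rows are ignored."""
    return max(n_rows - history_needed(delays, windows), 0)


print(history_needed(delays=[5, 10]))              # 10
print(usable_rows(100, delays=[5], windows=[20]))  # 81
```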

There is a way to control how much of the data set Eureqa is allowed to use for history. This setting effectively specifies a maximum delay offset for rows that can be included in models via the history building blocks. By default, Eureqa chooses a limit based on the number of rows in the data. To set the limit manually, choose the "Use custom data settings" option on the "Set Target" page and use the "Maximum History Data" setting in the "Data Settings" menu to set the maximum percentage of the input records that will be used as history.

If you find that Eureqa is identifying solutions with very large time delays, perhaps to avoid modeling difficult features in the first half of the data set, you may want to reduce this percentage.
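To get a feel for what a given "Maximum History Data" percentage means in rows, the Python sketch below converts a percentage into a maximum row offset and checks whether a history term fits under it. Rounding down and comparing against the total delay of a term are assumptions of this sketch; the page does not document Eureqa's exact rule. max_history_rows and term_allowed are hypothetical helpers.

```python
import math


def max_history_rows(n_rows: int, max_history_pct: float) -> int:
    """Largest row offset allowed if history may use at most max_history_pct
    percent of the rows (rounding down is an ASSUMPTION, not documented)."""
    if not 0 <= max_history_pct <= 100:
        raise ValueError("percentage must be between 0 and 100")
    return math.floor(n_rows * max_history_pct / 100)


def term_allowed(total_delay: int, n_rows: int, max_history_pct: float) -> bool:
    """Whether a history term reaching back total_delay rows fits the limit."""
    return total_delay <= max_history_rows(n_rows, max_history_pct)


print(max_history_rows(500, 10))   # 50
print(term_allowed(120, 500, 10))  # False
```

With 500 rows and a 10% limit, a model could no longer reach back 120 rows, which is the kind of restriction that discourages very large delays.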