Eureqa Desktop User Guide

Entering Data into Eureqa Desktop

This spreadsheet-like view is the place to enter, edit, and inspect your data. You can paste in data from Excel, from Matlab's array editor, or from just about any other source of tabular data, including plain-text files with tab-separated values.


Layout

The following aspects of the layout are predefined:

  • Each column corresponds to a single variable in your data, e.g., time, or dissolved oxygen concentration.
  • The first row, labeled "desc", is for descriptions and other comments regarding the data in the column below, e.g., "elapsed time in seconds", or "dissolved oxygen concentration in ppm". (This row is for humans; Eureqa ignores it.)
  • The second row, labeled "var", is where you give names to your variables, e.g., "t", or "DO".
  • The remaining rows are for data, each row representing a set of measurements or values that are in some sense simultaneous, e.g., a time and an oxygen measurement taken at that time.

Partitioned or Discontinuous Data

Several features in Eureqa assume that your data is one continuous series of points. These features are:

  • Data smoothing
  • History building blocks
  • Numerical derivative operators

If you are using any of those features and your data consists of discontinuous parts (e.g., data from two or more independent experiments or time series), indicate separations with an empty row. This keeps Eureqa from attempting to smooth, use history, or differentiate across discontinuous data points.

You can insert a blank row by clicking the first row of the next series of points, then right-clicking and selecting Insert > Insert Row.


Eureqa will automatically recognize this blank row as a break between continuous series of data for every variable. This allows you to use the smoothing, history, and derivative features correctly within each continuous series.


Without a break, data smoothing would attempt to blur the two distinct series into a single smooth curve.
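The effect of a blank-row break can be illustrated with a short Python sketch. It assumes blank rows are represented as `None`. It also uses a centered moving average as a stand-in for Eureqa's data smoothing, whose actual algorithm is not described here. The function names are hypothetical.

```python
def split_segments(column):
    """Split a column into continuous series wherever a blank row (None) appears."""
    segments, current = [], []
    for value in column:
        if value is None:  # blank row marks a break between series
            if current:
                segments.append(current)
            current = []
        else:
            current.append(value)
    if current:
        segments.append(current)
    return segments


def moving_average(values, window=3):
    """Centered moving average; the window is truncated at the ends of the series."""
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        lo, hi = max(0, i - half), min(len(values), i + half + 1)
        smoothed.append(sum(values[lo:hi]) / (hi - lo))
    return smoothed


def smooth_by_segment(column, window=3):
    """Smooth each continuous series separately so no window spans a break."""
    return [moving_average(seg, window) for seg in split_segments(column)]


print(smooth_by_segment([1, 2, 3, None, 10, 20, 30]))
# [[1.5, 2.0, 2.5], [15.0, 20.0, 25.0]]
print(moving_average([1, 2, 3, 10, 20, 30]))
# [1.5, 2.0, 5.0, 11.0, 20.0, 25.0]
```

Without the blank row, the smoothed values at the junction (5.0 and 11.0) mix points from both series, which is the blurring described above.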


Expressions

You can fill an entire column with values derived from values in other columns by entering a mathematical expression (preceded by an "=") anywhere in the column. For example, suppose you have values of x in column A and values of y in column B, and you enter =sin(x)+y somewhere in column C. For each non-empty row, sin(x)+y will be evaluated using the values of x and y in that row, and the result will be placed in that row of column C. Note that references must be made using variable names, not cell addresses. Any function found on the Building Blocks page can be used in a column-filling expression.
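The row-by-row evaluation described above can be sketched in Python. This sketch illustrates the semantics only. Rows are assumed to be dictionaries keyed by variable name, blank rows are `None`, and the expression is written as a Python function rather than in Eureqa's expression syntax. `fill_column` is a hypothetical name.

```python
import math


def fill_column(rows, expression):
    """Evaluate `expression` once per non-empty row, using that row's variable values."""
    return [None if row is None else expression(row) for row in rows]


rows = [
    {"x": 0.0, "y": 1.0},
    {"x": math.pi / 2, "y": 2.0},
    None,  # an empty row stays empty
    {"x": math.pi, "y": -1.0},
]

# Equivalent of entering =sin(x)+y in column C
c = fill_column(rows, lambda row: math.sin(row["x"]) + row["y"])
print(c)
```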


Dates, Times, Dollars, and Percents

Dates, times, dollars, and percents can be entered using the following formats:

  • 2/14/2013 (month/day/year)
  • 2/14/2013 19:24:22 (month/day/year hour:minute:second)
  • $15.22
  • 76.1%

If you enter date or time values into Eureqa, Eureqa will automatically convert them into numeric values that can be used for modeling. The scale and starting point of these numeric values may depend on the data itself. For example, if you enter the dates "2/24/2013", "2/25/2013", and "2/26/2013", Eureqa may internally convert them to the values 1, 2, and 3, with each unit representing one day. Similarly, if you enter the times "2/14/2013 01:01:00", "2/14/2013 01:02:00", and "2/14/2013 01:03:00", Eureqa may internally convert them to the values 1, 2, and 3, with each unit representing one minute.

Manually Converting Date and Time Data to Numeric Values

You may want to convert your date or time data into numeric values yourself before entering them into Eureqa. To do that, pick a reference point from which to measure time durations, and a unit in which to measure them.

For example, you could convert a time value "8:31 AM" to 8.52 hours since midnight. Similarly, for dates, you could convert a date like "Dec. 6, 1981 8:31 AM" to 81.9 total years since 1900.
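The two conversions in this example can be checked with a short Python sketch using the standard `datetime` module. It assumes "since 1900" means since midnight on January 1, 1900, and it uses an average year length of 365.25 days. Other conventions give slightly different values. The function names are illustrative.

```python
from datetime import datetime


def hours_since_midnight(t):
    """Hours elapsed since midnight of the same day."""
    midnight = t.replace(hour=0, minute=0, second=0, microsecond=0)
    return (t - midnight).total_seconds() / 3600


def years_since(t, ref=datetime(1900, 1, 1)):
    """Years elapsed since `ref`, assuming an average year of 365.25 days."""
    return (t - ref).total_seconds() / (365.25 * 24 * 3600)


t = datetime(1981, 12, 6, 8, 31)
print(round(hours_since_midnight(t), 2))  # 8.52
print(round(years_since(t), 1))           # 81.9
```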

You should make the date and time conversions to numeric duration values in another program like Excel before entering the data in Eureqa (see example below).

Some things to avoid:

  1. Do not concatenate date and time strings to get a numeric value. For example, do not convert a date like "1981-12-06" to 19811206. This representation of time is extremely nonlinear. It preserves order, but the spacing between values no longer reflects elapsed time. Additionally, the values are very large and numerically unstable.
  2. Avoid measuring time durations from a very distant reference point. For example, if your data uses time values that span a few days, do not convert these time values to total seconds since the beginning of the century. The numeric values would be enormous and numerically unstable.

Instead, the best practice is to measure a time duration since the first time point in your data set, or enter data in the format supported by Eureqa and let Eureqa do the conversion to numeric values for you.
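The problem with concatenated dates (item 1 above) can be demonstrated with Python's standard `datetime` module. Consecutive days can map to codes that differ by 1 or by thousands. `concat_code` is a hypothetical helper that builds the YYYYMMDD number.

```python
from datetime import date


def concat_code(d):
    """Build the (discouraged) YYYYMMDD number from a date."""
    return d.year * 10000 + d.month * 100 + d.day


pairs = [
    (date(1981, 12, 6), date(1981, 12, 7)),   # consecutive days within a month
    (date(1981, 12, 31), date(1982, 1, 1)),   # consecutive days across a year boundary
]
for start, end in pairs:
    # Real elapsed days vs. difference between the concatenated codes
    print((end - start).days, concat_code(end) - concat_code(start))
# 1 1
# 1 8870
```

Both pairs are one day apart, yet the concatenated codes jump by 1 in the first case and by 8870 in the second. This is the kind of distortion a model would have to fit around.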

Converting Date and Time Values in Microsoft Excel

Many programs can convert date and time values to numeric time duration values. In Microsoft Excel, subtracting one date cell from another gives the fractional number of days between the two dates. You can then convert days to hours or some other unit to get values of reasonable magnitude. For example, =(A1-A$1)*24, filled down for all rows, subtracts the first date (in cell A1) from each date and converts the resulting day values into hours.
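Outside Excel, the same calculation as the Excel formula in the preceding paragraph can be sketched in Python. Subtract the first timestamp from every timestamp and express the result in hours. The function name and sample timestamps are illustrative.

```python
from datetime import datetime


def hours_since_first(timestamps):
    """Hours elapsed since the first timestamp, mirroring the Excel fill-down formula."""
    first = timestamps[0]
    return [(t - first).total_seconds() / 3600 for t in timestamps]


stamps = [
    datetime(2013, 2, 14, 1, 1),
    datetime(2013, 2, 14, 1, 31),
    datetime(2013, 2, 15, 1, 1),
]
print(hours_since_first(stamps))  # [0.0, 0.5, 24.0]
```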

Another useful function is YEARFRAC, which returns the number of years, including the fractional part, between a start date and an end date. For example, =YEARFRAC(A$1,A1), filled down for all rows, returns the fractional number of years elapsed since the date in cell A1.


Labels and Categorical Variables

You can create columns containing up to 100 different labels (e.g., "beagle", "dachshund", "poodle"). In the special case where a column contains only "true"/"false" or "yes"/"no", those labels will be automatically converted into 1s and 0s. In all other cases, Eureqa will convert each distinct label into its own Boolean variable with values of 1 and 0. For example, take the following data set:

Year   Breed
1976   dachshund
1977   beagle
1978   poodle

Eureqa treats the above as the following:

Year   Breed_dachshund   Breed_beagle   Breed_poodle
1976   1                 0              0
1977   0                 1              0
1978   0                 0              1

Note that all of the values in a given column must be of the same type; labels and numeric values can't be mixed.
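The label expansion described in this section can be sketched in Python as a one-hot encoding. This sketch is not Eureqa's implementation. The column naming (column name, underscore, label) follows the example above. The true/false and yes/no check is assumed to be case-insensitive. `encode_labels` is a hypothetical name.

```python
def encode_labels(name, values, max_labels=100):
    """Expand a label column into 0/1 columns, one per distinct label."""
    lowered = {v.lower() for v in values}
    # Special case: a pure true/false or yes/no column becomes a single 0/1 column.
    if lowered <= {"true", "false"} or lowered <= {"yes", "no"}:
        return {name: [1 if v.lower() in ("true", "yes") else 0 for v in values]}
    labels = list(dict.fromkeys(values))  # distinct labels in first-seen order
    if len(labels) > max_labels:
        raise ValueError(f"{name} has {len(labels)} labels; the limit is {max_labels}")
    return {f"{name}_{label}": [1 if v == label else 0 for v in values] for label in labels}


print(encode_labels("Breed", ["dachshund", "beagle", "poodle"]))
# {'Breed_dachshund': [1, 0, 0], 'Breed_beagle': [0, 1, 0], 'Breed_poodle': [0, 0, 1]}
```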


Text Data

If you enter text data, Eureqa will automatically convert it into numeric features that can be used for modeling. Each new feature will contain only the values 1 or 0 (true or false) which indicate whether or not a particular word appeared in the text. Eureqa attempts to automatically select only the words that are most likely to be meaningful for modeling. For example, words that appear in only a single record may be excluded.

As an example, your data might include a "Comments" column which contains free-form text. Eureqa will automatically identify words in the text column that will be treated as individual features. Take the following data set:

Date       Rating   Comments
2/1/2013   3        the food was great, service was very slow
2/1/2014   5        the cheesecake is the best!
2/3/2013   2        service was bad. we had to wait forever for our food
2/8/2013   4        there was a long wait but the salmon was worth it
2/8/2013   5        love the cheesecake!
2/8/2013   4        i ate the salmon and cheesecake - they were great!

Eureqa will treat the above as the following:

Date       Rating   Comments_cheesecake   Comments_food   Comments_great   Comments_salmon   Comments_service   Comments_wait
2/1/2013   3        0                     1               1                0                 1                  0
2/1/2014   5        1                     0               0                0                 0                  0
2/3/2013   2        0                     1               0                0                 1                  1
2/8/2013   4        0                     0               0                1                 0                  1
2/8/2013   5        1                     0               0                0                 0                  0
2/8/2013   4        1                     0               1                1                 0                  0

After entering your data, you can look at the "Variables" list on the "Prepare Data" tab to see which features have been automatically added.

For more advanced text processing, we recommend performing preprocessing outside of Eureqa to develop the numeric features that you want, then loading the preprocessed data into Eureqa for modeling.
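As a starting point for such preprocessing, the Python sketch below builds binary word features like those in the example table. Eureqa's actual word-selection rules are not documented here. This sketch assumes a simple rule: keep words that appear in at least two records and are not in a small, hypothetical stop-word list. That rule happens to reproduce the six features in the example.

```python
import re

# Hypothetical stop-word list; a real one would be much longer.
STOP_WORDS = {"a", "and", "but", "for", "i", "is", "it", "the", "to", "was", "we", "were"}


def text_features(column, texts, min_records=2):
    """Return 0/1 word-presence features for a free-text column."""
    # Lowercase each record and split it into a set of words (punctuation dropped).
    word_sets = [set(re.findall(r"[a-z]+", text.lower())) for text in texts]
    # Count how many records contain each word.
    counts = {}
    for words in word_sets:
        for w in words:
            counts[w] = counts.get(w, 0) + 1
    vocabulary = sorted(w for w, n in counts.items() if n >= min_records and w not in STOP_WORDS)
    return {f"{column}_{w}": [1 if w in words else 0 for words in word_sets] for w in vocabulary}


comments = [
    "the food was great, service was very slow",
    "the cheesecake is the best!",
    "service was bad. we had to wait forever for our food",
    "there was a long wait but the salmon was worth it",
    "love the cheesecake!",
    "i ate the salmon and cheesecake - they were great!",
]
for name, values in text_features("Comments", comments).items():
    print(name, values)
```

The resulting columns can be added to the data set before loading it into Eureqa.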


Variable Names

Use short names, preferably single letters, to keep the resulting formulae concise. For example, use F instead of Force. To use a numeric subscript, append the number to the variable name. For a non-numeric subscript, append an underscore followed by the desired subscript. For example, R2 will display as R with the subscript 2, and C_3po will display as C with the subscript 3po.
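The subscript rules can be summarized in a short Python sketch that splits a name into its base and subscript. `split_subscript` is a hypothetical helper written from the rules above, not part of Eureqa. It does not attempt to cover every possible name. For example, with more than one underscore it simply treats everything after the first underscore as the subscript.

```python
import re


def split_subscript(name):
    """Split a variable name into (base, subscript) following the rules above."""
    if "_" in name:  # non-numeric subscript: everything after the first underscore
        base, subscript = name.split("_", 1)
        return base, subscript
    match = re.fullmatch(r"(.*?)(\d+)", name)  # trailing digits form a numeric subscript
    if match and match.group(1):
        return match.group(1), match.group(2)
    return name, ""


for name in ["F", "R2", "C_3po", "DO"]:
    print(name, split_subscript(name))
```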
