How to find the best equation for multiple data sets, each with different coefficients
• Hi Nutonian,

I have several sets of data (each for a different chemical in this case), and each data set should follow the same general equation, but the parameters (coefficients) of that equation will be different for each data set. I know how to find an equation to fit one data set at a time, but it will probably not be the best to fit all the other data sets. How can I find the equation that best fits all of the data sets at once? In other words, how can I have Eureqa calculate unique coefficients for each set of data, but keep the equation the same for all of them?

Right now, my data looks like this for three data sets. I have x, y data, and a boolean value to indicate which data set it belongs to.

x      y               y_a    y_b    y_c
0.05   1.043294245     1
0.1    0.783138041     1
0.2    0.534640298     1
0.3    0.396971682     1
0.4    0.291981867     1
0.5    0.187840845     1
0.6    0.071907744     1
0.7    -0.072687936    1
0.8    -0.246701431    1
0.9    -0.429899026    1

0.1    1.742416897            1
0.2    1.225023399            1
0.3    0.774354946            1
0.4    0.4700594              1
0.5    0.201441899            1
0.6    -0.029454979           1
0.7    -0.232227604           1
0.8    -0.416372794           1
0.9    -0.522109401           1

0.3    2.177073517                   1
0.4    1.322148221                   1
0.5    0.641768141                   1
0.6    0.006031965                   1
0.7    -0.674747927                  1

I am using the following target expression to give it up to three coefficients, but each coefficient is so complicated that Eureqa does not want to put in any coefficients at all:

y = f0(if(y_a = 1, f1(), if(y_b = 1, f2(), f3())), if(y_a = 1, f4(), if(y_b = 1, f5(), f6())), if(y_a = 1, f7(), if(y_b = 1, f8(), f9())), x)
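For anyone unfamiliar with the nested `if` pattern, here is a rough Python sketch of what that target expression asks for: three coefficient slots, each switched by the dataset indicators, all fed into one shared form `f0`. The names `pick`, `target` and `quadratic`, and the numbers in `slots`, are made up for illustration; they are not Eureqa functions or fitted values.

```python
def pick(y_a, y_b, coef_a, coef_b, coef_c):
    """Python equivalent of if(y_a = 1, coef_a, if(y_b = 1, coef_b, coef_c))."""
    if y_a == 1:
        return coef_a
    if y_b == 1:
        return coef_b
    return coef_c


def target(x, y_a, y_b, f0, slots):
    """Evaluate the shared form f0 with dataset-specific coefficients.

    slots holds three (a, b, c) triples, one triple per coefficient slot.
    """
    p, q, r = (pick(y_a, y_b, *slot) for slot in slots)
    return f0(p, q, r, x)


# Hypothetical shared form and hypothetical coefficient values.
def quadratic(p, q, r, x):
    return p + q * x + r * x ** 2


slots = [
    (1.0, 2.0, 3.0),     # first coefficient for datasets a, b, c
    (-1.0, -2.0, -3.0),  # second coefficient for datasets a, b, c
    (0.0, 0.5, 1.0),     # third coefficient for datasets a, b, c
]

value_a = target(0.5, 1, 0, quadratic, slots)  # row from dataset a
value_b = target(0.5, 0, 1, quadratic, slots)  # row from dataset b
value_c = target(0.5, 0, 0, quadratic, slots)  # row from dataset c
```

The form stays fixed while only the numbers inside `slots` change per dataset, which is the behaviour being asked of the search.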

I've tried using Eureqa several times in the past, but I can never get over this problem so I've never ended up being able to use it. Is there a solution or is Eureqa just limited to one data set at a time?

Thanks!
• Paul,

I created the attached Eureqa file using the data from your post.  The file has three searches.  Eureqa can be a bit path dependent and multiple searches on the same data can often find a good model for the data faster than allowing a single search to run a long time.

As an example, I'm using the solution with complexity size 29 from the third search.  It's not necessarily the best solution for the data, but it has a good form to demonstrate the approach I've taken.

Y = 0.917468710697514 + 3.07996667338533*C_c + 1.25458020363483*C_b + 2.47607506274962*C_b*X^2 - 1.45750336064287*C_not_b*X - 5.2164248002651*C_not_a*X

The variables, C_a, C_b, C_c, C_not_a, C_not_b and C_not_c, are classification variables that have a value of 0 or 1 depending on which chemical the corresponding X, Y variables represent.

I've broken the solution down for each chemical as follows:

Chemical a:

Y = 0.917468711 - 1.457503361*X

Chemical b:

Y = 2.172048915 - 5.216424800*X + 2.476075063*X^2  (Y = 0.917468711 + 1.254580204 + 2.476075063*X^2 - 5.216424800*X)

Chemical c:

Y = 3.997435384 - 6.673928161*X  (Y = 0.917468711 + 3.079966673 - 1.457503361*X - 5.216424800*X)
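The breakdown above can be checked with a short Python sketch. It evaluates the single combined equation with the 0/1 classification variables set for each chemical and compares the result with the per-chemical forms. The function names are illustrative only, and the coefficients are copied from this post.

```python
def combined_model(x, chem):
    """Evaluate the combined equation for chemical 'a', 'b' or 'c'."""
    # Classification variables: 1 when the condition holds, otherwise 0.
    c_b = 1 if chem == "b" else 0
    c_c = 1 if chem == "c" else 0
    c_not_a = 0 if chem == "a" else 1
    c_not_b = 0 if chem == "b" else 1
    return (0.917468710697514
            + 3.07996667338533 * c_c
            + 1.25458020363483 * c_b
            + 2.47607506274962 * c_b * x ** 2
            - 1.45750336064287 * c_not_b * x
            - 5.2164248002651 * c_not_a * x)


def per_chemical(x, chem):
    """Evaluate the simplified per-chemical equations (rounded coefficients)."""
    if chem == "a":
        return 0.917468711 - 1.457503361 * x
    if chem == "b":
        return 2.172048915 - 5.216424800 * x + 2.476075063 * x ** 2
    return 3.997435384 - 6.673928161 * x


# Largest disagreement between the two forms over a few x values.
max_diff = max(
    abs(combined_model(x, chem) - per_chemical(x, chem))
    for chem in "abc"
    for x in (0.1, 0.3, 0.5, 0.7, 0.9)
)
```

`max_diff` should be tiny, reflecting only the rounding of the coefficients to nine decimal places.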

Even if this approach isn't a solution to your problem, it may give you some ideas on how to develop your own solutions.

------------------------

The forum software won't allow me to upload the Eureqa file.

I've uploaded a text file that has your data in tab delimited form that can be copied and pasted into Eureqa.  You can see from looking at the file how multiple classification variables can be placed into a single column.

If you like, I could email the Eureqa file to you.
• Thanks CharlesWT! That's an interesting approach - the problem is that different terms in the equation appear or disappear depending on the chemical. I need each chemical to have an equation of precisely the same form, just with different coefficients, and I can't figure out how to make Eureqa find that equation for me. If I just search one chemical at a time, I get an equation that fits the data for that chemical really well, but not necessarily all the other ones.

I have a total of 1026 chemicals right now, so it would be really nice to be able to throw all the data into Eureqa (or at least 30 of the most representative ones) and have it find the equation that best fits all of them at once (again, each will have different coefficients).

It seems like Eureqa treats constants just like any other building block, but I would like Eureqa to treat coefficients as dataset-specific values. For example, if I have two data sets that are best fit by two lines, I am now required to run Eureqa on each dataset separately, returning two separate equations (e.g., y=2.34*x+3.43 and y=1.123*x+0.003). But I would rather have it examine both datasets at once and return y=m*x+b. I really don't care what the coefficients are for each dataset - I can easily solve for those using Eureqa or other software later.
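To make the idea concrete, here is a minimal Python sketch of scoring one shared form, y = m*x + b, across several datasets: the coefficients are fitted separately for each dataset and the squared errors are summed into a single score. This is not a Eureqa feature, just an illustration of the calculation. The function names, dataset names and x values are assumptions; the two lines are the ones from the example above.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = m*x + b for a single dataset."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    m = sxy / sxx
    b = mean_y - m * mean_x
    return m, b


def shared_form_score(datasets):
    """Fit m and b per dataset; return the total squared error and the fits."""
    total_error = 0.0
    coeffs = {}
    for name, (xs, ys) in datasets.items():
        m, b = fit_line(xs, ys)
        coeffs[name] = (m, b)
        total_error += sum((y - (m * x + b)) ** 2 for x, y in zip(xs, ys))
    return total_error, coeffs


# Illustrative data generated from the two example lines.
xs = [0.0, 1.0, 2.0, 3.0]
datasets = {
    "first": (xs, [2.34 * x + 3.43 for x in xs]),
    "second": (xs, [1.123 * x + 0.003 for x in xs]),
}

total_error, coeffs = shared_form_score(datasets)
```

A search over candidate forms would compare `total_error` between forms, so the form that wins is the one that fits every dataset well once each has its own coefficients.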

This seems like the sort of problem that many other people would have. I am surprised that there is no easy way to do it.

Thanks again!
