1.5. Data Fusion
1.5.1. Introduction
This example shows how to use pSeven Core to train an approximation model from two data sets of different fidelity: a small sample that is considered precise (high-fidelity) and a larger, coarse sample (low-fidelity).
Start by importing the Generic Tool for Data Fusion (GTDF) module:
from da.p7core import gtdf
1.5.2. Model Training
Generate the example samples:
import numpy as np
def getTrainData(sample_size, dim, noisy):
    x = np.random.uniform(low=0, high=1, size=sample_size * dim).reshape((sample_size, dim))
    f = np.sum(x * x, axis=1).reshape((sample_size, 1))
    if noisy:
        noise = np.random.normal(loc=0.0, scale=1.0, size=sample_size).reshape((sample_size, 1))
        f += noise
    return x, f
dim = 3
sample_size_hf = 10
sample_size_lf = sample_size_hf * 10
# prepare the high fidelity sample
x_hf, f_hf = getTrainData(sample_size_hf, dim, False)
# prepare the low fidelity sample
x_lf, f_lf = getTrainData(sample_size_lf, dim, True)
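For intuition, both samples describe the same underlying quadratic response; the low-fidelity data is simply that response corrupted with unit-variance Gaussian noise. A minimal standalone sketch of this relationship (the fixed seed and printout are for illustration only, not part of the example above):

```python
import numpy as np

np.random.seed(0)
x = np.random.uniform(low=0, high=1, size=(5, 3))
f_exact = np.sum(x * x, axis=1).reshape((5, 1))    # what the HF sample measures
noise = np.random.normal(loc=0.0, scale=1.0, size=(5, 1))
f_coarse = f_exact + noise                         # what the LF sample measures
# the LF values scatter around the exact response
print(np.hstack([f_exact, f_coarse]))
```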
Create a gtdf.Builder instance and train the model:
from da.p7core import gtdf
from da.p7core import loggers
# create the model builder
builder = gtdf.Builder()
# set logger; StreamLogger outputs to sys.stdout by default
builder.set_logger(loggers.StreamLogger())
# train the model
model = builder.build(x_hf, f_hf, x_lf, f_lf, options={'GTDF/loglevel': 'Info'})
1.5.3. Using the Model
Print a readable model summary:
print(model)
Save the model and reload it from disk:
model.save('model.gtdf')
loaded_model = gtdf.Model('model.gtdf')
Validate the model by checking its errors on a test sample with reference data:
# calculate errors
def calc_mean_error(model, x_sample, f_sample):
    errors = f_sample - model.calc(x_sample)
    # root-mean-square (RMS) error
    return np.sqrt(np.mean(errors ** 2))
sample_size_test = 1000
x_hf, f_hf = getTrainData(sample_size_test, dim, False)
x_lf, f_lf = getTrainData(sample_size_test, dim, True)
print('-'*80)
print('RMS errors:')
print('HF RMS error: %.15g' % calc_mean_error(model, x_hf, f_hf))
print('LF RMS error: %.15g' % calc_mean_error(model, x_lf, f_lf))
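A sanity check on these numbers: the low-fidelity test values carry unit-variance Gaussian noise, so even a perfect surrogate of the exact quadratic cannot score an RMS error much below 1.0 on the LF sample. A self-contained illustration of this noise floor (fixed seed; not part of the original example):

```python
import numpy as np

np.random.seed(1)
# exact quadratic response on a test grid
x = np.random.uniform(low=0, high=1, size=(1000, 3))
f = np.sum(x * x, axis=1).reshape((1000, 1))
# low-fidelity "observations" add unit-variance Gaussian noise
f_lf = f + np.random.normal(loc=0.0, scale=1.0, size=(1000, 1))
# RMS error of a perfect model against the noisy LF data
rms_floor = np.sqrt(np.mean((f_lf - f) ** 2))
print('LF noise floor: %.3f' % rms_floor)
```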
1.5.4. Full Example Code
import numpy as np
from da.p7core import gtdf
from da.p7core import loggers
def getTrainData(sample_size, dim, noisy):
    x = np.random.uniform(low=0, high=1, size=sample_size * dim).reshape((sample_size, dim))
    f = np.sum(x * x, axis=1).reshape((sample_size, 1))
    if noisy:
        noise = np.random.normal(loc=0.0, scale=1.0, size=sample_size).reshape((sample_size, 1))
        f += noise
    return x, f
dim = 3
sample_size_hf = 10
sample_size_lf = sample_size_hf * 10
x_hf, f_hf = getTrainData(sample_size_hf, dim, False)
x_lf, f_lf = getTrainData(sample_size_lf, dim, True)
builder = gtdf.Builder()
builder.set_logger(loggers.StreamLogger())
model = builder.build(x_hf, f_hf, x_lf, f_lf, options={'GTDF/loglevel': 'Info'})
print(model)
model.save('model.gtdf')
loaded_model = gtdf.Model('model.gtdf')
def calc_mean_error(model, x_sample, f_sample):
    errors = f_sample - model.calc(x_sample)
    # root-mean-square (RMS) error
    return np.sqrt(np.mean(errors ** 2))
sample_size_test = 1000
x_hf, f_hf = getTrainData(sample_size_test, dim, False)
x_lf, f_lf = getTrainData(sample_size_test, dim, True)
print('-'*80)
print('RMS errors:')
print('HF RMS error: %.15g' % calc_mean_error(model, x_hf, f_hf))
print('LF RMS error: %.15g' % calc_mean_error(model, x_lf, f_lf))