
Data preprocessing in FEDOT

Data preparation for modelling in FEDOT is split into two stages: transforming and preprocessing.

A transformer converts data between formats, while a preprocessor changes the "scale" of the data through scaling, normalization, etc.
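To make the "scale" part concrete, here is a minimal sketch of the two operations mentioned above, standardization (scaling) and min-max normalization. It uses only the Python standard library; the function names `standardize` and `min_max_normalize` are hypothetical and are not part of the FEDOT API.

```python
from statistics import mean, pstdev


def standardize(values):
    """Scale values to zero mean and unit variance (z-score)."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # A constant feature has no spread; map it to zeros.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def min_max_normalize(values):
    """Rescale values linearly into the [0, 1] range."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


feature = [2.0, 4.0, 6.0, 8.0]
scaled = standardize(feature)            # centered around zero
normalized = min_max_normalize(feature)  # starts at 0.0, ends at 1.0
```

Neither function changes the format of the data (a list of floats goes in and comes out), which is what distinguishes preprocessing from transforming.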


More complex data operations are implemented as separate models. For example, data enrichment or dimensionality reduction can be done as follows:


The direct data model enriches the model inputs with the raw data:
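As a rough illustration of this idea, the sketch below shows a secondary node receiving both the predictions of a first-level model and the untouched raw features. All names here (`direct_data_model`, `toy_model`, `merge_inputs`) are hypothetical stand-ins written for this example; they do not reproduce FEDOT's internal implementation.

```python
def direct_data_model(features):
    """Pass the raw features through unchanged (hypothetical stand-in)."""
    return [list(row) for row in features]


def toy_model(features):
    """Hypothetical first-level model: 'predicts' the sum of each row."""
    return [[sum(row)] for row in features]


def merge_inputs(*parent_outputs):
    """Concatenate the outputs of all parent nodes column-wise."""
    merged = []
    for rows in zip(*parent_outputs):
        merged_row = []
        for row in rows:
            merged_row.extend(row)
        merged.append(merged_row)
    return merged


raw = [[1.0, 2.0], [3.0, 4.0]]
# The secondary node sees one prediction column plus the two raw columns.
secondary_input = merge_inputs(toy_model(raw), direct_data_model(raw))
```

Without the direct data model, the secondary node would only see the single prediction column.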


A data preprocessing model for tabular data can be added to the repository as follows:

"pca_data_model": {
  "meta": "dim_red_data_model",
  "tags": ["linear"]
}

"dim_red_data_model": {
  "tasks": "[TaskTypesEnum.classification, TaskTypesEnum.regression]",
  "input_type": "[DataTypesEnum.table]",
  "output_type": "[DataTypesEnum.table]",
  "strategies": ["core.models.evaluation.data_evaluation", "DataModellingStrategy"],
  "tags": ["without_preprocessing", "data_model"],
  "description": "Implementations of the models for the feature preprocessing (dimensionality reduction, etc)"
}
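The `meta` key in a model entry points to a shared metadata entry. The sketch below shows one way such a reference could be resolved when the repository is read. The top-level `metadata` and `models` keys, the `resolve_model` helper, and the rule that tags from both entries are combined are all assumptions made for this illustration, not a description of FEDOT's loader.

```python
import json

# Hypothetical repository fragment; the top-level "metadata" and "models"
# keys are assumptions made for this sketch.
REPOSITORY_JSON = """
{
  "metadata": {
    "dim_red_data_model": {
      "input_type": "[DataTypesEnum.table]",
      "output_type": "[DataTypesEnum.table]",
      "tags": ["without_preprocessing", "data_model"]
    }
  },
  "models": {
    "pca_data_model": {
      "meta": "dim_red_data_model",
      "tags": ["linear"]
    }
  }
}
"""


def resolve_model(repository, model_name):
    """Return the model entry combined with the metadata it references."""
    model = repository["models"][model_name]
    meta = repository["metadata"][model["meta"]]
    resolved = dict(meta)
    # Assumption: tags from the metadata and the model entry are combined.
    resolved["tags"] = meta.get("tags", []) + model.get("tags", [])
    resolved["name"] = model_name
    return resolved


repository = json.loads(REPOSITORY_JSON)
pca = resolve_model(repository, "pca_data_model")
```

Keeping the shared fields in one metadata entry means a new dimensionality reduction model only needs to declare its name, its `meta` reference, and any extra tags.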

The following Python code creates a chain that includes the data model:

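# Import paths are an assumption based on the `core` package layout
# referenced in the repository entry above.
from core.composer.chain import Chain
from core.composer.node import PrimaryNode, SecondaryNode

# Two first-level nodes: PCA as a data model and LDA as a regular model.
node_first = PrimaryNode('pca_data_model')
node_second = PrimaryNode('lda')
# The random forest combines the outputs of both first-level nodes.
node_final = SecondaryNode('rf', nodes_from=[node_first, node_second])

chain = Chain(node_final)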

The data preprocessing model for time series:


It can be created as follows:

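# Trend branch: extract the trend and model it with an LSTM.
node_trend = PrimaryNode('trend_data_model')
node_lstm_trend = SecondaryNode('lstm', nodes_from=[node_trend])

# Residual branch: extract the residual and model it with ridge regression.
node_residual = PrimaryNode('residual_data_model')
node_ridge_residual = SecondaryNode('ridge', nodes_from=[node_residual])

# A linear model combines the forecasts of both branches.
node_final = SecondaryNode('linear', nodes_from=[node_ridge_residual, node_lstm_trend])
chain = Chain(node_final)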
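To show what the trend and residual branches operate on, here is a minimal sketch of an additive decomposition in which the series equals trend plus residual. A centered moving average is used as the trend estimate purely for illustration; the decomposition method actually used by `trend_data_model` and `residual_data_model` is not stated here and may differ. The helper names are hypothetical.

```python
def moving_average_trend(series, window=3):
    """Estimate the trend with a centered moving average.

    Near the edges the window is truncated to the available points.
    """
    half = window // 2
    trend = []
    for i in range(len(series)):
        start = max(0, i - half)
        end = min(len(series), i + half + 1)
        trend.append(sum(series[start:end]) / (end - start))
    return trend


def decompose(series, window=3):
    """Split a series into a trend and a residual (series - trend)."""
    trend = moving_average_trend(series, window)
    residual = [value - t for value, t in zip(series, trend)]
    return trend, residual


series = [1.0, 3.0, 2.0, 5.0, 4.0, 7.0]
trend, residual = decompose(series)
# Adding the two components back together restores the original series.
restored = [t + r for t, r in zip(trend, residual)]
```

Because the components sum back to the original series, the forecasts of the two branches can be combined by a simple final model, as the `linear` node does in the chain above.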