Pre-Processing

class src.lib.analysis.preprocessing.PreProcessing

Bases: object

Pre-processing performs the initial operations that fix or adapt the data received from the API. For example, it limits the amount of data, since a series can span years of history that the analysis does not need, and it defines the correct column for the closing price of each entry.

define_closure()

Defines the column that holds the closing price. This is necessary because, depending on the data source or the configuration, the closing price may be stored in different columns. The new column is named “Close Final”.
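As an illustration only, the column selection could be sketched as below. The function name define_closure_sketch, the candidate column names "Adj Close" and "Close", and their priority order are assumptions made for this example; the source does not state which columns the real method inspects.

```python
import pandas as pd


def define_closure_sketch(ohlc: pd.DataFrame,
                          candidates=("Adj Close", "Close")) -> pd.DataFrame:
    """Copy the first available candidate column into "Close Final".

    Hypothetical helper: the candidate names and their order are assumed.
    """
    for name in candidates:
        if name in ohlc.columns:
            result = ohlc.copy()  # leave the caller's dataframe untouched
            result["Close Final"] = result[name]
            return result
    raise KeyError(f"none of {candidates} found in dataframe columns")


# This sample source only provides "Close", so that column is used.
sample = pd.DataFrame({"Open": [10.0, 11.0], "Close": [10.5, 11.5]})
closed = define_closure_sketch(sample)
```

Downstream code can then read “Close Final” without knowing which source column it came from.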

define_past_time()

Tags all the initially available data as Real data. This distinction matters to the dataset because any data produced by prediction (future) is tagged as Predict data.

Parameters

None – No parameters are used by this method.

Returns

The result is applied directly to the ohlc_dataset dataframe by adding a new column named Data Type.

Return type

None
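A minimal sketch of the tagging step, assuming the label is stored as the literal string "Real" and that the dataframe is modified in place. The helper name tag_as_real and the sample data are hypothetical.

```python
import pandas as pd


def tag_as_real(ohlc_dataset: pd.DataFrame) -> None:
    """Mark every existing row as real (non-predicted) data, in place.

    The label value "Real" is an assumption based on the description.
    """
    ohlc_dataset["Data Type"] = "Real"


ohlc_dataset = pd.DataFrame(
    {"Close Final": [10.5, 11.5, 12.0]},
    index=pd.to_datetime(["2024-01-02", "2024-01-03", "2024-01-04"]),
)
tag_as_real(ohlc_dataset)
```

Because the column is added in place, the function returns None, which matches the documented return type.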

extend_time_range(length: int)

Populates the Pandas dataframe with future dates, starting from the next day, for use in predictions. The list skips weekends; holidays are not taken into account, so they are not skipped.

Parameters

length (int) – Length of the list of dates.

Returns

A new index on the ohlc_dataset_prediction dataframe, populated with a sequence of dates incrementing one day at a time. Weekends are skipped.

Return type

None
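One way to build such a list of dates is with pandas.bdate_range, which generates Monday-to-Friday dates and ignores holidays. This is a sketch under that assumption, not necessarily how the method is implemented; the helper name next_business_days is hypothetical.

```python
import pandas as pd


def next_business_days(last_date: pd.Timestamp, length: int) -> pd.DatetimeIndex:
    """Return `length` weekday dates strictly after `last_date`.

    Holidays are not excluded, matching the behaviour described above.
    """
    return pd.bdate_range(start=last_date + pd.Timedelta(days=1), periods=length)


# 2024-01-05 is a Friday, so the range starts on the following Monday.
future_index = next_business_days(pd.Timestamp("2024-01-05"), 3)
ohlc_dataset_prediction = pd.DataFrame(index=future_index)
```

When the start date falls on a weekend, bdate_range rolls forward to the next weekday, so no Saturday or Sunday appears in the index.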

truncate_range(length: int = 0, shift_last: int = 0)

Limits the data to the given length, always keeping the latest available data.

Note

The dataframe to be trimmed might be based on working days only, so weekends are not included. Take this into account: for example, if a year of data is needed, the input should be about 250 instead of 365.
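The effect of trimming a working-day series can be sketched as follows. The helper keep_latest is hypothetical, and treating a length of 0 as "no truncation" is an assumption drawn from the default value in the signature; the shift_last argument is left out because its behaviour is not described here.

```python
import pandas as pd

# Synthetic working-day series: 600 weekday rows ending on 2024-12-31.
index = pd.bdate_range(end="2024-12-31", periods=600)
ohlc = pd.DataFrame({"Close Final": range(600)}, index=index)


def keep_latest(df: pd.DataFrame, length: int = 0) -> pd.DataFrame:
    """Keep only the newest `length` rows.

    Assumption: a length of 0 (the default) means the data is not truncated.
    """
    if length <= 0:
        return df
    return df.tail(length)


# About one year of working days, not 365 rows.
last_year = keep_latest(ohlc, 250)
```

The trimmed frame always ends on the same date as the original, since rows are dropped only from the oldest end.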

Parameters

length (int) – Number of entries to be included in the analysis. Data outside this range is truncated.