Pre-Processing
- class src.lib.analysis.preprocessing.PreProcessing
Bases: object
Pre-processing performs the initial operations that fix or adapt the data received from the API. Examples include limiting the amount of data, since a series can contain years of history that are not needed, and defining the correct column for the closing price of each entry.
- define_closure()
Defines the column used for the closing price. This is necessary because, depending on the data source or the configuration, the closing price may be stored in different columns. The new column is named “Close Final”.
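A minimal sketch of the idea behind define_closure(), written as a standalone function rather than the actual method. Only the target column name “Close Final” comes from this page; the source column names "Close" and "Adj Close" and the use_adjusted flag are hypothetical stand-ins for whatever the data source and configuration really provide.

```python
import pandas as pd

def define_closure(ohlc_dataset: pd.DataFrame, use_adjusted: bool = True) -> None:
    """Hypothetical sketch: copy the chosen closing-price column into 'Close Final'."""
    # Assumption: a source exposes either 'Adj Close' or 'Close' for the closing price.
    if use_adjusted and "Adj Close" in ohlc_dataset.columns:
        source_column = "Adj Close"
    else:
        source_column = "Close"
    # Modify the dataframe in place, mirroring a method whose return type is None.
    ohlc_dataset["Close Final"] = ohlc_dataset[source_column]

df = pd.DataFrame({"Close": [10.0, 11.0], "Adj Close": [9.5, 10.5]})
define_closure(df)
print(df["Close Final"].tolist())  # [9.5, 10.5]
```

Downstream code can then read “Close Final” without caring which source column it was copied from.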
- define_past_time()
Marks all initially available data as Real data. This distinction matters for the dataset, since any data produced by a prediction (future) is tagged as Predict data.
- Parameters
None – No parameters are used by this method.
- Returns
The result is applied directly to the ohlc_dataset dataframe by adding a new column named Data Type.
- Return type
None
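The tagging described for define_past_time() can be sketched as follows, assuming a standalone function that receives the dataframe as an argument (the real method works on ohlc_dataset). The labels "Real" and "Predict" follow the wording above, but their exact stored form in the library is an assumption, as are the sample prices and dates.

```python
import pandas as pd

def define_past_time(ohlc_dataset: pd.DataFrame) -> None:
    """Hypothetical sketch: tag every existing row as observed ('Real') data."""
    ohlc_dataset["Data Type"] = "Real"

history = pd.DataFrame(
    {"Close Final": [10.0, 11.0]},
    index=pd.to_datetime(["2024-01-04", "2024-01-05"]),
)
define_past_time(history)

# Rows produced later by a prediction step would carry the other label.
forecast = pd.DataFrame({"Close Final": [11.5]}, index=pd.to_datetime(["2024-01-08"]))
forecast["Data Type"] = "Predict"

combined = pd.concat([history, forecast])
print(combined["Data Type"].tolist())  # ['Real', 'Real', 'Predict']
```

Keeping the label in a column, rather than in a separate frame, lets observed and predicted rows share one index and still be filtered apart.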
- extend_time_range(length: int)
Populates the Pandas dataframe used for predictions with dates, starting from the day after the last available date. Weekends are skipped; holidays are not taken into account, so they are not skipped.
- Parameters
length (int) – Length of the list of dates.
- Returns
The ohlc_dataset_prediction dataframe receives a new index of dates, incrementing one day at a time, with weekends skipped.
- Return type
None
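One way to reproduce the described behavior is pandas.bdate_range, whose default 'B' frequency skips Saturdays and Sundays but not holidays. This is a sketch, not the library's code: the function signature is hypothetical, and it returns a new dataframe instead of writing into ohlc_dataset_prediction.

```python
import pandas as pd

def extend_time_range(ohlc_dataset: pd.DataFrame, length: int) -> pd.DataFrame:
    """Hypothetical sketch: build an empty frame indexed by the next `length` workdays."""
    last_date = ohlc_dataset.index[-1]
    # Start the day after the last observation; the default 'B' (business day)
    # frequency skips weekends but has no notion of holidays.
    future_dates = pd.bdate_range(start=last_date + pd.Timedelta(days=1), periods=length)
    return pd.DataFrame(index=future_dates)

history = pd.DataFrame(
    {"Close Final": [10.0, 11.0]},
    index=pd.to_datetime(["2024-01-04", "2024-01-05"]),  # Thursday, Friday
)
prediction = extend_time_range(history, length=3)
print([d.strftime("%Y-%m-%d") for d in prediction.index])
# ['2024-01-08', '2024-01-09', '2024-01-10']
```

Since the last observation falls on a Friday, the day after is a Saturday, so the range starts on Monday. Excluding holidays as well would require a custom frequency (freq="C") together with the holidays argument of bdate_range, which goes beyond what this method is documented to do.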
- truncate_range(length: int = 0, shift_last: int = 0)
Limits the data to the requested length, always keeping the most recent data available.
Note
The dataframe to be trimmed may contain workdays only, in which case weekends are not included. Take this into account: for example, if one year of data is needed, the input should be about 250 rather than 365.
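The note above can be checked with a small sketch that keeps the latest rows via DataFrame.tail. This is an illustration, not the method itself: treating length 0 as "keep everything" is an assumption based on the default value, and shift_last is left out because its behavior is not described here.

```python
import pandas as pd

def truncate_range(ohlc_dataset: pd.DataFrame, length: int = 0) -> pd.DataFrame:
    """Hypothetical sketch: keep only the most recent `length` rows."""
    # Assumption: length <= 0 means "do not truncate", matching the default of 0.
    if length <= 0:
        return ohlc_dataset
    # tail() keeps the last rows, so the latest data is always preserved.
    return ohlc_dataset.tail(length)

# 400 workday rows starting on Monday 2 January 2023 (no weekend rows).
dates = pd.bdate_range(start="2023-01-02", periods=400)
prices = pd.DataFrame({"Close Final": range(400)}, index=dates)

one_year = truncate_range(prices, length=250)
span_days = (one_year.index[-1] - one_year.index[0]).days
print(len(one_year), span_days)  # 250 347
```

With weekends absent, 250 rows already cover 347 calendar days, just under a year, whereas an input of 365 would reach well beyond a year.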
- Parameters
length (int) – Number of entries to be included in the analysis. Data outside this range is truncated.