Data Analysis

Data analysis and decision-making module.

class src.analysis.Analysis(symbol: str, ohlc_data: pandas.core.frame.DataFrame, analysis_length: int, initial_value: float, stopgain: float, stoploss: float, operation_cost: float, tax_percentage: float, logger_name: str, display_analysis: bool = False, save_analysis: bool = False)

Bases: src.lib.analysis.methods.crash.Crash, src.lib.analysis.methods.macd.MACD, src.lib.analysis.methods.rsi_sma.RSI_SMA, src.lib.analysis.methods.rsi_ema.RSI_EMA, src.lib.analysis.methods.bollinger_band.BOLLINGER_BANDS, src.lib.analysis.methods.combined.CombinedStrategy, src.lib.analysis.preprocessing.PreProcessing

Data analysis class.

symbol

A string with the acronym of the symbol / ticker to be used.

Type

string

ohlc_data

A Pandas dataframe with the OHLC data to be used on the analysis.

Type

Pandas dataframe

decision

An integer that holds the final outcome of the analysis. The values are enumerated as:

  • BUY = 1

  • SELL = -1

  • HOLD = 0

Type

int
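A small enumeration can make the raw integer in decision easier to read in calling code. The sketch below assumes nothing beyond the three documented values; the Decision class and the describe() helper are hypothetical names, not part of src.analysis.

```python
from enum import IntEnum


class Decision(IntEnum):
    """Hypothetical enumeration mirroring the documented decision values."""

    BUY = 1
    SELL = -1
    HOLD = 0


def describe(decision: int) -> str:
    # Convert the raw integer stored in `decision` into a readable label.
    return Decision(decision).name


print(describe(1))   # BUY
print(describe(-1))  # SELL
print(describe(0))   # HOLD
```

Because IntEnum members compare equal to plain integers, Decision.BUY == 1 holds, so such a wrapper would not change how the stored value is compared.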

analysis_length_pre

Number of samples to be used for the analysis. This value is applied in the initial steps of the analysis, when the dataset is truncated; the truncated dataset is then used directly by the methods and predictions. For the methods themselves, this number should not be too high, since older data does not improve their performance. For the neural networks, however, a higher value can help, since it provides a larger dataset for learning. The downside of using all the available data is that training the neural networks may take considerably longer when analysis_length_pre is high.

Type

int

analysis_length_post

Number of samples to be used for simulation and comparison. The analysis methods themselves use the analysis_length_pre parameter; this parameter applies only to the final comparison.

Type

int
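The two lengths can be pictured as windows over the most recent samples. The following is a minimal sketch using plain lists, assuming that both windows are taken from the end of the series; the helper last_samples() is hypothetical. With a Pandas dataframe, the equivalent truncation would be DataFrame.tail(n).

```python
from typing import List, Sequence


def last_samples(values: Sequence[float], length: int) -> List[float]:
    """Return the most recent `length` samples (hypothetical helper)."""
    if length <= 0:
        raise ValueError("length must be positive")
    # Slicing from the end keeps the newest samples; if the series is shorter
    # than `length`, the whole series is returned.
    return list(values[-length:])


closes = [10.0, 10.5, 10.2, 10.8, 11.0, 11.3, 11.1, 11.6]
analysis_length_pre = 6   # samples fed to the methods / neural networks
analysis_length_post = 3  # samples kept for simulation and comparison

pre = last_samples(closes, analysis_length_pre)
post = last_samples(closes, analysis_length_post)
print(pre)   # [10.2, 10.8, 11.0, 11.3, 11.1, 11.6]
print(post)  # [11.3, 11.1, 11.6]
```

The explicit check on length matters because values[-0:] would silently return the whole series instead of an empty one.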

sequence_length

Number of samples to be used as sequence for input to the RNN / LSTM.

Type

int

prediction_length

Number of samples to be used as sequence for output in the RNN / LSTM.

Type

int
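One common way to turn a series into RNN / LSTM training samples is a sliding window: each sample takes sequence_length consecutive values as input and the following prediction_length values as the target. The sketch below assumes this windowing scheme; the actual preprocessing in src.analysis may differ, and make_windows() is a hypothetical name.

```python
from typing import List, Sequence, Tuple


def make_windows(
    series: Sequence[float], sequence_length: int, prediction_length: int
) -> List[Tuple[List[float], List[float]]]:
    """Split a series into (input, target) pairs for an RNN / LSTM."""
    pairs = []
    # Each window needs sequence_length inputs followed by prediction_length
    # targets; a series that is too short produces no windows at all.
    last_start = len(series) - sequence_length - prediction_length
    for start in range(last_start + 1):
        inputs = list(series[start : start + sequence_length])
        targets = list(
            series[start + sequence_length : start + sequence_length + prediction_length]
        )
        pairs.append((inputs, targets))
    return pairs


windows = make_windows([1, 2, 3, 4, 5, 6], sequence_length=3, prediction_length=2)
for inputs, targets in windows:
    print(inputs, "->", targets)
# [1, 2, 3] -> [4, 5]
# [2, 3, 4] -> [5, 6]
```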

logger_name

Name of the logger.

Type

string

display_analysis

Boolean indicating whether a chart with the results should be displayed after the analysis. The chart is displayed when True.

Type

bool

save_analysis

Boolean indicating whether the chart with the results should be saved after it is displayed. The chart is saved when True.

Type

bool

analyze()

Performs the complete analysis of the data for a determined symbol / ticker.

The basic operation of this method is:

  1. Pre-Process: Execute operations that prepare the data for analysis.

  2. Parameters calculation: Calculate basic parameters from the signals that are needed by the subsequent methods and strategies.

  3. Apply Strategies: Run the defined strategies. Each one runs independently of the others. The strategies themselves include prediction techniques.

  4. Arbitrate: Combine the results from all the different strategies into a final outcome.

Parameters

None – This method uses attributes of the class Analysis; no parameter is passed to it explicitly.

Returns

  • analysis_results (dictionary) – Summary of the results for the ticker.

  • ohlc_dataset (Pandas Dataframe) – Complete dataframe resulting from the analysis of the ticker.
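The four steps of analyze() can be sketched as a small pipeline. This is an illustration under several assumptions, not the method's actual implementation: plain lists stand in for the OHLC dataframe, the three strategies are hypothetical toy rules (not the Crash, MACD, RSI or Bollinger Band methods), and arbitration uses the sign of the summed votes, whereas the real arbitration rule is not described here.

```python
from typing import Callable, Dict, List, Tuple

BUY, SELL, HOLD = 1, -1, 0

Strategy = Callable[[List[float]], int]


def momentum(data: List[float]) -> int:
    # Hypothetical strategy: follow the direction of the last change.
    if data[-1] > data[-2]:
        return BUY
    if data[-1] < data[-2]:
        return SELL
    return HOLD


def mean_reversion(data: List[float]) -> int:
    # Hypothetical strategy: bet on a return towards the window mean.
    mean = sum(data) / len(data)
    if data[-1] > mean:
        return SELL
    if data[-1] < mean:
        return BUY
    return HOLD


def breakout(data: List[float]) -> int:
    # Hypothetical strategy: react to a new high or low within the window.
    if data[-1] == max(data):
        return BUY
    if data[-1] == min(data):
        return SELL
    return HOLD


def analyze(
    closes: List[float], length: int, strategies: List[Strategy]
) -> Tuple[Dict[str, object], List[float]]:
    # 1. Pre-Process: keep only the most recent `length` samples.
    data = closes[-length:]
    # 2. Parameters calculation: a single example parameter.
    parameters = {"net_change": data[-1] - data[0]}
    # 3. Apply Strategies: each one runs independently on the same data.
    votes = [strategy(data) for strategy in strategies]
    # 4. Arbitrate: sign of the summed votes (assumed rule).
    total = sum(votes)
    decision = BUY if total > 0 else SELL if total < 0 else HOLD
    results: Dict[str, object] = {"decision": decision, "votes": votes, **parameters}
    return results, data


results, data = analyze(
    [10.0, 10.4, 10.1, 10.6, 10.9], 4, [momentum, mean_reversion, breakout]
)
print(results["votes"], results["decision"])  # [1, -1, 1] 1
```

As in the documented method, the sketch returns a summary dictionary together with the (here list-based) dataset that was analyzed.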

calc_parameters()

Calculates basic parameters from the time series. This method focuses on general parameters that can help define the best strategy when combining results.

Parameters calculated:

  • Up movement – Sum of all the positive changes between consecutive entries in the dataframe for the defined column.

  • Down movement – Sum of all the negative changes between consecutive entries in the dataframe for the defined column, reported as an absolute value.

  • Ratio up/down – Ratio between the Up movement and the Down movement (Up movement / Down movement).
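The three parameters follow directly from the differences between consecutive entries. The sketch below computes them for a single column given as a plain list; movement_parameters() is a hypothetical helper, and the handling of a zero Down movement is an assumption, since the behaviour for that case is not documented.

```python
from typing import Dict, Sequence


def movement_parameters(values: Sequence[float]) -> Dict[str, float]:
    """Up movement, Down movement and their ratio for one column."""
    changes = [b - a for a, b in zip(values, values[1:])]
    up = sum(c for c in changes if c > 0)
    # Negative changes are summed and stored as an absolute value.
    down = abs(sum(c for c in changes if c < 0))
    if down == 0:
        # Undocumented case (assumption): inf if the series only rises,
        # nan if it never moves.
        ratio = float("inf") if up > 0 else float("nan")
    else:
        ratio = up / down
    return {"up_movement": up, "down_movement": down, "ratio_up_down": ratio}


print(movement_parameters([10, 12, 11, 14, 13]))
# {'up_movement': 5, 'down_movement': 2, 'ratio_up_down': 2.5}
```

On a Pandas column, the same sums could be obtained with Series.diff() followed by clip(lower=0).sum() for the Up movement and clip(upper=0).sum() (taken as an absolute value) for the Down movement.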