Data Analysis¶
Data analysis and decision-making module.
- class src.analysis.Analysis(symbol: str, ohlc_data: pandas.core.frame.DataFrame, analysis_length: int, initial_value: float, stopgain: float, stoploss: float, operation_cost: float, tax_percentage: float, logger_name: str, display_analysis: bool = False, save_analysis: bool = False)¶
Bases: src.lib.analysis.methods.crash.Crash, src.lib.analysis.methods.macd.MACD, src.lib.analysis.methods.rsi_sma.RSI_SMA, src.lib.analysis.methods.rsi_ema.RSI_EMA, src.lib.analysis.methods.bollinger_band.BOLLINGER_BANDS, src.lib.analysis.methods.combined.CombinedStrategy, src.lib.analysis.preprocessing.PreProcessing
Data analysis class.
- symbol¶
A string with the acronym of the symbol / ticker to be used.
- Type
string
- ohlc_data¶
A Pandas dataframe with the OHLC data to be used in the analysis.
- Type
Pandas dataframe
- decision¶
An integer that holds the final outcome of the analysis. The value is enumerated as:
BUY = 1
SELL = -1
HOLD = 0
- Type
int
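The enumeration above can be mirrored in code with an IntEnum. This is only a sketch: the names Decision and describe are hypothetical and are not confirmed to exist in the module; only the three integer values come from the description above.

```python
from enum import IntEnum


class Decision(IntEnum):
    """Hypothetical enumeration mirroring the documented decision values."""

    BUY = 1
    SELL = -1
    HOLD = 0


def describe(decision: int) -> str:
    """Translate a raw integer decision into its documented name."""
    return Decision(decision).name


print(describe(-1))
```

Because IntEnum members compare equal to plain integers, code that stores the decision as an int keeps working unchanged.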
- analysis_length_pre¶
Number of samples to be used for the analysis. This number is applied in the initial steps of the analysis, when truncating the dataset. The truncated dataset is then used immediately for the methods and predictions. For the methods themselves, this number shouldn’t be too high, since older data doesn’t improve their performance. For the neural networks, however, a higher value can help, since it means a larger dataset for learning. The downside of using all the available data is that it may take considerable time to fit the neural network if analysis_length_pre is too high.
- Type
int
- analysis_length_post¶
Number of samples to be used for simulation and comparison. The analyses themselves use the analysis_length_pre parameter. This parameter applies to the final comparison.
- Type
int
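A minimal sketch of how the two lengths might interact, assuming that truncation keeps the most recent rows and that the comparison window is taken from the end of the already truncated data. Neither assumption is confirmed by the description above, and the helper truncate and the column name Close are hypothetical.

```python
import pandas as pd


def truncate(ohlc_data: pd.DataFrame, length: int) -> pd.DataFrame:
    """Keep only the most recent `length` rows (hypothetical helper)."""
    return ohlc_data.tail(length)


# Toy data standing in for real OHLC rows, oldest first.
ohlc = pd.DataFrame({"Close": [10.0, 10.5, 10.2, 10.8, 11.0, 10.9]})

analysis_length_pre = 4   # rows fed to the methods and predictions
analysis_length_post = 2  # rows kept for the final comparison

pre = truncate(ohlc, analysis_length_pre)
post = truncate(pre, analysis_length_post)

print(len(pre), len(post))
```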
- sequence_length¶
Number of samples to be used as sequence for input to the RNN / LSTM.
- Type
int
- prediction_length¶
Number of samples to be used as sequence for output in the RNN / LSTM.
- Type
int
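To illustrate how sequence_length and prediction_length relate, the sketch below pairs each input window with the samples that immediately follow it. This is a common way to prepare data for an RNN / LSTM, not necessarily what the module does; make_windows is a hypothetical helper.

```python
from typing import List, Sequence, Tuple


def make_windows(
    series: Sequence[float], sequence_length: int, prediction_length: int
) -> List[Tuple[List[float], List[float]]]:
    """Pair each input window with the samples that immediately follow it."""
    pairs = []
    # Last start index that still leaves room for a full input and target.
    last_start = len(series) - sequence_length - prediction_length
    for start in range(last_start + 1):
        split = start + sequence_length
        inputs = list(series[start:split])
        targets = list(series[split:split + prediction_length])
        pairs.append((inputs, targets))
    return pairs


closes = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
pairs = make_windows(closes, sequence_length=3, prediction_length=2)
for inputs, targets in pairs:
    print(inputs, "->", targets)
```

A series shorter than sequence_length + prediction_length yields no pairs, which is one reason a larger analysis_length_pre benefits the neural networks.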
- logger_name¶
Name of the logger.
- Type
string
- display_analysis¶
Boolean indicating whether a chart with the results should be displayed after the analysis. The chart is displayed when true.
- Type
bool
- save_analysis¶
Boolean indicating whether the chart with the results should be saved after it is displayed. The chart is saved when true.
- Type
bool
- analyze()¶
Performs the complete analysis of the data for a given symbol / ticker.
The basic operation of this method is:
Pre-Process: Prepare the data for analysis.
Parameters calculation: Calculate basic parameters from the signals that are needed by the follow-up methods and strategies.
Apply Strategies: Run the defined strategies. Each one is run independently of the others. The strategies themselves include prediction techniques.
Arbitrate: Combine the results from all the different strategies into a final outcome.
- Parameters
None – This method uses attributes of the class Analysis; no parameter is explicitly passed to it.
- Returns
analysis_results (dictionary) – Summary of the results for the ticker.
ohlc_dataset (Pandas Dataframe) – Complete dataframe from the analysis of the ticker.
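The four stages can be pictured with the toy pipeline below. It is a sketch under stated assumptions, not the module's implementation: the strategies momentum and last_step are invented for illustration, the parameter calculation stage is omitted, and the arbitration rule (sign of the summed votes) is an assumption, since the actual rule is not described here.

```python
from typing import Callable, Dict, List

BUY, SELL, HOLD = 1, -1, 0

Strategy = Callable[[List[float]], int]


def preprocess(closes: List[float], length: int) -> List[float]:
    """Stage 1 (hypothetical): keep the most recent `length` samples."""
    return closes[-length:]


def momentum(closes: List[float]) -> int:
    """Toy strategy: compare the last close with the first one."""
    change = closes[-1] - closes[0]
    return BUY if change > 0 else SELL if change < 0 else HOLD


def last_step(closes: List[float]) -> int:
    """Toy strategy: follow the direction of the most recent change."""
    change = closes[-1] - closes[-2]
    return BUY if change > 0 else SELL if change < 0 else HOLD


def arbitrate(votes: Dict[str, int]) -> int:
    """Stage 4 (assumed rule): take the sign of the summed votes."""
    total = sum(votes.values())
    return BUY if total > 0 else SELL if total < 0 else HOLD


def analyze(
    closes: List[float], length: int, strategies: Dict[str, Strategy]
) -> Dict[str, int]:
    data = preprocess(closes, length)
    # Stage 2 (parameter calculation) is omitted in this sketch.
    # Stage 3: every strategy runs independently on the same data.
    votes = {name: strategy(data) for name, strategy in strategies.items()}
    return {**votes, "decision": arbitrate(votes)}


results = analyze(
    [10.0, 9.0, 9.5, 10.5, 10.2],
    4,
    {"momentum": momentum, "last_step": last_step},
)
print(results)
```

Here the two toy strategies disagree, so the assumed arbitration rule resolves the tie to HOLD.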
- calc_parameters()¶
Calculates basic parameters from the time series. This method focuses on general parameters that help choose the best strategy when combining results.
Parameters calculated:
Parameters¶
=============  ===========
Parameter      Description
=============  ===========
Up movement    Sum of all the positive changes in consecutive entries in the dataframe for the defined column.
Down movement  Sum of all the negative changes in consecutive entries in the dataframe for the defined column. The final value is absolute.
Ratio up/down  Ratio between Up movement and Down movement.
=============  ===========
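The three parameters can be reproduced with a few pandas operations. The sketch below is an illustration, not the method's actual code: the function name movement_parameters, the default column Close, and the handling of a zero Down movement (returning infinity) are all assumptions.

```python
import pandas as pd


def movement_parameters(ohlc_data: pd.DataFrame, column: str = "Close") -> dict:
    """Summarise positive and negative consecutive changes in one column."""
    # Difference between each entry and the previous one; the first is NaN.
    changes = ohlc_data[column].diff().dropna()
    up = changes[changes > 0].sum()
    down = abs(changes[changes < 0].sum())
    # Assumed convention: report infinity when there is no down movement.
    ratio = up / down if down != 0 else float("inf")
    return {"up": float(up), "down": float(down), "ratio": float(ratio)}


ohlc = pd.DataFrame({"Close": [10.0, 12.0, 11.0, 14.0, 13.0]})
params = movement_parameters(ohlc)
print(params)
```

For this toy series the consecutive changes are +2, -1, +3 and -1, so Up movement is 5, Down movement is 2 and the ratio is 2.5.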