python - Returning two values from pandas.rolling_apply
I am using pandas.rolling_apply to fit data to a distribution and get a value from it, but I also need to report a rolling goodness of fit (specifically, the p-value). Currently I'm doing this:
    import pandas as pd
    from scipy.stats import genextreme, kstest

    def func(sample):
        fit = genextreme.fit(sample)
        return genextreme.isf(0.9, *fit)

    def p_value(sample):
        fit = genextreme.fit(sample)
        return kstest(sample, 'genextreme', fit)[1]

    values = pd.rolling_apply(data, 30, func)
    p_values = pd.rolling_apply(data, 30, p_value)
    results = pd.DataFrame({'values': values, 'p_value': p_values})
The problem is that I have a lot of data, and the fit function is expensive, so I don't want to call it twice for every sample. I'd rather do something like this:
    def func(sample):
        fit = genextreme.fit(sample)
        value = genextreme.isf(0.9, *fit)
        p_value = kstest(sample, 'genextreme', fit)[1]
        return {'value': value, 'p_value': p_value}

    results = pd.rolling_apply(data, 30, func)
where results would be a DataFrame with 2 columns. If I try to run this, I get an exception: TypeError: a float is required. Is it possible to achieve this, and if so, how?
I had the same issue. I solved it by creating a global data frame and filling it from inside the rolling function. In the following example script, I generate some random input data. Then, with a single rolling apply call, I calculate the min, the max and the mean.
    import numpy as np
    import pandas as pd

    def myfunction(array):
        global index
        global outputdf

        # Some arbitrary operations; each result goes into its own column
        outputdf.loc[index, 'min'] = np.nanmin(array)
        outputdf.loc[index, 'max'] = np.nanmax(array)
        outputdf.loc[index, 'mean'] = np.nanmean(array)

        index += 1
        # Rolling apply expects a float back; this value is not used
        return 0.0

    if __name__ == "__main__":
        # Arbitrary window size
        windowsize = 10

        # Prepare some random input data
        inputdf = pd.DataFrame({'randomvalue': np.random.rand(500)})

        # Pre-allocate the output
        outputdf = pd.DataFrame({
            'min': [np.nan] * len(inputdf),
            'max': [np.nan] * len(inputdf),
            'mean': [np.nan] * len(inputdf),
        })

        # Starting index (offset due to the centered window)
        d = (windowsize - 1) / 2
        index = int(np.floor(d))

        # The rolling apply happens here; raw=True passes each window as an ndarray
        inputdf['randomvalue'].rolling(window=windowsize, center=True).apply(myfunction, raw=True)

        assert index + int(np.ceil(d)) == len(inputdf), 'length mismatch'

        outputdf.index = inputdf.index

        # Optional: drop the rows that were never filled
        outputdf.dropna(inplace=True)

        print(outputdf)
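If you would rather avoid the global state, another option is to skip rolling apply entirely and loop over the windows yourself, calling the expensive function once per window and collecting the returned dicts into a DataFrame. This is only a minimal sketch: `rolling_multi` and `window_stats` are hypothetical names I made up for illustration, and `window_stats` is a cheap stand-in for the real per-window function (in the question's case, the function that does the `genextreme` fit and returns both `value` and `p_value`). It assumes a trailing, non-centered window.

```python
import numpy as np
import pandas as pd


def window_stats(sample):
    """Hypothetical stand-in for an expensive function returning several values."""
    return {
        "min": np.nanmin(sample),
        "max": np.nanmax(sample),
        "mean": np.nanmean(sample),
    }


def rolling_multi(series, window, func):
    """Call func once per full trailing window and collect the dicts into a DataFrame."""
    values = series.to_numpy()
    rows = [func(values[end - window:end]) for end in range(window, len(values) + 1)]
    # Label each row with the last index of its window (non-centered rolling)
    out = pd.DataFrame(rows, index=series.index[window - 1:])
    # Reindex so the incomplete leading windows appear as NaN rows
    return out.reindex(series.index)


data = pd.Series(np.arange(10, dtype=float))
results = rolling_multi(data, 3, window_stats)
print(results)
```

The Python-level loop costs something per window, but when the function itself is expensive (like a distribution fit) that overhead should be small next to the work done inside it.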