Maths
Stats Fitting
Module implementing methods to find the best fit of statistical distributions to data.
- pyhdtoolkit.maths.stats_fitting.best_fit_distribution(data: pd.Series | np.ndarray, bins: int = 200, ax: Axes = None) -> tuple[st.rv_continuous, tuple[float, ...]]
Added in version 0.5.0.
Model data by finding the best fit candidate distribution among those in DISTRIBUTIONS. One can find an example use of this function in the gallery.
- Parameters:
data (Union[pd.Series, np.ndarray]) -- A pandas.Series or numpy.ndarray with your distribution data.
bins (int) -- The number of bins to decompose your data into before fitting.
ax (matplotlib.axes.Axes, optional) -- The matplotlib.axes.Axes on which to plot the probability density function of the different fitted distributions. This should be provided as the axis on which the distribution data is plotted, as it will add to that plot. If not provided, no plotting will be done.
- Returns:
tuple[st.rv_continuous, tuple[float, ...]] -- A tuple containing the scipy.stats generator corresponding to the best fit to the data among the provided candidates, and the parameters for said generator to best fit the data.
Example
best_fit_func, best_fit_params = best_fit_distribution(data, 200, axis)
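As a rough illustration of the selection described above, the sketch below fits each candidate with scipy.stats, compares the fitted PDF against a normalised histogram of the data, and keeps the candidate with the lowest sum of squared errors. The name sketch_best_fit, the CANDIDATES dict and the squared-error criterion are assumptions made for this example; they are not taken from the pyhdtoolkit source, which may rank candidates differently.

```python
import warnings

import numpy as np
import scipy.stats as st

# Hypothetical candidate set, using the {generator: name} format described above.
CANDIDATES = {st.norm: "Normal", st.expon: "Exponential", st.laplace: "Laplace"}


def sketch_best_fit(data, bins=200, candidates=CANDIDATES):
    """Return (generator, params) with the lowest SSE against the data histogram."""
    # Normalised histogram of the data, evaluated at the bin centres.
    density, edges = np.histogram(data, bins=bins, density=True)
    centres = (edges[:-1] + edges[1:]) / 2.0

    best, best_params, best_sse = None, None, np.inf
    for dist in candidates:
        with warnings.catch_warnings():
            warnings.simplefilter("ignore")  # some fits emit runtime warnings
            params = dist.fit(data)  # maximum-likelihood fit: (*shapes, loc, scale)
        pdf = dist.pdf(centres, *params)
        sse = float(np.sum((density - pdf) ** 2))
        if sse < best_sse:
            best, best_params, best_sse = dist, params, sse
    return best, best_params


# Synthetic data drawn from a normal distribution, for demonstration only.
rng = np.random.default_rng(seed=0)
sample = rng.normal(loc=2.0, scale=0.5, size=5000)
best_dist, best_params = sketch_best_fit(sample, bins=50)
print(CANDIDATES[best_dist], best_params)
```

The returned pair has the same shape as the documented return value: a scipy.stats generator and the tuple of parameters that fit() produced for it.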
- pyhdtoolkit.maths.stats_fitting.make_pdf(distribution: rv_continuous, params: tuple[float, ...], size: int = 25000) -> Series
Added in version 0.5.0.
Generates a pandas.Series for the distribution's probability density function (PDF). This Series will have axis values as index, and PDF values as values. One can find an example use of this function in the distribution fitting gallery.
- Parameters:
distribution (scipy.stats.rv_continuous) -- The scipy.stats generator for the distribution to generate the PDF from.
params (tuple[float, ...]) -- The parameters for this generator, as given back by the fit to data or as guessed by the user.
size (int) -- The number of points to evaluate the PDF on.
- Returns:
pandas.Series -- A pandas.Series with the PDF as values, and the corresponding axis values as index.
Example
best_fit_func, best_fit_params = best_fit_distribution(data, 200, axis)
pdf = fitting.make_pdf(best_fit_func, best_fit_params)
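To make the shape of the returned Series concrete, here is a minimal sketch of how such a PDF Series could be built from a scipy.stats generator and its fitted parameters. The name sketch_make_pdf and the choice to span the 1% to 99% quantiles are assumptions for illustration only; the actual function may pick its axis range differently.

```python
import numpy as np
import pandas as pd
import scipy.stats as st


def sketch_make_pdf(distribution, params, size=25000):
    """Evaluate the PDF of `distribution` on `size` points and return it as a Series."""
    # scipy's fit() returns parameters ordered as (*shapes, loc, scale).
    *shapes, loc, scale = params
    # Assumption: cover the central 98% of the distribution.
    start = distribution.ppf(0.01, *shapes, loc=loc, scale=scale)
    end = distribution.ppf(0.99, *shapes, loc=loc, scale=scale)
    x = np.linspace(start, end, size)
    y = distribution.pdf(x, *shapes, loc=loc, scale=scale)
    # Axis values as index, PDF values as the Series values.
    return pd.Series(y, index=x)


pdf = sketch_make_pdf(st.norm, (0.0, 1.0), size=1001)
print(pdf.head())
```

A Series of this form can be drawn on an existing axis with pdf.plot(ax=axis), which is convenient for overlaying the fit on a histogram of the data.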
- pyhdtoolkit.maths.stats_fitting.set_distributions_dict(dist_dict: dict[rv_continuous, str]) -> None
Added in version 0.5.0.
Sets DISTRIBUTIONS as the provided dict. This allows the user to define the distributions to try and fit against the data. One can find an example use of this function in the distribution fitting gallery.
Warning
This function modifies the global DISTRIBUTIONS dict that is used by other functions in this module. It’s not the cleanest way to do things that you’ll ever see.
- Parameters:
dist_dict (dict[st.rv_continuous, str]) -- Dictionary with the wanted distributions, in the same format as the DISTRIBUTIONS dict, i.e. with a scipy.stats generator object as key and a string representation of its name as value.
Example
import scipy.stats as st
tested_dists = {st.chi: "Chi", st.expon: "Exponential", st.laplace: "Laplace"}
set_distributions_dict(tested_dists)
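The warning above is easier to appreciate with a small stand-alone illustration of the pattern. The snippet below does not import pyhdtoolkit; it uses a stand-in module-level dict and hypothetical helper names (sketch_set_distributions, candidate_names) to show that every function reading the global sees whatever was set last.

```python
import scipy.stats as st

# Stand-in for a module-level DISTRIBUTIONS dict (contents are illustrative).
DISTRIBUTIONS = {st.norm: "Normal", st.expon: "Exponential"}


def sketch_set_distributions(dist_dict):
    """Replace the module-level candidates with the provided dict."""
    global DISTRIBUTIONS
    DISTRIBUTIONS = dict(dist_dict)  # copy, so later edits to the argument do not leak in


def candidate_names():
    """Any function that reads the global picks up the most recent setting."""
    return sorted(DISTRIBUTIONS.values())


before = candidate_names()
sketch_set_distributions({st.chi: "Chi", st.laplace: "Laplace"})
after = candidate_names()
print(before, after)
```

Because the change is global, code that relies on the default candidates should set them again explicitly after any such call.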
Utilities
Module with utility functions used throughout the nonconvex_phase_sync and stats_fitting modules.
- pyhdtoolkit.maths.utils.get_magnitude(value: float) -> int
Added in version 0.8.2.
Returns the determined magnitude of the provided value. This corresponds to the power of 10 that would be necessary to reduce value to a \(X \cdot 10^{n}\) form. In this case, n is the result.
- Parameters:
value (float) -- Value to determine the magnitude of.
- Returns:
int -- The magnitude of the provided value.
Examples
get_magnitude(10) # returns 1
get_magnitude(0.0311) # returns -2
get_magnitude(1e-7) # returns -7
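One common way to obtain such an exponent is to take the floor of the base-10 logarithm of the absolute value. The sketch below follows that approach; the name sketch_magnitude and the handling of zero are assumptions for this example, not a description of how get_magnitude is implemented.

```python
import math


def sketch_magnitude(value):
    """Return n such that value = X * 10**n with 1 <= |X| < 10."""
    if value == 0:
        return 0  # assumption: treat zero as magnitude 0, since log10(0) is undefined
    return math.floor(math.log10(abs(value)))


print(sketch_magnitude(10))      # 1
print(sketch_magnitude(0.0311))  # -2
```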
- pyhdtoolkit.maths.utils.get_scaled_values_and_magnitude_string(values_array: pd.DataFrame | np.ndarray, force_magnitude: float | None = None) -> tuple[pd.DataFrame | np.ndarray, str]
Added in version 0.8.2.
Conveniently scales the provided values to the best determined magnitude, and returns the scaled values and the magnitude string to use in plot labels.
- Parameters:
values_array (Union[pd.DataFrame, np.ndarray]) -- Vectorised structure containing the values to scale.
force_magnitude (float, optional) -- A specific magnitude value to use for the scaling, if desired.
- Returns:
tuple[pandas.DataFrame | numpy.ndarray, str] -- A tuple of the scaled values (same type as the provided ones) and the string to use for the scale in plot labels and legends.
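The idea can be sketched as follows for a numpy.ndarray input. The name sketch_scale, the rule of taking the magnitude from the largest absolute value, and the LaTeX-style label format are all assumptions made for this example; the string actually returned by the library may be formatted differently.

```python
import numpy as np


def sketch_scale(values, force_magnitude=None):
    """Return (values / 10**n, label), with n chosen from the largest absolute value."""
    if force_magnitude is None:
        largest = float(np.nanmax(np.abs(np.asarray(values, dtype=float))))
        # Assumption: an all-zero input keeps magnitude 0.
        magnitude = int(np.floor(np.log10(largest))) if largest > 0 else 0
    else:
        magnitude = force_magnitude
    label = rf"$10^{{{magnitude}}}$"  # hypothetical label format for axis text
    # Dividing the original object keeps its type (ndarray in, ndarray out).
    return values / 10.0**magnitude, label


scaled, label = sketch_scale(np.array([1e-3, 2e-3]))
forced, forced_label = sketch_scale(np.array([1500.0]), force_magnitude=3)
print(scaled, label)
print(forced, forced_label)
```

Such a label would typically be appended to an axis title, for example in an f-string like f"Amplitude [{label} m]", so that tick values stay short.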