Maths

Stats Fitting

Module implementing methods to find the best fit of statistical distributions to data.

pyhdtoolkit.maths.stats_fitting.best_fit_distribution(data: pd.Series | np.ndarray, bins: int = 200, ax: Axes = None) → tuple[st.rv_continuous, tuple[float, ...]][source]

Added in version 0.5.0.

Model data by finding the best fit candidate distribution among those in DISTRIBUTIONS. One can find an example use of this function in the gallery.

Parameters:

data (Union[pd.Series, np.ndarray]) -- A pandas.Series or numpy.ndarray with your distribution data.
bins (int) -- The number of bins into which the data is histogrammed before fitting.
ax (matplotlib.axes.Axes, optional) -- The matplotlib.axes.Axes on which to plot the probability density function of the different fitted distributions. This should be provided as the axis on which the distribution data is plotted, as it will add to that plot. If not provided, no plotting will be done.

Returns:

tuple[st.rv_continuous, tuple[float, ...]] -- A tuple containing the scipy.stats generator corresponding to the best fit to the data among the provided candidates, and the parameters for said generator to best fit the data.

Example

best_fit_func, best_fit_params = best_fit_distribution(data, 200, axis)
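
The selection among candidates can be illustrated with a minimal, self-contained sketch. It is not the library's implementation: the name sketch_best_fit, the CANDIDATES dict and the selection criterion (sum of squared errors between each fitted PDF and the normalised histogram) are assumptions made for illustration, and no plotting is done.

```python
import warnings

import numpy as np
import scipy.stats as st

# Hypothetical candidate set, in the same {generator: name} format as DISTRIBUTIONS
CANDIDATES = {st.norm: "Normal", st.expon: "Exponential", st.laplace: "Laplace"}


def sketch_best_fit(data, bins=200, candidates=CANDIDATES):
    """Return (generator, params) for the candidate with the lowest SSE (assumed criterion)."""
    # Normalised histogram of the data, evaluated at the bin centers
    density, edges = np.histogram(data, bins=bins, density=True)
    centers = (edges[:-1] + edges[1:]) / 2.0

    best_dist, best_params, best_sse = None, None, np.inf
    for dist in candidates:
        with warnings.catch_warnings():
            warnings.simplefilter("ignore")  # some fits emit runtime warnings
            params = dist.fit(data)  # (shape parameters..., loc, scale)
        pdf = dist.pdf(centers, *params)
        sse = float(np.sum((density - pdf) ** 2))
        if sse < best_sse:
            best_dist, best_params, best_sse = dist, params, sse
    return best_dist, best_params


rng = np.random.default_rng(seed=0)
sample = rng.normal(loc=2.0, scale=0.5, size=5000)
best, params = sketch_best_fit(sample, bins=50)
print(CANDIDATES[best], params)
```

The returned tuple has the same shape as the documented return value, so it can be unpacked in the same way as in the example above.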

pyhdtoolkit.maths.stats_fitting.make_pdf(distribution: rv_continuous, params: tuple[float, ...], size: int = 25000) → Series[source]

Added in version 0.5.0.

Generates a pandas.Series for the distribution's probability density function (PDF). This Series has the axis values as its index and the PDF values as its values. One can find an example use of this function in the distribution fitting gallery.

Parameters:

distribution (scipy.stats.rv_continuous) -- The scipy.stats generator for the distribution to generate the PDF from.
params (tuple[float, ...]) -- The parameters for this generator, as returned by the fit to data or as guessed by the user.
size (int) -- The number of points to evaluate the PDF on.

Returns:

pandas.Series -- A pandas.Series with the PDF as values, and the corresponding axis values as index.

Example

best_fit_func, best_fit_params = best_fit_distribution(data, 200, axis)
pdf = make_pdf(best_fit_func, best_fit_params)
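
As a rough illustration of what such a Series contains, the sketch below builds one directly with scipy.stats and pandas. The function name sketch_make_pdf and the choice of evaluating between the 1st and 99th percentiles are assumptions for this example, not a description of the library's internals.

```python
import numpy as np
import pandas as pd
import scipy.stats as st


def sketch_make_pdf(distribution, params, size=25000):
    """Build a Series with PDF values as data and axis values as index."""
    # scipy's fit() returns (shape parameters..., loc, scale)
    *shape_args, loc, scale = params
    # Assumption: evaluate the PDF between the 1st and 99th percentiles
    start = distribution.ppf(0.01, *shape_args, loc=loc, scale=scale)
    end = distribution.ppf(0.99, *shape_args, loc=loc, scale=scale)
    x = np.linspace(start, end, size)
    y = distribution.pdf(x, *shape_args, loc=loc, scale=scale)
    return pd.Series(y, index=x)


pdf = sketch_make_pdf(st.norm, (0.0, 1.0), size=1001)
print(pdf.idxmax(), pdf.max())
```

Because the axis values are the index, calling pdf.plot(ax=axis) on an existing matplotlib axis would overlay the curve on the data plotted there.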

pyhdtoolkit.maths.stats_fitting.set_distributions_dict(dist_dict: dict[rv_continuous, str]) → None[source]

Added in version 0.5.0.

Sets DISTRIBUTIONS as the provided dict. This allows the user to define the distributions to try and fit against the data. One can find an example use of this function in the distribution fitting gallery.

Warning

This function modifies the global DISTRIBUTIONS dict that is used by other functions in this module. It’s not the cleanest way to do things that you’ll ever see.

Parameters:

dist_dict (dict[st.rv_continuous, str]) -- Dictionary with the wanted distributions, in the same format as the DISTRIBUTIONS dict, i.e. with a scipy.stats generator object as key and a string representation of its name as value.

Example

import scipy.stats as st

tested_dists = {st.chi: "Chi", st.expon: "Exponential", st.laplace: "Laplace"}
set_distributions_dict(tested_dists)

Utilities

Module with utility functions used throughout the nonconvex_phase_sync and stats_fitting modules.

pyhdtoolkit.maths.utils.get_magnitude(value: float) → int[source]

Added in version 0.8.2.

Returns the order of magnitude of the provided value. This is the power of 10, n, obtained when value is written in the form \(X \cdot 10^{n}\) with \(1 \leq |X| < 10\).

Parameters:

value (float) -- Value to determine the magnitude of.

Returns:

int -- The magnitude of the provided value, as an int.

Examples

get_magnitude(10)
# returns 1

get_magnitude(0.0311)
# returns -2

get_magnitude(1e-7)
# returns -7
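
The magnitude described here can be computed as the floor of the base-10 logarithm of the absolute value. The sketch below shows that computation under the assumption that this is the intended definition; sketch_get_magnitude is a hypothetical name, and the input must be nonzero.

```python
import math


def sketch_get_magnitude(value: float) -> int:
    """Exponent n such that value = X * 10**n with 1 <= |X| < 10 (value must be nonzero)."""
    return int(math.floor(math.log10(abs(value))))


print(sketch_get_magnitude(10))
print(sketch_get_magnitude(0.0311))
print(sketch_get_magnitude(-330))
```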

pyhdtoolkit.maths.utils.get_scaled_values_and_magnitude_string(values_array: pd.DataFrame | np.ndarray, force_magnitude: float | None = None) → tuple[pd.DataFrame | np.ndarray, str][source]

Added in version 0.8.2.

Scales the provided values to the best determined magnitude, and returns the scaled values along with the magnitude string to use in plot labels.

Parameters:

values_array (Union[pd.DataFrame, np.ndarray]) -- Vectorised structure containing the values to scale.
force_magnitude (float, optional) -- A specific magnitude value to use for the scaling, if desired.

Returns:

tuple[pandas.DataFrame | numpy.ndarray, str] -- A tuple of the scaled values (same type as the provided ones) and the string to use for the scale in plot labels and legends.

Example

import numpy as np

q = np.array([-330, 230, 430, -720, 750, -110, 410, -340, -950, -630])
get_scaled_values_and_magnitude_string(q)
# returns (array([-3.3,  2.3,  4.3, -7.2,  7.5, -1.1,  4.1, -3.4, -9.5, -6.3]), '{-2}')
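
The scaling in this example can be reproduced by hand. The sketch below is an illustration only: sketch_scale is a hypothetical name, and determining the magnitude from the largest absolute value is an assumption made here (the source does not state which value is used).

```python
import math

import numpy as np


def sketch_scale(values, force_magnitude=None):
    """Scale values by 10**(-magnitude) and return them with a '{exponent}' string."""
    if force_magnitude is None:
        # Assumption: the magnitude is taken from the largest absolute value
        largest = float(np.max(np.abs(np.asarray(values))))
        magnitude = math.floor(math.log10(largest))
    else:
        magnitude = force_magnitude
    applied = -magnitude  # exponent actually applied to the data
    scaled = values * 10.0**applied
    return scaled, "{" + str(applied) + "}"


q = np.array([-330, 230, 430, -720, 750, -110, 410, -340, -950, -630])
scaled, label = sketch_scale(q)
print(scaled, label)
```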