Examples
========

Minimal working example with options
------------------------------------

This *minimal working example* shows how to compute a KDE in one line of code.

.. plot::
    :include-source:

    from KDEpy import FFTKDE

    data = np.random.randn(2**6)

    # Notice how bw (standard deviation), kernel, weights and grid points are set
    x, y = FFTKDE(bw=1, kernel='gaussian').fit(data, weights=None).evaluate(2**8)

    plt.plot(x, y); plt.tight_layout()

Three kernels in 1D
-------------------

This example shows the effect of different kernel functions :math:`K`.

.. plot::
    :include-source:

    from KDEpy import FFTKDE

    data = np.exp(np.random.randn(2**6))  # Lognormal data

    for plt_num, kernel in enumerate(['box', 'biweight', 'gaussian'], 1):

        ax = fig.add_subplot(1, 3, plt_num)
        ax.set_title(f'Kernel: "{kernel}"')

        x, y = FFTKDE(kernel=kernel, bw='silverman').fit(data).evaluate()
        ax.plot(x, y)

    fig.tight_layout()

Weighted data
-------------

A *weight* :math:`w_i` may be associated with every data point :math:`x_i`.

.. plot::
    :include-source:

    from KDEpy import FFTKDE

    data = np.random.randn(2**6)  # Normal distribution
    weights = data**2  # Large weights away from zero

    x, y = FFTKDE(bw='silverman').fit(data, weights).evaluate()
    plt.plot(x, y, label=r'Weighted away from zero $w_i = x_i^2$')

    y2 = FFTKDE(bw='silverman').fit(data).evaluate(x)
    plt.plot(x, y2, label=r'Unweighted $w_i = 1/N$')

    plt.title('Weighted and unweighted KDE')
    plt.tight_layout(); plt.legend(loc='best');

Resampling from the distribution
--------------------------------

Resampling data from the fitted KDE is equivalent to (1) resampling the original
data with replacement, then (2) adding noise drawn from the same probability
density as the kernel function used in the KDE.
Here is an example:

.. plot::
    :include-source:

    from KDEpy import FFTKDE
    from KDEpy.bw_selection import silvermans_rule, improved_sheather_jones

    # Get the standard deviation of the kernel functions
    data = np.array([3.1, 5.2, 6.9, 7.9, 8.5, 11.3, 11.5, 11.5, 11.5, 15.5])

    # Silverman assumes normality of the data - use ISJ instead when more data is available
    kernel_std = silvermans_rule(data.reshape(-1, 1))  # Shape (obs, dims)

    # (1) First resample the original data, then (2) add noise from the kernel
    size = 50
    resampled_data = np.random.choice(data, size=size, replace=True)
    resampled_data = resampled_data + np.random.randn(size) * kernel_std

    # Plot the results
    plt.scatter(data, np.zeros_like(data), marker='|', label="Original data")
    plt.scatter(resampled_data, np.ones_like(resampled_data) * 0.01,
                marker='|', label="Resampled from KDE")

    x, y = FFTKDE(kernel="gaussian", bw="silverman").fit(data).evaluate()
    plt.plot(x, y, label="FFTKDE with Silverman's rule")

    plt.title('Resampling from a KDE')
    plt.tight_layout(); plt.legend(loc='upper left');

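
The example above uses Silverman's rule for the kernel standard deviation.
The ``improved_sheather_jones`` function imported above (and discussed in the next
section) is a drop-in replacement that does not assume normality, provided enough
data is available.
The following is a sketch of that swap, not a separate library feature.

.. code-block:: python

    import numpy as np
    from KDEpy.bw_selection import improved_sheather_jones

    # ISJ needs more observations than Silverman's rule to be reliable
    data = np.random.randn(2**10)

    # Both selectors expect data of shape (obs, dims) and return a scalar bandwidth
    kernel_std = improved_sheather_jones(data.reshape(-1, 1))

    # Resample exactly as before: draw with replacement, then add kernel noise
    size = 50
    resampled_data = np.random.choice(data, size=size, replace=True)
    resampled_data = resampled_data + np.random.randn(size) * kernel_std
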

Multimodal distributions
------------------------

The *Improved Sheather Jones* (ISJ) algorithm for automatic bandwidth selection
is implemented in KDEpy.
It does not assume normality and is **robust to multimodal distributions**.
The disadvantages are that it requires more data to make accurate assessments,
and that it runs more slowly.

.. plot::
    :include-source:

    from scipy import stats
    from KDEpy import FFTKDE

    # Create a bimodal distribution from two Gaussians and draw data
    dist1 = stats.norm(loc=0, scale=1)
    dist2 = stats.norm(loc=20, scale=1)
    data = np.hstack([dist1.rvs(10**3), dist2.rvs(10**3)])

    # Plot the true distribution and a KDE using Silverman's rule
    x, y = FFTKDE(bw='silverman').fit(data)()
    plt.plot(x, (dist1.pdf(x) + dist2.pdf(x)) / 2, label='True distribution')
    plt.plot(x, y, label="FFTKDE with Silverman's rule")

    # KDE using ISJ - robust to multimodality, but needs more data
    y = FFTKDE(bw='ISJ').fit(data)(x)
    plt.plot(x, y, label="FFTKDE with Improved Sheather Jones (ISJ)")

    plt.title('Silverman vs. Improved Sheather Jones')
    plt.tight_layout(); plt.legend(loc='best');

Boundary correction using mirroring
-----------------------------------

If the domain is bounded (e.g. :math:`\mathbb{R}_+`) and you expect observations
to fall near the boundary, a KDE might put density outside of the domain.
Mirroring the data about the boundary is an elementary way to reduce this
unfortunate effect.
If :math:`\hat{g}(x)` is the original KDE, then :math:`\hat{g}_*(x)=\hat{g}(2a-x)`
is the KDE obtained when mirroring the data about :math:`x=a`.
Note that at the boundary :math:`a`, the derivative of the final estimate
:math:`\hat{f}(x) = \hat{g}(x) + \hat{g}_*(x)` is zero, since

.. math::

   \hat{f}'(a) = \hat{g}'(x) + \hat{g}_*'(x) \bigr |_a = \hat{g}'(x) - \hat{g}'(2a - x) \bigr |_a = \hat{g}'(a) - \hat{g}'(a) = 0,

where the change of sign is due to the chain rule of calculus.
The reduction of boundary bias and the fact that the derivative is zero at the
boundary are demonstrated graphically in the example below.

.. plot::
    :include-source:

    from scipy import stats
    from KDEpy import FFTKDE

    # Beta distribution, where x=1 is a hard lower limit
    dist = stats.beta(a=1.05, b=3, loc=1, scale=10)
    data = dist.rvs(10**2)

    kde = FFTKDE(bw='silverman', kernel='triweight')
    x, y = kde.fit(data)(2**10)  # Two-step procedure to get the bandwidth
    plt.plot(x, dist.pdf(x), label='True distribution')
    plt.plot(x, y, label='FFTKDE')
    plt.scatter(data, np.zeros_like(data), marker='|')

    # Mirror the data about the domain boundary
    low_bound = 1
    data = np.concatenate((data, 2 * low_bound - data))

    # Compute the KDE using the bandwidth found, and twice as many grid points
    x, y = FFTKDE(bw=kde.bw, kernel='triweight').fit(data)(2**11)
    y[x <= low_bound] = 0  # Set the KDE to zero outside of the domain
    y = y * 2  # Double the y-values to get an integral of ~1

    plt.plot(x, y, label='Mirrored FFTKDE')
    plt.title('Mirroring data to help overcome boundary bias')
    plt.tight_layout(); plt.legend();

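
The same recipe can be packaged as a small helper.
The ``mirrored_kde`` function below is a hypothetical convenience wrapper around the
steps above, not part of KDEpy; it assumes a single lower boundary.

.. code-block:: python

    import numpy as np
    from KDEpy import FFTKDE

    def mirrored_kde(data, low_bound, kernel='triweight', bw='silverman', grid_points=2**11):
        """KDE with boundary correction by mirroring about a lower bound (sketch)."""
        data = np.asarray(data)

        # Step 1: select the bandwidth on the original (unmirrored) data
        kde = FFTKDE(bw=bw, kernel=kernel).fit(data)
        kde.evaluate()  # After evaluation, kde.bw holds the numeric bandwidth

        # Step 2: mirror the data about the boundary and re-fit with the same bandwidth
        mirrored = np.concatenate((data, 2 * low_bound - data))
        x, y = FFTKDE(bw=kde.bw, kernel=kernel).fit(mirrored).evaluate(grid_points)

        y[x <= low_bound] = 0  # Set the KDE to zero outside of the domain
        return x, y * 2        # Double the y-values so the estimate integrates to ~1

For instance, ``x, y = mirrored_kde(data, low_bound=1)`` should replicate the
two-step procedure from the plot above.
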

Estimating density on the circle
--------------------------------

If the data is bounded on a circle and the domain is known, the data can be
*repeated* instead of *reflected*.
The result of this is shown graphically below.
The derivative of :math:`\hat{f}(x)` at the lower and upper boundary will have
the same value.

.. plot::
    :include-source:

    from scipy import stats
    from KDEpy import FFTKDE

    # The von Mises distribution - a normal distribution on a circle
    dist = stats.vonmises(kappa=0.5)
    data = dist.rvs(10**2)

    # Plot the plain KDE and the true density
    kde = FFTKDE(bw='silverman', kernel='triweight')
    x, y = kde.fit(data).evaluate()
    plt.plot(x, dist.pdf(x), label='True distribution')
    plt.plot(x, y, label='FFTKDE')
    plt.xlim([np.min(x), np.max(x)])

    # Repeat the data and fit a KDE to adjust for boundary effects
    a, b = (-np.pi, np.pi)
    data = np.concatenate((data - (b - a), data, data + (b - a)))
    x, y = FFTKDE(bw=kde.bw, kernel='triweight').fit(data).evaluate()
    y = y * 3  # Multiply by three since we tripled the data observations

    plt.plot(x, y, label='Repeated FFTKDE')
    plt.plot([a, a], list(plt.ylim()), '--k', label='Domain lower bound')
    plt.plot([b, b], list(plt.ylim()), '--k', label='Domain upper bound')
    plt.tight_layout(); plt.legend();

The effect of norms in 2D
-------------------------

Below, a non-smooth kernel is chosen to reveal the effect of the choice of norm
more clearly.

.. plot::
    :include-source:

    from KDEpy import FFTKDE

    # Create 2D data of shape (obs, dims)
    data = np.random.randn(2**4, 2)

    grid_points = 2**7  # Grid points in each dimension
    N = 16  # Number of contours

    for plt_num, norm in enumerate([1, 2, np.inf], 1):

        ax = fig.add_subplot(1, 3, plt_num)
        ax.set_title(f'Norm $p={norm}$')

        # Compute the kernel density estimate
        kde = FFTKDE(kernel='box', norm=norm)
        grid, points = kde.fit(data).evaluate(grid_points)

        # The grid is of shape (obs, dims), points are of shape (obs, 1)
        x, y = np.unique(grid[:, 0]), np.unique(grid[:, 1])
        z = points.reshape(grid_points, grid_points).T

        # Plot the kernel density estimate
        ax.contour(x, y, z, N, linewidths=0.8, colors='k')
        ax.contourf(x, y, z, N, cmap="RdBu_r")
        ax.plot(data[:, 0], data[:, 1], 'ok', ms=3)

    plt.tight_layout()

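
Since the returned grid is equidistant in each dimension, a quick sanity check is
that the estimate integrates to approximately one.
The snippet below is a plain NumPy sketch of that check using the same grid
conventions as above; it is not a KDEpy feature, and a fixed numeric bandwidth is
assumed for the 2D data.

.. code-block:: python

    import numpy as np
    from KDEpy import FFTKDE

    data = np.random.randn(2**6, 2)  # 2D data of shape (obs, dims)
    grid, points = FFTKDE(kernel='gaussian', bw=0.5).fit(data).evaluate(2**7)

    # The grid is equidistant in each dimension, so every cell has the same area
    x, y = np.unique(grid[:, 0]), np.unique(grid[:, 1])
    dx, dy = x[1] - x[0], y[1] - y[0]

    print(points.sum() * dx * dy)  # Riemann sum over the grid, should be close to 1.0
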

One dimensional kernel regression
---------------------------------

One dimensional kernel regression seeks to find :math:`\hat{y} = \mathbb{E}[y | x]`.
This can be computed elegantly by first modeling the full distribution :math:`p(x, y)`.
We have that

.. math::

   \hat{y} = \mathbb{E}[y | x] = \int p(y | x) y \, dy = \sum_i p(y_i | x) y_i = \sum_i \frac{p(y_i, x)}{\sum_j p(y_j, x)} y_i

Modelling the distribution :math:`p(y | x)` only to infer :math:`\mathbb{E}[y | x]`
is generally wasteful, but the speed of the :class:`~KDEpy.FFTKDE.FFTKDE`
implementation makes the approach tractable.
A million points should pose no problem.
Extensions to model the conditional variance :math:`\operatorname{var}[y | x]`
are possible too.

.. plot::
    :include-source:

    from KDEpy import FFTKDE

    func = lambda x: np.sin(x * 2 * np.pi) + (x + 1)**2

    # Generate random data
    num_data_points = 2**6
    data_x = np.sort(np.random.rand(num_data_points))
    data_y = func(data_x) + np.random.randn(num_data_points) / 5

    # Plot the true function and the sampled values
    x_smooth = np.linspace(0, 1, num=2**10)
    plt.plot(x_smooth, func(x_smooth), label='True function')
    plt.scatter(data_x, data_y, label='Function samples', s=10)

    # Grid points in the x and y direction
    grid_points_x, grid_points_y = 2**10, 2**4

    # Stack the data for 2D input, compute the KDE
    data = np.vstack((data_x, data_y)).T
    kde = FFTKDE(bw=0.025).fit(data)
    grid, points = kde.evaluate((grid_points_x, grid_points_y))

    # Retrieve grid values, reshape the output and plot the boundaries
    x, y = np.unique(grid[:, 0]), np.unique(grid[:, 1])
    z = points.reshape(grid_points_x, grid_points_y)
    plt.axvline(np.min(data_x), ls='--', c='k', label='Domain lower bound')
    plt.axvline(np.max(data_x), ls='--', c='k', label='Domain upper bound')

    # Compute y_pred = E[y | x] = sum_y p(y | x) * y
    y_pred = np.sum((z.T / np.sum(z, axis=1)).T * y, axis=1)

    plt.plot(x, y_pred, zorder=25, label='Kernel regression estimate')
    plt.legend(); plt.tight_layout()

Fast evaluation on a non-equidistant grid
-----------------------------------------

For plotting and for most computations, an equidistant grid is exactly what we want.
To evaluate the :class:`~KDEpy.FFTKDE.FFTKDE` on an arbitrary grid, we can make use
of SciPy and interpolate the equidistant result.

.. plot::
    :include-source:

    from KDEpy import FFTKDE
    from scipy.interpolate import interp1d

    data = [-0.7, -0.2, -0.2, -0.0, 0.0, 0.1, 0.8, 1.1, 1.2, 1.4]
    x, y = FFTKDE(bw="silverman").fit(data).evaluate()

    # Use scipy to interpolate and evaluate on an arbitrary grid
    x_grid = np.array([-2.5, -2, -1, 0, 0.5, 1, 1.5, 1.75, 2, 2.25, 2.5])
    f = interp1d(x, y, kind="linear", assume_sorted=True)
    y_grid = f(x_grid)

    # Plot the resulting KDEs
    plt.scatter(data, np.zeros_like(data), marker='|', label="Data")
    plt.plot(x, y, label="KDE on equidistant grid")
    plt.plot(x_grid, y_grid, '-o', label="KDE on arbitrary grid")
    plt.title('KDE on a non-equidistant grid by interpolation')
    plt.tight_layout(); plt.legend(loc='upper left');

.. comment:
    Kernel regression via KDE
    -------------------------

    Here's how a weighted KDE can be used for 1D kernel regression.
    Beware of boundary effects -- the estimate will fall to zero.

    .. plot::
        :include-source:

        from scipy.integrate import trapz
        from KDEpy import FFTKDE

        N = 2**6

        # Sample the function on equidistant points
        x = np.linspace(0, 25, num=N)
        y = np.sin(x/3) + np.random.randn(N) / 6

        # Compute the area (integral), used to normalize later on
        area = trapz(y, x)
        plt.scatter(x, y, marker='x', alpha=0.5, label='Noisy samples')
        plt.plot(x, np.sin(x/3), label='True function')

        # Weight the data by the y-values, normalize using the area
        x, y = FFTKDE(bw=0.5).fit(x, weights=y).evaluate()
        plt.plot(x, y * area, label='Kernel regression')

        plt.title('Kernel regression via KDE')
        plt.tight_layout(); plt.legend(loc='best');

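
The same interpolation idea extends to two dimensions.
The sketch below uses ``scipy.interpolate.RegularGridInterpolator`` and assumes the
grid layout described in the 2D examples above; the data, bandwidth and query
points are placeholders chosen for illustration.

.. code-block:: python

    import numpy as np
    from scipy.interpolate import RegularGridInterpolator
    from KDEpy import FFTKDE

    data = np.random.randn(2**6, 2)  # 2D data of shape (obs, dims)
    grid_points = 2**7
    grid, points = FFTKDE(bw=0.5).fit(data).evaluate(grid_points)

    # Reshape so that z[i, j] is the density at (x[i], y[j]), as in the examples above
    x, y = np.unique(grid[:, 0]), np.unique(grid[:, 1])
    z = points.reshape(grid_points, grid_points)

    # Linear interpolation on the equidistant grid, evaluated at two arbitrary points
    interpolator = RegularGridInterpolator((x, y), z)
    print(interpolator([[0.0, 0.0], [0.5, -0.25]]))
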