Kernels
=======

Available kernels
-----------------

You are likely reading this because you wonder which kernel functions
:math:`K` are available. Every available kernel is shown in the figure below.
Kernels with finite support (synonymous with *bounded support* in these docs)
are annotated with an **F**. A listing of every available option is found in
``NaiveKDE._available_kernels.keys()``.

.. plot::
   :include-source:

   from KDEpy import NaiveKDE

   for name, func in NaiveKDE._available_kernels.items():
       x, y = NaiveKDE(kernel=name).fit([0]).evaluate()
       plt.plot(x, y, label=name + (' (F)' if func.finite_support else ''))

   plt.grid(True, ls='--', zorder=-15); plt.legend();

Kernel properties
-----------------

The kernels implemented in KDEpy obey some common properties.

* Normalization: :math:`\int K(x) \, dx = 1`.
* Unit variance: :math:`\operatorname{Var}[K(x)] = 1` when the bandwidth
  :math:`h` is 1. The bandwidth of each kernel equals its standard deviation.
* Symmetry: :math:`K(-x) = K(x)` for every :math:`x`.

Kernels are radial basis functions
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Symmetry follows from a more general property, namely the fact that every
kernel implemented is a radial basis function. A *radial basis function* is a
function which evaluates to the same value whenever the distance from the
origin is the same. In other words, it is the composition of a norm
:math:`\left\| \cdot \right\| _p: \mathbb{R}^d \to \mathbb{R}_+` and a
function :math:`\kappa: \mathbb{R}_+ \to \mathbb{R}_+`.

.. math::

   K(x) = \kappa \left( \left\| x \right\| _p \right)

.. note::

   If you have high-dimensional data with different scales, consider
   standardizing the data before feeding it to a KDE.

Kernels may have finite support, or not
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

A given kernel may or may not have finite support.
A kernel with finite (or *bounded*) support is defined on a domain such as
:math:`[-1, 1]`, while a kernel without finite support is defined on
:math:`(-\infty, \infty)`. Below we plot the *Gaussian kernel* and the
*Epanechnikov kernel*.

* The Gaussian kernel does not have finite support.
* The Epanechnikov kernel has finite support.

The reason why kernels are normalized to unit variance is that bounded and
non-bounded kernel functions are then more easily compared. In other words,
setting ``bw=1`` in the software ensures that the standard deviation
:math:`\sigma = 1`.

.. plot::
   :include-source:

   from KDEpy import *

   x, y1 = NaiveKDE(kernel='gaussian', bw=1).fit([0]).evaluate()
   y2 = NaiveKDE(kernel='epa', bw=1).fit([0]).evaluate(x)

   plt.plot(x, y1, label='Gaussian kernel')
   plt.plot(x, y2, label='Epanechnikov kernel')
   plt.grid(True, ls='--', zorder=-15); plt.legend();

Higher dimensional kernels
--------------------------

The one-dimensional example is deceptively simple, since in one dimension
every :math:`p`-norm is equivalent. In higher dimensions, this is not true.
The general :math:`p`-norm is a measure of distance in :math:`\mathbb{R}^d`,
defined by

.. math::

   \left\| x \right\| _p := \bigg( \sum_{i=1}^{d} \left| x_i \right| ^p \bigg) ^{1/p}.

The three most common :math:`p`-norms are

* The Manhattan norm :math:`\left\| x \right\| _1 = \sum_{i} \left| x_i \right|`
* The Euclidean norm :math:`\left\| x \right\| _2 = \sqrt{x_1^2 + x_2^2 + \dots + x_d^2}`
* The max-norm :math:`\left\| x \right\| _\infty = \max_{i} \left| x_i \right|`

In higher dimensions, a norm must be chosen in addition to a kernel. Let
:math:`r := \left\| x \right\| _p` be a measure of distance (:math:`r` stands
for radius here). The normalization necessary to ensure
:math:`\int \kappa ( \left\| x \right\| _p ) \, dx = 1` depends on the value
of :math:`p`, but symmetry is guaranteed since
:math:`\left\| -x \right\| _p = \left\| x \right\| _p`.
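The three special cases above agree with the general definition, which a few
lines of NumPy make concrete. This is only an illustrative sketch (the vector
and the ``p_norm`` helper are made up for this example, not part of KDEpy):

```python
import numpy as np

x = np.array([3.0, -4.0, 12.0])

def p_norm(x, p):
    """General p-norm; p = inf is the limit max_i |x_i|."""
    if np.isinf(p):
        return np.abs(x).max()
    return (np.abs(x) ** p).sum() ** (1 / p)

print(p_norm(x, 1))       # Manhattan norm: 3 + 4 + 12 = 19.0
print(p_norm(x, 2))       # Euclidean norm: sqrt(9 + 16 + 144) = 13.0
print(p_norm(x, np.inf))  # max-norm: 12.0

# NumPy's built-in norm implements the same definition
assert np.isclose(p_norm(x, 2), np.linalg.norm(x, 2))
```

The same norms are available directly through ``np.linalg.norm(x, ord=p)``.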
The figure below shows the effect of choosing different norms with the same
kernel.

.. plot::
   :include-source:

   from KDEpy.BaseKDE import BaseKDE
   from mpl_toolkits.mplot3d import Axes3D

   kernel = BaseKDE._available_kernels['tri']

   n = 64
   p = np.linspace(-3, 3, num=n)
   obs_x_dims = np.array(np.meshgrid(p, p)).T.reshape(-1, 2)

   ax = fig.add_subplot(1, 2, 1, projection='3d')
   z = kernel(obs_x_dims, norm=np.inf).reshape((n, n))
   surf = ax.plot_surface(*np.meshgrid(p, p), z)
   ax.set_title('Using the $\max$-norm')

   ax = fig.add_subplot(1, 2, 2, projection='3d')
   z = kernel(obs_x_dims, norm=2).reshape((n, n))
   surf = ax.plot_surface(*np.meshgrid(p, p), z)
   ax.set_title('Using the $2$-norm')

Kernel normalization
^^^^^^^^^^^^^^^^^^^^

Kernels in any dimension are normalized so that the integral is unity for any
:math:`p`. To explain how a high-dimensional kernel is normalized, we first
examine high-dimensional volumes. Let :math:`r := \left\| x \right\| _p` be
the distance from the origin, as measured by some :math:`p`-norm. The
:math:`d`-dimensional volume :math:`V_d(r)` is proportional to :math:`r^d`.
We will now examine the unit :math:`d`-dimensional volume
:math:`V_d := V_d(1)`. We integrate over a :math:`d-1` dimensional surface
:math:`S_{d-1}(r)` to obtain :math:`V_{d}` using

.. math::

   V_d = \int_0^1 S_{d-1}(r) \, dr.

Since a :math:`d-1` dimensional surface is proportional to :math:`r^{d-1}`,
we write it as :math:`S_{d-1}(r) = K_{d-1} r^{d-1}`, where :math:`K_{d-1}` is
a constant. Pulling this out of the integral, we are left with

.. math::

   V_d = K_{d-1} \int_0^1 r^{d-1} \, dr = K_{d-1} / d.

From this we observe that :math:`K_{d-1} = V_d \cdot d`, a fact we'll use
later. What is the volume of a unit ball :math:`V_d` in the :math:`p`-norm in
:math:`d` dimensions? Fortunately an analytical expression exists; it is
given by

.. math::

   V_d(r=1; p) = 2^d \frac{\Gamma \left( 1 + \frac{1}{p} \right)^d}{\Gamma \left(1 + \frac{d}{p} \right)}.
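As a quick sanity check, the expression above can be evaluated with the
standard library's ``math.gamma`` and compared to well-known volumes in
:math:`d = 3`: the octahedron (:math:`p=1`), the sphere (:math:`p=2`) and the
cube (:math:`p=\infty`). The helper below is a sketch written for this
document only, not part of KDEpy:

```python
from math import gamma, pi, factorial, inf, isclose

def unit_ball_volume(d, p):
    """Volume of the unit ball in the p-norm in d dimensions."""
    if p == inf:  # hypercube: the limit as p -> infinity
        return 2.0 ** d
    return 2.0 ** d * gamma(1 + 1 / p) ** d / gamma(1 + d / p)

d = 3
print(unit_ball_volume(d, 1))    # octahedron: 2^3 / 3! = 4/3
print(unit_ball_volume(d, 2))    # sphere: 4 * pi / 3
print(unit_ball_volume(d, inf))  # cube: 2^3 = 8.0

assert isclose(unit_ball_volume(d, 1), 2 ** d / factorial(d))
assert isclose(unit_ball_volume(d, 2), pi ** (d / 2) / gamma(d / 2 + 1))
```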
For more information about this result, see for instance the paper by Wang in
:ref:`literature`. The equation above reduces to more well-known cases when
:math:`p` takes common values, as shown in the table below.

.. table:: High dimensional volumes
   :widths: auto

   ============== ============== ================================================================
   :math:`p`      Name           Unit volume :math:`V_d`
   ============== ============== ================================================================
   :math:`1`      Cross-polytope :math:`\frac{2^d}{d!}`
   :math:`2`      Hypersphere    :math:`\frac{\pi^{d/2}}{\Gamma\left ( \frac{d}{2} + 1 \right )}`
   :math:`\infty` Hypercube      :math:`2^d`
   ============== ============== ================================================================

Example - normalization
^^^^^^^^^^^^^^^^^^^^^^^

.. note: https://en.wikipedia.org/wiki/N-sphere#Recurrences

We would like to normalize kernel functions in higher dimensions for any
norm. To accomplish this, we start with the equation for the volume of a
:math:`d`-dimensional ball. The equation is

.. math::

   V_d = K_{d-1} \int_0^1 r^{d-1} \, dr = V_{d} \cdot d \int_0^1 r^{d-1} \, dr,

where we have used :math:`K_{d-1} = V_d \cdot d`. The integral of the kernel
:math:`\kappa: \mathbb{R}_+ \to \mathbb{R}_+` over the :math:`d`-dimensional
space is then given by

.. math::

   V_{d} \cdot d \int_0^1 \kappa(r) \, r^{d-1} \, dr,

which we can compute. For instance, the *linear kernel*
:math:`\kappa(r) = (1-r)` is normalized by

.. math::

   V_{d} \cdot d \int_0^1 \left ( 1 - r \right ) r^{d-1} \, dr = V_{d} \cdot d \left ( \frac{1}{d} - \frac{1}{d+1} \right )= V_d \left ( \frac{1}{d+1} \right ).

The *biweight kernel* :math:`\kappa(r) = \left ( 1 - r^2 \right )^2` is
similarly normalized by

.. math::

   V_{d} \cdot d \int_0^1 \left ( 1 - r^2 \right )^2 r^{d-1} \, dr = V_d \left ( 1 - \frac{2d}{d+2} + \frac{d}{d+4} \right ) = V_d \left ( \frac{8}{(d+2)(d+4)} \right ).
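The two constants derived above can be cross-checked numerically. The sketch
below uses plain NumPy and the trapezoidal rule; the ``radial_integral``
helper is invented for this check and does not exist in KDEpy:

```python
import numpy as np

def radial_integral(kappa, d, n=200_001):
    """Evaluate d * integral_0^1 kappa(r) r^(d-1) dr by the trapezoidal rule.

    This equals the normalization constant divided by the unit volume V_d.
    """
    r = np.linspace(0.0, 1.0, n)
    y = kappa(r) * r ** (d - 1)
    return d * np.sum((y[:-1] + y[1:]) / 2) * (r[1] - r[0])

for d in range(1, 6):
    # Linear kernel: the integral should equal 1 / (d + 1)
    assert np.isclose(radial_integral(lambda r: 1 - r, d), 1 / (d + 1))
    # Biweight kernel: the integral should equal 8 / ((d + 2)(d + 4))
    assert np.isclose(radial_integral(lambda r: (1 - r ** 2) ** 2, d),
                      8 / ((d + 2) * (d + 4)))
```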
Some 2D kernels
^^^^^^^^^^^^^^^

Let's see what the kernels look like in 2D when :math:`p=2`.

.. plot::
   :include-source:

   from KDEpy.BaseKDE import BaseKDE
   from mpl_toolkits.mplot3d import Axes3D

   n = 64
   p = np.linspace(-3, 3, num=n)
   obs_x_dims = np.array(np.meshgrid(p, p)).T.reshape(-1, 2)

   # fig = plt.figure() is already set, adjust the size
   fig.set_figwidth(7); fig.set_figheight(5);

   selected_kernels = ['box', 'tri', 'exponential', 'gaussian']
   for i, kernel_name in enumerate(selected_kernels, 1):
       kernel = BaseKDE._available_kernels[kernel_name]
       ax = fig.add_subplot(2, 2, i, projection='3d')
       z = kernel(obs_x_dims, norm=2).reshape((n, n))
       surf = ax.plot_surface(*np.meshgrid(p, p), z)
       ax.set_title(f"'{kernel_name}', $2$-norm")
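To tie these 2D figures back to the normalization derived earlier, the sketch
below builds the linear kernel with the :math:`2`-norm in :math:`d = 2` using
only NumPy, dividing by :math:`V_2 / (d+1) = \pi/3`, and checks numerically
that the result integrates to one. None of this uses KDEpy's internals; it
only restates the formulas above:

```python
import numpy as np

# Linear kernel kappa(r) = 1 - r with the 2-norm in d = 2 dimensions.
# From the normalization derivation, its integral is V_2 / (d + 1) = pi / 3,
# so dividing by pi / 3 yields a proper density.
n = 801
p = np.linspace(-1.5, 1.5, n)
xx, yy = np.meshgrid(p, p)
r = np.sqrt(xx ** 2 + yy ** 2)
K = np.where(r < 1, (1 - r) / (np.pi / 3), 0.0)

# Riemann sum over the grid should be close to unity
cell_area = (p[1] - p[0]) ** 2
total = np.sum(K) * cell_area
print(total)  # close to 1.0
assert abs(total - 1.0) < 1e-3
```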