How to Perform Nonparametric Kernel Density Estimation

Statistical analyses often aim to generalize from a small sample to a larger population. Sample data is discrete and often sparse, so a histogram of it tends to look jagged. It is often desirable to “smooth” this data, turning the rigid histogram into a continuous curve. Nonparametric kernel density estimation does exactly this. The method is nonparametric in that it does not assume the data follow a particular distribution, such as the normal distribution. In this process, you center a kernel (a symmetric weighting function that integrates to one) on each data point and average the kernels to produce a smooth estimate of the underlying density.
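
For reference, the estimator is usually written in the following standard form. The symbols are notation introduced here rather than taken from the steps below: x_1, ..., x_n are the data points, K is the kernel, and h > 0 is the bandwidth.

```latex
\hat{f}_h(x) = \frac{1}{n h} \sum_{i=1}^{n} K\!\left( \frac{x - x_i}{h} \right)
```

Each data point contributes one copy of the kernel, centered at that point and stretched by h, and the estimate at x is the average of those contributions.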

Things You'll Need

  • Statistical software

Instructions

    • 1

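      Input the data into your statistical software. The basic kernel density functions in many statistical programs, such as density() in R, work on one variable at a time. Multivariate kernel density estimation exists but requires specialized functions, so if your data is multidimensional, the simplest approach is to estimate the density of each variable separately.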

    • 2

      Choose the kernel to be used. Most statistical programs offer a wide choice of kernels. For example, the statistical program R offers Gaussian, Epanechnikov, rectangular, triangular, biweight, cosine and optcosine kernels. In general, rectangular and triangular kernels tend to produce less smooth estimates, whereas the Gaussian and Epanechnikov kernels tend to produce smoother ones. The most common choice is the Gaussian kernel, which is also the default in R, but you can experiment with multiple kernels. The choice of kernel usually affects the result less than the choice of bandwidth.

    • 3

      Choose the bandwidth for the estimation. The bandwidth controls the width of the smoothing kernel (in R, it is the kernel's standard deviation) and strongly affects the shape of the final estimated distribution. Bandwidth choice is a complex, heavily debated topic in statistics, and there is no single agreed method for choosing an appropriate bandwidth. It is best to experiment with several bandwidths and observe the resulting distribution. In general, there is a tradeoff between variance and bias. A larger bandwidth decreases the variance while increasing the bias, which can smooth away real features of the data; a smaller bandwidth increases the variance while decreasing the bias, which can produce a curve that follows noise in the sample.

    • 4

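      Run the nonparametric kernel density estimation function. Most statistical software takes the data, kernel and bandwidth as arguments to this function, and typically falls back on a default if you omit the kernel or bandwidth. For example, in R the call takes the form “density(x, bw, kernel)”, such as density(data, bw = 0.5, kernel = "gaussian"), where data is a numeric vector.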

    • 5

      Plot the results. Plotting the output of the density function shows the estimated distribution as a curve. If performed correctly, it should look like a smoother version of a histogram of your original data. In R, for example, plot(density(data)) draws the curve.
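
The steps above can be sketched in a few lines of Python using only the standard library. This is an illustration under stated assumptions, not a reproduction of R's density() function: the names KERNELS, rule_of_thumb_bandwidth and kde are hypothetical, the rule-of-thumb bandwidth is a commonly used starting point that the steps above do not prescribe, and the bandwidth here is the scale h in K((x - x_i) / h), which equals the kernel's standard deviation only for the Gaussian kernel.

```python
import math
import random
import statistics

# Step 2: kernel functions K(u). Each one integrates to 1.
KERNELS = {
    "gaussian": lambda u: math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi),
    "epanechnikov": lambda u: 0.75 * (1.0 - u * u) if abs(u) <= 1.0 else 0.0,
    "rectangular": lambda u: 0.5 if abs(u) <= 1.0 else 0.0,
    "triangular": lambda u: 1.0 - abs(u) if abs(u) <= 1.0 else 0.0,
}


def rule_of_thumb_bandwidth(data):
    """Step 3 (assumed starting point): 0.9 * min(sd, IQR / 1.34) * n^(-1/5)."""
    n = len(data)
    sd = statistics.stdev(data)
    q1, _, q3 = statistics.quantiles(data, n=4)
    spread = min(sd, (q3 - q1) / 1.34)
    return 0.9 * spread * n ** (-0.2)


def kde(data, grid, bandwidth, kernel="gaussian"):
    """Step 4: evaluate the kernel density estimate at each grid point."""
    k = KERNELS[kernel]
    n = len(data)
    return [
        sum(k((x - xi) / bandwidth) for xi in data) / (n * bandwidth)
        for x in grid
    ]


# Step 1: example data (simulated here; replace with your own one-dimensional sample).
random.seed(0)
data = [random.gauss(0.0, 1.0) for _ in range(200)]

h = rule_of_thumb_bandwidth(data)
grid = [-4.0 + 8.0 * i / 200 for i in range(201)]
density = kde(data, grid, h, kernel="gaussian")

# Sanity check: a density estimate should have an area close to 1.
step = grid[1] - grid[0]
area = sum(density) * step
print(f"bandwidth = {h:.3f}, area under curve = {area:.3f}")
```

To explore the bias and variance tradeoff from step 3, call kde again with a multiple of h (for example 3 * h or h / 3) and compare the curves. The grid and density lists can be passed to any plotting tool for step 5.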
