How to Get a Multivariate Density Estimation

Suppose you have a set of multivariate data that plots as a chunky, blocky distribution, as in a histogram. Such a result is not ideal, because a blocky distribution is hard to describe mathematically. Multivariate kernel density estimation addresses this problem. The method places a smooth function, called a kernel, on each observation in your original data and averages the results. This yields a smoother distribution that likely corresponds more closely to the true distribution behind the sampled data.

Things You'll Need

  • Statistical software

Instructions

    • 1

      Put your data into your statistical software of choice. Clearly label each variate as you do so, because multivariate density estimation commonly treats each variate separately (for example, by assigning each one its own bandwidth). Enter the data as a matrix, with one row per observation and one column per variate. For example, in the statistical software R, you can save the data as a .csv file and then read it in with the command “data <- read.csv("data.csv")”.
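
      As a rough illustration of the matrix layout described in this step, here is a minimal sketch in Python rather than R, using only the standard library. The function name read_matrix, the column names and the sample values are all hypothetical; the in-memory text stands in for a real data.csv file.

```python
import csv
import io

def read_matrix(handle):
    """Read a CSV with a header row into (labels, rows of floats)."""
    reader = csv.reader(handle)
    labels = next(reader)  # the first row names each variate
    rows = [[float(value) for value in row] for row in reader if row]
    return labels, rows

# Hypothetical stand-in for open("data.csv", newline="")
sample = io.StringIO("height,weight\n1.70,65.0\n1.82,80.5\n1.65,58.2\n")
labels, rows = read_matrix(sample)

# One labeled column per variate, for steps that work variate by variate
columns = {name: [row[i] for row in rows] for i, name in enumerate(labels)}
```

      Each entry of rows is one observation, and columns gives the same values grouped by variate.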

    • 2

      Decide which kernel you will apply to the data. The Gaussian kernel serves most practical purposes, so choose it if you are in doubt. Most statistical software packages also offer a variety of other kernels for users with particular purposes. For example, R's density function offers seven kernels, including triangular, rectangular and cosine. You can also program your own kernel, provided you are familiar with programming in your software package of choice.
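
      To make the idea of a kernel concrete, the sketch below writes four of the kernels named in this step as plain Python functions. These are common textbook forms, assumed here for illustration; a given software package may scale its kernels differently. The helper integrate is a hypothetical check that each kernel has total area 1, which is what makes the final estimate a proper density.

```python
import math

def gaussian(u):
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def rectangular(u):
    return 0.5 if abs(u) <= 1.0 else 0.0

def triangular(u):
    return max(0.0, 1.0 - abs(u))

def cosine(u):
    # Textbook cosine kernel on [-1, 1]; packages may define it differently
    if abs(u) > 1.0:
        return 0.0
    return (math.pi / 4.0) * math.cos(math.pi * u / 2.0)

KERNELS = {
    "gaussian": gaussian,
    "rectangular": rectangular,
    "triangular": triangular,
    "cosine": cosine,
}

def integrate(kernel, lo=-8.0, hi=8.0, steps=16000):
    """Midpoint-rule approximation of the area under a kernel."""
    width = (hi - lo) / steps
    return sum(kernel(lo + (i + 0.5) * width) for i in range(steps)) * width

areas = {name: integrate(kernel) for name, kernel in KERNELS.items()}
```

      A kernel you program yourself should pass the same check: it should be non-negative and its area should come out close to 1.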

    • 3

      Decide on the bandwidth for the density estimation. The bandwidth controls how much smoothing is applied; for a Gaussian kernel, it is the standard deviation of the kernel. Rules of thumb exist, but there is no single agreed method of choosing a bandwidth for multivariate density estimation. Keep in mind that smaller bandwidths give less bias but more variance, while larger bandwidths give less variance but more bias. You may want to return to this step several times, experimenting with different bandwidths for your density estimation.
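
      One possible starting point for this experimentation is Scott's rule of thumb, which sets the bandwidth for each variate to its sample standard deviation times n ** (-1 / (d + 4)), where n is the number of observations and d the number of variates. The rule is not part of the original instructions and is only one of several in use; the function name scott_bandwidths and the sample data below are hypothetical.

```python
import statistics

def scott_bandwidths(rows):
    """Scott's rule of thumb: one bandwidth per variate (column)."""
    n, d = len(rows), len(rows[0])
    factor = n ** (-1.0 / (d + 4))
    return [statistics.stdev([row[j] for row in rows]) * factor
            for j in range(d)]

# Hypothetical data: five observations of two variates
rows = [[1.0, 10.0], [2.0, 14.0], [3.0, 9.0], [4.0, 15.0], [5.0, 12.0]]
h = scott_bandwidths(rows)

# Candidate bandwidths to try: half, equal to, and double the rule of thumb
candidates = [[scale * h_j for h_j in h] for scale in (0.5, 1.0, 2.0)]
```

      Trying the smaller and larger candidates shows the trade-off described in this step: the smaller set follows the data more closely, while the larger set smooths more heavily.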

    • 4

      Perform the multivariate density estimation, using the data, bandwidth and kernel you selected earlier. Most statistical software packages handle this task with a one-line call that asks only for those parameters. In R, the built-in call “density(x, bw, kernel)” handles a single variate; for two variates, the kde2d function in the MASS package takes the two data vectors and a bandwidth and uses a Gaussian kernel. The output is the estimated density.
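
      To show what such a call computes, here is a minimal sketch of a product-kernel estimator in plain Python. It assumes one bandwidth per variate and the same kernel applied to every variate, which is a common simplification and not necessarily what any particular package does. The function name product_kde, the sample data and the bandwidth values are hypothetical.

```python
import math

def gaussian(u):
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def product_kde(rows, bandwidths, kernel=gaussian):
    """Return a function that evaluates the density estimate at a point."""
    n = len(rows)

    def estimate(point):
        total = 0.0
        for row in rows:
            # Contribution of one observation: product over the variates
            weight = 1.0
            for x, xi, h in zip(point, row, bandwidths):
                weight *= kernel((x - xi) / h) / h
            total += weight
        return total / n  # average over all observations

    return estimate

# Hypothetical data: five observations of two variates
rows = [[1.0, 10.0], [2.0, 14.0], [3.0, 9.0], [4.0, 15.0], [5.0, 12.0]]
density_at = product_kde(rows, bandwidths=[1.2, 1.9])

near_data = density_at([3.0, 12.0])   # a point in the middle of the sample
far_away = density_at([20.0, 40.0])   # a point far from every observation
```

      Evaluating density_at over a grid of points gives the smooth surface that replaces the blocky plot; changing the bandwidths argument and re-running is the experimentation step from earlier.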
