Regression and KNN –
Picking parameters –
This cannot be solved directly for in closed form for finding the positions of the various centers and the widths of the RBF basis functions and can use another o criteria to choose them and also we optimize parameters for the squared-error, then we will end up with one basis center at each data point, and with tiny width that exactly fit the data. This is a problem where model not give good predictions for inputs other than training set dataset.
The following points are commonly used to determine these parameters without overfitting the training data. To pick the basis centers:
- Place the centers uniformly spaced in the region containing the data. This is quite simple, but can lead to empty regions with basic functions, and will have an impractical number of data points in higher-dimensional input spaces.
- Place one center at each data point. This is used more common because it limits need of the number of centers, also it can also be expensive if the number of data points is large.
- Cluster the data, and use one center for each cluster. We will cover clustering methods later in the course.
To pick the width parameter:
- Can try manually to get different values of the width and will pick the best by trial-and-error.
- Use the average squared distances (or median distances) to neighboring centers, scaled by a constant, to be the width. This lead to use different widths for different basic functions, and it allows the basic functions to be spaced non-uniformly.
Overfitting and Regularization
Directly minimizing squared-error can lead to an effect called overfitting, wherein we fit the training data extremely well (i.e., with low error), yet we obtain a model that produces low accuracy on future test data and the test inputs is differ from the training data Understanding Overfitting in different ways, all of which are variations on the same underlying pathology:
- This problem statement is insufficiently constrained: if we consider ten measurements and ten model parameters, usually to obtain a perfect fit to the data.
- Fitting noise: overfitting can happen when the model is more powerful that it fits the data and also, the random noise in the data.
- Discarding uncertainty: the posterior probability distribution of the unknown’s parameter is insufficiently peaked to pick a single estimate.
We have two solutions to overfitting problem, first one to have prior knowledge and second one to handle uncertainty. In the most of cases, there is some sort of prior knowledge that we should have. The common assumption is the underlying function is likely to be smooth, for example, having small derivatives. Reason to have a smoothness lead to reduces model complexity: it is simple to estimate smooth models with using small datasets. In the odd cases, if we have no prior assumptions about this nature of the fit then it is not possible to learn and generalize at all and to learn from small datasets smoothness assumptions is the way that will constrain the space of models To add smoothness, parameterize the model in a smooth but this limits the expressiveness of the model. When we have large dataset, data will be able to ‘overrule’ assumptions of the smoothness and not possible to get curved models with large width no matter what will be the data Instead, we have another term regularization and we can add it: It is an extra term to the learning objective function that prefers smooth models.
Many learning procedures when our prior knowledge is weak so amount to smoothing the training data. RBF fitting is an example of this. However, many of these fitting procedures require making a number of decision and the locations of the basic functions, and may be sensitive to these choices. Question arises: why wea re not cutting middleman, and smooth the data directly? This is the idea behind K-Nearest Neighbors regression. The idea is simple. Select a parameter K and This is the only parameter to the algorithm. Then, for a new input x, we find the K nearest neighbors to x in the training set, based on their Euclidean distance ||x−xi||2. Then, our new output y is simply an average of the training outputs.
Call the Trainer and Book your free demo Class
Call now!!! | SevenMentor Pvt Ltd.
© Copyright 2021 | Sevenmentor Pvt Ltd.