Getting Started with NNS: Clustering and Regression

Fred Viole

Clustering and Regression

Below are some examples demonstrating unsupervised learning with NNS clustering and nonlinear regression using the resulting clusters. As always, for a more thorough description and definition, please view the References.

NNS Partitioning: NNS.part

NNS.part is both a partitional and a hierarchical clustering method. NNS iteratively partitions the joint distribution into partial moment quadrants and assigns a quadrant identification at each partition.
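To make the idea concrete, the base-R sketch below labels clusters by repeated mean splits. It is an illustration, not the package's code: it assumes each pass splits every current cluster at its own means of \(x\) and \(y\), with digit 1 for points above both means, 4 for points at or below both, and 2 or 3 for the mixed cases. The helper names `label_quadrants` and `partition_sketch` are hypothetical, and NNS.part's tie handling and limit condition may differ.

```r
# Hypothetical sketch, not NNS internals: one pass splits every current
# cluster at its own mean of x and mean of y and appends a quadrant digit.
label_quadrants <- function(x, y, labels) {
  new_labels <- labels
  for (g in unique(labels)) {
    idx <- which(labels == g)
    mx <- mean(x[idx])
    my <- mean(y[idx])
    # Assumed coding: 1 = above both means, 4 = at/below both, 2 and 3 = mixed
    digit <- ifelse(x[idx] > mx & y[idx] > my, 1,
             ifelse(x[idx] <= mx & y[idx] > my, 2,
             ifelse(x[idx] > mx & y[idx] <= my, 3, 4)))
    new_labels[idx] <- paste0(g, digit)
  }
  new_labels
}

# Repeat the split `order` times, starting from a single cluster "q"
partition_sketch <- function(x, y, order) {
  labels <- rep("q", length(x))
  for (k in seq_len(order)) labels <- label_quadrants(x, y, labels)
  labels
}

x <- seq(-5, 5, 0.05); y <- x^3
q <- partition_sketch(x, y, order = 2)
table(q)  # number of observations in each second-level quadrant
```

With `order = 2`, every label carries two digits after the `q`, one per level of the hierarchy.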

NNS.part returns a data.table of observations along with their final quadrant identification. It also returns the regression points, which are the quadrant means used in NNS.reg.
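Because the regression points are quadrant means, a single first-level split can be illustrated with base R's `aggregate()`. This is a sketch under an assumed quadrant coding (1 = above both overall means, 4 = at or below both, 2 and 3 = mixed), not NNS.part's own output; `quadrant` and `reg_points` are illustrative names.

```r
x <- seq(-5, 5, 0.05); y <- x^3
mx <- mean(x); my <- mean(y)

# Assumed coding: 1 = above both means, 4 = at/below both, 2 and 3 = mixed
quadrant <- ifelse(x > mx & y > my, "q1",
            ifelse(x <= mx & y > my, "q2",
            ifelse(x > mx & y <= my, "q3", "q4")))

# One regression point per quadrant: the mean x and mean y of its members
reg_points <- aggregate(cbind(x, y) ~ quadrant,
                        data = data.frame(x, y, quadrant), FUN = mean)
reg_points
```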

library(NNS)
x=seq(-5,5,.05); y=x^3

NNS.part(x, y, Voronoi = TRUE)

## $order
## [1] 6
## 
## $dt
##          x         y quadrant prior.quadrant
##   1:  4.60  97.33600    q1113           q111
##   2:  4.15  71.47338     q113            q11
##   3:  3.80  54.87200  q114143         q11414
##   4:  3.55  44.73888  q114413         q11441
##   5:  3.30  35.93700  q114443         q11444
##  ---                                        
## 197: -2.90 -24.38900    q4241           q424
## 198: -2.85 -23.14913    q4241           q424
## 199: -3.15 -31.25587    q4244           q424
## 200: -3.10 -29.79100    q4244           q424
## 201: -3.05 -28.37262    q4244           q424
## 
## $regression.points
##     quadrant      x            y
##  1:     q111  4.600   97.3360000
##  2:      q11  4.150   71.4733750
##  3:    q1411  2.300   12.1670000
##  4:     q141  2.050    8.6151250
##  5:    q1414  1.800    5.8320000
##  6:    q1441  1.050    1.1576250
##  7:    q4114 -1.000   -1.0000000
##  8:     q412 -1.400   -2.7860000
##  9:    q4141 -1.800   -5.8320000
## 10:     q414 -2.050   -8.6151250
## 11:     q421 -2.650  -18.6891250
## 12:     q424 -3.000  -27.0900000
## 13:    q4411 -3.400  -39.3040000
## 14:     q444 -4.600  -97.3360000
## 15:    q1344  2.625   18.1125000
## 16:    q1444  0.375    0.0534375
## 17:    q4111 -0.325   -0.0349375
## 18:      q44 -4.125  -70.1971875
## 19:     q131  3.025   27.7468125
## 20:     q134  2.800   21.9660000
## 21:     q143  1.425    2.9248125
## 22:     q144  0.750    0.4256250
## 23:     q411 -0.700   -0.3465000
## 24:   q11111  4.925  119.5051250
## 25:   q11114  4.725  105.5328750
## 26:   q11141  4.475   89.6566250
## 27:   q11144  4.275   78.1683750
## 28:   q11411  4.025   65.2452500
## 29:   q14111  2.425   14.2832500
## 30:   q14114  2.175   10.3095000
## 31:   q14141  1.925    7.1513750
## 32:   q14144  1.675    4.7151250
## 33:   q14411  1.175    1.6332500
## 34:   q14414  0.925    0.8001250
## 35:   q41141 -0.875   -0.6781250
## 36:   q41144 -1.125   -1.4343750
## 37:   q41411 -1.675   -4.7151250
## 38:   q41414 -1.925   -7.1513750
## 39:   q41441 -2.175  -10.3095000
## 40:   q41444 -2.375  -13.4187500
## 41:   q44111 -3.275  -35.1571250
## 42:   q44114 -3.525  -43.8333750
## 43:   q44144 -3.975  -62.8447500
## 44:   q44411 -4.275  -78.1683750
## 45:   q44414 -4.475  -89.6566250
## 46:   q44441 -4.725 -105.5328750
## 47:   q44444 -4.925 -119.5051250
## 48:   q11414  3.800   54.9290000
## 49:   q11441  3.550   44.7921250
## 50:   q11444  3.300   35.9865000
## 51:   q14441  0.550    0.1746250
## 52:   q41114 -0.500   -0.1325000
## 53:   q44141 -3.750  -52.7906250
## 54:   q14444  0.175    0.0091875
## 55:   q41111 -0.125   -0.0046875
##     quadrant      x            y

X-only Partitioning

NNS.part also offers partitioning based on \(x\) values only. This uses the entire bandwidth in its regression point derivation and shares the same limit condition as partitioning on both \(x\) and \(y\) values.
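As a rough illustration of splitting on \(x\) alone, the sketch below repeatedly splits each bin at its own mean of \(x\) and then takes each bin's means of \(x\) and \(y\) as regression points. The digit coding (1 = at or below the bin mean, 2 = above) is an assumption chosen to resemble the q11, q12, q21 and q22 labels NNS.part prints for this example; `xonly_sketch` is a hypothetical helper that ignores the limit condition and the bandwidth handling described above.

```r
# Hypothetical sketch: x-only partitioning by repeated splits at bin means of x
xonly_sketch <- function(x, y, depth) {
  labels <- rep("q", length(x))
  for (k in seq_len(depth)) {
    new_labels <- labels
    for (g in unique(labels)) {
      idx <- which(labels == g)
      # Assumed coding: 1 = at or below the bin's mean of x, 2 = above it
      new_labels[idx] <- paste0(g, ifelse(x[idx] <= mean(x[idx]), 1, 2))
    }
    labels <- new_labels
  }
  # Regression points: mean x and mean y within each final bin, sorted by x
  rp <- aggregate(cbind(x, y) ~ labels, data = data.frame(x, y, labels), FUN = mean)
  rp[order(rp$x), ]
}

x <- seq(-5, 5, 0.05); y <- x^3
xonly_sketch(x, y, depth = 2)
```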

NNS.part(x, y, Voronoi = TRUE, type = "XONLY", order = 3)

## $order
## [1] 3
## 
## $dt
##          x         y quadrant prior.quadrant
##   1: -2.45 -14.70612     q121            q12
##   2: -2.40 -13.82400     q121            q12
##   3: -2.35 -12.97787     q121            q12
##   4: -2.30 -12.16700     q121            q12
##   5: -2.25 -11.39062     q121            q12
##  ---                                        
## 197: -2.70 -19.68300     q112            q11
## 198: -2.65 -18.60962     q112            q11
## 199: -2.60 -17.57600     q112            q11
## 200: -2.55 -16.58137     q112            q11
## 201: -2.50 -15.62500     q112            q11
## 
## $regression.points
##    quadrant      x          y
## 1:      q12 -1.225  -3.751562
## 2:      q21  1.275   4.064063
## 3:      q22  3.775  59.692188
## 4:      q11 -3.750 -58.828125

Clusters Used in Regression

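# Plot the clusters (Voronoi = TRUE) and the resulting NNS.reg fit for orders 1 to 3
for (i in 1:3) { NNS.part(x, y, order = i, Voronoi = TRUE); NNS.reg(x, y, order = i) }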

NNS Regression: NNS.reg

NNS.reg can fit any \(f(x)\), in both the univariate and multivariate cases. It returns a list of self-explanatory values, shown below.

Univariate:

NNS.reg(x,y,order=4,noise.reduction = 'off')

## $R2
## [1] 0.9998899
## 
## $MSE
## [1] 6.291015e-05
## 
## $Prediction.Accuracy
## [1] 0.02985075
## 
## $equation
## NULL
## 
## $derivative
##     Coefficient X.Lower.Range X.Upper.Range
##  1:    67.09000        -5.000        -4.600
##  2:    58.87750        -4.600        -4.125
##  3:    43.66125        -4.125        -3.625
##  4:    34.04250        -3.625        -3.000
##  5:    24.00250        -3.000        -2.650
##  6:    15.96250        -2.650        -2.025
##  7:     9.48250        -2.025        -1.400
##  8:     2.92000        -1.400        -0.600
##  9:     0.78250        -0.600         0.650
## 10:     3.09250         0.650         1.425
## 11:     9.84250         1.425         2.050
## 12:    16.44250         2.050         2.700
## 13:    24.56250         2.700         3.025
## 14:    34.72250         3.025         3.650
## 15:    44.05000         3.650         4.150
## 16:    59.31250         4.150         4.600
## 17:    67.09000         4.600         5.000
## 
## $Point
## NULL
## 
## $Point.est
## numeric(0)
## 
## $regression.points
##          x           y
##  1: -5.000 -125.000000
##  2: -4.600  -98.164000
##  3: -4.125  -70.197187
##  4: -3.625  -48.366563
##  5: -3.000  -27.090000
##  6: -2.650  -18.689125
##  7: -2.025   -8.712562
##  8: -1.400   -2.786000
##  9: -0.600   -0.450000
## 10:  0.650    0.528125
## 11:  1.425    2.924813
## 12:  2.050    9.076375
## 13:  2.700   19.764000
## 14:  3.025   27.746813
## 15:  3.650   49.448375
## 16:  4.150   71.473375
## 17:  4.600   98.164000
## 18:  5.000  125.000000
## 
## $partition
##               y NNS.ID
##   1:  71.473375  q1134
##   2: -71.473375  q4424
##   3: -68.921000  q4421
##   4:  24.389000  q1314
##   5:  25.672375  q1314
##  ---                  
## 197:  -0.008000  q4111
## 198:  -0.003375  q4111
## 199:  -0.001000  q4111
## 200:  -0.000125  q4111
## 201:   0.000000  q4111
## 
## $Fitted
##          y.hat
##   1: -125.0000
##   2: -121.6455
##   3: -118.2910
##   4: -114.9365
##   5: -111.5820
##  ---          
## 197:  111.5820
## 198:  114.9365
## 199:  118.2910
## 200:  121.6455
## 201:  125.0000
## 
## $Fitted.xy
##          x         y     y.hat
##   1: -5.00 -125.0000 -125.0000
##   2: -4.95 -121.2874 -121.6455
##   3: -4.90 -117.6490 -118.2910
##   4: -4.85 -114.0841 -114.9365
##   5: -4.80 -110.5920 -111.5820
##  ---                          
## 197:  4.80  110.5920  111.5820
## 198:  4.85  114.0841  114.9365
## 199:  4.90  117.6490  118.2910
## 200:  4.95  121.2874  121.6455
## 201:  5.00  125.0000  125.0000
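The printed output is consistent with a piecewise-linear fit: each $derivative coefficient equals the slope between consecutive $regression.points, and the $Fitted values equal linear interpolation between them (for example, the fitted value at \(x=-4.95\) is \(-125 + 0.05 \times 67.09 = -121.6455\)). The base-R check below uses only the regression points printed above together with `diff()` and `approx()`; it is a reading of the output, not a description of NNS internals.

```r
# Regression points copied from the $regression.points output above
rp_x <- c(-5.000, -4.600, -4.125, -3.625, -3.000, -2.650, -2.025, -1.400, -0.600,
           0.650,  1.425,  2.050,  2.700,  3.025,  3.650,  4.150,  4.600,  5.000)
rp_y <- c(-125.000000, -98.164000, -70.197187, -48.366563, -27.090000, -18.689125,
          -8.712562, -2.786000, -0.450000, 0.528125, 2.924813, 9.076375,
          19.764000, 27.746813, 49.448375, 71.473375, 98.164000, 125.000000)

# Slope of each segment between consecutive regression points
slopes <- diff(rp_y) / diff(rp_x)
round(slopes, 4)  # compare with the Coefficient column of $derivative

# Linear interpolation between regression points at the original x values
x <- seq(-5, 5, 0.05)
y_hat <- approx(rp_x, rp_y, xout = x)$y
head(round(y_hat, 4))  # compare with the first rows of $Fitted
```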

Multivariate:

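# Bivariate test surface evaluated on a grid of (x, y) pairs
f = function(x, y) x^3 + 3*y - y^3 - 3*x
y = x; z = expand.grid(x, y)   # note: y is reassigned to the x grid here
g = f(z[, 1], z[, 2])
NNS.reg(z, g, order = 'max')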
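The surface in this example is separable, \(f(x,y) = h(x) - h(y)\) with \(h(t) = t^3 - 3t\), so it is zero along the diagonal \(x = y\). The base-R lines below rebuild the same grid without calling NNS and check that structure; `h` is a helper introduced only for this illustration.

```r
x <- seq(-5, 5, 0.05)
f <- function(x, y) x^3 + 3*y - y^3 - 3*x
h <- function(t) t^3 - 3*t          # illustrative helper: f(x, y) = h(x) - h(y)

z <- expand.grid(x, x)              # every (x, y) pair on the grid
g <- f(z[, 1], z[, 2])

nrow(z)                                # 201^2 = 40401 regressor rows
max(abs(g - (h(z[, 1]) - h(z[, 2]))))  # ~0, up to floating-point rounding
max(abs(g[z[, 1] == z[, 2]]))          # ~0 on the diagonal x = y
```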

Inter/Extrapolation

NNS.reg can interpolate or extrapolate to any point of interest. The point.est parameter, NNS.reg(x,y,point.est=...), accepts data with any number of rows and the same number of columns as \(x\); the estimates are retrieved with $Point.est.
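In the univariate case, a point estimate can be pictured as reading a new \(x\) off the piecewise-linear fit through the regression points. In the sketch below, `predict_segments` is a hypothetical helper and the points are illustrative values of \(x^3\), not NNS output. Inside the data range it interpolates; outside, it assumes the slope of the outermost segment simply continues, which may not match how NNS.reg extrapolates.

```r
# Hypothetical helper: evaluate a piecewise-linear fit through (rp_x, rp_y).
# Inside the range it interpolates; outside, it ASSUMES the end segments'
# slopes continue (NNS.reg's own extrapolation rule may differ).
predict_segments <- function(rp_x, rp_y, new_x) {
  o <- order(rp_x); rp_x <- rp_x[o]; rp_y <- rp_y[o]
  n <- length(rp_x)
  y <- approx(rp_x, rp_y, xout = new_x, rule = 2)$y  # rule = 2 clamps outside
  left_slope  <- (rp_y[2] - rp_y[1]) / (rp_x[2] - rp_x[1])
  right_slope <- (rp_y[n] - rp_y[n - 1]) / (rp_x[n] - rp_x[n - 1])
  lo <- new_x < rp_x[1]; hi <- new_x > rp_x[n]
  # Replace the clamped values outside the range with linear extensions
  y[lo] <- rp_y[1] + left_slope  * (new_x[lo] - rp_x[1])
  y[hi] <- rp_y[n] + right_slope * (new_x[hi] - rp_x[n])
  y
}

# Illustrative points on y = x^3 (not NNS output)
rp_x <- c(-5, -2.5, 0, 2.5, 5)
rp_y <- rp_x^3
predict_segments(rp_x, rp_y, new_x = c(-6, 1, 6))
```

Here the three estimates are -168.75, 6.25 and 168.75: the middle value is interpolated between 0 and 2.5, and the outer two extend the end slopes of 43.75.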

NNS.reg(iris[,1:4],iris[,5],point.est=iris[1:10,1:4])$Point.est

##  [1] 1 1 1 1 1 1 1 1 1 1

NNS Dimension Reduction Regression

NNS.reg also provides a dimension reduction regression via the parameter type="CLASS", as in NNS.reg(x,y,type="CLASS"). It reduces all regressors to a single dimension using the equation returned in $equation.

NNS.reg(iris[,1:4],iris[,5],type = "CLASS")$equation

## [1] "Synthetic Independent Variable X* = (0.9644*X1  0.6702*X2  1.0000*X3  1.0000*X4)/4"
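Reading the $equation output above, the synthetic variable \(X^*\) is a weighted average of the four iris measurements. The base-R sketch below computes it directly. It assumes the terms are summed, because the operators between them do not appear in the printed equation, and the weight vector `w` is simply copied from that output.

```r
# Weights copied from the printed $equation (X1..X4 = iris columns 1 to 4).
# ASSUMPTION: the terms are summed; the printed equation omits the operators.
w <- c(0.9644, 0.6702, 1.0000, 1.0000)
X <- as.matrix(iris[, 1:4])

# Synthetic variable: weighted sum of the regressors divided by their count
x_star <- as.vector(X %*% w) / ncol(X)
head(round(x_star, 4))
```

For the first flower this gives \(X^* = (0.9644 \cdot 5.1 + 0.6702 \cdot 3.5 + 1.4 + 0.2)/4 \approx 2.2160\).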

Threshold

NNS.reg(x,y,type="CLASS",threshold=...) reduces the regressors further by setting the minimum absolute correlation a regressor must have to be retained.

NNS.reg(iris[,1:4],iris[,5],type = "CLASS",threshold=.35)$equation

## [1] "Synthetic Independent Variable X* = (0.9644*X1  0.6702*X2  1.0000*X3  1.0000*X4)/4"
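A minimal sketch of the thresholding idea follows. It treats the printed coefficients as the correlations being compared and assumes a regressor is kept when its absolute coefficient is at least the threshold; both are assumptions, not confirmed NNS behaviour, and `kept` is a hypothetical helper.

```r
# Coefficients from the printed $equation. ASSUMPTION (illustration only):
# a regressor is kept when |coefficient| >= threshold.
w <- c(X1 = 0.9644, X2 = 0.6702, X3 = 1.0000, X4 = 1.0000)
kept <- function(w, threshold) names(w)[abs(w) >= threshold]

kept(w, 0.35)  # all four survive, consistent with the unchanged equation
kept(w, 0.70)  # a stricter, hypothetical threshold would drop X2
```

Since every coefficient exceeds 0.35, nothing is dropped at threshold=.35, which agrees with the equation printed above being identical to the unthresholded one.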

The point.est=(...) parameter operates in the same manner as in the full regression above, and the estimates are again retrieved with $Point.est.

NNS.reg(iris[,1:4],iris[,5],type = "CLASS",threshold=.35,point.est=iris[1:10,1:4])$Point.est

##  [1] 1 1 1 1 1 1 1 1 1 1

References

If the user is so motivated, detailed arguments and further examples are provided in the following:

* Nonlinear Nonparametric Statistics: Using Partial Moments

* Deriving Nonlinear Correlation Coefficients from Partial Moments

* New Nonparametric Curve-Fitting Using Partitioning, Regression and Partial Derivative Estimation

* Clustering and Curve Fitting by Line Segments

* Classification Using NNS Clustering Analysis