Getting Started with NNS: Clustering and Regression

Fred Viole

Clustering and Regression

Below are some examples demonstrating unsupervised learning with NNS clustering and nonlinear regression using the resulting clusters. As always, for a more thorough description and definition, please view the References.

NNS Partitioning NNS.part

NNS.part is both a partitional and hierarchical clustering method. NNS iteratively partitions the joint distribution into partial moment quadrants and assigns a quadrant identification at each partition.

NNS.part returns a data.table of observations along with their final quadrant identification. It also returns the regression points, which are the quadrant means used in NNS.reg.

library(NNS)

x = seq(-5, 5, .05); y = x^3

NNS.part(x, y, Voronoi = TRUE)

## $order
## [1] 6
## 
## $dt
##          x         y quadrant prior.quadrant
##   1:  4.95  121.2874  q111111         q11111
##   2:  5.00  125.0000  q111111         q11111
##   3:  4.85  114.0841  q111114         q11111
##   4:  4.90  117.6490  q111114         q11111
##   5:  4.75  107.1719  q111141         q11114
##  ---                                        
## 197: -4.75 -107.1719  q444414         q44441
## 198: -4.90 -117.6490  q444441         q44444
## 199: -4.85 -114.0841  q444441         q44444
## 200: -5.00 -125.0000  q444444         q44444
## 201: -4.95 -121.2874  q444444         q44444
## 
## $regression.points
##     quadrant      x            y
##  1:      q11  4.150   71.4733750
##  2:   q11111  4.925  119.5051250
##  3:   q11114  4.725  105.5328750
##  4:   q11141  4.525   92.6946250
##  5:   q11144  4.300   79.5715000
##  6:   q11411  4.025   65.2452500
##  7:   q11414  3.800   54.9290000
##  8:   q11441  3.550   44.7921250
##  9:   q11444  3.300   35.9865000
## 10:     q131  3.025   27.7468125
## 11:     q134  2.800   21.9660000
## 12:    q1344  2.625   18.1125000
## 13:     q141  2.050    8.6151250
## 14:    q1411  2.300   12.1670000
## 15:   q14111  2.425   14.2832500
## 16:   q14114  2.175   10.3095000
## 17:    q1414  1.800    5.8320000
## 18:   q14141  1.925    7.1513750
## 19:   q14144  1.675    4.7151250
## 20:     q143  1.425    2.9248125
## 21:    q1441  1.050    1.1576250
## 22:   q14411  1.175    1.6332500
## 23:   q14414  0.925    0.8001250
## 24:    q1443  0.725    0.3878750
## 25:    q1444  0.350    0.0428750
## 26:   q14441  0.500    0.1325000
## 27:   q14444  0.175    0.0091875
## 28:     q411 -0.700   -0.3465000
## 29:    q4111 -0.325   -0.0349375
## 30:   q41111 -0.125   -0.0046875
## 31:   q41114 -0.500   -0.1325000
## 32:    q4114 -1.000   -1.0000000
## 33:   q41141 -0.875   -0.6781250
## 34:   q41144 -1.125   -1.4343750
## 35:     q412 -1.400   -2.7860000
## 36:     q414 -2.050   -8.6151250
## 37:    q4141 -1.800   -5.8320000
## 38:   q41411 -1.675   -4.7151250
## 39:   q41414 -1.925   -7.1513750
## 40:   q41441 -2.175  -10.3095000
## 41:   q41444 -2.375  -13.4187500
## 42:     q421 -2.650  -18.6891250
## 43:     q424 -3.000  -27.0900000
## 44:      q44 -4.125  -70.1971875
## 45:    q4411 -3.400  -39.3040000
## 46:   q44111 -3.275  -35.1571250
## 47:   q44114 -3.525  -43.8333750
## 48:    q4414 -3.850  -57.0666250
## 49:   q44141 -3.725  -51.7216250
## 50:   q44144 -3.975  -62.8447500
## 51:    q4441 -4.400  -85.1840000
## 52:   q44411 -4.275  -78.1683750
## 53:   q44414 -4.525  -92.6946250
## 54:   q44441 -4.725 -105.5328750
## 55:   q44444 -4.925 -119.5051250
##     quadrant      x            y
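
To make the idea of a quadrant partition concrete, here is a minimal base R sketch of a single partition step that splits the data at the means of x and y and takes the quadrant means as regression points. The helper quadrant_split and its 1 to 4 labels are assumptions for illustration; they are not NNS internals and are not meant to reproduce the output above.

```r
# Hypothetical helper (not part of NNS): one partition step at the means.
# Label convention is an assumption made for this sketch:
#   1 = high x, high y; 2 = low x, high y; 3 = high x, low y; 4 = low x, low y
quadrant_split <- function(x, y) {
  hi_x <- x > mean(x)
  hi_y <- y > mean(y)
  ifelse(hi_x & hi_y, 1L,
         ifelse(!hi_x & hi_y, 2L,
                ifelse(hi_x & !hi_y, 3L, 4L)))
}

x <- seq(-5, 5, 0.05)
y <- x^3
quadrant <- quadrant_split(x, y)

# Quadrant means play the role of regression points in this sketch.
regression_points <- aggregate(cbind(x, y) ~ quadrant,
                               data = data.frame(x, y, quadrant),
                               FUN = mean)
print(regression_points)
```

Repeating the same split inside each quadrant would give the hierarchical structure that the longer quadrant identifiers describe.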

X-only Partitioning

NNS.part offers a partitioning based on \(x\) values only, using the entire bandwidth in its regression point derivation, and shares the same limit condition as partitioning via both \(x\) and \(y\) values.

NNS.part(x,y,Voronoi = TRUE,type="XONLY",order=3)

## $order
## [1] 3
## 
## $dt
##          x         y quadrant prior.quadrant
##   1: -5.00 -125.0000     q111            q11
##   2: -4.95 -121.2874     q111            q11
##   3: -4.90 -117.6490     q111            q11
##   4: -4.85 -114.0841     q111            q11
##   5: -4.80 -110.5920     q111            q11
##  ---                                        
## 197:  4.80  110.5920     q222            q22
## 198:  4.85  114.0841     q222            q22
## 199:  4.90  117.6490     q222            q22
## 200:  4.95  121.2874     q222            q22
## 201:  5.00  125.0000     q222            q22
## 
## $regression.points
##    quadrant      x          y
## 1:      q11 -3.750 -58.828125
## 2:      q12 -1.225  -3.751562
## 3:      q21  1.275   4.064063
## 4:      q22  3.775  59.692188
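
The X-only idea can be sketched in base R by splitting repeatedly on the mean of x within each segment and ignoring y when choosing the split. The helper xonly_split is hypothetical, and its tie handling and stopping rule are assumptions, so its segment means need not match the NNS output above exactly.

```r
# Hypothetical helper (not part of NNS): recursive split on the mean of x only.
xonly_split <- function(x, depth) {
  id <- rep("q", length(x))
  for (d in seq_len(depth)) {
    # Mean of x within each current segment
    seg_mean <- ave(x, id, FUN = mean)
    # Append 1 for points below the segment mean, 2 otherwise
    id <- paste0(id, ifelse(x < seg_mean, 1, 2))
  }
  id
}

x <- seq(-5, 5, 0.05)
y <- x^3
id <- xonly_split(x, depth = 2)

# Segment means of x and y, analogous to regression points
segment_means <- aggregate(cbind(x, y) ~ id,
                           data = data.frame(x, y, id),
                           FUN = mean)
print(segment_means)
```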

Clusters Used in Regression

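The loop below calls NNS.part and NNS.reg at orders 1 through 3, so the clusters at each order can be compared with the regression that uses them.

for(i in 1:3){NNS.part(x,y,order=i,Voronoi = TRUE);NNS.reg(x,y,order=i)}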

NNS Regression NNS.reg

NNS.reg can fit any \(f(x)\), for both uni- and multivariate cases. NNS.reg returns a list of values, shown below.

Univariate:

NNS.reg(x,y,order=4,noise.reduction = 'off')

## $R2
## [1] 0.9998899
## 
## $MSE
## [1] 6.291015e-05
## 
## $Prediction.Accuracy
## [1] 0.02985075
## 
## $equation
## NULL
## 
## $x.star
## NULL
## 
## $derivative
##     Coefficient X.Lower.Range X.Upper.Range
##  1:    67.09000        -5.000        -4.600
##  2:    58.87750        -4.600        -4.125
##  3:    43.66125        -4.125        -3.625
##  4:    34.04250        -3.625        -3.000
##  5:    24.00250        -3.000        -2.650
##  6:    15.96250        -2.650        -2.025
##  7:     9.48250        -2.025        -1.400
##  8:     2.92000        -1.400        -0.600
##  9:     0.78250        -0.600         0.650
## 10:     3.09250         0.650         1.425
## 11:     9.84250         1.425         2.050
## 12:    16.44250         2.050         2.700
## 13:    24.56250         2.700         3.025
## 14:    34.72250         3.025         3.650
## 15:    44.05000         3.650         4.150
## 16:    59.31250         4.150         4.600
## 17:    67.09000         4.600         5.000
## 
## $Point
## NULL
## 
## $Point.est
## numeric(0)
## 
## $regression.points
##          x           y
##  1: -5.000 -125.000000
##  2: -4.600  -98.164000
##  3: -4.125  -70.197187
##  4: -3.625  -48.366563
##  5: -3.000  -27.090000
##  6: -2.650  -18.689125
##  7: -2.025   -8.712562
##  8: -1.400   -2.786000
##  9: -0.600   -0.450000
## 10:  0.650    0.528125
## 11:  1.425    2.924813
## 12:  2.050    9.076375
## 13:  2.700   19.764000
## 14:  3.025   27.746813
## 15:  3.650   49.448375
## 16:  4.150   71.473375
## 17:  4.600   98.164000
## 18:  5.000  125.000000
## 
## $Fitted
##          y.hat
##   1: -125.0000
##   2: -121.6455
##   3: -118.2910
##   4: -114.9365
##   5: -111.5820
##  ---          
## 197:  111.5820
## 198:  114.9365
## 199:  118.2910
## 200:  121.6455
## 201:  125.0000
## 
## $Fitted.xy
##          x         y     y.hat NNS.ID
##   1: -5.00 -125.0000 -125.0000  q4444
##   2: -4.95 -121.2874 -121.6455  q4444
##   3: -4.90 -117.6490 -118.2910  q4444
##   4: -4.85 -114.0841 -114.9365  q4444
##   5: -4.80 -110.5920 -111.5820  q4444
##  ---                                 
## 197:  4.80  110.5920  111.5820  q1111
## 198:  4.85  114.0841  114.9365  q1111
## 199:  4.90  117.6490  118.2910  q1111
## 200:  4.95  121.2874  121.6455  q1111
## 201:  5.00  125.0000  125.0000  q1111
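
The $derivative and $Fitted values above are consistent with straight line segments joining consecutive regression points. A minimal base R sketch of that kind of fit follows. The equal-width bins are an assumption made for simplicity (NNS derives its partitions from partial moments), so the numbers will differ from the output above.

```r
x <- seq(-5, 5, 0.05)
y <- x^3

# Assumption: 8 equal-width bins stand in for the NNS partitions
bins <- cut(x, breaks = 8, include.lowest = TRUE)

# Regression points: both endpoints plus the mean of each bin
rp_x <- c(min(x), unname(tapply(x, bins, mean)), max(x))
rp_y <- c(y[which.min(x)], unname(tapply(y, bins, mean)), y[which.max(x)])

# Fitted values by linear interpolation between regression points
y_hat <- approx(rp_x, rp_y, xout = x)$y

# Slope of each segment, analogous to the Coefficient column of $derivative
slopes <- diff(rp_y) / diff(rp_x)

# Share of variance explained by the piecewise-linear fit
r2 <- 1 - sum((y - y_hat)^2) / sum((y - mean(y))^2)
print(r2)
```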

Multivariate:

f= function(x,y) x^3+3*y-y^3-3*x
y=x; z=expand.grid(x,y)
g=f(z[,1],z[,2])
NNS.reg(z,g,order='max')

Inter/Extrapolation

NNS.reg can inter- or extrapolate any point of interest. The point.est parameter, as in NNS.reg(x,y,point.est=...), accepts data of any size with the same dimensions as \(x\); the estimates are retrieved with $Point.est.
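
As a rough illustration of a point estimate, the sketch below interpolates between the last three regression points from the univariate output above and, outside their range, extends the nearest segment. The helper point_estimate is hypothetical, and extending the end segment is an assumption of this sketch rather than a statement of how NNS.reg extrapolates.

```r
# Last three regression points from the univariate $regression.points output
rp <- data.frame(x = c(4.150, 4.600, 5.000),
                 y = c(71.473375, 98.164, 125))

# Hypothetical helper (not part of NNS): interpolate inside the range of rp$x,
# and extend the first or last segment for points outside it.
point_estimate <- function(rp, x0) {
  n <- nrow(rp)
  # Segment index for each x0, clipped to the available segments 1..(n - 1)
  i <- pmin(pmax(findInterval(x0, rp$x), 1), n - 1)
  slope <- (rp$y[i + 1] - rp$y[i]) / (rp$x[i + 1] - rp$x[i])
  rp$y[i] + slope * (x0 - rp$x[i])
}

est <- point_estimate(rp, c(4.8, 5.5))
print(est)
```

The estimate at 4.8 agrees with the fitted value 111.5820 shown for x = 4.80 in $Fitted.xy.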

For a classification problem, we simply set NNS.reg(x,y,type="CLASS",...).

NNS.reg(iris[,1:4],iris[,5],point.est=iris[1:10,1:4],type="CLASS")$Point.est

##  [1] 1 1 1 1 1 1 1 1 1 1

NNS Dimension Reduction Regression

NNS.reg also provides a dimension reduction regression via the parameter NNS.reg(x,y,dim.red.method="cor",...), which reduces all regressors to a single dimension using the returned equation $equation.

NNS.reg(iris[,1:4],iris[,5],dim.red.method="cor")$equation

##        Variable Coefficient
## 1: Sepal.Length   0.7825612
## 2:  Sepal.Width  -0.4266576
## 3: Petal.Length   0.9490347
## 4:  Petal.Width   0.9565473
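
The coefficients above appear to be the correlation of each regressor with the class label coded as a number. The base R sketch below computes those correlations and forms a single composite regressor as a weighted sum. The weighted sum is an assumption for illustration; any scaling NNS applies to the reduced variable is not shown in this document.

```r
X <- iris[, 1:4]
cls <- as.numeric(iris[, 5])  # species coded 1, 2, 3

# Correlation of each regressor with the numeric class label
coefs <- sapply(X, function(col) cor(col, cls))
print(round(coefs, 7))

# Assumption: collapse the regressors into one dimension by a weighted sum
x_star <- as.vector(as.matrix(X) %*% coefs)
print(head(x_star))
```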

Threshold

NNS.reg(x,y,dim.red.method="cor",threshold=...) offers a method of reducing regressors further by setting the minimum absolute correlation a regressor must have to be retained.

NNS.reg(iris[,1:4],iris[,5],dim.red.method="cor",threshold=.75)$equation

##        Variable Coefficient
## 1: Sepal.Length   0.7825612
## 2:  Sepal.Width   0.0000000
## 3: Petal.Length   0.9490347
## 4:  Petal.Width   0.9565473
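
The effect of the threshold can be imitated in base R by zeroing every coefficient whose absolute correlation falls below it. Whether NNS treats a value exactly equal to the threshold as retained is not stated here, so the >= comparison is an assumption of this sketch.

```r
cls <- as.numeric(iris[, 5])  # species coded 1, 2, 3
coefs <- sapply(iris[, 1:4], function(col) cor(col, cls))

threshold <- 0.75

# Keep coefficients at or above the threshold in absolute value, zero the rest
reduced <- ifelse(abs(coefs) >= threshold, coefs, 0)
print(round(reduced, 7))
```

Sepal.Width, with an absolute correlation of about 0.43, is the only regressor dropped.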

The point.est=(...) argument operates in the same manner as in the full regression above, and is again called with $Point.est.

NNS.reg(iris[,1:4],iris[,5],dim.red.method="cor",threshold=.75,point.est=iris[1:10,1:4])$Point.est

##  [1] 1 1 1 1 1 1 1 1 1 1

References

If the user is so motivated, detailed arguments and further examples are provided within the following:

* Nonlinear Nonparametric Statistics: Using Partial Moments

* Deriving Nonlinear Correlation Coefficients from Partial Moments

* New Nonparametric Curve-Fitting Using Partitioning, Regression and Partial Derivative Estimation

* Clustering and Curve Fitting by Line Segments

* Classification Using NNS Clustering Analysis