find_nn finds the n nearest neighbors of a given vector. Neighbors can be ranked by Pearson correlation, cosine similarity, Euclidean distance, or Manhattan distance.

find_nn(vec, neighbors, n = 5, method = "cor")

Arguments

vec

The vector for which neighbors are to be found. This is either a vector saved as an object or the name of a row of the neighbors data frame or matrix. If a row name is given, that row is excluded from the candidate neighbors (since a row is always closest to itself).

neighbors

The potential neighbors, usually given as a data frame or matrix.

n

The number of neighbors wanted as output. Defaults to 5.
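
The selection step described by these arguments can be sketched in base R. This is only an illustration under stated assumptions, not the package's source: nearest_rows is a hypothetical helper, it assumes the candidates are the named rows of neighbors, and it uses Euclidean distance only. The output columns imitate the layout shown in the Examples section.

```r
# Hypothetical illustration only; not the implementation of find_nn.
nearest_rows <- function(vec, neighbors, n = 5) {
  neighbors <- as.matrix(neighbors)
  if (is.character(vec)) {
    # vec names a row: use that row as the target ...
    target <- neighbors[vec, ]
    # ... and drop it so it cannot be its own nearest neighbor
    neighbors <- neighbors[rownames(neighbors) != vec, , drop = FALSE]
  } else {
    target <- vec
  }
  # Euclidean distance from the target to every remaining row
  d <- apply(neighbors, 1, function(r) sqrt(sum((target - r)^2)))
  # keep the n smallest distances (or fewer if there are not enough rows)
  d <- sort(d)[seq_len(min(n, length(d)))]
  data.frame(euclid = unname(d), nearest_neighbor = names(d))
}

m <- rbind(var01 = c(0, 0), var02 = c(1, 0), var03 = c(0, 3), var04 = c(5, 5))
nearest_rows("var01", m, n = 2)
```

In this toy matrix, var01 is removed from the candidates before ranking, so the two rows returned are var02 and var03.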

method

The measure used to determine the nearest neighbors. Either "cor" (Pearson correlation, the default), "cosim" (cosine similarity), "euclid" (Euclidean distance), or "manhat" (Manhattan distance).
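
To make the four measures concrete, here is a minimal base R sketch of how each one could be computed between a vector and every row of a matrix. score_rows is a hypothetical helper introduced for illustration; it is an assumption about how such scores are typically computed, not the code behind find_nn. Note the direction: for "cor" and "cosim" a larger value means a closer neighbor, while for "euclid" and "manhat" a smaller value does, which matches the ordering of the outputs in the Examples section.

```r
# Hypothetical helper, not part of the package:
# scores each row of `neighbors` against `vec`.
score_rows <- function(vec, neighbors, method = "cor") {
  neighbors <- as.matrix(neighbors)
  switch(method,
    # Pearson correlation (similarity: higher is closer)
    cor    = apply(neighbors, 1, function(r) cor(vec, r)),
    # cosine similarity (similarity: higher is closer)
    cosim  = apply(neighbors, 1, function(r) {
      sum(vec * r) / sqrt(sum(vec^2) * sum(r^2))
    }),
    # Euclidean distance (distance: lower is closer)
    euclid = apply(neighbors, 1, function(r) sqrt(sum((vec - r)^2))),
    # Manhattan distance (distance: lower is closer)
    manhat = apply(neighbors, 1, function(r) sum(abs(vec - r))),
    stop("unknown method: ", method)
  )
}

v <- c(1, 2, 3)
m <- rbind(a = c(1, 2, 3), b = c(3, 2, 1), c = c(2, 4, 6))
score_rows(v, m, "cor")
score_rows(v, m, "euclid")
```

Row c is a scaled copy of v, so it ties with row a under "cor" and "cosim" but lies farther away under "euclid" and "manhat". This is one reason the different methods can rank the same candidates differently.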

Author

D. Schmitz

Examples


### a vector saved as object & a matrix of neighbors

vector <- runif(50, 0, 10)
data("gdsm_mat")

find_nn(vec = vector, neighbors = gdsm_mat, 3, "cor")
#>         cor nearest_neighbor
#> 1 0.2821214            var13
#> 2 0.2774480            var19
#> 3 0.2180090            var02

find_nn(vec = vector, neighbors = gdsm_mat, 3, "cosim")
#>       cosim nearest_neighbor
#> 1 0.8371108            var13
#> 2 0.8370161            var19
#> 3 0.8199995            var02

find_nn(vec = vector, neighbors = gdsm_mat, 3, "euclid")
#>     euclid nearest_neighbor
#> 1 23.54271            var19
#> 2 23.68078            var02
#> 3 23.91194            var13

find_nn(vec = vector, neighbors = gdsm_mat, 3, "manhat")
#>     manhat nearest_neighbor
#> 1 136.4595            var12
#> 2 139.0598            var13
#> 3 139.4693            var02


### a vector specified by its name & its matrix of neighbors

data("gdsm_mat")

find_nn(vec = "var12", neighbors = gdsm_mat, 3, "cor")
#>         cor nearest_neighbor
#> 1 0.1653992            var13
#> 2 0.1547184            var02
#> 3 0.1090967            var09

find_nn(vec = "var12", neighbors = gdsm_mat, 3, "cosim")
#>       cosim nearest_neighbor
#> 1 0.8564067            var13
#> 2 0.8549425            var14
#> 3 0.8508315            var02

find_nn(vec = "var12", neighbors = gdsm_mat, 3, "euclid")
#>     euclid nearest_neighbor
#> 1 21.77402            var02
#> 2 22.56095            var13
#> 3 23.22168            var09

find_nn(vec = "var12", neighbors = gdsm_mat, 3, "manhat")
#>     manhat nearest_neighbor
#> 1 125.0962            var13
#> 2 125.4922            var02
#> 3 132.2247            var14