# "KDTree"(Data Structure)

"KDTree"

represents k-d tree binary spatial subdivision for a set of real number coordinates.

# Details  • A k-d tree is useful for subdividing a set of n coordinates in dimension so that searches, such as nearest neighbor searches, can be done quickly. Each node has two data elements, indexed by 0 and 1. The data element can be either another k-d tree node or a list {i1,i2,} of point indexes with 0<ik<=n. Each data element has a unique machine integer ID.
•  CreateDataStructure["KDTree",coordinates] create a new "KDTree" from coordinates that can all be represented by real numbers. CreateDataStructure["KDTree",coordinates, n] create a new "KDTree" from coordinates with at most n entries in each subdivistion. Typed[x,"KDTree"] give x the type "KDTree".
• For a data structure ds of type "KDTree", the following operations can be used:
•  ds["DataBounds"] return the overall bounds for the coordinates time: O(1) ds["DataCoordinates"] return the coordinates time: O(1) ds["Graphics"] show the spatial subdivision for coordinates in 2D time: O(n) ds["ID"] return the ID for ds time: O(1) ds["ID", b] return the ID for the b (0 or 1) data element of ds time: O(1) ds["Indexes",b] return the indexes for the coordinates enclosed in the box represented by the b (0 or 1) leaf data element of ds time: O(1) ds["LeafQ", b] return True if the b (0 or 1) data element of ds is a leaf time: O(1) ds["Node",b] return the "KDTree" for the the b (0 or 1) node data element of ds time: O(1) ds["SplitDimension"] return the dimension for the spatial split of ds time: O(1) ds["SplitPosition"] return the position for the spatial split of ds time: O(1) ds["SubtreeIndexes",sd] return all the indexes for the coordinates in the subtree represented by ds time: O(n) ds["SubtreeLength"] find the number of coordinates bound by the subtree represented by ds time: O(n) ds["TreeSize"] return the number of data elements time: O(1) ds["Visualization"] return a visualization of ds time: O(n)

# Examples

open allclose all

## Basic Examples(1)

A new "KDTree" object can be created with CreateDataStructure:

Show graphics for the 2D data:

Test if either of the 0 and 1 data elements are leaves:

Get the points included in the 1-data element leaf and highlight them with red in the graphics:

Get the k-d tree represented by the 0-data element:

The graphics for this colors pink the part of the coordinates not included in the subtree:

Visualize the tree structure of the k-d tree:

Get the split dimension and position for the k-d tree:

Get the coordinate indexes for the 1-element data of the k-d tree.

Test that the coordinates are all above the split position in the split dimension:

Get all coordinate indexes for the 1-element subtree k-d tree and check that they are all below the split position:

## Scope(3)

### Information(1)

A new "KDTree" can be created with CreateDataStructure:

Information about the data structure ds:

### Leaf Size(1)

The subdivision continues until there are at most a given number of coordinates in each box that can be specified as an additional argument in CreateDataStructure.

Limit the number of coordinates to at most 5:

The default is chosen to give a good balance between construction and traversal cost and operation cost on the leaves:

### Arbitrary Precision(1)

If you give the coordinates with more than MachinePrecision digits, then the computations will be done in the appropriate precision:

## Applications(2)

### Nearest Neighbor Search(1)

Set up a program to use a "KDTree" data structure to find the element in a set of coordinates nearest to a given point:

At each node, process the side that the point is in or closest to first:

For a leaf data element, test all the points, and for a node data element, recurse:

Test the function on a set of one million coordinates in three dimensions:

Compare to the result returned by Nearest:

Compare the timing to just computing the Norm of all of the points:

The construction of the k-d tree takes more time than a single scan, but is worth the extra time if several searches are desired:

### Range Search(1)

Set up a program to use a "KDTree" object to find all elements in a set of coordinates within given ranges:

At each node, process each side that fits in the range for the node split dimension:

For a leaf data element, test all the points for inclusion, and for a node data element, recurse:

Here is a dataset of selected properties for all countries for which those properties are known:

Get corresponding numerical data and create a "KDTree" data structure from it:

Find the countries with a population under a million people and GDP greater than 10 billion \$:

Find the countries with population over 10 million people and area under 100000 km2: