VideoObjectTracking
VideoObjectTracking
detects objects of interest in video and tracks them over video frames.
tracks a precomputed set of objects, assuming they come from successive frames of a video.
Details and Options




- VideoObjectTracking, also known as visual object tracking, tracks unique objects across the frames of a video, handling occlusions when possible. Tracked objects are also known as tracklets.
- Tracking can either detect objects in frames automatically or be performed on a precomputed set of objects.
- The result, returned as an ObjectTrackingData object, includes times, labels and various other properties for each tracklet.
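As an illustrative sketch of the default workflow (not one of the original example cells; the video path is a placeholder), objects are detected with ImageBoundingBoxes and the result is returned as an ObjectTrackingData object:

```wolfram
(* Sketch: detect and track objects in a video with default settings.
   "ExampleVideo.mp4" is a placeholder path, not from the original examples. *)
video = Video["ExampleVideo.mp4"];
tracking = VideoObjectTracking[video]
(* tracking is an ObjectTrackingData object holding times, labels and
   other properties for each tracklet *)
```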
- Possible settings for objects and their corresponding outputs are:
	{{pos11,pos12,…},…}	tracking points as k→posij
	{{bbox11,bbox12,…},…}	tracking boxes as k→bboxij
	{label1→{bbox11,bbox12,…},…}	tracking boxes as {labeli,j}→bbox
	{lmat1,…}	relabeling segments in label matrices lmati
	{t1→obj1,…}	a list of times and objects
- By default, objects are detected using ImageBoundingBoxes. Possible settings for a detector include:
	f	a detector function that returns supported objects
	"concept"	named concept, as used in "Concept" entities
	"word"	English word, as used in WordData
	wordspec	word sense specification, as used in WordData
	Entity[…]	any appropriate entity
	category1|category2|…	any of the categoryi
- Using VideoObjectTracking[{image1,image2,…}] is similar to tracking objects across frames of a video.
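A minimal sketch of tracking a precomputed set of objects (the per-frame bounding boxes below are illustrative, not from the original examples):

```wolfram
(* Sketch: track a precomputed list of per-frame bounding boxes.
   One box per frame, drifting to the right; coordinates are illustrative. *)
boxes = Table[{Rectangle[{t, 0}, {t + 2, 2}]}, {t, 0, 5}];
VideoObjectTracking[boxes]
(* associates the boxes across frames into a tracklet *)
```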
- The following options can be given:
	Method	Automatic	tracking method to use
	TargetDevice	Automatic	the target device on which to perform detection
- The possible values for the Method option are:
	"OCSort"	observation-centric SORT (simple, online, real-time) tracking; predicts object trajectories using Kalman estimators
	"RunningBuffer"	offline method; associates objects by comparing a buffer of frames
- When tracking label matrices, occlusions are not handled. They can be tracked with Method->"RunningBuffer".
- With Method->{"OCSort",subopt}, the following suboptions can be specified:
	"IOUThreshold"	0.2	intersection-over-union threshold between bounding boxes
	"OcclusionThreshold"	8	number of frames for which the history of a tracklet is maintained before expiration
	"ORUHistory"	3	length of tracklet history to step back for tracklet re-update
	"OCMWeight"	0.2	observation-centric motion weight that accounts for the directionality of moving bounding boxes
- With Method->{"RunningBuffer",subopt}, the following suboptions can be specified:
	"MaxCentroidDistance"	Automatic	maximum distance between the centroids for adjacent frames
	"OcclusionThreshold"	8	number of frames for which the history of a tracklet is maintained before expiration
- Additional "RunningBuffer" suboptions that specify the contribution to the cost matrix are:
	"CentroidWeight"	0.5	centroid distance between components or bounding boxes
	"OverlapWeight"	1	overlap of components or bounding boxes
	"SizeWeight"	Automatic	size of components or bounding boxes
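Suboptions are passed together with the method name. A hedged sketch (it assumes video is a Video object; the values shown are illustrative, not recommendations):

```wolfram
(* Sketch: specify method suboptions alongside the method name. *)
VideoObjectTracking[video,
  Method -> {"OCSort", "IOUThreshold" -> 0.3, "OCMWeight" -> 0.4}]

VideoObjectTracking[video,
  Method -> {"RunningBuffer", "OcclusionThreshold" -> 12}]
```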
Examples
Basic Examples (2)
Summary of the most common use cases
Detect and track objects in a video:

https://wolfram.com/xid/0v59uv074crae-gfpxjl

https://wolfram.com/xid/0v59uv074crae-vs97nw

Detect and track faces in a video:

https://wolfram.com/xid/0v59uv074crae-jdw8dc

Extract the first frame from each sub-video:

https://wolfram.com/xid/0v59uv074crae-riszdq

Scope (7)
Survey of the scope of standard use cases
Data (5)
Detect and track objects in a video:

https://wolfram.com/xid/0v59uv074crae-lul4mu

Detect and track objects in a list of images:

https://wolfram.com/xid/0v59uv074crae-6ozmzb

https://wolfram.com/xid/0v59uv074crae-32a8pt

Track a list of bounding boxes:

https://wolfram.com/xid/0v59uv074crae-h18er8


https://wolfram.com/xid/0v59uv074crae-ghf6bn


https://wolfram.com/xid/0v59uv074crae-2p5wr1


https://wolfram.com/xid/0v59uv074crae-o1x1rf

Track components in a time series of label matrices:

https://wolfram.com/xid/0v59uv074crae-5rokd6

https://wolfram.com/xid/0v59uv074crae-76lyw6


https://wolfram.com/xid/0v59uv074crae-mh4myi


https://wolfram.com/xid/0v59uv074crae-u2dyhs

Detectors (2)
Automatically detect objects and track them:

https://wolfram.com/xid/0v59uv074crae-sctuj4

https://wolfram.com/xid/0v59uv074crae-u2hl83

Specify a detector function to find objects:

https://wolfram.com/xid/0v59uv074crae-nvd231

Specify the category of object to detect and track:

https://wolfram.com/xid/0v59uv074crae-off8mb

Detect and track faces in a video:

https://wolfram.com/xid/0v59uv074crae-oyt6pm

https://wolfram.com/xid/0v59uv074crae-tia2d

Options (5)
Common values & functionality for each option
Method (4)
"OCSort" (3)
In "OCSort", motion is predicted using Kalman estimators. Higher values of "OCMWeight" increase the cost when boxes move away from their predicted positions.
Set up a problem with two sets of moving boxes:

https://wolfram.com/xid/0v59uv074crae-bng4tg

https://wolfram.com/xid/0v59uv074crae-baobxr

By default, the direction of trajectories is very flexible. Notice that in the region of large intersection between the blue and red boxes, tracking may change direction suddenly:

https://wolfram.com/xid/0v59uv074crae-ncxhws


https://wolfram.com/xid/0v59uv074crae-r5is6b

Increasing "OCMWeight" decreases the chance of sudden direction changes:

https://wolfram.com/xid/0v59uv074crae-1f1svd


https://wolfram.com/xid/0v59uv074crae-jgy8go

The "IOUThreshold" suboption specifies a threshold on the intersection over union between boxes for them to be considered as potentially the same object.
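The effect can be sketched with synthetic boxes (coordinates and threshold values are illustrative, not the original example data):

```wolfram
(* Sketch: boxes jump by a large step between frames, so consecutive
   boxes overlap only a little. *)
boxes = Table[{Rectangle[{3 t, 0}, {3 t + 4, 4}]}, {t, 0, 5}];

(* a strict overlap requirement may split the trajectory in two *)
VideoObjectTracking[boxes, Method -> {"OCSort", "IOUThreshold" -> 0.5}]

(* a permissive overlap requirement keeps a single tracklet *)
VideoObjectTracking[boxes, Method -> {"OCSort", "IOUThreshold" -> 0.05}]
```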
Set up a problem with a set of moving bounding boxes with a gap:

https://wolfram.com/xid/0v59uv074crae-qx20cy

A higher threshold for intersection splits the object trajectory into two parts:

https://wolfram.com/xid/0v59uv074crae-nx8576

https://wolfram.com/xid/0v59uv074crae-dp4lmp

A lower "IOUThreshold" merges the trajectories:

https://wolfram.com/xid/0v59uv074crae-bsk5om

https://wolfram.com/xid/0v59uv074crae-ns6w50

The "OcclusionThreshold" suboption handles objects that disappear for some frames (due to missed detection or occlusion).
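A hedged sketch of the idea, using frames with no detections to simulate an occlusion (coordinates and the assumption that empty per-frame lists are accepted are illustrative):

```wolfram
(* Sketch: the object disappears for two frames, then re-emerges. *)
boxes = {{Rectangle[{0, 0}, {2, 2}]}, {Rectangle[{1, 0}, {3, 2}]},
   {}, {},  (* frames with no detection *)
   {Rectangle[{3, 0}, {5, 2}]}, {Rectangle[{4, 0}, {6, 2}]}};

(* keeping the tracklet history alive for a few frames lets the
   trajectory re-link after the gap *)
VideoObjectTracking[boxes, Method -> {"OCSort", "OcclusionThreshold" -> 4}]
```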
Set up a problem with a moving bounding box and remove a couple of frames from the trajectory:

https://wolfram.com/xid/0v59uv074crae-e1cn5y

https://wolfram.com/xid/0v59uv074crae-wyn8lo

Without an occlusion threshold, the object is not re-associated with the tracklet once it re-emerges:

https://wolfram.com/xid/0v59uv074crae-b62dn3


https://wolfram.com/xid/0v59uv074crae-j7x2qw

With a higher occlusion threshold, the trajectory is linked back together:

https://wolfram.com/xid/0v59uv074crae-bgpjqn


https://wolfram.com/xid/0v59uv074crae-gup1ky

"RunningBuffer" (1)
The "RunningBuffer" method can typically better track objects whose trajectories have a jump due to occlusion or fast movement:

https://wolfram.com/xid/0v59uv074crae-fv3sfz

https://wolfram.com/xid/0v59uv074crae-vuk3zk
The "OCSort" method splits the same hummingbird into different instances:

https://wolfram.com/xid/0v59uv074crae-66jw7

"RunningBuffer" links the trajectories together to track the bird as one:

https://wolfram.com/xid/0v59uv074crae-f6y0up

TargetDevice (1)
By default, if no detection function is specified, detection is performed on the CPU:

https://wolfram.com/xid/0v59uv074crae-c9jm1a

https://wolfram.com/xid/0v59uv074crae-8oyab0

Set the TargetDevice option to "GPU" to perform the detection on the GPU:
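A minimal sketch of the call (it assumes video is a Video object and that a supported GPU is available):

```wolfram
(* Sketch: run the built-in detection on the GPU. *)
VideoObjectTracking[video, TargetDevice -> "GPU"]
```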

https://wolfram.com/xid/0v59uv074crae-jv3fo8

Applications (12)
Sample problems that can be solved with this function
Basic Uses (2)
Detect and track objects in a video:

https://wolfram.com/xid/0v59uv074crae-o64t3i

https://wolfram.com/xid/0v59uv074crae-txgpnr

Highlight objects on the video; notice that all are labeled with their detected classes:

https://wolfram.com/xid/0v59uv074crae-k6x2ff


https://wolfram.com/xid/0v59uv074crae-3258k1

Highlight tracked detected objects with their corresponding indices:

https://wolfram.com/xid/0v59uv074crae-ii3xnd

Track labeled components from matrices:

https://wolfram.com/xid/0v59uv074crae-0t0kt4
Define a segmentation function that works on each frame:

https://wolfram.com/xid/0v59uv074crae-rp3ltc
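A hedged sketch of such a per-frame segmentation function (the thresholding choice is illustrative, not from the original example):

```wolfram
(* Sketch: segment a single frame into labeled components. *)
segment[frame_Image] := MorphologicalComponents[Binarize[frame]]
(* applied to every frame, this yields the label matrices lmati that
   VideoObjectTracking can relabel consistently over time *)
```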
Segment video frames and show components:

https://wolfram.com/xid/0v59uv074crae-vn4okf


https://wolfram.com/xid/0v59uv074crae-pdknvm

Track the components across frames and show tracked components:

https://wolfram.com/xid/0v59uv074crae-c93ugi

https://wolfram.com/xid/0v59uv074crae-5jmysq

Count Objects (3)
Count the number of detected objects in a video:

https://wolfram.com/xid/0v59uv074crae-ipucjx
Track objects and find unique instances:

https://wolfram.com/xid/0v59uv074crae-jlqiwm


https://wolfram.com/xid/0v59uv074crae-rutq61


https://wolfram.com/xid/0v59uv074crae-smtdkc

Count occurrences of a specific object:

https://wolfram.com/xid/0v59uv074crae-d5sfdu
Track objects and find unique instances:

https://wolfram.com/xid/0v59uv074crae-s2yc9


https://wolfram.com/xid/0v59uv074crae-zepsoc


https://wolfram.com/xid/0v59uv074crae-bqiib1

Count the number of elephants in a video:

https://wolfram.com/xid/0v59uv074crae-bz03eo

https://wolfram.com/xid/0v59uv074crae-nx9rnb

https://wolfram.com/xid/0v59uv074crae-g2b5xp


https://wolfram.com/xid/0v59uv074crae-kqfr1x

Extract Tracked Objects (1)
Detect and track the contents of a video:

https://wolfram.com/xid/0v59uv074crae-3uc5fu

https://wolfram.com/xid/0v59uv074crae-refli6
Extract the first of the detected labels:

https://wolfram.com/xid/0v59uv074crae-80zu60

Extract the sub-video corresponding to the first tracked object:

https://wolfram.com/xid/0v59uv074crae-lri0ne

Visualize Motion Trajectories (1)
Track pedestrians in a railway station:

https://wolfram.com/xid/0v59uv074crae-cfwxde
Detect the bounding boxes and show them over the original video:

https://wolfram.com/xid/0v59uv074crae-cqo96e


https://wolfram.com/xid/0v59uv074crae-92esa

Plot the trajectories of the centroids of the boxes:

https://wolfram.com/xid/0v59uv074crae-m91wsl

Overlay the trajectories onto the original video:

https://wolfram.com/xid/0v59uv074crae-0pg3sb

Analyze Wildlife Videos (3)
Track a herd of migrating elephants:

https://wolfram.com/xid/0v59uv074crae-8piho

https://wolfram.com/xid/0v59uv074crae-dgswnc
Highlight frames with the tracked elephants:

https://wolfram.com/xid/0v59uv074crae-ho2oli

Track a herd of galloping horses:

https://wolfram.com/xid/0v59uv074crae-h66pee

https://wolfram.com/xid/0v59uv074crae-msca2k

https://wolfram.com/xid/0v59uv074crae-en61mk

Track a flock of sheep entering a barn:

https://wolfram.com/xid/0v59uv074crae-ipcxa8

https://wolfram.com/xid/0v59uv074crae-h6vn0e

https://wolfram.com/xid/0v59uv074crae-bmza8z

Analyze Human Videos (2)
Estimate age from the face of each person in a video:

https://wolfram.com/xid/0v59uv074crae-ly6lx


https://wolfram.com/xid/0v59uv074crae-bwij5l

Find the tracked faces with the longest duration in the video:

https://wolfram.com/xid/0v59uv074crae-4kx9cp

Construct a time series of selected labels:

https://wolfram.com/xid/0v59uv074crae-4suku2

Compute estimated age for each tracked face:

https://wolfram.com/xid/0v59uv074crae-pfrra4
Compute median estimated age for each face:

https://wolfram.com/xid/0v59uv074crae-brzmvq

Track women dancing on the stage:

https://wolfram.com/xid/0v59uv074crae-cww4wd

https://wolfram.com/xid/0v59uv074crae-eqo62e

https://wolfram.com/xid/0v59uv074crae-jlz5z5

https://wolfram.com/xid/0v59uv074crae-c1nmab
Extract video of one of the dancers:

https://wolfram.com/xid/0v59uv074crae-mboul8

Determine the number of taps the performer makes:

https://wolfram.com/xid/0v59uv074crae-fhw7s0

https://wolfram.com/xid/0v59uv074crae-kd4842

https://wolfram.com/xid/0v59uv074crae-cd2bbz

Find the number of peaks, which correspond to the jumps/taps:

https://wolfram.com/xid/0v59uv074crae-ina1tq

https://wolfram.com/xid/0v59uv074crae-dxlliy


https://wolfram.com/xid/0v59uv074crae-izclbb

Properties & Relations (1)
Properties of the function, and connections to other functions
By default, ImageBoundingBoxes is used to detect objects. Use the YOLO V8 network from the Wolfram Neural Net Repository to perform the detection:

https://wolfram.com/xid/0v59uv074crae-b8nu4p
Retrieve the network and its evaluation function:

https://wolfram.com/xid/0v59uv074crae-vkvu35


https://wolfram.com/xid/0v59uv074crae-ylwzsr
Detect and track the object using the YOLO V8 network:

https://wolfram.com/xid/0v59uv074crae-dni4lx

Highlight the tracked detected objects:

https://wolfram.com/xid/0v59uv074crae-3v49d6

Neat Examples (1)
Surprising or curious use cases
Track the motion of particles undergoing a random walk:

https://wolfram.com/xid/0v59uv074crae-shdnv
Extract the centroids of particles from the video:

https://wolfram.com/xid/0v59uv074crae-n060tf
Track the particles and extract the trajectories:

https://wolfram.com/xid/0v59uv074crae-qnc0sr
Plot the trajectories of all the particles:

https://wolfram.com/xid/0v59uv074crae-5urqw5

Visualize the motion of the particles with the longest trajectories:

https://wolfram.com/xid/0v59uv074crae-wlfrhz

Wolfram Research (2025), VideoObjectTracking, Wolfram Language function, https://reference.wolfram.com/language/ref/VideoObjectTracking.html.
Text
Wolfram Research (2025), VideoObjectTracking, Wolfram Language function, https://reference.wolfram.com/language/ref/VideoObjectTracking.html.
CMS
Wolfram Language. 2025. "VideoObjectTracking." Wolfram Language & System Documentation Center. Wolfram Research. https://reference.wolfram.com/language/ref/VideoObjectTracking.html.
APA
Wolfram Language. (2025). VideoObjectTracking. Wolfram Language & System Documentation Center. Retrieved from https://reference.wolfram.com/language/ref/VideoObjectTracking.html
BibTeX
@misc{reference.wolfram_2025_videoobjecttracking, author="Wolfram Research", title="{VideoObjectTracking}", year="2025", howpublished="\url{https://reference.wolfram.com/language/ref/VideoObjectTracking.html}", note={Accessed: 15-April-2025}}
BibLaTeX
@online{reference.wolfram_2025_videoobjecttracking, organization={Wolfram Research}, title={VideoObjectTracking}, year={2025}, url={https://reference.wolfram.com/language/ref/VideoObjectTracking.html}, note={Accessed: 15-April-2025}}