ONNX Object Tracker


Category: Object Tracking

AddIn: Object Tracking

Scope: Local

Code Snippets: no

Supports Material List: no

Status Screen Widgets: no

License: Standard EventIDE license



The ONNX Object Tracker element performs automatic tracking of a selected visual object using fast ONNX Runtime models. It supports multiple object detection architectures and GPU acceleration via DirectML. During runtime, the element continuously returns the tracked object’s position, size, detection confidence, and processing time.

Introduction


The ONNX Object Tracker element enables real-time tracking of visual objects within a stimulus event. It analyzes the visual content rendered inside its viewport and detects the selected target object using a specified ONNX detection model. The element can operate in three modes:


  • ONNX Detection – deep learning–based object detection using an ONNX model.

  • Template Matching – tracking based on a captured template image.

  • Hybrid – combined ONNX detection and template matching for increased robustness.


The tracker returns object position (X, Y), size (Width, Height), detection confidence (0.0–1.0), and processing time for each detection pass. These values can be used for logging, flow control, adaptive stimulus presentation, or closed-loop experimental paradigms.


GPU acceleration via DirectML can be enabled to improve inference performance on compatible hardware.
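To illustrate how DirectML acceleration with a CPU fallback typically works in ONNX Runtime (the element configures this internally; the helper name `choose_providers` is hypothetical), a minimal Python sketch:

```python
# Illustrative sketch, not the element's internal code: prefer the DirectML
# execution provider when GPU acceleration is enabled and the hardware
# supports it, otherwise fall back to the CPU provider.

def choose_providers(available, use_gpu=True):
    """Return an ONNX Runtime execution-provider list in preference order."""
    preferred = (["DmlExecutionProvider", "CPUExecutionProvider"]
                 if use_gpu else ["CPUExecutionProvider"])
    chosen = [p for p in preferred if p in available]
    return chosen or ["CPUExecutionProvider"]  # always keep a CPU fallback

# With the onnxruntime-directml package installed, a session could then be
# created along these lines:
#   import onnxruntime as ort
#   providers = choose_providers(ort.get_available_providers())
#   session = ort.InferenceSession("model.onnx", providers=providers)
```

ONNX Runtime tries providers in list order, so keeping `CPUExecutionProvider` last guarantees inference still runs on machines without a DirectML-compatible GPU.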


Key Features


  • Real-time object tracking using ONNX Runtime

  • Support for multiple detection models (e.g., COCO-based architectures)

  • Optional DirectML GPU acceleration

  • Hybrid tracking mode (ONNX + template matching)

  • Adjustable confidence threshold

  • Configurable detection pace

  • Runtime access to detection confidence and processing time

  • Trajectory recording and export

  • Integration with event structure and flow routes

  • Viewport-based tracking area definition


Properties

Name

Description

Settings

Type

On runtime change





Tracking Mode

Tracking algorithm: ONNX model detection, Template matching (with a grabbed template object), or Hybrid (ONNX + template matching for improved robustness).

General

Int32


Detection Pace

Defines the minimum interval (in ms) between automatic detection passes triggered by visual changes in the parent event. The pace should not be shorter than the detection processing time.

General

Double


Confidence Threshold

Minimum confidence score (0.0–1.0) required for valid object detection. Higher values reduce false positives but may miss detections.

General

Single


Template-Onnx Weight

Weight factor used in Hybrid mode (0.0 = ONNX only, 1.0 = template only). Intermediate values combine both approaches.

General

Single


Grab Template Now

Captures the target template object from the current element viewport for template-based tracking.

Runtime Command

Boolean

Yes

Test Detection Now

Executes a detection pass on the current event surface for validation purposes.

Design

Boolean


Border Visible

Defines whether a visible border is shown around the element viewport during runtime.

General

Boolean


ONNX Settings





Model Path

Path to the ONNX model file used for object detection. The corresponding XML descriptor file with the same name must be present in the same folder.

Design

String


GPU Acceleration

Enables DirectML GPU acceleration for faster inference. Requires compatible hardware.

General

Boolean


Target Class ID

Class ID of the object to track (-1 = any class). Example 0-indexed COCO classes: 0 = person, 2 = car, 14 = bird. Exact IDs depend on the model's XML descriptor.

General

Int32


Runtime Status





Detection Confidence

Returns the confidence score (0.0–1.0) from the most recent detection pass.

Status

Single


Processing Time

Returns the processing time of the last detection pass.

Status

clTime


Trajectory Control





Reset Trajectory Now

Clears the recorded object trajectory.

Runtime Command

Boolean

Yes

Save Trajectory Now

Saves the object trajectory recorded during runtime to the assigned filename.

Runtime Command

String

Yes

Preview Trajectory

Displays a preview of the recorded trajectory.

Runtime Command

Boolean

Yes

Export To Library

Exports the recorded trajectory to the Material Library.

Runtime Command

Boolean

Yes

Visual Appearance





Alpha Masking

Uses rendered content to create an opacity mask on the event surface. Luminance (or its inverse) defines transparency.

Design

Int32


AntiAliasing

If true, rendered content is smoothed. If false, rendering remains pixel-authentic.

General

Boolean


Position

Defines the viewport position on the screen.

General

clPoint


Size

Defines the viewport size on the screen.

General

clSize


Z Order

Indicates the Z-order of the element within the event.

Status

Int32


Pivot Point

Defines alignment of the pivot point relative to the rendering area (affects rotation and scaling center).

General

stAlign


Visible

Defines whether the element is visible during runtime.

General

Boolean


Effects





Transparent Color

Defines the color that becomes transparent in the rendered content.

General

stColor


Transparent Tolerance

Tolerance level for transparent color selection (0 = disabled, 1 = fully transparent).

General

Int32


Color Mask

Multiplies original pixels by the selected mask color (alpha unaffected).

General

stColor


Opacity

Opacity level of rendered content.

General

Int32


Contrast

Contrast adjustment of rendered content.

General

Int32


Brightness

Brightness adjustment of rendered content.

General

Int32


Saturation

Saturation level of rendered content.

General

Double


Pixelation

Pixelation level of rendered content.

General

Int32


Blurring

Radius for Gaussian blur (0 = no blur).

General

Int32


Scrambling

Proportion of scrambled voxels in rendered content.

General

Double


Scrambling Grid Size

Dimensions of scrambling grid (must evenly divide element size).

General

clSize


Positional Jitter





Reset Jitter Now

Resets positional jitter to its initial state.

Runtime Command

Boolean

Yes

Current Jitter

Returns the current positional jitter.

Status

clPoint


Jitter Range

Defines the range of random positional jitter around the element’s position.

General

clSize


Control





Is Enabled

If false, the element is omitted during experiment runtime.

Design

Boolean


Title

Title of the element.

Design

String


Practical Use


The ONNX Object Tracker element can be used in experiments requiring:


  • Tracking of hands, faces, or specific objects in video stimuli

  • Real-time facial tracking directly from a live camera stream

  • Spatial trajectory recording and analysis

  • Human–computer interaction tasks

  • Attention and visual search experiments


The element can operate continuously during stimulus presentation and can trigger flow routes based on object location, size, or confidence level.


Technique 1: ONNX-Based Detection


  1. Add the ONNX Object Tracker element to a stimulus event.

  2. Set Model Path to the desired ONNX model (ensure the corresponding XML descriptor file is present in the same directory).

  3. Optionally enable GPU Acceleration for improved performance.

  4. Set Target Class ID (e.g., -1 for any detected class or a specific class ID).

  5. Adjust Confidence Threshold to control detection sensitivity.

  6. Set appropriate Detection Pace (in ms) based on expected processing time.

  7. Use the runtime outputs (Position, Size, Detection Confidence) for:

    - Data logging

    - Flow route conditions

    - Adaptive stimulus control
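The per-pass selection logic described in steps 4–5 can be sketched as follows. This is an illustrative Python reconstruction of what a detection pass does with the Confidence Threshold and Target Class ID settings, not the element's actual code; all names are hypothetical.

```python
# Illustrative sketch of per-pass detection filtering: keep detections whose
# confidence meets the threshold, restrict to the target class (-1 = any),
# and return the highest-confidence match.

def select_detection(detections, confidence_threshold, target_class_id=-1):
    """detections: list of (class_id, confidence, (x, y, w, h)) tuples."""
    candidates = [
        d for d in detections
        if d[1] >= confidence_threshold
        and (target_class_id == -1 or d[0] == target_class_id)
    ]
    if not candidates:
        return None  # no valid detection on this pass
    return max(candidates, key=lambda d: d[1])  # best-scoring detection

detections = [(0, 0.92, (120, 80, 60, 140)),   # person
              (2, 0.55, (300, 200, 90, 50))]   # car
best = select_detection(detections, 0.6, target_class_id=0)
```

Raising the threshold discards the weaker car detection here, which matches the documented trade-off: fewer false positives, but borderline detections are missed.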


Technique 2: Hybrid Tracking with Template Stabilization


  1. Add the element to the event and set Tracking Mode to Hybrid.

  2. Select the event and use Grab Template Now to capture the target object.

  3. Adjust Template-Onnx Weight:

    - 0.0 → pure ONNX detection,

    - 1.0 → pure template matching,

    - intermediate values → combined tracking.

  4. Monitor Detection Confidence to ensure stable tracking.

  5. Use trajectory controls:

    - Reset Trajectory Now to clear the recorded path,

    - Save Trajectory Now to export the trajectory to a file,

    - Preview Trajectory for visual inspection.

Hybrid mode is recommended in scenarios with lighting variability, partial occlusion, or moderate visual noise.
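A plausible reading of the Template-Onnx Weight setting is a linear blend of the two scores. The sketch below assumes that interpretation (the element's actual fusion rule is not documented and may differ); the function name is hypothetical.

```python
# Minimal sketch of Hybrid-mode score fusion, assuming a simple linear blend
# controlled by Template-Onnx Weight (0.0 = ONNX only, 1.0 = template only).

def hybrid_confidence(onnx_score, template_score, template_onnx_weight):
    w = min(max(template_onnx_weight, 0.0), 1.0)  # clamp weight to [0, 1]
    return (1.0 - w) * onnx_score + w * template_score

# At w = 0.5 a strong ONNX score can compensate for a template score degraded
# by lighting changes, which is why intermediate weights add robustness.
```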


Technique 3: Facial Tracking from Webcam Stream

The ONNX Object Tracker element can be paired with a Webcam OpenCV element to implement real-time facial tracking directly from a live camera stream.

  1. Add a Webcam OpenCV element to a stimulus event and configure the desired camera device and resolution.

  2. Ensure the webcam stream is rendered on the event surface (e.g., full screen or defined viewport).

  3. Add the ONNX Object Tracker element to the same event and position its viewport over the webcam stream area.

  4. Set Model Path to a face detection ONNX model (e.g., a lightweight face detector compatible with ONNX Runtime).

  5. Set Target Class ID according to the model specification (or -1 if the model detects faces only).

  6. Adjust Confidence Threshold (e.g., 0.5–0.7 for stable facial detection).

  7. Enable GPU Acceleration if available to reduce inference latency.

  8. Tune Detection Pace to match the webcam frame rate and processing time.


At runtime, the tracker will analyze the live webcam frames and continuously return:

  • Face position (X, Y),

  • Face bounding box size (Width, Height),

  • Detection confidence.


These outputs can be used to:

  • Log head movement trajectories,

  • Trigger events when the participant looks toward predefined regions,

  • Implement gaze- or face-position–dependent stimulus adaptation,

  • Monitor participant presence and compliance.


Example Use Case


In an attention experiment, a trial may proceed only when the participant’s face is detected within a predefined central region of the screen. If detection confidence drops below a threshold (e.g., participant moves away), the experiment can pause automatically until stable facial tracking resumes.
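The gating logic of this use case can be sketched in a few lines. This is an illustrative Python reconstruction of a flow-route condition, assuming hypothetical names and an example central region; in EventIDE the same check would be expressed using the element's runtime outputs.

```python
# Hedged sketch of the pause/proceed gating described above: the trial
# proceeds only while the detected face centre lies inside a central screen
# region and confidence stays above threshold. Values are illustrative.

def face_in_region(x, y, region):
    """region: (left, top, width, height) in screen coordinates."""
    left, top, width, height = region
    return left <= x <= left + width and top <= y <= top + height

def should_pause(x, y, confidence, region, threshold=0.6):
    return confidence < threshold or not face_in_region(x, y, region)

central = (760, 340, 400, 400)  # e.g. a 400x400 box on a 1920x1080 screen
```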

Notes

  • Detection Pace should not be shorter than the average processing time to avoid performance degradation.

  • Larger viewport sizes increase computational load.

  • GPU acceleration requires a DirectML-compatible graphics card.

  • Template matching performs best with visually stable and well-defined objects.

  • Always verify that the ONNX model and its XML descriptor file are correctly paired.

  • For high-frame-rate video stimuli, carefully tune detection pace to maintain real-time performance.
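The first and last notes above can be combined into a simple tuning rule: measure processing times (the Processing Time status property) and set the pace no lower than the average plus a margin. The helper and the 1.5× margin below are illustrative assumptions, not documented defaults.

```python
# Illustrative helper for tuning Detection Pace (ms): never shorter than the
# average measured processing time, scaled by a safety margin.

def recommended_pace(processing_times_ms, margin=1.5, minimum_ms=10.0):
    if not processing_times_ms:
        return minimum_ms  # no measurements yet; use a conservative floor
    avg = sum(processing_times_ms) / len(processing_times_ms)
    return max(minimum_ms, avg * margin)
```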
