DepthAI Python API
Instructions for installing, upgrading, and using the DepthAI Python API.
Supported Platforms
The DepthAI Python API module is prebuilt for Ubuntu, macOS, and Windows. For other operating systems and/or Python versions, DepthAI can be built from source.
Installing system dependencies
A couple of basic system dependencies are required to run the DepthAI library. Most of them are probably already installed on your system, but in case they are not, we have prepared an install script that makes sure all dependencies are installed:
curl -fL http://docs.luxonis.com/install_dependencies.sh | bash
If using Windows, please use this batch script to install the dependencies.
Enabling the USB device (only on Linux)
Since DepthAI is a USB device, on systems that use the udev tool you need to add a udev rule to make the device accessible.
The following commands will add a new udev rule to your system and reload the rules:
echo 'SUBSYSTEM=="usb", ATTRS{idVendor}=="03e7", MODE="0666"' | sudo tee /etc/udev/rules.d/80-movidius.rules
sudo udevadm control --reload-rules && sudo udevadm trigger
Install from PyPI
Our packages are distributed via PyPI. To install the latest release in your environment, use:
python3 -m pip install depthai
For other installation options, see Other installation methods below.
Test installation
We have a depthai repository on our GitHub that contains many helpful examples and prepared neural networks you can use to speed up your prototyping. It also includes a test script, maintained by our contributors, that helps you verify that your setup is correct.
First, clone the depthai repository and install its dependencies:
git clone https://github.com/luxonis/depthai.git
cd depthai
python3 -m pip install -r requirements.txt
Now, run the demo script from within depthai to make sure everything is working:
python3 depthai_demo.py
If all goes well, a small video window will open, with overlays drawn for any objects whose class exists in the example 20-class object detector (class list here).
API Reference
depthai.Device
Represents the DepthAI device with the methods to interact with it.
NOTE: Please be aware that all methods except get_available_streams require create_pipeline to be run first.
Example
import depthai
device = depthai.Device('', False)
pipeline = device.create_pipeline(config={
'streams': ['previewout', 'metaout'],
'ai': {
"blob_file": "/path/to/model.blob",
"blob_file_config": "/path/to/config.json",
},
})
Methods
-
__init__(device_id: str, usb2_mode: bool) -> Device
Standard and recommended way to set up the object.
device_id represents the USB port id that the device is connected to. If set to a specific value (e.g. "1"), it will look for the device on that specific USB port, whereas if left empty ('') it will look for the device on all ports. This is useful when more than one DepthAI device is connected and you want to specify which one to use in the code.
usb2_mode, being True or False, allows DepthAI to communicate using the USB2 protocol instead of USB3. This lowers the throughput of the pipeline, but allows >1 m USB cables to be used for the connection.
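A minimal sketch of both forms (the port id "1" below is just a placeholder):
import depthai

# Look for the device on any USB port, using USB3
device = depthai.Device('', False)

# Or: use only the device on USB port "1", forcing USB2 mode
# device = depthai.Device('1', True)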
-
__init__(cmd_file: str, device_id: str) -> Device
Development and debug way to initialize the DepthAI device.
cmd_file is a path to the firmware .cmd file that will be loaded onto the device at boot.
device_id represents the USB port id that the device is connected to. If set to a specific value (e.g. "1"), it will look for the device on that specific USB port, whereas if left empty ('') it will look for the device on all ports.
This is useful when more than one DepthAI device is connected and you want to specify which one to use in the code.
-
create_pipeline(config: dict) -> CNNPipeline
Initializes a DepthAI pipeline, returning the created CNNPipeline if successful and None otherwise.
config (dict) - a dict of pipeline configuration settings. Example key/values for the config:
{
    # Possible streams:
    # 'color' - 4K color camera preview
    # 'left' - left mono camera preview
    # 'right' - right mono camera preview
    # 'rectified_left' - rectified left camera preview
    # 'rectified_right' - rectified right camera preview
    # 'previewout' - neural network input preview
    # 'metaout' - CNN output tensors
    # 'depth' - the raw depth map, disparity converted to real life distance
    # 'disparity' - disparity map, the disparity between left and right cameras, in pixels
    # 'disparity_color' - disparity map colorized
    # 'meta_d2h' - device metadata stream
    # 'video' - H.264/H.265 encoded color camera frames
    # 'jpegout' - JPEG encoded color camera frames
    # 'object_tracker' - object tracker results
    'streams': [
        'left',  # if left is used, it must be in the first position
        'right',
        {'name': 'previewout', 'max_fps': 12.0},  # streams can be specified as objects with additional params
        'metaout',
        # depth-related streams
        {'name': 'depth', 'max_fps': 12.0},
        {'name': 'disparity', 'max_fps': 12.0},
        {'name': 'disparity_color', 'max_fps': 12.0},
    ],
    'depth': {
        'calibration_file': consts.resource_paths.calib_fpath,
        'padding_factor': 0.3,
        'depth_limit_m': 10.0,  # in meters, for filtering purposes during x,y,z calculation
        'confidence_threshold': 0.5,  # depth is calculated for bounding boxes with confidence higher than this number
    },
    'ai': {
        'blob_file': blob_file,  # MyriadX CNN blob file path
        'blob_file_config': blob_file_config,  # configuration file for CNN output tensor mapping on host side
        'calc_dist_to_bb': True,  # if True, will include depth information in the CNN output tensor
        'keep_aspect_ratio': not args['full_fov_nn'],
    },
    # object tracker
    'ot': {
        'max_tracklets': 20,  # maximum 20 is supported
        'confidence_threshold': 0.5,  # object is tracked only for detections over this threshold
    },
    'board_config': {
        'swap_left_and_right_cameras': args['swap_lr'],  # True for 1097 (RPi Compute) and 1098OBC (USB w/ onboard cameras)
        'left_fov_deg': args['field_of_view'],  # same on 1097 and 1098OBC
        'rgb_fov_deg': args['rgb_field_of_view'],
        'left_to_right_distance_cm': args['baseline'],  # distance between stereo cameras
        'left_to_rgb_distance_cm': args['rgb_baseline'],  # currently unused
        'store_to_eeprom': args['store_eeprom'],
        'clear_eeprom': args['clear_eeprom'],
        'override_eeprom': args['override_eeprom'],
    },
    # 'video_config':
    # {
    #     'rateCtrlMode': 'cbr',
    #     'profile': 'h265_main',  # options: 'h264_baseline' / 'h264_main' / 'h264_high' / 'h265_main'
    #     'bitrate': 8000000,  # when using CBR
    #     'maxBitrate': 8000000,  # when using CBR
    #     'keyframeFrequency': 30,
    #     'numBFrames': 0,
    #     'quality': 80,  # (0 - 100%) when using VBR
    # },
}
-
get_available_streams() -> List[str]
Return a list of all streams supported by the DepthAI library.
>>> device.get_available_streams()
['meta_d2h', 'color', 'left', 'right', 'rectified_left', 'rectified_right', 'disparity', 'depth', 'metaout', 'previewout', 'jpegout', 'video', 'object_tracker']
-
get_nn_to_depth_bbox_mapping() -> dict
Returns a dict that allows matching the CNN output with the disparity info.
Since the RGB camera has a 4K resolution and the neural networks accept only images with a specific resolution (like 300x300), the original image is cropped to meet the neural network requirements. On the other hand, the disparity frames are returned in the full resolution available on the mono cameras.
To determine where the CNN previewout image lies on the disparity frame, use this method: it specifies the offsets and dimensions to apply.
>>> device.get_nn_to_depth_bbox_mapping()
{'max_h': 681, 'max_w': 681, 'off_x': 299, 'off_y': 59}
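A sketch of how this mapping could be applied to place a normalized CNN bounding box onto the disparity frame (detection here is a placeholder for a depthai.Detection object):
nn2depth = device.get_nn_to_depth_bbox_mapping()

# Scale normalized (0..1) CNN coordinates into disparity-frame pixels,
# then shift by the crop offsets reported by the mapping
x_min = int(detection.x_min * nn2depth['max_w']) + nn2depth['off_x']
y_min = int(detection.y_min * nn2depth['max_h']) + nn2depth['off_y']
x_max = int(detection.x_max * nn2depth['max_w']) + nn2depth['off_x']
y_max = int(detection.y_max * nn2depth['max_h']) + nn2depth['off_y']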
-
request_af_mode(mode: AutofocusMode)
Set the 4K RGB camera autofocus mode to one of the available AutofocusMode values.
-
request_af_trigger()
Manually send a trigger action to the autofocus on the 4K RGB camera.
-
Capture a JPEG frame from the RGB camera and send it to the jpegout stream. The frame is in the full available resolution, not cropped to meet the CNN input dimensions.
-
send_disparity_confidence_threshold()
Sends the disparity confidence threshold for the StereoSGBM algorithm. If the confidence of a disparity value is below the threshold, the value is marked as invalid disparity and treated as background.
-
Return a 3x3 homography matrix used to rectify the right stereo camera image.
-
Return a 3x3 homography matrix used to rectify the left stereo camera image.
Note: Requires dual-homography calibration.
-
Return a 3x3 intrinsic calibration matrix of the left stereo camera.
Note: Requires dual-homography calibration.
-
Return a 3x3 intrinsic calibration matrix of the right stereo camera.
Note: Requires dual-homography calibration.
-
Return a 3x3 rotation matrix representing the rotation of the right stereo camera w.r.t. the left stereo camera.
Note: Requires dual-homography calibration.
-
Return a 3x1 vector representing the position of the right stereo camera center w.r.t. the left stereo camera center.
Note: Requires dual-homography calibration.
depthai.AutofocusMode
An enum with all available autofocus modes.
Members
- AF_MODE_AUTO
This mode sets the autofocus to manual mode, where you need to call request_af_trigger to start the focusing procedure.
-
AF_MODE_CONTINUOUS_PICTURE This mode adjusts the focus continually to provide the best in-focus image stream and should be used when the camera is standing still while capturing. The focusing procedure is done as fast as possible.
This is the default mode the DepthAI operates in.
- AF_MODE_CONTINUOUS_VIDEO This mode adjusts the focus continually to provide the best in-focus image stream and should be used when the camera is trying to capture a smooth video stream. The focusing procedure is slower and avoids focus overshoots.
- AF_MODE_EDOF This mode disables the autofocus. EDOF stands for Enhanced Depth of Field and is a digital focus.
- AF_MODE_MACRO It’s the same operating mode as AF_MODE_AUTO
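A short sketch of switching to manual autofocus and triggering a focus run, assuming a device created as shown earlier:
# Put the RGB camera into manual autofocus mode, then trigger a single focus run
device.request_af_mode(depthai.AutofocusMode.AF_MODE_AUTO)
device.request_af_trigger()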
depthai.CNNPipeline
Pipeline object through which the device sends its results to the host. Created using the depthai.Device.create_pipeline method.
Methods
-
get_available_data_packets() -> List[depthai.DataPacket]
Returns only data packets produced by the device itself, without CNN results
-
get_available_nnet_and_data_packets() -> tuple[List[NNetPacket], List[depthai.DataPacket]]
Returns both neural network result packets and data packets produced by the device.
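A minimal polling sketch, assuming device and pipeline were created as in the earlier example:
while True:
    # Fetch whatever the device has produced since the last call
    nnet_packets, data_packets = pipeline.get_available_nnet_and_data_packets()
    for nnet_packet in nnet_packets:
        detections = nnet_packet.getDetectedObjects()  # only valid with output_format 'detection'
    for packet in data_packets:
        if packet.stream_name == 'previewout':
            frame = packet.getData()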
depthai.NNetPacket
Neural network results packet. It is not a single result, but a batch of results with additional metadata attached.
For any neural network inference output, get_tensor can be used. For the specific cases of Mobilenet-SSD and YOLO-v3, decoding can be done in the firmware; the decoded objects can then be accessed through getDetectedObjects in addition to the raw output, making the results of these commonly used networks easily accessible.
See the blob config file section for more details about the different neural network output formats and how to choose between them.
Methods
-
getMetadata() -> depthai.FrameMetadata
Returns a metadata object containing all proprietary data related to this packet.
-
get_tensor(Union[int, str]) -> numpy.ndarray
Can be used ONLY when output_format in the blob config file is set to raw. It returns a shaped numpy array for the specified network output tensor, based on the neural network's output layer information.
For example, in the case of Mobilenet-SSD it returns a [1, 1, 100, 7] shaped array with numpy.dtype float16.
Example of usage: nnetpacket.get_tensor(0) or nnetpacket.get_tensor('detection_out')
-
__getitem__(Union[int, str]) -> numpy.ndarray
Same as get_tensor.
Example of usage for Mobilenet-SSD: nnetpacket[0] or nnetpacket['detection_out'], where 'detection_out' is the name of the output layer of Mobilenet-SSD.
-
getOutputsList() -> list
Returns all the output tensors in a list for the network.
-
getOutputsDict() -> dict
Returns all the output tensors in a dictionary for the network. The key is the name of the output layer, the value is the shaped numpy array.
-
getOutputLayersInfo() -> depthai.TensorInfo
Returns information about the output layers of the network.
-
getInputLayersInfo() -> depthai.TensorInfo
Returns information about the input layers of the network.
-
getDetectedObjects() -> depthai.Detections
ONLY for detection networks (Mobilenet-SSD and (tiny-)YOLO-v3 based networks). Should be used ONLY when output_format in the blob config file is set to detection. Returns the detected objects in the Detections format. The network output is decoded on the device side.
depthai.TensorInfo
Descriptor of the input/output layers/tensors of the network.
When the network is loaded, the tensor info is automatically printed. It can also be printed at runtime using print(nnetpacket.getInputLayersInfo()) or print(nnetpacket.getOutputLayersInfo()).
Attributes
-
name -> string
Name of the tensor.
-
dimensions -> list
Shape of the tensor array, e.g. [1, 1, 100, 7].
-
strides -> list
Strides of tensor array.
-
data_type -> string
Data type of the tensor, e.g. float16.
-
offset -> int
Offset in the raw output array.
-
element_size -> int
Size in bytes of one element in the array.
-
index -> int
Index of the tensor, e.g. in the case of multiple inputs/outputs in the network it marks the order of the input/output.
Methods
-
get_dict() -> dict
Returns TensorInfo in a dictionary where the key is the name of the attribute.
-
get_dimension(Dimension) -> int
Returns the specified dimension of the tensor. For example: tensor_info.get_dimension(depthai.TensorInfo.Dimension.WIDTH) returns the WIDTH of the tensor.
depthai.Detections
Container of neural network results decoded on device side.
Example of accessing detections
Assuming the detected objects are stored in a detections object:
-
Number of detections
detections.size()
or len(detections)
-
Accessing the
x-th detection: detections[x]
-
Iterating through all detections
for detection in detections:
    # handle detection
depthai.Detection
Detected object descriptor.
Attributes
-
label -> int
Label id of the detected object.
-
confidence -> float
Confidence score of the detected object in interval [0, 1].
-
x_min -> float
Top left
X
coordinate of the detected bounding box. Normalized, in interval [0, 1].
-
y_min -> float
Top left
Y
coordinate of the detected bounding box. Normalized, in interval [0, 1].
-
x_max -> float
Bottom right
X
coordinate of the detected bounding box. Normalized, in interval [0, 1].
-
y_max -> float
Bottom right
Y
coordinate of the detected bounding box. Normalized, in interval [0, 1].
-
depth_x -> float
Distance to the detected bounding box along the X axis. Only available when depth calculation is enabled (stereo cameras present on the board).
-
depth_y -> float
Distance to the detected bounding box along the Y axis. Only available when depth calculation is enabled (stereo cameras present on the board).
-
depth_z -> float
Distance to the detected bounding box along the Z axis. Only available when depth calculation is enabled (stereo cameras present on the board).
Methods
-
get_dict() -> dict
Returns the detected object as a dictionary where the key is the name of the attribute.
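A quick sketch of reading these fields (detections as returned by getDetectedObjects; frame_width and frame_height are placeholders for the preview frame size):
for detection in detections:
    # Scale normalized coordinates to pixel coordinates of the preview frame
    x1 = int(detection.x_min * frame_width)
    y1 = int(detection.y_min * frame_height)
    x2 = int(detection.x_max * frame_width)
    y2 = int(detection.y_max * frame_height)
    print(detection.label, detection.confidence, (x1, y1, x2, y2))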
depthai.TensorInfo.Dimension
Dimension descriptor of tensor shape.
Values
- Union[W, WIDTH] -> Width
- Union[H, HEIGHT] -> Height
- Union[C, CHANNEL] -> Number of channels
- Union[N, B, NUMBER, BATCH] -> Number/Batch of inferences
Note: Dimension is mostly meaningful for input tensors, since not all neural network models respect the semantics of Dimension for output tensors, e.g. WIDTH might not actually mean width.
depthai.DataPacket
DepthAI data packet, containing information generated on the device. Unlike NNetPacket, it contains a single "result" together with its source stream info.
Attributes
-
stream_name: str
The packet's source stream name. Used to determine the origin of the packet so it can be handled appropriately based on this value.
Methods
-
getData() -> numpy.ndarray
Returns the data as a NumPy array, which you can, for example, display using OpenCV imshow.
Used with streams that return frames, e.g. previewout, left, right, or encoded data, e.g. video, jpegout.
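A sketch of displaying a previewout frame with OpenCV, assuming (as in the depthai demo) that the frame arrives in planar channel-height-width layout:
import cv2

for packet in data_packets:
    if packet.stream_name == 'previewout':
        data = packet.getData()  # planar layout: (channel, height, width)
        frame = cv2.merge([data[0, :, :], data[1, :, :], data[2, :, :]])
        cv2.imshow('previewout', frame)
        cv2.waitKey(1)  # give OpenCV a chance to draw the window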
-
getDataAsStr() -> str
Returns the data as a string that can be parsed further.
Used with streams that return non-array results, e.g. meta_d2h, which returns a JSON object.
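For example, a meta_d2h packet could be parsed like this (a sketch; the available fields depend on the device firmware):
import json

if packet.stream_name == 'meta_d2h':
    meta = json.loads(packet.getDataAsStr())  # device metadata as a Python dict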
-
getMetadata() -> depthai.FrameMetadata
Returns a metadata object containing all proprietary data related to this packet.
-
getObjectTracker() -> ObjectTracker
Returns the result as an ObjectTracker instance. Used only with packets from the object_tracker stream.
-
size() -> int
Returns packet data size
depthai.FrameMetadata
Metadata object attached to the packets sent via pipeline.
Methods
-
getCameraName() -> str
Returns the name of the camera that produced the frame.
-
getCategory() -> int
Returns the type of the packet, i.e. whether it's a regular frame or one captured as a still.
-
getFrameBytesPP() -> int
Returns number of bytes per pixel in the packet’s frame
-
getFrameHeight() -> int
Returns the height of the packet’s frame
-
getFrameWidth() -> int
Returns the width of the packet’s frame
-
getFrameType() -> int
Returns the type of the data that this packet contains.
-
getInstanceNum() -> int
Returns the camera id that is the source of the current packet
-
getSequenceNum() -> int
A sequence number is assigned to each frame produced by the camera. It can be used to verify that frames were captured at the same time, e.g. if frames from the left and right cameras have the same sequence number, you can assume they were taken at the same time.
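A sketch of pairing left and right frames by sequence number (the buffering dicts are placeholders, not part of the API):
left_frames, right_frames = {}, {}

for packet in data_packets:
    seq = packet.getMetadata().getSequenceNum()
    if packet.stream_name == 'left':
        left_frames[seq] = packet.getData()
    elif packet.stream_name == 'right':
        right_frames[seq] = packet.getData()
    # Frames with the same sequence number were captured at the same time
    if seq in left_frames and seq in right_frames:
        synced_pair = (left_frames.pop(seq), right_frames.pop(seq))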
-
getStride() -> int
Specifies the number of bytes until the next row of pixels in the packet's frame.
-
getTimestamp() -> float
When a packet is created, it is assigned a creation timestamp, which can be obtained using this method.
depthai.ObjectTracker
Object representing the current state of the tracker, obtained by calling the getObjectTracker method on a packet from the object_tracker stream.
Methods
-
getNrTracklets() -> int
Return the number of available tracklets
-
getTracklet(tracklet_nr: int) -> Tracklet
Returns the tracklet with the specified tracklet_nr. To check how many tracklets there are, use the getNrTracklets method.
depthai.Tracklet
A Tracklet represents a single tracked object and is produced by the ObjectTracker class.
To obtain it, call the getTracklet method.
Methods
-
getId() -> int
Return the tracklet id
-
getLabel() -> int
Return the tracklet label, i.e. the result returned by the neural network. Used to identify the class of the recognized object.
-
getStatus() -> str
Return the tracklet status: either NEW, TRACKED, or LOST.
-
getLeftCoord() -> int
Return the left coordinate of the bounding box of a tracked object
-
getRightCoord() -> int
Return the right coordinate of the bounding box of a tracked object
-
getTopCoord() -> int
Return the top coordinate of the bounding box of a tracked object
-
getBottomCoord() -> int
Return the bottom coordinate of the bounding box of a tracked object
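A sketch of iterating over tracked objects from an object_tracker packet:
for packet in data_packets:
    if packet.stream_name == 'object_tracker':
        tracker = packet.getObjectTracker()
        for i in range(tracker.getNrTracklets()):
            tracklet = tracker.getTracklet(i)
            bbox = (tracklet.getLeftCoord(), tracklet.getTopCoord(),
                    tracklet.getRightCoord(), tracklet.getBottomCoord())
            print(tracklet.getId(), tracklet.getLabel(), bbox)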
Preparing MyriadX blob file and its config
As you can see in this example, basic usage of the create_pipeline method consists of specifying the desired output streams and the ai section, where you specify the MyriadX blob and its config.
In this section, we’ll describe how to obtain both blob_file
and blob_file_config
.
Obtaining MyriadX blob
Since we're utilizing the MyriadX VPU, your model needs to be compiled (or, more accurately, optimized and converted) into a MyriadX blob file, which will be sent to the device and executed.
The easiest way to obtain this blob is to use our online BlobConverter app. It has all the tools needed for compilation, so you don't need to set up anything, and you can even download a blob for a model from the OpenVINO model zoo.
If you'd like, you can also compile the blob yourself. You'll need to install the OpenVINO toolkit, then use the Model Optimizer and the Myriad Compiler to obtain a MyriadX blob. We've documented example usage of these compilers here.
Creating Blob configuration file
If a config file is not provided, no decoding is done on the device (output_format defaults to raw) and the decoding must be done on the host side, by the user.
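As a host-side decoding sketch for a raw Mobilenet-SSD output, assuming the usual [1, 1, 100, 7] detection_out layout of [image_id, label, confidence, x_min, y_min, x_max, y_max]:
results = nnet_packet.get_tensor('detection_out')[0, 0]  # shape (100, 7)
for image_id, label, confidence, x_min, y_min, x_max, y_max in results:
    if image_id < 0:      # the first invalid entry marks the end of detections
        break
    if confidence > 0.5:  # keep only reasonably confident detections
        print(int(label), confidence, (x_min, y_min, x_max, y_max))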
Currently, Mobilenet-SSD and (tiny-)YOLO-v3 based networks can be decoded on the device. For that, a config file with network-specific parameters is required.
Example for a tiny-yolo-v3 network:
{
"NN_config":
{
"output_format" : "detection",
"NN_family" : "YOLO",
"NN_specific_metadata" :
{
"classes" : 80,
"coordinates" : 4,
"anchors" : [10,14, 23,27, 37,58, 81,82, 135,169, 344,319],
"anchor_masks" :
{
"side26" : [1,2,3],
"side13" : [3,4,5]
},
"iou_threshold" : 0.5,
"confidence_threshold" : 0.5
}
},
"mappings":
{
"labels":
[
"person",
"bicycle",
"car",
...
]
}
}
NN_config - configuration for the network
- output_format
  - "detection" - decoding done on the device, the received packet is in the Detections format
  - "raw" - decoding done on the host
- NN_family - "YOLO" or "mobilenet"
- NN_specific_metadata - only for "YOLO"
  - classes - number of classes
  - coordinates - number of coordinates
  - anchors - anchors for the YOLO network
  - anchor_masks - anchor mask for each output layer: 26x26, 13x13 (+ 52x52 for full YOLO-v3)
  - iou_threshold - intersection over union threshold for detected objects
  - confidence_threshold - score confidence threshold for detected objects
mappings
- labels - label mapping for detected object IDs
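By analogy with the YOLO example above, a minimal config for a Mobilenet-SSD based network might look like this (a sketch; the labels list must match your model):
{
    "NN_config":
    {
        "output_format" : "detection",
        "NN_family" : "mobilenet"
    },
    "mappings":
    {
        "labels":
        [
            "background",
            "person",
            "car"
        ]
    }
}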
Example decoding for tiny-yolo-v3
, yolo-v3
, mobilenet-ssd
when output_format is set to detection
:
nnet_packets, data_packets = p.get_available_nnet_and_data_packets(blocking=True)
...
in_layers = nnet_packet.getInputLayersInfo()  # get input layer information
# print(in_layers)  # print input layer info for debugging
input_width = in_layers[0].get_dimension(depthai.TensorInfo.Dimension.W)   # width of the input image
input_height = in_layers[0].get_dimension(depthai.TensorInfo.Dimension.H)  # height of the input image

detections = nnet_packet.getDetectedObjects()  # get detection container
objects = list()  # create an empty list of filtered objects
for detection in detections:
    detection_dict = detection.get_dict()
    # scale normalized coordinates to image coordinates
    detection_dict["x_min"] = int(detection_dict["x_min"] * input_width)
    detection_dict["y_min"] = int(detection_dict["y_min"] * input_height)
    detection_dict["x_max"] = int(detection_dict["x_max"] * input_width)
    detection_dict["y_max"] = int(detection_dict["y_max"] * input_height)
    objects.append(detection_dict)
return objects
Example of decoding for full yolo-v3
and tiny-yolo-v3
on host and device
Example of decoding for mobilenet
based networks on host and device
Other installation methods
To get the latest, not yet released features from our source code, you can compile the depthai package manually.
Dependencies to build from source
- CMake > 3.2.0
- Generation tool (Ninja, make, …)
- C/C++ compiler
- libusb1 development package
Ubuntu, Raspberry Pi OS, … (Debian based systems)
On Debian based systems (Raspberry Pi OS, Ubuntu, …) these can be acquired by running:
sudo apt-get -y install cmake libusb-1.0-0-dev build-essential
macOS (Mac OS X)
Assuming a stock macOS install, the depthai-python library needs the following dependencies:
- Homebrew (if it's not installed already)
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install.sh)"
- Python, libusb, CMake, wget
brew install coreutils python3 cmake libusb wget
And now you're ready to clone depthai-python from GitHub and build it for macOS.
Install using GitHub commit
Pip allows users to install packages from specific commits, even if they are not yet released on PyPI.
To do so, use the command below, replacing <commit_sha> with the correct commit hash from here:
python3 -m pip install git+https://github.com/luxonis/depthai-python.git@<commit_sha>
Using/Testing a Specific Branch/PR
From time to time, it may be of interest to use a specific branch. This may occur, for example, because we have listened to your feature request and put together a quick implementation in a branch, or because you want early access to a feature that is soaking in our develop branch for stability purposes before being merged into main.
So when working in the depthai repository, using a branch can be accomplished with the following commands. For this example, the branch
that we will try out is develop
(which is the branch we use to soak new features before merging them into main
):
Prior to running the following, you can either clone the repository independently (to avoid overwriting any of your local changes) or simply do a git pull first.
git checkout develop
python3 -m pip install -U pip
python3 -m pip install -r requirements.txt
Install from source
If desired, you can also install the package from the source code itself; this allows you to make changes to the API and see them live in action.
To do so, first clone the repository and then add the package to your Python interpreter in development mode:
git clone https://github.com/luxonis/depthai-python.git
cd depthai-python
git submodule update --init --recursive
python3 setup.py develop # you may need to add sudo if using system interpreter instead of virtual environment
If you want to use a branch other than the default (main), e.g. develop, you can do so by typing:
git checkout develop # replace the "develop" with a desired branch name
git submodule update --recursive
python3 setup.py develop
Or, if you want to check out a specific commit, type:
git checkout <commit_sha>
git submodule update --recursive
python3 setup.py develop