DepthAI Python API
Instructions for installing, upgrading, and using the DepthAI Python API.
Supported Platforms
The DepthAI Python API module is prebuilt for Ubuntu, macOS, and Windows. For other operating systems and/or Python versions, DepthAI can be built from source.
Installing system dependencies
A couple of basic system dependencies are required to run the DepthAI library. Most of them are probably already installed on your system, but in case they are not, we have prepared an install script that makes sure all dependencies are installed:
curl -fL http://docs.luxonis.com/install_dependencies.sh | bash
If using Windows, please use this batch script to install the dependencies.
Enabling the USB device (only on Linux)
Since DepthAI is a USB device, on systems that use the udev tool you need to add a udev rule to make the device accessible.
The following commands will add a new udev rule to your system and reload the rules:
echo 'SUBSYSTEM=="usb", ATTRS{idVendor}=="03e7", MODE="0666"' | sudo tee /etc/udev/rules.d/80-movidius.rules
sudo udevadm control --reload-rules && sudo udevadm trigger
Install from PyPI
Our packages are distributed via PyPI. To install the latest release in your environment, use:
python3 -m pip install depthai
For other installation options, see Other installation methods below.
Test installation
We have a depthai repository on our GitHub that contains many helpful examples and prepared neural networks you can use to speed up your prototyping. It also includes a test script, maintained by our contributors, that helps you verify that your setup is correct.
First, clone the depthai repository and install its dependencies:
git clone https://github.com/luxonis/depthai.git
cd depthai
python3 -m pip install -r requirements.txt
Now, run the demo script from within depthai to make sure everything is working:
python3 depthai_demo.py
If all goes well, a small video window will open, with overlays drawn for any objects whose class exists in the example 20-class object detector (class list here).
API Reference
depthai.Device
Represents the DepthAI device with the methods to interact with it.
NOTE: Please be aware that all methods except get_available_streams require create_pipeline to be run first.
Example
import depthai
device = depthai.Device('', False)
pipeline = device.create_pipeline(config={
'streams': ['previewout', 'metaout'],
'ai': {
"blob_file": "/path/to/model.blob",
"blob_file_config": "/path/to/config.json",
},
})
Methods
-
__init__(device_id: str, usb2_mode: bool) -> Device
Standard and recommended way to set up the object.
device_id represents the USB port id that the device is connected to. If set to a specific value (e.g. "1"), it will look for the device on that specific USB port, whereas if left empty ('') it will look for the device on all ports. This is useful when more than one DepthAI device is connected and you want to specify which one to use in the code.
usb2_mode, being True or False, allows DepthAI to communicate using the USB2 protocol instead of USB3. This lowers the throughput of the pipeline, but allows >1 m USB cables to be used for the connection.
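A minimal sketch of both forms (the port id "1" below is just a placeholder):
import depthai

# Look for the device on any USB port, using USB3
device = depthai.Device('', False)

# Or: use only the device on USB port "1", forcing USB2 mode
# device = depthai.Device('1', True)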
-
__init__(cmd_file: str, device_id: str) -> Device
Development and debug way to initialize the DepthAI device.
cmd_file is a path to the firmware .cmd file that will be loaded onto the device at boot.
device_id represents the USB port id that the device is connected to. If set to a specific value (e.g. "1"), it will look for the device on that specific USB port, whereas if left empty ('') it will look for the device on all ports.
This is useful when more than one DepthAI device is connected and you want to specify which one to use in the code.
-
create_pipeline(config: dict) -> CNNPipeline
Initializes a DepthAI pipeline, returning the created CNNPipeline if successful and None otherwise.
config (dict) - a dict of pipeline configuration settings. Example key/values for the config:
{
    # Possible streams:
    # 'color' - 4K color camera preview
    # 'left' - left mono camera preview
    # 'right' - right mono camera preview
    # 'rectified_left' - rectified left camera preview
    # 'rectified_right' - rectified right camera preview
    # 'previewout' - neural network input preview
    # 'metaout' - CNN output tensors
    # 'depth' - the raw depth map, disparity converted to real life distance
    # 'disparity' - disparity map, the disparity between left and right cameras, in pixels
    # 'disparity_color' - disparity map colorized
    # 'meta_d2h' - device metadata stream
    # 'video' - H.264/H.265 encoded color camera frames
    # 'jpegout' - JPEG encoded color camera frames
    # 'object_tracker' - object tracker results
    'streams': [
        'left',  # if left is used, it must be in the first position
        'right',
        {'name': 'previewout', 'max_fps': 12.0},  # streams can be specified as objects with additional params
        'metaout',
        # depth-related streams
        {'name': 'depth', 'max_fps': 12.0},
        {'name': 'disparity', 'max_fps': 12.0},
        {'name': 'disparity_color', 'max_fps': 12.0},
    ],
    'depth': {
        'calibration_file': consts.resource_paths.calib_fpath,
        'padding_factor': 0.3,
        'depth_limit_m': 10.0,  # in meters, for filtering purposes during x,y,z calculation
        'confidence_threshold': 0.5,  # depth is calculated for bounding boxes with confidence higher than this number
    },
    'ai': {
        'blob_file': blob_file,  # MyriadX CNN blob file path
        'blob_file_config': blob_file_config,  # configuration file for CNN output tensor mapping on host side
        'calc_dist_to_bb': True,  # if True, will include depth information in the CNN output tensor
        'keep_aspect_ratio': not args['full_fov_nn'],
    },
    # object tracker
    'ot': {
        'max_tracklets': 20,  # maximum 20 is supported
        'confidence_threshold': 0.5,  # object is tracked only for detections over this threshold
    },
    'board_config': {
        'swap_left_and_right_cameras': args['swap_lr'],  # True for 1097 (RPi Compute) and 1098OBC (USB w/ onboard cameras)
        'left_fov_deg': args['field_of_view'],  # same on 1097 and 1098OBC
        'rgb_fov_deg': args['rgb_field_of_view'],
        'left_to_right_distance_cm': args['baseline'],  # distance between stereo cameras
        'left_to_rgb_distance_cm': args['rgb_baseline'],  # currently unused
        'store_to_eeprom': args['store_eeprom'],
        'clear_eeprom': args['clear_eeprom'],
        'override_eeprom': args['override_eeprom'],
    },
    # 'video_config':
    # {
    #     'rateCtrlMode': 'cbr',
    #     'profile': 'h265_main',  # options: 'h264_baseline' / 'h264_main' / 'h264_high' / 'h265_main'
    #     'bitrate': 8000000,  # when using CBR
    #     'maxBitrate': 8000000,  # when using CBR
    #     'keyframeFrequency': 30,
    #     'numBFrames': 0,
    #     'quality': 80,  # (0 - 100%) when using VBR
    # },
}
-
get_available_streams() -> List[str]
Return a list of all streams supported by the DepthAI library.
>>> device.get_available_streams()
['meta_d2h', 'color', 'left', 'right', 'rectified_left', 'rectified_right', 'disparity', 'depth', 'metaout', 'previewout', 'jpegout', 'video', 'object_tracker']
-
get_nn_to_depth_bbox_mapping() -> dict
Returns a dict that allows matching the CNN output with the disparity info.
Since the RGB camera has a 4K resolution and the neural networks accept only images with a specific resolution (like 300x300), the original image is cropped to meet the neural network requirements. On the other hand, the disparity frames are returned in the full resolution available on the mono cameras.
To determine where the CNN previewout image lies on the disparity frame, use this method: it specifies the offsets and dimensions to apply.
>>> device.get_nn_to_depth_bbox_mapping()
{'max_h': 681, 'max_w': 681, 'off_x': 299, 'off_y': 59}
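A sketch of how this mapping could be applied to place a normalized CNN bounding box onto the disparity frame (detection here is a placeholder for a depthai.Detection object):
nn2depth = device.get_nn_to_depth_bbox_mapping()

# Scale normalized (0..1) CNN coordinates into disparity-frame pixels,
# then shift by the crop offsets reported by the mapping
x_min = int(detection.x_min * nn2depth['max_w']) + nn2depth['off_x']
y_min = int(detection.y_min * nn2depth['max_h']) + nn2depth['off_y']
x_max = int(detection.x_max * nn2depth['max_w']) + nn2depth['off_x']
y_max = int(detection.y_max * nn2depth['max_h']) + nn2depth['off_y']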
-
request_af_mode(mode: AutofocusMode)
Set the 4K RGB camera autofocus mode to one of the available AutofocusMode values.
-
request_af_trigger()
Manually send a trigger action to the autofocus on the 4K RGB camera.
-
Capture a JPEG frame from the RGB camera and send it to the jpegout stream. The frame is in the full available resolution, not cropped to meet the CNN input dimensions.
-
send_disparity_confidence_threshold()
Sends the disparity confidence threshold for the StereoSGBM algorithm. If the confidence of a disparity value is below the threshold, the value is marked as invalid disparity and treated as background.
-
Return a 3x3 homography matrix used to rectify the right stereo camera image.
-
Return a 3x3 homography matrix used to rectify the left stereo camera image.
Note: Requires dual-homography calibration.
-
Return a 3x3 intrinsic calibration matrix of the left stereo camera.
Note: Requires dual-homography calibration.
-
Return a 3x3 intrinsic calibration matrix of the right stereo camera.
Note: Requires dual-homography calibration.
-
Return a 3x3 rotation matrix representing the rotation of the right stereo camera w.r.t. the left stereo camera.
Note: Requires dual-homography calibration.
-
Return a 3x1 vector representing the position of the right stereo camera center w.r.t. the left stereo camera center.
Note: Requires dual-homography calibration.
depthai.AutofocusMode
An enum with all available autofocus modes.
Members
- AF_MODE_AUTO
This mode sets the autofocus to manual mode, where you need to call request_af_trigger to start the focusing procedure.
-
AF_MODE_CONTINUOUS_PICTURE This mode adjusts the focus continually to provide the best in-focus image stream and should be used when the camera is standing still while capturing. The focusing procedure is done as fast as possible.
This is the default mode the DepthAI operates in.
- AF_MODE_CONTINUOUS_VIDEO This mode adjusts the focus continually to provide the best in-focus image stream and should be used when the camera is trying to capture a smooth video stream. The focusing procedure is slower and avoids focus overshoots.
- AF_MODE_EDOF This mode disables the autofocus. EDOF stands for Enhanced Depth of Field and is a digital focus.
- AF_MODE_MACRO It’s the same operating mode as AF_MODE_AUTO
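A short sketch of switching to manual autofocus and triggering a focus run, assuming a device created as shown earlier:
# Put the RGB camera into manual autofocus mode, then trigger a single focus run
device.request_af_mode(depthai.AutofocusMode.AF_MODE_AUTO)
device.request_af_trigger()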
depthai.CNNPipeline
Pipeline object through which the device sends its results to the host. Created using the depthai.Device.create_pipeline method.
Methods
-
get_available_data_packets() -> List[depthai.DataPacket]
Returns only data packets produced by the device itself, without CNN results
-
get_available_nnet_and_data_packets() -> tuple[List[NNetPacket], List[depthai.DataPacket]]
Returns both neural network result packets and data packets produced by the device.
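A minimal polling sketch, assuming device and pipeline were created as in the earlier example:
while True:
    # Fetch whatever the device has produced since the last call
    nnet_packets, data_packets = pipeline.get_available_nnet_and_data_packets()
    for nnet_packet in nnet_packets:
        detections = nnet_packet.getDetectedObjects()  # only valid with output_format 'detection'
    for packet in data_packets:
        if packet.stream_name == 'previewout':
            frame = packet.getData()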
depthai.NNetPacket
Neural network results packet. It is not a single result, but a batch of results with additional metadata attached.
For any neural network inference output, get_tensor can be used. For the specific cases of Mobilenet-SSD and YOLO-v3, decoding can be done in the firmware; the decoded objects can then be accessed through getDetectedObjects in addition to the raw output, making the results of these commonly used networks easily accessible.
See the blob config file section for more details about the different neural network output formats and how to choose between them.
Methods
-
getMetadata() -> depthai.FrameMetadata
Returns a metadata object containing all proprietary data related to this packet.
-
get_tensor(Union[int, str]) -> numpy.ndarray
Can be used ONLY when output_format in the blob config file is set to raw. It returns a shaped numpy array for the specified network output tensor, based on the neural network's output layer information.
For example, in the case of Mobilenet-SSD it returns a [1, 1, 100, 7] shaped array with numpy.dtype float16.
Example of usage: nnetpacket.get_tensor(0) or nnetpacket.get_tensor('detection_out')
-
__getitem__(Union[int, str]) -> numpy.ndarray
Same as get_tensor.
Example of usage for Mobilenet-SSD: nnetpacket[0] or nnetpacket['detection_out'], where 'detection_out' is the name of the output layer of Mobilenet-SSD.
-
getOutputsList() -> list
Returns all the output tensors in a list for the network.
-
getOutputsDict() -> dict
Returns all the output tensors in a dictionary for the network. The key is the name of the output layer, the value is the shaped numpy array.
-
getOutputLayersInfo() -> depthai.TensorInfo
Returns information about the output layers of the network.
-
getInputLayersInfo() -> depthai.TensorInfo
Returns information about the input layers of the network.
-
getDetectedObjects() -> depthai.Detections
ONLY for detection networks (Mobilenet-SSD and (tiny-)YOLO-v3 based networks). Should be used ONLY when output_format in the blob config file is set to detection. Returns the detected objects in the Detections format. The network output is decoded on the device side.
depthai.TensorInfo
Descriptor of the input/output layers/tensors of the network.
When the network is loaded, the tensor info is automatically printed. It can also be printed at runtime using print(nnetpacket.getInputLayersInfo()) or print(nnetpacket.getOutputLayersInfo()).
Attributes
-
name -> string
Name of the tensor.
-
dimensions -> list
Shape of the tensor array, e.g. [1, 1, 100, 7].
-
strides -> list
Strides of tensor array.
-
data_type -> string
Data type of the tensor, e.g. float16.
-
offset -> int
Offset in the raw output array.
-
element_size -> int
Size in bytes of one element in the array.
-
index -> int
Index of the tensor, e.g. in the case of multiple inputs/outputs in the network it marks the order of the input/output.
Methods
-
get_dict() -> dict
Returns TensorInfo in a dictionary where the key is the name of the attribute.
-
get_dimension(Dimension) -> int
Returns the specified dimension of the tensor. For example: tensor_info.get_dimension(depthai.TensorInfo.Dimension.WIDTH) returns the WIDTH of the tensor.
depthai.Detections
Container of neural network results decoded on device side.
Example of accessing detections
Assuming the detected objects are stored in a detections object:
-
Number of detections
detections.size()
or len(detections)
-
Accessing the
x-th detection: detections[x]
-
Iterating through all detections
for detection in detections:
    # handle detection
depthai.Detection
Detected object descriptor.
Attributes
-
label -> int
Label id of the detected object.
-
confidence -> float
Confidence score of the detected object in interval [0, 1].
-
x_min -> float
Top left
X
coordinate of the detected bounding box. Normalized, in interval [0, 1].
-
y_min -> float
Top left
Y
coordinate of the detected bounding box. Normalized, in interval [0, 1].
-
x_max -> float
Bottom right
X
coordinate of the detected bounding box. Normalized, in interval [0, 1].
-
y_max -> float
Bottom right
Y
coordinate of the detected bounding box. Normalized, in interval [0, 1].
-
depth_x -> float
Distance to the detected bounding box along the X axis. Only available when depth calculation is enabled (stereo cameras present on the board).
-
depth_y -> float
Distance to the detected bounding box along the Y axis. Only available when depth calculation is enabled (stereo cameras present on the board).
-
depth_z -> float
Distance to the detected bounding box along the Z axis. Only available when depth calculation is enabled (stereo cameras present on the board).
Methods
-
get_dict() -> dict
Returns the detected object as a dictionary where the key is the name of the attribute.
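A quick sketch of reading these fields (detections as returned by getDetectedObjects; frame_width and frame_height are placeholders for the preview frame size):
for detection in detections:
    # Scale normalized coordinates to pixel coordinates of the preview frame
    x1 = int(detection.x_min * frame_width)
    y1 = int(detection.y_min * frame_height)
    x2 = int(detection.x_max * frame_width)
    y2 = int(detection.y_max * frame_height)
    print(detection.label, detection.confidence, (x1, y1, x2, y2))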
depthai.TensorInfo.Dimension
Dimension descriptor of tensor shape.
Values
- Union[W, WIDTH] -> Width
- Union[H, HEIGHT] -> Height
- Union[C, CHANNEL] -> Number of channels
- Union[N, B, NUMBER, BATCH] -> Number/Batch of inferences
Note: Dimension is mostly meaningful for input tensors, since not all neural network models respect the semantics of Dimension for output tensors, e.g. WIDTH might not actually mean width.
depthai.DataPacket
DepthAI data packet, containing information generated on the device. Unlike NNetPacket, it contains a single "result" together with its source stream info.
Attributes
-
stream_name: str
The packet's source stream name. Used to determine the origin of the packet so it can be handled appropriately based on this value.
Methods
-
getData() -> numpy.ndarray
Returns the data as a NumPy array, which you can, for example, display using OpenCV imshow.
Used with streams that return frames, e.g. previewout, left, right, or encoded data, e.g. video, jpegout.
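A sketch of displaying a previewout frame with OpenCV, assuming (as in the depthai demo) that the frame arrives in planar channel-height-width layout:
import cv2

for packet in data_packets:
    if packet.stream_name == 'previewout':
        data = packet.getData()  # planar layout: (channel, height, width)
        frame = cv2.merge([data[0, :, :], data[1, :, :], data[2, :, :]])
        cv2.imshow('previewout', frame)
        cv2.waitKey(1)  # give OpenCV a chance to draw the window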
-
getDataAsStr() -> str
Returns the data as a string that can be parsed further.
Used with streams that return non-array results, e.g. meta_d2h, which returns a JSON object.
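For example, a meta_d2h packet could be parsed like this (a sketch; the available fields depend on the device firmware):
import json

if packet.stream_name == 'meta_d2h':
    meta = json.loads(packet.getDataAsStr())  # device metadata as a Python dict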
-
getMetadata() -> depthai.FrameMetadata
Returns a metadata object containing all proprietary data related to this packet.
-
getObjectTracker() -> ObjectTracker
Returns the result as an ObjectTracker instance. Used only with packets from the object_tracker stream.
-
size() -> int
Returns packet data size
depthai.FrameMetadata
Metadata object attached to the packets sent via pipeline.
Methods
-
getCameraName() -> str
Returns the name of the camera that produced the frame.
-
getCategory() -> int
Returns the type of the packet, i.e. whether it's a regular frame or one captured as a still.
-
getFrameBytesPP() -> int
Returns number of bytes per pixel in the packet’s frame
-
getFrameHeight() -> int
Returns the height of the packet’s frame
-
getFrameWidth() -> int
Returns the width of the packet’s frame
-
getFrameType() -> int
Returns the type of the data that this packet contains.
-
getInstanceNum() -> int
Returns the camera id that is the source of the current packet
-
getSequenceNum() -> int
A sequence number is assigned to each frame produced by the camera. It can be used to verify that frames were captured at the same time, e.g. if frames from the left and right cameras have the same sequence number, you can assume they were taken at the same time.
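A sketch of pairing left and right frames by sequence number (the buffering dicts are placeholders, not part of the API):
left_frames, right_frames = {}, {}

for packet in data_packets:
    seq = packet.getMetadata().getSequenceNum()
    if packet.stream_name == 'left':
        left_frames[seq] = packet.getData()
    elif packet.stream_name == 'right':
        right_frames[seq] = packet.getData()
    # Frames with the same sequence number were captured at the same time
    if seq in left_frames and seq in right_frames:
        synced_pair = (left_frames.pop(seq), right_frames.pop(seq))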
-
getStride() -> int
Specifies the number of bytes until the next row of pixels in the packet's frame.
-
getTimestamp() -> float
When a packet is created, it is assigned a creation timestamp, which can be obtained using this method.
depthai.ObjectTracker
Object representing the current state of the tracker, obtained by calling the getObjectTracker method on a packet from the object_tracker stream.
Methods
-
getNrTracklets() -> int
Return the number of available tracklets
-
getTracklet(tracklet_nr: int) -> Tracklet
Returns the tracklet with the specified tracklet_nr. To check how many tracklets there are, use the getNrTracklets method.
depthai.Tracklet
A Tracklet represents a single tracked object and is produced by the ObjectTracker class.
To obtain it, call the getTracklet method.
Methods
-
getId() -> int
Return the tracklet id
-
getLabel() -> int
Return the tracklet label, i.e. the result returned by the neural network. Used to identify the class of the recognized object.
-
getStatus() -> str
Return the tracklet status: either NEW, TRACKED, or LOST.
-
getLeftCoord() -> int
Return the left coordinate of the bounding box of a tracked object
-
getRightCoord() -> int
Return the right coordinate of the bounding box of a tracked object
-
getTopCoord() -> int
Return the top coordinate of the bounding box of a tracked object
-
getBottomCoord() -> int
Return the bottom coordinate of the bounding box of a tracked object
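A sketch of iterating over tracked objects from an object_tracker packet:
for packet in data_packets:
    if packet.stream_name == 'object_tracker':
        tracker = packet.getObjectTracker()
        for i in range(tracker.getNrTracklets()):
            tracklet = tracker.getTracklet(i)
            bbox = (tracklet.getLeftCoord(), tracklet.getTopCoord(),
                    tracklet.getRightCoord(), tracklet.getBottomCoord())
            print(tracklet.getId(), tracklet.getLabel(), bbox)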
Preparing MyriadX blob file and its config
As you can see in this example, basic usage of the create_pipeline method consists of specifying the desired output streams and the ai section, where you specify the MyriadX blob and its config.
In this section, we’ll describe how to obtain both blob_file
and blob_file_config
.
Obtaining MyriadX blob
Since we're utilizing the MyriadX VPU, your model needs to be compiled (or, more accurately, optimized and converted) into a MyriadX blob file, which will be sent to the device and executed.
The easiest way to obtain this blob is to use our online BlobConverter app. It has all the tools needed for compilation, so you don't need to set up anything, and you can even download a blob for a model from the OpenVINO model zoo.
If you'd like, you can also compile the blob yourself. You'll need to install the OpenVINO toolkit, then use the Model Optimizer and the Myriad Compiler to obtain a MyriadX blob. We've documented example usage of these compilers here.
Creating Blob configuration file
If a config file is not provided, no decoding is done on the device (output_format defaults to raw) and the decoding must be done on the host side, by the user.
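As a host-side decoding sketch for a raw Mobilenet-SSD output, assuming the usual [1, 1, 100, 7] detection_out layout of [image_id, label, confidence, x_min, y_min, x_max, y_max]:
results = nnet_packet.get_tensor('detection_out')[0, 0]  # shape (100, 7)
for image_id, label, confidence, x_min, y_min, x_max, y_max in results:
    if image_id < 0:      # the first invalid entry marks the end of detections
        break
    if confidence > 0.5:  # keep only reasonably confident detections
        print(int(label), confidence, (x_min, y_min, x_max, y_max))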
Currently, Mobilenet-SSD and (tiny-)YOLO-v3 based networks can be decoded on the device. For that, a config file with network-specific parameters is required.
Example for a tiny-yolo-v3 network:
{
"NN_config":
{
"output_format" : "detection",
"NN_family" : "YOLO",
"NN_specific_metadata" :
{
"classes" : 80,
"coordinates" : 4,
"anchors" : [10,14, 23,27, 37,58, 81,82, 135,169, 344,319],
"anchor_masks" :
{
"side26" : [1,2,3],
"side13" : [3,4,5]
},
"iou_threshold" : 0.5,
"confidence_threshold" : 0.5
}
},
"mappings":
{
"labels":
[
"person",
"bicycle",
"car",
...
]
}
}
NN_config - configuration for the network
- output_format
  - "detection" - decoding done on the device, the received packet is in the Detections format
  - "raw" - decoding done on the host
- NN_family - "YOLO" or "mobilenet"
- NN_specific_metadata - only for "YOLO"
  - classes - number of classes
  - coordinates - number of coordinates
  - anchors - anchors for the YOLO network
  - anchor_masks - anchor mask for each output layer: 26x26, 13x13 (+ 52x52 for full YOLO-v3)
  - iou_threshold - intersection over union threshold for detected objects
  - confidence_threshold - score confidence threshold for detected objects
mappings
- labels - label mapping for detected object IDs
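By analogy with the YOLO example above, a minimal config for a Mobilenet-SSD based network might look like this (a sketch; the labels list must match your model):
{
    "NN_config":
    {
        "output_format" : "detection",
        "NN_family" : "mobilenet"
    },
    "mappings":
    {
        "labels":
        [
            "background",
            "person",
            "car"
        ]
    }
}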
Example decoding for tiny-yolo-v3
, yolo-v3
, mobilenet-ssd
when output_format is set to detection
:
nnet_packets, data_packets = p.get_available_nnet_and_data_packets(blocking=True)
...
in_layers = nnet_packet.getInputLayersInfo()  # get input layer information
# print(in_layers)  # print input layer info for debugging
input_width = in_layers[0].get_dimension(depthai.TensorInfo.Dimension.W)   # width of the input image
input_height = in_layers[0].get_dimension(depthai.TensorInfo.Dimension.H)  # height of the input image

detections = nnet_packet.getDetectedObjects()  # get detection container
objects = list()  # create an empty list of filtered objects
for detection in detections:
    detection_dict = detection.get_dict()
    # scale normalized coordinates to image coordinates
    detection_dict["x_min"] = int(detection_dict["x_min"] * input_width)
    detection_dict["y_min"] = int(detection_dict["y_min"] * input_height)
    detection_dict["x_max"] = int(detection_dict["x_max"] * input_width)
    detection_dict["y_max"] = int(detection_dict["y_max"] * input_height)
    objects.append(detection_dict)
return objects
Example of decoding for full yolo-v3
and tiny-yolo-v3
on host and device
Example of decoding for mobilenet
based networks on host and device
Other installation methods
To get the latest, not yet released features from our source code, you can compile the depthai package manually.
Dependencies to build from source
- CMake > 3.2.0
- Generation tool (Ninja, make, …)
- C/C++ compiler
- libusb1 development package
Ubuntu, Raspberry Pi OS, … (Debian based systems)
On Debian based systems (Raspberry Pi OS, Ubuntu, …) these can be acquired by running:
sudo apt-get -y install cmake libusb-1.0-0-dev build-essential
macOS (Mac OS X)
Assuming a stock macOS install, the depthai-python library needs the following dependencies:
- Homebrew (if it's not installed already)
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install.sh)"
- Python, libusb, CMake, wget
brew install coreutils python3 cmake libusb wget
And now you're ready to clone depthai-python from GitHub and build it for macOS.
Install using GitHub commit
Pip allows users to install packages from specific commits, even if they are not yet released on PyPI.
To do so, use the command below, replacing <commit_sha> with the correct commit hash from here:
python3 -m pip install git+https://github.com/luxonis/depthai-python.git@<commit_sha>
Using/Testing a Specific Branch/PR
From time to time, it may be of interest to use a specific branch. This may occur, for example, because we have listened to your feature request and put together a quick implementation in a branch, or because you want early access to a feature that is soaking in our develop branch for stability purposes before being merged into main.
So when working in the depthai repository, using a branch can be accomplished with the following commands. For this example, the branch
that we will try out is develop
(which is the branch we use to soak new features before merging them into main
):
Prior to running the following, you can either clone the repository independently (to avoid overwriting any of your local changes) or simply do a git pull first.
git checkout develop
python3 -m pip install -U pip
python3 -m pip install -r requirements.txt
Install from source
If desired, you can also install the package from the source code itself; this allows you to make changes to the API and see them live in action.
To do so, first clone the repository and then add the package to your Python interpreter in development mode:
git clone https://github.com/luxonis/depthai-python.git
cd depthai-python
git submodule update --init --recursive
python3 setup.py develop # you may need to add sudo if using system interpreter instead of virtual environment
If you want to use a branch other than the default (main), e.g. develop, you can do so by typing:
git checkout develop # replace the "develop" with a desired branch name
git submodule update --recursive
python3 setup.py develop
Or, if you want to check out a specific commit, type:
git checkout <commit_sha>
git submodule update --recursive
python3 setup.py develop