cerebras.modelzoo.data.multimodal.datasets.features.Bbox

class cerebras.modelzoo.data.multimodal.datasets.features.Bbox(XMin, YMin, XMax, YMax, ClassLabel, ClassIntID, ClassID=None, IsOccluded=None, IsTruncated=None, IsGroupOf=None, IsDepiction=None, IsInside=None, IsTrainable=None, Source=None, Confidence=None)

Bases: object

Source: indicates how the box was made:

xclick: manually drawn boxes made with the method presented in [1], where the annotators click on the four extreme points of the object. In V6 we release the actual 4 extreme points for all xclick boxes in train (13M), see below.

activemil: boxes produced using an enhanced version of the method [2]. These are human verified to be accurate at IoU>0.7.
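The IoU>0.7 criterion mentioned above refers to intersection over union between two boxes. A minimal sketch of that computation follows, assuming boxes are given as (XMin, YMin, XMax, YMax) tuples in normalized coordinates. The helper name iou is hypothetical and is not part of the Bbox class.

```python
def iou(box_a, box_b):
    """Intersection over union of two (xmin, ymin, xmax, ymax) boxes.

    Hypothetical helper for illustration; not part of the Bbox API.
    """
    ax0, ay0, ax1, ay1 = box_a
    bx0, by0, bx1, by1 = box_b

    # Width and height of the overlap rectangle, clamped at zero
    # when the boxes do not intersect.
    inter_w = max(0.0, min(ax1, bx1) - max(ax0, bx0))
    inter_h = max(0.0, min(ay1, by1) - max(ay0, by0))
    inter = inter_w * inter_h

    area_a = (ax1 - ax0) * (ay1 - ay0)
    area_b = (bx1 - bx0) * (by1 - by0)
    union = area_a + area_b - inter

    # Degenerate (zero-area) boxes yield an IoU of 0.
    return inter / union if union > 0 else 0.0
```

Under this definition, a box would meet the stated threshold when iou(candidate, reference) > 0.7.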

LabelName: the MID of the object class this box belongs to.

Confidence: a dummy value, always 1.

XMin, XMax, YMin, YMax: coordinates of the box, in normalized image coordinates. XMin is in [0,1], where 0 is the leftmost pixel, and 1 is the rightmost pixel in the image. Y coordinates go from the top pixel (0) to the bottom pixel (1).

IsOccluded: Indicates that the object is occluded by another object in the image.

IsTruncated: Indicates that the object extends beyond the boundary of the image.

IsGroupOf: Indicates that the box spans a group of objects (e.g., a bed of flowers or a crowd of people). We asked annotators to use this tag for cases with more than 5 instances which are heavily occluding each other and are physically touching.

IsDepiction: Indicates that the object is a depiction (e.g., a cartoon or drawing of the object, not a real physical instance).

IsInside: Indicates a picture taken from the inside of the object (e.g., a car interior or inside of a building).

For each of the Is* attributes above, value 1 indicates present, 0 not present, and -1 unknown.
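The coordinate and flag conventions described above can be sketched as follows. Both helpers, to_pixel_box and decode_flag, are hypothetical and not part of the Bbox class. The sketch assumes that 0 maps to pixel index 0 and 1 maps to the last pixel index (width - 1 or height - 1), following the wording above. Some pipelines scale by width and height instead. Treating None (the constructor default) as unknown is also an assumption.

```python
def to_pixel_box(xmin, ymin, xmax, ymax, width, height):
    """Map normalized [0, 1] box coordinates to integer pixel indices.

    Assumption: 0 -> first pixel (index 0), 1 -> last pixel
    (index width - 1 horizontally, height - 1 vertically).
    """
    return (
        round(xmin * (width - 1)),
        round(ymin * (height - 1)),
        round(xmax * (width - 1)),
        round(ymax * (height - 1)),
    )


def decode_flag(value):
    """Interpret an Is* attribute: 1 present, 0 not present, -1 unknown.

    Returns True, False, or None respectively. None as input is
    treated as unknown (an assumption, since the flags default to None).
    """
    if value is None or value == -1:
        return None
    if value == 1:
        return True
    if value == 0:
        return False
    raise ValueError(f"unexpected flag value: {value!r}")
```

For example, to_pixel_box(0.0, 0.0, 1.0, 1.0, 640, 480) covers the full image under the assumed convention, and decode_flag(-1) reports an unknown flag as None rather than False, which keeps "not present" and "not annotated" distinct.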

Methods

bbox_to_tensor

labelID_to_tensor

Attributes

ClassID

Confidence

IsDepiction

IsGroupOf

IsInside

IsOccluded

IsTrainable

IsTruncated

Source

XMin

YMin

XMax

YMax

ClassLabel

ClassIntID