Pytorch based library to rank predicted bounding boxes using text/image user's prompts.

Overview

pytorch_clip_bbox: an implementation of CLIP-guided bounding box ranking for object detection.


Object detection models are usually trained to detect common classes of objects such as "car", "person", "cup" or "bottle". Sometimes, however, we need to detect more specific classes such as "lady in the red dress", "bottle of whiskey" or "where is my red cup" instead of "person", "bottle" and "cup", respectively. One way to solve this problem is to train detectors for these more complex classes. Instead, we propose text-driven object detection, which can detect any class that can be described in natural language. This library ranks the bounding boxes predicted by a detector using text or image descriptions of such classes.
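Conceptually, ranking a detector's boxes against a prompt comes down to scoring each cropped box against the prompt in a shared image/text embedding space such as CLIP's. The sketch below illustrates that idea with cosine similarity on toy vectors. `rank_boxes` is a hypothetical helper, not part of the pytorch_clip_bbox API, and the library's actual scoring may differ.

```python
import numpy as np


def rank_boxes(box_embeddings, prompt_embedding, top_k=1):
    """Return (box_index, score) pairs for the top_k boxes most similar to a prompt.

    Hypothetical helper: box_embeddings is an (N, D) array of image features for
    N cropped boxes, prompt_embedding is a (D,) text or image feature.
    """
    boxes = np.asarray(box_embeddings, dtype=np.float64)
    prompt = np.asarray(prompt_embedding, dtype=np.float64)
    # L2-normalise both sides so the dot product equals cosine similarity
    boxes = boxes / np.linalg.norm(boxes, axis=1, keepdims=True)
    prompt = prompt / np.linalg.norm(prompt)
    scores = boxes @ prompt
    # Sort by descending similarity and keep the best top_k boxes
    order = np.argsort(-scores)[:top_k]
    return [(int(i), float(scores[i])) for i in order]


# Toy 3-D "embeddings" standing in for three box crops and one prompt
crops = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.7, 0.7, 0.0]]
prompt = [1.0, 0.1, 0.0]
print(rank_boxes(crops, prompt, top_k=2))
```

Normalising before the dot product keeps large-magnitude features from dominating, so only the direction of each embedding affects the ranking.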

Install package

pip install pytorch_clip_bbox

Install the latest version

pip install --upgrade git+https://github.com/bes-dev/pytorch_clip_bbox.git

Features

  • The library supports multiple prompts (images or texts) as targets for filtering.
  • The library automatically detects the language of the input text and translates it via Google Translate.
  • The library supports the original CLIP model by OpenAI and the ruCLIP model by SberAI.
  • Simple integration with different object detection models.
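Multiple text prompts are often passed as a single comma-separated string, as the `--text-prompt` argument is in the MaskRCNN example. The sketch below is a hypothetical helper (not part of the library) that splits such a string and also strips whitespace and drops empty entries, which the example's plain `split(",")` does not do. Each resulting prompt would then be registered with `clip_bbox.add_prompt(text=...)`, as in the example.

```python
def parse_prompts(raw):
    """Split a comma-separated prompt string into clean, non-empty prompts.

    Hypothetical helper: trims surrounding whitespace so "a, b" yields "b"
    rather than " b", and ignores trailing commas.
    """
    if raw is None:
        return []
    return [p.strip() for p in raw.split(",") if p.strip()]


print(parse_prompts("lady in the red dress, bottle of whiskey,"))
# → ['lady in the red dress', 'bottle of whiskey']
```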

Usage

We provide examples of integrating our library with popular object detectors such as YOLOv5 and MaskRCNN. Please see the examples for more.
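Whatever the detector, the MaskRCNN example below hands the ranker a plain list of integer [x1, y1, x2, y2] boxes. A minimal sketch of that conversion step, assuming the detector output has already been moved to NumPy as an (N, 4) xyxy box array and an (N,) score array; `boxes_above_threshold` is a hypothetical helper, not a library function.

```python
import numpy as np


def boxes_above_threshold(boxes, scores, confidence=0.7):
    """Keep detections scoring above `confidence` as integer [x1, y1, x2, y2] lists.

    Hypothetical helper: returns the kept indices too, so per-box data such as
    masks can be selected consistently with the boxes.
    """
    boxes = np.asarray(boxes)
    scores = np.asarray(scores)
    # A boolean test works even if the detector does not sort by score
    keep = np.flatnonzero(scores > confidence)
    return keep, [[int(v) for v in boxes[i]] for i in keep]


boxes = np.array([[10.4, 20.9, 110.0, 220.5], [5.0, 5.0, 50.0, 60.0], [0.0, 0.0, 8.0, 8.0]])
scores = np.array([0.95, 0.40, 0.81])
keep, int_boxes = boxes_above_threshold(boxes, scores)
print(keep.tolist(), int_boxes)
# → [0, 2] [[10, 20, 110, 220], [0, 0, 8, 8]]
```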

Simple example of integrating pytorch_clip_bbox with a MaskRCNN model

$ pip install wheel cython opencv-python numpy torch torchvision pytorch_clip_bbox
import argparse
import random
import cv2
import numpy as np
import torch
import torchvision.transforms as T
import torchvision
from pytorch_clip_bbox import ClipBBOX

def get_coloured_mask(mask):
    colours = [[0, 255, 0],[0, 0, 255],[255, 0, 0],[0, 255, 255],[255, 255, 0],[255, 0, 255],[80, 70, 180],[250, 80, 190],[245, 145, 50],[70, 150, 250],[50, 190, 190]]
    r = np.zeros_like(mask).astype(np.uint8)
    g = np.zeros_like(mask).astype(np.uint8)
    b = np.zeros_like(mask).astype(np.uint8)
    c = random.choice(colours)  # pick a random colour from the whole palette
    r[mask == 1], g[mask == 1], b[mask == 1] = c
    coloured_mask = np.stack([r, g, b], axis=2)
    return coloured_mask, c

def main(args):
    # build detector
    detector = torchvision.models.detection.maskrcnn_resnet50_fpn(pretrained=True).eval().to(args.device)
    clip_bbox = ClipBBOX(clip_type=args.clip_type).to(args.device)
    # add prompts
    if args.text_prompt is not None:
        for prompt in args.text_prompt.split(","):
            clip_bbox.add_prompt(text=prompt)
    if args.image_prompt is not None:
        image = cv2.cvtColor(cv2.imread(args.image_prompt), cv2.COLOR_BGR2RGB)
        image = torch.from_numpy(image).permute(2, 0, 1).unsqueeze(0)
        image = image / 255.0  # scale pixel values to [0, 1]
        clip_bbox.add_prompt(image=image)
    image = cv2.imread(args.image)
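    # run Mask R-CNN on the RGB tensor; no gradients are needed for inference
    with torch.no_grad():
        pred = detector([
            T.ToTensor()(cv2.cvtColor(image, cv2.COLOR_BGR2RGB)).to(args.device)
        ])
    # torchvision returns detections sorted by score, so keep the leading ones above the threshold
    num_keep = int((pred[0]['scores'] > args.confidence).sum())
    if num_keep == 0:
        print("No detections above the confidence threshold.")
        return
    boxes = [[int(b) for b in box] for box in pred[0]['boxes'][:num_keep].cpu().numpy()]
    # masks have shape (N, 1, H, W); squeeze only the channel axis so a single mask keeps its N axis
    masks = (pred[0]['masks'][:num_keep] > 0.5).squeeze(1).cpu().numpy()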
    ranking = clip_bbox(image, boxes, top_k=args.top_k)
    for key in ranking.keys():
        if key == "loss":
            continue
        for box in ranking[key]["ranking"]:
            mask, color = get_coloured_mask(masks[box["idx"]])
            image = cv2.addWeighted(image, 1, mask, 0.5, 0)
            x1, y1, x2, y2 = box["rect"]
            cv2.rectangle(image, (x1, y1), (x2, y2), color, 6)
            cv2.rectangle(image, (x1, y1), (x2, y1-100), color, -1)
            cv2.putText(image, ranking[key]["src"], (x1 + 5, y1 - 10), cv2.FONT_HERSHEY_SIMPLEX, 4, (0, 0, 0), thickness=5)
    if args.output_image is None:
        cv2.imshow("image", image)
        cv2.waitKey()
    else:
        cv2.imwrite(args.output_image, image)


if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("-i", "--image", type=str, required=True, help="Input image.")
    parser.add_argument("--device", type=str, default="cuda:0", help="inference device.")
    parser.add_argument("--confidence", type=float, default=0.7, help="confidence threshold [MaskRCNN].")
    parser.add_argument("--text-prompt", type=str, default=None, help="Text prompt.")
    parser.add_argument("--image-prompt", type=str, default=None, help="Image prompt.")
    parser.add_argument("--clip-type", type=str, default="clip_vit_b32", help="Type of CLIP model [ruclip, clip_vit_b32, clip_vit_b16].")
    parser.add_argument("--top-k", type=int, default=1, help="top_k predictions will be returned.")
    parser.add_argument("--output-image", type=str, default=None, help="Output image name.")
    args = parser.parse_args()
    main(args)
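The drawing loop above walks the dictionary returned by `clip_bbox(image, boxes, top_k=...)`. A minimal sketch of the same traversal that flattens it into (prompt, box index, rect) rows, for example for logging. The layout is inferred from the MaskRCNN example alone (a "loss" entry to skip, and per-prompt entries with "src" and a "ranking" list of {"idx", "rect"} dicts); it is an assumption, not a documented schema, and `flatten_ranking` and the `"prompt_0"` key are hypothetical.

```python
def flatten_ranking(ranking):
    """Flatten a ranking dict into (prompt, box_index, rect) tuples.

    Assumes the layout used by the MaskRCNN example: skip the "loss" key,
    read "src" and the "ranking" list from every other entry.
    """
    rows = []
    for key, entry in ranking.items():
        if key == "loss":
            continue
        for box in entry["ranking"]:
            rows.append((entry["src"], box["idx"], tuple(box["rect"])))
    return rows


# Hand-built stand-in for a clip_bbox(...) result; the key name is hypothetical
fake_ranking = {
    "loss": 0.0,
    "prompt_0": {"src": "bottle of whiskey", "ranking": [{"idx": 2, "rect": [40, 10, 90, 200]}]},
}
print(flatten_ranking(fake_ranking))
# → [('bottle of whiskey', 2, (40, 10, 90, 200))]
```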
Owner
Sergei Belousov