Object Detection with Azure Computer Vision
I recently came across a service in Azure called Computer Vision, which is part of a collection of Azure AI services. I decided to explore these services, starting with object detection in images.
This article focuses on my understanding and experience with it.
Setting Up Azure Computer Vision
I began by creating an Azure Computer Vision resource. This is a prebuilt vision service rather than an LLM; it provides a KEY and an ENDPOINT URL, which are used to authenticate requests when detecting objects in images.
![Azure Computer Vision]()
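The key, endpoint, and the local paths used later in the script are read from a `.env` file. A minimal example with placeholder values (the variable names match the ones the code below reads):

```
AZURE_VISION_KEY=<your-key>
AZURE_VISION_ENDPOINT=https://<your-resource-name>.cognitiveservices.azure.com/
IMAGE_FOLDER=images
OUTPUT_FILE=output.txt
```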
Setting Up the Python Project
Once I had the credentials, I set up a Python project and populated it with a collection of images downloaded from the internet.
To interact with Azure Computer Vision, I used the azure-cognitiveservices-vision-computervision Python package provided by Microsoft.
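It can be installed with pip, along with python-dotenv and Pillow, which the script below also uses:

```
pip install azure-cognitiveservices-vision-computervision python-dotenv pillow
```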
Code Flow
The overall logic of the implementation is as follows:
- Read images from a folder: The code scans a directory and its nested subdirectories using recursion.
- Resize images if necessary: To avoid errors due to large file sizes, images are resized to fit within a specified limit.
- Send images to Azure for analysis: The image is read as bytes and sent to Azure for object detection.
- Store detection results: The detected objects and persons in the images are saved to an `output.txt` file for easier review.
Code Implementation
import time
from azure.cognitiveservices.vision.computervision import ComputerVisionClient
from msrest.authentication import CognitiveServicesCredentials
from azure.cognitiveservices.vision.computervision.models import VisualFeatureTypes
from PIL import Image
import os
import json
from dotenv import load_dotenv
# read values from .env file
load_dotenv()
AZURE_VISION_KEY = os.getenv("AZURE_VISION_KEY")
AZURE_VISION_ENDPOINT = os.getenv("AZURE_VISION_ENDPOINT")
IMAGE_FOLDER = os.getenv("IMAGE_FOLDER")
OUTPUT_FILE = os.getenv("OUTPUT_FILE")
# labels of interest; matches are printed to the console
objects_list = ["keyboard", "computer", "laptop", "key", "cow", "mammal", "chair", "nature", "mountain", "ice", "hills"]
client = ComputerVisionClient(AZURE_VISION_ENDPOINT, CognitiveServicesCredentials(AZURE_VISION_KEY))
# request object detection and color analysis
visual_features = [VisualFeatureTypes.objects, VisualFeatureTypes.color]
def resize_image(image_path, max_size=4 * 1024 * 1024, target_size=(2048, 2048)):
    """
    Resize the image in place if its file size exceeds max_size.
    """
    with open(image_path, "rb") as image_file:
        image_data = image_file.read()
    if len(image_data) > max_size:
        image = Image.open(image_path)
        # thumbnail() resizes in place and preserves the aspect ratio
        image.thumbnail(target_size, Image.LANCZOS)
        image.save(image_path, format=image.format)
        print(f"Resized image {image_path} to fit within the size limit.")
    return image_path
def detect_object(image_path, object_list):
    """
    Detect objects in the image and print the ones that match object_list.
    """
    print(f"Detecting objects in {image_path}")
    image_path = resize_image(image_path)
    with open(image_path, "rb") as image_stream:
        result = client.analyze_image_in_stream(image_stream, visual_features)
    response = json.dumps(result.as_dict(), indent=4)
    write_to_file("=====================================\n")
    write_to_file(f"Image: {image_path}\n")
    write_to_file(f"ImageData: {response}\n")
    for obj in result.objects:
        # compare each word of the detected label (e.g. "office chair") against object_list
        objects_split = obj.object_property.split(" ")
        if any(x in objects_split for x in object_list):
            print(f"Found {obj.object_property} in the image with confidence {obj.confidence}")
def write_to_file(data):
    """
    Write data to the output file.
    """
    with open(OUTPUT_FILE, "a") as file:
        file.write(data)
def loop_through_images(file_dir_path):
    """
    Loop through the images in the directory and call detect_object on each.
    """
    print('Inside directory:', file_dir_path)
    for image in os.listdir(file_dir_path):
        # pause between requests to stay under the service's rate limit
        time.sleep(6)
        # skip macOS folder metadata
        if image == ".DS_Store":
            continue
        image_path = os.path.join(file_dir_path, image)
        if os.path.isdir(image_path):
            print(image_path)
            # recurse into subdirectories
            loop_through_images(image_path)
        else:
            detect_object(image_path, object_list=objects_list)
def empty_output_file():
    """
    Empty the output file.
    """
    with open(OUTPUT_FILE, "w") as file:
        file.write("")

empty_output_file()
loop_through_images(file_dir_path=IMAGE_FOLDER)
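As a side note, the rectangle values (`x`, `y`, `w`, `h`) returned for each detected object map directly onto image coordinates, so the detections can be visualized. Below is a small optional sketch, not part of the script above, that uses Pillow (already a dependency) to save an annotated copy of an image; `draw_detections` is a hypothetical helper name, and `result` is assumed to be the object returned by `client.analyze_image_in_stream`:

```python
from PIL import Image, ImageDraw

def draw_detections(image_path, result, out_path="annotated.jpeg"):
    """Draw a red box and label for each detected object on a copy of the image."""
    image = Image.open(image_path).convert("RGB")
    draw = ImageDraw.Draw(image)
    for obj in result.objects:
        r = obj.rectangle  # x/y is the top-left corner; w/h are width and height
        draw.rectangle([r.x, r.y, r.x + r.w, r.y + r.h], outline="red", width=3)
        label = f"{obj.object_property} {obj.confidence:.2f}"
        draw.text((r.x, max(r.y - 12, 0)), label, fill="red")
    image.save(out_path)
```

Calling this from detect_object right after the analysis call would produce one annotated copy per processed image.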
Sample Output
The `output.txt` file contains the object detection results in JSON format. Below is an example of the output for different images:
=====================================
Image: images/2ppl2.jpeg
ImageData: {
"color": {
"dominant_color_foreground": "White",
"dominant_color_background": "White",
"dominant_colors": [
"White"
],
"accent_color": "B66315",
"is_bw_img": false
},
"objects": [
{
"rectangle": {
"x": 66,
"y": 123,
"w": 192,
"h": 321
},
"object_property": "person",
"confidence": 0.679
}
],
"request_id": "68007b6d-709b-4291-997f-0dc138985c18",
"metadata": {
"width": 280,
"height": 453,
"format": "Jpeg"
},
"model_version": "2021-05-01"
}
=====================================
Image: images/6ppl.jpeg
ImageData: {
"color": {
"dominant_color_foreground": "White",
"dominant_color_background": "White",
"dominant_colors": [
"White",
"Grey",
"Black"
],
"accent_color": "8A4E41",
"is_bw_img": false
},
"objects": [
{
"rectangle": {
"x": 47,
"y": 162,
"w": 79,
"h": 101
},
"object_property": "car",
"confidence": 0.541,
"parent": {
"object_property": "Land vehicle",
"confidence": 0.55,
"parent": {
"object_property": "Vehicle",
"confidence": 0.551
}
}
},
{
"rectangle": {
"x": 473,
"y": 139,
"w": 117,
"h": 138
},
"object_property": "car",
"confidence": 0.777,
"parent": {
"object_property": "Land vehicle",
"confidence": 0.782,
"parent": {
"object_property": "Vehicle",
"confidence": 0.782
}
}
},
{
"rectangle": {
"x": 1,
"y": 168,
"w": 99,
"h": 138
},
"object_property": "car",
"confidence": 0.693,
"parent": {
"object_property": "Land vehicle",
"confidence": 0.703,
"parent": {
"object_property": "Vehicle",
"confidence": 0.703
}
}
},
{
"rectangle": {
"x": 129,
"y": 123,
"w": 60,
"h": 204
},
"object_property": "person",
"confidence": 0.759
},
{
"rectangle": {
"x": 189,
"y": 133,
"w": 54,
"h": 189
},
"object_property": "person",
"confidence": 0.669
},
{
"rectangle": {
"x": 229,
"y": 136,
"w": 61,
"h": 191
},
"object_property": "person",
"confidence": 0.829
},
{
"rectangle": {
"x": 280,
"y": 119,
"w": 70,
"h": 212
},
"object_property": "person",
"confidence": 0.889
},
{
"rectangle": {
"x": 351,
"y": 117,
"w": 69,
"h": 224
},
"object_property": "person",
"confidence": 0.88
},
{
"rectangle": {
"x": 407,
"y": 116,
"w": 68,
"h": 219
},
"object_property": "person",
"confidence": 0.878
},
{
"rectangle": {
"x": 536,
"y": 161,
"w": 64,
"h": 162
},
"object_property": "car",
"confidence": 0.767,
"parent": {
"object_property": "Land vehicle",
"confidence": 0.778,
"parent": {
"object_property": "Vehicle",
"confidence": 0.778
}
}
}
],
"request_id": "3de58328-df9e-4a21-9272-02fcdbfb83d6",
"metadata": {
"width": 600,
"height": 400,
"format": "Jpeg"
},
"model_version": "2021-05-01"
}
=====================================
Image: images/bw.jpeg
ImageData: {
"color": {
"dominant_color_foreground": "Grey",
"dominant_color_background": "Grey",
"dominant_colors": [
"Grey",
"Black",
"White"
],
"accent_color": "535654",
"is_bw_img": true
},
"objects": [
{
"rectangle": {
"x": 370,
"y": 0,
"w": 150,
"h": 39
},
"object_property": "window",
"confidence": 0.511
},
{
"rectangle": {
"x": 151,
"y": 258,
"w": 341,
"h": 486
},
"object_property": "arch",
"confidence": 0.541
}
],
"request_id": "0ff3ccef-72e1-4517-82eb-5d85abfa6664",
"metadata": {
"width": 600,
"height": 800,
"format": "Jpeg"
},
"model_version": "2021-05-01"
}
=====================================
Image: images/manyppl.jpeg
ImageData: {
"color": {
"dominant_color_foreground": "Black",
"dominant_color_background": "Black",
"dominant_colors": [
"Black",
"Grey"
],
"accent_color": "102728",
"is_bw_img": false
},
"objects": [
{
"rectangle": {
"x": 69,
"y": 94,
"w": 90,
"h": 213
},
"object_property": "person",
"confidence": 0.757
},
{
"rectangle": {
"x": 141,
"y": 130,
"w": 44,
"h": 159
},
"object_property": "person",
"confidence": 0.659
},
{
"rectangle": {
"x": 289,
"y": 117,
"w": 43,
"h": 171
},
"object_property": "person",
"confidence": 0.543
},
{
"rectangle": {
"x": 305,
"y": 118,
"w": 65,
"h": 182
},
"object_property": "person",
"confidence": 0.829
},
{
"rectangle": {
"x": 366,
"y": 129,
"w": 54,
"h": 167
},
"object_property": "person",
"confidence": 0.847
},
{
"rectangle": {
"x": 430,
"y": 139,
"w": 47,
"h": 142
},
"object_property": "person",
"confidence": 0.772
},
{
"rectangle": {
"x": 560,
"y": 122,
"w": 40,
"h": 180
},
"object_property": "person",
"confidence": 0.843
},
{
"rectangle": {
"x": 227,
"y": 127,
"w": 59,
"h": 196
},
"object_property": "person",
"confidence": 0.785
},
{
"rectangle": {
"x": 479,
"y": 134,
"w": 73,
"h": 181
},
"object_property": "person",
"confidence": 0.826
}
],
"request_id": "918b1ee2-b5b7-43de-8a3d-ab273b1f44fc",
"metadata": {
"width": 600,
"height": 338,
"format": "Jpeg"
},
"model_version": "2021-05-01"
}
=====================================
Image: images/3ppl.jpeg
ImageData: {
"color": {
"dominant_color_foreground": "Green",
"dominant_color_background": "Brown",
"dominant_colors": [
"Brown"
],
"accent_color": "7DB01B",
"is_bw_img": false
},
"objects": [
{
"rectangle": {
"x": 147,
"y": 139,
"w": 158,
"h": 439
},
"object_property": "person",
"confidence": 0.823
},
{
"rectangle": {
"x": 386,
"y": 119,
"w": 193,
"h": 476
},
"object_property": "person",
"confidence": 0.876
},
{
"rectangle": {
"x": 276,
"y": 337,
"w": 130,
"h": 256
},
"object_property": "person",
"confidence": 0.799
}
],
"request_id": "b3766a91-60b5-426b-9356-e859bf59e053",
"metadata": {
"width": 600,
"height": 900,
"format": "Jpeg"
},
"model_version": "2021-05-01"
}
=====================================
Image: images/1ppl.jpeg
ImageData: {
"color": {
"dominant_color_foreground": "Black",
"dominant_color_background": "Black",
"dominant_colors": [
"Black"
],
"accent_color": "3A6E91",
"is_bw_img": false
},
"objects": [
{
"rectangle": {
"x": 145,
"y": 11,
"w": 356,
"h": 381
},
"object_property": "person",
"confidence": 0.901
}
],
"request_id": "79833d20-c39e-4fcd-b70f-7bc89bade3be",
"metadata": {
"width": 600,
"height": 400,
"format": "Jpeg"
},
"model_version": "2021-05-01"
}
=====================================
Image: images/2ppl1.jpeg
ImageData: {
"color": {
"dominant_color_foreground": "White",
"dominant_color_background": "White",
"dominant_colors": [
"White",
"Blue"
],
"accent_color": "0C78BF",
"is_bw_img": false
},
"objects": [
{
"rectangle": {
"x": 174,
"y": 26,
"w": 148,
"h": 273
},
"object_property": "person",
"confidence": 0.748
},
{
"rectangle": {
"x": 273,
"y": 69,
"w": 223,
"h": 329
},
"object_property": "person",
"confidence": 0.844
},
{
"rectangle": {
"x": 3,
"y": 148,
"w": 496,
"h": 259
},
"object_property": "car",
"confidence": 0.539,
"parent": {
"object_property": "Land vehicle",
"confidence": 0.827,
"parent": {
"object_property": "Vehicle",
"confidence": 0.852
}
}
}
],
"request_id": "26d9b732-ffdf-47c3-9a62-7ed327140dc6",
"metadata": {
"width": 600,
"height": 450,
"format": "Jpeg"
},
"model_version": "2021-05-01"
}
Now that we have the results, I want to discuss a couple of behaviors I noticed during object detection.
![Demo]()
In the image above, there is no space between the two people, so the model detects them as a single person, even though the image quality is good.
![Demo]()
Because the image above is black and white, the response JSON includes the property `"is_bw_img": true`, indicating that the image is indeed black and white.
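Since `VisualFeatureTypes.color` was requested, this flag is also available directly on the SDK result object, so black-and-white images could be filtered in code. A minimal sketch, assuming `result` is the object returned by `client.analyze_image_in_stream` above:

```python
# skip color-dependent logic for black-and-white images
if result.color.is_bw_img:
    print("Black-and-white image detected")
```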
While the model is quite capable, one requirement is that the image quality be decent enough for the model to detect objects reliably.
I have pushed the code to the GitHub repo: https://github.com/vipulm124/azure-vision-object-detection
Please feel free to clone it and play around.