[openCV] YOLO object detection (video)

Computer Vision 2022. 7. 30. 15:13

openCV로 YOLO를 실행하기 위해서 먼저 훈련된 모델과 설정 파일을 다운로드 해줘야한다.

여기서 coco.names, yolov3.cfg, yolov3.weights 세 가지 파일을 다운로드 받아서 py 파일과 같은 경로에 넣어준다.

YOLO: Real-Time Object Detection

YOLO: Real-Time Object Detection You only look once (YOLO) is a state-of-the-art, real-time object detection system. On a Pascal Titan X it processes images at 30 FPS and has a mAP of 57.9% on COCO test-dev. Comparison to Other Detectors YOLOv3 is extremel

pjreddie.com

yolo를 로드하는 코드는 밑의 블로그를 참고해서 작성했다.

https://bong-sik.tistory.com/16

Python으로 OpenCV를 사용하여 YOLO Object detection

이번엔 뜬금없이 영상처리다... 살면서 한번도 안해봤고 해볼거라고 생각도 못했음. 하지만 시키니까 합니다... https://pysource.com/2019/06/27/yolo-object-detection-using-opencv-with-python/ YOLO object d..

bong-sik.tistory.com

전체 코드는 아래와 같다.

import cv2 as cv
import numpy as np

# YOLO load
net = cv.dnn.readNet("yolov3.weights", "yolov3.cfg")
classes = []
with open("coco.names", "r") as f:
    classes = [line.strip() for line in f.readlines()]
layer_names = net.getLayerNames()
output_layers = [layer_names[i[0] - 1] for i in net.getUnconnectedOutLayers()]
colors = np.random.uniform(0, 255, size=(len(classes), 3))

img = cv.imread("sample.jpg")
img = cv.resize(img, None, fx=0.4, fy=0.4)
height, width, channels = img.shape

# Detecting objects
blob = cv.dnn.blobFromImage(img, 0.00392, (416, 416), (0, 0, 0), True, crop=False)
net.setInput(blob)
outs = net.forward(output_layers)

class_ids = []
confidences = []
boxes = []
for out in outs:
    for detection in out:
        scores = detection[5:]
        class_id = np.argmax(scores)
        confidence = scores[class_id]
        
        # confidence가 0.5 이상인 경우에만 출력
        if confidence > 0.5:
            # Object detected
            center_x = int(detection[0] * width)
            center_y = int(detection[1] * height)
            w = int(detection[2] * width)
            h = int(detection[3] * height)
            x = int(center_x - w / 2)
            y = int(center_y - h / 2)
            boxes.append([x, y, w, h])
            confidences.append(float(confidence))
            class_ids.append(class_id)

indexes = cv.dnn.NMSBoxes(boxes, confidences, 0.5, 0.4)
for i in range(len(boxes)):
    if i in indexes:
        x, y, w, h = boxes[i]
        label = str(classes[class_ids[i]])
        color = colors[i]
        cv.rectangle(img, (x, y), (x + w, y + h), color, 2)
        cv.putText(img, label, (x, y + 30), cv.FONT_HERSHEY_PLAIN, 3, color, 3)
        
cv.imshow("Detection", img)
cv.waitKey(0)
cv.destroyAllWindows()

아래 줄에서 오류가 나서

output_layers = [layer_names[i[0] - 1] for i in net.getUnconnectedOutLayers()]

이렇게 바꿔주었더니 문제없이 실행이 되었다.

output_layers = [layer_names[i - 1] for i in net.getUnconnectedOutLayers()]

사진에 테스트 해보고 나니까 비디오로 해보고 싶다는 생각이 들었다.

동영상에서 프레임을 추출해서 7 프레임마다 object detection을 수행하고 이미지로 보여주도록 코드를 수정했다.

직접 해보니까 정말 재밌다!!

전체코드

import cv2 as cv
import numpy as np

# YOLO 로드
net = cv.dnn.readNet("yolov3.weights", "yolov3.cfg")
classes = []
with open("coco.names", "r") as f:
    classes = [line.strip() for line in f.readlines()]
layer_names = net.getLayerNames()
output_layers = [layer_names[i - 1] for i in net.getUnconnectedOutLayers()]
colors = np.random.uniform(0, 255, size=(len(classes), 3))

# 이미지 가져오기
video = cv.VideoCapture('IMG_4753.mov')

count = 0
while video.isOpened():
    ret, img = video.read()
    if not ret:
        break
    if int(video.get(1)) % 7 == 0:
        img = cv.resize(img, None, fx=0.4, fy=0.4)
        height, width, channels = img.shape

        # Detecting objects
        blob = cv.dnn.blobFromImage(img, 0.00392, (416, 416), (0, 0, 0), True, crop=False)
        net.setInput(blob)
        outs = net.forward(output_layers)

        # 정보를 화면에 표시
        class_ids = []
        confidences = []
        boxes = []
        for out in outs:
            for detection in out:
                scores = detection[5:]
                class_id = np.argmax(scores)
                confidence = scores[class_id]
                if confidence > 0.5:
                    # Object detected
                    center_x = int(detection[0] * width)
                    center_y = int(detection[1] * height)
                    w = int(detection[2] * width)
                    h = int(detection[3] * height)
                    # 좌표
                    x = int(center_x - w / 2)
                    y = int(center_y - h / 2)
                    boxes.append([x, y, w, h])
                    confidences.append(float(confidence))
                    class_ids.append(class_id)

        indexes = cv.dnn.NMSBoxes(boxes, confidences, 0.5, 0.4)
        font = cv.FONT_HERSHEY_PLAIN
        for i in range(len(boxes)):
            if i in indexes:
                x, y, w, h = boxes[i]
                label = str(classes[class_ids[i]])
                color = colors[i]
                cv.rectangle(img, (x, y), (x + w, y + h), color, 2)
                cv.putText(img, label, (x, y + 30), font, 3, color, 3)

        cv.imshow('object_detection', img)
        key = cv.waitKey(1)
        if key == 32:  # Space
            key = cv.waitKey(0)
        if key == 27:  # ESC
            break
cv.destroyAllWindows()

저작자표시

'Computer Vision' 카테고리의 다른 글

OverFeat 논문 요약 (1)	2023.03.06
You Only Look Once (YOLO) 논문 요약 (0)	2023.02.22
Ground-aware Monocular 3D Object Detection for Autonomous Driving 논문 공부 (0)	2022.08.05
[openCV] Object height 알아내기(python) (0)	2022.07.25
MonoDTR: Monocular 3D Object Detection with Depth-Aware Transformer 논문 공부 (0)	2022.07.12

ABOUT ME

JH's Tech Blog JH's Tech Blog

'Computer Vision' 카테고리의 다른 글

티스토리툴바

ABOUT ME

'Computer Vision' 카테고리의 다른 글

관련글 관련글 더보기

티스토리툴바