python · opencv · mediapipe · computer-vision

Gesture-Controlled Camera Filters Using Python, OpenCV & MediaPipe


What if you could switch camera filters just by holding up fingers? No keyboard, no mouse — pure gesture control.

In this tutorial we'll build a real-time system that detects hand gestures via webcam and applies different visual filters depending on how many fingers you're showing.


Technologies

| Library   | Role                               |
|-----------|------------------------------------|
| Python    | Primary language                   |
| OpenCV    | Video capture and image processing |
| MediaPipe | Hand landmark detection            |
| NumPy     | Matrix operations for filters      |

How Gesture Detection Works

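MediaPipe detects 21 landmarks per hand, indexed from 0 (wrist) to 20 (pinky tip). We decide whether a finger is "up" by comparing its tip landmark with a joint lower down the same finger:

  • Thumb: tip (4) vs. the IP joint (3), compared on the x-axis (horizontal), because the thumb extends sideways
  • Other fingers: tips (8, 12, 16, 20) vs. their PIP joints two landmarks below (6, 10, 14, 18), compared on the y-axis (vertical)

In image coordinates y increases downward, so a raised fingertip has a smaller y value than its joint:

```text
Fingertip y < Lower joint y  →  Finger is UP ✓
```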
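The rule can be checked without a webcam. Below is a minimal sketch, assuming landmarks are objects with normalized `x` and `y` attributes as MediaPipe returns them; `Point` and `fingers_up` are hypothetical names, and the thumb direction assumes a mirrored frame with the palm facing the camera:

```python
from typing import NamedTuple


class Point(NamedTuple):
    """Hypothetical stand-in for a MediaPipe landmark (normalized x, y)."""
    x: float
    y: float


TIP_IDS = [4, 8, 12, 16, 20]  # thumb, index, middle, ring, pinky


def fingers_up(lm, is_right_hand=True):
    """Return one 0/1 flag per finger, thumb first."""
    # Thumb: tip (4) vs. IP joint (3) on the x-axis. In a mirrored frame with
    # the palm toward the camera, a raised right thumb points to smaller x.
    if is_right_hand:
        thumb = lm[4].x < lm[3].x
    else:
        thumb = lm[4].x > lm[3].x
    flags = [int(thumb)]
    # Other fingers: y grows downward, so a raised tip has a smaller y than
    # the PIP joint two indices below it (8 vs 6, 12 vs 10, ...).
    for tip in TIP_IDS[1:]:
        flags.append(int(lm[tip].y < lm[tip - 2].y))
    return flags


# Synthetic hand: thumb and index raised, the rest curled or neutral
lm = [Point(0.5, 0.5) for _ in range(21)]
lm[3], lm[4] = Point(0.40, 0.60), Point(0.30, 0.60)    # thumb tip left of IP
lm[6], lm[8] = Point(0.45, 0.40), Point(0.45, 0.20)    # index tip above PIP
lm[10], lm[12] = Point(0.50, 0.40), Point(0.50, 0.55)  # middle tip below PIP
print(fingers_up(lm), sum(fingers_up(lm)))  # → [1, 1, 0, 0, 0] 2
```

The same synthetic hand read as a left hand loses the thumb (`[0, 1, 0, 0, 0]`), which is why the thumb test depends on handedness.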

Project Structure

```text
gesture-filters/
├── app.py              # Main loop
├── filters.py          # Filter functions
├── gesture.py          # Hand detection logic
└── requirements.txt
```

```text
# requirements.txt
opencv-python
mediapipe
numpy
```

Gesture Detection Module

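```python
# gesture.py
import cv2
import mediapipe as mp

mp_hands = mp.solutions.hands
hands = mp_hands.Hands(max_num_hands=1, min_detection_confidence=0.7)
mp_draw = mp.solutions.drawing_utils

TIP_IDS = [4, 8, 12, 16, 20]  # thumb, index, middle, ring, pinky


def count_fingers(frame):
    """Return (number of raised fingers, frame with landmarks drawn)."""
    # OpenCV captures BGR; MediaPipe expects RGB
    rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
    result = hands.process(rgb)

    if not result.multi_hand_landmarks:
        return 0, frame

    hand = result.multi_hand_landmarks[0]
    lm = hand.landmark
    mp_draw.draw_landmarks(frame, hand, mp_hands.HAND_CONNECTIONS)

    # MediaPipe reports handedness assuming a mirrored (selfie) image,
    # which matches the cv2.flip() applied in app.py
    label = result.multi_handedness[0].classification[0].label

    fingers = []

    # Thumb (horizontal comparison). With the palm facing the camera in a
    # mirrored frame, a raised right thumb points toward smaller x and a
    # raised left thumb toward larger x.
    if label == "Right":
        fingers.append(1 if lm[TIP_IDS[0]].x < lm[TIP_IDS[0] - 1].x else 0)
    else:
        fingers.append(1 if lm[TIP_IDS[0]].x > lm[TIP_IDS[0] - 1].x else 0)

    # Other four fingers (vertical comparison): y grows downward, so a raised
    # tip has a smaller y than its PIP joint two landmarks below
    for tip in TIP_IDS[1:]:
        fingers.append(1 if lm[tip].y < lm[tip - 2].y else 0)

    return sum(fingers), frame
```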

Filters Module

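```python
# filters.py
import cv2
import numpy as np


def apply_grayscale(frame):
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    return cv2.cvtColor(gray, cv2.COLOR_GRAY2BGR)  # back to 3 channels


def apply_blur(frame):
    return cv2.GaussianBlur(frame, (21, 21), 0)  # kernel size must be odd


def apply_sepia(frame):
    # Classic sepia matrix with rows (outputs) and columns (inputs) both in
    # OpenCV's B, G, R order: row i produces output channel i
    kernel = np.array([
        [0.131, 0.534, 0.272],  # new B
        [0.168, 0.686, 0.349],  # new G
        [0.189, 0.769, 0.393],  # new R
    ])
    sepia = cv2.transform(frame, kernel)
    # cv2.transform already saturates uint8 output; the clip is a safeguard
    return np.clip(sepia, 0, 255).astype(np.uint8)


def apply_edges(frame):
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    edges = cv2.Canny(gray, 100, 200)  # lower/upper hysteresis thresholds
    return cv2.cvtColor(edges, cv2.COLOR_GRAY2BGR)


FILTERS = {
    0: ("No Filter",   lambda f: f),
    1: ("Grayscale",   apply_grayscale),
    2: ("Blur",        apply_blur),
    3: ("Sepia",       apply_sepia),
    4: ("Edge Detect", apply_edges),
}
```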
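Under the hood, `cv2.transform` treats the kernel as a per-pixel matrix multiply: output channel i is the dot product of kernel row i with the pixel's (B, G, R) values, so both the rows and the columns of a sepia matrix have to follow OpenCV's BGR order. A NumPy-only sketch of the same computation (the name `sepia_numpy` is hypothetical) makes the channel order easy to check by hand:

```python
import numpy as np

# Sepia matrix with rows (outputs) and columns (inputs) both in B, G, R order
SEPIA_BGR = np.array([
    [0.131, 0.534, 0.272],  # new B
    [0.168, 0.686, 0.349],  # new G
    [0.189, 0.769, 0.393],  # new R
])


def sepia_numpy(frame: np.ndarray) -> np.ndarray:
    """Apply SEPIA_BGR to an H x W x 3 uint8 BGR image."""
    # (H, W, 3) @ (3, 3).T: every pixel vector p becomes SEPIA_BGR @ p
    out = frame.astype(np.float64) @ SEPIA_BGR.T
    # Round and clamp to the valid 8-bit range
    return np.clip(np.rint(out), 0, 255).astype(np.uint8)


# One pure-red pixel (B=0, G=0, R=255) turns brownish
pixel = np.array([[[0, 0, 255]]], dtype=np.uint8)
print(sepia_numpy(pixel).tolist())  # → [[[69, 89, 100]]]
```

A pure-white pixel comes out as `[239, 255, 255]`: the green and red rows sum to more than 1, so those channels saturate and the image gets its warm tint.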

Main Application

```python
# app.py
import cv2
from gesture import count_fingers
from filters import FILTERS

cap = cv2.VideoCapture(0)  # 0 = default webcam
if not cap.isOpened():
    raise SystemExit("Could not open the webcam.")
print("Show fingers to switch filters. Press 'q' to quit.")

while True:
    ret, frame = cap.read()
    if not ret:
        break

    frame = cv2.flip(frame, 1)  # Mirror effect
    finger_count, frame = count_fingers(frame)

    # Counts with no entry in FILTERS (e.g. 5 fingers) fall back to "No Filter"
    name, fn = FILTERS.get(finger_count, FILTERS[0])
    output = fn(frame)

    # HUD overlay
    cv2.putText(output, f"Fingers: {finger_count}", (10, 40),
                cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 255, 0), 2)
    cv2.putText(output, f"Filter: {name}", (10, 80),
                cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 255, 0), 2)

    cv2.imshow("Gesture Filters", output)

    # waitKey also gives the window time to repaint
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

cap.release()
cv2.destroyAllWindows()
```

Running It

```bash
pip install opencv-python mediapipe numpy
python app.py
```

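Hold up fingers in front of your webcam:

| Fingers | Filter      |
|---------|-------------|
| 0       | No Filter   |
| 1       | Grayscale   |
| 2       | Blur        |
| 3       | Sepia       |
| 4       | Edge Detect |

Any other count (for example, all five fingers) falls back to No Filter, as does a frame in which no hand is detected.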

Ideas for Extension

  • Gesture cooldown — prevent rapid switching with a 1-second delay
  • Both hands — left hand for filters, right hand for intensity
  • Face filters — combine with MediaPipe Face Mesh
  • Record — save filtered video with OpenCV's VideoWriter
  • Desktop app — package with PyInstaller

This is a great project for demonstrating real-time computer vision skills. The same pattern, landmark detection followed by simple logic on the landmark coordinates, is a starting point for sign language recognition, fitness tracking, and AR applications.
