MWLDE
Work/CSCI323 Modern AI

ASL Alphabet Recognition

A deep learning system classifying ASL hand gestures using CNNs. Compared a lightweight baseline against VGG16/ResNet transfer learning on ~27,000 labelled images. The baseline achieved 100% validation accuracy in under 60 seconds.

PythonTensorFlow/KerasOpenCVScikit-learnGoogle Colab
Spring 2026 · CSCI323 Modern AI
GitHub
hero image
Overview

A deep learning system for classifying American Sign Language hand gestures. The project compared multiple CNN architectures — a lightweight custom baseline against industry-standard transfer learning approaches (VGG16, ResNet50) trained on a dataset of approximately 27,000 labelled images.

The goal was to understand the real trade-offs between model complexity and performance: does a deeper, pre-trained network always win? The results were surprising.

The Approach

Three architectures were evaluated side by side:

  1. Baseline CNN — a lightweight custom model designed for speed
  2. VGG16 — fine-tuned from ImageNet weights
  3. ResNet50 — fine-tuned from ImageNet weights

Each model was trained on the same dataset split and evaluated on held-out validation data. Training was done on Google Colab with GPU acceleration.

Model Architecture

The baseline CNN was intentionally lean — three convolutional blocks with max pooling, followed by a dense head with dropout. Simple enough to train in under a minute, expressive enough to learn the task.

Code
def build_model(num_classes=29):
    model = Sequential([
        Conv2D(32, (3,3), activation='relu',
               input_shape=(64, 64, 3)),
        MaxPooling2D(2, 2),
        Conv2D(64, (3,3), activation='relu'),
        MaxPooling2D(2, 2),
        Conv2D(128, (3,3), activation='relu'),
        Flatten(),
        Dense(128, activation='relu'),
        Dropout(0.5),
        Dense(num_classes, activation='softmax'),
    ])
    model.compile(
        optimizer='adam',
        loss='categorical_crossentropy',
        metrics=['accuracy'],
    )
    return model
Results

The baseline model outperformed the transfer learning approaches on this task — 100% validation accuracy in under 60 seconds of training. The dataset was clean and domain-specific enough that a simple architecture, well-tuned, was sufficient.

100%
Val. Accuracy
27k
Images
<60s
Train Time
Gallery

Training curves and the per-class confusion matrix from the final baseline evaluation run.

Figure 1 — Training curves
Figure 2 — Confusion matrix
What I Learned

Transfer learning is a powerful default, but not always the right answer. For well-constrained, domain-specific tasks with clean data, a simple purpose-built model can match or exceed large pre-trained networks at a fraction of the compute cost.

The other lesson: benchmarking matters. Without running all three architectures on the same data under the same conditions, I'd have guessed the more complex models would win.