
Ditching Ultralytics: Training YOLOX for Aircraft Detection

I’ve been building SkySpottr, an AR app that overlays aircraft information on your phone’s screen. It uses your device’s location and orientation plus incoming aircraft data (ADS-B) to predict where planes should appear on screen, then uses a YOLO model to lock onto the actual aircraft and refine the overlay. YOLOv8 worked great for this… until I actually read the license.

Welcome to Austin’s Nerdy Things, where we train entire neural networks from scratch to avoid talking to lawyers.

The Problem with Ultralytics

YOLOvWhatever is excellent. Fast, accurate, easy to use, great documentation. But Ultralytics licenses it under AGPL-3.0, which means if you use it in a product, you either need to open-source your entire application or pay for a commercial license. For a side-project AR app that I might eventually monetize? That’s a hard pass.

Enter YOLOX from Megvii, which either ChatGPT or Claude (I can’t remember which) recommended as an alternative. MIT licensed. Do whatever you want with it. The catch? You have to train your own models from scratch instead of using Ultralytics’ pretrained weights and easy fine-tuning pipeline. I have since learned there are some pretrained models. I didn’t use them.

So training from scratch is what I did. Over a few late nights in December 2025, I went from zero YOLOX experience to running custom-trained aircraft detection models in my iOS app. Here’s how it went.

The Setup

Hardware: RTX 3090 on my Windows machine, COCO2017 dataset on network storage (which turned out to be totally fine for training speed), and way too many terminal windows open.

I started with the official YOLOX repo and the aircraft class from COCO2017. The dataset has about 3,000 training images with airplanes, which is modest but enough to get started.

git clone https://github.com/Megvii-BaseDetection/YOLOX
cd YOLOX
pip install -v -e .

The first training run failed immediately because I forgot to install YOLOX as a package. Classic. Then it failed again because I was importing a class that didn’t exist in the version I had. Claude (who was helping me through this, and who had hallucinated that class in the first place) apologized and fixed the import. We got there eventually.

Training Configs: Nano, Tiny, Small, and “Nanoish”

YOLOX has a nice inheritance-based config system. You create a Python file, inherit from a base experiment class, and override what you want. I ended up with four different configs:

  • yolox_nano_aircraft.py – The smallest. 0.9M params, 1.6 GFLOPs. Runs on anything.
  • yolox_tiny_aircraft.py – Slightly bigger with larger input size for small object detection.
  • yolox_small_aircraft.py – ~9M params, 26 GFLOPs. The “serious” model.
  • yolox_nanoish_aircraft.py – My attempt at something between nano and tiny.

The “nanoish” config was my own creation where I tried to find a sweet spot. I bumped the width multiplier from 0.25 to 0.33 and… immediately got a channel mismatch error because 0.33 doesn’t divide evenly into the architecture. Turns out you can’t just pick arbitrary numbers. I am a noob at these things. Lesson learned.

After some back-and-forth, I settled on a config with 0.3125 width (which is 0.25 × 1.25, mathematically clean) and 512×512 input. This gave me roughly 1.2M params – bigger than nano, smaller than tiny, and it actually worked.
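Why 0.33 broke and 0.3125 didn’t comes down to integer rounding. As far as I can tell from the YOLOX source, the CSPDarknet backbone derives its stage widths from int(64 * width) multiplied by 4, 8 and 16, while YOLOPAFPN computes its own expected channel counts as int(256 * width), int(512 * width) and int(1024 * width). The two only agree when those products come out whole. Here is a small stdlib check under that assumption (the function name is my own, not part of YOLOX):

```python
def pafpn_channels_match(width: float, in_channels=(256, 512, 1024)) -> bool:
    """True if backbone outputs and neck inputs agree for this width multiplier.

    Assumes the YOLOX layout: backbone stages emit int(64 * width) * {4, 8, 16}
    channels, and YOLOPAFPN expects int(c * width) for each c in in_channels.
    """
    base = int(64 * width)
    backbone = [base * m for m in (4, 8, 16)]
    neck = [int(c * width) for c in in_channels]
    return backbone == neck


for w in (0.25, 0.3125, 0.33, 0.375, 0.5):
    print(f"width={w}: {'ok' if pafpn_channels_match(w) else 'mismatch'}")
```

With 0.33 the deepest backbone stage has 21 × 16 = 336 channels, while the neck expects int(1024 × 0.33) = 337, which is exactly the kind of off-by-one that shows up as a channel mismatch. With 0.3125 every product is a whole number (64 × 0.3125 = 20), so nothing gets truncated.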

Here’s the small model config – the one that ended up in production. The key decisions are width = 0.50 (2x wider than nano for better feature extraction), 640×640 input for small object detection, and full mosaic + mixup augmentation:

from yolox.exp import Exp as MyExp

class Exp(MyExp):
    def __init__(self):
        super(Exp, self).__init__()

        # Model config - YOLOX-Small architecture
        self.num_classes = 1  # Single class: airplane
        self.depth = 0.33
        self.width = 0.50  # 2x wider than nano for better feature extraction

        # Input/output config - larger input helps small object detection
        self.input_size = (640, 640)
        self.test_size = (640, 640)
        self.multiscale_range = 5  # Training will vary from 480-800

        # Data augmentation
        self.mosaic_prob = 1.0
        self.mosaic_scale = (0.1, 2.0)
        self.enable_mixup = True
        self.mixup_prob = 1.0
        self.flip_prob = 0.5
        self.hsv_prob = 1.0

        # Training config
        self.warmup_epochs = 5
        self.max_epoch = 400
        self.no_aug_epochs = 100
        self.basic_lr_per_img = 0.01 / 64.0
        self.scheduler = "yoloxwarmcos"

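    def get_model(self):
        import torch.nn as nn
        from yolox.models import YOLOX, YOLOPAFPN, YOLOXHead

        def init_yolo(M):
            # Same BatchNorm settings the stock YOLOX Exp.get_model applies
            for m in M.modules():
                if isinstance(m, nn.BatchNorm2d):
                    m.eps = 1e-3
                    m.momentum = 0.03

        in_channels = [256, 512, 1024]
        # Small uses standard convolutions (no depthwise)
        backbone = YOLOPAFPN(self.depth, self.width, in_channels=in_channels, act=self.act)
        head = YOLOXHead(self.num_classes, self.width, in_channels=in_channels, act=self.act)
        self.model = YOLOX(backbone, head)
        self.model.apply(init_yolo)
        self.model.head.initialize_biases(1e-2)  # prior-probability bias init, as in the stock get_model
        return self.model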
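The “Training will vary from 480-800” comment follows from how YOLOX picks its multiscale sizes. As I read Exp.random_resize, it draws an integer factor between input_size // 32 − multiscale_range and input_size // 32 + multiscale_range, then multiplies back by 32. The arithmetic alone, as a stdlib sketch (not YOLOX’s code):

```python
def multiscale_sizes(input_size: int = 640, multiscale_range: int = 5, stride: int = 32) -> list:
    """Every square training size reachable with YOLOX-style, stride-aligned multiscale."""
    center = input_size // stride
    return [f * stride for f in range(center - multiscale_range, center + multiscale_range + 1)]


print(multiscale_sizes())  # [480, 512, 544, 576, 608, 640, 672, 704, 736, 768, 800]
```

Keeping every size a multiple of 32 matters because 32 is the largest output stride, so each size divides cleanly into the 8/16/32 feature grids.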

And the nanoish config for comparison – note the depthwise=True and the width of 0.3125 (5/16) that I landed on after the channel mismatch debacle:

from yolox.exp import Exp as MyExp

class Exp(MyExp):
    def __init__(self):
        super(Exp, self).__init__()

        self.num_classes = 1
        self.depth = 0.33
        self.width = 0.3125  # 5/16 - halfway between nano (0.25) and tiny (0.375)

        self.input_size = (512, 512)
        self.test_size = (512, 512)

        # Lighter augmentation than small, like YOLOX's stock nano exp: low-capacity models tend to do worse with heavy aug
        self.mosaic_prob = 0.5
        self.mosaic_scale = (0.5, 1.5)
        self.enable_mixup = False

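    def get_model(self):
        import torch.nn as nn
        from yolox.models import YOLOX, YOLOPAFPN, YOLOXHead

        def init_yolo(M):
            # Same BatchNorm settings the stock YOLOX Exp.get_model applies
            for m in M.modules():
                if isinstance(m, nn.BatchNorm2d):
                    m.eps = 1e-3
                    m.momentum = 0.03

        in_channels = [256, 512, 1024]
        backbone = YOLOPAFPN(self.depth, self.width, in_channels=in_channels,
                             act=self.act, depthwise=True)  # Depthwise = lighter
        head = YOLOXHead(self.num_classes, self.width, in_channels=in_channels,
                         act=self.act, depthwise=True)
        self.model = YOLOX(backbone, head)
        self.model.apply(init_yolo)
        self.model.head.initialize_biases(1e-2)  # prior-probability bias init, as in the stock get_model
        return self.model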
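The “Depthwise = lighter” comment is about weight count. A standard k×k convolution from c_in to c_out channels has k·k·c_in·c_out weights. YOLOX’s DWConv swaps that for a k×k depthwise conv (one filter per input channel, k·k·c_in weights) followed by a 1×1 pointwise conv (c_in·c_out). Ignoring biases and BatchNorm parameters, a quick comparison for one 64→128 layer:

```python
def standard_conv_weights(k: int, c_in: int, c_out: int) -> int:
    return k * k * c_in * c_out


def depthwise_separable_weights(k: int, c_in: int, c_out: int) -> int:
    depthwise = k * k * c_in  # one k×k filter per input channel
    pointwise = c_in * c_out  # 1×1 conv mixes channels
    return depthwise + pointwise


std = standard_conv_weights(3, 64, 128)
dws = depthwise_separable_weights(3, 64, 128)
print(std, dws, round(std / dws, 1))  # 73728 8768 8.4
```

That roughly 8x saving per 3×3 layer is why nano and nanoish stay tiny, and also why switching small back to regular convolutions buys so much extra capacity.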

Training is then just:

python tools/train.py -f yolox_small_aircraft.py -d 1 -b 16 --fp16 -c yolox_s.pth

The -c yolox_s.pth loads YOLOX’s pretrained COCO weights as a starting point (transfer learning). The -d 1 is one GPU, -b 16 is batch size 16 (about 8GB VRAM on the 3090 with fp16), and --fp16 enables mixed precision training.
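Loading an 80-class COCO checkpoint into a 1-class model works because YOLOX’s checkpoint loader (yolox.utils.load_ckpt, which tools/train.py uses for -c) only copies tensors whose names and shapes both match, and logs a warning for the rest. The class-prediction convs (head.cls_preds.*) are the ones that get skipped and start fresh. A stdlib sketch of that filtering, with state dicts modeled as name-to-shape maps (the entries below are illustrative, not a full model):

```python
from typing import Dict, List, Tuple

Shape = Tuple[int, ...]


def split_loadable(model_shapes: Dict[str, Shape],
                   ckpt_shapes: Dict[str, Shape]) -> Tuple[List[str], List[str]]:
    """Split model parameter names into (loadable, skipped) by name and shape."""
    loadable: List[str] = []
    skipped: List[str] = []
    for name, shape in model_shapes.items():
        if ckpt_shapes.get(name) == shape:
            loadable.append(name)
        else:
            skipped.append(name)  # missing from the checkpoint, or a different shape
    return loadable, skipped


# Illustrative entries: 1-class model vs. 80-class COCO checkpoint at width 0.50
model = {
    "backbone.backbone.stem.conv.conv.weight": (32, 12, 3, 3),
    "head.cls_preds.0.weight": (1, 128, 1, 1),
    "head.cls_preds.0.bias": (1,),
}
coco = {
    "backbone.backbone.stem.conv.conv.weight": (32, 12, 3, 3),
    "head.cls_preds.0.weight": (80, 128, 1, 1),
    "head.cls_preds.0.bias": (80,),
}
loadable, skipped = split_loadable(model, coco)
print(skipped)  # ['head.cls_preds.0.weight', 'head.cls_preds.0.bias']
```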

The Small Object Problem

Here’s the thing about aircraft detection for an AR app: planes at cruise altitude look tiny. A 747-8 at 37,000 feet is maybe 20-30 pixels on your phone screen if you’re lucky, even with the 4x optical zoom of the newest iPhones (8x for the 12MP weird zoom mode). Standard YOLO models are tuned for reasonably sized objects, not specks in the sky. The aircraft in COCO are mostly large in frame, like when you’re sitting at your gate at an airport and take a picture of the aircraft 100 ft in front of you.
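A pinhole-camera back-of-envelope shows why: an object of size L at distance d covers L / (2·d·tan(FOV/2)) of the frame width. The field of view and pixel width below are placeholders I picked for illustration (roughly a 4x telephoto and a 640-pixel model input), not measured iPhone values:

```python
import math


def apparent_size_px(object_m: float, distance_m: float, hfov_deg: float, image_width_px: int) -> float:
    """Approximate on-image size of an object (pinhole model, object roughly face-on)."""
    frame_width_m = 2.0 * distance_m * math.tan(math.radians(hfov_deg) / 2.0)
    return object_m / frame_width_m * image_width_px


# Placeholders: ~76 m airframe (747-8 length), 11.3 km away (37,000 ft straight overhead),
# ~20 degree horizontal FOV, 640 px wide model input.
print(round(apparent_size_px(76.0, 11_300.0, 20.0, 640), 1))
```

The result scales linearly with image width, so the same aircraft covers more pixels on a full-resolution screen than after downscaling to the model input, which is part of why input resolution matters so much for small objects.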

My first results were underwhelming. The nano model was detecting larger aircraft okay but completely missing anything at altitude. The evaluation metrics looked like this:

AP for airplane = 0.234
AR for small objects = 0.089

Not great. The model was basically only catching aircraft on approach or takeoff.

For the small config, I made some changes to help with tiny objects:

  • Increased input resolution to 640×640 (more pixels = more detail for small objects)
  • Enabled full mosaic and mixup augmentation (helps the model see varied object scales)
  • Switched from depthwise to regular convolutions (more capacity)
  • (I’ll be honest, I was leaning heavily on Claude for the ML-specific tuning decisions here)

This pushed the model to 26 GFLOPs though, which had me worried about phone performance.

Here’s what the small model’s accuracy looked like broken down by object size. You can see AP for small objects climbing from ~0.45 to ~0.65 over training, while large objects hit ~0.70. Progress, but small objects remain the hardest category – which tracks with the whole “specks in the sky” problem.

YOLOX-Small mAP by IoU threshold and object size
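“Small”, “medium” and “large” in these charts are COCO’s area buckets, based on object area in original-image pixels: small is under 32×32, medium runs up to 96×96, and large is anything bigger. A minimal classifier for a width×height box:

```python
def coco_size_bucket(width_px: float, height_px: float) -> str:
    """COCO-style size bucket from box area in original-image pixels."""
    area = width_px * height_px
    if area < 32 ** 2:
        return "small"
    if area < 96 ** 2:
        return "medium"
    return "large"


print(coco_size_bucket(25, 25), coco_size_bucket(60, 40), coco_size_bucket(200, 120))
# small medium large
```

By that yardstick a 20-30 pixel aircraft (400-900 px² of area) sits firmly in the small bucket.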

Will This Actually Run on a Phone?

The whole point of this exercise was to run inference on an iPhone. So here is some napkin math:

Model                     GFLOPs   Estimated Phone Inference
Nano                      1.6      ~15ms, smooth 30fps easy
Nanoish                   3.2      ~25ms, still good
Small                     26       ~80ms, might be sluggish
YOLOv8n (for reference)   8.7      ~27ms

My app was already running YOLOv8n at 15fps with plenty of headroom. So theoretically even the small model should work, but nano/nanoish would leave more room for everything else the app needs to do.

The plan: train everything, compare accuracy, quantize for deployment, and see what actually works in practice.

Training Results (And a Rookie Mistake)

After letting things run overnight (300 epochs takes a while even on a 3090), here’s what I got:

The nanoish model at epoch 100 was already showing 94% detection rate on test images, beating the fully-trained nano model. And it wasn’t even done training yet.

Quick benchmark on 50 COCO test images with aircraft (RTX 3090 GPU inference – not identical to phone, but close enough for the smaller models to be representative):

Model           Detection Rate   Avg Detections/Image   Avg Inference (ms)   FPS
YOLOv8n         58.6%            0.82                   33.6                 29.7
YOLOX nano      74.3%            1.04                   14.0                 71.4
YOLOX nanoish   81.4%            1.14                   15.0                 66.9
YOLOX tiny      91.4%            1.28                   16.5                 60.7
YOLOX small     92.9%            1.30                   17.4                 57.4
Ground Truth    -                1.40                   -                    -
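One thing worth knowing when reading this table: the Detection Rate column appears to equal average detections per image divided by the ground-truth average (0.82 / 1.40 ≈ 58.6%, 1.30 / 1.40 ≈ 92.9%). If so, it counts boxes without checking that they land on a real aircraft, and a false positive moves it as much as a hit. A stricter alternative is IoU-matched recall. Here is a stdlib sketch with greedy one-to-one matching at IoU ≥ 0.5; the threshold and greedy strategy are my choices for illustration, not necessarily what the benchmark script did:

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # x1, y1, x2, y2


def iou(a: Box, b: Box) -> float:
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def matched_recall(preds_per_image: Sequence[List[Box]],
                   gts_per_image: Sequence[List[Box]], thr: float = 0.5) -> float:
    """Fraction of ground-truth boxes claimed by a distinct prediction with IoU >= thr."""
    hits = total = 0
    for preds, gts in zip(preds_per_image, gts_per_image):
        total += len(gts)
        unused = list(preds)
        for gt in gts:
            best = max(unused, key=lambda p: iou(p, gt), default=None)
            if best is not None and iou(best, gt) >= thr:
                hits += 1
                unused.remove(best)  # each prediction may claim only one aircraft
    return hits / total if total else 0.0


# Image 1: one hit plus one stray box. Image 2: only a stray box.
preds = [[(0, 0, 10, 10), (50, 50, 60, 60)], [(40, 40, 50, 50)]]
gts = [[(0, 0, 10, 10)], [(0, 0, 10, 10)]]
count_ratio = sum(map(len, preds)) / sum(map(len, gts))
print(count_ratio, matched_recall(preds, gts))  # 1.5 0.5
```

The count ratio says 150% while only half the aircraft were actually found, which is the gap the simpler metric can hide.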

YOLOv8n getting beaten by every single YOLOX variant while also being slower was… not what I expected. Here’s the mAP comparison across all the models over training – you can see the hierarchy pretty clearly:

mAP comparison across all YOLOX model variants

The big takeaway: more capacity = better accuracy, but with diminishing returns. The jump from nano to nanoish is huge, nanoish to small is solid, and tiny lands somewhere in between depending on the epoch. (You’ll notice two extra lines in the chart – a large model and a self-sourced variant. I kept training after this post’s story ends. More on the self-sourced pipeline later. You can also see the large model is clearly overfitting past epoch ~315 – loss keeps decreasing but mAP starts dropping. My first time overfitting a model.)
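Because loss kept falling while mAP dropped, the checkpoint worth shipping is the one with the best validation AP rather than the last one (YOLOX’s trainer already saves a best_ckpt.pth alongside the latest checkpoint). Finding that peak, and where a patience-based early stop would have kicked in, takes a few lines of stdlib Python. Each entry is one evaluation (YOLOX evaluates every eval_interval epochs), and the history in the example is invented for illustration:

```python
from typing import Optional, Sequence, Tuple


def best_and_stop(ap_history: Sequence[float], patience: int = 20) -> Tuple[int, float, Optional[int]]:
    """Return (best_eval, best_ap, stop_eval), 1-based; stop_eval is None if never triggered."""
    best_eval, best_ap, stop_eval = 0, float("-inf"), None
    for i, ap in enumerate(ap_history, start=1):
        if ap > best_ap:
            best_eval, best_ap = i, ap
        elif stop_eval is None and i - best_eval >= patience:
            stop_eval = i  # first evaluation after `patience` evals without improvement
    return best_eval, best_ap, stop_eval


history = [0.50, 0.58, 0.62, 0.66, 0.65, 0.64, 0.63, 0.63]  # illustrative, not real results
print(best_and_stop(history, patience=3))  # (4, 0.66, 7)
```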

The nanoish model hit a nice sweet spot. Faster than YOLOv8n, better small object detection than pure nano, and still lightweight enough for mobile.

And here is the output from my plot_training.py script:

============================================================
SUMMARY
============================================================
Run                         Epochs   Final Loss    Best AP  Best AP50
------------------------------------------------------------
yolox_large_aircraft           391       0.6000     0.6620     0.8620
yolox_nano_aircraft            300       3.3000     0.4770     0.7390
yolox_nanoish_aircraft         142       4.3000     0.4390     0.7210
yolox_small_aircraft           302       2.2000     0.6360     0.8650
yolox_small_with_self_sou      400       1.4000     0.6420     0.8620
yolox_tiny_aircraft            300       2.5000     0.6060     0.8480
============================================================

====================================================================================================
mAP VALUES AT SPECIFIC EPOCHS
====================================================================================================
Run                           AP@280     AP50@280    APsmall@280     AP@290     AP50@290    APsmall@290     AP@299     AP50@299    APsmall@299
----------------------------------------------------------------------------------------------------
yolox_large_aircraft          0.6350       0.8410         0.6690     0.6390       0.8410         0.6750 0.6480(300)       0.8440         0.6780
yolox_nano_aircraft           0.4750       0.7360         0.4000     0.4740       0.7360         0.3970     0.4770       0.7380         0.4030
yolox_nanoish_aircraft           N/A          N/A            N/A        N/A          N/A            N/A        N/A          N/A            N/A
yolox_small_aircraft          0.5900       0.8440         0.5960     0.6230       0.8610         0.6340     0.6360       0.8630         0.6410
yolox_small_with_self_sou     0.5940       0.8430         0.5690     0.5900       0.8420         0.5660 0.5930(300)       0.8420         0.5630
yolox_tiny_aircraft           0.5800       0.8300         0.5650     0.5950       0.8340         0.5830     0.6060       0.8440         0.5780
====================================================================================================

But there was a problem I didn’t notice until later: my training dataset had zero images without aircraft in them. Every single training image contained at least one airplane. This is… not ideal if you want your model to learn what an airplane isn’t. More on that shortly.

How It Actually Works in the App

Before I get to results, here’s what the ML is actually doing in SkySpottr. The app combines multiple data sources to track aircraft:

  1. ADS-B data tells us where aircraft are in 3D space (lat, lon, altitude)
  2. Device GPS and orientation tell us where the phone is and which way it’s pointing
  3. Physics-based prediction places aircraft overlays on screen based on all the above
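Step 3 begins by turning an ADS-B position into a direction from the phone. A minimal sketch that computes azimuth, elevation and slant range from two lat/lon/altitude points using a flat local approximation (east/north offsets on a sphere, no Earth curvature or atmospheric refraction). That simplification is my choice for illustration, not necessarily what SkySpottr does, and it gets noticeably worse for aircraft tens of miles away:

```python
import math
from typing import Tuple

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius


def look_angles(obs_lat: float, obs_lon: float, obs_alt_m: float,
                ac_lat: float, ac_lon: float, ac_alt_m: float) -> Tuple[float, float, float]:
    """Return (azimuth_deg clockwise from true north, elevation_deg, slant_range_m)."""
    mean_lat = math.radians((obs_lat + ac_lat) / 2.0)
    north = math.radians(ac_lat - obs_lat) * EARTH_RADIUS_M
    east = math.radians(ac_lon - obs_lon) * EARTH_RADIUS_M * math.cos(mean_lat)
    up = ac_alt_m - obs_alt_m
    ground = math.hypot(east, north)
    azimuth = math.degrees(math.atan2(east, north)) % 360.0
    elevation = math.degrees(math.atan2(up, ground))
    return azimuth, elevation, math.hypot(ground, up)


# Hypothetical observer near Denver and an aircraft ~11 km north at cruise altitude
print(look_angles(39.74, -104.99, 1600.0, 39.84, -104.99, 11278.0))
```

Comparing those angles with the phone’s heading and pitch, and dividing by the camera’s field of view, is what turns them into a predicted screen position.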

That prediction is usually pretty good, but phone sensors drift and aircraft positions are slightly delayed. So the overlays can be off by a couple degrees. This is where YOLO comes in.

The app runs the model on each camera frame looking for aircraft. When it finds one within a threshold distance of where the physics engine predicted an aircraft should be, it “snaps” the overlay to the actual detected position. The UI shows an orange circle around the aircraft and marks it as “SkySpottd” – confirmed via machine learning.
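The snap step is essentially a gated nearest-neighbor match between predicted and detected screen positions. The sketch below is my illustration of that idea, not SkySpottr’s code: greedy matching by pixel distance, each detection used at most once, with a placeholder max_px standing in for the app’s real matching threshold and hypothetical callsigns in the example:

```python
import math
from typing import Dict, List, Set, Tuple

Point = Tuple[float, float]  # screen x, y in pixels


def snap(predicted: Dict[str, Point], detections: List[Point], max_px: float = 80.0) -> Dict[str, Point]:
    """Map aircraft id -> detected position, for predictions with a detection within max_px."""
    pairs = sorted(
        (math.dist(p, d), aid, j)
        for aid, p in predicted.items()
        for j, d in enumerate(detections)
    )
    snapped: Dict[str, Point] = {}
    used: Set[int] = set()
    for dist, aid, j in pairs:
        if dist > max_px:
            break  # pairs are sorted, so every remaining pair is farther away
        if aid in snapped or j in used:
            continue
        snapped[aid] = detections[j]
        used.add(j)
    return snapped


predicted = {"UAL123": (100.0, 100.0), "SWA456": (400.0, 300.0)}
detections = [(110.0, 105.0), (900.0, 900.0)]  # second one: a stray box in empty sky
print(snap(predicted, detections))  # {'UAL123': (110.0, 105.0)}
```

A detection with no prediction nearby simply never matches, which is the same reason random false positives in empty sky don’t trigger a snap.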

I call this “ML snap” mode. It’s the difference between “there’s probably a plane somewhere around here” and “that specific bright dot is definitely the aircraft.”

The model runs continuously on device, which is why inference time matters so much. Even with a 15fps cap, that’s still 15 inference cycles per second competing with everything else the app needs to do (sensor fusion, WebSocket data, AR rendering, etc.). Early on I was seeing 130%+ CPU usage on my iPhone, which is not great for battery life. Every millisecond saved on inference is a win.

Getting YOLOX into CoreML

One thing the internet doesn’t tell you: YOLOX and Apple’s Vision framework don’t play nice together.

YOLOv8 exports to CoreML with a nice Vision-compatible interface. You hand it an image, it gives you detections. Easy. YOLOX expects different preprocessing – it wants pixel values in the 0-255 range (not normalized 0-1), and the output tensor layout is different.

The conversion pipeline goes PyTorch → TorchScript → CoreML. Here’s the core of it:

import torch
import coremltools as ct
from yolox.models import YOLOX, YOLOPAFPN, YOLOXHead

# Build model (same architecture as training config)
backbone = YOLOPAFPN(depth=0.33, width=0.50, in_channels=[256, 512, 1024], act="silu")
head = YOLOXHead(num_classes=1, width=0.50, in_channels=[256, 512, 1024], act="silu")
model = YOLOX(backbone, head)

# Load trained weights
ckpt = torch.load("yolox_small_best.pth", map_location="cpu", weights_only=False)
model.load_state_dict(ckpt["model"])
model.eval()
model.head.decode_in_inference = True  # Decode grid offsets into pixel-space boxes inside the exported graph

# Trace and convert
dummy = torch.randn(1, 3, 640, 640)
traced = torch.jit.trace(model, dummy)
mlmodel = ct.convert(
    traced,
    inputs=[ct.TensorType(name="images", shape=(1, 3, 640, 640))],
    outputs=[ct.TensorType(name="output")],
    minimum_deployment_target=ct.target.iOS15,
    convert_to="mlprogram",
)
mlmodel.save("yolox_small_aircraft.mlpackage")

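The decode_in_inference = True is crucial. Without it, the box values come out grid-relative (center offsets from each cell plus log-scale width and height, per stride), and you’d need to reimplement the decode step in Swift. With it, the output is [1, N, 6], where N is 8,400 at 640×640 (80×80 + 40×40 + 20×20 grid cells across strides 8, 16 and 32) and 6 is [x_center, y_center, width, height, obj_conf, class_score], with boxes in input pixel coordinates and both scores already passed through a sigmoid. Confidence thresholding and NMS are still left to the app.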
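For reference, here is roughly what that decode involves, in case you ever export with decode_in_inference = False and do it on device. It’s a stdlib Python sketch mirroring my reading of YOLOXHead.decode_outputs (offsets plus cell index, times stride; exponentiated sizes, times stride), followed by the thresholding and NMS that are needed either way, using score = obj_conf × class_score as in yolox.utils.postprocess. The 0.5 and 0.45 thresholds are illustrative:

```python
import math
from typing import Iterator, List, Sequence, Tuple

Row = Tuple[float, float, float, float, float, float]  # 4 box values, obj_conf, class_score
Det = Tuple[float, float, float, float, float]          # x1, y1, x2, y2, score


def grid_cells(input_size: int = 640, strides: Sequence[int] = (8, 16, 32)) -> Iterator[Tuple[int, int, int]]:
    """Yield (grid_x, grid_y, stride) in the order the three output levels are flattened."""
    for stride in strides:
        cells = input_size // stride
        for gy in range(cells):  # row-major: y outer, x inner
            for gx in range(cells):
                yield gx, gy, stride


def decode(raw: Sequence[Row], input_size: int = 640) -> List[Row]:
    """Grid-relative outputs -> (cx, cy, w, h) in input pixels; scores pass through."""
    out = []
    for (gx, gy, s), (tx, ty, tw, th, obj, cls) in zip(grid_cells(input_size), raw):
        out.append(((tx + gx) * s, (ty + gy) * s, math.exp(tw) * s, math.exp(th) * s, obj, cls))
    return out


def iou(a: Det, b: Det) -> float:
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def postprocess(decoded: Sequence[Row], conf_thresh: float = 0.5, iou_thresh: float = 0.45) -> List[Det]:
    """score = obj_conf * class_score, threshold, then greedy single-class NMS."""
    candidates = []
    for cx, cy, w, h, obj, cls in decoded:
        score = obj * cls
        if score >= conf_thresh:
            candidates.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2, score))
    candidates.sort(key=lambda d: d[4], reverse=True)
    kept: List[Det] = []
    for det in candidates:
        if all(iou(det, k) < iou_thresh for k in kept):
            kept.append(det)
    return kept


print(sum(1 for _ in grid_cells(640)))  # 8400
```

With decode_in_inference = True, only postprocess applies, run directly on the already-decoded rows.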

On the Swift side, Claude ended up writing a custom detector that bypasses the Vision framework entirely. Here’s the preprocessing — the part that was hardest to get right:

/// Convert pixel buffer to MLMultiArray [1, 3, H, W] with 0-255 range
private func preprocess(pixelBuffer: CVPixelBuffer) -> MLMultiArray? {
    // GPU-accelerated resize via Core Image
    let ciImage = CIImage(cvPixelBuffer: pixelBuffer)
    let scaleX = CGFloat(inputSize) / ciImage.extent.width
    let scaleY = CGFloat(inputSize) / ciImage.extent.height
    let scaledImage = ciImage.transformed(by: CGAffineTransform(scaleX: scaleX, y: scaleY))

    // Reuse pixel buffer from pool (memory leak fix #1)
    var resizedBuffer: CVPixelBuffer?
    CVPixelBufferPoolCreatePixelBuffer(kCFAllocatorDefault, pool, &resizedBuffer)
    guard let buffer = resizedBuffer else { return nil }
    ciContext.render(scaledImage, to: buffer)

    // Reuse pre-allocated MLMultiArray (memory leak fix #2)
    guard let array = inputArray else { return nil }

    CVPixelBufferLockBaseAddress(buffer, .readOnly)
    defer { CVPixelBufferUnlockBaseAddress(buffer, .readOnly) }

    let bytesPerRow = CVPixelBufferGetBytesPerRow(buffer)
    let pixels = CVPixelBufferGetBaseAddress(buffer)!.assumingMemoryBound(to: UInt8.self)
    let arrayPtr = array.dataPointer.assumingMemoryBound(to: Float.self)
    let channelStride = inputSize * inputSize

    // BGRA → RGB, keep 0-255 range (YOLOX expects unnormalized pixels)
    // Direct pointer access is ~100x faster than MLMultiArray subscript
    for y in 0..<inputSize {
        let rowOffset = y * bytesPerRow
        let yOffset = y * inputSize
        for x in 0..<inputSize {
            let px = rowOffset + x * 4
            let idx = yOffset + x
            arrayPtr[idx] = Float(pixels[px + 2])                      // R
            arrayPtr[channelStride + idx] = Float(pixels[px + 1])      // G
            arrayPtr[2 * channelStride + idx] = Float(pixels[px])      // B
        }
    }
    return array
}

The two key gotchas: (1) BGRA byte order from the camera vs RGB that the model expects, and (2) YOLOX wants raw 0-255 pixel values, not the 0-1 normalized range that most CoreML models expect. If you normalize, everything silently breaks — the model runs, returns garbage, and you spend an evening wondering why.
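One more thing worth checking against your own training pipeline. As far as I can tell, YOLOX’s stock data path loads images with OpenCV (cv2.imread, which yields BGR), and its preproc function neither converts to RGB nor stretches. It scales by r = min(640 / h, 640 / w), places the result in the top-left corner, pads the rest with 114, and transposes HWC to CHW. A checkpoint trained with those loaders has therefore seen BGR planes and letterboxed frames. The stdlib sketch below reproduces that layout from a tightly packed BGRA buffer (nearest-neighbor resize for brevity, where YOLOX uses bilinear), mainly to make the plane order and padding explicit:

```python
from typing import List, Sequence, Tuple


def letterbox_bgr_chw(bgra: Sequence[int], width: int, height: int,
                      size: int = 640, pad: float = 114.0) -> Tuple[List[float], float]:
    """Return (flat CHW floats in B, G, R plane order, scale r) for a row-major BGRA buffer.

    Assumes rows are tightly packed (no bytes-per-row padding). Values stay 0-255.
    """
    r = min(size / height, size / width)
    new_w, new_h = int(width * r), int(height * r)
    plane = size * size
    out = [pad] * (3 * plane)  # unfilled bottom/right area keeps the 114 padding value
    for y in range(new_h):
        src_y = min(int(y / r), height - 1)
        for x in range(new_w):
            src_x = min(int(x / r), width - 1)
            p = (src_y * width + src_x) * 4  # 4 bytes per BGRA pixel
            i = y * size + x
            out[i] = float(bgra[p])                  # B
            out[plane + i] = float(bgra[p + 1])      # G
            out[2 * plane + i] = float(bgra[p + 2])  # R
    return out, r


def to_original(box: Tuple[float, float, float, float], r: float) -> Tuple[float, ...]:
    """Map a box from letterboxed input coords back to the source frame (no offset: padding is bottom/right)."""
    return tuple(v / r for v in box)
```

Because the padding sits at the bottom and right, mapping detections back to the camera frame is just a division by r.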

For deployment, I used CoreML’s INT8 quantization (coremltools.optimize.coreml.linear_quantize_weights). This shrinks the model by about 50% with minimal accuracy loss. The small model went from ~17MB to 8.7MB, and inference time improved slightly.
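The roughly 50% shrink is what you’d expect from storing each weight in one byte instead of the two bytes of 16-bit floats that ML Program models use by default, plus a small scale factor per tensor or per channel. As a stdlib illustration of the underlying arithmetic, here is symmetric linear quantization of one weight tensor with a single per-tensor scale. coremltools’ linear_quantize_weights has its own modes and granularity options, so treat this as the idea, not its implementation:

```python
from typing import List, Sequence, Tuple


def quantize_int8_symmetric(weights: Sequence[float]) -> Tuple[List[int], float]:
    """Map floats to int8 values in [-127, 127] with one scale, so that w ≈ q * scale."""
    max_abs = max((abs(w) for w in weights), default=0.0)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q: Sequence[int], scale: float) -> List[float]:
    return [v * scale for v in q]


w = [0.5, -1.0, 0.25, 0.0, 0.013]
q, scale = quantize_int8_symmetric(w)
err = max(abs(a - b) for a, b in zip(w, dequantize(q, scale)))
print(q, scale, err)  # worst-case error is about half a quantization step
```

Rounding error per weight is bounded by half a step (scale / 2), which is why accuracy loss is usually small when a tensor’s values are not dominated by a few outliers.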

Real World Results (Round 1)

I exported the nanoish model and got it running in SkySpottr. The good news: it works. The ML snap feature locks onto aircraft, the orange verification circles appear, and inference is fast enough that I don’t notice any lag.

The less good news: false positives. Trees, parts of houses, certain cloud formations – the model occasionally thinks these are aircraft. Remember that rookie mistake about no negative samples? Yeah.

I later set up a 3-way comparison to visualize exactly this kind of failure. The three panels show my COCO-only trained model (red boxes), a later model trained on self-sourced images (green boxes – I’ll explain this pipeline shortly), and YOLO26-X as a ground truth oracle (right panel, no boxes means no detection). The COCO-only model confidently detects an “aircraft” that is… a building. The other two correctly ignore it.

False positive comparison - COCO-only model detects a building as aircraft

The app handles this gracefully because of the matching threshold. Random false positives in empty sky don’t trigger the snap because there’s no predicted aircraft nearby to match against. But when there’s a tree branch right next to where a plane should be, the model sometimes locks onto the wrong thing.

The even less good news: it still struggles with truly distant aircraft. A plane at 35,000 feet that’s 50+ miles away is basically a single bright pixel. No amount of ML is going to reliably detect that. For those, the app falls back on pure ADS-B prediction, which is usually good enough to get the overlay in the right general area.

But when it works, it works. I’ll show some examples of successful detections in the self-sourced section below.

The Memory Leak Discovery (Fun Debugging Tangent)

While testing the YOLOX integration, I was also trying to get RevenueCat working for subscriptions. Had the app running for about 20 minutes while I debugged the in-app purchase flow. Noticed it was getting sluggish, opened Instruments, and… yikes.

Base memory for the app is around 200MB. After 20 minutes of continuous use, it had climbed to 450MB. Classic memory leak pattern.

The culprit was AI-induced and AI-resolved: the code was creating a new CVPixelBuffer and MLMultiArray for every single frame. At 15fps, that’s 900 frames per minute, each with two fresh allocations that weren’t getting cleaned up fast enough.

The fix was straightforward – use a CVPixelBufferPool for the resize buffers and pre-allocate a single MLMultiArray that gets reused. Memory now stays flat even after hours of use.
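The pattern behind that fix, reduced to a stdlib Python toy: allocate a few buffers once, hand one out per frame, and take it back afterwards instead of allocating fresh memory every frame. FramePool is a hypothetical illustration of what CVPixelBufferPool and a pre-allocated MLMultiArray buy you, not their API:

```python
from collections import deque


class FramePool:
    """Reuse a fixed set of preallocated byte buffers instead of allocating per frame."""

    def __init__(self, buffer_size: int, count: int = 3):
        self.buffer_size = buffer_size
        self._free = deque(bytearray(buffer_size) for _ in range(count))
        self.allocations = count  # total buffers ever created

    def acquire(self) -> bytearray:
        if self._free:
            return self._free.popleft()
        self.allocations += 1  # only grows when every buffer is in flight
        return bytearray(self.buffer_size)

    def release(self, buf: bytearray) -> None:
        self._free.append(buf)


pool = FramePool(640 * 640 * 4)
for _ in range(900):  # one minute of frames at 15fps
    buf = pool.acquire()
    # ... render the resized frame into buf and run inference ...
    pool.release(buf)
print(pool.allocations)  # 3
```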

(The RevenueCat thing? I ended up ditching it entirely and going with native StoreKit2. RevenueCat is great, but keeping debug and release builds separate was more hassle than it was worth for a side project. StoreKit2 is actually pretty nice these days if you don’t need the analytics. I’m at ~80 downloads, and not a single purchase. First paid app still needs some fine tuning, clearly, on the whole freemium thing.)

Round 2: Retraining with Negative Samples

After discovering the false positive issue, I went back and retrained. This time I made sure to include images without aircraft – random sky photos, clouds, trees, buildings, just random COCO2017 stuff. The model needs to learn what’s NOT an airplane just as much as what IS one.

Here’s the extraction script that handles the negative sampling. The key insight: you need to explicitly tell the model what empty sky looks like:

import json
import random

AIRPLANE_CATEGORY_ID = 5  # "airplane" category id in COCO annotation files

def extract_airplane_dataset(split="train", negative_ratio=0.2, seed=42):
    """Extract airplane images from COCO, with negative samples."""
    rng = random.Random(seed)  # seeded so the negative sample is reproducible

    with open(f"instances_{split}2017.json") as f:
        coco_data = json.load(f)

    # Find all images WITH airplanes
    airplane_image_ids = set()
    for ann in coco_data['annotations']:
        if ann['category_id'] == AIRPLANE_CATEGORY_ID:
            airplane_image_ids.add(ann['image_id'])

    # Find images WITHOUT airplanes for negative sampling
    all_ids = {img['id'] for img in coco_data['images']}
    negative_ids = sorted(all_ids - airplane_image_ids)  # sorted for a stable sampling order

    # Negatives = 20% of the positive count (no airplanes = teach model what ISN'T a plane)
    num_negatives = min(int(len(airplane_image_ids) * negative_ratio), len(negative_ids))
    sampled_negatives = rng.sample(negative_ids, num_negatives)
    # ... copy images and annotations to output directory

I also switched from nanoish to the small model. The accuracy improvement on distant aircraft was worth the extra compute, and with INT8 quantization the inference time came in at around 5.6ms on an iPhone – way better than my napkin math predicted. Apple’s Neural Engine is impressive.

The final production model: YOLOX-Small, 640×640 input, INT8 quantized, ~8.7MB on disk. It runs at 15fps with plenty of headroom for the rest of the app on my iPhone 17 Pro.

Round 3: Self-Sourced Images and Closing the Loop

So the model works, but it was trained entirely on COCO2017 – airport tarmac photos, stock images, that kind of thing. My app is pointing at the sky from the ground. Those are very different domains.

I added a debug flag to SkySpottr for my phone that saves every camera frame where the model fires a detection. Just flip it on, walk around outside for a while, and the app quietly collects real-world training data. Over a few weeks of casual use, I accumulated about 2,000 images from my phone.

The problem: these images don’t have ground truth labels. I’m not going to sit there and manually draw bounding boxes on 2,000 sky photos. So I used YOLO26-X (Ultralytics’ latest and greatest, which I’m fine using as an offline tool since it never ships in the app) as a teacher model. Run it on all the collected images, take its high-confidence detections as pseudo-labels, convert to COCO annotation format, and now I have a self-sourced dataset to mix in with the original COCO training data.

Here’s the pseudo-labeling pipeline. First, run the teacher model on all collected images:

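from pathlib import Path

from tqdm import tqdm
from ultralytics import YOLO

AIRPLANE_CLASS_ID = 4  # "airplane" in the 80-class COCO indexing Ultralytics models use
image_paths = sorted(Path("collected_frames").glob("*.jpg"))  # hypothetical folder of saved <uuid>.jpg frames

model = YOLO("yolo26x.pt")  # Big model, accuracy over speed
detections = {}  # uuid -> {"image_path": ..., "detections": [...]}, consumed by convert_to_coco

for img_path in tqdm(image_paths, desc="Processing images"):
    results = model(str(img_path), conf=0.5, verbose=False)
    boxes = results[0].boxes
    airplane_boxes = boxes[boxes.cls == AIRPLANE_CLASS_ID]

    dets = []
    for box in airplane_boxes:
        x1, y1, x2, y2 = box.xyxy[0].cpu().numpy().tolist()
        dets.append({
            "bbox_xywh": [x1, y1, x2 - x1, y2 - y1],  # COCO format
            "confidence": float(box.conf[0]),
        })
    if dets:  # keep only frames where the teacher found an aircraft
        detections[img_path.stem] = {"image_path": str(img_path), "detections": dets}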

Then convert those detections to COCO annotation format so YOLOX can train on them:

import json
from pathlib import Path

from PIL import Image

def convert_to_coco(detections):
    """Convert YOLO26 detections to COCO training format."""
    coco_data = {
        "images": [], "annotations": [],
        "categories": [{"id": 1, "name": "airplane", "supercategory": "vehicle"}],
    }
    image_id, ann_id = 0, 0  # COCO ids must be unique per image and per annotation

    for uuid, data in detections.items():
        img_path = Path(data["image_path"])
        with Image.open(img_path) as img:
            width, height = img.size

        if width > 1024 or height > 1024:  # Skip oversized images
            continue

        image_id += 1
        coco_data["images"].append({"id": image_id, "file_name": f"{uuid}.jpg",
                                     "width": width, "height": height})

        for det in data["detections"]:
            ann_id += 1
            coco_data["annotations"].append({
                "id": ann_id, "image_id": image_id, "category_id": 1,
                "bbox": det["bbox_xywh"], "area": det["bbox_xywh"][2] * det["bbox_xywh"][3],
                "iscrowd": 0,
            })

    with open("instances_train.json", "w") as f:
        json.dump(coco_data, f)

Finally, combine both datasets in the training config using YOLOX’s ConcatDataset:

def get_dataset(self, cache=False, cache_type="ram"):
    from yolox.data import COCODataset, TrainTransform
    from yolox.data.datasets import ConcatDataset

    preproc = TrainTransform(max_labels=50, flip_prob=0.5, hsv_prob=1.0)

    # Original COCO aircraft dataset
    coco_dataset = COCODataset(data_dir=self.data_dir, json_file=self.train_ann,
                                img_size=self.input_size, preproc=preproc, cache=cache)

    # Self-sourced dataset (YOLO26-X validated)
    self_sourced = COCODataset(data_dir=self.self_sourced_dir, json_file=self.self_sourced_ann,
                                name="train", img_size=self.input_size, preproc=preproc, cache=cache)

    print(f"COCO aircraft images: {len(coco_dataset)}")
    print(f"Self-sourced images: {len(self_sourced)}")
    return ConcatDataset([coco_dataset, self_sourced])
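Conceptually, ConcatDataset just stitches the two index ranges together. PyTorch’s version (which YOLOX’s ConcatDataset subclasses) keeps cumulative sizes and bisects into them to find which dataset, and which local index, a global index belongs to. A stdlib sketch of that lookup, plus the share of samples each source contributes per epoch; the sizes are placeholders roughly matching the numbers in this post:

```python
import bisect
from itertools import accumulate
from typing import Sequence, Tuple


def locate(cumulative: Sequence[int], idx: int) -> Tuple[int, int]:
    """Map a global index to (dataset_index, local_index), as torch's ConcatDataset does."""
    ds = bisect.bisect_right(cumulative, idx)
    local = idx if ds == 0 else idx - cumulative[ds - 1]
    return ds, local


sizes = [3000, 100]  # placeholders: COCO aircraft subset, self-sourced set
cumulative = list(accumulate(sizes))  # [3000, 3100]
print(locate(cumulative, 2999), locate(cumulative, 3000))  # (0, 2999) (1, 0)
print([round(s / cumulative[-1], 3) for s in sizes])  # [0.968, 0.032]
```

With uniform sampling, a small self-sourced set is only a small slice of each epoch, which is worth keeping in mind when judging how much it can shift the model.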

Out of 2,000 images, YOLO26-X found aircraft in about 108 of them at a 0.5 confidence threshold – roughly a 5.4% hit rate, which makes sense since most frames are just empty sky between detections. I filtered out anything over 1024px and ended up with a nice supplementary dataset of aircraft-from-the-ground images.

The 3-way comparison images I showed earlier came from this pipeline. Here’s what successful detections look like – the COCO-only model (red), self-sourced model (green), and YOLO26-X (right panel, shown at full resolution so you can see what we’re actually detecting):

Aircraft detection comparison - all models detecting a plane in clear sky

That’s maybe 30 pixels of airplane against blue sky, detected with 0.88 and 0.92 confidence by the two YOLOX variants.

And here’s one I particularly like – aircraft spotted through pine tree branches. Real-world conditions, not a clean test image. Both YOLOX models nail it, while YOLO26-X misses at this confidence threshold:

Aircraft detection through pine tree branches

And a recent one from February 12, 2026 – a pair of what appear to be F/A-18s over Denver at 4:22 PM MST, captured at 12x zoom. The model picks up both jets at 73-75% confidence, plus the bird in the bottom-right at 77% (a false positive the app filters out via ADS-B matching). Not bad for specks against an overcast sky.

F/A-18 pair detected over Denver, CO - Feb 12, 2026

I also trained a full YOLOX-Large model (depth 1.0, width 1.0, 1024×1024 input) on the combined dataset, just to see how far I could push it. Too heavy for phone deployment, but useful for understanding the accuracy ceiling.

Conclusion

Was this worth it to avoid Ultralytics’ licensing? Yes, and since it took an afternoon and a couple of evenings of vibe-coding, switching wasn’t hard. It was worth it not just because MIT is cleaner than AGPL, but because I learned a ton about how these models actually work. The Ultralytics ecosystem is so polished that it’s easy to treat it as a black box. Building from YOLOX forced me to understand some of the nuances: the training configs and the tradeoffs between model size and accuracy.

Plus, I can now say I trained my own object detection model from scratch. That’s worth something at parties. Nerdy parties, anyway.

SkySpottr is live on the App Store if you want to see the model in action – point your phone at the sky and watch it lock onto aircraft in real-time.

The self-sourced pipeline is still running. Every time I use the app with the debug flag on, it collects more training data. The plan is to periodically retrain as the dataset grows – especially now that I’m getting images from different weather conditions, times of day, and altitudes. The COCO-only model was a solid start, but a model trained on actual ground-looking-up images of aircraft at altitude? That’s the endgame.


CS193p Spring 2021 Lecture 4 & Assignment 2

After lecture 4

A lot of this stuff still isn’t making a ton of sense to me. I really struggled with how to init the theme for assignment 2. The key was optionals, and how to do it came to me in the shower. I’m writing this post after doing the changes from Lecture 4, Assignment 2, and Lecture 5, so I don’t have a whole lot specific to lecture 4.

Code

The viewModel is now hooked up to both the View and the Model. This MVVM stuff is clicking for me, thankfully.

MemoryGame.swift – includes scoring:

//
//  MemoryGame.swift
//  Memorize
//
//  Created by Austin on 5/28/21.
//  austinsnerdythings.com

import Foundation

// model
struct MemoryGame<CardContent> where CardContent: Equatable {
    private(set) var cards: Array<Card>
    private var indexOfTheOneAndOnlyFaceUpCard: Int?
    private(set) var score = 0
    
    mutating func choose(_ card: Card) {
        if let chosenIndex = cards.firstIndex(where: { $0.id == card.id }),
           !cards[chosenIndex].isFaceUp,
           !cards[chosenIndex].isMatched
        {
            if let potentialMatchIndex = indexOfTheOneAndOnlyFaceUpCard {
                cards[chosenIndex].hasBeenSeenThisManyTimes += 1
                cards[potentialMatchIndex].hasBeenSeenThisManyTimes += 1
                if cards[chosenIndex].content == cards[potentialMatchIndex].content {
                    // match
                    cards[chosenIndex].isMatched = true
                    cards[potentialMatchIndex].isMatched = true
                    score += 2
                } else if cards[chosenIndex].hasBeenSeenThisManyTimes > 1 ||
                            cards[potentialMatchIndex].hasBeenSeenThisManyTimes > 1 {
                    // mismatch
                    score -= 1
                }
                indexOfTheOneAndOnlyFaceUpCard = nil
            } else {
                for index in cards.indices {
                    cards[index].isFaceUp = false
                }
                indexOfTheOneAndOnlyFaceUpCard = chosenIndex
            }
            
            cards[chosenIndex].isFaceUp.toggle()
        }
        print("\(cards)")
    }
    
    init(numberOfPairsOfCards: Int, createCardContent: (Int) -> CardContent) {
        cards = Array<Card>()
        // add number of pairs of cards x 2 cards to card array
        for pairIndex in 0..<numberOfPairsOfCards {
            let content: CardContent = createCardContent(pairIndex)
            cards.append(Card(content: content, id: pairIndex*2))
            cards.append(Card(content: content, id: pairIndex*2+1))

        }
        cards.shuffle()
    }
    
    struct Card: Identifiable {
        var isFaceUp: Bool = false
        var isMatched: Bool = false
        var content: CardContent
        var id: Int
        var hasBeenSeenThisManyTimes: Int = 0
    }
}
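For comparison, here is a small, hypothetical scoring helper. It assumes the assignment’s rule is “+2 for a match, -1 for each previously seen card involved in a mismatch” (a paraphrase, so treat it as an assumption). The choose(_:) code above subtracts a flat 1 when either card has been seen before, so the two versions only differ when both cards had already been seen. The function name and parameters are made up for illustration.

```swift
// Hypothetical helper (not part of the original MemoryGame): the score change
// for one two-card comparison, assuming "+2 for a match, -1 for each
// previously seen card involved in a mismatch".
// The seen-counts are taken *after* this comparison is counted, the same way
// choose(_:) increments hasBeenSeenThisManyTimes before checking, so a count
// above 1 means "this card had been seen before".
func scoreDelta(isMatch: Bool, chosenSeenCount: Int, otherSeenCount: Int) -> Int {
    if isMatch {
        return 2
    }
    // One point off per previously seen card, so up to -2 per mismatch.
    var penalty = 0
    if chosenSeenCount > 1 { penalty += 1 }
    if otherSeenCount > 1 { penalty += 1 }
    return -penalty
}

// Both cards seen before and still mismatched.
print(scoreDelta(isMatch: false, chosenSeenCount: 2, otherSeenCount: 3))  // prints -2
```

To adopt this rule, the else-if branch in choose(_:) would subtract one point per qualifying card instead of a single point.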

EmojiMemoryGame.swift – we’ve moved the theme stuff into its own struct/file

//
//  EmojiMemoryGame.swift
//  Memorize
//
//  Created by Austin on 5/28/21.
//  austinsnerdythings.com

import SwiftUI

// viewModel
class EmojiMemoryGame: ObservableObject {
    @Published private var gameModel: MemoryGame<String>
    private(set) var theme: Theme
    
    static func createMemoryGame(theme: Theme) -> MemoryGame<String> {
        let emojis: Array<String> = theme.emojis.shuffled()
        var cardsToShow = theme.numberOfPairsOfCards ?? Int.random(in: 3...theme.emojis.count)
        if cardsToShow > theme.emojis.count {
            cardsToShow = theme.emojis.count
        }
        return MemoryGame<String>(numberOfPairsOfCards: cardsToShow) { pairIndex in
            emojis[pairIndex]
        }
    }
    
    init(startingTheme: Theme? = nil) {
        let selectedTheme = startingTheme ?? themes.randomElement()!
        self.theme = selectedTheme
        gameModel = EmojiMemoryGame.createMemoryGame(theme: selectedTheme)
    }

    var cards: Array<MemoryGame<String>.Card> {
        return gameModel.cards
    }
	
    var score: Int {
        return gameModel.score
    }
    
    // MARK: - INTENTS
    func choose(_ card: MemoryGame<String>.Card) {
        gameModel.choose(card)
    }
    
    func startNewGame() {
        let newTheme = themes.randomElement()!
        self.theme = newTheme
        gameModel = EmojiMemoryGame.createMemoryGame(theme: newTheme)
    }
}
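The optionals trick boils down to two steps in createMemoryGame(theme:) and init(startingTheme:): use ?? to supply a fallback when the optional is nil, then clamp the result. Here is a standalone sketch of that pattern without SwiftUI; pairsToShow(requested:available:) is a hypothetical name, not part of the course code.

```swift
// Sketch of the optional-with-fallback pattern from createMemoryGame(theme:).
// pairsToShow(requested:available:) is a hypothetical helper for illustration.
func pairsToShow(requested: Int?, available: Int) -> Int {
    // nil means "the theme didn't specify", so fall back to a random count.
    // Assumes available >= 3, like the original 3...theme.emojis.count range.
    let wanted = requested ?? Int.random(in: 3...available)
    // Never ask for more pairs than there are emojis.
    return min(wanted, available)
}

print(pairsToShow(requested: 20, available: 12))  // prints 12
print(pairsToShow(requested: 5, available: 12))   // prints 5
```

The animals theme, which asks for 20 pairs but only has 12 emojis, is the case the clamp exists for.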

MemorizeApp.swift – added the viewModel argument to the init here

//
//  MemorizeApp.swift
//  Memorize
//
//  Created by Austin on 5/25/21.
//  austinsnerdythings.com

import SwiftUI

@main
struct MemorizeApp: App {
    let game = EmojiMemoryGame()
    
    var body: some Scene {
        WindowGroup {
            ContentView(viewModel: game)
        }
    }
}

Theme.swift

//
//  Theme.swift
//  Memorize
//
//  Created by Austin on 6/7/21.
//

import Foundation
import SwiftUI

struct Theme {
    var name: String
    var emojis: [String]
    var numberOfPairsOfCards: Int?
    var baseColor: Color
}

let themes: [Theme] = [
    Theme(name: "vehicles",
          emojis: ["?","?","?","?","?","?","?","?","?","?","?","?","?","✈️","?","?","?","?","?","?","?","?","?","?"],
          baseColor: Color.red),
    Theme(name: "fruits",
          emojis: ["?","?","?","?","?","?","?","?","?","?","?","?"],
          baseColor: Color.yellow),
    Theme(name: "animals",
          emojis: ["?","?","?","?","?","?","?","?","?","?","?","?"],
          numberOfPairsOfCards: 20,
          baseColor: Color.blue)
]
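Only the animals theme passes numberOfPairsOfCards. That works because an optional var property starts out as nil, and Swift’s synthesized memberwise initializer (Swift 5.1 and later) lets callers omit arguments for properties that already have a default. A simplified stand-in, with baseColor as a String instead of a SwiftUI Color so it runs without SwiftUI (SimpleTheme is a made-up name):

```swift
// Cut-down copy of Theme; baseColor is a String here so no SwiftUI is needed.
struct SimpleTheme {
    var name: String
    var emojis: [String]
    var numberOfPairsOfCards: Int?   // optional var: implicitly defaults to nil
    var baseColor: String
}

// numberOfPairsOfCards can be left out of the memberwise initializer...
let vehicles = SimpleTheme(name: "vehicles", emojis: ["🚗", "✈️"], baseColor: "red")
// ...or passed explicitly, in declaration order.
let animals = SimpleTheme(name: "animals", emojis: ["🐶"], numberOfPairsOfCards: 20, baseColor: "blue")

print(vehicles.numberOfPairsOfCards == nil)  // prints true
print(animals.numberOfPairsOfCards ?? 0)     // prints 20
```

Declaring it as let numberOfPairsOfCards: Int? instead would make the argument required, because let properties do not get that implicit nil default.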

ContentView.swift

//
//  ContentView.swift
//  Memorize - Stanford CS193p, Spring 2021
//  After assignment 2
//
//  Created by Austin from austinsnerdythings.com on 5/27/21.
//

import SwiftUI

// view
struct ContentView: View {
    @ObservedObject var viewModel: EmojiMemoryGame
    
    var body: some View {
        VStack {
            HStack {
                Text("Memorize!").font(.largeTitle)
                Spacer()
                HStack {
                    VStack {
                        Text(viewModel.theme.name).font(.title)
                        Text("Score: \(viewModel.score)")
                    }
                    Button("New Game") {
                        viewModel.startNewGame()
                    }
                }
            }
            ScrollView {
                LazyVGrid(columns: [GridItem(.adaptive(minimum: 80))]) {
                    ForEach(viewModel.cards) { card in
                        CardView(card: card)
                            .aspectRatio(2/3, contentMode: .fit)
                            .onTapGesture {
                                viewModel.choose(card)
                            }
                    }
                }
            }
            .foregroundColor(viewModel.theme.baseColor)
            .font(.largeTitle)
            .padding(.horizontal)
        }
    }
}

struct CardView: View {
    let card: MemoryGame<String>.Card
    var body: some View {
        ZStack {
            let shape = RoundedRectangle(cornerRadius: 20)
            if card.isFaceUp {
                shape.fill().foregroundColor(.white)
                shape.strokeBorder(lineWidth: 3)
                Text(card.content).font(.largeTitle)
            } else if card.isMatched {
                shape.opacity(0)
            } else {
                shape.fill()
            }
        }
    
    }
}

struct ContentView_Previews: PreviewProvider {
    static var previews: some View {
        let game = EmojiMemoryGame()
        ContentView(viewModel: game)
            .preferredColorScheme(.light)
        ContentView(viewModel: game)
            .preferredColorScheme(.dark)
    }
}

References

I gained some inspiration (and cleared up a lot of confusion) from two GitHub repos:

Conclusion

Still a slog. Still learning. If I were a Stanford student, CS193p Spring 2021 Lecture 4 is probably where I would start wondering whether to drop the class. The stuff from Lecture 5 (post coming up), where Professor Hegarty took 20 lines and shrank them to 2, is still a bit much for me. He says it improves readability. It does, but cramming everything into a single line makes debugging harder.

Categories
Programming

Learning Swift – CS193p Spring 2021 Lecture 3

After lecture 3

So lecture 3 really showed me where and why I had trouble learning Swift the first time around. The shortened closures and the whole “if a closure is the last argument to a function call, move it outside the parentheses and drop its label” thing (trailing closure syntax) don’t make it easy for people new to the language. Words and phrases that came to mind when I realized what was happening include “cute”, “nuanced”, and “too concise”. I wrote a whole post about it here. This lecture series will get me back on the right track to learn Swift. Link to after lecture 2/assignment post here.

Code

The code compiles. We haven’t hooked the new model or view model up to the view yet, so there are no UI updates in this post. This is the first post where the code is split across multiple files. (This means I should move to GitHub or something similar.)

MemoryGame.swift

//
//  MemoryGame.swift
//  Memorize
//
//  Created by Austin on 5/28/21.
//  austinsnerdythings.com

import Foundation

struct MemoryGame<CardContent> {
    private(set) var cards: Array<Card>
    
    func choose(_ card: Card) {
        // this is where the game logic will go
    }
    
    init(numberOfPairsOfCards: Int, createCardContent: (Int) -> CardContent) {
        cards = Array<Card>()
        // add number of pairs of cards x 2 cards to card array
        for pairIndex in 0..<numberOfPairsOfCards {
            let content: CardContent = createCardContent(pairIndex)
            cards.append(Card(content: content))
            cards.append(Card(content: content))

        }
    }
    
    struct Card {
        var isFaceUp: Bool = false
        var isMatched: Bool = false
        var content: CardContent
    }
}

EmojiMemoryGame.swift

//
//  EmojiMemoryGame.swift
//  Memorize
//
//  Created by Austin on 5/28/21.
//  austinsnerdythings.com

import SwiftUI


class EmojiMemoryGame {
    static let emojis = ["?","?","?","?","?","?","?","?","?","?","?","?","?","✈️","?","?","?","?","?","?","?","?","?","?"]
    
    static func createMemoryGame() -> MemoryGame<String> {
        return MemoryGame<String>(numberOfPairsOfCards: 4) { pairIndex in
            EmojiMemoryGame.emojis[pairIndex]
        }
    }
    
    private var model: MemoryGame<String> =
        MemoryGame<String>(numberOfPairsOfCards: 4) { _ in "A" }
    
    var cards: Array<MemoryGame<String>.Card> {
        return model.cards
    }
}

MemorizeApp.swift

//
//  MemorizeApp.swift
//  Memorize
//
//  Created by Austin on 5/25/21.
//  austinsnerdythings.com

import SwiftUI

@main
struct MemorizeApp: App {
    var body: some Scene {
        WindowGroup {
            ContentView()
        }
    }
}

ContentView.swift is basically the same as last lecture. It wasn’t touched much (if at all), so I won’t include it. CS193p Spring 2021 Lecture 3 was mostly about Swift concepts and MVVM, not the app’s view.

Conclusion

I am super glad I started watching these lectures so they could get me going in the right direction for Swift. Sure I’m a bit frustrated now because I realize where I went wrong, but I’m excited to get back to lecture 4 and the assignment after this long weekend.

Categories
Programming

Learning Swift – Confusion on Conciseness

Is Swift too concise for beginners?

This is not the first time I’ve tried to learn Swift. The first attempt was around October/November 2020. I followed Apple’s official Landmarks tutorial (called Creating and Combining Views) and things just did not click. I looked elsewhere for tutorials as well. My wife and I also had our first child, who was 3 months old at the time, so my brain wasn’t functioning at 100%. Regardless, I could follow the Landmarks tutorial but not really strike out on my own. The words I used to describe Swift to myself were “too cute”, “nuanced”, and things like that. After watching the Stanford CS193p Spring 2021 Lecture 3 video, a 2-minute section really cleared things up for me. My background is mostly C# with some Python, so that’s where I’m coming from.

Shortening things up

At 1:33:43 in the lecture, Professor Hegarty finishes taking two completely reasonable pieces of code (a function and the property that uses it) and chopping out more than half the characters. The result behaves exactly the same as the two longer pieces.

Before:

func makeCardContent(index: Int) -> String {
    return "A"
}
private var model: MemoryGame<String> =
    MemoryGame<String>(numberOfPairsOfCards: 4, createCardContent: makeCardContent)

Middle:

private var model: MemoryGame<String> =
    MemoryGame<String>(numberOfPairsOfCards: 4, createCardContent: {(index: Int) -> String in
        return "A"
    })

After:

private var model: MemoryGame<String> =
    MemoryGame<String>(numberOfPairsOfCards: 4) { _ in "A" }

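The theme of the code reduction is ‘take out things Swift already knows’ (the parameter and return types, the return keyword, the unused parameter name) plus ‘if the last argument of a function is itself a function, move it outside the parentheses and drop its label’ (trailing closure syntax). Let’s examine that for a minute.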

The last-argument rule is really an if-then that you need to perform mentally while writing code. When you’re learning a language, it isn’t easy to figure out what’s going on when all the code snippets are already fully reduced. Maybe I missed a key page in the documentation, but this wasn’t made clear in any of the learning material I tried. It could also just be that I don’t understand how functional programming is supposed to work.
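To make the three stages concrete, here is a self-contained sketch with a cut-down stand-in for MemoryGame (just the initializer, none of the game logic from the course). All three calls build the same cards; the comments name what each step removes.

```swift
// Simplified stand-in for the course's MemoryGame: only the initializer.
struct MemoryGame<CardContent> {
    private(set) var cards: [CardContent] = []

    init(numberOfPairsOfCards: Int, createCardContent: (Int) -> CardContent) {
        for pairIndex in 0..<numberOfPairsOfCards {
            let content = createCardContent(pairIndex)
            cards.append(content)
            cards.append(content)
        }
    }
}

// 1. Before: a named function passed as the last argument.
func makeCardContent(index: Int) -> String {
    return "A"
}
let before = MemoryGame<String>(numberOfPairsOfCards: 4, createCardContent: makeCardContent)

// 2. Middle: the same function written inline as a closure, types spelled out.
let middle = MemoryGame<String>(numberOfPairsOfCards: 4, createCardContent: { (index: Int) -> String in
    return "A"
})

// 3. After: trailing closure. The last argument moves outside the parentheses
//    and loses its label, (Int) -> String is inferred, _ ignores the unused
//    index, and a single-expression closure returns its value without "return".
let after = MemoryGame<String>(numberOfPairsOfCards: 4) { _ in "A" }

print(before.cards == middle.cards && middle.cards == after.cards)  // prints true
```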

That same code in C# (at least in the versions I use) would be much clearer to read. Everything would be spelled out unless you declare a variable with var, which tells the compiler to infer its type. Being able to decide is nice.

When does concise become confusing?

All that said, I still think “cute” and “nuanced” are appropriate for describing Swift (at least SwiftUI). It tries to be cute by cutting out things that other languages just leave in. The underscore (_), used when you don’t want an argument label or don’t need a parameter’s name, is another example of this. Why not just make every argument label optional unless it is specifically called out as necessary?
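Part of the confusion may be that _ does two different jobs in Swift: in a function declaration it removes the argument label, and in a closure it stands in for a parameter you ignore. A short sketch; twice, square, and alwaysA are made-up names.

```swift
// Default: the parameter name doubles as a required label. Call: twice(value: 4)
func twice(value: Int) -> Int {
    return value * 2
}

// _ before the parameter name opts out of the label. Call: square(4)
func square(_ value: Int) -> Int {
    return value * value
}

// In a closure, _ means "there is a parameter here, but I don't use it".
let alwaysA: (Int) -> String = { _ in "A" }

print(twice(value: 4))  // prints 8
print(square(4))        // prints 16
print(alwaysA(3))       // prints A
```

Labels are on by default and _ switches one off, which is roughly the reverse of the opt-in scheme the question above proposes.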

The other thing is the mix of Swift being a strongly typed language and relying on type inference. If it is strongly typed, we should need to specify the type basically everywhere. Leaving out the types and letting the compiler infer them seems to work really well (I know compilers are all much smarter than me), but it doesn’t help readability.
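For what it’s worth, Swift also lets you write the types out when you want them, much like choosing not to use var in C#. The inferred and explicit declarations below end up with identical, fixed types; inference only saves the typing. The variable names are illustrative.

```swift
// Inferred: Swift works out the types from the right-hand side.
let inferredCount = 4
let inferredNames = ["vehicles", "fruits"]

// Explicit: the same declarations with the types written out.
let explicitCount: Int = 4
let explicitNames: [String] = ["vehicles", "fruits"]

// Either way the types are fixed at compile time; for example,
// assigning a String to inferredCount would be a compile error.
print(type(of: inferredCount) == type(of: explicitCount))  // prints true
print(type(of: inferredNames) == type(of: explicitNames))  // prints true
```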

Conclusion

Are these valid criticisms? I don’t know. If a Swift expert wants to watch me (via screensharing) work on some basic cryptocurrency tracking app I have going for 30-60 minutes to answer my questions and help me learn Swift (I would pay $$$!), I would love that. Swift will make more sense the more I write it, I know that, but I’m left wondering if I’ll always have these thoughts. Beautiful Swift is indeed beautiful. I just need to figure out how to get there.

Categories
Programming

Learning Swift – CS193p Spring 2021 Assignment 1

Coming from Learning Swift (the programming language), I have completed Assignment 1 (including extra credit #1, but not #2). Most of the tasks were relatively straightforward. I did not do any of the reading.

The last post had a fully functional game working as described in lectures 1 & 2. This assignment built on that.

Some brief notes on the required tasks:

  1. Easy. Done on the last post.
  2. Also easy.
  3. Not too hard, just added a new struct called TitleView and plopped it before the ScrollView in the VStack
  4. I tried to figure out a way to have the emojis var populated with the contents of a different emoji list (e.g. vehicleEmojis, as in var emojis = vehicleEmojis), but wasn’t successful. This will probably be covered in lecture 3. I ended up just hardcoding the initial list to be the same values as the vehicleEmoji string array. The buttons themselves were pretty straightforward.
  5. The hardest part about this was figuring out how to use the emoji browser so I didn’t have to close and re-open it for every new emoji. I did vehicles, animals, and fruit.
  6. array.shuffled(), easy
  7. VStack the system images with text
  8. Okay, I suppose I can do that. I picked the fruit emojis before realizing there weren’t specific fruit symbols. In SF Symbols 2, there is a leaf, which is close enough. This seems a bit nit-picky.
  9. font(.body)
  10. ok
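On item 4: var emojis = vehicleEmojis fails because a stored property’s initial value can’t read another instance property, since property initializers run before self exists. One common fix is to make the lists static and refer to them through the type name, which is also how EmojiMemoryGame holds its emojis in the lecture 3 code. Below is a sketch using a plain struct in place of ContentView (EmojiPicker is hypothetical). In the view itself the equivalent would presumably be @State var emojis = ContentView.vehicleEmojis, which is untested here.

```swift
// Plain struct standing in for ContentView so the example runs without SwiftUI.
struct EmojiPicker {
    // static: the lists belong to the type, so they exist before any instance
    // does and can be used as another property's initial value.
    static let vehicleEmojis = ["🚗", "✈️", "🚀"]
    static let fruitEmojis = ["🍎", "🍌"]

    // With instance (non-static) lists, "var emojis = vehicleEmojis" does not
    // compile: property initializers run before self is available.
    var emojis = EmojiPicker.vehicleEmojis
}

var picker = EmojiPicker()
print(picker.emojis.count)  // prints 3

// Switching themes later works the same way as the theme buttons.
picker.emojis = EmojiPicker.fruitEmojis.shuffled()
print(picker.emojis.count)  // prints 2
```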

Extra credit:

  1. set emojiCount to be a random value from 4 (minimum specified in the task) up to yourEmojiListVar.count. easy. do this after shuffling though
  2. this seems like a lot of work for something I don’t really need to do yet
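One detail on extra credit #1: the code below uses Int.random(in: 4..<list.count). The ..< range excludes its upper bound, so the random count never equals the list length and the full set never appears. If “up to” is meant to include the whole list, a closed range (...) does that. A quick sketch with a made-up six-emoji list:

```swift
// Made-up list of six emojis, shuffled first as in the theme buttons.
let emojis = ["🚗", "✈️", "🚀", "🚲", "🚂", "🚁"].shuffled()

// Half-open range: upper bound excluded, so this is 4 or 5, never 6.
let halfOpenCount = Int.random(in: 4..<emojis.count)

// Closed range: upper bound included, so this is 4, 5, or 6.
let closedCount = Int.random(in: 4...emojis.count)

// Slice off the cards to show, like emojis[0..<emojiCount] in the view.
let shown = emojis[0..<closedCount]
print(halfOpenCount, closedCount, shown.count)
```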

Code

//
//  ContentView.swift
//  Memorize - Stanford CS193p, Spring 2021
//  After assignment 1
//
//  Created by Austin from austinsnerdythings.com on 5/27/21.
//

import SwiftUI

struct ContentView: View {
    var vehicleEmojis = ["?","?","?","?","?","?","?","?","?","?","?","?","?","✈️","?","?","?","?","?","?","?","?","?","?"]
    var animalEmojis = ["?","?","?","?","?","?","?","?","?","?","?","?"]
    var fruitEmojis = ["?","?","?","?","?","?","?","?","?","?","?","?"]
    
    @State var emojis = ["?","?","?","?","?","?","?","?","?","?","?","?","?","✈️","?","?","?","?","?","?","?","?","?","?"]
    @State var emojiCount: Int = 8

    var body: some View {
        VStack {
            TitleView()
            ScrollView {
                LazyVGrid(columns: [GridItem(.adaptive(minimum: 80))]){
                    ForEach(emojis[0..<emojiCount], id: \.self) { emoji in
                        CardView(content: emoji)
                            .aspectRatio(2/3, contentMode: .fit)
                    }
                }
            }
            .foregroundColor(.red)
            Spacer()
            HStack {
                vehicleTheme
                Spacer()
                animalTheme
                Spacer()
                fruitTheme
            }
            .font(.largeTitle)
            .padding(.horizontal)
        }
        .padding()

    }
    
    var remove: some View {
        Button {
            if emojiCount > 1 {
                emojiCount -= 1
            }
        } label: {
            Image(systemName: "minus.circle")
        }
    }
    
    var add: some View {
        Button {
            if emojiCount < emojis.count {
                emojiCount += 1
            }
        } label: {
            Image(systemName: "plus.circle")
        }
    }
    
    var vehicleTheme: some View {
        Button(action: {
            emojis = vehicleEmojis.shuffled()
            emojiCount = Int.random(in: 4..<vehicleEmojis.count)
        }, label: {
            VStack{
                Image(systemName: "car").font(.largeTitle)
                Text("Vehicles").font(.body)
            }
        })
    }
    var animalTheme: some View {
        Button(action: {
            emojis = animalEmojis.shuffled()
            emojiCount = Int.random(in: 4..<animalEmojis.count)
        }, label: {
            VStack{
                Image(systemName: "hare").font(.largeTitle)
                Text("Animals").font(.body)
            }
        })
    }
    var fruitTheme: some View {
        Button(action: {
            emojis = fruitEmojis.shuffled()
            emojiCount = Int.random(in: 4..<fruitEmojis.count)
        }, label: {
            VStack{
                Image(systemName: "leaf").font(.largeTitle)
                Text("Fruits").font(.body)
            }
        })
    }
}

struct CardView: View {
    var content: String
    @State var isFaceUp: Bool = true
    
    var body: some View {
        ZStack {
            let shape = RoundedRectangle(cornerRadius: 20)
            if isFaceUp {
                shape.fill().foregroundColor(.white)
                shape.strokeBorder(lineWidth: 3)
                Text(content).font(.largeTitle)
            } else {
                shape.fill()
            }
        }
        .onTapGesture {
            isFaceUp.toggle()
        }
    }
}

struct TitleView: View {
    var body: some View {
        HStack {
            Spacer()
            Text("Memorize!").font(.largeTitle)
            Spacer()
        }
    }
}

struct ContentView_Previews: PreviewProvider {
    static var previews: some View {
        ContentView()
            .preferredColorScheme(.light)
        ContentView()
            .preferredColorScheme(.dark)
    }
}


Next up – lecture 3

Link to post about lecture 3 here.