
RL-AWB: Deep Reinforcement Learning for Auto White Balance Correction in Low-Light Night-time Scenes

¹MediaTek Inc., ²National Taiwan University, ³National Yang Ming Chiao Tung University
*Indicates Equal Contribution
We present RL-AWB, the first framework that integrates reinforcement learning into automatic white balance for nighttime color constancy. Our approach fundamentally differs from existing paradigms by formulating AWB as a sequential decision-making problem, where an RL agent learns adaptive parameter selection policies for a novel statistical illuminant estimator.

Abstract


Nighttime color constancy remains a challenging problem in computational photography due to low-light noise and complex illumination conditions that limit cross-sensor generalization. We present RL-AWB, a novel framework combining statistical methods with deep reinforcement learning for nighttime white balance. Our method begins with a statistical algorithm tailored for nighttime scenes, integrating salient gray pixel detection with a novel illumination estimation scheme. Building on this foundation, we develop the first deep reinforcement learning approach for color constancy that leverages the statistical algorithm as its core, mimicking professional AWB tuning experts by dynamically optimizing parameters for each image without knowing its ground-truth illumination. To facilitate cross-sensor evaluation, we introduce the first multi-sensor nighttime color constancy dataset. Experimental results demonstrate that our method achieves superior generalization across both low-light and well-illuminated images.

Contributions

  • We develop SGP-LRD (Salient Gray Pixels with Local Reflectance Differences), a nighttime-specific color constancy algorithm that achieves state-of-the-art illumination estimation on public nighttime benchmarks (a minimal sketch of its pooling step follows this list).
  • We design the RL-AWB framework with Soft Actor-Critic (SAC) training and two-stage curriculum learning, enabling adaptive per-image parameter optimization with exceptional data efficiency.
  • We contribute LEVI (Low-light Evening Vision Illumination), the first multi-camera nighttime dataset comprising 700 images from two sensors, enabling rigorous cross-sensor color constancy evaluation.
  • Extensive experiments demonstrate superior cross-sensor generalization over state-of-the-art methods with only five training images per dataset.
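
To make the two SGP-LRD hyper-parameters concrete, here is a minimal sketch of the gray-pixel pooling step, assuming a per-pixel grayness score is already available: keep the N% grayest pixels and pool them with a Minkowski-p mean. The salient-gray-pixel and local-reflectance-difference scoring itself is not reproduced here; `gray_score`, `minkowski_illuminant`, and the parameter defaults are illustrative names and values, not the paper's implementation.

```python
import numpy as np

def minkowski_illuminant(img, gray_score, n_percent=10.0, p=4.0):
    """Pool the N% grayest pixels with a Minkowski-p mean to get an
    illuminant estimate (a sketch of SGP-LRD's final pooling step).

    img        : (H, W, 3) linear RGB image, non-negative values.
    gray_score : (H, W) per-pixel grayness score (lower = grayer);
                 stands in for the SGP-LRD scoring, which is omitted.
    """
    flat = img.reshape(-1, 3)
    scores = gray_score.ravel()
    k = max(1, int(scores.size * n_percent / 100.0))
    idx = np.argpartition(scores, k - 1)[:k]      # N% grayest pixels
    # Minkowski-p pooling per channel: p = 1 is the plain mean; larger
    # p weights bright pixels more heavily, approaching max-RGB.
    e = np.power(np.mean(np.power(flat[idx], p), axis=0), 1.0 / p)
    return e / np.linalg.norm(e)                  # unit-norm illuminant
```

In the full framework, `n_percent` and `p` are exactly the two knobs the RL agent tunes per image.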

Method Overview

**Overview of the proposed RL-AWB framework.** (A) Given an input image, the proposed nighttime color constancy algorithm SGP-LRD estimates the scene illuminant conditioned on two hyper-parameters (gray-pixel sampling percentage N and Minkowski order p). (B) A SAC agent selects parameter updates based on image statistics and current AWB settings. (C) The policy outputs one action per parameter; actions are sampled, squashed by tanh to [−1, 1], and rescaled to valid ranges. (D) The rescaled actions update the two hyper-parameters and are applied to SGP-LRD to produce the illuminant estimate. Repeat until the termination criterion is met.

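Read as code, the loop in panels (B)-(D) is short. The sketch below is a rough rendering under stated assumptions: `N_RANGE`/`P_RANGE`, the initial settings, the toy state vector, and the magnitude-based stopping rule are all placeholders for the paper's actual SAC policy inputs and termination criterion.

```python
import numpy as np

# Hypothetical valid ranges for the two SGP-LRD hyper-parameters.
N_RANGE = (0.1, 50.0)   # gray-pixel sampling percentage N
P_RANGE = (1.0, 10.0)   # Minkowski order p

def rescale(a, lo, hi):
    """Map a tanh-squashed action in [-1, 1] to the range [lo, hi]."""
    return lo + 0.5 * (a + 1.0) * (hi - lo)

def image_state(img, n, p):
    """Toy state: per-channel statistics plus the current AWB settings."""
    return np.concatenate([img.mean(axis=(0, 1)), img.std(axis=(0, 1)), [n, p]])

def rl_awb_episode(img, policy, estimator, max_steps=10, eps=1e-2):
    """One inference episode of the RL-AWB loop (panels A-D above).

    policy(state)         -> (2,) raw actions, one per parameter;
    estimator(img, n, p)  -> (3,) illuminant estimate (SGP-LRD stand-in).
    Both callables are placeholders for the learned SAC policy and the
    statistical estimator described in the paper.
    """
    n, p = 10.0, 4.0                       # assumed initial parameters
    est = estimator(img, n, p)
    for _ in range(max_steps):
        a = np.tanh(policy(image_state(img, n, p)))  # squash to [-1, 1]
        n = rescale(a[0], *N_RANGE)
        p = rescale(a[1], *P_RANGE)
        est = estimator(img, n, p)
        if np.max(np.abs(a)) < eps:        # stop once updates are negligible
            break
    return est
```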

LEVI Dataset

**Sample images from the proposed LEVI dataset with their corresponding Color Checker mask annotations.** The dataset captures diverse nighttime scenes with complex mixed lighting, low illumination, and high ISO conditions.


**Illuminant distribution and normalized mean luminance histogram over all the collected nighttime images in the LEVI and NCC datasets.** LEVI complements NCC by covering broader lighting conditions and containing more low-luminance nighttime images, offering a new benchmark for low-light color constancy evaluation.


Prior to our work, the NCC dataset was the only public nighttime color constancy benchmark, containing 513 images from a single camera. To enable cross-sensor evaluation, we introduce the Low-light Evening Vision Illumination (LEVI) dataset: the first multi-camera nighttime benchmark comprising 700 linear RAW images from two systems: iPhone 16 Pro (images #1–370, 4320×2160, 12-bit) and Sony ILCE-6400 (images #371–700, 6000×4000, 14-bit), with ISO ranging from 500 to 16,000. Each scene contains a Macbeth Color Checker with manual annotations. Ground-truth illuminants are computed as the median RGB values of non-saturated achromatic patches. All images are black-level corrected and converted to linear RGB.
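
As a rough illustration of the annotation protocol above (not LEVI's exact tooling), the sketch below takes per-patch masks for the checker's achromatic patches and returns the median RGB over non-saturated pixels; the saturation threshold `sat_level` and the unit-norm output are assumptions.

```python
import numpy as np

def gt_illuminant(raw, patch_masks, sat_level=0.98):
    """Ground-truth illuminant from a Macbeth Color Checker: median RGB
    over non-saturated pixels of the achromatic (gray-row) patches.

    raw         : (H, W, 3) black-level-corrected linear RGB in [0, 1].
    patch_masks : list of boolean (H, W) masks, one per annotated
                  achromatic patch.
    """
    samples = []
    for m in patch_masks:
        px = raw[m]                              # pixels of one gray patch
        px = px[np.all(px < sat_level, axis=1)]  # drop saturated pixels
        if len(px):
            samples.append(px)
    rgb = np.median(np.concatenate(samples), axis=0)  # per-channel median
    return rgb / np.linalg.norm(rgb)                  # unit-norm illuminant
```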

Results

**In-dataset evaluation results on the NCC and LEVI datasets.** Angular error in degrees. Note that images shown are gamma-corrected for visualization.

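For reference, the angular error used throughout is the angle between the estimated and ground-truth illuminant vectors, independent of their magnitudes:

```python
import numpy as np

def angular_error_deg(est, gt):
    """Recovery angular error in degrees between an estimated and a
    ground-truth illuminant, the metric reported in all tables."""
    est, gt = np.asarray(est, float), np.asarray(gt, float)
    cos = np.dot(est, gt) / (np.linalg.norm(est) * np.linalg.norm(gt))
    return np.degrees(np.arccos(np.clip(cos, -1.0, 1.0)))

# e.g. angular_error_deg([1.0, 1.0, 1.0], [0.9, 1.0, 1.1]) ~ 4.67 degrees
```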

**Cross-dataset evaluation results between the NCC and LEVI datasets.** Angular error in degrees. Note that $C^5$ (5) and $C^5$ (full) are both trained using the official implementation. $C^5$ (5) denotes the few-shot setting with only five training images per dataset, whereas $C^5$ (full) follows the original 3-fold protocol using all available training images in the datasets.


**Evaluation results on the Gehler-Shi dataset.** Angular error in degrees. $C^4$, $C^5$, and the proposed RL-AWB are trained on the NCC dataset and evaluated on the Gehler-Shi dataset. Compared with our SGP-LRD, the proposed RL-AWB framework achieves a reduction of 5.9% in the median angular error and 9.8% in the best-25% angular error, showing that RL-AWB generalizes well across low-light and well-illuminated images.

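The median and best-25% statistics quoted above follow the usual color-constancy convention: sort the per-image angular errors and average within quantiles. A minimal sketch:

```python
import numpy as np

def summarize_errors(errs):
    """Standard color-constancy summary statistics over per-image
    angular errors: mean, median, best-25% and worst-25% means."""
    e = np.sort(np.asarray(errs, float))
    q = max(1, len(e) // 4)
    return {
        "mean": e.mean(),
        "median": np.median(e),
        "best25": e[:q].mean(),     # mean over the lowest quartile
        "worst25": e[-q:].mean(),   # mean over the highest quartile
    }
```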

**Qualitative comparison of cross-dataset performance.** Angular error in degrees. Note that images shown are gamma-corrected for visualization.
