RL-AWB: Deep Reinforcement Learning for Auto White Balance Correction in Low-Light Night-time Scenes

arXiv 2025
1MediaTek, 2National Yang Ming Chiao Tung University
*Indicates Equal Contribution
We present RL-AWB, the first framework that integrates reinforcement learning into automatic white balance for nighttime color constancy. Our approach differs fundamentally from existing paradigms by formulating AWB as a sequential decision-making problem, in which an RL agent learns adaptive parameter-selection policies for a novel statistical illuminant estimator. Experiments show that our method achieves 12% lower median angular error than state-of-the-art methods on the Nighttime Color Constancy (NCC) dataset.

Abstract


Nighttime color constancy remains a challenging problem in computational photography due to low-light noise and complex illumination conditions that limit cross-sensor generalization. We present RL-AWB, a novel framework combining statistical methods with deep reinforcement learning for nighttime white balance. Our method begins with a statistical algorithm tailored for nighttime scenes, integrating salient gray pixel detection with novel illumination estimation. Building on this foundation, we develop the first deep reinforcement learning approach for color constancy that leverages the statistical algorithm as its core, mimicking professional AWB tuning experts by dynamically optimizing parameters for each image without knowing its ground-truth illumination. To facilitate cross-sensor evaluation, we introduce the first multi-sensor nighttime color constancy dataset. Experiments show that our method achieves 12% lower median angular error than state-of-the-art methods on the Nighttime Color Constancy (NCC) dataset.
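For context on how an illuminant estimate is consumed downstream, the snippet below sketches the standard von Kries diagonal correction that any AWB estimator feeds into. It is a minimal NumPy illustration; the function name and green-channel anchoring are our own conventions, not details taken from the paper.

```python
import numpy as np

def apply_white_balance(img_linear, illuminant_rgb):
    """von Kries diagonal correction: divide each channel of a linear RGB
    image (H x W x 3, values in [0, 1]) by the estimated illuminant,
    with gains anchored to the green channel."""
    ill = np.asarray(illuminant_rgb, dtype=np.float64)
    gains = ill[1] / ill                          # (g/r, 1, g/b) channel gains
    corrected = img_linear * gains[None, None, :]
    return np.clip(corrected, 0.0, 1.0)
```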

Contributions

  • We develop SGP-LRD (Salient Gray Pixels with Local Reflectance Differences), a nighttime-specific color constancy algorithm that achieves state-of-the-art illumination estimation on public nighttime benchmarks.
  • We design the RL-AWB framework with Soft Actor-Critic (SAC) training and two-stage curriculum learning, enabling adaptive per-image parameter optimization with exceptional data efficiency.
  • We contribute LEVI (Low-light Evening Vision Illumination), the first multi-camera nighttime dataset comprising 700 images from two sensors, enabling rigorous cross-sensor color constancy evaluation.
  • Extensive experiments demonstrate superior cross-sensor generalization over state-of-the-art with only 5 training images per dataset.

Method Overview

**Overview of the proposed RL-AWB framework.** (A) Given an input image, the proposed nighttime color constancy algorithm SGP-LRD estimates the scene illuminant conditioned on two hyper-parameters (gray-pixel sampling percentage N and Minkowski order p). (B) A SAC agent selects parameter updates based on image statistics and the current AWB settings. (C) The policy outputs one action per parameter; actions are sampled, squashed by tanh to [−1, 1], and rescaled to valid ranges. (D) The rescaled actions update the two hyper-parameters, which are fed back into SGP-LRD to produce a refined illuminant estimate. Steps (B)–(D) repeat until the termination criterion is met.

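A minimal sketch of steps (B)–(D) at inference time, assuming an already-trained policy: raw actions are squashed by tanh to [−1, 1], rescaled to the valid parameter ranges, and used to update (N, p) before SGP-LRD is re-run. The parameter ranges, initial values, state features, and helper names below are illustrative placeholders rather than the exact choices in the paper.

```python
import numpy as np

# Illustrative ranges for the two SGP-LRD hyper-parameters; the ranges and
# initial values here are assumptions, not the paper's settings.
N_RANGE = (0.1, 10.0)   # gray-pixel sampling percentage N (%)
P_RANGE = (1.0, 10.0)   # Minkowski order p

def rescale(a, lo, hi):
    """Map a tanh-squashed action in [-1, 1] to the valid range [lo, hi]."""
    return lo + 0.5 * (a + 1.0) * (hi - lo)

def run_episode(image, policy, estimator, max_steps=5):
    """One inference episode: the agent iteratively refines (N, p)."""
    N, p = 1.0, 4.0                                   # initial AWB settings
    illuminant = estimator(image, N, p)
    for _ in range(max_steps):
        # State = simple image statistics plus the current settings.
        state = np.concatenate([image.mean(axis=(0, 1)),
                                image.std(axis=(0, 1)),
                                [N, p]])
        a = np.tanh(policy(state))                    # squash sampled actions to [-1, 1]
        N = rescale(a[0], *N_RANGE)                   # rescale to valid ranges
        p = rescale(a[1], *P_RANGE)
        illuminant = estimator(image, N, p)           # re-run SGP-LRD with updated (N, p)
    return illuminant
```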

LEVI Dataset

**Sample images from the proposed LEVI dataset with their corresponding Color Checker mask annotations.** The dataset captures diverse nighttime scenes with complex mixed lighting, low illumination, and high ISO conditions.


**Illuminant distribution and normalized mean luminance histogram over all the collected nighttime images in the LEVI and NCC datasets.** LEVI complements NCC by covering broader lighting conditions and containing more low-luminance nighttime images, offering a new benchmark for low-light color constancy evaluation.

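The page does not spell out how the mean luminance statistic in the histogram above is computed; one plausible definition on linear RGB, using Rec. 709 luma weights as an assumption, is sketched below.

```python
import numpy as np

def normalized_mean_luminance(img_linear, white_level=1.0):
    """Mean luminance of a linear RGB image, normalized to [0, 1].
    The Rec. 709 weighting is an assumption for illustration."""
    y = (0.2126 * img_linear[..., 0] +
         0.7152 * img_linear[..., 1] +
         0.0722 * img_linear[..., 2])
    return float(y.mean() / white_level)
```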

Prior to our work, the NCC dataset was the only public nighttime color constancy benchmark, containing 513 images from a single camera. To enable cross-sensor evaluation, we introduce the Low-light Evening Vision Illumination (LEVI) dataset, the first multi-camera nighttime benchmark comprising 700 linear RAW images from two camera systems: iPhone 16 Pro (images #1–370, 4320×2160, 12-bit) and Sony ILCE-6400 (images #371–700, 6000×4000, 14-bit), with ISO ranging from 500 to 16,000. Each scene contains a Macbeth Color Checker with manual annotations. Ground-truth illuminants are computed as the median RGB values of the non-saturated achromatic patches. All images are black-level corrected and converted to linear RGB.
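A minimal sketch of the ground-truth extraction described above, assuming per-patch boolean masks for the achromatic patches: black-level correction, exclusion of saturated pixels, and the median RGB over the remaining samples. The saturation threshold and function signature are illustrative, not taken from the paper.

```python
import numpy as np

def ground_truth_illuminant(raw, black_level, white_level, gray_patch_masks,
                            sat_thresh=0.95):
    """Estimate the ground-truth illuminant from the color checker.

    raw: H x W x 3 RAW image; gray_patch_masks: list of boolean masks,
    one per achromatic patch. Threshold and helper names are placeholders."""
    img = (raw.astype(np.float64) - black_level) / (white_level - black_level)
    img = np.clip(img, 0.0, 1.0)                      # linear RGB in [0, 1]

    samples = []
    for mask in gray_patch_masks:
        patch = img[mask]                             # (num_pixels, 3)
        unsaturated = patch[(patch < sat_thresh).all(axis=1)]
        if len(unsaturated):
            samples.append(unsaturated)
    rgb = np.concatenate(samples, axis=0)
    ill = np.median(rgb, axis=0)                      # median RGB over patches
    return ill / np.linalg.norm(ill)                  # unit-norm illuminant
```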

Results

**Qualitative comparison of cross-dataset performance**. Angular error in degrees. Note that images shown are gamma-corrected for visualization.


**Qualitative comparison of cross-dataset performance**. Angular error in degrees. Note that images shown are gamma-corrected for visualization.

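For reference, the angular error reported above is the standard recovery angular error between estimated and ground-truth illuminants, and images are gamma-encoded only for display; both are sketched below (the gamma value and helper names are our own choices).

```python
import numpy as np

def angular_error_deg(est, gt):
    """Recovery angular error (degrees) between two illuminant vectors."""
    est, gt = np.asarray(est, float), np.asarray(gt, float)
    cos = np.dot(est, gt) / (np.linalg.norm(est) * np.linalg.norm(gt))
    return float(np.degrees(np.arccos(np.clip(cos, -1.0, 1.0))))

def to_display(img_linear, gamma=2.2):
    """Gamma-encode a linear RGB image for visualization only."""
    return np.clip(img_linear, 0.0, 1.0) ** (1.0 / gamma)
```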

BibTeX

@inproceedings{rlawb2025,
  title = {RL-AWB: Deep Reinforcement Learning for Auto White Balance Correction in Low-Light Night-time Scenes},
  author = {Yuan-Kang Lee and Kuan-Lin Chen and Chia-Che Chang and Yu-Lun Liu},
  booktitle = {under review},
  year = {2025},
  pages = {to appear}
}