i.hyper.preproc - GRASS GIS manual

NAME

i.hyper.preproc - General hyperspectral data preprocessing

KEYWORDS

SYNOPSIS

i.hyper.preproc

i.hyper.preproc --help

i.hyper.preproc [-bcqz] input=name output=name [polyorder=integer] [derivative_order=integer] [window_length=integer] [dr_method=string] [dr_components=integer] [dr_kernel=string] [dr_gamma=float] [dr_degree=integer] [dr_max_iter=integer] [dr_tol=float] [dr_alpha=float] [dr_l1_ratio=float] [dr_random_state=integer] [dr_chunk_size=integer] [dr_bands=string] [dr_export=name] [--overwrite] [--help] [--verbose] [--quiet] [--ui]

Flags:

-b: Apply baseline correction
-c: Apply continuum removal
-q: Interpolate missing values in valid bands
-z: Clamp negative values to zero
--overwrite: Allow output files to overwrite existing files
--help: Print usage summary
--verbose: Verbose module output
--quiet: Quiet module output
--ui: Force launching GUI dialog

Parameters:

input=name [required]: Input hyperspectral raster map
output=name [required]: Output preprocessed raster map
polyorder=integer: Polynomial order for Savitzky-Golay filter (0 = skip Savitzky-Golay); Default: 0
derivative_order=integer: Derivative order (0 = smoothing only); Default: 0
window_length=integer: Window length (must be odd number); Default: 11
dr_method=string: Dimensionality reduction method (linear or nonlinear); Options: pca, kpca, nystroem, fastica, truncatedsvd, nmf, sparsepca
dr_components=integer: Number of components to retain (PCA,KPCA,Nystroem,FastICA,TruncatedSVD,NMF,SparsePCA). 0 = automatic (up to 10 or number of bands); Default: 0
dr_kernel=string: Kernel type (used only for KPCA and Nystroem); Options: linear, rbf, poly, sigmoid; Default: rbf
dr_gamma=float: Kernel gamma (KPCA and Nystroem only); Default: 0.01
dr_degree=integer: Polynomial degree (used if kernel=poly); Default: 3
dr_max_iter=integer: Maximum iterations for convergence (FastICA,NMF,SparsePCA); Default: 200
dr_tol=float: Convergence tolerance (FastICA,NMF,SparsePCA); Default: 1e-4
dr_alpha=float: Regularization strength (NMF,SparsePCA); Default: 0.0
dr_l1_ratio=float: L1 ratio in [0,1] (NMF,SparsePCA); Default: 0.0
dr_random_state=integer: Random seed for reproducibility (PCA,FastICA,NMF,SparsePCA,TruncatedSVD); Default: 0
dr_chunk_size=integer: Number of spectra per chunk for dimensionality reduction (0 = automatic; KPCA is approximated if chunked); Default: 0
dr_bands=string: Wavelength intervals or single values to include before reduction (e.g., 400–700,850–1300,2200)
dr_export=name: Optional path to export fitted reduction model (.pkl) for reuse

DESCRIPTION

i.hyper.preproc performs preprocessing of hyperspectral data stored as a 3D raster map (raster_3d). It is designed to improve data quality, suppress noise, and transform the spectral dimension into representations better suited for scientific analysis and machine learning workflows.

The module operates directly on hyperspectral cubes imported with i.hyper.import or other compatible 3D raster datasets. All transformations are performed along the spectral (z) dimension for each spatial position (x, y).

Preprocessing steps can be chained into a pipeline. Each step is enabled through its own option or flag: a nonzero polyorder enables Savitzky–Golay filtering, -b enables baseline correction, -c enables continuum removal, and dr_method selects a dimensionality reduction method. The enabled steps run sequentially, and the module prints the resulting sequence to the console (for example: Savitzky–Golay → Baseline correction → Continuum removal → PCA), giving a clear overview of the operations applied and their order.

i.hyper.preproc is part of the i.hyper module family and provides a reproducible, modular framework for spectral preprocessing prior to feature extraction, classification, or regression. All output maps are 3D rasters (raster_3d) compatible with the rest of the i.hyper suite.

FUNCTIONALITY

The following preprocessing methods are supported:

Savitzky–Golay (polyorder, derivative_order, window_length) – Polynomial smoothing and derivative computation to reduce spectral noise and enhance absorption features.
Baseline correction (-b) – Removes global trends or offsets in reflectance curves.
Continuum removal (-c) – Normalizes spectra to their convex hull to highlight relative absorption depths.
Principal Component Analysis (pca) – Linear dimensionality reduction using eigen decomposition of covariance.
Kernel PCA (kpca) – Nonlinear dimensionality reduction using kernel functions (RBF, polynomial, sigmoid).
Nystroem approximation (nystroem) – Scalable approximation of Kernel PCA using a low-rank kernel mapping followed by PCA compression. Provides nonlinear feature extraction suitable for large hyperspectral cubes.
Fast Independent Component Analysis (fastica) – Separates statistically independent spectral sources or mixtures.
Truncated Singular Value Decomposition (truncatedsvd) – Linear dimensionality reduction preserving dominant singular vectors (useful for sparse data).
Non-negative Matrix Factorization (nmf) – Decomposes spectra into additive non-negative basis components.
Sparse Principal Component Analysis (sparsepca) – PCA variant enforcing sparsity on component loadings for interpretability.
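
As an illustration of the Savitzky–Golay step, the following is a minimal sketch using scipy.signal.savgol_filter on a synthetic cube. The array layout (bands, rows, cols) and the variable names are assumptions made for the example; it does not reproduce the module's internal code.

```python
import numpy as np
from scipy.signal import savgol_filter

# Hypothetical cube with shape (bands, rows, cols); the module itself reads a 3D raster.
rng = np.random.default_rng(0)
bands = np.linspace(400.0, 2500.0, 60)
clean = np.sin(bands / 300.0)[:, None, None] * np.ones((1, 4, 5))
cube = clean + rng.normal(scale=0.05, size=clean.shape)

# Smoothing only: corresponds to window_length=7 polyorder=3 derivative_order=0.
# axis=0 filters along the spectral dimension, leaving the spatial axes untouched.
smoothed = savgol_filter(cube, window_length=7, polyorder=3, deriv=0, axis=0)

# First derivative per band step (delta defaults to 1.0, i.e. band index units).
first_deriv = savgol_filter(cube, window_length=7, polyorder=3, deriv=1, axis=0)
```

The window length must be odd and larger than the polynomial order, which matches the constraint stated for window_length.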

Multiple steps can be combined in one command by setting their options together. For example, polyorder=3 -b -c dr_method=kpca runs Savitzky–Golay filtering, baseline correction, continuum removal, and Kernel PCA in sequence. Intermediate rasters are handled internally and cleaned up automatically.
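
Continuum removal, one of the steps that can be part of such a pipeline, divides each spectrum by its upper convex hull. The following is a minimal NumPy sketch for a single spectrum, assuming strictly increasing wavelengths and positive reflectance. The function names are hypothetical and the code is not the module's implementation.

```python
import numpy as np

def upper_hull_indices(x, y):
    """Return indices of the upper convex hull of (x, y); x must be strictly increasing."""
    hull = []
    for i in range(len(x)):
        # Drop the last hull point while it lies on or below the chord
        # joining the point before it to the new point i.
        while len(hull) >= 2:
            a, b = hull[-2], hull[-1]
            cross = (x[b] - x[a]) * (y[i] - y[a]) - (y[b] - y[a]) * (x[i] - x[a])
            if cross >= 0:
                hull.pop()
            else:
                break
        hull.append(i)
    return np.array(hull)

def continuum_removed(wavelengths, spectrum):
    """Divide a spectrum by its convex-hull continuum (values end up in (0, 1])."""
    idx = upper_hull_indices(wavelengths, spectrum)
    continuum = np.interp(wavelengths, wavelengths[idx], spectrum[idx])
    return spectrum / continuum

wavelengths = np.array([400.0, 500.0, 600.0, 700.0, 800.0])
spectrum = np.array([0.20, 0.30, 0.15, 0.40, 0.35])
cr = continuum_removed(wavelengths, spectrum)
```

Bands that lie on the hull map to 1.0, and absorption features appear as values below 1.0 (here the 600 nm band).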

All dimensionality reduction methods are implemented using the scikit-learn library. For detailed algorithmic descriptions and parameter explanations, refer to the official scikit-learn documentation.

NOTES

The module works as a preprocessing pipeline engine. Each transformation acts along the spectral dimension only, so spatial alignment is fully preserved. Operations are reported in the console as a sequential pipeline.

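For every dimensionality reduction method (PCA, KPCA, Nystroem, FastICA, TruncatedSVD, NMF, SparsePCA), the number of output components is set with the dr_components parameter. The default value 0 selects the number automatically (up to 10 or the number of bands).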
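
The effect of dr_components on the output shape can be sketched with scikit-learn's PCA. The cube layout (bands, rows, cols), the variable names, and the reading of the automatic default as min(10, number of bands) are assumptions made for this example, not the module's confirmed behavior.

```python
import numpy as np
from sklearn.decomposition import PCA

# Hypothetical cube with shape (bands, rows, cols).
rng = np.random.default_rng(0)
cube = rng.random((50, 8, 6))
n_bands, n_rows, n_cols = cube.shape

# One row per pixel, one column per band.
X = cube.reshape(n_bands, -1).T

# Assumed interpretation of dr_components=0: up to 10, capped by the band count.
dr_components = 0
n_components = dr_components if dr_components > 0 else min(10, n_bands)

pca = PCA(n_components=n_components, random_state=0)
scores = pca.fit_transform(X)  # shape (pixels, components)

# Back to a cube whose depth is the number of retained components.
reduced = scores.T.reshape(n_components, n_rows, n_cols)
```

This mirrors Example 2, where a 234-band input would yield an output with 10 levels in the z dimension.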

Chunked dimensionality reduction:
Large hyperspectral datasets can be processed in smaller portions using the dr_chunk_size option. This enables dimensionality reduction on datasets exceeding system memory capacity. When dr_chunk_size is used with kernel-based methods (e.g., KPCA), the algorithm operates as an approximation of the full kernel mapping, trading some precision for scalability.
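
The idea of fitting a reduction model chunk by chunk can be sketched with scikit-learn's IncrementalPCA. This is a stand-in technique chosen for illustration: how the module chunks data internally is not documented here, and the array sizes and names below are hypothetical.

```python
import numpy as np
from sklearn.decomposition import IncrementalPCA

# Hypothetical matrix of spectra: one row per pixel, one column per band.
rng = np.random.default_rng(0)
X = rng.random((2000, 40))

chunk_size = 500   # plays the role of dr_chunk_size in this sketch
n_components = 5

# First pass: update the model one chunk at a time, so only a single
# chunk needs to be held in memory during fitting.
ipca = IncrementalPCA(n_components=n_components)
for start in range(0, X.shape[0], chunk_size):
    ipca.partial_fit(X[start:start + chunk_size])

# Second pass: project each chunk with the fitted model.
reduced = np.vstack([
    ipca.transform(X[start:start + chunk_size])
    for start in range(0, X.shape[0], chunk_size)
])
```

Each chunk passed to partial_fit must contain at least n_components spectra.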

Model export and reuse:
Trained dimensionality reduction models can be exported using the dr_export option. The exported model (in .pkl format) can be reused to transform other spectra—such as field or laboratory measurements from a spectroradiometer—into the same reduced feature space. This allows consistent feature alignment between image-derived data and point spectra, facilitating integrated machine learning and spectral modeling workflows.
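
The export and reuse round trip can be sketched as follows. The sketch assumes the exported file holds a (kernel map, PCA) pair, as the Nystroem example in the EXAMPLES section suggests; the synthetic data, parameter values, and file name are hypothetical.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.decomposition import PCA
from sklearn.kernel_approximation import Nystroem

# Hypothetical training spectra (image pixels) and new field spectra,
# both with the same 40 bands in the same order and scaling.
rng = np.random.default_rng(0)
image_spectra = rng.random((300, 40))
field_spectra = rng.random((5, 40))

# Fit a kernel feature map followed by a PCA compressor.
feature_map = Nystroem(kernel="rbf", gamma=0.01, n_components=50, random_state=0)
Z = feature_map.fit_transform(image_spectra)
pca_after = PCA(n_components=10, random_state=0).fit(Z)

# Save both fitted objects together in one .pkl file.
model_path = os.path.join(tempfile.mkdtemp(), "nystroem_model.pkl")
joblib.dump((feature_map, pca_after), model_path)

# Later, or in another process: reload and project the field spectra
# into the same reduced feature space.
loaded_map, loaded_pca = joblib.load(model_path)
field_reduced = loaded_pca.transform(loaded_map.transform(field_spectra))
```

Because the reloaded objects carry the fitted parameters, the field spectra land in the same feature space as the image-derived components.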

Results can be directly used by i.hyper.explore, i.hyper.composite, or exported with i.hyper.export for further analysis.

EXAMPLES

# Example 1: Savitzky–Golay smoothing (basic denoising)

# Set the region
g.region raster_3d=prisma

# Perform Savitzky–Golay smoothing with a window of 7 bands and a polynomial order of 3
i.hyper.preproc input=prisma output=prisma_savgol \
                window_length=7 polyorder=3

# Console output:
Savitzky–Golay
Loading floating point  data with 4  bytes ...  (1254x1222x234)

# Example 2: PCA transformation

# Set the region
g.region raster_3d=enmap

# Perform PCA
# Interpolate missing values in valid bands (-q)
i.hyper.preproc input=enmap output=enmap_pca \
                dr_method=pca dr_components=10 -q

# Console output:
PCA
Interpolating missing values across spectral bands...
Loading floating point  data with 4  bytes ...  (1263x1127x10)

# Example 3.1: Combined preprocessing pipeline

# Set the region
g.region raster_3d=tanager

# Savitzky–Golay derivative + baseline correction + continuum removal + Nystroem
# Interpolate missing values in valid bands (-q)
# Process the hyperspectral 3D map in chunks and export the fitted Nystroem model
i.hyper.preproc input=tanager output=tanager_ml \
                polyorder=3 derivative_order=1 window_length=9 \
                -b -c -q \
                dr_method=nystroem dr_components=30 \
                dr_chunk_size=5000 \
                dr_export=/models/tanager_nystroem.pkl

# Console output:
Savitzky–Golay → Baseline correction → Continuum removal → NYSTROEM
Interpolating missing values across spectral bands...
Loading floating point  data with 4  bytes ...  (869x804x426)

# Example 3.2: Using the exported Nystroem model in Python
import joblib
import numpy as np

# Load exported Nystroem model (kernel map + PCA compressor)
feature_map, pca_after = joblib.load("/models/tanager_nystroem.pkl")

# Load new field spectra (rows = samples, columns = wavelengths).
# The spectra must use the same wavelength order and scaling as the hyperspectral 3D map.
spectra = np.loadtxt("/data/field_spectra.txt")

# Apply the same nonlinear mapping and dimensionality reduction
Z = feature_map.transform(spectra)
spectra_reduced = pca_after.transform(Z)

DEPENDENCIES

NumPy – Core numerical operations and array manipulation.
SciPy – Signal processing.
scikit-learn – Machine learning algorithms for PCA, KPCA, FastICA, NMF, SparsePCA, TruncatedSVD, and Nystroem.

AUTHORS

Alen Mangafić and Tomaž Žagar, Geodetic Institute of Slovenia

SOURCE CODE

Available at: i.hyper.preproc source code (history)

Latest change: Monday Dec 08 15:45:48 2025 in commit: f5942b5bae39bd7a93b3710a683074f255cff2a6