v.class.mlR

Provides supervised support vector machine classification

Command line | Python (grass.tools) | Python (grass.script)

v.class.mlR [-fintp] [segments_map=name] [segments_layer=string] [training_map=name] [training_layer=string] [segments_file=name] [training_file=name] [training_sample_size=string] [tuning_sample_size=integer] [separator=character] [raster_segments_map=name] [classified_map=name] [train_class_column=string] output_class_column=string output_prob_column=string [max_features=integer] classifiers=string [,string,...] folds=integer partitions=integer tunelength=integer [tunegrids=string] weighting_modes=string [,string,...] weighting_metric=string [output_model_file=name] [input_model_file=name] [classification_results=name] [variable_importance_file=name] [accuracy_file=name] [model_details=name] [bw_plot_file=name] [r_script_file=name] [processes=integer] [--overwrite] [--verbose] [--quiet] [--qq] [--ui]

Example:

v.class.mlR segments_map=name output_class_column=vote output_prob_column=prob classifiers=svmRadial,rf folds=5 partitions=10 tunelength=10 weighting_modes=smv weighting_metric=accuracy

grass.tools.Tools.v_class_mlR(segments_map=None, segments_layer="1", training_map=None, training_layer="1", segments_file=None, training_file=None, training_sample_size=None, tuning_sample_size=None, separator="pipe", raster_segments_map=None, classified_map=None, train_class_column=None, output_class_column="vote", output_prob_column="prob", max_features=None, classifiers="svmRadial,rf", folds=5, partitions=10, tunelength=10, tunegrids=None, weighting_modes="smv", weighting_metric="accuracy", output_model_file=None, input_model_file=None, classification_results=None, variable_importance_file=None, accuracy_file=None, model_details=None, bw_plot_file=None, r_script_file=None, processes=1, flags=None, overwrite=None, verbose=None, quiet=None, superquiet=None)

Example:

from grass.tools import Tools

tools = Tools()
tools.v_class_mlR(segments_map="name", output_class_column="vote", output_prob_column="prob", classifiers="svmRadial,rf", folds=5, partitions=10, tunelength=10, weighting_modes="smv", weighting_metric="accuracy")

This grass.tools API is experimental in version 8.5 and expected to be stable in version 8.6.

grass.script.run_command("v.class.mlR", segments_map=None, segments_layer="1", training_map=None, training_layer="1", segments_file=None, training_file=None, training_sample_size=None, tuning_sample_size=None, separator="pipe", raster_segments_map=None, classified_map=None, train_class_column=None, output_class_column="vote", output_prob_column="prob", max_features=None, classifiers="svmRadial,rf", folds=5, partitions=10, tunelength=10, tunegrids=None, weighting_modes="smv", weighting_metric="accuracy", output_model_file=None, input_model_file=None, classification_results=None, variable_importance_file=None, accuracy_file=None, model_details=None, bw_plot_file=None, r_script_file=None, processes=1, flags=None, overwrite=None, verbose=None, quiet=None, superquiet=None)

Example:

import grass.script as gs

gs.run_command("v.class.mlR", segments_map="name", output_class_column="vote", output_prob_column="prob", classifiers="svmRadial,rf", folds=5, partitions=10, tunelength=10, weighting_modes="smv", weighting_metric="accuracy")

Parameters

Command line | Python (grass.tools) | Python (grass.script)

segments_map=name
    Vector map with areas to be classified
    Vector map containing all areas and relevant attributes
segments_layer=string
    Layer of the segments map where attributes are stored
    Vector features can have category values in different layers. This number determines which layer to use. When used with direct OGR access this is the layer name.
    Default: 1
training_map=name
    Vector map with training areas
    Vector map with training areas and relevant attributes
training_layer=string
    Layer of the training map where attributes are stored
    Vector features can have category values in different layers. This number determines which layer to use. When used with direct OGR access this is the layer name.
    Default: 1
segments_file=name
    File containing statistics of all segments
    File containing relevant attributes for all areas
training_file=name
    File containing statistics of training segments
    File containing relevant attributes for training areas
training_sample_size=string
    Size of subsample per class to be used for training
tuning_sample_size=integer
    Size of sample per class to be used for hyperparameter tuning
separator=character
    Field separator
    Field separator in input text files
    Default: pipe
raster_segments_map=name
    Raster map with segments
    Input raster map containing all segments
classified_map=name
    Prefix for raster maps (one per weighting mode) with classes attributed to pixels
    Output raster maps (one per weighting mode) in which all pixels are reclassed to the class attributed to the segment they belong to
train_class_column=string
    Name of attribute column containing training classification
output_class_column=string [required]
    Prefix of column with final classification
    Default: vote
output_prob_column=string [required]
    Prefix of column with probability of classification
    Default: prob
max_features=integer
    Perform feature selection to a maximum of max_features
classifiers=string [,string,...] [required]
    Classifiers to use
    Allowed values: svmRadial, svmLinear, svmPoly, rf, ranger, rpart, C5.0, knn, xgbTree
    Default: svmRadial,rf
folds=integer [required]
    Number of folds to use for cross-validation
    Default: 5
partitions=integer [required]
    Number of different partitions to use for cross-validation
    Default: 10
tunelength=integer [required]
    Number of levels to test for each tuning parameter
    Default: 10
tunegrids=string
    Python dictionary of customized tunegrids
weighting_modes=string [,string,...] [required]
    Type of weighting to use
    Allowed values: smv, swv, bwwv, qbwwv
    Default: smv
weighting_metric=string [required]
    Metric to use for weighting
    Allowed values: accuracy, kappa
    Default: accuracy
output_model_file=name
    File where to save model(s)
input_model_file=name
    Name of file containing an existing model
classification_results=name
    File for saving results of all classifiers
variable_importance_file=name
    File for saving relative importance of used variables
accuracy_file=name
    File for saving accuracy measures of classifiers
model_details=name
    File for saving details about the classifier module runs
bw_plot_file=name
    PNG file for saving box-whisker plot of classifier performance
r_script_file=name
    File containing R script
processes=integer
    Number of processes to run in parallel
    Default: 1
-f
    Only write results to text file, do not update vector map
-i
    Include individual classifier results in output
-n
    Normalize (center and scale) data before analysis
-t
    Only tune and train model, do not predict
-p
    Include class probabilities in classification results
--overwrite
    Allow output files to overwrite existing files
--help
    Print usage summary
--verbose
    Verbose module output
--quiet
    Quiet module output
--qq
    Very quiet module output
--ui
    Force launching GUI dialog

segments_map : str, optional
    Vector map with areas to be classified
    Vector map containing all areas and relevant attributes
    Used as: input, vector, name
segments_layer : str, optional
    Layer of the segments map where attributes are stored
    Vector features can have category values in different layers. This number determines which layer to use. When used with direct OGR access this is the layer name.
    Used as: input, layer
    Default: 1
training_map : str, optional
    Vector map with training areas
    Vector map with training areas and relevant attributes
    Used as: input, vector, name
training_layer : str, optional
    Layer of the training map where attributes are stored
    Vector features can have category values in different layers. This number determines which layer to use. When used with direct OGR access this is the layer name.
    Used as: input, layer
    Default: 1
segments_file : str | io.StringIO, optional
    File containing statistics of all segments
    File containing relevant attributes for all areas
    Used as: input, file, name
training_file : str | io.StringIO, optional
    File containing statistics of training segments
    File containing relevant attributes for training areas
    Used as: input, file, name
training_sample_size : str, optional
    Size of subsample per class to be used for training
tuning_sample_size : int, optional
    Size of sample per class to be used for hyperparameter tuning
separator : str, optional
    Field separator
    Field separator in input text files
    Used as: input, separator, character
    Default: pipe
raster_segments_map : str | np.ndarray, optional
    Raster map with segments
    Input raster map containing all segments
    Used as: input, raster, name
classified_map : str | type(np.ndarray) | type(np.array) | type(gs.array.array), optional
    Prefix for raster maps (one per weighting mode) with classes attributed to pixels
    Output raster maps (one per weighting mode) in which all pixels are reclassed to the class attributed to the segment they belong to
    Used as: output, raster, name
train_class_column : str, optional
    Name of attribute column containing training classification
output_class_column : str, required
    Prefix of column with final classification
    Default: vote
output_prob_column : str, required
    Prefix of column with probability of classification
    Default: prob
max_features : int, optional
    Perform feature selection to a maximum of max_features
classifiers : str | list[str], required
    Classifiers to use
    Allowed values: svmRadial, svmLinear, svmPoly, rf, ranger, rpart, C5.0, knn, xgbTree
    Default: svmRadial,rf
folds : int, required
    Number of folds to use for cross-validation
    Default: 5
partitions : int, required
    Number of different partitions to use for cross-validation
    Default: 10
tunelength : int, required
    Number of levels to test for each tuning parameter
    Default: 10
tunegrids : str, optional
    Python dictionary of customized tunegrids
weighting_modes : str | list[str], required
    Type of weighting to use
    Allowed values: smv, swv, bwwv, qbwwv
    Default: smv
weighting_metric : str, required
    Metric to use for weighting
    Allowed values: accuracy, kappa
    Default: accuracy
output_model_file : str, optional
    File where to save model(s)
    Used as: output, file, name
input_model_file : str | io.StringIO, optional
    Name of file containing an existing model
    Used as: input, file, name
classification_results : str, optional
    File for saving results of all classifiers
    Used as: output, file, name
variable_importance_file : str, optional
    File for saving relative importance of used variables
    Used as: output, file, name
accuracy_file : str, optional
    File for saving accuracy measures of classifiers
    Used as: output, file, name
model_details : str, optional
    File for saving details about the classifier module runs
    Used as: output, file, name
bw_plot_file : str, optional
    PNG file for saving box-whisker plot of classifier performance
    Used as: output, file, name
r_script_file : str, optional
    File containing R script
    Used as: output, file, name
processes : int, optional
    Number of processes to run in parallel
    Default: 1
flags : str, optional
    Allowed values: f, i, n, t, p
    f
        Only write results to text file, do not update vector map
    i
        Include individual classifier results in output
    n
        Normalize (center and scale) data before analysis
    t
        Only tune and train model, do not predict
    p
        Include class probabilities in classification results
overwrite : bool, optional
    Allow output files to overwrite existing files
    Default: None
verbose : bool, optional
    Verbose module output
    Default: None
quiet : bool, optional
    Quiet module output
    Default: None
superquiet : bool, optional
    Very quiet module output
    Default: None

Returns:

result : grass.tools.support.ToolResult | np.ndarray | tuple[np.ndarray] | None
If the tool produces text as standard output, a ToolResult object will be returned. Otherwise, None will be returned. If an array type (e.g., np.ndarray) is used for one of the raster outputs, the result will be an array and will have the shape corresponding to the computational region. If an array type is used for more than one raster output, the result will be a tuple of arrays.

Raises:

grass.tools.ToolError: When the tool ended with an error.

segments_map : str, optional
    Vector map with areas to be classified
    Vector map containing all areas and relevant attributes
    Used as: input, vector, name
segments_layer : str, optional
    Layer of the segments map where attributes are stored
    Vector features can have category values in different layers. This number determines which layer to use. When used with direct OGR access this is the layer name.
    Used as: input, layer
    Default: 1
training_map : str, optional
    Vector map with training areas
    Vector map with training areas and relevant attributes
    Used as: input, vector, name
training_layer : str, optional
    Layer of the training map where attributes are stored
    Vector features can have category values in different layers. This number determines which layer to use. When used with direct OGR access this is the layer name.
    Used as: input, layer
    Default: 1
segments_file : str, optional
    File containing statistics of all segments
    File containing relevant attributes for all areas
    Used as: input, file, name
training_file : str, optional
    File containing statistics of training segments
    File containing relevant attributes for training areas
    Used as: input, file, name
training_sample_size : str, optional
    Size of subsample per class to be used for training
tuning_sample_size : int, optional
    Size of sample per class to be used for hyperparameter tuning
separator : str, optional
    Field separator
    Field separator in input text files
    Used as: input, separator, character
    Default: pipe
raster_segments_map : str, optional
    Raster map with segments
    Input raster map containing all segments
    Used as: input, raster, name
classified_map : str, optional
    Prefix for raster maps (one per weighting mode) with classes attributed to pixels
    Output raster maps (one per weighting mode) in which all pixels are reclassed to the class attributed to the segment they belong to
    Used as: output, raster, name
train_class_column : str, optional
    Name of attribute column containing training classification
output_class_column : str, required
    Prefix of column with final classification
    Default: vote
output_prob_column : str, required
    Prefix of column with probability of classification
    Default: prob
max_features : int, optional
    Perform feature selection to a maximum of max_features
classifiers : str | list[str], required
    Classifiers to use
    Allowed values: svmRadial, svmLinear, svmPoly, rf, ranger, rpart, C5.0, knn, xgbTree
    Default: svmRadial,rf
folds : int, required
    Number of folds to use for cross-validation
    Default: 5
partitions : int, required
    Number of different partitions to use for cross-validation
    Default: 10
tunelength : int, required
    Number of levels to test for each tuning parameter
    Default: 10
tunegrids : str, optional
    Python dictionary of customized tunegrids
weighting_modes : str | list[str], required
    Type of weighting to use
    Allowed values: smv, swv, bwwv, qbwwv
    Default: smv
weighting_metric : str, required
    Metric to use for weighting
    Allowed values: accuracy, kappa
    Default: accuracy
output_model_file : str, optional
    File where to save model(s)
    Used as: output, file, name
input_model_file : str, optional
    Name of file containing an existing model
    Used as: input, file, name
classification_results : str, optional
    File for saving results of all classifiers
    Used as: output, file, name
variable_importance_file : str, optional
    File for saving relative importance of used variables
    Used as: output, file, name
accuracy_file : str, optional
    File for saving accuracy measures of classifiers
    Used as: output, file, name
model_details : str, optional
    File for saving details about the classifier module runs
    Used as: output, file, name
bw_plot_file : str, optional
    PNG file for saving box-whisker plot of classifier performance
    Used as: output, file, name
r_script_file : str, optional
    File containing R script
    Used as: output, file, name
processes : int, optional
    Number of processes to run in parallel
    Default: 1
flags : str, optional
    Allowed values: f, i, n, t, p
    f
        Only write results to text file, do not update vector map
    i
        Include individual classifier results in output
    n
        Normalize (center and scale) data before analysis
    t
        Only tune and train model, do not predict
    p
        Include class probabilities in classification results
overwrite : bool, optional
    Allow output files to overwrite existing files
    Default: None
verbose : bool, optional
    Verbose module output
    Default: None
quiet : bool, optional
    Quiet module output
    Default: None
superquiet : bool, optional
    Very quiet module output
    Default: None

DESCRIPTION

v.class.mlR is a wrapper module that uses the R caret package for machine learning in R to classify objects using training features by supervised learning.

The user provides a set of objects (or segments) to be classified, including all feature variables describing these objects, and a set of objects to be used as training data, including the same feature variables as those describing the unknown objects, plus one additional column indicating the class each training object falls into. The training data can be, but does not have to be, a subset of the set of objects to be classified.

The user can provide input either as vector maps (segments_map and training_map), as CSV files (segments_file and training_file), or as a combination of both. CSV files have to be formatted in line with the default output of v.db.select, i.e. with a header. The field separator can be set with the separator parameter. Output can consist of additional columns in the vector input map of features, a text file (classification_results), or reclassed raster maps (classified_map).

When using text file input, the training data should not contain an id column. The object data (i.e., full set of data to be classified) should have the ids in the first column.
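As a rough illustration of the text-file layout described above, the following Python sketch writes a small segments table (ids in the first column) and a training table (no id column, plus a class column) using the default pipe separator. The column names (cat, area, mean_red, class) and all values are made up for illustration; real files would normally be exported with v.db.select.

```python
import csv
import io

SEPARATOR = "|"  # corresponds to separator=pipe, the module default

# Hypothetical feature table: one row per segment, id in the first column.
segments = [
    {"cat": 1, "area": 120.0, "mean_red": 0.31},
    {"cat": 2, "area": 85.5, "mean_red": 0.74},
    {"cat": 3, "area": 240.0, "mean_red": 0.12},
]

# Hypothetical training table: same feature columns, no id, plus the class.
training = [
    {"area": 118.0, "mean_red": 0.29, "class": "forest"},
    {"area": 90.0, "mean_red": 0.70, "class": "urban"},
]


def to_text(rows, fieldnames):
    """Serialize rows with a header line and the chosen field separator."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=fieldnames, delimiter=SEPARATOR, lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


segments_text = to_text(segments, ["cat", "area", "mean_red"])
training_text = to_text(training, ["area", "mean_red", "class"])

print(segments_text)
print(training_text)
```

The two strings could then be written to disk and passed as segments_file and training_file, with train_class_column=class.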

The user has to provide the name of the column in the training data that contains the class values (train_class_column), the prefix of the columns that will contain the final class after classification (output_class_column) as well as the prefix of the columns that will contain the probability values linked to these classifications (output_prob_column - see below).

Several classifiers are available: k-nearest neighbor (knn), support vector machine with a radial kernel (svmRadial), support vector machine with a linear kernel (svmLinear), random forest (rf), C5.0 (C5.0) and XGBoost (xgbTree) decision trees, and recursive partitioning (rpart). The classifiers parameter also accepts svmPoly and ranger. Each of these classifiers is tuned automatically through repeated cross-validation. Caret will automatically determine a reasonable set of values for tuning. See the caret webpage for more information about the tuning parameters for each classifier, and more generally for information about how caret works. By default, the module creates 10 partitions of 5 folds each for cross-validation and tests 10 possible values of the tuning parameters. These values can be changed using, respectively, the partitions, folds and tunelength parameters.
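To make the default of 10 partitions of 5 folds concrete, here is a minimal Python sketch of repeated k-fold cross-validation. It only illustrates the general idea; caret performs the resampling internally and may, for instance, stratify by class, which this sketch does not do. The function name repeated_kfold is hypothetical.

```python
import random


def repeated_kfold(n_samples, folds=5, partitions=10, seed=0):
    """Yield (train_indices, test_indices) for each fold of each partition."""
    rng = random.Random(seed)
    for _ in range(partitions):
        # Each partition is a fresh random split of all samples into folds.
        indices = list(range(n_samples))
        rng.shuffle(indices)
        for k in range(folds):
            test = indices[k::folds]  # every folds-th index, starting at k
            held_out = set(test)
            train = [i for i in indices if i not in held_out]
            yield train, test


resamples = list(repeated_kfold(n_samples=20))
print(len(resamples))  # folds * partitions resamples per candidate tuning value
```

With the defaults, every candidate tuning value is therefore evaluated on 50 train/test splits, which is why tuning time grows quickly with folds, partitions and tunelength.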

The user can define a customized tunegrid for each classifier, using the tunegrids parameter. Any customized tunegrid has to be defined as a Python dictionary, with the classifiers as keys, and the input to expand.grid() as content as defined in the caret documentation.

For example, to define customized tuning grids for svmRadial and rf (random forest), the user can define the parameter as:

tunegrids="{'svmRadial': 'sigma=c(0.01,0.05,0.1), C=c(1,16,128)', 'rf': 'mtry=c(3,10,20)'}"
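When calling the module from Python, the tunegrids string can be generated from an ordinary dictionary rather than typed by hand. The sketch below assumes that the module accepts any valid Python dictionary literal; it reproduces the string shown above and checks that it evaluates back to the same dictionary.

```python
import ast

# Keys are classifier names; values are the arguments handed to expand.grid().
grids = {
    "svmRadial": "sigma=c(0.01,0.05,0.1), C=c(1,16,128)",
    "rf": "mtry=c(3,10,20)",
}

# repr() of a dict of strings is a valid Python dictionary literal.
tunegrids = repr(grids)
print(tunegrids)

# Sanity check: the string evaluates back to the same dictionary.
assert ast.literal_eval(tunegrids) == grids
```

The resulting string can be passed as the tunegrids parameter value.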

Tuning is potentially very time consuming. Using only a subset of the training data for tuning can thus speed up the process significantly, without losing much quality in the tuning results. For training, depending on the number of features used, some R functions can reach their capacity limit. The user can, therefore, define a maximum size of samples per class both for tuning (tuning_sample_size) and for training (training_sample_size).
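The idea of capping the number of samples per class can be sketched in plain Python as follows. This is only an illustration of the concept, not the module's R implementation; the function name sample_per_class and the example rows are hypothetical.

```python
import random
from collections import defaultdict


def sample_per_class(rows, class_key, max_per_class, seed=0):
    """Return at most max_per_class randomly chosen rows for each class."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for row in rows:
        by_class[row[class_key]].append(row)
    sample = []
    for label in sorted(by_class):
        members = by_class[label]
        # Classes smaller than the cap are kept in full.
        size = min(max_per_class, len(members))
        sample.extend(rng.sample(members, size))
    return sample


# Hypothetical training rows: 50 "forest" objects and 8 "water" objects.
rows = [{"id": i, "class": "forest"} for i in range(50)]
rows += [{"id": i, "class": "water"} for i in range(50, 58)]

subset = sample_per_class(rows, "class", max_per_class=10)
print(len(subset))  # 10 forest + 8 water
```

A cap of this kind also reduces class imbalance in the subsample, since large classes are trimmed while small ones are left untouched.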

Classifying using too many features (i.e. variables describing the objects to be classified) as input can have negative effects on classification accuracy (Georganos et al, 2018). The module therefore provides the possibility to run a feature selection algorithm on the training data in order to identify those features that are the most efficient for classification. Using fewer features also speeds up the tuning, training and classification processes. To activate feature selection, the user has to set the max_features parameter to the maximum number of features that the model should select. Often, fewer than this maximum will be selected. The method used for feature selection is recursive feature elimination based on a random forest model. Note that feature selection may therefore be sub-optimal for other classifiers, notably those that are not tree-based.

Optionally, the module can be run for tuning and training only, i.e., no prediction (-t flag). Any trained model can be saved to a file (output_model_file) which can then be read into the module at a later stage for the prediction step (input_model_file). This can be particularly useful for cluster computing approaches where a trained model can be applied to different datasets in parallel.
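The two-step workflow described above could look roughly like the following sketch, which only assembles the command lines and prints them. The map and file names (training, seg_tile1, trained_model, tile1_results.csv) are hypothetical, and which parameters are actually needed in the prediction step is an assumption that should be checked against the module.

```python
import shlex


def build_command(flags="", **params):
    """Return a v.class.mlR command as an argument list."""
    args = ["v.class.mlR"]
    if flags:
        args.append("-" + flags)
    args.extend(f"{key}={value}" for key, value in params.items())
    return args


# Step 1: tune and train only (-t), saving the model to a file.
train_cmd = build_command(
    flags="t",
    training_map="training",
    train_class_column="class",
    classifiers="rf",
    output_model_file="trained_model",
)

# Step 2: later, or on another node, predict with the saved model.
predict_cmd = build_command(
    segments_map="seg_tile1",
    input_model_file="trained_model",
    classification_results="tile1_results.csv",
)

print(shlex.join(train_cmd))
print(shlex.join(predict_cmd))
```

Inside a GRASS session, each argument list could be handed to subprocess.run(); on a cluster, step 2 would be repeated once per dataset with the same model file.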

The module can run the model tuning using parallel processing. For this to work, the R package doParallel has to be installed. The processes parameter sets the number of processes to run.

The user can choose to include the individual classifiers' results in the output (the attributes and/or the raster maps) using the -i flag, but by default the output will be the result of a voting scheme merging the results of the different classifiers. Votes can be weighted according to a user-defined mode (weighting_modes): simple majority vote without weighting, i.e. all weights are equal (smv), simple weighted majority vote (swv), best-worst weighted vote (bwwv) and quadratic best-worst weighted vote (qbwwv). For more details about these voting modes see Moreno-Seco et al (2006). By default, the weights are calculated based on the accuracy metric, but the user can choose the kappa value as an alternative (weighting_metric).

In the output (as attribute columns or text file), the result of each weighting scheme is accompanied by a value that can be considered an estimate of the probability of the classification after the weighted vote, based on equation (2) in Moreno-Seco et al (2006), page 709. At this stage, this estimate does not, however, take into account the probabilities determined individually by each classifier.
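The following Python sketch illustrates how the four weighting modes can change the outcome of a vote. It reflects one reading of the schemes in Moreno-Seco et al (2006) and is not taken from the module's R code. In particular, using the raw metric as the swv weight and using the winning class's share of the total weight as the probability estimate are assumptions. The accuracies and predictions are made up.

```python
def vote_weights(scores, mode):
    """Map per-classifier scores (accuracy or kappa) to vote weights."""
    best, worst = max(scores.values()), min(scores.values())
    spread = best - worst
    weights = {}
    for name, score in scores.items():
        if mode == "smv":
            weights[name] = 1.0  # all classifiers count equally
        elif mode == "swv":
            weights[name] = score  # assumption: weight equals the metric
        elif mode in ("bwwv", "qbwwv"):
            # Scale linearly from 0 (worst) to 1 (best); equal scores get 1.
            scaled = (score - worst) / spread if spread > 0 else 1.0
            weights[name] = scaled**2 if mode == "qbwwv" else scaled
        else:
            raise ValueError(f"unknown weighting mode: {mode}")
    return weights


def weighted_vote(predictions, weights):
    """Return the winning class and its share of the total vote weight."""
    totals = {}
    for name, label in predictions.items():
        totals[label] = totals.get(label, 0.0) + weights[name]
    winner = max(totals, key=totals.get)
    return winner, totals[winner] / sum(totals.values())


# Hypothetical accuracies and per-classifier predictions for one segment.
accuracy = {"svmRadial": 0.90, "rf": 0.80, "knn": 0.70}
predictions = {"svmRadial": "forest", "rf": "water", "knn": "water"}

for mode in ("smv", "swv", "bwwv", "qbwwv"):
    label, prob = weighted_vote(predictions, vote_weights(accuracy, mode))
    print(mode, label, round(prob, 3))
```

In this toy case the unweighted vote follows the two weaker classifiers, while the best-worst schemes give the worst classifier zero weight and let the strongest one prevail.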

Optional outputs of the module include detailed information about the different classifier models and their cross-validation results (model_details; for details of these results see the train, resamples and confusionMatrix.train functions in the caret package), a box-and-whisker plot indicating the resampling variance based on the cross-validation for each classifier (bw_plot_file), a CSV file containing accuracy measures (overall accuracy and kappa) for each classifier (accuracy_file), and a file containing variable importance as determined by the classifier, for those classifiers that allow such a calculation (variable_importance_file). When the -p flag is given, the module also provides probabilities per class for each classifier (at least for those where caret can calculate such probabilities). This makes it possible to evaluate the confidence of the classification of each object. The user can also choose to write the R script constructed and used internally to a text file (r_script_file) for study or further modification.

NOTES

The module can be used in a tool chain together with i.segment and the addon i.segment.stats for object-based classification of satellite imagery.

WARNING: The optional output files are created by R, and currently no check is made for whether files of the same name already exist. If they exist, they are silently overwritten, regardless of whether the GRASS GIS --overwrite flag is set.
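Until such a check exists in the module, a calling script can guard against accidental overwrites itself. The sketch below is one possible pre-flight check; the helper name existing_outputs and the file names are hypothetical.

```python
from pathlib import Path


def existing_outputs(paths):
    """Return the output paths that already exist on disk."""
    return [str(p) for p in map(Path, paths) if p.exists()]


# Hypothetical file names passed to the module's file output options.
outputs = {
    "classification_results": "class_results.csv",
    "accuracy_file": "accuracy.csv",
    "bw_plot_file": "classifier_performance.png",
}

clashes = existing_outputs(outputs.values())
if clashes:
    print("Would be overwritten:", ", ".join(clashes))
else:
    print("No existing output files found.")
```

A script could stop at this point, or rename the existing files, before launching v.class.mlR.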

The module makes no effort to check the input data for NA values or anything else that might perturb the analyses. It is up to the user to carry out the relevant checks before launching the module.
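For text-file input, a simple check for missing values can be done before running the module. The following sketch scans pipe-separated text for empty or NA-like fields; the set of strings treated as missing and the sample data are assumptions chosen for illustration.

```python
import csv
import io

# Strings treated as missing values (an assumption; adapt to the data).
MISSING = {"", "NA", "NaN", "nan", "NULL"}


def rows_with_missing(text, separator="|"):
    """Return (line_number, column) pairs for empty or NA-like fields."""
    reader = csv.DictReader(io.StringIO(text), delimiter=separator)
    problems = []
    for row in reader:
        for column, value in row.items():
            # Non-string values indicate short or over-long rows.
            if not isinstance(value, str) or value.strip() in MISSING:
                problems.append((reader.line_num, column))
    return problems


# Hypothetical segment statistics with two problematic fields.
sample = "cat|area|mean_red\n1|120.0|0.31\n2||0.74\n3|240.0|NA\n"
print(rows_with_missing(sample))
```

For vector map input, an equivalent check would have to be run on the attribute table instead, for example on a table exported with v.db.select.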

DEPENDENCIES

This module uses R. It is the user's responsibility to make sure R is installed and can be called from the environment this module is running in. See the relevant wiki page for more information. The module tries to install the required R packages automatically if they are missing. These include 'caret', 'kernlab', 'e1071', 'randomForest', and 'rpart'. Other packages may be needed, such as 'ggplot2' and 'lattice' (for the plots) and 'doParallel' (if parallel processing is desired).

TODO

Check for existing file created by R as no overwrite check is done in R
Use class probabilities determined by individual classifiers to calculate overall class probabilities

EXAMPLE

Using existing vector maps as input and writing the output to the attribute table of the segments map, including the individual classifier results:

v.class.mlR segments_map=seg training_map=training train_class_column=class weighting_modes=smv,swv,qbwwv -i

Using text files with segment characteristics as input and writing the output to raster maps and a CSV file:

v.class.mlR segments_file=segstats.csv training_file=training.csv train_class_column=class weighting_modes=smv,swv,qbwwv raster_segments_map=seg classified_map=vote classification_results=class_results.csv

REFERENCES

Moreno-Seco, F. et al. (2006), Comparison of Classifier Fusion Methods for Classification in Pattern Recognition Tasks. In D.-Y. Yeung et al., eds. Structural, Syntactic, and Statistical Pattern Recognition. Lecture Notes in Computer Science. Springer Berlin Heidelberg, pp. 705–713, https://doi.org/10.1007/11815921_77.
Georganos, S. et al. (2018), Less is more: optimizing classification performance through feature selection in a very-high-resolution remote sensing object-based urban application. GIScience and Remote Sensing, 55:2, pp. 221–242, https://doi.org/10.1080/15481603.2017.1408892.

AUTHOR

Moritz Lennert, Université Libre de Bruxelles (ULB) based on an initial R-script by Ruben Van De Kerchove, also ULB at the time

SOURCE CODE

Available at: v.class.mlR source code (history)
Latest change: Wednesday Jun 17 14:05:16 2026 in commit 2b69c1e
