v.class.ml

Classification of a vector map based on the values in its attribute tables

Command line | Python (grass.tools) | Python (grass.script)

v.class.ml [-enfbocrtvxda] vector=name [vtraining=name] [vlayer=string] [tlayer=string] [rlayer=string] [npy_data=string] [npy_cats=string] [npy_cols=string] [npy_index=string] [npy_tdata=string] [npy_tclasses=string] [npy_btdata=string] [npy_btclasses=string] [imp_csv=string] [imp_fig=string] [scalar=string [,string,...]] [decomposition=string] [n_training=integer] [pyclassifiers=string] [pyvar=string] [pyindx=string] [pyindx_optimize=string] [nan=string [,string,...]] [inf=string [,string,...]] [neginf=string [,string,...]] [posinf=string [,string,...]] [csv_test_cls=string] [report_class=string] [svc_c_range=float [,float,...]] [svc_gamma_range=float [,float,...]] [svc_kernel_range=string [,string,...]] [svc_poly_range=string [,string,...]] [svc_n_jobs=integer] [svc_c=float] [svc_gamma=float] [svc_kernel=string] [svc_img=string] [rst_names=string] [--overwrite] [--verbose] [--quiet] [--qq] [--ui]

Example:

v.class.ml vector=name

grass.tools.Tools.v_class_ml(vector, vtraining=None, vlayer=None, tlayer=None, rlayer=None, npy_data="data.npy", npy_cats="cats.npy", npy_cols="cols.npy", npy_index="indx.npy", npy_tdata="training_data.npy", npy_tclasses="training_classes.npy", npy_btdata="Xbt.npy", npy_btclasses="Ybt.npy", imp_csv="features_importances.csv", imp_fig="features_importances.png", scalar="with_mean,with_std", decomposition="", n_training=None, pyclassifiers=None, pyvar=None, pyindx=None, pyindx_optimize=None, nan="*_skewness:nanmean,*_kurtosis:nanmean", inf="*_skewness:nanmean,*_kurtosis:nanmean", neginf="", posinf="", csv_test_cls="test_classifiers.csv", report_class="classification_report.txt", svc_c_range="1e-2,1e-1,1e0,1e1,1e2,1e3,1e4,1e5,1e6,1e7,1e8", svc_gamma_range="1e-6,1e-5,1e-4,1e-3,1e-2,1e-1,1e0,1e1,1e2,1e3,1e4", svc_kernel_range="linear,poly,rbf,sigmoid", svc_poly_range="", svc_n_jobs=1, svc_c=None, svc_gamma=None, svc_kernel="rbf", svc_img="domain_%s.svg", rst_names="%s", flags=None, overwrite=None, verbose=None, quiet=None, superquiet=None)

Example:

from grass.tools import Tools

tools = Tools()
tools.v_class_ml(vector="name")

This grass.tools API is experimental in version 8.5 and expected to be stable in version 8.6.

grass.script.run_command("v.class.ml", vector, vtraining=None, vlayer=None, tlayer=None, rlayer=None, npy_data="data.npy", npy_cats="cats.npy", npy_cols="cols.npy", npy_index="indx.npy", npy_tdata="training_data.npy", npy_tclasses="training_classes.npy", npy_btdata="Xbt.npy", npy_btclasses="Ybt.npy", imp_csv="features_importances.csv", imp_fig="features_importances.png", scalar="with_mean,with_std", decomposition="", n_training=None, pyclassifiers=None, pyvar=None, pyindx=None, pyindx_optimize=None, nan="*_skewness:nanmean,*_kurtosis:nanmean", inf="*_skewness:nanmean,*_kurtosis:nanmean", neginf="", posinf="", csv_test_cls="test_classifiers.csv", report_class="classification_report.txt", svc_c_range="1e-2,1e-1,1e0,1e1,1e2,1e3,1e4,1e5,1e6,1e7,1e8", svc_gamma_range="1e-6,1e-5,1e-4,1e-3,1e-2,1e-1,1e0,1e1,1e2,1e3,1e4", svc_kernel_range="linear,poly,rbf,sigmoid", svc_poly_range="", svc_n_jobs=1, svc_c=None, svc_gamma=None, svc_kernel="rbf", svc_img="domain_%s.svg", rst_names="%s", flags=None, overwrite=None, verbose=None, quiet=None, superquiet=None)

Example:

import grass.script as gs
gs.run_command("v.class.ml", vector="name")

Parameters

vector=name [required]
    Name of vector map
    Name of input vector map
vtraining=name
    Name of vector map
    Name of training vector map
vlayer=string
    layer name or number to use for data
tlayer=string
    layer number/name for the training layer
rlayer=string
    layer number/name for the ML results
npy_data=string
    Data with statistics in npy format.
    Default: data.npy
npy_cats=string
    Numpy array with vector cats.
    Default: cats.npy
npy_cols=string
    Numpy array with columns names.
    Default: cols.npy
npy_index=string
    Boolean numpy array with training indexes.
    Default: indx.npy
npy_tdata=string
    npy file with the training set
    Default: training_data.npy
npy_tclasses=string
    npy file with the training classes
    Default: training_classes.npy
npy_btdata=string
    npy file with the balanced training set
    Default: Xbt.npy
npy_btclasses=string
    npy file with the balanced training classes
    Default: Ybt.npy
imp_csv=string
    CSV file name with the feature importances rank using extra tree algorithms
    Default: features_importances.csv
imp_fig=string
    Figure file name with feature importances rank using extra tree algorithms
    Default: features_importances.png
scalar=string [,string,...]
    Scaler method: center the data before scaling; if no, do not scale at all
    Default: with_mean,with_std
decomposition=string
    Decomposition method (PCA, KernelPCA, ProbabilisticPCA, RandomizedPCA, FastICA, TruncatedSVD); use | to separate the method from its parameters, for example: PCA|n_components=98
n_training=integer
    Number of random training samples per class used to train the machine-learning algorithms
pyclassifiers=string
    Python file with the classifiers
pyvar=string
    Name of the Python variable, which must be a list of dictionaries
pyindx=string
    Index or range of indexes of the classifiers to use
pyindx_optimize=string
    Index of the classifier used to optimize the training set
nan=string [,string,...]
    Column pattern:Value or NumPy function used to substitute NaN values
    Default: *_skewness:nanmean,*_kurtosis:nanmean
inf=string [,string,...]
    Key:Value or NumPy function used to substitute Inf values
    Default: *_skewness:nanmean,*_kurtosis:nanmean
neginf=string [,string,...]
    Key:Value or NumPy function used to substitute neginf values
posinf=string [,string,...]
    Key:Value or NumPy function used to substitute posinf values
csv_test_cls=string
    csv file name with results of different machine learning scores
    Default: test_classifiers.csv
report_class=string
    text file name with the report of different machine learning algorithms
    Default: classification_report.txt
svc_c_range=float [,float,...]
    C value range list to explore SVC domain
    Default: 1e-2,1e-1,1e0,1e1,1e2,1e3,1e4,1e5,1e6,1e7,1e8
svc_gamma_range=float [,float,...]
    gamma value range list to explore SVC domain
    Default: 1e-6,1e-5,1e-4,1e-3,1e-2,1e-1,1e0,1e1,1e2,1e3,1e4
svc_kernel_range=string [,string,...]
    kernel value range list to explore SVC domain
    Default: linear,poly,rbf,sigmoid
svc_poly_range=string [,string,...]
    polynomial order list to explore SVC domain
svc_n_jobs=integer
    number of jobs to use during the domain exploration
    Default: 1
svc_c=float
    definitive C value
svc_gamma=float
    definitive gamma value
svc_kernel=string
    Definitive kernel value. Available kernels are: 'linear', 'poly', 'rbf', 'sigmoid', 'precomputed'
    Default: rbf
svc_img=string
    filename pattern with the image of SVC parameter
    Default: domain_%s.svg
rst_names=string
    filename pattern for raster
    Default: %s
-e
    Extract the training set from the vtraining map
-n
    Export to numpy files
-f
    Feature importances using extra trees algorithm
-b
    Balance the training set using the class with the smallest number of samples
-o
    Optimize the training samples
-c
    Classify the whole dataset
-r
    Export the classification results to raster maps
-t
    Test different classification methods
-v
    Also compute the bias variance when testing
-x
    Also compute extra parameters when testing, such as: confusion matrix, ROC, PR
-d
    Explore the SVC domain
-a
    Append the classification results
--overwrite
    Allow output files to overwrite existing files
--help
    Print usage summary
--verbose
    Verbose module output
--quiet
    Quiet module output
--qq
    Very quiet module output
--ui
    Force launching GUI dialog

vector : str, required
    Name of vector map
    Name of input vector map
    Used as: input, vector, name
vtraining : str, optional
    Name of vector map
    Name of training vector map
    Used as: input, vector, name
vlayer : str, optional
    layer name or number to use for data
tlayer : str, optional
    layer number/name for the training layer
rlayer : str, optional
    layer number/name for the ML results
npy_data : str, optional
    Data with statistics in npy format.
    Default: data.npy
npy_cats : str, optional
    Numpy array with vector cats.
    Default: cats.npy
npy_cols : str, optional
    Numpy array with columns names.
    Default: cols.npy
npy_index : str, optional
    Boolean numpy array with training indexes.
    Default: indx.npy
npy_tdata : str, optional
    npy file with the training set
    Default: training_data.npy
npy_tclasses : str, optional
    npy file with the training classes
    Default: training_classes.npy
npy_btdata : str, optional
    npy file with the balanced training set
    Default: Xbt.npy
npy_btclasses : str, optional
    npy file with the balanced training classes
    Default: Ybt.npy
imp_csv : str, optional
    CSV file name with the feature importances rank using extra tree algorithms
    Default: features_importances.csv
imp_fig : str, optional
    Figure file name with feature importances rank using extra tree algorithms
    Default: features_importances.png
scalar : str | list[str], optional
    Scaler method: center the data before scaling; if no, do not scale at all
    Default: with_mean,with_std
decomposition : str, optional
    Decomposition method (PCA, KernelPCA, ProbabilisticPCA, RandomizedPCA, FastICA, TruncatedSVD); use | to separate the method from its parameters, for example: PCA|n_components=98
n_training : int, optional
    Number of random training samples per class used to train the machine-learning algorithms
pyclassifiers : str, optional
    Python file with the classifiers
pyvar : str, optional
    Name of the Python variable, which must be a list of dictionaries
pyindx : str, optional
    Index or range of indexes of the classifiers to use
pyindx_optimize : str, optional
    Index of the classifier used to optimize the training set
nan : str | list[str], optional
    Column pattern:Value or NumPy function used to substitute NaN values
    Default: *_skewness:nanmean,*_kurtosis:nanmean
inf : str | list[str], optional
    Key:Value or NumPy function used to substitute Inf values
    Default: *_skewness:nanmean,*_kurtosis:nanmean
neginf : str | list[str], optional
    Key:Value or NumPy function used to substitute neginf values
posinf : str | list[str], optional
    Key:Value or NumPy function used to substitute posinf values
csv_test_cls : str, optional
    csv file name with results of different machine learning scores
    Default: test_classifiers.csv
report_class : str, optional
    text file name with the report of different machine learning algorithms
    Default: classification_report.txt
svc_c_range : float | list[float] | str, optional
    C value range list to explore SVC domain
    Default: 1e-2,1e-1,1e0,1e1,1e2,1e3,1e4,1e5,1e6,1e7,1e8
svc_gamma_range : float | list[float] | str, optional
    gamma value range list to explore SVC domain
    Default: 1e-6,1e-5,1e-4,1e-3,1e-2,1e-1,1e0,1e1,1e2,1e3,1e4
svc_kernel_range : str | list[str], optional
    kernel value range list to explore SVC domain
    Default: linear,poly,rbf,sigmoid
svc_poly_range : str | list[str], optional
    polynomial order list to explore SVC domain
svc_n_jobs : int, optional
    number of jobs to use during the domain exploration
    Default: 1
svc_c : float, optional
    definitive C value
svc_gamma : float, optional
    definitive gamma value
svc_kernel : str, optional
    Definitive kernel value. Available kernels are: 'linear', 'poly', 'rbf', 'sigmoid', 'precomputed'
    Default: rbf
svc_img : str, optional
    filename pattern with the image of SVC parameter
    Default: domain_%s.svg
rst_names : str, optional
    filename pattern for raster
    Default: %s
flags : str, optional
    Allowed values: e, n, f, b, o, c, r, t, v, x, d, a
    e
        Extract the training set from the vtraining map
    n
        Export to numpy files
    f
        Feature importances using extra trees algorithm
    b
        Balance the training set using the class with the smallest number of samples
    o
        Optimize the training samples
    c
        Classify the whole dataset
    r
        Export the classification results to raster maps
    t
        Test different classification methods
    v
        Also compute the bias variance when testing
    x
        Also compute extra parameters when testing, such as: confusion matrix, ROC, PR
    d
        Explore the SVC domain
    a
        Append the classification results
overwrite : bool, optional
    Allow output files to overwrite existing files
    Default: None
verbose : bool, optional
    Verbose module output
    Default: None
quiet : bool, optional
    Quiet module output
    Default: None
superquiet : bool, optional
    Very quiet module output
    Default: None

Returns:

result : grass.tools.support.ToolResult | None
If the tool produces text as standard output, a ToolResult object will be returned. Otherwise, None will be returned.

Raises:

grass.tools.ToolError: When the tool ended with an error.

DESCRIPTION

v.class.ml uses machine-learning algorithms to classify a vector map based on the values of its attribute table. The module currently relies on two Python machine-learning libraries, scikit-learn (the package name may be "python-scikit-learn") and MLPY, and it should be easy to add other Python libraries. The module is designed to be used in a modular way: the flags define which independent tasks are executed.

Flags

-e
Extract the training set from a vector map (vtraining).
-n
Store the attribute table data, column names, categories, training data, and training index in binary NumPy files.
-f
Rank feature importances using an ExtraTreesClassifier algorithm.
-b
Balance the training set using the class with the smallest number of training samples or the number set in n_training.
-o
Optimize a balanced training dataset using the class with the smallest number of training samples or the number set in n_training.
-c
Classify the whole dataset.
-r
Export machine-learning results to raster maps.
-t
Test several machine-learning algorithms on your dataset.
-v
Also test the bias variance.
-x
Also compute extra parameters to evaluate the algorithms, such as: confusion matrix, ROC, PR.
-d
Explore the Support Vector Classification (SVC) domain.
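
Because each flag selects an independent task, a session can be split into several calls. The following is a minimal sketch of that idea using grass.script, assuming an active GRASS session; the map names (segments, training) and the grouping of flags into steps are hypothetical and only illustrate the pattern, they are not a prescribed workflow.

```python
# Hypothetical map names; replace them with your own data.
COMMON = {"vector": "segments", "vtraining": "training"}

# Each step enables only the flags for one group of tasks.
STEPS = [
    ("e", "extract the training set from the vtraining map"),
    ("bf", "balance the training set and rank feature importances"),
    ("t", "test several classification methods"),
    ("cr", "classify the whole dataset and export raster maps"),
]


def build_calls(common=COMMON, steps=STEPS):
    """Return one keyword-argument dict per v.class.ml call."""
    return [dict(common, flags=flags) for flags, _ in steps]


def run_all(calls):
    """Run the calls in order; requires an active GRASS session."""
    import grass.script as gs  # imported lazily so the sketch loads anywhere

    for kwargs in calls:
        gs.run_command("v.class.ml", **kwargs)


calls = build_calls()
print([call["flags"] for call in calls])
```

Calling run_all(calls) inside GRASS would execute the four steps one after another; layer and file parameters are left at their defaults here and will usually need to be set.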

Input parameters

The vector parameter is the input vector map. The input vector map must be prepared with v.category to copy the categories to all the layers that will be created.

The vtraining parameter is an input vector map that can be used to select the training areas. Currently, only supervised classification is implemented, so this parameter is mandatory. The training vector map can be generated with g.gui.iclass, the standard GRASS tool for supervised classification.

The vlayer parameter is the layer name or number of the attribute table with the data that must be used as input for the machine-learning algorithms.

The tlayer parameter is the layer name or number of the attribute table where the training data for the machine-learning algorithms are, or will be, stored.

The rlayer parameter is the layer name or number of the attribute table where the machine-learning results will be stored.

The npy_data parameter is the path where the binary NumPy file containing the complete dataset will be saved.

The npy_cats parameter is the path where the binary NumPy file containing the vector categories will be saved.

The npy_cols parameter is the path where the binary NumPy file containing the column names of the data attribute table will be saved.

The npy_index parameter is the path where the binary NumPy file containing a boolean array will be saved; the array marks whether each category is used for training.

The npy_tdata parameter is the path where the binary NumPy file containing the training data array will be saved.

The npy_tclasses parameter is the path where the binary NumPy file containing the training classes will be saved.

The npy_btdata parameter is like npy_tdata, but for a balanced dataset.

The npy_btclasses parameter is like npy_tclasses, but for a balanced dataset.

The imp_csv parameter is the path where a CSV file containing the rank of the feature importances will be saved.

The imp_fig parameter is the path where a figure file containing the rank of the feature importances will be saved.

The scalar parameter is a string with the scaler methods that will be applied to pre-process the data. Two main methods are available: with_mean and with_std. Scaling is a common pre-processing task, so the default applies both methods.
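
As an illustration of what the two options mean, here is a small pure-Python sketch. It assumes that with_mean and with_std behave like the arguments of the same name in scikit-learn's StandardScaler (subtract the column mean, divide by the population standard deviation); the function name scale is hypothetical and this is not the module's own code.

```python
from statistics import fmean, pstdev


def scale(values, with_mean=True, with_std=True):
    """Center and/or scale one column of numbers."""
    mean = fmean(values) if with_mean else 0.0
    std = pstdev(values) if with_std else 1.0
    if std == 0.0:
        std = 1.0  # constant column: leave the scale unchanged
    return [(v - mean) / std for v in values]


print(scale([1.0, 2.0, 3.0]))                  # centered and scaled
print(scale([1.0, 2.0, 3.0], with_std=False))  # centered only
```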

The decomposition parameter is a string with the decomposition method that will be applied to pre-process the data. The main decomposition methods available are: PCA, KernelPCA, ProbabilisticPCA, RandomizedPCA, FastICA, TruncatedSVD. Each of these methods can take several parameters. Use "|" to separate the decomposition method name from its options, and "," to separate the options from each other. For example, to decompose using the KernelPCA method with 10 components and a linear kernel, the correct string is: "KernelPCA|n_components=10,kernel=linear"
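
The string syntax can be made concrete with a short parser. This is only a sketch of the format described above, not the parser used by the module; the function name parse_decomposition and the rule that integer-looking values become int are assumptions.

```python
def parse_decomposition(spec):
    """Split 'Method|opt=value,opt=value' into (method, options)."""
    method, _, raw = spec.partition("|")
    options = {}
    for item in filter(None, raw.split(",")):
        key, _, value = item.partition("=")
        key, value = key.strip(), value.strip()
        # Convert integer-looking values; keep everything else as text.
        options[key] = int(value) if value.lstrip("-").isdigit() else value
    return method.strip(), options


print(parse_decomposition("KernelPCA|n_components=10,kernel=linear"))
# ('KernelPCA', {'n_components': 10, 'kernel': 'linear'})
```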

The n_training parameter is an integer with the number of training samples that must be used per class. Some machine-learning methods are sensitive to whether the training dataset is balanced or not. By default, all the training samples are used.
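
The idea of a balanced training set can be sketched as follows. The code draws the same number of random samples from every class, using the size of the smallest class unless n_training is given. The function name balance, the fixed seed, and the choice to cap n_training at the smallest class size are assumptions made for this sketch.

```python
import random
from collections import defaultdict


def balance(samples, n_training=None, seed=0):
    """Pick the same number of random samples from every class.

    samples: list of (category, class_label) pairs.
    """
    by_class = defaultdict(list)
    for cat, label in samples:
        by_class[label].append(cat)
    smallest = min(len(cats) for cats in by_class.values())
    # Assumption: n_training never exceeds what the smallest class offers.
    n = smallest if n_training is None else min(n_training, smallest)
    rng = random.Random(seed)
    return {label: sorted(rng.sample(cats, n)) for label, cats in by_class.items()}


samples = [(i, "forest") for i in range(10)] + [(i, "water") for i in range(10, 13)]
print({label: len(cats) for label, cats in balance(samples).items()})
# {'forest': 3, 'water': 3}
```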

The pyclassifiers parameter is the path to a Python file containing a list of dictionaries that define the classifier classes and their options. See the default classifiers used by the v.class.ml module for an example.

The pyvar parameter is a string with the name of the Python variable defined in the pyclassifiers file.

The pyindx parameter is a string with the indexes of the classifiers that will be used. In the string you can define a range using the minus character, list the indexes using the comma as separator, or combine both options. For example, '1-5,34-36,40' means that only the classifiers with index 1, 2, 3, 4, 5, 34, 35, 36 and 40 will be used.
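
The index syntax can be expanded with a few lines of Python. This is a sketch of the described format, not the module's own parser; the function name parse_indexes is hypothetical.

```python
def parse_indexes(spec):
    """Expand a string such as '1-5,34-36,40' into a list of integers."""
    indexes = []
    for part in spec.split(","):
        part = part.strip()
        if "-" in part:
            start, stop = (int(p) for p in part.split("-"))
            indexes.extend(range(start, stop + 1))  # ranges are inclusive
        elif part:
            indexes.append(int(part))
    return indexes


print(parse_indexes("1-5,34-36,40"))
# [1, 2, 3, 4, 5, 34, 35, 36, 40]
```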

The pyindx_optimize parameter is an integer with the index of the classifier that will be used to optimize a balanced training dataset. This option is used only if the -o flag is set; otherwise it is ignored.

The nan parameter is a string that lets the user define, for each column in the attribute table, which value or function should be used to substitute NaN values. The syntax is, for example: 'col0:9999,col1:9999'. The column name can also be a pattern, so it is possible to define a rule like '*_mean:nanmean,*_max:nanmax', which substitutes the mean value of the column in all columns ending with '_mean' and the maximum value of the column in all columns ending with '_max'. This operation is needed because machine-learning algorithms are not able to handle nan, inf, neginf, and posinf values.
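
The rule syntax can be illustrated with a small pure-Python sketch. It matches column names against the patterns with fnmatch and replaces NaN values either with a constant or with the result of a function. The helpers nanmean and nanmax below are stand-ins for numpy.nanmean and numpy.nanmax, and the function fill_nan, the table layout, and the first-match-wins rule are assumptions for this example, not the module's implementation.

```python
import math
from fnmatch import fnmatchcase


def nanmean(values):
    """Mean of the values that are not NaN (stand-in for numpy.nanmean)."""
    valid = [v for v in values if not math.isnan(v)]
    return sum(valid) / len(valid)


def nanmax(values):
    """Maximum of the values that are not NaN (stand-in for numpy.nanmax)."""
    return max(v for v in values if not math.isnan(v))


FUNCTIONS = {"nanmean": nanmean, "nanmax": nanmax}


def fill_nan(table, rules):
    """Replace NaN in each column whose name matches a rule pattern.

    table: dict mapping column name to a list of floats.
    rules: string such as '*_mean:nanmean,col0:9999'.
    """
    parsed = [rule.split(":", 1) for rule in rules.split(",")]
    filled = {}
    for name, values in table.items():
        filled[name] = list(values)
        for pattern, action in parsed:
            if fnmatchcase(name, pattern):
                # A known function name is evaluated on the column;
                # anything else is read as a constant value.
                func = FUNCTIONS.get(action)
                new = func(values) if func else float(action)
                filled[name] = [new if math.isnan(v) else v for v in values]
                break  # assumption: the first matching rule wins
    return filled


nan = float("nan")
table = {"b1_mean": [1.0, nan, 3.0], "b1_max": [nan, 4.0], "col0": [nan, 1.0]}
print(fill_nan(table, "*_mean:nanmean,*_max:nanmax,col0:9999"))
```

Columns that match no rule are returned unchanged, so any NaN values they contain remain in place.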

The inf parameter is similar to nan, but the rules are applied to infinite values instead of NaN values.

The neginf parameter is similar to nan, but the rules are applied to negative infinite values instead of NaN values.

The posinf parameter is similar to nan, but the rules are applied to positive infinite values instead of NaN values.

The csv_test_cls parameter is the file name/path where the results of the classification test will be written.

The report_class parameter is the file name/path where a summary for each machine-learning algorithm will be written.

The svc_c_range parameter is a range of C values that will be used when exploring the domain of the Support Vector Machine algorithms.

The svc_gamma_range parameter is a range of gamma values that will be used when exploring the domain of the Support Vector Machine algorithms.

The svc_kernel_range parameter is a range of kernel values that will be used when exploring the domain of the Support Vector Machine algorithms.
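
To give a sense of the size of the domain defined by the three ranges, the sketch below builds the plain Cartesian product of the default values listed on this page. The dictionary keys follow the argument names of scikit-learn's SVC; how the module itself walks the domain is not described here, so the exhaustive product and the function name svc_grid are assumptions.

```python
from itertools import product

# Default ranges copied from the parameter list of this page.
C_RANGE = [10.0 ** e for e in range(-2, 9)]       # 1e-2 ... 1e8
GAMMA_RANGE = [10.0 ** e for e in range(-6, 5)]   # 1e-6 ... 1e4
KERNEL_RANGE = ["linear", "poly", "rbf", "sigmoid"]


def svc_grid(c_range=C_RANGE, gamma_range=GAMMA_RANGE, kernels=KERNEL_RANGE):
    """Return every (kernel, C, gamma) combination as a dict."""
    return [
        {"kernel": k, "C": c, "gamma": g}
        for k, c, g in product(kernels, c_range, gamma_range)
    ]


grid = svc_grid()
print(len(grid))  # 4 kernels * 11 C values * 11 gamma values = 484
```

With 484 combinations in the default domain, the exploration can take a while on a single process.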

The svc_n_jobs parameter is an integer with the number of processes that will be used during the domain exploration of Support Vector Machine algorithms.

The svc_img parameter is the file name/path pattern of the image that will be generated from the domain exploration.

The svc_c parameter is the definitive C value that will be used for final classification.

The svc_gamma parameter is the definitive gamma value that will be used for final classification.

The svc_kernel parameter is the definitive kernel value that will be used for final classification.

The rst_names parameter is the name pattern that will be used to generate the output raster map for each algorithm.
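
Both svc_img and rst_names are patterns containing a %s placeholder. The sketch below shows how such a pattern expands with Python string formatting; the inserted names (rbf, svm_linear) and the prefix ml_ are hypothetical, since the exact string the module substitutes is not stated here.

```python
def expand(pattern, name):
    """Insert a name into a '%s' file or map name pattern."""
    return pattern % name


# Hypothetical names: which string the module inserts is an assumption here.
print(expand("domain_%s.svg", "rbf"))   # domain_rbf.svg
print(expand("ml_%s", "svm_linear"))    # ml_svm_linear
```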

AUTHOR

Pietro Zambelli, University of Trento

SOURCE CODE

Available at: v.class.ml source code (history)
Latest change: Tuesday Feb 17 22:58:37 2026 in commit 04d2580