v.mrmr
Perform Minimum Redundancy Maximum Relevance Feature Selection on a GRASS Attribute Table
v.mrmr table=name layer=string [threshold=float] nfeatures=integer nsamples=integer maxvar=integer method=string [--verbose] [--quiet] [--qq] [--ui]
Example:
v.mrmr table=name layer=1 nfeatures=50 nsamples=1000 maxvar=10000 method=MID
grass.script.run_command("v.mrmr", table, layer="1", threshold=1.0, nfeatures=50, nsamples=1000, maxvar=10000, method="MID", verbose=False, quiet=False, superquiet=False)
Example:
gs.run_command("v.mrmr", table="name", layer="1", nfeatures=50, nsamples=1000, maxvar=10000, method="MID")
Parameters
table=name [required]
Name of input vector map
Vector features
layer=string [required]
Layer number or name
Vector features can have category values in different layers. This number determines which layer to use. When used with direct OGR access this is the layer name.
Default: 1
threshold=float
Discretization threshold
Default: 1.0
nfeatures=integer [required]
Number of features (attributes)
Default: 50
nsamples=integer [required]
Maximum number of samples
Default: 1000
maxvar=integer [required]
Maximum number of variables/attributes
Default: 10000
method=string [required]
Feature selection method
Allowed values: MID, MIQ
Default: MID
--help
Print usage summary
--verbose
Verbose module output
--quiet
Quiet module output
--qq
Very quiet module output
--ui
Force launching GUI dialog
table : str, required
Name of input vector map
Vector features
Used as: input, vector, name
layer : str, required
Layer number or name
Vector features can have category values in different layers. This number determines which layer to use. When used with direct OGR access this is the layer name.
Used as: input, layer
Default: 1
threshold : float, optional
Discretization threshold
Default: 1.0
nfeatures : int, required
Number of features (attributes)
Default: 50
nsamples : int, required
Maximum number of samples
Default: 1000
maxvar : int, required
Maximum number of variables/attributes
Default: 10000
method : str, required
Feature selection method
Allowed values: MID, MIQ
Default: MID
verbose: bool, optional
Verbose module output
Default: False
quiet: bool, optional
Quiet module output
Default: False
superquiet: bool, optional
Very quiet module output
Default: False
DESCRIPTION
v.mrmr is a simple GUI for exporting data to the Minimum Redundancy Maximum Relevance (mRMR) feature selection command line tool (Peng et al., 2005). mRMR is designed to select features that have the maximal statistical "dependency" on the classification variable, while simultaneously minimizing the redundancy among the selected features.
NOTES
The mRMR command line tool must be installed separately, in a location that is recognized by the system or in the PATH. It can be installed on Windows (binaries available), Linux and OS X (needs compilation). Installation instructions are provided on Peng's website.
The module requires the data in a vector attribute table to be arranged in a specific order. The classification variable (i.e., the class labels) needs to be in the first column, not counting the cat attribute, which is not exported. The class labels also need to be numerical, e.g., 1, 2, 3, ... rather than 'forest' or 'urban'. In addition, the attribute table must not contain any missing values, because these cause an erroneous mRMR result.
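The requirements above can be checked before running the module. The following is a minimal sketch in plain Python, assuming the attribute table has already been read into a list of column names and a list of row tuples; the function name check_mrmr_table and this in-memory layout are hypothetical and not part of v.mrmr.

```python
import math


def check_mrmr_table(columns, rows):
    """Return a list of problems that would violate the v.mrmr table layout.

    columns: column names in table order (hypothetical in-memory layout).
    rows: list of tuples, one value per column.
    """
    problems = []
    # The cat column is not exported, so the class label is expected
    # in the first of the remaining columns.
    exported = [i for i, name in enumerate(columns) if name != "cat"]
    if not exported:
        return ["no attribute columns besides cat"]
    label_idx = exported[0]

    for row_no, row in enumerate(rows, start=1):
        # Missing values (None or NaN) lead to an erroneous mRMR result.
        for i in exported:
            value = row[i]
            if value is None or (isinstance(value, float) and math.isnan(value)):
                problems.append(f"row {row_no}: missing value in '{columns[i]}'")
        # Class labels must be numbers such as 1, 2, 3, not text.
        label = row[label_idx]
        if label is not None and (
            isinstance(label, bool) or not isinstance(label, (int, float))
        ):
            problems.append(f"row {row_no}: class label {label!r} is not numeric")
    return problems
```

An empty list means that none of the checked problems were found; it does not guarantee that the mRMR tool will accept the data.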
The algorithm outputs a tab-separated list of attributes, ranked with the most important feature first. The method parameter selects between the Mutual Information Difference (MID) and Mutual Information Quotient (MIQ) feature evaluation criteria, which combine the relevance and redundancy of the features as a difference and as a quotient, respectively. The algorithm also shows the ranking of the features based on the conventional maximum relevance method.
Additional user options include nfeatures, which specifies the number of features to select; nsamples, which limits the number of samples on which the feature selection is based; and maxvar, which limits the number of attributes considered. The latter two can reduce the computation for very large datasets. threshold is the discretization threshold applied to continuous variables, i.e., mean +/- threshold * standard deviation. layer is the attribute layer to be used in the feature selection process.
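To make the threshold and method options more concrete, here is a rough sketch in plain Python. It assumes that discretization maps each continuous value to one of three states (-1, 0, 1) around mean +/- threshold * standard deviation, and that the population standard deviation is used; both are assumptions about the external mRMR tool, not confirmed by this page. The scoring function follows the definitions in Peng et al. (2005), where MID is relevance minus redundancy and MIQ is relevance divided by redundancy. The function names are hypothetical.

```python
from statistics import mean, pstdev


def discretize(values, threshold=1.0):
    """Map continuous values to -1, 0 or 1 (assumed three-state coding).

    Values below mean - threshold * std become -1, values above
    mean + threshold * std become 1, and everything in between becomes 0.
    """
    mu = mean(values)
    sigma = pstdev(values)  # assumption: population standard deviation
    lower = mu - threshold * sigma
    upper = mu + threshold * sigma
    return [-1 if v < lower else 1 if v > upper else 0 for v in values]


def mrmr_score(relevance, redundancy, method="MID"):
    """Combine relevance and redundancy (both mutual information values).

    MID: difference, MIQ: quotient. A redundancy of zero raises
    ZeroDivisionError for MIQ; real implementations may guard against this.
    """
    if method == "MID":
        return relevance - redundancy
    if method == "MIQ":
        return relevance / redundancy
    raise ValueError(f"unknown method: {method!r}")
```

A larger threshold widens the middle band, so more values fall into state 0. In a greedy selection, the candidate feature with the highest score would be added at each step.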
EXAMPLE
v.mrmr table=vector_layer layer=1 threshold=1.0 nfeatures=50 \
nsamples=10000 maxvar=10000 method=MID
REFERENCES
Peng, H., Long, F., Ding, C., 2005. Feature selection based on mutual information criteria of max-dependency, max-relevance, and min-redundancy. IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 27, no. 8, pp. 1226-1238.
AUTHOR
Steven Pawley
SOURCE CODE
Available at: v.mrmr source code (history)
Latest change: Thursday Feb 20 13:02:26 2025 in commit 53de819