GRASS GIS manual: v.mrmr

NAME

v.mrmr - Perform Minimum Redundancy Maximum Relevance Feature Selection on a GRASS Attribute Table

KEYWORDS

SYNOPSIS

v.mrmr

v.mrmr --help

v.mrmr table=name layer=string [threshold=float] nfeatures=integer nsamples=integer maxvar=integer method=string [--help] [--verbose] [--quiet] [--ui]

Flags:

--help: Print usage summary
--verbose: Verbose module output
--quiet: Quiet module output
--ui: Force launching GUI dialog

Parameters:

table=name [required]: Name of input vector map; Vector features
layer=string [required]: Layer number or name; Vector features can have category values in different layers. This number determines which layer to use. When used with direct OGR access this is the layer name.; Default: 1
threshold=float: Discretization threshold; Default: 1.0
nfeatures=integer [required]: Number of features (attributes); Default: 50
nsamples=integer [required]: Maximum number of samples; Default: 1000
maxvar=integer [required]: Maximum number of variables/attributes; Default: 10000
method=string [required]: Feature selection method; Options: MID, MIQ; Default: MID

DESCRIPTION

v.mrmr is a simple GUI for exporting data to the Minimum Redundancy Maximum Relevance (mRMR) feature selection command line tool (Peng et al., 2005). mRMR is designed to select features that have the maximal statistical "dependency" on the classification variable, while simultaneously minimizing the redundancy among the selected features.
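
The trade-off between relevance and redundancy can be illustrated with a minimal sketch of a greedy selection over already discretized data. This is not the code of the mRMR command line tool: the function names, the toy data, and the small constant added to the denominator are assumptions made for illustration. The two scoring forms follow the difference and quotient criteria of Peng et al. (2005), which the method option names MID and MIQ.

```python
from collections import Counter
from math import log


def mutual_information(x, y):
    """Mutual information (in nats) between two equally long discrete sequences."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    return sum((c / n) * log(c * n / (px[a] * py[b])) for (a, b), c in pxy.items())


def mrmr(features, labels, k, method="MID"):
    """Greedily pick k feature names (hypothetical helper, not the tool's API)."""
    relevance = {name: mutual_information(col, labels) for name, col in features.items()}
    # Start with the single most relevant feature.
    selected = [max(relevance, key=relevance.get)]

    def score(name):
        # Redundancy: mean mutual information with the features chosen so far.
        redundancy = sum(
            mutual_information(features[name], features[s]) for s in selected
        ) / len(selected)
        if method == "MID":
            return relevance[name] - redundancy
        # MIQ: the small constant (an assumption) avoids division by zero.
        return relevance[name] / (redundancy + 1e-9)

    while len(selected) < min(k, len(features)):
        candidates = [name for name in features if name not in selected]
        selected.append(max(candidates, key=score))
    return selected


# Toy data: "b" duplicates "a", while "c" carries independent information.
labels = [0, 0, 1, 1, 2, 2, 3, 3]
features = {
    "a": [0, 0, 0, 0, 1, 1, 1, 1],
    "b": [0, 0, 0, 0, 1, 1, 1, 1],
    "c": [0, 0, 1, 1, 0, 0, 1, 1],
}
selected_mid = mrmr(features, labels, k=2, method="MID")
selected_miq = mrmr(features, labels, k=2, method="MIQ")
```

In this toy case both criteria skip the duplicated attribute in favour of the one that adds new information about the class labels.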

NOTES

The mRMR command line tool must be installed separately, in a location that is on the system PATH. It can be installed on Windows (binaries are available), Linux, and OS X (where it needs to be compiled). Installation instructions are provided on Peng's website.

The module requires the data in the vector attribute table to be arranged in a specific order. The classification variable (i.e., the class labels) needs to be in the first column, not counting the cat attribute, which is not exported. The class labels also need to be in numerical form, i.e., 1, 2, 3, ... rather than 'forest' or 'urban'. In addition, the attribute table should not contain any missing values, because these cause an erroneous mRMR result.
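
These requirements can be checked before running the module. The following is a minimal sketch, assuming the attribute table has already been read into memory as a list of column names and a list of row dictionaries; the helper name check_rows and the example column names are hypothetical and are not part of v.mrmr.

```python
def check_rows(columns, rows, class_column):
    """Return a list of problems found in an in-memory attribute table."""
    problems = []
    # The cat attribute is not exported, so it is ignored for column ordering.
    exported = [c for c in columns if c != "cat"]
    if not exported or exported[0] != class_column:
        problems.append(f"'{class_column}' must be the first column after 'cat'")
    for i, row in enumerate(rows):
        for name in exported:
            if row.get(name) is None:
                problems.append(f"row {i}: missing value in '{name}'")
        label = row.get(class_column)
        is_number = isinstance(label, (int, float)) and not isinstance(label, bool)
        if label is not None and not is_number:
            problems.append(f"row {i}: class label {label!r} is not numeric")
    return problems


columns = ["cat", "class", "ndvi", "slope"]
rows = [
    {"cat": 1, "class": 1, "ndvi": 0.4, "slope": 3.0},
    {"cat": 2, "class": "forest", "ndvi": None, "slope": 5.0},
]
problems = check_rows(columns, rows, "class")
```

Here the second row is reported twice: once for the missing ndvi value and once for the text class label.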

The algorithm outputs a tab-separated list of attributes, ranked with the most important feature first. The method parameter selects between the Mutual Information Difference (MID) and Mutual Information Quotient (MIQ) feature evaluation criteria, which combine the relevance and redundancy of the features as a difference and as a quotient, respectively. The algorithm also shows the ranking of the features based on the conventional maximum relevance method. Additional options are nfeatures, which specifies the number of features to select; nsamples, which limits the number of samples on which the feature selection is based; and maxvar, which limits the number of attributes. The latter two can reduce the computation time for very large datasets. threshold is the discretization threshold applied to continuous variables, i.e., mean +/- threshold * standard deviation. layer is the attribute layer to be used in the feature selection process.
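
The effect of threshold can be illustrated with a short sketch. It assumes that values above mean + threshold * standard deviation are mapped to 1, values below mean - threshold * standard deviation to -1, and everything in between to 0, and that the population standard deviation is used; both details are assumptions about the external tool, and the function name discretize is hypothetical.

```python
from statistics import mean, pstdev


def discretize(values, threshold=1.0):
    """Map continuous values to -1, 0 or 1 around mean +/- threshold * std."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation (an assumption)
    upper = mu + threshold * sigma
    lower = mu - threshold * sigma
    return [1 if v > upper else -1 if v < lower else 0 for v in values]


values = [1.0, 2.0, 3.0, 4.0, 10.0]
states_wide = discretize(values, threshold=1.0)
states_narrow = discretize(values, threshold=0.5)
```

A larger threshold widens the middle band, so fewer values fall into the two outer states.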

EXAMPLE

v.mrmr table=vector_layer layer=1 threshold=1.0 nfeatures=50 \
	  nsamples=10000 maxvar=10000 method=MID

REFERENCES

Peng, H.; Long, F.; Ding, C., "Feature selection based on mutual information criteria of max-dependency, max-relevance, and min-redundancy," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 27, no. 8, pp. 1226-1238, Aug. 2005.

AUTHOR

Steven Pawley

Last changed: $Date: 2016-02-20 14:37:38 +0100 (Sat, 20 Feb 2016) $

SOURCE CODE

Available at: v.mrmr source code (history)