GRASS GIS manual: v.class.mlpy

NAME

v.class.mlpy - Vector supervised classification tool which uses attributes as classification parameters (the order of columns matters, their names do not). The cat column identifies features, and class_column is excluded from the classification parameters.

KEYWORDS

vector, classification, supervised, machine learning

SYNOPSIS

v.class.mlpy

v.class.mlpy --help

v.class.mlpy input=name training=name [class_column=string] [columns=string[,string,...]] [--help] [--verbose] [--quiet] [--ui]

Flags:

--help: Print usage summary
--verbose: Verbose module output
--quiet: Quiet module output
--ui: Force launching GUI dialog

Parameters:

input=name [required]: Name of vector map; Input vector map (attribute table required)
training=name [required]: Name of vector map; Training vector map (attribute table required)
class_column=string: Name of column containing class; Used for both input/output and training dataset. If the column does not exist in the input map attribute table, it will be created.; Default: class
columns=string[,string,...]: Columns to be used in classification; If left empty, all columns except class_column and the cat column will be used for classification.
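
The columns option can be illustrated with a short sketch. It builds the comma-separated list explicitly, so the order of the attributes is under the user's control. The map names (points_unknown, points_known), the class column (landclass) and the attribute columns (map_1 to map_3) are assumptions borrowed from the example in this manual, and the module is only called when it is found on the path, that is, inside a GRASS session with the addon installed.

```shell
# Build a comma-separated column list (hypothetical names map_1..map_3).
COLUMN_LIST=""
for NUM in 1 2 3
do
    # append with a comma only when the list is not empty yet
    COLUMN_LIST="${COLUMN_LIST:+$COLUMN_LIST,}map_$NUM"
done
echo "columns=$COLUMN_LIST"

# Run the classification only where the module is available.
if command -v v.class.mlpy > /dev/null 2>&1
then
    v.class.mlpy input=points_unknown training=points_known \
        class_column=landclass columns="$COLUMN_LIST"
fi
```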

DESCRIPTION

The v.class.mlpy module is a tool for supervised vector classification. It is built on top of the Python mlpy library [Albanese2012]. The classification is based on attribute values. The geometry is not taken into account, so the module does not depend on the feature types used in the map. The classification is supervised, so the training dataset is always required.

The attribute table of the training map (dataset) must contain a column with the class. The class column must be of type integer. The other columns are expected to be of type double or integer.
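
The column type requirement can be checked before running the module. The following is a minimal sketch, not part of v.class.mlpy: the helper report_non_numeric_columns is hypothetical and assumes input lines of the form TYPE|name, as printed by v.info -c, with INTEGER and DOUBLE PRECISION as the numeric type names. The exact type names may differ between database backends. The map name points_known is taken from the example in this manual.

```shell
# Print columns whose type is not numeric, given "TYPE|name" lines on stdin
# (the format assumed for the output of `v.info -c`). The cat column is skipped.
report_non_numeric_columns() {
    while IFS='|' read -r TYPE NAME
    do
        # skip empty or malformed lines and the category column
        [ -z "$NAME" ] && continue
        [ "$NAME" = "cat" ] && continue
        case "$TYPE" in
            INTEGER|"DOUBLE PRECISION") ;;   # accepted numeric types
            *) echo "$NAME ($TYPE)" ;;       # anything else is reported
        esac
    done
}

# Use it on a real map only inside a GRASS session.
if command -v v.info > /dev/null 2>&1
then
    v.info -c map=points_known | report_non_numeric_columns
fi
```

An empty output suggests that all columns other than cat have one of the accepted types.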

NOTES

This module requires the mlpy library to be installed. The mlpy library is free and open source, and it is available for all major platforms supported by GRASS GIS. Download and installation instructions can be found at the official mlpy website (http://mlpy.sourceforge.net/).
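
Whether mlpy can be imported may be tested from the command line before running the module. This is a minimal sketch under two assumptions: the helper name have_python_module is hypothetical, and the interpreter is taken from the GRASS_PYTHON environment variable when it is set, with python as a fallback.

```shell
# Return success if the given Python module can be imported.
# GRASS_PYTHON is used when set; "python" is an assumed fallback.
have_python_module() {
    "${GRASS_PYTHON:-python}" -c "import $1" > /dev/null 2>&1
}

if have_python_module mlpy
then
    echo "mlpy is available"
else
    echo "mlpy is not available, install it before running v.class.mlpy"
fi
```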

EXAMPLE

This example uses the North Carolina sample dataset. It generates (spatially) random vector points and derives their attributes from several raster maps. The random points stand in for the training dataset and for the dataset to be classified in a real use case.

Two sets of random points are generated, containing 100 and 1000 points. An attribute table is then created for both maps, and the attributes are derived from the digital values of raster maps (Landsat images) at the point locations. These attribute table columns are the input to the classification. The smaller dataset is used as the training dataset. Its classes are taken from a raster map that is part of the sample dataset and serves as an example result of an earlier classification. The number of classes in the training dataset is 6.

# the example code uses unix-like syntax for continuation lines, for-loops,
# variables and assigning command outputs to variables

# generate random points to be used as an input
v.random output=points_unknown n=1000
v.db.addtable map=points_unknown

# generate random points to be used as a training dataset
v.random output=points_known n=100
v.db.addtable map=points_known

# fill attribute tables
MAPS=$(g.list type=raster pattern="lsat*" exclude="*87*" mapset=PERMANENT sep=" ")
NUM=0
for MAP in $MAPS
do
    # one integer column per raster map, named map_1, map_2, ...
    NUM=$((NUM + 1))
    v.db.addcolumn map=points_unknown layer=1 columns="map_$NUM integer"
    v.db.addcolumn map=points_known layer=1 columns="map_$NUM integer"
    v.what.rast map=points_unknown layer=1 raster="$MAP" column="map_$NUM"
    v.what.rast map=points_known layer=1 raster="$MAP" column="map_$NUM"
done

# fill the class (category) column with correct values for training dataset
v.db.addcolumn map=points_known layer=1 columns="landclass integer"
v.what.rast map=points_known layer=1 raster=landclass96 column=landclass

# set color table: export the raster color rules to a temporary file,
# then apply them to the vector map
r.colors.out map=landclass96 rules=tmp_color_rules_file
v.colors map=points_known column=landclass layer=1 rules=tmp_color_rules_file
rm tmp_color_rules_file

# do the classification
v.class.mlpy input=points_unknown training=points_known class_column=landclass

# set color table: export the raster color rules to a temporary file,
# then apply them to the classified vector map
r.colors.out map=landclass96 rules=tmp_color_rules_file
v.colors map=points_unknown column=landclass layer=1 rules=tmp_color_rules_file
rm tmp_color_rules_file
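
After the classification, the distribution of the assigned classes could be inspected as a quick sanity check. The following is a minimal sketch, not part of the module: the helper count_classes is hypothetical, and the landclass column of points_unknown is assumed to have been filled by the classification step.

```shell
# Count how many times each class value appears on stdin (one value per line).
count_classes() {
    sort -n | uniq -c
}

# Read the class column from the classified map only inside a GRASS session.
if command -v v.db.select > /dev/null 2>&1
then
    # -c omits the column-name header line
    v.db.select -c map=points_unknown columns=landclass | count_classes
fi
```

Each output line contains a count followed by a class value.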