NAME
r.scatterplot - Creates a scatter plot of raster maps
Creates a scatter plot of two or more raster maps as a vector map
KEYWORDS
raster,
statistics,
diagram,
correlation,
scatter plot,
vector
SYNOPSIS
r.scatterplot
r.scatterplot --help
r.scatterplot [-wfsub] input=name[,name,...] output=name [z_raster=name] [color_raster=name] [xscale=float] [yscale=float] [zscale=float] [position=east,north] [spacing=float] [vector_mask=name] [mask_layer=string] [mask_cats=range] [mask_where=sql_query] [--overwrite] [--help] [--verbose] [--quiet] [--ui]
Flags:
- -w
- Place into the current region south-west corner
- The output coordinates will not represent the original values
- -f
- Automatically offset each scatter plot
- The output coordinates will not represent the original values
- -s
- Put points into a single layer
- Even with multiple rasters, put all points into a single layer
- -u
- Invert mask
- -b
- Do not build topology
- Advantageous when handling a large number of points
- --overwrite
- Allow output files to overwrite existing files
- --help
- Print usage summary
- --verbose
- Verbose module output
- --quiet
- Quiet module output
- --ui
- Force launching GUI dialog
Parameters:
- input=name[,name,...] [required]
- Name of input raster map(s)
- output=name [required]
- Name for output vector map
- z_raster=name
- Name of input raster map to define Z coordinates
- color_raster=name
- Name of input raster map to define category and color
- xscale=float
- Scale to apply to X axis
- Default: 1.0
- yscale=float
- Scale to apply to Y axis
- Default: 1.0
- zscale=float
- Scale to apply to Z axis
- Default: 1.0
- position=east,north
- Place to the given coordinates
- The output coordinates will not represent the original values
- spacing=float
- Spacing between scatter plots
- Applied when automatic offset is used
- vector_mask=name
- Areas to use in the scatter plots
- Name of vector map with areas from where the scatter plot should be generated
- mask_layer=string
- Layer number or name for vector mask
- Vector features can have category values in different layers. This number determines which layer to use. When used with direct OGR access this is the layer name.
- Default: 1
- mask_cats=range
- Category values for vector mask
- Example: 1,3,7-9,13
- mask_where=sql_query
- WHERE conditions for the vector mask
- Example: income < 1000 and population >= 10000
DESCRIPTION
The r.scatterplot module takes raster maps and creates
a scatter plot which is a vector map where the individual points of
the scatter plot are vector points. As with any scatter plot, the X
coordinates of the points represent values from the first raster map
and the Y coordinates represent values from the second raster map.
Consequently, the vector map is placed in the combined value space of
the original raster maps and its geographic position should be ignored.
Typically, it is necessary to zoom or to change the computational region
in order to view the scatter plot or to perform further computations on
the result.
With the default settings, the r.scatterplot output allows
measuring and querying of the values in the scatter plot. Settings
such as the xscale or position options change the coordinates,
so measurements taken from the output no longer correspond to the
original values.
If more than two raster maps are provided to the
input option,
r.scatterplot creates a scatter plot for each unique pair
of input maps. For example, if A, B, C, and D are the inputs,
r.scatterplot creates scatter plots for A and B, A and C,
A and D, B and C, B and D, and finally C and D. Each pair is placed in a
different layer of the output vector map.
r.scatterplot provides textual
output which specifies the pairs and associated layers.
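The pairing described above corresponds to all unordered combinations of two inputs. The following is a minimal sketch in plain Python; it does not call GRASS. The numbering of layers from 1 in the order of the pairs is an assumption made for illustration. The textual output of r.scatterplot remains the authoritative source for the actual pair-to-layer assignment.

```python
from itertools import combinations

inputs = ["A", "B", "C", "D"]

# Every unique unordered pair of inputs, in the order given in the text.
pairs = list(combinations(inputs, 2))

# Assumption: layers are numbered from 1 in the same order as the pairs.
for layer, (x_map, y_map) in enumerate(pairs, start=1):
    print(f"layer {layer}: X={x_map}, Y={y_map}")
```

For n inputs this gives n * (n - 1) / 2 scatter plots, i.e. six for the four inputs above.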
A 3D scatter plot can be generated when the z_raster option is
provided. A third variable is added to each scatter plot and each point
has a Z coordinate which represents this third variable.
Each point can also have a color based on an additional variable
taken from the values of color_raster. Values from this raster are
stored as categories, i.e. floating point values are truncated to
integers, and a color table based on the input raster color table is
assigned to the vector map.
The z_raster and color_raster can be the same. This can help
with understanding the 3D scatter plot and makes the third variable
visible in 2D as well.
When z_raster and color_raster are different, a total of four
variables are associated with each point.
Figure: One scatter plot of two variables (left),
the same scatter plot but with color showing third variable (middle),
again the same scatter plot in 3D where Z represents a third variable (right).
Figure: One scatter plot with one variable as Z coordinate and
another variable as color (two rotated views).
When working with only two variables, the X axis represents the first one
and the Y axis the second one. With more than two variables, the individual
scatter plots for individual pairs of variables are at the same
place. In this case, the coordinates show the actual values of the
variables. Each scatter plot is placed into a separate layer of
the output vector map.
Figure: Three overlapping scatter plots of three variables A, B, and C.
Individual scatter plots are distinguished by color.
The colors can be obtained using d.vect layer=-1 -c.
If visualization is more important than preserving the actual values,
the -f flag can be used. This will place the scatter plots next
to each other, separated by the value provided using the spacing option.
The layout options can still be combined with additional variables
represented as Z coordinate or color. In that case, the Z coordinate
or color is the same for all the scatter plots.
Figure: Three scatter plots of three variables A, B, and C.
First one is A and B, second A and C, and third B and C.
Figure: Three scatter plots of three variables A, B, and C
with color showing a fourth variable D in all scatter plots.
The options xscale, yscale and zscale will cause
the values to be rescaled before they are stored as point coordinates.
This is useful for visualization when one of the variables has
a significantly different range than the other or when the scatter plot
is shown with other data and must fit a certain area.
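When two variables have very different ranges, a suitable scale value can be estimated from the ranges themselves. The sketch below is plain Python and does not call GRASS. It assumes that a scale option acts as a simple multiplier on the raster values; the function name and the example ranges are hypothetical.

```python
def scale_to_match(source_min, source_max, target_min, target_max):
    """Return the factor that makes the source range as long as the target range."""
    source_range = source_max - source_min
    if source_range == 0:
        raise ValueError("source range must not be zero")
    return (target_max - target_min) / source_range


# Hypothetical ranges: X spans 0 to 255, Y spans only 50 to 150.
# Assumption: the scale is applied as a multiplier to the raster values.
yscale = scale_to_match(50.0, 150.0, 0.0, 255.0)
print(yscale)
```

The resulting number could then be passed as yscale so that both axes span a similar length in the output.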
The position option is used to place the scatter plot at any
given coordinates. Similarly, the -w flag can be used to place it
in the south-west corner of the computational region.
NOTES
The resulting vector map will have as many points as there are raster
cells in the current computational region. It might be appropriate to
use a coarser resolution for the scatter plot than for the other
computations. However, note that some values will be skipped,
which may lead, e.g., to missing some outliers.
The color_raster input is expected to be a categorical raster
or to have values which won't lose anything when converted from floating
point to integer. This is because vector categories are used to store
the color_raster values and carry the association with the color.
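Whether a set of raster values is safe to use as color_raster can be checked by testing if truncation changes anything. The following plain Python sketch does not call GRASS. It assumes truncation toward zero, as done by math.trunc; the function name and the sample values are hypothetical.

```python
import math


def survives_integer_conversion(values):
    """Return True when truncating every value to an integer changes nothing."""
    return all(v == math.trunc(v) for v in values)


land_cover = [1.0, 2.0, 7.0]  # categorical codes stored as floating point
slope = [0.4, 12.7, 33.1]     # continuous values

print(survives_integer_conversion(land_cover))  # True
print(survives_integer_conversion(slope))       # False
print([math.trunc(v) for v in slope])           # [0, 12, 33]
```

In the second case, distinct values such as 12.1 and 12.9 would end up in the same category and therefore share one color.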
The visualization of the output vector map potentially has the same
issue as the visualization of any vector map with many points. The points
cover each other and, above a certain density of points, it is not possible
to compare relative density in the scatter plot. Furthermore, if colors
are associated with the points, the colors of the points rendered last are
the ones which are visible, which does not necessarily show the
prevailing color (value). The modules v.mkgrid and v.vect.stats can be
used to overcome this issue.
EXAMPLES
In the full North Carolina sample location,
set the computation region to one of the raster maps:
g.region raster=lsat7_2002_30
Create the scatter plot:
r.scatterplot input=lsat7_2002_30,lsat7_2002_40 output=scatterplot color_raster=landclass96
Figure: Scatter plot showing red and near infrared Landsat bands
colored using land cover classes
In an ideal case, the scatter plot is computed with the computation region
resolution set to the resolution of one of the rasters (which ideally
matches the other raster as well):
g.region raster=lsat7_2002_30 -p
r.scatterplot input=lsat7_2002_30,lsat7_2002_40 output=scatterplot_full_res
This best describes the actual state of the data, but unfortunately
this creates a lot of points which must be processed and rendered.
Therefore, it is also possible to compute the scatter plot in a lower
resolution by changing the computational region resolution:
g.region raster=lsat7_2002_30 res=120 -p
r.scatterplot input=lsat7_2002_30,lsat7_2002_40 output=scatterplot_res_120
Reducing the resolution creates a possibility of missing some outliers
or even smaller groups as some of the cells are just ignored, but
typically the general shape of the scatter plot is preserved.
In any case, with fewer points, every operation will be much faster.
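The trade-off can be illustrated on a small grid. The sketch below is plain Python and does not call GRASS. It keeps one cell out of every 4 x 4 block, which mimics going from a 30 m to a 120 m resolution. Which cell of a block GRASS actually reads depends on its resampling, so picking the first cell of each block is an assumption made only for illustration, and the grid values are hypothetical.

```python
def sample_every(grid, step):
    """Keep one cell per step x step block (the first one in each block)."""
    return [row[::step] for row in grid[::step]]


# Hypothetical 8 x 8 raster with value 10 everywhere and a single outlier.
grid = [[10] * 8 for _ in range(8)]
grid[3][5] = 250

coarse = sample_every(grid, 4)

full_count = sum(len(row) for row in grid)
coarse_count = sum(len(row) for row in coarse)
outlier_kept = any(250 in row for row in coarse)

print(full_count, coarse_count, outlier_kept)  # 64 4 False
```

The number of points drops by a factor of 16, and the single outlier is no longer part of the scatter plot.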
Figure: Scatter plots computed with different computational region
resolutions: the first one with full raster resolution (~30 m),
the second with resolution 120 m, and the third with 240 m
Another way of dealing with high-density scatter plots
is to bin the points into cells of a rectangular grid.
The number of points per cell will influence the color of the cell,
so the density will be expressed clearly.
The scatter plot can be computed in full resolution:
g.region raster=lsat7_2002_30 -p
r.scatterplot input=lsat7_2002_30,lsat7_2002_40 output=scatterplot
To create the grid, the computation region extent should match
the scatter plot extent. The resolution determines the size of the grid
cells. A cell size of 5 is a good choice for data from 0 to 255.
g.region vector=scatterplot res=5 -p
The grid can be created using the v.mkgrid module, the binning
done using v.vect.stats, and finally the color set using v.colors.
v.mkgrid map=scatterplot_grid
v.vect.stats points=scatterplot areas=scatterplot_grid count_column=count
v.colors map=scatterplot_grid use=attr column=count color=viridis
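The idea behind the binning can be sketched in a few lines of plain Python. This is only an illustration of counting points per grid cell, not how v.vect.stats is implemented. The function name, the default origin at 0,0, and the sample points are hypothetical.

```python
from collections import Counter
from math import floor


def bin_points(points, cell_size, west=0.0, south=0.0):
    """Count points per grid cell; keys are (column, row) from the south-west corner."""
    counts = Counter()
    for x, y in points:
        col = floor((x - west) / cell_size)
        row = floor((y - south) / cell_size)
        counts[(col, row)] += 1
    return counts


# Hypothetical scatter plot points (values of two bands).
points = [(12, 30), (13, 31), (14, 34), (40, 90), (12, 30)]
counts = bin_points(points, cell_size=5)

for cell, count in sorted(counts.items()):
    print(cell, count)
```

Cells which contain no points never appear in the result, which corresponds to displaying only the grid cells with a non-zero count.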
The d.vect module picks up the color table
automatically, but it is advantageous to also specify that only
the grid cells with a non-zero count of points should be displayed
using where="count > 0":
d.vect map=scatterplot_grid where="count > 0" icon=basic/point
To get a more interesting and sometimes smoother look, a hexagonal
grid can be used:
v.mkgrid -h map=scatterplot_grid
Alternatively, a smaller cell size can be used. When the cell size
is the same as the distance between the points, for example when
using cell size 1 with integer rasters, the grid needs to be shifted
so that the points fall into the middle of the cells rather than
on the edges or corners. For this purpose, g.region accepts
modifications of the current extent values:
g.region vector=scatterplot res=1 w=w-0.5 e=e+0.5 s=s-0.5 n=n+0.5
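The effect of the half-cell shift can be checked with simple arithmetic. The sketch below is plain Python and does not call GRASS; the function names and the 0 to 255 extent are hypothetical.

```python
def shifted_extent(west, east, south, north, res):
    """Grow the extent by half a cell on every side."""
    half = res / 2
    return west - half, east + half, south - half, north + half


def fraction_into_cell(x, west, res):
    """Position of x inside its cell, as a fraction of the cell size."""
    return ((x - west) % res) / res


# Hypothetical extent of integer-valued points from 0 to 255.
w, e, s, n = shifted_extent(0, 255, 0, 255, res=1)

print(w, e, s, n)                    # -0.5 255.5 -0.5 255.5
print(fraction_into_cell(0, 0, 1))   # 0.0, on a cell edge before the shift
print(fraction_into_cell(0, w, 1))   # 0.5, in the cell middle after the shift
```

With the shifted extent, every integer coordinate lies at the center of a cell, so each point is counted in exactly one cell.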
Figure: High density scatter plot visualized using binning into
rectangular grid, hexagonal grid, and dense rectangular grid
SEE ALSO
r.stats,
d.correlate,
r3.scatterplot,
v.mkgrid,
v.vect.stats,
g.region
AUTHOR
Vaclav Petras,
NCSU GeoForAll Lab
SOURCE CODE
Available at: r.scatterplot source code (history)
© 2003-2018
GRASS Development Team,
GRASS GIS 7.4.1svn Reference Manual