r.scatterplot

Creates a scatter plot of raster maps

Creates a scatter plot of two or more raster maps as a vector map

r.scatterplot [-wfsub] input=name [,name,...] output=name [z_raster=name] [color_raster=name] [xscale=float] [yscale=float] [zscale=float] [position=east,north] [spacing=float] [vector_mask=name] [mask_layer=string] [mask_cats=range] [mask_where=sql_query] [--overwrite] [--verbose] [--quiet] [--qq] [--ui]

Example:

r.scatterplot input=name output=name

grass.script.run_command("r.scatterplot", input, output, z_raster=None, color_raster=None, xscale=1.0, yscale=1.0, zscale=1.0, position=None, spacing=None, vector_mask=None, mask_layer="1", mask_cats=None, mask_where=None, flags=None, overwrite=False, verbose=False, quiet=False, superquiet=False)

Example:

gs.run_command("r.scatterplot", input="name", output="name")

Parameters

input=name [,name,...] [required]
    Name of input raster map(s)
output=name [required]
    Name for output vector map
z_raster=name
    Name of input raster map to define Z coordinates
color_raster=name
    Name of input raster map to define category and color
xscale=float
    Scale to apply to X axis
    Default: 1.0
yscale=float
    Scale to apply to Y axis
    Default: 1.0
zscale=float
    Scale to apply to Z axis
    Default: 1.0
position=east,north
    Place to the given coordinates
    The output coordinates will not represent the original values
spacing=float
    Spacing between scatter plots
    Applied when automatic offset is used
vector_mask=name
    Areas to use in the scatter plots
    Name of vector map with areas from where the scatter plot should be generated
mask_layer=string
    Layer number or name for vector mask
    Vector features can have category values in different layers. This number determines which layer to use. When used with direct OGR access this is the layer name.
    Default: 1
mask_cats=range
    Category values for vector mask
    Example: 1,3,7-9,13
mask_where=sql_query
    WHERE conditions for the vector mask
    Example: income < 1000 and population >= 10000
-w
    Place into the current region south-west corner
    The output coordinates will not represent the original values
-f
    Automatically offset each scatter plot
    The output coordinates will not represent the original values
-s
    Put points into a single layer
    Even with multiple rasters, put all points into a single layer
-u
    Invert mask
-b
    Do not build topology
    Advantageous when handling a large number of points
--overwrite
    Allow output files to overwrite existing files
--help
    Print usage summary
--verbose
    Verbose module output
--quiet
    Quiet module output
--qq
    Very quiet module output
--ui
    Force launching GUI dialog

input : str | list[str], required
    Name of input raster map(s)
    Used as: input, raster, name
output : str, required
    Name for output vector map
    Used as: output, vector, name
z_raster : str, optional
    Name of input raster map to define Z coordinates
    Used as: input, raster, name
color_raster : str, optional
    Name of input raster map to define category and color
    Used as: input, raster, name
xscale : float, optional
    Scale to apply to X axis
    Default: 1.0
yscale : float, optional
    Scale to apply to Y axis
    Default: 1.0
zscale : float, optional
    Scale to apply to Z axis
    Default: 1.0
position : tuple[float, float] | list[float] | str, optional
    Place to the given coordinates
    The output coordinates will not represent the original values
    Used as: input, coords, east,north
spacing : float, optional
    Spacing between scatter plots
    Applied when automatic offset is used
vector_mask : str, optional
    Areas to use in the scatter plots
    Name of vector map with areas from where the scatter plot should be generated
    Used as: input, vector, name
mask_layer : str, optional
    Layer number or name for vector mask
    Vector features can have category values in different layers. This number determines which layer to use. When used with direct OGR access this is the layer name.
    Used as: input, layer
    Default: 1
mask_cats : str, optional
    Category values for vector mask
    Example: 1,3,7-9,13
    Used as: input, cats, range
mask_where : str, optional
    WHERE conditions for the vector mask
    Example: income < 1000 and population >= 10000
    Used as: input, sql_query, sql_query
flags : str, optional
    Allowed values: w, f, s, u, b
    w
        Place into the current region south-west corner
        The output coordinates will not represent the original values
    f
        Automatically offset each scatter plot
        The output coordinates will not represent the original values
    s
        Put points into a single layer
        Even with multiple rasters, put all points into a single layer
    u
        Invert mask
    b
        Do not build topology
        Advantageous when handling a large number of points
overwrite: bool, optional
    Allow output files to overwrite existing files
    Default: False
verbose: bool, optional
    Verbose module output
    Default: False
quiet: bool, optional
    Quiet module output
    Default: False
superquiet: bool, optional
    Very quiet module output
    Default: False

DESCRIPTION

The r.scatterplot module takes raster maps and creates a scatter plot in the form of a vector map in which the individual points of the scatter plot are vector points. As with any scatter plot, the X coordinates of the points represent values from the first raster map and the Y coordinates represent values from the second raster map. Consequently, the vector map is placed in the combined value space of the original raster maps and its geographic position should be ignored. Typically, it is necessary to zoom or to change the computational region in order to view the scatter plot or to perform further computations on the result.

With the default settings, the r.scatterplot output allows measuring and querying of the values in the scatter plot. Settings such as the xscale or position options change the coordinates, so measurements taken from the output no longer correspond to the original raster values.

Multiple variables

If more than two raster maps are provided to the input option, r.scatterplot creates a scatter plot for each unique pair of input maps. For example, if A, B, C, and D are the inputs, r.scatterplot creates scatter plots for A and B, A and C, A and D, B and C, B and D, and finally C and D. Each pair is stored in a different layer of the output vector map. r.scatterplot prints textual output which specifies the pairs and their associated layers.
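The pairing described above corresponds to the combinations of the inputs taken two at a time. The following is a minimal Python sketch of that enumeration. The map names are placeholders, and numbering the layers from 1 in this order is an assumption made for illustration; the textual output of the module remains the authoritative source for which pair is in which layer.

```python
from itertools import combinations

# Placeholder raster names standing in for real inputs.
inputs = ["A", "B", "C", "D"]

# Every unique unordered pair, in the order the inputs were given.
pairs = list(combinations(inputs, 2))

# Assumption: layers are numbered from 1 in the same order.
for layer, (x_map, y_map) in enumerate(pairs, start=1):
    print(f"layer {layer}: X={x_map}, Y={y_map}")
```

In general, n inputs give n * (n - 1) / 2 scatter plots, so four inputs give six.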

A 3D scatter plot can be generated when the z_raster option is provided. A third variable is added to each scatter plot and each point has a Z coordinate which represents this third variable.

Each point can also have a color based on an additional variable given by the color_raster option. Values from this raster are stored as categories, i.e., floating point values are truncated to integers, and a color table based on the input raster color table is assigned to the vector map.

The z_raster and color_raster can be the same. This can help with understanding the 3D scatter plot and makes the third variable visible in 2D as well. When z_raster and color_raster are not the same, a total of four variables are associated with one point.

Figure: One scatter plot of two variables (left), the same scatter plot but with color showing a third variable (middle), again the same scatter plot in 3D where Z represents a third variable (right).

Figure: One scatter plot with one variable as Z coordinate and another variable as color (two rotated views).

Layout

When working with only two variables, the X axis represents the first one and the Y axis the second one. With more than two variables, the individual scatter plots for individual pairs of variables are at the same place. In this case, the coordinates show the actual values of the variables. Each scatter plot is placed into a separate layer of the output vector map.

Figure: Three overlapping scatter plots of three variables A, B, and C. Individual scatter plots are distinguished by color. The colors can be obtained using d.vect layer=-1 -c.

If visualization is more important than preserving the actual values, the -f flag can be used. This will place the scatter plots next to each other, separated by the value provided using the spacing option.

The layout options can still be combined with additional variables represented as Z coordinate or color. In that case, the Z coordinate or color is the same for all the scatter plots.

Figure: Three scatter plots of three variables A, B, and C. First one is A and B, second A and C, and third B and C.

Figure: Three scatter plots of three variables A, B, and C with color showing a fourth variable D in all scatter plots.

The options xscale, yscale and zscale will cause the values to be rescaled before they are stored as point coordinates. This is useful for visualization when one of the variables has a significantly different range than the other or when the scatter plot is shown with other data and must fit a certain area. The position option is used to place the scatter plot at any given coordinates. Similarly, the -w flag can be used to place it in the south-west corner of the computational region.
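The effect of scaling on what can be read from the plot can be sketched in a few lines of Python. This sketch assumes that rescaling is a plain multiplication of each raster value by the corresponding scale factor, which the text above does not state explicitly. The helpers to_plot and from_plot are hypothetical and not part of GRASS, and the additional shift introduced by position or -w is not modeled.

```python
def to_plot(value_x, value_y, xscale=1.0, yscale=1.0):
    """Hypothetical helper: raster values -> point coordinates."""
    return value_x * xscale, value_y * yscale


def from_plot(x, y, xscale=1.0, yscale=1.0):
    """Hypothetical helper: point coordinates -> original raster values."""
    return x / xscale, y / yscale


# An 8-bit band value plotted against a made-up variable ranging from 0 to 1;
# yscale=255 stretches the second variable to a comparable range.
x, y = to_plot(128, 0.5, xscale=1.0, yscale=255.0)
print(x, y)

# Dividing by the same scale factors recovers the original values.
recovered = from_plot(x, y, xscale=1.0, yscale=255.0)
print(recovered)
```

Under this assumption, a coordinate queried in a scaled plot needs to be divided by the scale factor before it can be interpreted as a raster value.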

NOTES

The resulting vector map will have as many points as there are raster cells in the current computational region. It might be appropriate to use a coarser resolution for the scatter plot than for the other computations. However, note that some values will then be skipped, which may lead, e.g., to missing some outliers.
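To get a feeling for how quickly the number of points changes with resolution, the cell count of a region can be estimated as rows times columns. The sketch below uses a made-up 12 km by 12 km extent that is not taken from any dataset; cell_count is a hypothetical helper which assumes the extent is a multiple of the resolution.

```python
def cell_count(north, south, east, west, res):
    """Number of cells in a region whose extent is a multiple of res."""
    rows = round((north - south) / res)
    cols = round((east - west) / res)
    return rows * cols


# Hypothetical extent in meters, chosen only for illustration.
extent = dict(north=12000, south=0, east=12000, west=0)

counts = {res: cell_count(res=res, **extent) for res in (30, 120, 240)}
for res, count in counts.items():
    print(f"res={res} m: {count} cells")
```

Increasing the cell size four times reduces the number of cells, and therefore the number of points, sixteen times.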

The color_raster input is expected to be a categorical raster or to have values which won't lose anything when converted from floating point to integer. This is because vector categories are used to store the color_raster values and carry the association with the color.
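The loss of information caused by the conversion can be illustrated in Python. The values below are made up, and the sketch assumes that truncation simply drops the fractional part, as math.trunc does.

```python
import math

# Hypothetical floating point values from a color_raster.
values = [1.0, 1.2, 1.9, 2.0, 2.5]

# Assumption: truncation drops the fractional part, like math.trunc.
categories = [math.trunc(v) for v in values]
print(categories)

# Only values which are already whole numbers survive unchanged.
lossless = [v for v in values if float(math.trunc(v)) == v]
print(lossless)
```

Five distinct input values collapse into two categories here, so points which differed in the raster would share one color.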

The visualization of the output vector map potentially has the same issue as the visualization of any vector map with many points. The points cover each other and, above a certain density of points, it is not possible to compare relative density in the scatter plot. Furthermore, if colors are associated with the points, the visible colors are those of the points rendered last, which do not necessarily show the prevailing color (value). The modules v.mkgrid and v.vect.stats can be used to overcome this issue.

EXAMPLES

Landsat bands

In the full North Carolina sample location, set the computation region to one of the raster maps:

g.region raster=lsat7_2002_30

Create the scatter plot:

r.scatterplot input=lsat7_2002_30,lsat7_2002_40 output=scatterplot color_raster=landclass96
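The same two steps could be written with the Python API shown in the synopsis. This is only a sketch: the steps list and the run_steps function are illustrative names, not part of GRASS, and grass.script is imported inside the function so that the definitions can be loaded outside of a GRASS session. Actually running the steps requires an active session in the North Carolina sample dataset.

```python
# Module name and parameters for each step of the example above.
steps = [
    ("g.region", {"raster": "lsat7_2002_30"}),
    (
        "r.scatterplot",
        {
            "input": ["lsat7_2002_30", "lsat7_2002_40"],
            "output": "scatterplot",
            "color_raster": "landclass96",
        },
    ),
]


def run_steps(steps):
    """Run the steps in order; needs an active GRASS session."""
    import grass.script as gs  # imported lazily, only available inside GRASS

    for module, params in steps:
        gs.run_command(module, **params)
```

Inside a GRASS session, calling run_steps(steps) would set the region and then create the scatter plot.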

Figure: Scatter plot showing red and near infrared Landsat bands colored using land cover classes

High density scatter plots

In an ideal case, the scatter plot is computed with the computational region resolution set to the resolution of one of the rasters (which ideally matches the other raster as well):

g.region raster=lsat7_2002_30 -p
r.scatterplot input=lsat7_2002_30,lsat7_2002_40 output=scatterplot_full_res

This best describes the actual state of the data, but unfortunately it creates a lot of points which must be processed and rendered. Therefore, it is also possible to compute the scatter plot at a lower resolution by changing the computational region resolution:

g.region raster=lsat7_2002_30 res=120 -p
r.scatterplot input=lsat7_2002_30,lsat7_2002_40 output=scatterplot_res_120

Reducing the resolution creates a possibility of missing some outliers or even smaller groups because some of the cells are simply ignored, but typically the general shape of the scatter plot is preserved. In any case, with fewer points, every operation will be much faster.

Figure: Scatter plots computed with different computational region resolutions; the first one is with full raster resolution (~30 m), the second with resolution 120 m, and the third with 240 m

Another way of dealing with high density scatter plots is to bin the points into cells of a rectangular grid. The number of points per cell will influence the color of the cell, so the density will be expressed clearly. The scatter plot can be computed in full resolution:

g.region raster=lsat7_2002_30 -p
r.scatterplot input=lsat7_2002_30,lsat7_2002_40 output=scatterplot

To create the grid, the computational region extent should match the scatter plot extent. The resolution determines the size of the grid cells. A size of 5 works well for data ranging from 0 to 255.

g.region vector=scatterplot res=5 -p

The grid can be created using the v.mkgrid module, the binning is done using v.vect.stats, and finally the color is set using v.colors.

v.mkgrid map=scatterplot_grid
v.vect.stats points=scatterplot areas=scatterplot_grid count_column=count
v.colors map=scatterplot_grid use=attr column=count color=viridis
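What the grid and the point count achieve can be illustrated without GRASS. The following pure Python sketch bins a few made-up points into square cells and counts the points per cell. It mirrors the idea only, not the implementation of v.mkgrid or v.vect.stats, and bin_points is a hypothetical helper.

```python
from collections import Counter


def bin_points(points, cell_size, origin=(0.0, 0.0)):
    """Count points per square cell; keys are (column, row) indices."""
    counts = Counter()
    for x, y in points:
        col = int((x - origin[0]) // cell_size)
        row = int((y - origin[1]) // cell_size)
        counts[(col, row)] += 1
    return counts


# Hypothetical scatter plot points (pairs of band values).
points = [(12, 40), (13, 41), (14, 44), (101, 200), (103, 204), (250, 30)]
counts = bin_points(points, cell_size=5)
print(counts)
```

Only cells which contain at least one point appear in the result, while the count per cell is the value which would drive the color.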

The d.vect module picks up the color table automatically, but it is advantageous to also specify that only the grid cells with non-zero count of points should be displayed using where="count > 0":

d.vect map=scatterplot_grid where="count > 0" icon=basic/point

To get a more interesting and sometimes smoother look, a hexagonal grid can be used:

v.mkgrid -h map=scatterplot_grid

Alternatively, a smaller cell size can be used. When the cell size is the same as the distance between the points, for example when using cell size 1 with integer rasters, the grid needs to be shifted so that the points fall into the middle of the cells rather than on the edges or corners. For this purpose, g.region accepts modifications of the current extent values:

g.region vector=scatterplot res=1 w=w-0.5 e=e+0.5 s=s-0.5 n=n+0.5
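The reason for the half-cell shift can be checked with simple arithmetic. The sketch below computes where integer values fall within cells of size 1, first for a grid starting exactly at the minimum value and then for a grid extended by half a cell. The helper position_in_cell is hypothetical and the values are made up.

```python
def position_in_cell(value, west, res):
    """Fractional position of a value inside its grid cell (0 = left edge)."""
    return ((value - west) / res) % 1.0


values = [0, 1, 2, 255]

# Grid starting exactly at the minimum value: points sit on cell edges.
on_edges = [position_in_cell(v, west=0, res=1) for v in values]
print(on_edges)

# Grid extended by half a cell: points sit in the cell centers.
centered = [position_in_cell(v, west=-0.5, res=1) for v in values]
print(centered)
```

With the shifted grid, each integer value falls unambiguously into exactly one cell.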

Figure: High density scatter plot visualized using binning into rectangular grid, hexagonal grid, and dense rectangular grid

SEE ALSO

r.stats, d.correlate, r3.scatterplot, v.mkgrid, v.vect.stats, g.region

AUTHOR

Vaclav Petras, NCSU GeoForAll Lab

SOURCE CODE

Available at: r.scatterplot source code (history)
Latest change: Thursday Feb 20 13:02:26 2025 in commit 53de819