
m.csv.clean

Creates a cleaned-up copy of a CSV file

Creates CSV files which are ready to be used in GRASS GIS

m.csv.clean input=name separator=character output=name prefix=string [recognized_date=string [,string,...]] [clean_date=string] missing_names=string [cell_clean=string [,string,...]] [--overwrite] [--verbose] [--quiet] [--qq] [--ui]

Example:

m.csv.clean input=name separator=comma output=name prefix=col_ missing_names=column

grass.script.run_command("m.csv.clean", input, separator="comma", output, prefix="col_", recognized_date=None, clean_date="date_%Y-%m-%d", missing_names="column", cell_clean="strip_whitespace,collapse_whitespace", overwrite=False, verbose=False, quiet=False, superquiet=False)

Example:

gs.run_command("m.csv.clean", input="name", separator="comma", output="name", prefix="col_", missing_names="column")

Parameters

input=name [required]
    Input CSV file to clean up
    Name of input file
separator=character [required]
    Field separator
    Special characters: pipe, comma, space, tab, newline
    Default: comma
output=name [required]
    Clean CSV output file
    Name for output file
prefix=string [required]
    Prefix for columns which don't start with a letter
    Prefix itself must start with a letter of the English alphabet
    Default: col_
recognized_date=string [,string,...]
    Recognized date formats (e.g., %m/%d/%y)
    For example, %m/%d/%Y,%m/%d/%y matches 7/30/2021 and 7/30/21
clean_date=string
    Format for the cleaned-up date
    For example, %Y-%m-%d for 2021-07-30
    Default: date_%Y-%m-%d
missing_names=string [required]
    Names for the columns without a name in the header
    If only one name is provided but more than one is needed, an underscore and the column number are appended
    Default: column
cell_clean=string [,string,...]
    Operations to apply to non-header cells in the body of the document
    Multiple operations can be given as a comma-separated list
    Allowed values: strip_whitespace, collapse_whitespace, date_format, none
    Default: strip_whitespace,collapse_whitespace
--overwrite
    Allow output files to overwrite existing files
--help
    Print usage summary
--verbose
    Verbose module output
--quiet
    Quiet module output
--qq
    Very quiet module output
--ui
    Force launching GUI dialog
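The recognized_date and clean_date options describe a parse-then-format conversion. Their format codes look like the directives of Python's datetime.strptime and datetime.strftime, so this minimal sketch assumes those semantics. The function name reformat_date is hypothetical and not part of the module, and returning non-matching values unchanged is also an assumption.

```python
from datetime import datetime


def reformat_date(value, recognized_formats, clean_format="date_%Y-%m-%d"):
    """Hypothetical sketch: rewrite value in clean_format if it matches a recognized format."""
    for fmt in recognized_formats:
        try:
            parsed = datetime.strptime(value, fmt)
        except ValueError:
            continue  # Value does not match this format; try the next one.
        return parsed.strftime(clean_format)
    return value  # Assumption: values matching no format are left unchanged.


formats = ["%m/%d/%Y", "%m/%d/%y"]  # Same as recognized_date=%m/%d/%Y,%m/%d/%y
print(reformat_date("7/30/2021", formats, "%Y-%m-%d"))  # 2021-07-30
print(reformat_date("7/30/21", formats, "%Y-%m-%d"))  # 2021-07-30
print(reformat_date("7/30/21", formats))  # date_2021-07-30
```

Formats are tried in the order given. In strptime, %Y requires four digits, so 7/30/21 does not match %m/%d/%Y and falls through to the two-digit year format %m/%d/%y.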

input : str, required
    Input CSV file to clean up
    Name of input file
    Used as: input, file, name
separator : str, required
    Field separator
    Special characters: pipe, comma, space, tab, newline
    Used as: input, separator, character
    Default: comma
output : str, required
    Clean CSV output file
    Name for output file
    Used as: output, file, name
prefix : str, required
    Prefix for columns which don't start with a letter
    Prefix itself must start with a letter of the English alphabet
    Default: col_
recognized_date : str | list[str], optional
    Recognized date formats (e.g., %m/%d/%y)
    For example, %m/%d/%Y,%m/%d/%y matches 7/30/2021 and 7/30/21
clean_date : str, optional
    Format for the cleaned-up date
    For example, %Y-%m-%d for 2021-07-30
    Default: date_%Y-%m-%d
missing_names : str, required
    Names for the columns without a name in the header
    If only one name is provided but more than one is needed, an underscore and the column number are appended
    Default: column
cell_clean : str | list[str], optional
    Operations to apply to non-header cells in the body of the document
    Multiple operations can be given as a list or as a comma-separated string
    Allowed values: strip_whitespace, collapse_whitespace, date_format, none
    Default: strip_whitespace,collapse_whitespace
overwrite : bool, optional
    Allow output files to overwrite existing files
    Default: False
verbose : bool, optional
    Verbose module output
    Default: False
quiet : bool, optional
    Quiet module output
    Default: False
superquiet : bool, optional
    Very quiet module output
    Default: False
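The prefix and missing_names descriptions imply simple rules for the header row. The sketch below is a hypothetical illustration of those rules, not the module's code. It assumes that a "letter" means an ASCII letter, that column numbers count from 1, and that all other names are left as they are. The function name clean_header is made up for this example.

```python
import string

# The prefix must start with a letter of the English alphabet,
# so "letter" is taken to mean an ASCII letter here.
ASCII_LETTERS = frozenset(string.ascii_letters)


def clean_header(names, prefix="col_", missing_name="column"):
    """Hypothetical sketch of the prefix and missing_names header fixes."""
    if not prefix or prefix[0] not in ASCII_LETTERS:
        raise ValueError("prefix must start with a letter of the English alphabet")
    missing_count = sum(1 for name in names if not name)
    cleaned = []
    # Assumption: column numbers are counted from 1.
    for number, name in enumerate(names, start=1):
        if not name:
            # Append an underscore and the column number only if several names are missing.
            cleaned.append(missing_name if missing_count == 1 else f"{missing_name}_{number}")
        elif name[0] not in ASCII_LETTERS:
            cleaned.append(prefix + name)  # Name does not start with a letter.
        else:
            cleaned.append(name)
    return cleaned


print(clean_header(["id", "2021", "", "x", ""]))
# ['id', 'col_2021', 'column_3', 'x', 'column_5']
```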

DESCRIPTION

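m.csv.clean reads a CSV (comma-separated values) file, cleans it, and writes a new CSV file. Cleaning fixes column names in the header (names which do not start with a letter get a prefix and empty names are filled in) and applies operations such as whitespace stripping or date reformatting to the cells in the body. The separator is comma (,) by default, but it can be set to any single character such as semicolon (;), pipe (|), or tab.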
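The read, clean, and write steps can be illustrated with Python's csv module. This is a minimal sketch under stated assumptions, not the module's implementation: the helper names are hypothetical, collapse_whitespace is assumed to turn each run of whitespace into a single space, header fixes and date_format are left out, and the newline separator keyword is omitted for simplicity.

```python
import csv
import io
import re

# Separator keywords mapped to characters; any other value (e.g., ";")
# is used as-is. "newline" is omitted in this sketch.
SEPARATORS = {"pipe": "|", "comma": ",", "space": " ", "tab": "\t"}
DEFAULT_OPERATIONS = ("strip_whitespace", "collapse_whitespace")


def clean_cell(value, operations=DEFAULT_OPERATIONS):
    """Apply whitespace operations named after the cell_clean values."""
    if "strip_whitespace" in operations:
        value = value.strip()  # Remove leading and trailing whitespace.
    if "collapse_whitespace" in operations:
        value = re.sub(r"\s+", " ", value)  # Assumed: each run becomes one space.
    return value


def clean_csv(text, separator="comma", operations=DEFAULT_OPERATIONS):
    """Read CSV text, clean the body cells, and return the new CSV text."""
    delimiter = SEPARATORS.get(separator, separator)
    rows = list(csv.reader(io.StringIO(text), delimiter=delimiter))
    if not rows:
        return ""
    output = io.StringIO()
    writer = csv.writer(output, delimiter=delimiter, lineterminator="\n")
    writer.writerow(rows[0])  # Header row is copied unchanged in this sketch.
    for row in rows[1:]:
        writer.writerow([clean_cell(cell, operations) for cell in row])
    return output.getvalue()


print(clean_csv("site;value\n  A   1 ;  2.5\n", separator=";"), end="")
```

With the default operations, the body cell "  A   1 " becomes "A 1", while passing operations=("none",) leaves the cells as they are.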

NOTES

Originally, this module was supposed to be named m.csv.polish and to be accompanied by a module named m.csv.czech for checking the state of the CSV file.

EXAMPLES

In GRASS GIS shell

The following applies all the default fixes to the file sampling_sites_raw.csv and writes the cleaned file sampling_sites.csv:

m.csv.clean input=sampling_sites_raw.csv output=sampling_sites.csv

In any shell

The module does not use any information from the current project (location) and mapset, so it is easy to run it in an ad hoc temporary project using the grass --exec command:

grass --tmp-project XY --exec m.csv.clean input=sampling_sites_raw.csv output=sampling_sites.csv
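When many files need cleaning, the same grass --exec call can be driven from a Python script with subprocess.run. The helper names below are hypothetical; the command uses only the options and flags shown in this manual.

```python
import subprocess


def m_csv_clean_command(input_file, output_file, **options):
    """Build a grass --exec command line for m.csv.clean in a temporary XY project."""
    command = [
        "grass", "--tmp-project", "XY", "--exec", "m.csv.clean",
        f"input={input_file}", f"output={output_file}",
    ]
    # Other options (e.g., separator="pipe") are passed as key=value pairs.
    command.extend(f"{key}={value}" for key, value in options.items())
    return command


def clean_files(pairs, **options):
    """Run the module once per (input, output) pair."""
    for input_file, output_file in pairs:
        # check=True raises CalledProcessError if the command fails.
        subprocess.run(m_csv_clean_command(input_file, output_file, **options), check=True)


print(m_csv_clean_command("sampling_sites_raw.csv", "sampling_sites.csv", separator="pipe"))
```

Options that take several values, such as recognized_date, would need to be joined with commas before being passed to the helper.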

SEE ALSO

  • v.in.csv, an addon module for importing CSV as vector points with coordinate transformation,
  • v.in.ascii for importing CSV as vector points using a different approach,
  • v.in.ogr for an alternative CSV import using GDAL/OGR.

AUTHOR

Vaclav Petras, NCSU Center for Geospatial Analytics

SOURCE CODE

Available at: m.csv.clean source code (history)
Latest change: Friday Feb 21 10:10:05 2025 in commit 7d78fe3