m.csv.clean
Creates a cleaned-up copy of a CSV file
Creates CSV files which are ready to be used in GRASS GIS
m.csv.clean input=name separator=character output=name prefix=string [recognized_date=string [,string,...]] [clean_date=string] missing_names=string [cell_clean=string [,string,...]] [--overwrite] [--verbose] [--quiet] [--qq] [--ui]
Example:
m.csv.clean input=name separator=comma output=name prefix=col_ missing_names=column
grass.script.run_command("m.csv.clean", input, separator="comma", output, prefix="col_", recognized_date=None, clean_date="date_%Y-%m-%d", missing_names="column", cell_clean="strip_whitespace,collapse_whitespace", overwrite=False, verbose=False, quiet=False, superquiet=False)
Example:
gs.run_command("m.csv.clean", input="name", separator="comma", output="name", prefix="col_", missing_names="column")
Parameters
input=name [required]
Input CSV file to clean up
Name of input file
separator=character [required]
Field separator
Special characters: pipe, comma, space, tab, newline
Default: comma
output=name [required]
Clean CSV output file
Name for output file
prefix=string [required]
Prefix for columns which don't start with a letter
The prefix itself must start with a letter of the English alphabet
Default: col_
recognized_date=string [,string,...]
Recognized date formats (e.g., %m/%d/%y)
For example, %m/%d/%Y,%m/%d/%y matches 7/30/2021 and 7/30/21
clean_date=string
Format for the new cleaned-up date
For example, %Y-%m-%d for 2021-07-30
Default: date_%Y-%m-%d
missing_names=string [required]
Names for the columns without a name in the header
If only one is provided, but more than one is needed, an underscore and the column number are added
Default: column
cell_clean=string [,string,...]
Operations to apply to non-header cells in the body of the document
If only one is provided, but more than one is needed, an underscore and the column number are added
Allowed values: strip_whitespace, collapse_whitespace, date_format, none
Default: strip_whitespace,collapse_whitespace
--overwrite
Allow output files to overwrite existing files
--help
Print usage summary
--verbose
Verbose module output
--quiet
Quiet module output
--qq
Very quiet module output
--ui
Force launching GUI dialog
input : str, required
Input CSV file to clean up
Name of input file
Used as: input, file, name
separator : str, required
Field separator
Special characters: pipe, comma, space, tab, newline
Used as: input, separator, character
Default: comma
output : str, required
Clean CSV output file
Name for output file
Used as: output, file, name
prefix : str, required
Prefix for columns which don't start with a letter
The prefix itself must start with a letter of the English alphabet
Default: col_
recognized_date : str | list[str], optional
Recognized date formats (e.g., %m/%d/%y)
For example, %m/%d/%Y,%m/%d/%y matches 7/30/2021 and 7/30/21
clean_date : str, optional
Format for the new cleaned-up date
For example, %Y-%m-%d for 2021-07-30
Default: date_%Y-%m-%d
missing_names : str, required
Names for the columns without a name in the header
If only one is provided, but more than one is needed, an underscore and the column number are added
Default: column
cell_clean : str | list[str], optional
Operations to apply to non-header cells in the body of the document
If only one is provided, but more than one is needed, an underscore and the column number are added
Allowed values: strip_whitespace, collapse_whitespace, date_format, none
Default: strip_whitespace,collapse_whitespace
overwrite: bool, optional
Allow output files to overwrite existing files
Default: False
verbose: bool, optional
Verbose module output
Default: False
quiet: bool, optional
Quiet module output
Default: False
superquiet: bool, optional
Very quiet module output
Default: False
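The format strings accepted by recognized_date and clean_date use the same codes as Python's datetime module. The following is a minimal sketch of how such formats behave, not the module's actual implementation: the function name reformat_date is hypothetical, and leaving non-matching values unchanged is an assumption.

```python
from datetime import datetime


def reformat_date(value, recognized_formats, clean_format):
    """Rewrite value in clean_format if it matches one of recognized_formats.

    Assumption: values matching no recognized format are returned unchanged.
    """
    for fmt in recognized_formats:
        try:
            parsed = datetime.strptime(value.strip(), fmt)
        except ValueError:
            # This format does not match; try the next one.
            continue
        return parsed.strftime(clean_format)
    return value


recognized = ["%m/%d/%Y", "%m/%d/%y"]
print(reformat_date("7/30/2021", recognized, "%Y-%m-%d"))  # 2021-07-30
print(reformat_date("7/30/21", recognized, "%Y-%m-%d"))  # 2021-07-30
```

Listing the four-digit year format first, as in the parameter example, lets each value be tried against %Y before %y.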
DESCRIPTION
m.csv.clean reads a CSV (Comma Separated Values) file, cleans it, and
writes a new CSV file. The separator for CSV is comma (,) by default,
but it can be set to any single character such as semicolon (;), pipe
(|), or tab.
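To illustrate the kind of cleaning described by the prefix, missing_names, and cell_clean parameters, here is a rough sketch in plain Python. It is not the module's code: the helper names (clean_header, clean_cell, clean_csv_text) are hypothetical, and details such as using the 1-based column position as the number for unnamed columns are assumptions based on the parameter descriptions.

```python
import csv
import io
import re


def clean_header(names, prefix="col_", missing_name="column"):
    """Name unnamed columns and prefix names not starting with a letter."""
    missing_count = sum(1 for name in names if not name.strip())
    cleaned = []
    for number, name in enumerate(names, start=1):
        name = name.strip()
        if not name:
            # Assumption: the suffix is the 1-based column position and is
            # added only when more than one column lacks a name.
            name = f"{missing_name}_{number}" if missing_count > 1 else missing_name
        elif not re.match(r"[A-Za-z]", name):
            name = prefix + name
        cleaned.append(name)
    return cleaned


def clean_cell(value):
    """Strip surrounding whitespace and collapse inner whitespace runs."""
    return re.sub(r"\s+", " ", value.strip())


def clean_csv_text(text, separator=","):
    """Return a cleaned copy of non-empty CSV text with a header row."""
    rows = list(csv.reader(io.StringIO(text), delimiter=separator))
    header, body = rows[0], rows[1:]
    out = io.StringIO()
    writer = csv.writer(out, delimiter=separator, lineterminator="\n")
    writer.writerow(clean_header(header))
    for row in body:
        writer.writerow([clean_cell(cell) for cell in row])
    return out.getvalue()


raw = "site,,2021 count,\n  A  1 , x,3,  y  z \n"
print(clean_csv_text(raw), end="")
# site,column_2,col_2021 count,column_4
# A 1,x,3,y z
```

The sketch handles only the header and whitespace rules; date conversion and any other fixes the module applies are left out.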
NOTES
Originally, the name for this module was supposed to be m.csv.polish, and the module was to be accompanied by a module named m.csv.czech for checking the state of the CSV.
EXAMPLES
In GRASS GIS shell
The following applies all the default fixes to the file sampling_sites_raw.csv and outputs a cleaned file sampling_sites.csv:
m.csv.clean input=sampling_sites_raw.csv output=sampling_sites.csv
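The same call can be made from Python inside a GRASS session through grass.script.run_command, as in the Python example near the top of this page. The sketch below spells out the documented default values explicitly; the helper names csv_clean_params and clean_csv are hypothetical.

```python
def csv_clean_params(input_path, output_path):
    """Collect m.csv.clean parameters, using the documented defaults."""
    return {
        "input": input_path,
        "output": output_path,
        "separator": "comma",
        "prefix": "col_",
        "missing_names": "column",
    }


def clean_csv(input_path, output_path):
    """Run m.csv.clean; requires an active GRASS session."""
    # Imported here so the helpers can be defined outside a GRASS session.
    import grass.script as gs

    gs.run_command("m.csv.clean", **csv_clean_params(input_path, output_path))


# Usage inside a GRASS session:
# clean_csv("sampling_sites_raw.csv", "sampling_sites.csv")
```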
In any shell
The module does not use any information from the current location and
mapset, so it is easy to run it in an ad hoc temporary location using a
grass --exec command:
grass --tmp-project XY --exec m.csv.clean input=sampling_sites_raw.csv output=sampling_sites.csv
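When the cleaning step is part of a Python script that runs outside GRASS, the command above can be launched with the standard subprocess module. This is only a sketch: the helper names are hypothetical, and it assumes the grass executable is on the PATH and the m.csv.clean addon is installed.

```python
import subprocess


def build_clean_command(input_path, output_path, grass_executable="grass"):
    """Build the argument list for the grass --exec call shown above."""
    return [
        grass_executable,
        "--tmp-project",
        "XY",
        "--exec",
        "m.csv.clean",
        f"input={input_path}",
        f"output={output_path}",
    ]


def run_clean(input_path, output_path):
    """Run the command and raise CalledProcessError if it fails."""
    subprocess.run(build_clean_command(input_path, output_path), check=True)


# Usage:
# run_clean("sampling_sites_raw.csv", "sampling_sites.csv")
```

Passing the arguments as a list avoids shell quoting problems with file names that contain spaces.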
SEE ALSO
- v.in.csv, an addon module for importing CSV as vector points with coordinate transformation,
- v.in.ascii for importing CSV as vector points with a different approach,
- v.in.ogr for an alternative CSV import using GDAL/OGR.
AUTHOR
Vaclav Petras, NCSU Center for Geospatial Analytics
SOURCE CODE
Available at: m.csv.clean source code
(history)
Latest change: Friday Feb 21 10:10:05 2025 in commit 7d78fe3