Data Model

GDAL Datasets

The following code demonstrates the general workflow for reading in a dataset:

ArchGDAL.read(filename) do dataset
    # work with dataset
end

We defer the discussion on ArchGDAL.read(filename) to the section on Working with Files.

Vector Datasets

In this section, we work with the data/point.geojson dataset:

dataset = ArchGDAL.read("data/point.geojson")
GDAL Dataset (Driver: GeoJSON/GeoJSON)
File(s): 

Number of feature layers: 1
  Layer 0: point (wkbPoint)

The display indicates

  • the type of the object (GDAL Dataset)
  • the driver used to open it (shortname/longname: GeoJSON/GeoJSON)
  • the files that it corresponds to (data/point.geojson)
  • the number of layers in the dataset (1), and a brief summary of each.

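You can also retrieve them programmatically using ArchGDAL.getdriver(dataset), ArchGDAL.filelist(dataset), ArchGDAL.nlayer(dataset), and ArchGDAL.getlayer(dataset, i).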
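A minimal sketch of retrieving the same information in code. To stay self-contained it writes a small, hypothetical GeoJSON file to a temporary directory instead of relying on data/point.geojson, and it assumes the accessors getdriver, shortname, filelist, nlayer, getlayer, getname and nfeature behave as in current ArchGDAL releases:

```julia
import ArchGDAL

# Hypothetical stand-in for data/point.geojson, written to a temporary directory.
path = joinpath(mktempdir(), "point.geojson")
write(path, """
{"type": "FeatureCollection", "features": [
  {"type": "Feature", "properties": {"name": "a"},
   "geometry": {"type": "Point", "coordinates": [100.0, 0.0]}}
]}
""")

# The do-block returns the value of its last expression.
info = ArchGDAL.read(path) do dataset
    driver = ArchGDAL.getdriver(dataset)
    layer = ArchGDAL.getlayer(dataset, 0)  # layers are indexed from 0
    (
        driver = ArchGDAL.shortname(driver),
        files = ArchGDAL.filelist(dataset),
        nlayers = ArchGDAL.nlayer(dataset),
        layername = ArchGDAL.getname(layer),
        nfeatures = ArchGDAL.nfeature(layer),
    )
end

println(info)
```

Collecting plain values inside the do-block, rather than returning the layer itself, avoids holding on to objects that belong to a dataset that has already been closed.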

For more on working with features and vector data, see the section on Feature Data.

Raster Datasets

In this section, we work with the gdalworkshop/world.tif dataset:

dataset = ArchGDAL.read("gdalworkshop/world.tif")
GDAL Dataset (Driver: GTiff/GeoTIFF)
File(s): 

Dataset (width x height): 2048 x 1024 (pixels)
Number of raster bands: 3
  [GA_ReadOnly] Band 1 (Red): 2048 x 1024 (UInt8)
  [GA_ReadOnly] Band 2 (Green): 2048 x 1024 (UInt8)
  [GA_ReadOnly] Band 3 (Blue): 2048 x 1024 (UInt8)

The display indicates

  • the type of the object (GDAL Dataset)
  • the driver used to open it (shortname/longname: GTiff/GeoTIFF)
  • the files that it corresponds to (gdalworkshop/world.tif)
  • the number of raster bands in the dataset (3), and a brief summary of each.

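You can also retrieve them programmatically using ArchGDAL.getdriver(dataset), ArchGDAL.filelist(dataset), ArchGDAL.width(dataset), ArchGDAL.height(dataset), ArchGDAL.nraster(dataset), and ArchGDAL.getband(dataset, i).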
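A minimal sketch of the raster counterpart. It does not depend on gdalworkshop/world.tif: a small, hypothetical three-band GeoTIFF is created in a temporary directory first. The accessors width, height, nraster, getband and pixeltype are assumed to behave as in current ArchGDAL releases:

```julia
import ArchGDAL

# Hypothetical stand-in for gdalworkshop/world.tif: a 4 x 2 raster with 3 bands.
path = joinpath(mktempdir(), "tiny.tif")
ArchGDAL.create(path; driver = ArchGDAL.getdriver("GTiff"),
                width = 4, height = 2, nbands = 3, dtype = UInt8) do dataset
    for i in 1:3
        # Arrays are laid out as (width, height).
        ArchGDAL.write!(dataset, fill(UInt8(i), 4, 2), i)
    end
end

rasterinfo = ArchGDAL.read(path) do dataset
    band = ArchGDAL.getband(dataset, 1)  # bands are indexed from 1
    (
        driver = ArchGDAL.shortname(ArchGDAL.getdriver(dataset)),
        width = ArchGDAL.width(dataset),
        height = ArchGDAL.height(dataset),
        nbands = ArchGDAL.nraster(dataset),
        pixeltype = ArchGDAL.pixeltype(band),
    )
end

println(rasterinfo)
```

Note the differing index bases: feature layers start at 0, whereas raster bands start at 1.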

For more on working with raster data, see the section on Raster Data.

Working with Files

We provide the following methods for working with files:

  • ArchGDAL.copy: creates a copy of a dataset. This is often used with a virtual source dataset, which allows band types and other information to be configured without actually duplicating the raster data.
  • ArchGDAL.create: creates a new dataset.
  • ArchGDAL.read: opens a dataset in read-only mode.
  • ArchGDAL.update: opens a dataset with the possibility of updating it. If you open a dataset object with update access, it is not recommended to open a new dataset on the same underlying file.
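The four methods above can be sketched together as follows. The file names src.tif and dst.tif are hypothetical and live in a temporary directory. The keyword arguments shown (driver, width, height, nbands, dtype, filename) are the ones I understand ArchGDAL.create and ArchGDAL.copy to accept, so treat them as assumptions to check against your installed version:

```julia
import ArchGDAL

dir = mktempdir()
src = joinpath(dir, "src.tif")  # hypothetical source file
dst = joinpath(dir, "dst.tif")  # hypothetical copy

# create: a new 2 x 2 single-band GeoTIFF.
ArchGDAL.create(src; driver = ArchGDAL.getdriver("GTiff"),
                width = 2, height = 2, nbands = 1, dtype = UInt8) do dataset
    ArchGDAL.write!(dataset, UInt8[1 2; 3 4], 1)
end

# read + copy: open the source read-only and copy it to a second file.
ArchGDAL.read(src) do dataset
    ArchGDAL.copy(dataset; filename = dst,
                  driver = ArchGDAL.getdriver("GTiff")) do copied
        nothing  # the copy is written out when this block exits
    end
end

# update: open the copy with write access and overwrite band 1.
ArchGDAL.update(dst) do dataset
    ArchGDAL.write!(dataset, fill(UInt8(9), 2, 2), 1)
end

# Read both files back to confirm that only the copy changed.
original = ArchGDAL.read(ds -> ArchGDAL.read(ds, 1), src)
updated = ArchGDAL.read(ds -> ArchGDAL.read(ds, 1), dst)
println(original, " ", updated)
```

Each dataset is opened, used and closed inside its own block, so no two handles are ever open on the same file at once.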

In GDAL, datasets are closed by calling GDAL.close(). This results in proper cleanup and flushes any pending writes. Forgetting to call GDAL.close() on a dataset opened in update mode in a popular format like GTiff will likely leave you unable to open it afterwards.

In ArchGDAL, the closing of datasets is handled by the API and not by the user. ArchGDAL provides two methods for working with datasets.

The first is to use a do-block:

ArchGDAL.<copy/create/read/update>(...) do dataset
    # work with dataset
end

The second is to call the method directly:

dataset = ArchGDAL.<copy/create/read/update>(...)
# work with dataset

Note

This pattern of using do-blocks to manage context plays a large part in how we handle memory in this package. For details, see the section on Memory Management.
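As a minimal sketch of the two calling styles side by side, the snippet below opens the same hypothetical GeoJSON file (written to a temporary directory) first with a do-block and then directly. It assumes only ArchGDAL.read and ArchGDAL.nlayer:

```julia
import ArchGDAL

# Hypothetical input file, written to a temporary directory.
path = joinpath(mktempdir(), "point.geojson")
write(path, """
{"type": "FeatureCollection", "features": [
  {"type": "Feature", "properties": {"name": "a"},
   "geometry": {"type": "Point", "coordinates": [100.0, 0.0]}}
]}
""")

# Style 1: do-block. The dataset is closed as soon as the block exits,
# and the value of the block is returned to the caller.
n_do = ArchGDAL.read(path) do dataset
    ArchGDAL.nlayer(dataset)
end

# Style 2: direct call. The dataset stays open for as long as the
# returned object is alive; cleanup is left to ArchGDAL.
dataset = ArchGDAL.read(path)
n_direct = ArchGDAL.nlayer(dataset)

println(n_do, " ", n_direct)
```

The do-block form makes the lifetime of the dataset explicit, which is convenient in scripts that open many files. The direct form is handier for interactive exploration.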

The ArchGDAL.read method accepts keyword arguments (kwargs), such as the GDAL open options, which are useful when reading spatial datasets stored as .csv files.

Example: by default, every column in a CSV file is read as a String.

dataset1 = ArchGDAL.read("data/multi_geom.csv")
layer1 = ArchGDAL.getlayer(dataset1, 0)
Layer: multi_geom
     Field 0 (id): [OFTString], 5.1, 5.2
     Field 1 (point): [OFTString], POINT (30 10), POINT (35 15)
     Field 2 (zoom): [OFTString], 1.0, 2.0
     Field 3 (linestring): [OFTString], LINESTRING (30 10, 1..., ...
     Field 4 (location): [OFTString], Mumbai, New Delhi

This is not what we want: the CSV driver reads our point and linestring geometries as String. If you have a .csvt file of the same name that declares the geometry columns as WKT, the types will be recognized. Otherwise, GDAL offers open options to tweak how the file is read, and these are passed as kwargs.

For the CSV above, we want the driver to detect our geometries, so following the driver's open options we use the "GEOM_POSSIBLE_NAMES=point,linestring" option. We also do not want the geometry columns to be kept as regular String columns, so we add the "KEEP_GEOM_COLUMNS=NO" option too.

dataset2 = ArchGDAL.read("data/multi_geom.csv", options = ["GEOM_POSSIBLE_NAMES=point,linestring", "KEEP_GEOM_COLUMNS=NO"])

layer2 = ArchGDAL.getlayer(dataset2, 0)
Layer: multi_geom
  Geometry 0 (point): [wkbUnknown]
  Geometry 1 (linestring): [wkbUnknown]
     Field 0 (id): [OFTString], 5.1, 5.2
     Field 1 (zoom): [OFTString], 1.0, 2.0
     Field 2 (location): [OFTString], Mumbai, New Delhi
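The effect of the open options can also be checked in code. The sketch below writes a hypothetical CSV with the same columns to a temporary directory and compares the number of geometry and attribute fields with and without the options. The helper countfields is introduced here purely for illustration, and layerdefn, ngeom and nfield are assumed to behave as in current ArchGDAL releases:

```julia
import ArchGDAL

# Hypothetical stand-in for data/multi_geom.csv; WKT values are quoted
# because they contain commas.
path = joinpath(mktempdir(), "multi_geom.csv")
write(path, """
id,point,zoom,linestring,location
5.1,"POINT (30 10)",1.0,"LINESTRING (30 10, 10 30, 40 40)",Mumbai
5.2,"POINT (35 15)",2.0,"LINESTRING (35 15, 15 35, 45 45)",New Delhi
""")

# Illustrative helper (not part of ArchGDAL): opens the file with the given
# keyword arguments and counts the fields of its first layer.
function countfields(path; kwargs...)
    ArchGDAL.read(path; kwargs...) do dataset
        defn = ArchGDAL.layerdefn(ArchGDAL.getlayer(dataset, 0))
        (ngeom = ArchGDAL.ngeom(defn), nfield = ArchGDAL.nfield(defn))
    end
end

plain = countfields(path)
withopts = countfields(path; options = [
    "GEOM_POSSIBLE_NAMES=point,linestring",
    "KEEP_GEOM_COLUMNS=NO",
])

println("default:      ", plain)
println("with options: ", withopts)
```

With the options applied, the two WKT columns should move from the attribute fields to the geometry fields, matching the layer summaries shown above.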