Raster Data

In this section, we revisit the gdalworkshop/world.tif dataset.

dataset = AG.read("gdalworkshop/world.tif")
GDAL Dataset (Driver: GTiff/GeoTIFF)
File(s): 

Dataset (width x height): 2048 x 1024 (pixels)
Number of raster bands: 3
  [GA_ReadOnly] Band 1 (Red): 2048 x 1024 (UInt8)
  [GA_ReadOnly] Band 2 (Green): 2048 x 1024 (UInt8)
  [GA_ReadOnly] Band 3 (Blue): 2048 x 1024 (UInt8)

A description of the display is available in Raster Datasets.

Raster Bands

We can examine an individual raster band

band = ArchGDAL.getband(dataset, 1)
[GA_ReadOnly] Band 1 (Red): 2048 x 1024 (UInt8)
    blocksize: 256×256, nodata: nothing, units: 1.0px + 0.0
    overviews: (0) 1024x512 (1) 512x256 (2) 256x128 
               (3) 128x64 (4) 64x32 (5) 32x16 
               (6) 16x8 

You can programmatically retrieve the information in the header using

You can get additional attribute information using

Note

GDAL contains a concept of the natural block size of rasters so that applications can organized data access efficiently for some file formats. The natural block size is the block size that is most efficient for accessing the format. For many formats this is simple a whole scanline. However, for tiled images this will typically be the tile size.

Finally, you can obtain overviews:

ArchGDAL.getoverview(band, 2)
[GA_ReadOnly] Band 1 (Red): 256 x 128 (UInt8)
    blocksize: 128×128, nodata: nothing, units: 1.0px + 0.0
    overviews: 

Raster I/O

Reading Raster Values

The general operative method for reading in raster values from a dataset or band is to use ArchGDAL.read().

  • ArchGDAL.read(dataset): reads the entire dataset as a single multidimensional array.
  • ArchGDAL.read(dataset, indices): reads the raster bands at the indices (in that order) into a multidimensional array.
  • ArchGDAL.read(dataset, i): reads the i-th raster band into an array.
  • ArchGDAL.read(band): reads the raster band into an array.
Note

The array returned by read has (cols, rows, bands) dimensions.

To convert to a format used by the Images.jl ecosystem, you can either create a view using PermutedDimsArray(A, (3,2,1)) or create a permuted copy using permutedims(A, (3,2,1)). The resulting arrays will have (bands, rows, cols) dimensions.

You can also specify the subset of rows and columns (provided as UnitRanges like 2:9) to read:

  • ArchGDAL.read(dataset, indices, rows, cols)
  • ArchGDAL.read(dataset, i, rows, cols)
  • ArchGDAL.read(band, rows, cols)

On other occasions, it might be easier to first specify a position (xoffset,yoffset) to read from, and the size (xsize, ysize) of the window to read:

  • ArchGDAL.read(dataset, indices, xoffset, yoffset, xsize, ysize)
  • ArchGDAL.read(dataset, i, xoffset, yoffset, xsize, ysize)
  • ArchGDAL.read(band, xoffset, yoffset, xsize, ysize)

You might have an existing buffer that you wish to read the values into. In such cases, the general API for doing so is to write ArchGDAL.read!(source, buffer, args...) instead of ArchGDAL.read(source, args...).

Writing Raster Values

For writing values from a buffer to a raster dataset or band, the following methods are available:

  • ArchGDAL.write!(band, buffer)
  • ArchGDAL.write!(band, buffer, rows, cols)
  • ArchGDAL.write!(band, buffer, xoffset, yoffset, xsize, ysize)
  • ArchGDAL.write!(dataset, buffer, i)
  • ArchGDAL.write!(dataset, buffer, i, rows, cols)
  • ArchGDAL.write!(dataset, buffer, i, xoffset, yoffset, xsize, ysize)
  • ArchGDAL.write!(dataset, buffer, indices)
  • ArchGDAL.write!(dataset, buffer, indices, rows, cols)
  • ArchGDAL.write!(dataset, buffer, indices, xoffset, yoffset, xsize, ysize)
Note

ArchGDAL expects the dimensions of the buffer to be (cols, rows, bands) or (cols, rows).

Windowed Reads and Writes

Following the description in mapbox/rasterio's documentation, a window is a view onto a rectangular subset of a raster dataset. This is useful when you want to work on rasters that are larger than your computers RAM or process chunks of large rasters in parallel.

For that purpose, we have a method called ArchGDAL.windows(band) which iterates over the windows of a raster band, returning the indices corresponding to the rasterblocks within that raster band for efficiency:

using Base.Iterators: take  # to prevent showing all blocks
windows = ArchGDAL.windows(band)

for (cols, rows) in take(windows, 5)
    @info "Window" cols rows
end
┌ Info: Window
│   cols = 1:256
└   rows = 1:256
┌ Info: Window
│   cols = 1:256
└   rows = 257:512
┌ Info: Window
│   cols = 1:256
└   rows = 513:768
┌ Info: Window
│   cols = 1:256
└   rows = 769:1024
┌ Info: Window
│   cols = 257:512
└   rows = 1:256

Alternatively, we have another method called ArchGDAL.blocks(band) which iterates over the windows of a raster band, returning the offset and size corresponding to the rasterblocks within that raster band for efficiency:

blocks = ArchGDAL.blocks(band)
for (xyoffset, xysize) in take(blocks, 5)
    @info "Window offset" xyoffset xysize
end
┌ Info: Window offset
│   xyoffset = (0, 0)
└   xysize = (256, 256)
┌ Info: Window offset
│   xyoffset = (1, 0)
└   xysize = (256, 256)
┌ Info: Window offset
│   xyoffset = (2, 0)
└   xysize = (256, 256)
┌ Info: Window offset
│   xyoffset = (3, 0)
└   xysize = (256, 256)
┌ Info: Window offset
│   xyoffset = (0, 1)
└   xysize = (256, 256)
Note

These methods are often used for reading/writing a block of image data efficiently, as it accesses "natural" blocks from the raster band without resampling, or data type conversion.

Using the DiskArray interface

Raster bands as 2D DiskArrays

As of ArchGDAL version 0.5.0 and higher a RasterBand is a subtype of AbstractDiskArray from the DiskArrays.jl package. This means that a RasterBand is also an AbstractArray and can therefore be treated like any Julia array. This means that square bracket indexing works in addition to the read methods described above.

band[1000:1010,300:310]
11×11 Matrix{UInt8}:
 0x87  0x8a  0x7b  0x5f  0x32  0x08  0x11  0x11  0x0d  0x0e  0x09
 0x8c  0x96  0x70  0x70  0x32  0x10  0x0e  0x0e  0x0a  0x11  0x0a
 0x88  0xa5  0x87  0x6f  0x25  0x10  0x0f  0x10  0x0a  0x10  0x07
 0x80  0x9c  0x87  0x60  0x1c  0x16  0x10  0x0b  0x03  0x0c  0x0f
 0x7a  0x7e  0x6e  0x57  0x2d  0x12  0x0f  0x0c  0x05  0x0e  0x0f
 0x80  0x6e  0x5e  0x5b  0x3d  0x0a  0x0e  0x13  0x0f  0x13  0x07
 0x8f  0x70  0x58  0x5a  0x26  0x0f  0x10  0x11  0x0a  0x11  0x0f
 0x96  0x73  0x56  0x63  0x1b  0x13  0x10  0x0d  0x03  0x0e  0x18
 0x91  0x71  0x5a  0x80  0x35  0x0b  0x0e  0x13  0x0b  0x10  0x0f
 0x87  0x69  0x67  0x84  0x61  0x00  0x0f  0x0c  0x04  0x0c  0x11
 0x88  0x6a  0x72  0x70  0x1a  0x0e  0x0f  0x0c  0x09  0x06  0x06

Also, windowed reading of the data can alternatively be done through the DiskArrays interface:

using DiskArrays: eachchunk
# take only the first 3 chunks to decrease output
using Base.Iterators: take
for (rows, cols) in take(eachchunk(band), 3)
    @info "window" rows cols
end
┌ Info: window
│   rows = CartesianIndex(1, 1)
└   cols = CartesianIndex(2, 1)
┌ Info: window
│   rows = CartesianIndex(257, 1)
└   cols = CartesianIndex(258, 1)
┌ Info: window
│   rows = CartesianIndex(513, 1)
└   cols = CartesianIndex(514, 1)

This code is similar to the window function mentioned in Windowed Reads and Writes but more portable because the raster band can be exchanged with any other type implementing the DiskArrays interface. Also, for many operations it will not be necessary anymore to implement the window loop, since the DiskArrays package provides efficient implementations for reductions and lazy broadcasting, so that for example operations like:

sum(sqrt.(band), dims=1)
1×1024 Matrix{Float64}:
 7094.48  7094.48  7094.48  7094.48  …  30197.9  30368.0  30295.8  29723.7

will read the data block by block allocating only the amount of memory in the order of the size of a single raster block. See the DiskArrays README for more information on DiskArrays.jl

The RasterDataset type

Many raster datasets that contain multiple bands of the same size and data type can also be abstracted as a 3D array where the last dimension represents the band index. In order to open a raster dataset in a way that it is represented as a 3D AbstractArray there is the readraster funtion. It returns a RasterDataset which is a thin wrapper around a Dataset but it is a subtype of AbstractDiskArray{T,3} and therefore part of the array hierarchy.

This means that data can be accessed with the square-bracket syntax

dataset = AG.readraster("gdalworkshop/world.tif")
dataset[1000,300,:]
3-element Vector{UInt8}:
 0x87
 0x78
 0x41

and broadcasting, views and reductions are provided by the DiskArrays package. In addition, many ArchGDAL functions like (getband, nraster, getgeotransform, etc) are delegated to the wrapped Dataset and work for RasterDatasets as well.