
Frequently Asked Questions (Version 1.0.0)

General

  1. What makes STOAT different from existing environmental annotation tools?
  2. What are spatial and temporal buffers and why do they matter?
  3. What is the difference between pre-annotation and custom annotation?
  4. What types of biodiversity data can I annotate?
  5. What environmental datasets are available on STOAT?
  6. How many records can I annotate? How long will annotation take?
  7. How many jobs can I run?
  8. I have an idea for a new feature! Can I get it added to STOAT?

Methods

  1. What biodiversity data are used for pre-annotated results?
  2. What are the criteria for an environmental layer’s inclusion in STOAT? Can I request a layer to be added/use my own layer?
  3. How does STOAT implement spatial and temporal buffers?
  4. How are the upper/lower limits to permitted spatiotemporal buffering decided?
  5. How were the spatial and temporal buffers selected for pre-annotations?
  6. How does STOAT efficiently carry out annotations?
  7. How is the histogram calculated?

Troubleshooting

  1. I’m getting errors from the rstoat package
  2. I’m having trouble uploading my dataset
  3. I can’t find my dataset in the dropdown
  4. My annotated occurrence records are out of sequence
  5. My annotation outputs are shorter than my original dataset
  6. I’m getting “no scenes found” for certain products
  7. I’m not getting notification emails about my jobs
  8. My question is not listed/my issue is still not resolved

General

What makes STOAT different from existing environmental annotation tools?

STOAT provides customizable spatial and temporal buffers, allowing more versatile and scale-explicit characterizations of species occurrences than previous tools have offered. STOAT additionally provides access to multiple high-resolution datasets, including global Landsat data at 30 m resolution and MODIS data at daily temporal resolution. Finally, STOAT conducts all of its computations in the cloud using a highly optimized annotation workflow, so annotations are returned quickly and users do not need to download or interact with complex remote sensing or other environmental layers.

What are spatial and temporal buffers and why do they matter?

Biodiversity data (and environmental data) have particular scales (grain sizes, uncertainties) of observation, associated with the form of data collection. Due to the scale-dependent nature of many ecological patterns, these differences in observational scales can make comparisons across data sources difficult. Spatial and temporal buffers can be employed to alter the effective grain size of biodiversity data. Instead of retrieving only an environmental value associated with an exact coordinate, use of buffers allows a user to retrieve values neighboring in space and time. These can be averaged with the central point to effectively coarsen data grain. Buffers can also be used to account for locational uncertainty in occurrences, or match the scales of observational data with those of ecological processes. As such, they are a versatile tool that can be applied to various scale issues in the data fusion process.

What is the difference between pre-annotation and custom annotation?

Pre-annotations are annotations that have been completed by the STOAT team using publicly available species occurrence data (GBIF, eBird, Movebank). The results of these annotations are stored in STOAT for rapid visualization (and, eventually, download) by users. Custom annotations allow users to annotate any uploaded dataset with user-specified layers and buffering parameters, though users have to wait for job completion. Unless otherwise specified, all references to annotation refer to the custom annotator.

What types of biodiversity data can I annotate?

STOAT currently only supports the annotation of occurrence records (species observations with a latitude, longitude, and date). STOAT aims to eventually support annotation for all types of biodiversity data. Support for geographic regions (e.g. surveys and inventories) will be added in the near future.

What environmental datasets are available on STOAT?

Please see https://mol.org/stoat-dev/sources for a comprehensive list.

How many records can I annotate? How long will annotation take?

Currently, uploads are capped at 10,000 records, and annotation jobs are capped at 1 hour of compute time. Computationally intensive requests may not finish within the allotted time (see “My annotation outputs are shorter than my original dataset” under Troubleshooting). Up to 5 environmental layers can be annotated in a job. Static layers compute quickly, but annotating multiple dynamic layers can push a job over the compute time limit.

How many jobs can I run?

At present, users can run two jobs in parallel, though more can be queued.

I have an idea for a new feature! Can I get it added to STOAT?

We welcome feedback and suggestions from users: please contact the STOAT team!

Methods

What biodiversity data are used for pre-annotated results?

Pre-annotations are conducted using GBIF, eBird, and public Movebank GPS tracking records.

What are the criteria for an environmental layer’s inclusion in STOAT? Can I request a layer to be added/use my own layer?

Any gridded layer is compatible with STOAT integration. The current list of layers is curated by the STOAT team and represents a range of environmental products that are broadly used across ecology. We welcome suggestions for further layers; please contact us. As adding a new layer represents a significant time investment by our developers, suggested layers should be broadly useful. At present, we cannot support the addition of personal layers or layers with highly specialized uses.

How does STOAT implement spatial and temporal buffers?

STOAT implements spatial buffers as a radius around the coordinate of interest, within which all environmental values are averaged. Pixels that are partially within the buffered area are weighted by their proportion of overlap. More sophisticated aggregation methods (such as inverse distance weighting to the central point) are in the works. STOAT implements temporal buffers by retrieving layers from days prior to the record and averaging across values.
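As an illustration of the two averaging steps described above, here is a minimal Python sketch. It is not STOAT's implementation: the function names are hypothetical, pixel overlap fractions are assumed to be computed elsewhere, and the temporal window is assumed to cover the record date plus the preceding days.

```python
from datetime import date, timedelta


def spatial_buffer_mean(pixels):
    """Overlap-weighted mean of pixel values inside a circular buffer.

    `pixels` is a list of (value, overlap) pairs, where overlap is the
    fraction (0 to 1) of the pixel's area that falls inside the buffer.
    """
    total_weight = sum(overlap for _, overlap in pixels)
    if total_weight == 0:
        return None  # no pixel intersects the buffer
    return sum(value * overlap for value, overlap in pixels) / total_weight


def temporal_buffer_mean(daily_values, record_date, buffer_days):
    """Mean over the record date and the days before it.

    Assumption: a buffer of N days covers the record date plus the N - 1
    preceding days; days with no value are skipped.
    """
    window = [record_date - timedelta(days=offset) for offset in range(buffer_days)]
    found = [daily_values[day] for day in window if day in daily_values]
    return sum(found) / len(found) if found else None
```

A pixel that lies fully inside the buffer contributes with weight 1, while a pixel that is half inside contributes with weight 0.5.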

How are the upper/lower limits to permitted spatiotemporal buffering decided?

Lower limits are based on the finest possible resolution for each product. Upper limits were selected in two ways: 1) availability of the same variable at a coarser resolution from another product. For example, MODIS EVI is available at 250 m resolution, leaving little reason to aggregate Landsat EVI to more than 250 m. 2) computational limitations of the data fusion workflow, which continue to evolve.

How were the spatial and temporal buffers selected for pre-annotations?

Pre-annotations were always conducted for the minimum buffer combinations that guaranteed data (e.g. 1-day for MODIS, 16-day for Landsat). Further pre-annotations were conducted with larger buffers that were deemed broadly useful (e.g. 30-day, 90-day) and which were within computational limits.

How does STOAT efficiently carry out annotations?

When datasets are large enough (e.g. pre-annotations), STOAT uses a clustering algorithm that groups records in space and time to minimize the number of times environmental layers must be retrieved. STOAT’s data partners (Descartes Labs and Google Earth Engine) provide access to high performance computing clusters and locally-hosted environmental layers for maximum throughput.
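The idea of grouping records so that each environmental layer is fetched once per group can be sketched in Python as follows. This uses simple fixed-size space-time binning as a stand-in; it is not STOAT's clustering algorithm, and the function name, cell size, and window length are all hypothetical.

```python
import math
from collections import defaultdict
from datetime import date


def group_records(records, cell_deg=1.0, window_days=16):
    """Group (latitude, longitude, date) records into space-time bins.

    Records sharing a bin could be annotated from a single retrieval of
    the environmental layer covering that cell and time window.
    """
    groups = defaultdict(list)
    for lat, lon, day in records:
        key = (
            math.floor(lat / cell_deg),        # latitude cell index
            math.floor(lon / cell_deg),        # longitude cell index
            day.toordinal() // window_days,    # time window index
        )
        groups[key].append((lat, lon, day))
    return dict(groups)
```

With this binning, the number of layer retrievals scales with the number of groups rather than the number of records.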

How is the histogram calculated?

STOAT compares the distribution of annotated biodiversity records for a specific environmental variable between a species and that species' family. For each annotated value, we compute the counts and normalize them for a given event. On the histogram plot, orange represents the species' family, blue represents the species, and grey represents the values where the two overlap.
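A rough Python sketch of a normalized species-versus-family comparison is given below. The bin edges, the normalization to proportions summing to 1, and the use of the per-bin minimum as the overlap are assumptions made for illustration, not a description of STOAT's exact calculation.

```python
def normalized_histogram(values, edges):
    """Count values per bin and normalize the counts to proportions."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        for i in range(len(counts)):
            is_last = i == len(counts) - 1
            # Bins are half-open [low, high); the last bin includes its upper edge.
            if edges[i] <= v < edges[i + 1] or (is_last and v == edges[-1]):
                counts[i] += 1
                break
    total = sum(counts)
    return [c / total if total else 0.0 for c in counts]


def histogram_overlap(species_hist, family_hist):
    """Per-bin overlap: the part of each bar shared by both histograms."""
    return [min(s, f) for s, f in zip(species_hist, family_hist)]
```

Normalizing both histograms makes a species with few records comparable to a family with many.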

Troubleshooting

I’m getting errors from the rstoat package

Please first read the R documentation for the function causing the error and check that inputs are as specified. If you are still having problems, direct issues to https://github.com/MapofLife/rstoat/issues.

I’m having trouble uploading my dataset

Please check that the column format matches that required by the Map of Life uploader; a sample dataset can be found at this link. Also note that there is a dataset size limit of 10,000 records. Datasets larger than this cap cannot be annotated automatically and should either be broken up or annotated with assistance from the STOAT team.
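One way to break up a larger dataset before upload is sketched below in Python. The function name is hypothetical, and the records are assumed to be already loaded into a list (for example, rows read from a CSV file).

```python
def split_records(records, max_records=10_000):
    """Split a list of records into consecutive chunks of at most max_records."""
    return [
        records[start:start + max_records]
        for start in range(0, len(records), max_records)
    ]
```

Each chunk can then be written to its own file and uploaded as a separate dataset.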

I can’t find my dataset in the dropdown

Note that currently, only occurrence datasets can be annotated via STOAT, though inventory datasets are accepted by the uploader. If an occurrence dataset has been successfully uploaded but cannot be annotated, contact the STOAT team.

My annotated occurrence records are out of sequence

Records are broken up into groups during the annotation process for efficiency, mixing up ordering. If order is important, a user should be able to merge annotated values back into their ordered dataset using the associated spatiotemporal coordinates.
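The merge can be sketched in Python as a lookup keyed on the spatiotemporal coordinates. The field names (`latitude`, `longitude`, `date`, `value`) are hypothetical and should be replaced with the column names in your own files.

```python
def merge_annotations(original, annotated, value_field="value"):
    """Attach annotated values to the original records, keeping their order.

    Both inputs are lists of dicts. Records are matched on the exact
    (latitude, longitude, date) triple, so coordinates should be compared
    in the same form they were uploaded in (e.g. as unmodified strings).
    """
    lookup = {
        (row["latitude"], row["longitude"], row["date"]): row[value_field]
        for row in annotated
    }
    merged = []
    for row in original:
        key = (row["latitude"], row["longitude"], row["date"])
        # Records with no annotated match receive None.
        merged.append({**row, value_field: lookup.get(key)})
    return merged
```

Because the lookup is keyed on coordinates, repeated occurrences in the original dataset all receive the same annotated value.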

My annotation outputs are shorter than my original dataset

There are two possible causes. First, STOAT collapses datasets to only their unique occurrences for efficient annotation, so the list of unique events returned may be shorter than the input dataset. Second, STOAT has a hard time limit for jobs (1 hour at present) to allow for more concurrent users. Computationally intensive jobs (such as annotating against dynamic layers using large spatial and temporal buffers) may exceed the time limit and will return whatever values have been generated up to that point. If a timeout occurs, output CSVs are labelled as having timed out in their name. To improve computation times, users can try annotating against fewer layers in a single job or decreasing buffer sizes.
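To tell deduplication apart from a timeout, it can help to count the unique events in the input and compare that number with the output length. The Python sketch below assumes hypothetical field names (`latitude`, `longitude`, `date`) and may not match exactly how STOAT defines a unique occurrence.

```python
def unique_events(records):
    """Return the unique (latitude, longitude, date) triples in first-seen order."""
    keys = ((r["latitude"], r["longitude"], r["date"]) for r in records)
    # dict.fromkeys drops repeats while preserving insertion order.
    return list(dict.fromkeys(keys))
```

If the output has as many rows as there are unique events, nothing was lost; a shorter output suggests a timeout.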

I’m getting “no scenes found” for certain products

Please check that the spatiotemporal coordinate for annotation is within the temporal range of the product, which can be viewed at https://mol.org/stoat-dev/sources. For annotations against SRTM, please also check against the spatial extent of the product (valid 60°N to 56°S); SRTM is the only non-global layer we provide at this time. Please also note that Landsat annotations have a sampling interval of 16 days. If the temporal buffer is not large enough to span the sampling interval, there is a chance that scenes may not exist for the selected date.
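The checks above can be run on a record before submitting a job, as in this Python sketch. The function name and the product labels `"srtm"` and `"landsat"` are hypothetical; the SRTM extent and the 16-day Landsat interval are the values stated in this FAQ, and product start and end dates must be taken from the sources page.

```python
from datetime import date

SRTM_NORTH, SRTM_SOUTH = 60.0, -56.0  # spatial extent stated in this FAQ
LANDSAT_INTERVAL_DAYS = 16            # sampling interval stated in this FAQ


def scene_warnings(product, latitude, day, buffer_days, start=None, end=None):
    """List likely reasons a record could return "no scenes found".

    `start` and `end` are the product's temporal range; the date check is
    skipped when they are not supplied.
    """
    warnings = []
    if start is not None and end is not None and not (start <= day <= end):
        warnings.append("date outside product temporal range")
    if product == "srtm" and not (SRTM_SOUTH <= latitude <= SRTM_NORTH):
        warnings.append("latitude outside SRTM extent")
    if product == "landsat" and buffer_days < LANDSAT_INTERVAL_DAYS:
        warnings.append("temporal buffer shorter than Landsat sampling interval")
    return warnings
```

An empty list means none of these three conditions applies; it does not guarantee that a scene exists.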

I’m not getting notification emails about my jobs

Please check your spam folder. An email is sent both upon submission and completion of an annotation job, to the address linked to your Map of Life account.

My question is not listed/my issue is still not resolved

Please direct issues with the rstoat package to https://github.com/MapofLife/rstoat/issues. For issues with the STOAT web application, please contact the STOAT team.