# data_ingestion/sentinel2/preprocess_s2

Downloads and preprocesses Sentinel-2 imagery that covers the input geometry and time range. This workflow selects a minimum set of tiles that covers the input geometry, downloads Sentinel-2 imagery for the selected time range, and preprocesses it by generating a single multi-band raster at 10m resolution.

```{mermaid}
    graph TD
    inp1>user_input]
    out1>raster]
    out2>mask]
    tsk1{{list}}
    tsk2{{filter}}
    tsk3{{download}}
    tsk4{{group}}
    tsk5{{merge}}
    tsk1{{list}} -- sentinel_products/items --> tsk2{{filter}}
    tsk2{{filter}} -- filtered_items/sentinel_product --> tsk3{{download}}
    tsk3{{download}} -- raster/rasters --> tsk4{{group}}
    tsk3{{download}} -- cloud/masks --> tsk4{{group}}
    tsk4{{group}} -- raster_groups/raster_group --> tsk5{{merge}}
    tsk4{{group}} -- mask_groups/mask_group --> tsk5{{merge}}
    inp1>user_input] -- input_item --> tsk1{{list}}
    inp1>user_input] -- bounds_items --> tsk2{{filter}}
    tsk5{{merge}} -- output_raster --> out1>raster]
    tsk5{{merge}} -- output_mask --> out2>mask]
```

## Sources

- **user_input**: Time range and geometry of interest.

## Sinks

- **raster**: Sentinel-2 L2A rasters with all bands resampled to 10m resolution.

- **mask**: Cloud mask at 10m resolution from the product's quality indicators.

## Parameters

- **min_tile_cover**: Minimum RoI coverage to consider a set of tiles sufficient.

- **max_tiles_per_time**: Maximum number of tiles used to cover the RoI on each date.

- **pc_key**: Optional Planetary Computer API key.

- **dl_timeout**: Maximum time, in seconds, before a band reading operation times out.

## Tasks

- **list**: Lists Sentinel-2 products that intersect with the input geometry and time range.

- **filter**: Selects the items necessary to spatially cover the geometry of the bounds items.

- **download**: Downloads and preprocesses Sentinel-2 products.
- **group**: Groups raster files representing the same tile and moment in time that might have been partially generated and split due to the movement of Sentinel-2 through base stations.

- **merge**: Combines raster files grouped by `group_sentinel2_orbits` into a single raster.

## Workflow Yaml

```yaml
name: preprocess_s2
sources:
  user_input:
  - list.input_item
  - filter.bounds_items
sinks:
  raster: merge.output_raster
  mask: merge.output_mask
parameters:
  min_tile_cover: null
  max_tiles_per_time: null
  pc_key: null
  dl_timeout: null
tasks:
  list:
    op: list_sentinel2_products_pc
    op_dir: list_sentinel2_products
  filter:
    op: select_necessary_coverage_items
    parameters:
      min_cover: '@from(min_tile_cover)'
      max_items: '@from(max_tiles_per_time)'
  download:
    op: download_stack_sentinel2
    parameters:
      api_key: '@from(pc_key)'
      timeout_s: '@from(dl_timeout)'
  group:
    op: group_sentinel2_orbits
  merge:
    op: merge_sentinel2_orbits
edges:
- origin: list.sentinel_products
  destination:
  - filter.items
- origin: filter.filtered_items
  destination:
  - download.sentinel_product
- origin: download.raster
  destination:
  - group.rasters
- origin: download.cloud
  destination:
  - group.masks
- origin: group.raster_groups
  destination:
  - merge.raster_group
- origin: group.mask_groups
  destination:
  - merge.mask_group
description:
  short_description: Downloads and preprocesses Sentinel-2 imagery that covers the
    input geometry and time range.
  long_description: This workflow selects a minimum set of tiles that covers the input
    geometry, downloads Sentinel-2 imagery for the selected time range, and preprocesses
    it by generating a single multi-band raster at 10m resolution.
  sources:
    user_input: Time range and geometry of interest.
  sinks:
    raster: Sentinel-2 L2A rasters with all bands resampled to 10m resolution.
    mask: Cloud mask at 10m resolution from the product's quality indicators.
  parameters:
    min_tile_cover: Minimum RoI coverage to consider a set of tiles sufficient.
    max_tiles_per_time: Maximum number of tiles used to cover the RoI in each date.
    pc_key: Optional Planetary Computer API key.
    dl_timeout: Maximum time, in seconds, before a band reading operation times out.
```
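
The workflow YAML above wires workflow-level parameters into task parameters through `'@from(...)'` references and connects tasks through `edges`. The following is a minimal sketch of how those two pieces could be interpreted, using only the Python standard library. It is not the platform's own implementation: the helper names `resolve_parameters` and `task_order` are hypothetical, the dictionaries are hand-copied from the YAML rather than parsed from it, and the override values (`0.9` and `120`) are purely illustrative.

```python
import re
from graphlib import TopologicalSorter  # Python 3.9+

# Workflow-level parameter defaults, hand-copied from the YAML above.
WORKFLOW_PARAMETERS = {
    "min_tile_cover": None,
    "max_tiles_per_time": None,
    "pc_key": None,
    "dl_timeout": None,
}

# Task-level parameters that point at workflow-level ones.
TASK_PARAMETERS = {
    "filter": {
        "min_cover": "@from(min_tile_cover)",
        "max_items": "@from(max_tiles_per_time)",
    },
    "download": {
        "api_key": "@from(pc_key)",
        "timeout_s": "@from(dl_timeout)",
    },
}

# Edges as (origin, destination) pairs in "task.port" form.
EDGES = [
    ("list.sentinel_products", "filter.items"),
    ("filter.filtered_items", "download.sentinel_product"),
    ("download.raster", "group.rasters"),
    ("download.cloud", "group.masks"),
    ("group.raster_groups", "merge.raster_group"),
    ("group.mask_groups", "merge.mask_group"),
]

FROM_REF = re.compile(r"^@from\((\w+)\)$")


def resolve_parameters(task_params, workflow_params, overrides):
    """Hypothetical helper: replace '@from(name)' with the workflow-level value."""
    values = {**workflow_params, **overrides}
    resolved = {}
    for task, params in task_params.items():
        resolved[task] = {}
        for key, value in params.items():
            match = FROM_REF.match(value) if isinstance(value, str) else None
            resolved[task][key] = values[match.group(1)] if match else value
    return resolved


def task_order(edges):
    """Hypothetical helper: order tasks so each runs after the tasks feeding it."""
    sorter = TopologicalSorter()
    for origin, destination in edges:
        # Keep only the task name; the port after the dot is irrelevant for ordering.
        sorter.add(destination.split(".")[0], origin.split(".")[0])
    return list(sorter.static_order())


# Illustrative overrides only; suitable values depend on the use case.
resolved = resolve_parameters(
    TASK_PARAMETERS, WORKFLOW_PARAMETERS, {"min_tile_cover": 0.9, "dl_timeout": 120}
)
order = task_order(EDGES)
print(resolved)
print(order)
```

Because the edges form a single chain, the derived order matches the graph at the top of the page: `list`, `filter`, `download`, `group`, `merge`. Parameters that are not overridden keep their `null` default from the YAML, which appears as `None` in the resolved dictionary.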