# data_ingestion/sentinel2/improve_cloud_mask

Improves cloud masks by merging the product cloud mask with cloud and shadow masks computed by machine learning segmentation models. The workflow computes cloud and shadow probabilities with the segmentation models, thresholds them using the cloud_thr and shadow_thr parameters, and merges the resulting masks with the product mask.

```{mermaid}
    graph TD
    inp1>s2_raster]
    inp2>product_mask]
    out1>mask]
    tsk1{{cloud}}
    tsk2{{shadow}}
    tsk3{{merge}}
    tsk1{{cloud}} -- cloud_probability --> tsk3{{merge}}
    tsk2{{shadow}} -- shadow_probability --> tsk3{{merge}}
    inp1>s2_raster] -- sentinel_raster --> tsk1{{cloud}}
    inp1>s2_raster] -- sentinel_raster --> tsk2{{shadow}}
    inp2>product_mask] -- product_mask --> tsk3{{merge}}
    tsk3{{merge}} -- merged_cloud_mask --> out1>mask]
```

## Sources

- **s2_raster**: Sentinel-2 L2A raster.

- **product_mask**: Cloud mask obtained from the product's quality indicators.

## Sinks

- **mask**: Improved cloud mask.

## Parameters

- **cloud_thr**: Confidence threshold for classifying a pixel as cloud.

- **shadow_thr**: Confidence threshold for classifying a pixel as shadow.

- **in_memory**: Whether to load the whole raster into memory when running predictions. This uses more memory (~4 GB per worker) but speeds up inference for fast models.

- **cloud_model**: ONNX file for the cloud model. Available models are named 'cloud_model{idx}_cpu.onnx'. Models with idx ∈ {1, 2} are FPN-based, which are more accurate but slower; models with idx ∈ {3, 4, 5} are cheaplab models, which are less accurate but faster.

- **shadow_model**: ONNX file for the shadow model. 'shadow.onnx' is currently the only available model.
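
As a rough illustration of how these parameters fit together, the sketch below builds a parameter dictionary for the workflow. The helper names (`cloud_model_filename`, `cloud_model_family`) are hypothetical and the threshold values are placeholders, not documented defaults; only the file-name pattern and the idx groupings come from the descriptions above.

```python
# Hypothetical helpers: only the file-name pattern and the idx groupings
# are taken from the parameter descriptions.
FPN_IDS = {1, 2}          # FPN-based: more accurate, slower
CHEAPLAB_IDS = {3, 4, 5}  # cheaplab: less accurate, faster


def cloud_model_filename(idx: int) -> str:
    """Return the ONNX file name for the cloud model with index `idx`."""
    if idx not in FPN_IDS | CHEAPLAB_IDS:
        raise ValueError(f"unknown cloud model index: {idx}")
    return f"cloud_model{idx}_cpu.onnx"


def cloud_model_family(idx: int) -> str:
    """Return the model family ('FPN' or 'cheaplab') for index `idx`."""
    if idx in FPN_IDS:
        return "FPN"
    if idx in CHEAPLAB_IDS:
        return "cheaplab"
    raise ValueError(f"unknown cloud model index: {idx}")


# Example overrides. The threshold values are illustrative assumptions.
parameters = {
    "cloud_thr": 0.7,
    "shadow_thr": 0.7,
    "in_memory": True,  # trades memory for speed with fast models
    "cloud_model": cloud_model_filename(3),
    "shadow_model": "shadow.onnx",
}
```

Pairing `in_memory` with a cheaplab model follows the note that in-memory loading mainly benefits fast models.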

## Tasks

- **cloud**: Computes cloud probabilities using a convolutional segmentation model for Sentinel-2 L2A rasters.

- **shadow**: Computes shadow probabilities using a convolutional segmentation model for Sentinel-2 L2A rasters.

- **merge**: Merges cloud, shadow and product cloud masks into a single mask.
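
The threshold-and-merge step can be pictured with a small pure-Python sketch. It assumes that a pixel is flagged when the product mask flags it or when either probability strictly exceeds its threshold; the actual merge_cloud_masks_simple op may use a different comparison, mask polarity, or combination rule. The function name `merge_masks` is hypothetical.

```python
from typing import List, Sequence


def merge_masks(
    cloud_prob: Sequence[Sequence[float]],
    shadow_prob: Sequence[Sequence[float]],
    product_mask: Sequence[Sequence[int]],
    cloud_thr: float,
    shadow_thr: float,
) -> List[List[bool]]:
    """Flag a pixel if any of the three sources flags it (assumed rule)."""
    merged = []
    for cloud_row, shadow_row, product_row in zip(cloud_prob, shadow_prob, product_mask):
        merged.append([
            # Union of product mask, thresholded cloud, thresholded shadow.
            bool(p) or c > cloud_thr or s > shadow_thr
            for c, s, p in zip(cloud_row, shadow_row, product_row)
        ])
    return merged


# Toy 2x2 inputs, made up for illustration.
cloud_prob = [[0.9, 0.2], [0.1, 0.4]]
shadow_prob = [[0.1, 0.8], [0.2, 0.3]]
product_mask = [[0, 0], [1, 0]]

merged = merge_masks(cloud_prob, shadow_prob, product_mask, 0.5, 0.5)
print(merged)  # [[True, True], [True, False]]
```

In the toy input, each of the three flagged pixels is caught by a different source (cloud model, shadow model, product mask), and only the pixel below both thresholds and unflagged by the product stays clear.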

## Workflow YAML

```yaml
name: improve_cloud_mask
sources:
  s2_raster:
  - cloud.sentinel_raster
  - shadow.sentinel_raster
  product_mask:
  - merge.product_mask
sinks:
  mask: merge.merged_cloud_mask
parameters:
  cloud_thr: null
  shadow_thr: null
  in_memory: null
  cloud_model: null
  shadow_model: null
tasks:
  cloud:
    op: compute_cloud_prob
    parameters:
      in_memory: '@from(in_memory)'
      model_path: '@from(cloud_model)'
  shadow:
    op: compute_shadow_prob
    parameters:
      in_memory: '@from(in_memory)'
      model_path: '@from(shadow_model)'
  merge:
    op: merge_cloud_masks_simple
    op_dir: merge_cloud_masks
    parameters:
      cloud_prob_threshold: '@from(cloud_thr)'
      shadow_prob_threshold: '@from(shadow_thr)'
edges:
- origin: cloud.cloud_probability
  destination:
  - merge.cloud_probability
- origin: shadow.shadow_probability
  destination:
  - merge.shadow_probability
description:
  short_description: Improves cloud masks by merging the product cloud mask with cloud
    and shadow masks computed by machine learning segmentation models.
  long_description: This workflow computes cloud and shadow probabilities using segmentation
    models, thresholds them, and merges the models' masks with the product mask.
  sources:
    s2_raster: Sentinel-2 L2A raster.
    product_mask: Cloud mask obtained from the product's quality indicators.
  sinks:
    mask: Improved cloud mask.
  parameters:
    cloud_thr: Confidence threshold to assign a pixel as cloud.
    shadow_thr: Confidence threshold to assign a pixel as shadow.
    in_memory: Whether to load the whole raster in memory when running predictions.
      Uses more memory (~4GB/worker) but speeds up inference for fast models.
    cloud_model: "ONNX file for the cloud model. Available models are 'cloud_model{idx}_cpu.onnx'\
      \ with idx \u2208 {1, 2} being FPN-based models, which are more accurate but\
      \ slower, and idx \u2208 {3, 4, 5} being cheaplab models, which are less accurate\
      \ but faster."
    shadow_model: ONNX file for the shadow model. 'shadow.onnx' is the only currently
      available model.
```
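
The `@from(...)` strings in the task parameters forward workflow-level parameters to individual tasks. The sketch below shows one way to read those references; `resolve_from` is a hypothetical helper written for illustration, not the workflow engine's actual resolution code, and the parameter values are placeholders.

```python
import re
from typing import Any, Dict

# Matches strings of the form '@from(name)'.
FROM_PATTERN = re.compile(r"^@from\((\w+)\)$")


def resolve_from(
    task_params: Dict[str, Any], workflow_params: Dict[str, Any]
) -> Dict[str, Any]:
    """Replace '@from(name)' values with the workflow parameter `name`."""
    resolved = {}
    for key, value in task_params.items():
        match = FROM_PATTERN.match(value) if isinstance(value, str) else None
        resolved[key] = workflow_params[match.group(1)] if match else value
    return resolved


# Placeholder values; the workflow itself declares these as null.
workflow_params = {
    "cloud_thr": 0.7,
    "shadow_thr": 0.7,
    "in_memory": True,
    "cloud_model": "cloud_model1_cpu.onnx",
    "shadow_model": "shadow.onnx",
}

# Task parameter blocks copied from the YAML above.
cloud_task = {"in_memory": "@from(in_memory)", "model_path": "@from(cloud_model)"}
merge_task = {
    "cloud_prob_threshold": "@from(cloud_thr)",
    "shadow_prob_threshold": "@from(shadow_thr)",
}

cloud_resolved = resolve_from(cloud_task, workflow_params)
merge_resolved = resolve_from(merge_task, workflow_params)
```

Read this way, `cloud_thr` and `shadow_thr` reach only the merge task, while `in_memory` is shared by both the cloud and shadow tasks.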