heatmap_sensor_admag

name: heatmap_sensor
sources:
  admag_input:
  - prescriptions.admag_input
  input_raster:
  - compute_index.raster
sinks:
  result: soil_sample_heatmap.result
parameters:
  base_url: null
  client_id: null
  client_secret: null
  authority: null
  default_scope: null
  attribute_name: null
  buffer: null
  index: null
  bins: null
  simplify: null
  tolerance: null
  data_scale: null
  distribute_output: null
  max_depth: null
  n_estimators: null
  random_state: null
tasks:
  prescriptions:
    workflow: data_ingestion/admag/prescriptions
    parameters:
      base_url: '@from(base_url)'
      client_id: '@from(client_id)'
      client_secret: '@from(client_secret)'
      authority: '@from(authority)'
      default_scope: '@from(default_scope)'
  compute_index:
    workflow: data_processing/index/index
    parameters:
      index: '@from(index)'
  soil_sample_heatmap:
    op: soil_sample_heatmap
    op_dir: heatmap_sensor
    parameters:
      attribute_name: '@from(attribute_name)'
      buffer: '@from(buffer)'
      bins: '@from(bins)'
      simplify: '@from(simplify)'
      tolerance: '@from(tolerance)'
      data_scale: '@from(data_scale)'
      distribute_output: '@from(distribute_output)'
      max_depth: '@from(max_depth)'
      n_estimators: '@from(n_estimators)'
      random_state: '@from(random_state)'
edges:
- origin: compute_index.index_raster
  destination:
  - soil_sample_heatmap.raster
- origin: prescriptions.response
  destination:
  - soil_sample_heatmap.samples
description:
  short_description: Uses Sentinel-2 satellite imagery together with sensor samples,
    which serve as labeled data containing nutrient information (Nitrogen, Carbon,
    pH, Phosphorus), to train a Random Forest classifier. The inference operation
    predicts soil nutrients for the chosen farm boundary.
  long_description: The workflow generates a heatmap for a selected nutrient. It relies
    on soil sample data that contain nutrient information. The number of samples
    determines the accuracy of the generated heatmap. During the research, tests were
    performed with samples spaced at 200 feet, 100 feet, and 50 feet. The 50-foot sample
    spacing produced results matching the ground truth. Generating the heatmap with
    this approach reduces the number of samples required. It uses the following steps
    behind the scenes to generate the heatmap. - Read the provided Sentinel raster.
    - Sensor samples need to be uploaded into the prescriptions entity in Azure Data
    Manager for Agriculture (ADMAG). ADMAG has a hierarchy that holds information about
    Farmer, Field, Seasons, Crop, Boundary, etc. Before uploading prescriptions, the
    hierarchy and a prescription_map_id must be created. All prescriptions uploaded to
    ADMAG are related to the farm hierarchy through prescription_map_id. Please refer to https://learn.microsoft.com/en-us/rest/api/data-manager-for-agri/
    for more information on ADMAG. - Compute indices using the spyndex Python package.
    - Clip the satellite imagery and sensor samples using the farm boundary. - Perform
    spatial interpolation to find raster pixels within the offset distance from each
    sample location and assign the nutrient value to that group of pixels. - Classify
    the data based on the number of bins. - Train the model using a Random Forest
    classifier. - Predict the nutrients using the satellite imagery. - Generate a
    shapefile from the predicted outputs.
  sources:
    input_raster: Input raster for index computation.
    admag_input: Required inputs to download prescriptions from ADMAG.
  sinks:
    result: Zip file containing cluster geometries.
  parameters:
    base_url: URL to access the registered app
    client_id: Value that uniquely identifies the registered application in the Microsoft
      identity platform. Visit https://learn.microsoft.com/en-us/azure/active-directory/develop/quickstart-register-app
      to register the app.
    client_secret: Sometimes called an application password, a client secret is a
      string value your app can use in place of a certificate to identify itself.
    authority: The endpoint URIs for your app are generated automatically when you
      register or configure your app. It is used by the client to obtain authorization
      from the resource owner.
    default_scope: URL for the default Azure OAuth2 permissions
    attribute_name: Nutrient property name in the sensor samples GeoJSON file, for
      example Carbon (C), Nitrogen (N), or Phosphorus (P).
    buffer: Offset distance from each sample used to perform interpolation operations
      with the raster.
    index: Type of index used to generate the heatmap, for example evi or pri.
    bins: Number of groups used to move each value to its nearest group using [numpy
      histogram](https://numpy.org/doc/stable/reference/generated/numpy.histogram.html)
      and to pre-process the data to support model training with classification.
    simplify: Replaces small polygons in the input with the value of their largest
      neighbor after converting from raster to vector. Accepts 'simplify', 'convex',
      or 'none'.
    tolerance: All parts of a [simplified geometry](https://geopandas.org/en/stable/docs/reference/api/geopandas.GeoSeries.simplify.html)
      will be no more than tolerance distance from the original. It has the same units
      as the coordinate reference system of the GeoSeries. For example, using tolerance=100
      in a projected CRS with meters as units means a distance of 100 meters in reality.
    data_scale: Accepts True or False. Default is False. When True, the data is scaled
      using [StandardScaler](https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.StandardScaler.html)
      from the scikit-learn package, which standardizes features by removing the mean
      and scaling to unit variance.
    distribute_output: Increases the output variance to avoid the output polygons
      in the shapefile being grouped into a single large polygon.
    max_depth: The maximum depth of the tree. If None, then nodes are expanded until
      all leaves are pure or until all leaves contain less than min_samples_split
      samples. For more details, refer to https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestClassifier.html
    n_estimators: The number of trees in the forest. For more details, refer to https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestClassifier.html
    random_state: Controls both the randomness of the bootstrapping of the samples
      used when building trees (if bootstrap=True) and the sampling of the features
      to consider when looking for the best split at each node (if max_features <
      n_features). For more details, refer to https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestClassifier.html
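
The `data_scale`, `max_depth`, `n_estimators`, and `random_state` parameters described above correspond to options of scikit-learn's `StandardScaler` and `RandomForestClassifier`. The sketch below illustrates how such parameters could be wired together; it is not the operation's actual code. The function name `train_and_predict`, the toy data, and the chosen parameter values are all hypothetical.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import StandardScaler


def train_and_predict(features, labels, pixels, data_scale=False,
                      n_estimators=100, max_depth=None, random_state=None):
    """Hypothetical helper: fit a Random Forest on labeled samples, predict pixels."""
    features = np.asarray(features, dtype=float)
    pixels = np.asarray(pixels, dtype=float)

    if data_scale:
        # Standardize to zero mean and unit variance, reusing the scaler
        # fitted on the training features for the pixels to predict.
        scaler = StandardScaler().fit(features)
        features = scaler.transform(features)
        pixels = scaler.transform(pixels)

    model = RandomForestClassifier(
        n_estimators=n_estimators,
        max_depth=max_depth,
        random_state=random_state,
    )
    model.fit(features, labels)
    return model.predict(pixels)


# Toy data (assumption): one index value per pixel, two nutrient classes.
train_x = [[0.10], [0.12], [0.15], [0.80], [0.85], [0.90]]
train_y = [0, 0, 0, 1, 1, 1]
predicted = train_and_predict(
    train_x, train_y, [[0.11], [0.88]],
    data_scale=True, n_estimators=10, max_depth=3, random_state=42,
)
```

Fixing `random_state` makes repeated runs reproducible, which helps when comparing heatmaps generated with different `bins` or `buffer` settings.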

graph TD
    inp1>admag_input]
    inp2>input_raster]
    out1>result]
    tsk1{{prescriptions}}
    tsk2{{compute_index}}
    tsk3{{soil_sample_heatmap}}
    tsk2{{compute_index}} -- index_raster/raster --> tsk3{{soil_sample_heatmap}}
    tsk1{{prescriptions}} -- response/samples --> tsk3{{soil_sample_heatmap}}
    inp1>admag_input] -- admag_input --> tsk1{{prescriptions}}
    inp2>input_raster] -- raster --> tsk2{{compute_index}}
    tsk3{{soil_sample_heatmap}} -- result --> out1>result]
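
The `bins` parameter groups continuous nutrient values into classes before training. As a rough illustration of that idea, the sketch below uses `numpy.histogram` to derive bin edges and then assigns each value to its bin. The helper name `bin_to_nearest_group`, the use of bin centers as the group value, and the sample values are assumptions made for this example; the operation itself may group values differently.

```python
import numpy as np


def bin_to_nearest_group(values, bins):
    """Hypothetical helper: map each value to a class label and its bin center."""
    values = np.asarray(values, dtype=float)
    # np.histogram returns the counts and bins + 1 equally spaced edges.
    _, edges = np.histogram(values, bins=bins)
    centers = (edges[:-1] + edges[1:]) / 2.0
    # Digitizing against the interior edges yields labels in 0..bins-1,
    # so the maximum value falls in the last bin.
    labels = np.digitize(values, edges[1:-1])
    return labels, centers[labels]


# Made-up nutrient readings (assumption), grouped into three classes.
samples = [0.1, 0.2, 0.35, 0.8, 0.95, 1.0]
labels, grouped = bin_to_nearest_group(samples, bins=3)
```

The integer `labels` are what a classifier would train on, while `grouped` shows the representative value each sample was moved to.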