Troubleshooting services

Allowing crash diagnostics

A parent can give a child permission to save the following data to the cloud in the event the child fails while processing a question:

  • Input values

  • Input manifest and datasets

  • Child configuration values

  • Child configuration manifest and datasets

  • Messages sent from the child to the parent

The parent can give permission on a question-by-question basis by setting allow_save_diagnostics_data_on_crash=True in Child.ask. For example:

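from octue.resources import Child

# The child to question, identified by its service ID.
child = Child(
    id="my-organisation/my-service:latest",
    backend={"name": "GCPPubSubBackend", "project_name": "my-project"},
)

# Permission applies to this question only.
answer = child.ask(
    input_values={"height": 32, "width": 3},
    allow_save_diagnostics_data_on_crash=True,
)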

For crash diagnostics to be saved, the child must have the crash_diagnostics_cloud_path field in its service configuration (octue.yaml file) set to a Google Cloud Storage path.

Accessing crash diagnostics

If a child crashes, it uploads the crash diagnostics and sends their cloud path to the parent as a log message. A user with credentials to access this path can use the octue CLI to retrieve the crash diagnostics data:

octue get-crash-diagnostics <cloud-path>

More information on the command:

octue get-crash-diagnostics -h

Usage: octue get-crash-diagnostics [OPTIONS] CLOUD_PATH

  Download crash diagnostics for an analysis from the given directory in
  Google Cloud Storage. The cloud path should end in the analysis ID.

  CLOUD_PATH: The path to the directory in Google Cloud Storage containing the
  diagnostics data.

Options:
  --local-path DIRECTORY  The path to a directory to store the directory of
                          diagnostics data in. Defaults to the current working
                          directory.
  -h, --help              Show this message and exit.
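
If the diagnostics need to be fetched from a script rather than typed at a terminal, one option is to call the CLI through Python's subprocess module. The following is a minimal sketch, not part of the octue API: the helper names build_command and download_crash_diagnostics are hypothetical, and it assumes the octue CLI is installed and on the PATH with credentials that can read the cloud path. It only uses the CLOUD_PATH argument and --local-path option shown in the help text above.

```python
import subprocess


def build_command(cloud_path, local_path=None):
    """Build the argument list for `octue get-crash-diagnostics`.

    Hypothetical helper: it only assembles the documented CLI arguments.
    """
    command = ["octue", "get-crash-diagnostics", cloud_path]

    # Only pass --local-path when a directory is given; otherwise the CLI
    # falls back to the current working directory.
    if local_path is not None:
        command.extend(["--local-path", str(local_path)])

    return command


def download_crash_diagnostics(cloud_path, local_path=None):
    """Run the CLI and raise subprocess.CalledProcessError if it exits non-zero."""
    subprocess.run(build_command(cloud_path, local_path), check=True)
```

A call would then look like download_crash_diagnostics("<cloud-path>", local_path="diagnostics"), where <cloud-path> is the path received in the parent's logs. Keeping the command construction separate from the subprocess call makes it possible to check the arguments without contacting Google Cloud Storage.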