diff --git a/modules/ROOT/nav.adoc b/modules/ROOT/nav.adoc index 9d4aa9b98..cd4d4e586 100644 --- a/modules/ROOT/nav.adoc +++ b/modules/ROOT/nav.adoc @@ -49,6 +49,7 @@ include::partial$nav-app-dev.adoc[] *** xref:creating-and-managing-a-cluster-manually.adoc[Create and Manage a Cluster Manually] *** xref:hadr-guide.adoc[High Availability and Disaster Recovery] * xref:mule-upgrade-tool.adoc[Mule Upgrade Tool] +* xref:mule-troubleshooting-plugin.adoc[Mule Troubleshooting Plugin] * xref:using-maven-with-mule.adoc[Maven Support in Mule] ** xref:mmp-concept.adoc[Mule Maven Plugin] ** xref:package-a-mule-application.adoc[Package a Mule Application] diff --git a/modules/ROOT/pages/mule-troubleshooting-plugin.adoc b/modules/ROOT/pages/mule-troubleshooting-plugin.adoc new file mode 100644 index 000000000..fc49e2ba8 --- /dev/null +++ b/modules/ROOT/pages/mule-troubleshooting-plugin.adoc @@ -0,0 +1,399 @@ += Mule Troubleshooting Plugin +ifndef::env-site,env-github[] +include:: +endif::[] + +Use the Mule Troubleshooting plugin to generate structured diagnostic information, simplify troubleshooting, and provide consistent data for Mule runtime support. + +The Mule Troubleshooting plugin provides a unified way to collect diagnostic data from Mule runtime environments. It generates a structured diagnostic archive called the Diagnostic Information Analysis File (DIAF), which consolidates Mule runtime information, application metrics, and system data into a single, standardized output. + +This Java-based plugin provides an extensible, environment-agnostic solution that simplifies troubleshooting for Mule runtime engineers, MuleSoft Support teams, customers running self-service diagnostics, and AI-assisted analysis. + +== Before You Begin + +Before using the plugin, make sure that you have the following prerequisites: + +* Mule runtime distribution starting with 4.10, with patches available for 4.6 and 4.9. +* Java 8 or later, matching the Mule runtime version requirements. 
+* Access to `$MULE_HOME`. The CLI script `diag` automatically locates the Mule home directory. + +The plugin works out-of-the-box in all deployment models (Standalone, CloudHub, CloudHub 2.0, Runtime Fabric) without installing additional dependencies. + +== Using the Mule Troubleshooting Plugin + +Run the following command from your Mule runtime installation at `$MULE_HOME/tools/diag` to generate the DIAF and a thread dump. By default, this creates a ZIP file named `mule_dump_[timestamp].zip`. +Use the `./diag --support` command to generate a heap dump. The Mule dump is saved in the `logs` directory by default, or use the `./diag --output` command to save the dump to a different path. + +The plugin's help output lists the available commands and options. + +[source,bash] +---- +➜ mule-enterprise-standalone-4.6.21-SNAPSHOT ./tools/diag help +Mule Troubleshooting Tool +========================= + +Usage: ./diag [options] [command] [command-options] + +Commands: + diaf Generate a complete Mule diagnostic dump (default) + help Show this help message + Execute a specific troubleshooting operation + +Global Options: + --stdout Output the diagnostic dump to standard output + --output Specify custom output directory or file path + --support Enable support mode (includes heap dump) + --debug Enable debug mode with remote debugging on port 5005 + +Examples: + ./diag # Generate diagnostic dump to logs directory + ./diag --stdout # Output diagnostic dump to stdout + ./diag --support # Include heap dump in diagnostic + ./diag --output /tmp/mule.zip # Save to specific file + ./diag --output /tmp/ # Save to specific directory + ./diag # Execute specific operation + +Output: + By default, the tool creates a ZIP file containing: + - mule_dump_.diaf # Diagnostic information + - thread_dump_.txt # Thread dump + - heap_dump_.hprof # Heap dump (if --support is used) + + The ZIP file is saved to the 'logs' directory by default. 
+---- + +== Understanding Diagnostic Information Analysis File (DIAF) + +The Diagnostic Information Analysis File (DIAF) groups all diagnostic data collected by the Mule Troubleshooting plugin into structured sections. Use this reference to understand the content of each section: + +* <<diaf-title>> +* <<diaf-basic-info>> +* <<diaf-statistics>> +* <<diaf-fuse-board>> +* <<diaf-event-dump>> +* <<diaf-schedulers>> + +[[diaf-title]] +=== Title + +This section shows the report generation timestamp. + +[cols="1,3", options="header"] +|=== +| Field | Description + +| Report Generation Timestamp +| The report generation time, expressed in the local time zone. +|=== + +[[diaf-basic-info]] +=== Basic Information + +This section shows details about the environment where the Mule runtime instance is running. + +[cols="1,3", options="header"] +|=== +| Field | Description + +| Mule Product/version +| The product (CE/EE), version, and build number of the Mule runtime. Formatted as `[productName] [version] (build [buildNumber])`. + +| `mule_home` +| Absolute path to `MULE_HOME` for the Mule runtime. + +| `mule_base` +| Absolute path to `MULE_BASE` for the Mule runtime. + +| `mule.*` System Properties +| All system properties starting with `mule.`, including those defined by DataWeave and API Gateway. Listed with values and sorted alphabetically. + +| Java Version +| Version of the JVM running the Mule runtime. + +| Java Vendor +| Vendor of the JVM running the Mule runtime. + +| Java VM Name +| Full name of the JVM running the Mule runtime. + +| `JAVA_HOME` +| Location of the JVM running the Mule runtime. + +| OS Name +| Name of the OS running the Mule runtime. + +| OS Version +| Version of the OS running the Mule runtime. + +| OS Arch +| Architecture of the OS (for example, `amd64`, `aarch64`). + +| Running Time +| The total time the Mule runtime has been running. + +| PID +| Process ID of the JVM running the Mule runtime. + +| Report Millis Time +| Report generation time in milliseconds since epoch (`System.currentTimeMillis`). 
+ +| Report Nano Time +| Report generation time in nanoseconds (`System.nanoTime`). + +| `memory.used` +| Amount of used memory in the JVM. + +| `memory.free` +| Amount of free memory in the JVM. + +| `memory.total` +| Total amount of memory in the JVM. + +| `memory.max` +| Maximum amount of memory the JVM attempts to use. + +| `memory.used/total` +| Percentage of used memory compared to the total allocated memory. + +| `memory.used/max` +| Percentage of used memory compared to the maximum available memory. + +| `load.process` +| Percentage of recent CPU usage for the JVM process; negative value if unavailable. + +| `load.system` +| Percentage of recent CPU usage for the whole system; negative value if unavailable. + +| `load.systemAverage` +| System load average for the last minute; negative value if unavailable. +|=== + +[[diaf-statistics]] +=== Statistics + +This section shows detailed statistics information about deployed Mule applications and their performance metrics. + +==== General Application Metrics + +[cols="1,3", options="header"] +|=== +| Field | Description + +| Events Received +| Number of events received by the application or flow. + +| Events Processed +| Number of events processed by the application or flow. + +| Messages Dispatched +| Total number of messages dispatched from message sources within the application. + +| Execution Errors +| Number of execution errors encountered. + +| Fatal Errors +| Number of fatal errors that cause the application to fail or stop processing. + +| Connection Errors +| Number of connection-related errors that occur. + +| Average Processing Time +| Average time (in milliseconds) required to process an event. + +| Min Processing Time +| Minimum time (in milliseconds) required to process an event. + +| Max Processing Time +| Maximum time (in milliseconds) required to process an event. + +| Total Processing Time +| Cumulative time (in milliseconds) spent processing all events. 
+|=== + +==== Flow Summary Statistics + +[cols="1,3", options="header"] +|=== +| Field | Description + +| Private Flows Declared +| Total number of private flows declared in the application. A private flow doesn't contain a `MessageSource` and isn't used by an APIkit router. + +| Private Flows Active +| Number of private flows that are currently in a started state. + +| Trigger Flows Declared +| Total number of trigger flows declared in the application. A trigger flow contains a MessageSource. + +| Trigger Flows Active +| Number of trigger flows currently in a started state. + +| API Kit Flows Declared +| Total number of APIkit flows declared in the application. An APIkit flow is used by an APIkit router but doesn't contain a `MessageSource`. + +| API Kit Flows Active +| Number of APIkit flows currently in a started state. +|=== + +==== Flow Statistics + +[cols="1,3", options="header"] +|=== +| Field | Description + +| Events Received +| Total number of events received by the application since it started. + +| Events Processed +| Total number of events successfully processed by the application. + +| Messages Dispatched +| Total number of messages dispatched from message sources within the application. + +| Execution Errors +| Number of execution errors that occur during event processing. + +| Fatal Errors +| Number of fatal errors that cause the application to fail or stop processing. + +| Connection Errors +| Number of connection-related errors that occur. + +| Average Processing Time +| Average time (in milliseconds) required to process an event. +|=== + +[[diaf-fuse-board]] +=== Fuse Board + +This section shows alerts for known Mule runtime issues. The report lists how many times each alert triggers during the last 1, 5, 15, and 60 minutes along with the context of the alert at the time of triggering. 
Some alerts, such as the backpressure alert, trigger multiple times with the same context, so the plugin shows the context once and indicates how many times the alert triggered to avoid flooding the report. Alerts that don't trigger in any of the time intervals aren't included in the report. + +[cols="1,3", options="header"] +|=== +| Field | Description + +| `MULE:UNKNOWN` error raised +| `MULE:UNKNOWN` errors are generated by the runtime and go unhandled. If such an error is raised or appears in the app log, it indicates a bug in the Mule runtime. The context shows the details of the errors catalogued as `MULE:UNKNOWN`. + +| Reactor discarded event +| A discarded event is one that a component explicitly filters in a flow. This stops the processing of the event, causing the execution to hang. The context shows the correlation ID of each discarded event. + +| Reactor dropped event +| A dropped event doesn't properly pass to the following component in a flow through a reactor chain and doesn't complete. The symptom is that the event hangs. No context is shown for this alert because the information is already available in the event dump. + +| Reactor dropped error +| A dropped error doesn't properly pass to the corresponding error handler in a flow through a reactor chain, and so doesn't complete. The symptom is that the event hangs when an error occurs. The context shows the string representation of each dropped error. + +| Not consumed stream +| A stream that is garbage collected before being completely consumed can cause leaks under certain conditions (the most common case is connections from a DB connection pool that remain taken until the data is fully read). The context shows the originating location of the components that generated the streams. + +| Backpressure triggered +| Backpressure is the mechanism by which incoming events in excess of current capacity are rejected. 
This happens because of a spike of incoming events or a longer than usual processing time of the flows. A common sign is backpressure triggering on systems that still have spare CPU and memory capacity. The context shows the flow or component that exceeded capacity and the reason for backpressure. + +| XA recovery start error +| Triggered when recovery of an XA connection fails to start. The context shows the unique name (including the config name) of the connection for which recovery fails. + +| Async logger ringbuffer full +| When a log appender writes logs slower than the log entries are generated, the logger ringbuffer fills up. When full, threads attempting to log either wait for space in the ringbuffer or log synchronously, depending on the configuration. In either case, a thread that shouldn't block or wait does so, causing performance issues in the Mule runtime. No context is available for this alert because it always means the same thing: the buffer is full. +|=== + +[NOTE] +-- +The report provides hints about potential issues. For details on a specific alert, query the log. This report isn't intended to replace the log for detailed analysis. +-- + +[[diaf-event-dump]] +=== Event Dump + +This section shows a hierarchical listing of in-flight events. Each event hierarchy executing through a flow in Mule has at least one entry in the report. For each child context of the event, a nested entry appears, sorted in stack order: children on top, parents at the bottom. If an event is dropped, the legend `DROPPED` appears next to it. + +[cols="1,3", options="header"] +|=== +| Field | Description + +| `eventId` +| A unique identifier for the event. For child events, it has the ID of the parent event context as a prefix. + +| `runningTime` +| How long the event has been running. For child events, this time refers to the execution of this child context. The format is `mm:ss`. 
+ +| `eventContextState` +a| +* `EXECUTING`: Event is being executed by the flow or executable component, or has finished but the response is still being processed. +* `RESPONSE_PROCESSED`: Event execution is complete and the response is handled. +* `COMPLETE`: Same as `RESPONSE_PROCESSED`, and all child events are `RESPONSE_PROCESSED`. +* `TERMINATED`: The state after `COMPLETE`, once all completion callbacks of the context have executed. + +| `flowStack` +a| `flowStack` consists of zero or more lines, each with this format: +[source,text] +---- +at [componentId]@[componentLocation]([muleFileName]:[muleFileLineNumber]) [timeInLocation] ms +---- + +| `flowStack.componentId` +| Identifier of the component (for example, `http:request`). + +| `flowStack.componentLocation` +| Unique identifier of a component within a Mule application. The first part is the flow or policy name, followed by the index and chains where the component is nested. + +| `flowStack.muleFileName` +| Name of the Mule configuration file where the component is located. + +| `flowStack.muleFileLineNumber` +| Line number in the Mule configuration file where the component is located. + +| `flowStack.timeInLocation` +| Duration in milliseconds the event spends at the `flowStack` entry. +|=== + +[[diaf-schedulers]] +=== Schedulers + +This section shows the status and metrics of each scheduler. For Mule runtime instances with multiple deployed applications, entries are grouped by application. + +[cols="1,3", options="header"] +|=== +| Field | Description + +| `schedulerName` +| Name assigned to the scheduler when it was created, indicating where in the code it was created. + +| `threadType` +a| Type of tasks the scheduler runs: + +* `IO`: A task that spends most of its execution waiting for I/O operations to complete. +* `CPU_INTENSIVE`: A task that runs longer than 10 milliseconds, with less than 20% of time blocked. +* `CPU_LIGHT`: A task that never blocks and runs in less than 10 milliseconds. 
+* `CUSTOM`: Threads that are neither managed by the Mule runtime nor shared between schedulers. Used when a thread pool needs exclusive use (for example, NIO selectors). + +| `shutdown` +| A shutdown scheduler doesn't accept new tasks. Tasks still running are allowed a grace period to complete. + +| `terminated` +| A terminated scheduler is shut down and all in-progress tasks are completed or forcefully terminated after a graceful shutdown period. + +| `activeTasks` +| Number of tasks currently being executed by the scheduler. + +| `queuedTasks` +| Number of tasks waiting in a queue. Not shown if there's no queue, the queue size can't be queried, or no tasks are queued. + +| `rejections` +| Number of tasks rejected because the scheduler is at capacity. Shows rejections in the last 1, 5, 15, and 60 minutes. If there aren't any in those intervals, this field isn't shown. + +| `throttles` +| Number of tasks throttled because the scheduler is at capacity. Shows throttles in the last 1, 5, 15, and 60 minutes. If there aren't any in those intervals, this field isn't shown. +|=== + +== Technical Considerations + +* DIAF provides investigation hints. Check the logs for complete details. +* Heap dumps may contain sensitive data. Enable `--support` only in secure environments. +* In Mule runtime instances with multiple applications, DIAF sections are grouped by application. + +== Best Practices + +* Use DIAF for initial troubleshooting before collecting heap or thread dumps manually. +* Correlate events in the event dump section with logs by using the `eventId` for deeper analysis. +* Collect scheduled diagnostics during maintenance windows in production environments. \ No newline at end of file
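The `flowStack` line format described in the Event Dump section lends itself to simple parsing, for example to spot where in-flight events spend the most time. The following Python sketch is illustrative only and isn't part of the plugin. It assumes each line follows the documented template exactly and that `timeInLocation` is a non-negative number. The names `FlowStackEntry` and `parse_flow_stack_line`, and the sample line, are hypothetical.

```python
import re
from typing import NamedTuple, Optional


class FlowStackEntry(NamedTuple):
    """One parsed flowStack line (hypothetical helper type, not a plugin API)."""
    component_id: str
    component_location: str
    mule_file_name: str
    mule_file_line_number: int
    time_in_location_ms: float


# Mirrors the documented template:
# at [componentId]@[componentLocation]([muleFileName]:[muleFileLineNumber]) [timeInLocation] ms
_FLOW_STACK_LINE = re.compile(
    r"^\s*at\s+(?P<component_id>[^@\s]+)"
    r"@(?P<component_location>.+)"
    r"\((?P<file_name>[^():]+):(?P<line_number>\d+)\)"
    r"\s+(?P<time_ms>\d+(?:\.\d+)?)\s+ms\s*$"
)


def parse_flow_stack_line(line: str) -> Optional[FlowStackEntry]:
    """Return the parsed entry, or None if the line doesn't match the template."""
    match = _FLOW_STACK_LINE.match(line)
    if match is None:
        return None
    return FlowStackEntry(
        component_id=match.group("component_id"),
        component_location=match.group("component_location"),
        mule_file_name=match.group("file_name"),
        mule_file_line_number=int(match.group("line_number")),
        time_in_location_ms=float(match.group("time_ms")),
    )


if __name__ == "__main__":
    # Made-up sample line shaped like the documented template.
    sample = "at http:request@main-flow/processors/0(app.xml:42) 1500 ms"
    print(parse_flow_stack_line(sample))
```

Lines that don't match the template return `None`, so a caller can skip unrelated content when scanning report text and keep only the `flowStack` entries.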