8 changes: 4 additions & 4 deletions jenkins/L0_MergeRequest.groovy
@@ -1039,7 +1039,7 @@ def launchStages(pipeline, reuseBuild, testFilter, enableFailFast, globalVars)
'wheelDockerImagePy312': globalVars["LLM_ROCKYLINUX8_PY312_DOCKER_IMAGE"],
]

launchJob("L0_Test-x86_64-Single-GPU", false, enableFailFast, globalVars, "x86_64", additionalParameters)
launchJob("L0_Test-x86_64-Single-GPU", reuseBuild, enableFailFast, globalVars, "x86_64", additionalParameters)
} catch (InterruptedException e) {
throw e
} catch (Exception e) {
@@ -1095,7 +1095,7 @@ def launchStages(pipeline, reuseBuild, testFilter, enableFailFast, globalVars)
'wheelDockerImagePy312': globalVars["LLM_ROCKYLINUX8_PY312_DOCKER_IMAGE"],
]

launchJob("L0_Test-x86_64-Multi-GPU", false, enableFailFast, globalVars, "x86_64", additionalParameters)
launchJob("L0_Test-x86_64-Multi-GPU", reuseBuild, enableFailFast, globalVars, "x86_64", additionalParameters)

} catch (InterruptedException e) {
throw e
@@ -1143,7 +1143,7 @@ def launchStages(pipeline, reuseBuild, testFilter, enableFailFast, globalVars)
"dockerImage": globalVars["LLM_SBSA_DOCKER_IMAGE"],
]

launchJob("L0_Test-SBSA-Single-GPU", false, enableFailFast, globalVars, "SBSA", additionalParameters)
launchJob("L0_Test-SBSA-Single-GPU", reuseBuild, enableFailFast, globalVars, "SBSA", additionalParameters)
💡 Verification agent

🧩 Analysis chain

SBSA single-GPU: ensure reuseBuild correctness and commit guard.

Plumbing is correct. Per PR objective, reuse should be ignored when the Git commit differs. That logic isn’t present here. Consider deriving the reused build’s commit and comparing to env.gitlabCommit before passing reuseArtifactPath.

I can draft a helper to query the reused build’s recorded commit and null out reuse if it differs.


🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "---- Show file existence and path ----"
rg --hidden --files --glob '!node_modules' --glob '!dist' "L0_MergeRequest.groovy" || true

echo
echo "---- Print lines 1100-1165 of jenkins/L0_MergeRequest.groovy (if exists) ----"
if [ -f jenkins/L0_MergeRequest.groovy ]; then
  nl -ba -w3 -s': ' jenkins/L0_MergeRequest.groovy | sed -n '1088,1168p'
else
  echo "file not found: jenkins/L0_MergeRequest.groovy"
fi

echo
echo "---- Search for launchJob definition/usage ----"
rg --hidden -n --glob '!node_modules' "launchJob" || true

echo
echo "---- Show functions/places that reference reuseBuild or reuseArtifactPath ----"
rg --hidden -n --glob '!node_modules' "reuseBuild|reuseArtifactPath|gitlabCommit|env.gitlabCommit" || true

echo
echo "---- Show surrounding function where the launch occurs (context for the call site) ----"
# Try to find the method that contains the call near the shown line
if rg -n "L0_Test-SBSA-Single-GPU" jenkins/L0_MergeRequest.groovy >/dev/null 2>&1; then
  line_num=$(rg -n "L0_Test-SBSA-Single-GPU" jenkins/L0_MergeRequest.groovy | awk -F: '{print $2}' | head -n1)
  start=$(( line_num - 60 ))
  end=$(( line_num + 20 ))
  if [ "$start" -lt 1 ]; then start=1; fi
  nl -ba -w3 -s': ' jenkins/L0_MergeRequest.groovy | sed -n "${start},${end}p"
else
  echo "string 'L0_Test-SBSA-Single-GPU' not found in jenkins/L0_MergeRequest.groovy"
fi

Length of output: 641


🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "---- List all .groovy files (first 200) ----"
rg --hidden --files -g '*.groovy' | sed -n '1,200p' || true

echo
echo "---- Search for the specific job string ----"
rg -n --hidden --no-ignore-vcs --line-number "L0_Test-SBSA-Single-GPU" || true

echo
echo "---- Search for launchJob usages (calls) ----"
rg -n --hidden --no-ignore-vcs --line-number "launchJob\\(" || true

echo
echo "---- Search for reuse-related identifiers ----"
rg -n --hidden --no-ignore-vcs --line-number "reuseBuild|reuseArtifactPath|env\\.gitlabCommit|gitlabCommit" || true

echo
echo "---- Try to find launchJob definition ----"
rg -n --hidden --no-ignore-vcs --line-number "^\\s*(def|function)\\s+launchJob\\b|launchJob\\s*=\\s*\\{\\s*\\$" || true

Length of output: 8047


🏁 Script executed:

#!/bin/bash
set -euo pipefail

FILE1=jenkins/L0_MergeRequest.groovy
FILE2=jenkins/Build.groovy
FILE3=jenkins/L0_Test.groovy

echo "---- jenkins/L0_MergeRequest.groovy (120-140) ----"
sed -n '120,140p' "$FILE1" || true
echo

echo "---- jenkins/L0_MergeRequest.groovy (300-330) ----"
sed -n '300,330p' "$FILE1" || true
echo

echo "---- jenkins/L0_MergeRequest.groovy (960-1010) launchJob definition ----"
sed -n '960,1010p' "$FILE1" || true
echo

echo "---- jenkins/L0_MergeRequest.groovy (1116-1160) surrounding SBSA Single-GPU call ----"
sed -n '1116,1160p' "$FILE1" || true
echo

echo "---- jenkins/Build.groovy (270-420) downloadArtifacts / buildOrCache ----"
sed -n '270,420p' "$FILE2" || true
echo

echo "---- jenkins/L0_Test.groovy (1-48) top definitions ----"
sed -n '1,48p' "$FILE3" || true
echo

echo "---- Search for artifactCommit assignments ----"
rg -n --hidden --no-ignore-vcs "artifactCommit" || true

Length of output: 15155


SBSA single-GPU: enforce commit check before reusing artifacts

launchJob currently sets reuseArtifactPath whenever reuseBuild is truthy — there is no verification that the reused build's recorded commit matches env.gitlabCommit. That allows reusing artifacts from a different commit.

Files to change:

  • jenkins/L0_MergeRequest.groovy
    • String reuseBuild = gitlabParamsFromBot.get('reuse_build', null) (≈ line 134)
    • def launchJob(...) (defined ≈ line 969): replace the unconditional reuseArtifactPath assignment with a validated path only when the reused build's recorded commit equals env.gitlabCommit.
    • Add a helper (e.g. validateReuseBuildCommit or resolveReuseBuild) that queries the reused build's recorded commit (via Artifactory metadata or a small marker file under sw-tensorrt-generic/llm-artifacts/${JOB_NAME}/${reuseBuild}) and returns null when it differs.

Optional/related:

  • jenkins/Build.groovy (downloadArtifacts/buildOrCache, ≈ lines 270–420) currently falls back on rebuild if downloads fail, but does not check commit equality — keep as a safety net but perform the commit validation earlier (in L0_MergeRequest.groovy/launchJob).

Suggested minimal diff (conceptual):

  • Replace in launchJob:

        if (reuseBuild) {
            parameters['reuseArtifactPath'] = "sw-tensorrt-generic/llm-artifacts/${JOB_NAME}/${reuseBuild}"
        }

  • With:

        if (reuseBuild) {
            def validReuse = validateReuseBuildCommit(reuseBuild)
            if (validReuse) {
                parameters['reuseArtifactPath'] = "sw-tensorrt-generic/llm-artifacts/${JOB_NAME}/${validReuse}"
            } else {
                echo "Ignoring reuse_build=${reuseBuild}: recorded commit != ${env.gitlabCommit}"
            }
        }

I can draft the validateReuseBuildCommit helper (using Artifactory REST API or rtDownload to fetch a small commit marker) and a ready-to-drop patch if you want.
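
For reference, a minimal sketch of such a helper, assuming the build job uploads a small commit marker file (commit.txt, which is not produced today) next to its artifacts; the marker name, URL layout, and curl-based fetch are assumptions rather than existing pipeline API:

    // Hypothetical helper: return reuseBuild only when its recorded commit
    // matches env.gitlabCommit; otherwise return null.
    // Assumes a commit.txt marker was uploaded alongside the build artifacts.
    def validateReuseBuildCommit(reuseBuild) {
        def markerUrl = "https://urm.nvidia.com/artifactory/sw-tensorrt-generic/llm-artifacts/${JOB_NAME}/${reuseBuild}/commit.txt"
        def recordedCommit = ""
        try {
            // curl -f converts HTTP errors into a non-zero exit status, so a
            // missing marker falls through to the catch block below.
            recordedCommit = sh(returnStdout: true, script: "curl -fsSL ${markerUrl}").trim()
        } catch (InterruptedException e) {
            throw e
        } catch (Exception e) {
            echo "Could not read commit marker for reuse_build=${reuseBuild}: ${e.message}"
            return null
        }
        return (recordedCommit == env.gitlabCommit) ? reuseBuild : null
    }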

🤖 Prompt for AI Agents
In jenkins/L0_MergeRequest.groovy around lines 130–140 and 960–980 (and helper
addition anywhere near other helper defs), ensure reuse_build is not blindly
trusted: change the code that reads String reuseBuild =
gitlabParamsFromBot.get('reuse_build', null) to keep the value but do not set
reuseArtifactPath unconditionally in launchJob; instead, implement and call a
helper validateReuseBuildCommit(reuseBuild) that queries the reused-build
metadata (e.g., Artifactory REST API or download a small commit marker file at
sw-tensorrt-generic/llm-artifacts/${JOB_NAME}/${reuseBuild}) and returns the
reuseBuild string only if its recorded commit equals env.gitlabCommit, otherwise
return null; then in launchJob replace the unconditional
parameters['reuseArtifactPath'] assignment with a conditional that sets
parameters['reuseArtifactPath'] =
"sw-tensorrt-generic/llm-artifacts/${JOB_NAME}/${validReuse}" only when
validateReuseBuildCommit returned a non-null value and echo a message when
ignored; leave Build.groovy fallback as-is.
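
Any such check only works if the build side records its commit at upload time. A sketch of that step, with the upload shown as a plain curl PUT (illustrative only; the pipeline's actual Artifactory upload helper and credentials handling may differ):

    // Hypothetical build-side step: record the current commit so later runs
    // can validate reuse_build against it. The curl upload is illustrative;
    // substitute the pipeline's real Artifactory upload mechanism.
    sh "git rev-parse HEAD > commit.txt"
    sh "curl -fsS -T commit.txt https://urm.nvidia.com/artifactory/${UPLOAD_PATH}/commit.txt"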

} catch (InterruptedException e) {
throw e
} catch (Exception e) {
@@ -1197,7 +1197,7 @@ def launchStages(pipeline, reuseBuild, testFilter, enableFailFast, globalVars)
"dockerImage": globalVars["LLM_SBSA_DOCKER_IMAGE"],
]

launchJob("L0_Test-SBSA-Multi-GPU", false, enableFailFast, globalVars, "SBSA", additionalParameters)
launchJob("L0_Test-SBSA-Multi-GPU", reuseBuild, enableFailFast, globalVars, "SBSA", additionalParameters)

} catch (InterruptedException e) {
throw e
46 changes: 46 additions & 0 deletions jenkins/L0_Test.groovy
@@ -25,6 +25,8 @@ LLM_ROOT = "llm"
ARTIFACT_PATH = env.artifactPath ? env.artifactPath : "sw-tensorrt-generic/llm-artifacts/${JOB_NAME}/${BUILD_NUMBER}"
UPLOAD_PATH = env.uploadPath ? env.uploadPath : "sw-tensorrt-generic/llm-artifacts/${JOB_NAME}/${BUILD_NUMBER}"

REUSE_ARTIFACT_PATH = env.reuseArtifactPath

X86_64_TRIPLE = "x86_64-linux-gnu"
AARCH64_TRIPLE = "aarch64-linux-gnu"

@@ -349,6 +351,10 @@ def runLLMTestlistOnSlurm_MultiNodes(pipeline, platform, testList, config=VANILL
// if the line cannot be split by "=", just ignore that line.
def makoOptsJson = transformMakoArgsToJson(["Mako options:"] + makoArgs)
def testListPath = renderTestDB(testList, llmSrcLocal, stageName, makoOptsJson)
// Reuse passed tests
if (REUSE_ARTIFACT_PATH) {
    reusePassedTests(pipeline, llmSrcLocal, REUSE_ARTIFACT_PATH, stageName, testListPath)
}
Utils.exec(pipeline, script: "sshpass -p '${remote.passwd}' scp -r -p -oStrictHostKeyChecking=no ${testListPath} ${remote.user}@${remote.host}:${testListPathNode}",)

// Generate Multi Node Job Launch Script
@@ -1063,6 +1069,41 @@ def renderTestDB(testContext, llmSrc, stageName, preDefinedMakoOpts=null) {
return testList
}

def reusePassedTests(pipeline, llmSrc, reusedArtifactPath, stageName, testListFile) {
    def reusedPath = "${WORKSPACE}/reused"
    sh "mkdir -p ${reusedPath}"
    def resultsFileName = "results-${stageName}"
    def passedTestsFile = "${reusedPath}/${stageName}/passed_tests.txt"
    try {
        def resultsUrl = "https://urm.nvidia.com/artifactory/${reusedArtifactPath}/test-results/${resultsFileName}.tar.gz"
        trtllm_utils.llmExecStepWithRetry(pipeline, script: "cd ${reusedPath} && wget -nv ${resultsUrl}")
        sh "cd ${reusedPath} && tar -zxf ${resultsFileName}.tar.gz"
        // Get passed tests
        sh """
            python3 ${llmSrc}/jenkins/scripts/reuse_passed_tests.py \
                get_passed_tests \
                --input-file=${reusedPath}/${stageName}/results.xml \
                --output-file=${passedTestsFile}
        """
        sh "echo \"The passed tests are: \$(cat ${passedTestsFile})\""

        // Copy the original test file to a new file
        sh "cp ${testListFile} original_${testListFile}"
        // Remove passed tests from original test file
        sh """
            python3 ${llmSrc}/jenkins/scripts/reuse_passed_tests.py \
                remove_passed_tests \
                --input-file=${testListFile} \
                --passed-tests-file=${passedTestsFile}
        """
        sh "echo \"The test list after removing passed tests is: \$(cat ${testListFile})\""
    } catch (InterruptedException e) {
        throw e
    } catch (Exception e) {
        echo "Failed to get passed tests: ${e.message}"
    }
}

def getSSHConnectionPorts(portConfigFile, stageName)
{
def type = stageName.split('-')[0]
@@ -1409,6 +1450,11 @@ def runLLMTestlistOnPlatformImpl(pipeline, platform, testList, config=VANILLA_CO
extraInternalEnv += " CPP_TEST_TIMEOUT_OVERRIDDEN=${pytestTestTimeout}"

def testDBList = renderTestDB(testList, llmSrc, stageName)
// Reuse passed tests
if (REUSE_ARTIFACT_PATH) {
    reusePassedTests(pipeline, llmSrc, REUSE_ARTIFACT_PATH, stageName, testDBList)
}

testList = "${testList}_${splitId}"
def testCmdLine = [
"LLM_ROOT=${llmSrc}",
94 changes: 94 additions & 0 deletions jenkins/scripts/reuse_passed_tests.py
@@ -0,0 +1,94 @@
import argparse
import os
import sys
import xml.etree.ElementTree as ET

import test_rerun


def get_passed_tests(input_file, output_file):
    if not os.path.exists(input_file):
        print(f"Input file {input_file} does not exist")
        return

    # Parse the JUnit XML file and extract passed test names
    passed_tests = []
    try:
        tree = ET.parse(input_file)
        root = tree.getroot()
        suite = root.find('testsuite')
        for testcase in suite.iter('testcase'):
            # Check test status
            has_failure = testcase.find('failure') is not None
            has_error = testcase.find('error') is not None
            has_skipped = testcase.find('skipped') is not None
            if not has_failure and not has_error and not has_skipped:
                # Parse the test name
                classname = testcase.attrib.get('classname', '')
                name = testcase.attrib.get('name', '')
                filename = testcase.attrib.get('file', '')
                test_name = test_rerun.parse_name(classname, name, filename)
                passed_tests.append(test_name)
    except Exception as e:
        print(f"Failed to parse {input_file}: {e}")
        return
Comment on lines +32 to +34
🛠️ Refactor suggestion

Catch narrow exception type and log to stderr

Catching broad Exception hides unrelated issues. Limit to XML parse errors and send diagnostics to stderr.

Apply this diff:

-    except Exception as e:
-        print(f"Failed to parse {input_file}: {e}")
+    except ET.ParseError as e:
+        print(f"Failed to parse {input_file}: {e}", file=sys.stderr)
         return
📝 Committable suggestion

Suggested change
    except Exception as e:
        print(f"Failed to parse {input_file}: {e}")
        return
    except ET.ParseError as e:
        print(f"Failed to parse {input_file}: {e}", file=sys.stderr)
        return
🤖 Prompt for AI Agents
In jenkins/scripts/reuse_passed_tests.py around lines 32-34, replace the broad
"except Exception as e" with a narrow XML parse error handler (e.g., except
xml.etree.ElementTree.ParseError as e or except ET.ParseError as e if you import
xml.etree.ElementTree as ET), ensure you import sys and the XML module if not
already, and write the diagnostic to stderr (print(f"Failed to parse
{input_file}: {e}", file=sys.stderr)) before returning.


    # Write passed test names to output file, one per line
    with open(output_file, 'w') as f:
        for test in passed_tests:
            f.write(test + '\n')


def remove_passed_tests(input_file, passed_tests_file):
    if not os.path.exists(input_file):
        print(f"Input file {input_file} does not exist")
        return
    if not os.path.exists(passed_tests_file):
        print(f"Passed tests file {passed_tests_file} does not exist")
        return

Comment on lines +42 to +49
🛠️ Refactor suggestion

remove_passed_tests: input semantics, performance, and I/O safety

  • The help text says “Input XML file containing test results” but this function treats it as a plain list of tests. Fix the mismatch.
  • Use a set for O(1) lookups.
  • No need to delete the file before rewriting; opening with 'w' truncates atomically enough for this use.
  • Add type hints, docstring, and UTF-8 encoding. Also skip blank lines.

Apply this diff:

-def remove_passed_tests(input_file, passed_tests_file):
+def remove_passed_tests(input_file: str, passed_tests_file: str) -> None:
+    """Remove tests that already passed from a newline-delimited test list file.
+
+    Args:
+        input_file: Path to a file containing tests to run (one per line).
+        passed_tests_file: Path to a file containing already-passed tests (one per line).
+    """
     if not os.path.exists(input_file):
-        print(f"Input file {input_file} does not exist")
+        print(f"Input file {input_file} does not exist", file=sys.stderr)
         return
     if not os.path.exists(passed_tests_file):
-        print(f"Passed tests file {passed_tests_file} does not exist")
+        print(f"Passed tests file {passed_tests_file} does not exist", file=sys.stderr)
         return
 
-    passed_tests = []
-    # Read passed tests from file
-    with open(passed_tests_file, 'r') as f:
-        for line in f:
-            passed_tests.append(line.strip())
+    # Read passed tests from file
+    with open(passed_tests_file, 'r', encoding='utf-8') as f:
+        passed_tests = {line.strip() for line in f if line.strip()}
 
-    tests_to_keep = []
-    # Remove passed tests from input file
-    with open(input_file, 'r') as f:
-        for line in f:
-            if line.strip() not in passed_tests:
-                tests_to_keep.append(line.strip())
+    # Remove passed tests from input file
+    with open(input_file, 'r', encoding='utf-8') as f:
+        tests_to_keep = [line.strip()
+                         for line in f
+                         if line.strip() and line.strip() not in passed_tests]
 
-    # Delete input file
-    try:
-        os.remove(input_file)
-    except Exception as e:
-        print(f"Failed to delete {input_file}: {e}")
     # Write tests to keep to input file
-    with open(input_file, 'w') as f:
+    with open(input_file, 'w', encoding='utf-8') as f:
         for test in tests_to_keep:
             f.write(test + '\n')

Also applies to: 50-71

    passed_tests = []
    # Read passed tests from file
    with open(passed_tests_file, 'r') as f:
        for line in f:
            passed_tests.append(line.strip())

    tests_to_keep = []
    # Remove passed tests from input file
    with open(input_file, 'r') as f:
        for line in f:
            if line.strip() not in passed_tests:
                tests_to_keep.append(line.strip())

    # Delete input file
    try:
        os.remove(input_file)
    except Exception as e:
        print(f"Failed to delete {input_file}: {e}")
    # Write tests to keep to input file
    with open(input_file, 'w') as f:
        for test in tests_to_keep:
            f.write(test + '\n')


if __name__ == '__main__':
    if (sys.argv[1] == "get_passed_tests"):
        parser = argparse.ArgumentParser()
        parser.add_argument('--input-file',
                            required=True,
                            help='Input XML file containing test results')
        parser.add_argument('--output-file',
                            required=True,
                            help='Output file to write passed tests')
        args = parser.parse_args(sys.argv[2:])
        get_passed_tests(args.input_file, args.output_file)
    elif (sys.argv[1] == "remove_passed_tests"):
        parser = argparse.ArgumentParser()
        parser.add_argument('--input-file',
                            required=True,
                            help='Input XML file containing test results')
        parser.add_argument('--passed-tests-file',
                            required=True,
                            help='File containing passed tests')
        args = parser.parse_args(sys.argv[2:])
        remove_passed_tests(args.input_file, args.passed_tests_file)
1 change: 1 addition & 0 deletions tests/integration/defs/test_e2e.py
Original file line number Diff line number Diff line change
@@ -1462,6 +1462,7 @@ def test_openai_completions_example(llm_root, llm_venv, backend: str):

@pytest.mark.parametrize("backend", ["pytorch", "trt"])
def test_openai_chat_example(llm_root, llm_venv, backend: str):
pytest.fail("test_openai_chat_example")
⚠️ Potential issue

Remove unconditional failure; this breaks the entire param test.

pytest.fail("test_openai_chat_example") makes the test always fail across backends. If this was for debugging, please remove before merge.

Apply this diff:

-    pytest.fail("test_openai_chat_example")
+    # Removed accidental debug fail; enable test execution
📝 Committable suggestion

Suggested change
    pytest.fail("test_openai_chat_example")
    # Removed accidental debug fail; enable test execution
🤖 Prompt for AI Agents
In tests/integration/defs/test_e2e.py around line 1465 there is an unconditional
pytest.fail("test_openai_chat_example") which forces the entire parametrized
test suite to fail; remove that pytest.fail call (or replace it with an
appropriate assertion or skip if debugging was intended) so the test can run
normally across backends, and if this was temporary debugging add a comment
and/or use pytest.skip with a reason instead of fail.

test_root = unittest_path() / "llmapi" / "apps"
filter_expr = f"{backend} and not sampler"
llm_venv.run_cmd([