CI log improvements (#621)
* Add groups to script steps.

* [skip-tests] missing quote

* [skip-tests] Use function to only print group in GHA.

* Fix color.

* [skip-tests] Add group for env details.

* [skip-tests] Add group to run_as_coder repro instructions.

* Don't error on unbound.

* Don't print script args

* Color coder print message.

* Avoid unbound errors with GITHUB_ACTIONS.

* Don't run nvidia-smi manually in the test job.

* sccache stats group.

* Avoid sccache stats if sccache is not available.

* [skip-tests] Inject intentional error.

* Revert "[skip-tests] Inject intentional error."

This reverts commit 7270a0c.

* Use preset name in group name.

* Parameterize color.

* Print sccache stats in group.

* Add problem matcher.

* Add problem matcher before moving repo files.

* Remove the cmake regexs for now.

* Try different problem matcher.

* Just remove problem matchers for now.

* Fix if

* Remove redundant sccache stats.

* Try adding problem matcher again.

* Fix problem-matcher file name.

* [skip-tests] Run smaller matrix for debug.

* Fix path.

* Use json array for matcher.

* Fix json array.

* [skip-tests] Disable verify devcontainers for now.

* disable verify-devcontainers

* Exclude home/coder from the path in the matcher.

* Try a different regex.

* Exclude leading slash.

* Run as coder user.

* Revert "Run as coder user."

This reverts commit dace5f6.

* Add ninja summary stats.

* Fix permissions of ninja summary script.

* Make color conditional upon status.

* Make sure to get correct build status.

* Exit if build failed.

* Fix if statement.

* Print when build fails.

* Disable exiting on non-zero return.

* Don't use local, it resets exit code.

* Fix variable name.

* Emit error.

* Make sccache stats part of group title.

* Make repro instructions a conditional step.

* Get rid of old code.

* Go back to putting the repro instructions in the command step.

* Don't output error::.

* Update problem matcher.

* Don't capture cmake output.

* Fix group name.

* Actually disable exiting on non-zero return.

* Add echo -e.

* Fix spacing.

* Redundant "build".

* Add space to fix emoji.

* Move end message logic into end group.

* Fix group name.

* Don't print in GHA on success.

* Fix emojis.

* Refactor group command logic into function.

* Docs.

* Return status from run_command.

* Revert test changes.

* Update repro instructions.

* Remove excess.

* Use print_env_details directly to avoid duplicates.

* Update problem-matcher.json

* Add timing to build/test scripts.
jrhemstad committed Nov 30, 2023
1 parent 387e1f5 commit c4769d7
Showing 13 changed files with 606 additions and 44 deletions.
14 changes: 14 additions & 0 deletions .github/problem-matchers/problem-matcher.json
@@ -0,0 +1,14 @@
{
"problemMatcher": [
{
"owner": "nvcc",
"pattern": [
{
"regexp": "^\\/home\\/coder\\/(.+):(\\d+):(\\d+): (\\w+): \"(.+)\"$",
"severity": 4,
"message": 5
}
]
}
]
}
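The matcher's regex uses JavaScript syntax (`\d`, `\w`), which is what GitHub expects. It captures file, line, and column in groups 1 to 3 but maps only `severity` (group 4) and `message` (group 5). To check locally which log lines it would pick up, a rough bash sketch can approximate the pattern in POSIX ERE by substituting `[0-9]` and `[A-Za-z0-9_]` for those classes. The `match_line` helper and the sample log lines are hypothetical; they are not part of this commit, and the sample is not claimed to be real NVCC output.

```shell
#!/usr/bin/env bash
# Approximation of the problem-matcher regex in POSIX ERE for bash's [[ =~ ]].
# \d -> [0-9], \w -> [A-Za-z0-9_]; the double quotes are literal characters.
matcher_re='^/home/coder/(.+):([0-9]+):([0-9]+): ([A-Za-z0-9_]+): "(.+)"$'

# Hypothetical helper: report what the matcher would extract from one log line.
match_line() {
  local line="$1"
  if [[ "$line" =~ $matcher_re ]]; then
    # Group 4 feeds "severity" and group 5 feeds "message", as in the JSON.
    echo "severity=${BASH_REMATCH[4]} message=${BASH_REMATCH[5]}"
  else
    echo "no match"
  fi
}

# Made-up sample lines for illustration.
match_line '/home/coder/cccl/cub/test/foo.cu:12:5: error: "identifier is undefined"'
match_line 'ninja: build stopped: subcommand failed.'
```

The first sample yields `severity=error message=identifier is undefined`; the second prints `no match`, since it does not start with `/home/coder/`.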
1 change: 0 additions & 1 deletion .github/workflows/build-and-test-linux.yml
@@ -44,5 +44,4 @@ jobs:
runner: linux-${{inputs.cpu}}-gpu-v100-latest-1
image: ${{inputs.container_image}}
command: |
nvidia-smi
${{ inputs.test_script }}
22 changes: 17 additions & 5 deletions .github/workflows/run-as-coder.yml
@@ -39,18 +39,30 @@ jobs:
run: |
cp -R cccl /home/coder/cccl
chown -R coder:coder /home/coder/
- name: Add NVCC problem matcher
run: |
echo "::add-matcher::cccl/.github/problem-matchers/problem-matcher.json"
- name: Configure credentials and environment variables for sccache
uses: ./cccl/.github/actions/configure_cccl_sccache
- name: Run command
shell: su coder {0}
run: |
set -exo pipefail
set -eo pipefail
cd ~/cccl
echo -e "\e[1;34mRunning as 'coder' user in $(pwd):\e[0m"
echo -e "\e[1;34m${{inputs.command}}\e[0m"
eval "${{inputs.command}}" || exit_code=$?
if [ ! -z "$exit_code" ]; then
echo "::error::Error! To checkout the corresponding code and reproduce locally, run the following commands:"
echo "git clone --branch $GITHUB_REF_NAME --single-branch --recurse-submodules https://github.com/$GITHUB_REPOSITORY.git && cd $(echo $GITHUB_REPOSITORY | cut -d'/' -f2) && git checkout $GITHUB_SHA"
echo "docker run --rm -it --gpus all --pull=always --volume \$PWD:/repo --workdir /repo ${{ inputs.image }} ${{inputs.command}}"
exit $exit_code
echo -e "::group::️❗ \e[1;31mInstructions to Reproduce CI Failure Locally\e[0m"
echo "::error:: To replicate this failure locally, follow the steps below:"
echo "1. Clone the repository, and navigate to the correct branch and commit:"
echo " git clone --branch $GITHUB_REF_NAME --single-branch https://github.com/$GITHUB_REPOSITORY.git && cd $(echo $GITHUB_REPOSITORY | cut -d'/' -f2) && git checkout $GITHUB_SHA"
echo ""
echo "2. Run the failed command inside the same Docker container used by the CI:"
echo " docker run --rm -it --gpus all --pull=always --volume \$PWD:/repo --workdir /repo ${{ inputs.image }} ${{inputs.command}}"
echo ""
echo "For additional information, see:"
echo " - DevContainer Documentation: https://github.com/NVIDIA/cccl/blob/main/.devcontainer/README.md"
echo " - Continuous Integration (CI) Overview: https://github.com/NVIDIA/cccl/blob/main/ci-overview.md"
fi
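The "Run command" step relies on two shell idioms: `eval "..." || exit_code=$?` records a failure without tripping `set -e`, and `$(echo $GITHUB_REPOSITORY | cut -d'/' -f2)` turns `owner/repo` into the clone directory name. A minimal sketch of both, assuming bash; `run_and_capture` and `repo_dir` are hypothetical helpers, not functions from this commit.

```shell
#!/usr/bin/env bash
set -eo pipefail

# Hypothetical helper: evaluate a command string and print its exit status.
# A failure on the left-hand side of `||` does not trigger `set -e`,
# so the status is recorded instead of aborting the script.
run_and_capture() {
  local exit_code=0
  eval "$1" || exit_code=$?
  echo "$exit_code"
}

# Hypothetical helper: strip everything through the first slash.
# For a single-slash value such as "owner/repo" this matches `cut -d'/' -f2`.
repo_dir() {
  echo "${1#*/}"
}

run_and_capture "true"        # prints 0
run_and_capture "(exit 3)"    # prints 3 (the subshell keeps `exit` from ending the script)
repo_dir "NVIDIA/cccl"        # prints cccl
```

The parameter expansion avoids spawning `echo` and `cut`, but it only agrees with `cut -d'/' -f2` when the value contains exactly one slash.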
198 changes: 161 additions & 37 deletions ci/build_common.sh
@@ -37,7 +37,6 @@ function usage {
# Copy the args into a temporary array, since we will modify them and
# the parent script may still need them.
args=("$@")
echo "Args: ${args[@]}"
while [ "${#args[@]}" -ne 0 ]; do
case "${args[0]}" in
-v | --verbose) VERBOSE=1; args=("${args[@]:1}");;
@@ -90,7 +89,6 @@ export CTEST_PARALLEL_LEVEL="1"
export CXX="${HOST_COMPILER}"
export CUDACXX="${CUDA_COMPILER}"
export CUDAHOSTCXX="${HOST_COMPILER}"

export CXX_STANDARD

# Print "ARG=${ARG}" for all args.
@@ -107,67 +105,193 @@ function print_var_values() {
done
}

echo "========================================"
echo "pwd=$(pwd)"
print_var_values \
BUILD_DIR \
CXX_STANDARD \
CXX \
CUDACXX \
CUDAHOSTCXX \
NVCC_VERSION \
CMAKE_BUILD_PARALLEL_LEVEL \
CTEST_PARALLEL_LEVEL \
CCCL_BUILD_INFIX \
GLOBAL_CMAKE_OPTIONS
echo "========================================"
echo
echo "========================================"
echo "Current commit is:"
git log -1 || echo "Not a repository"
echo "========================================"
echo
# begin_group: Start a named section of log output, possibly with color.
# Usage: begin_group "Group Name" [Color]
# Group Name: A string specifying the name of the group.
# Color (optional): ANSI color code to set text color. Default is blue (1;34).
function begin_group() {
# See options for colors here: https://gist.github.com/JBlond/2fea43a3049b38287e5e9cefc87b2124
local blue="34"
local name="${1:-}"
local color="${2:-$blue}"

if [ -n "${GITHUB_ACTIONS:-}" ]; then
echo -e "::group::\e[${color}m${name}\e[0m"
else
echo -e "\e[${color}m================== ${name} ======================\e[0m"
fi
}

# end_group: End a named section of log output and print status based on exit status.
# Usage: end_group "Group Name" [Exit Status]
# Group Name: A string specifying the name of the group.
# Exit Status (optional): The exit status of the command run within the group. Default is 0.
function end_group() {
local name="${1:-}"
local build_status="${2:-0}"
local duration="${3:-}"
local red="31"
local blue="34"

if [ -n "${GITHUB_ACTIONS:-}" ]; then
echo "::endgroup::"

if [ "$build_status" -ne 0 ]; then
echo -e "::error::\e[${red}m ${name} - Failed (⬆️ click above for full log ⬆️)\e[0m"
fi
else
if [ "$build_status" -ne 0 ]; then
echo -e "\e[${red}m================== End ${name} - Failed${duration:+ - Duration: ${duration}s} ==================\e[0m"
else
echo -e "\e[${blue}m================== End ${name} - Success${duration:+ - Duration: ${duration}s} ==================\n\e[0m"
fi
fi
}

declare -A command_durations

# Runs a command within a named group, handles the exit status, and prints appropriate messages based on the result.
# Usage: run_command "Group Name" command [arguments...]
function run_command() {
local group_name="${1:-}"
shift
local command=("$@")
local status

begin_group "$group_name"
set +e
local start_time=$(date +%s)
"${command[@]}"
status=$?
local end_time=$(date +%s)
set -e
local duration=$((end_time - start_time))
end_group "$group_name" $status $duration
command_durations["$group_name"]=$duration
return $status
}

function string_width() {
local str="$1"
echo "$str" | awk '{print length}'
}

function print_time_summary() {
local max_length=0
local group

# Find the longest group name for formatting
for group in "${!command_durations[@]}"; do
local group_length=$(echo "$group" | awk '{print length}')
if [ "$group_length" -gt "$max_length" ]; then
max_length=$group_length
fi
done

echo "Time Summary:"
for group in "${!command_durations[@]}"; do
printf "%-${max_length}s : %s seconds\n" "$group" "${command_durations[$group]}"
done

# Clear the array of timing info
declare -gA command_durations=()
}


print_environment_details() {
begin_group "⚙️ Environment Details"

echo "pwd=$(pwd)"

print_var_values \
BUILD_DIR \
CXX_STANDARD \
CXX \
CUDACXX \
CUDAHOSTCXX \
NVCC_VERSION \
CMAKE_BUILD_PARALLEL_LEVEL \
CTEST_PARALLEL_LEVEL \
CCCL_BUILD_INFIX \
GLOBAL_CMAKE_OPTIONS

echo "Current commit is:"
git log -1 || echo "Not a repository"

if command -v nvidia-smi &> /dev/null; then
nvidia-smi
else
echo "nvidia-smi not found"
fi

end_group "⚙️ Environment Details"
}


function configure_preset()
{
local BUILD_NAME=$1
local PRESET=$2
local CMAKE_OPTIONS=$3
local GROUP_NAME="🛠️ CMake Configure ${BUILD_NAME}"

pushd .. > /dev/null

cmake --preset=$PRESET --log-level=VERBOSE $GLOBAL_CMAKE_OPTIONS $CMAKE_OPTIONS
echo "$BUILD_NAME configure complete."

run_command "$GROUP_NAME" cmake --preset=$PRESET --log-level=VERBOSE $GLOBAL_CMAKE_OPTIONS $CMAKE_OPTIONS
status=$?
popd > /dev/null
return $status
}

function build_preset()
{
function build_preset() {
local BUILD_NAME=$1
local PRESET=$2
local green="1;32"
local red="1;31"
local GROUP_NAME="🏗️ Build ${BUILD_NAME}"

source "./sccache_stats.sh" "start"
pushd .. > /dev/null

cmake --build --preset=$PRESET -v
echo "$BUILD_NAME build complete."

pushd .. > /dev/null
run_command "$GROUP_NAME" cmake --build --preset=$PRESET -v
status=$?
popd > /dev/null
source "./sccache_stats.sh" "end"

minimal_sccache_stats=$(source "./sccache_stats.sh" "end")

# Only print detailed stats in actions workflow
if [ -n "${GITHUB_ACTIONS:-}" ]; then
begin_group "💲 sccache stats"
echo "${minimal_sccache_stats}"
sccache -s
end_group

begin_group "🥷 ninja build times"
echo "The "weighted" time is the elapsed time of each build step divided by the number
of tasks that were running in parallel. This makes it an excellent approximation
of how "important" a slow step was. A link that is entirely or mostly serialized
will have a weighted time that is the same or similar to its elapsed time. A
compile that runs in parallel with 999 other compiles will have a weighted time
that is tiny."
./ninja_summary.py -C ${BUILD_DIR}/${PRESET}
end_group
else
echo $minimal_sccache_stats
fi

return $status
}

function test_preset()
{
local BUILD_NAME=$1
local PRESET=$2
local GROUP_NAME="🚀 Test ${BUILD_NAME}"

pushd .. > /dev/null

ctest --preset=$PRESET
echo "$BUILD_NAME testing complete."

run_command "$GROUP_NAME" ctest --preset=$PRESET
status=$?
popd > /dev/null
return $status
}

function configure_and_build_preset()
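`begin_group`, `end_group`, and `run_command` together wrap a command in a GitHub Actions log group (`::group::` / `::endgroup::`), emit `::error::` on failure, and record the elapsed time. The following is a condensed, hypothetical sketch of that pattern, not the commit's code: colors are dropped so the workflow commands are easy to read, timing uses bash's built-in `SECONDS` instead of `date +%s`, and `run_grouped` and `durations` are illustrative names.

```shell
#!/usr/bin/env bash
set -eo pipefail

declare -A durations=()   # group name -> elapsed seconds

# Hypothetical condensed version of begin_group/end_group/run_command.
run_grouped() {
  local name="$1"
  shift
  local status=0
  local start=$SECONDS    # seconds since the shell started

  if [ -n "${GITHUB_ACTIONS:-}" ]; then
    echo "::group::${name}"
  else
    echo "================== ${name} =================="
  fi

  # Record a failure without tripping `set -e` and without toggling it.
  "$@" || status=$?
  durations["$name"]=$((SECONDS - start))

  if [ -n "${GITHUB_ACTIONS:-}" ]; then
    echo "::endgroup::"
    if [ "$status" -ne 0 ]; then
      echo "::error::${name} - Failed"
    fi
  elif [ "$status" -ne 0 ]; then
    echo "================== End ${name} - Failed =================="
  else
    echo "================== End ${name} - Success =================="
  fi
  return "$status"
}

run_grouped "Say hello" echo "hello"
run_grouped "Failing step" false || echo "failing step returned $?"
```

One design difference: `run_command` brackets the command with `set +e` and `set -e`, which leaves errexit enabled afterwards even if the caller had it off. The `|| status=$?` form leaves the caller's shell options as they were.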
4 changes: 4 additions & 0 deletions ci/build_cub.sh
@@ -2,6 +2,8 @@

source "$(dirname "$0")/build_common.sh"

print_environment_details

# CUB benchmarks require at least CUDA nvcc 11.5 for int128
# Returns "true" if the first version is greater than or equal to the second
version_compare() {
@@ -35,3 +37,5 @@ CMAKE_OPTIONS="
"

configure_and_build_preset "CUB" "$PRESET" "$CMAKE_OPTIONS"

print_time_summary
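Each build script now ends with `print_time_summary`, which pads group names to the longest one and prints the durations in whatever order bash iterates the associative array. A hypothetical variant, sketched below, keeps the same `name : N seconds` layout but measures names with `${#var}` instead of calling `awk` and lists the slowest step first. The timings are made-up sample values, and `print_summary` is an illustrative name.

```shell
#!/usr/bin/env bash
# Made-up sample timings; in build_common.sh these come from run_command.
declare -A durations=(
  ["Configure CUB"]=42
  ["Build CUB"]=1310
  ["Test CUB"]=615
)

# Hypothetical variant of print_time_summary, sorted by duration.
print_summary() {
  local max_length=0 group
  # ${#group} gives the name's length without spawning awk for each entry.
  for group in "${!durations[@]}"; do
    if (( ${#group} > max_length )); then
      max_length=${#group}
    fi
  done

  echo "Time Summary:"
  # Emit "seconds<TAB>name" pairs, sort numerically (slowest first),
  # then print each pair in the "name : N seconds" layout.
  for group in "${!durations[@]}"; do
    printf '%s\t%s\n' "${durations[$group]}" "$group"
  done | sort -rn | while IFS=$'\t' read -r secs name; do
    printf "%-${max_length}s : %s seconds\n" "$name" "$secs"
  done
}

print_summary
```

With the sample data this prints `Build CUB`, then `Test CUB`, then `Configure CUB`. The original also clears `command_durations` after printing; this sketch leaves `durations` untouched.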
4 changes: 4 additions & 0 deletions ci/build_libcudacxx.sh
@@ -2,7 +2,11 @@

source "$(dirname "$0")/build_common.sh"

print_environment_details

PRESET="libcudacxx-cpp${CXX_STANDARD}"
CMAKE_OPTIONS=""

configure_and_build_preset libcudacxx "$PRESET" "$CMAKE_OPTIONS"

print_time_summary
4 changes: 4 additions & 0 deletions ci/build_thrust.sh
@@ -2,8 +2,12 @@

source "$(dirname "$0")/build_common.sh"

print_environment_details

PRESET="thrust-cpp$CXX_STANDARD"

CMAKE_OPTIONS=""

configure_and_build_preset "Thrust" "$PRESET" "$CMAKE_OPTIONS"

print_time_summary