Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

How Should batslib_count_lines Handle Trailing Newlines? #11

Open
CauhxMilloy opened this issue Apr 28, 2024 · 1 comment · May be fixed by #12
Open

How Should batslib_count_lines Handle Trailing Newlines? #11

CauhxMilloy opened this issue Apr 28, 2024 · 1 comment · May be fixed by #12

Comments

@CauhxMilloy
Copy link

CauhxMilloy commented Apr 28, 2024

Currently, batslib_count_lines is written to try and ignore the last newline to be equivalent to it not being there.

This leads '' and '\n' as both being "0 line". Similarly, 'a' and 'a\n' will both be considered as "1 line". This is explicitly checked in tests.

Issue

However, when using run --keep-empty-lines, assertions will check trailing newlines. However, this leads batslib_is_single_line to still return 0, and the error messages will be printed by batslib_print_kv_single instead of batslib_print_kv_multi (which shows the line numbers, a helpful indicator for debugging).
Example (using bats-assert):

@test "trailing empty line" {
  run --keep-empty-lines printf 'a\n'
  assert_output "$(printf 'a')"
}

prints:

-- output differs --
expected : a
actual   : a

--

The new line is there, but it's hard to really notice. More egregiously, a multi-line output will be very misleading.
Example:

@test "trailing empty line on multiline" {
  run --keep-empty-lines printf 'a\nb\n'
  assert_output "$(printf 'a\nb')"
}

prints:

-- output differs --
expected (2 lines):
a
b
actual (2 lines):
a
b
--  

This both hides the extra new line and states that the line counts are the same.

Discussion

I'm writing out this GH Issue (and not just sending some PR(s)) because there's a few ways that this could possibly be addressed. I wanted to chat this over and see which way you'd think would be best. I'm happy to put together a PR for wherever we land.

Here's some options that I've considered, feel free to suggest more:

1. Update the impl for batslib_count_lines.

This could be done by changing

batslib_count_lines() {
  local -i n_lines=0
  local line
  while IFS='' read -r line || [[ -n $line ]]; do
    (( ++n_lines ))
  done < <(printf '%s' "$1")
  echo "$n_lines"
}

to

batslib_count_lines() {
  local -r input="$1"
  local -i n_lines=0
  local line
  if [ -n "$input" ]; then
    while IFS='' read -r line; do
      (( ++n_lines ))
    done < <(printf '%s' "$input")
    # Increment for the EOF (not captured by the loop above).
    (( ++n_lines ))
  fi
  echo "$n_lines"
}

There are some related extra changes that are necessary with this approach. batslib_prefix needs to be updated in a similar way (else batslib_print_kv_multi in batslib_print_kv_single_or_multi will print the incorrect number of lines).
Potential impl change from

batslib_prefix() {
  local -r prefix="${1:-  }"
  local line
  while IFS='' read -r line || [[ -n $line ]]; do
    printf '%s%s\n' "$prefix" "$line"
  done
}

to

batslib_prefix() {
  local -r prefix="${1:-  }"
  local line=''
  local -i lines_read=0
  while IFS='' read -r line; do
    printf '%s%s\n' "$prefix" "$line"
    (( ++lines_read ))
  done
  # Print for the line with EOF (not captured by the loop above).
  # Do not print if stdin was completely empty.
  if (( lines_read > 0 )) || [[ -n "$line" ]]; then
    printf '%s%s\n' "$prefix" "$line"
  fi
}

batslib_mark would likely also need to be updated in a similar way (these 3 functions all use while IFS='' read -r line || [[ -n $line ]]; do).

While I think this is pretty straightforward (and my personal preference), I understand that this may have backwards compatibility issues and may have implications for other functionality built upon this function (which may really care about ignoring trailing newlines).

I can see that bats-assert makes use of these 3 functions in refute_line. The suggested changes to batslib_prefix and batslib_mark wouldn't really change anything. Though the indirect change to batslib_is_single_line (via the change to batslib_count_lines) might lead to some more calls being made to batslib_print_kv_multi instead of batslib_print_kv_single. I'd suspect that wouldn't affect most cases, as the trailing newline is removed unless explicitly forced to be kept (which is likely desirable).

2. Add --keep-empty-lines (or similar) to batslib_count_lines.

This option keeps the old functionality and allows the caller to change into tracking all newlines. This could either be done as opt-in with --keep-empty-lines or --include-trailing-newline; or it could be opt-out, where the default is to include it and --ignore-trailing-newline is added.

This option (specifically the opt-in version) is more backwards compatible with existing logic. However, it does require adding/piping the flag through various functions, which can be a bit verbose.
Concerns about differentiating --keep... from intended input can be worked-around via there should always be an even number of positional arguments to these specific functions.

3. Create alternative batslib_count_lines function.

In this option, a new function (e.g. batslib_count_all_lines) can be added to use the new logic.
This is very backwards compatible, as it does not touch the existing batslib_count_lines function. However, it would require doing the same for other functions like batslib_is_single_line, batslib_print_kv_multi, batslib_print_kv_single_or_multi, etc.


Those are the suggestions that I have thus far. As I mentioned, I think option 1 is the best option. Let me know what you think.

Thanks!

P.S. There's a psudo-related TODO(ztombol) on batslib_count_lines about fixing inconsistencies vs ${#lines[@]}. This isn't really the same issue, but it's somewhat related. Thought I'd mention it.

@CauhxMilloy
Copy link
Author

CauhxMilloy commented May 4, 2024

As another alternative option:

4. Auto-detect which logic to use.

batslib_count_lines could potentially auto-detect whether to keep empty lines or not based on the input string. If the input variable has trailing newline(s), then they must have been manually kept (via something like --keep-empty-lines).
This could be checked by something like [[ "${input: -1}" = $'\n' ]].
If there is a newline, then use the logic given in Option 1. If there is no newline, use the existing logic.
I think this might fix the issue and be backwards compatible.

While this option offers a more seamless approach for batslib_count_lines, there still remains the issue of batslib_prefix (which affects batslib_count_lines in batslib_print_kv_single_or_multi). To address this issue, either Option 1 (always print trailing new line if non-empty string) or Option 2 (add a --keep-empty-lines option) would be a fine choice to update batslib_prefix. Option 2 is likely better for this scenario, as batslib_print_kv_single_or_multi can optionally pass it for each value based on the same auto-detect logic ([[ "${pairs[$i]: -1}" = $'\n' ]]).
Again, batslib_mark should be updated in the same way as batslib_prefix.

CauhxMilloy added a commit to CauhxMilloy/bats-assert that referenced this issue May 8, 2024
Adding support for "golden files". Golden files are (often text) files that are checked into a repo and are used in testing for asserting strings against their content.
They have a number of useful properties that make them a great alternative to single (possibly repetitive) assertions in files.
Some benefits include:
  * WYSIWYG plaintext output assertions (separating asserted output from test case logic).
  * Test failure output that is searchable (asserted output is checked into repo as files).
  * Clear file diffs of test assertions during merge / code review.
  * Terse assertions in test cases (single assert instead of many verbose `assert_line` and `refute_line` for every line of output).
  * Reusable golden files (declared once, used for many test cases).
  * Clear intention (same exact expected output) when asserted against same goldens in multiple test cases.
  * Can be clearly diff'd across multiple lines in failure message(s).
  * Easily updated.

This PR adds support to bats for golden files with the `assert_equals_golden` and `assert_output_equals_golden` test helper functions.
`assert_equals_golden` takes 2 arguments, <actual> and <golden file path>, to assert <actual> is the same as the contents in <golden file path>.
`assert_output_equals_golden` takes 1 argument, <golden file path>, to assert `output` is the same as the contents in <golden file path>.

These functions support a number of features, including `--regexp` and automatic updating of golden files (via setting the `BATS_ASSERT_UPDATE_GOLDENS_ON_FAILURE` environment variable). Golden file assertions will properly account for empty lines or trailing newlines (e.g. when asserting against output obtained by `run --keep-empty-lines`).

This PR adds `assert_equals_golden.bats` which contains test cases for the newly added functionality.

Note that the output of these functions (that is asserted on in tests) has incorrect "lines" count.
This is due to an incompatibility between `run --keep-empty-lines` and how various bats-support helper functions work.
See bats-core/bats-support#11.
CauhxMilloy added a commit to CauhxMilloy/bats-support that referenced this issue May 8, 2024
* Updating `batslib_count_lines` to automatically detect trailing newlines and count them if present.
  * This works on the concept that a string has a trailing newline if it was intended -- e.g. via `run --keep-empty-lines` or similar.
* Updating `batslib_prefix` and `batslib_mark` to accept `--keep-empty-lines` option to properly print empty lines and trailing newlines.
* Updating `batslib_print_kv_single_or_multi` calls to `batslib_prefix`.
  * Will automatically detect trailing new lines (on a per-value basis).
  * If trailing newline is present, `--keep-empty-lines` will be passed to `batslib_prefix`.

This implements option 4 for `batslib_count_lines` and `batslib_print_kv_single_or_multi`, and option 2 for `batslib_prefix` and `batslib_mark`, as discussed in bats-core#11.

Fixes bats-core#11.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant