Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ensure proper log line/filename in warning/errors #3038

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

copperchin
Copy link
Contributor

@copperchin copperchin commented Sep 5, 2022

Fix #3037

Pull Request Checklist

Resolves: Warning and Error logs messages show wrong source #3037

  • Ensured tests pass and (if applicable) updated functional test output
  • Conformed to code style guidelines by running appropriate linting tools
  • Added tests for changed code
  • Updated documentation for changed code

pelican/log.py Outdated Show resolved Hide resolved
pelican/log.py Outdated Show resolved Hide resolved
pelican/tests/test_log.py Outdated Show resolved Hide resolved
pelican/tests/test_log.py Outdated Show resolved Hide resolved
pelican/tests/test_log.py Outdated Show resolved Hide resolved
@copperchin
Copy link
Contributor Author

Looking at this more, I'm seeing that the tests fail due to xdist: there's now race conditions on the global root logger...

I can get around this by testing log.init() on other (ie. non-root) loggers; but for that to work, the module-level code changing the root logger class would need to be implemented in log.init(), affecting only the named logger (rather than assuming root).


This is all fine and good, but at that point I feel like I might as well tackle an issue that these code smells point to:


Frankly, I was never happy to see this feature implemented via custom logger classes -- I'd prefer to instead:

  • use a custom logging.Filter to manage the quit-on-fatal behavior
  • drop the LimitLogger class, converting the LimitLogger methods to module-level add_limit/remove_limit definitions.

This of course changes the interface...which I'm hesitant to just throw out there. Although, as far as I can tell, nothing in this codebase uses them outside of log.py and a few tests modules; I can't speak for any of the plugins though.

Any thoughts?

@avaris
Copy link
Member

avaris commented Sep 8, 2022

This of course changes the interface...which I'm hesitant to just throw out there. Although, as far as I can tell, nothing in this codebase uses them outside of log.py and a few tests modules; I can't speak for any of the plugins though.

That's the point :). log.py sets things up (with global state or root logger) so that no other module needs to do any setup. They can just do:

import logging
logger = logging.getLogger(__name__)

and things will work.

I would agree that some things could be cleaned away. pelican essentially relies on root logger to do the heavy lifting. handler are defined there. Although LimitLogger/FatalLogger set up so that individual loggers have certain filters, but they don't have to. That can be pushed to the root logger as well. So I would be on board with cutting LimitLogger and FatalLogger, turning them into logging.Filters and installing them to the root logger in init.

init would not be really testable, since that would alter global state, but hey... python logging is practically a giant global state anyway, so hard to get around that. But it would be possible to test filters independently, since they could be installed to sub-loggers.

@justinmayer
Copy link
Member

@copperchin: Any chance you could take another look at your PR here and decide how you'd like to carry it forward?

@copperchin
Copy link
Contributor Author

Yes, I'll take another look at this.

I don't actually remember why this stalled but, reviewing the notes here, it looks like this might have been blocked by a deeper issue. If that's the case I'll open a separate issue for that and link it here.

@copperchin
Copy link
Contributor Author

copperchin commented Aug 7, 2023

So I think this stalled because I became mired in a lot of unrelated test failures.

At first, I was getting a very large number of test failures due to timezones. Turns out this is an issue in the main branch even without any additional changes. Installing tzdata resolves that. I was going to suggest that package be added to the requirements/test.pip file, but I didn't see anyone else reporting that issue so I'm a bit bewildered as to why.

The other things I'm working through is a number of remaining test failures in what feels like unrelated code. These failures arise because rather than testing the direct result of an operation, they count the number of warnings logged.

Now, in theory, the changes I'm putting forward shouldn't change the actual logging behavior, so I'm looking into why these other tests no longer pass, but I am a bit concerned about leaving tests to rely on specific logging behavior to pass/fail.

Anyway. I'll keep kicking at this for now, but wanted to leave an update.

@avaris
Copy link
Member

avaris commented Aug 14, 2023

@copperchin If you need any assistance or guidance, I'd be happy to lend a hand.

@copperchin
Copy link
Contributor Author

@avaris Thanks!

I think I had a breakthrough that resolved most of the issues when I last looked at it. I haven't had much time at my machine for the last week and a bit but I'm hoping to be able to spend a bit of time tonight on it; I don't think an updated PR is far off.

@copperchin
Copy link
Contributor Author

Just working on adding some tests, doing some cleanup, and adding docstrings to my changes.

I have a couple of questions regarding style preferences for pelican code:

  1. I don't see much use of type hints yet in the repo, but as the latest supported python version is now 3.7, are type hints something we are ok to start using?

  2. Also, I couldn't find whether there's a preferred style for docstrings (specifically re: describing parameters, return, etc.). I'm personally a fan of NumPy style, but I see a few modules using what appears to be a reStructuredText version of EpyDoc/EpiText.

@copperchin copperchin force-pushed the fix-3037 branch 2 times, most recently from e21bdc7 to fce597d Compare October 5, 2023 15:32
@avaris
Copy link
Member

avaris commented Oct 28, 2023

@copperchin sorry, I totally missed your questions

I have a couple of questions regarding style preferences for pelican code:

  1. I don't see much use of type hints yet in the repo, but as the latest supported python version is now 3.7, are type hints something we are ok to start using?

I have a love/hate relationship with type hints. If you start your project with type hints in mind, they are usually great and helpful. But adding type hints to existing and dated projects was most of the time frustrating (unless you are willing to refactor it to the point of almost rewriting). Unions all around, turning single line defs to many many lines and obfuscating code more than it helps with readability... So, in short it depends. If they are helpful in documenting the code I don't mind them. That being said, I wouldn't hold my breath for all of pelican typed and checked :).

  1. Also, I couldn't find whether there's a preferred style for docstrings (specifically re: describing parameters, return, etc.). I'm personally a fan of NumPy style, but I see a few modules using what appears to be a reStructuredText version of EpyDoc/EpiText.

I don't think we have a docstring style. As long as it is accurate I am fine with mostly anything :).

@copperchin
Copy link
Contributor Author

@avaris Yeah I see what you mean; I initially started adding type hints for new content but it felt so out of place that it seemed more distracting than helpful. I also balked at how long it made lines >_< haha.

I see the latest surge of merges (a great thing!) has left this PR with some conflicts -- I'll try to get those resolved this weekend.

@avaris avaris self-assigned this Oct 29, 2023
@avaris
Copy link
Member

avaris commented Oct 29, 2023

@copperchin is it possible to incorporate #3108 into this? Once this is merged, that PR effectively needs to be rewritten :). But if it will delay this one further, don't worry about it. I can deal with all the conflicts, refactoring after this is handled.

@copperchin
Copy link
Contributor Author

copperchin commented Oct 29, 2023

Funny you should bring that up; I came across #2893 yesterday and commented there -- missed the PR #3108 though. My suggestion was to keep the 2-tuple form and just use regex objects where regex is desired -- this would preserve backwards compatibility.

I'm happy to implement that change but I don't want to include it in the scope of this change; I'll work on it after this gets merged.

Overview
========
Motivated by issue getpelican#3037 (Ensure proper log line/filename in
warning/errors).

``pelican.log`` is no longer required to be early in the import order,
as the package no longer relies on a custom logging class.

Logging setup is now uncoupled so that format and filters can be applied
independently, which makes both setup and testing more flexible.

Behavior of commandline calls have not changed, except for some
modification to help texts.

``LOG_FATAL`` and ``LOG_ONCE_LEVEL`` are new configuration keys.

Summary of changes
==================

:pelican/__init__.py:

    * Parsing commandline:
        * Unify type-converstion from string to logging levels.
        * Tweak some help text in ``parse_arguments``.
    * Preserve ``log`` namespace throughout (for clarity).
    * FIX: ``pelican.listen`` verbosity: avoid assumption that the
      logger's level will be NOTSET.

:pelican/log.py:

    * FIX (getpelican#3037): warnings & errors now display the correct line numbers.
    * Split logging filters with focus on a specific aspect of behavior.
      This allows for more
        a) independence in testing and
        b) powerful logger configuration, with fewer side effects.
    * Isolate configuration *reading* from configuration *application*
      (see ``pelican/settings.py``).

:pelican/settings.py:

    * Remove ``pelican.log`` dependency.
    * Isolate configuration *reading* from configuration *application*;
      to remove dependency on a global state when modifying log filters.
    * Add validation for LOG_LIMIT values.

:pelican/tests/__init__.py:

    * Remove dependency on ``pelican.log`` module as test runners setup
      their own console handler anyway.
    * Only add ``NullHandler`` in absence of a configured logger.

:pelican/tests/support.py:

    * Remove ``LoggedTestCase`` class; tests requiring logging analysis
      use a contextmanager (``LogCountHandler.examine``) instead.

:pelican/tests/test_log.py:

    * Rewrite to reflect changes in pelican's logging module.
    * Increase code coverage.

:pelican/tests/test_contents.py:
:pelican/tests/test_pelican.py:
:pelican/tests/test_utils.py:

    * Import ``unittest`` directly, rather than from other modules.
    * Switch to using contextmanager (``LogCountHandler.examine``) for
      test-specific log inspection.
    * Remove dependency on a custom TestCase (used in logging analysis).
    * Remove code involved with temporarily disabling log filters --
      the filters in question are now only enabled in tests directly
      testing them (eg. in ``test_log.py``)

:pelican/tests/test_settings.py:

    * Add test for LOG_FILTER values from configs.
Copy link
Member

@avaris avaris left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@copperchin This is from an initial pass. I need to play with it to see how it behaves, depending on that more might come :).

try:
return logging.getLevelNamesMapping()[level_name]
except AttributeError: # Python < 3.11
return logging._nameToLevel.copy()[level_name]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Although unintended, I'd rather rely on a semi-official method (getLevelName) than an internal interface.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem with getLevelName is that it's intended to take an int and return a str, but I need the opposite conversion.

While its technically possible to muscle this behavior out of this logging.getLevelName

  1. the name is misleading and counter to the intended effect
  2. the documentation basically says this behavior was accidental, removed, and then added back because it was breaking old code. In short -- using it is relying on a hack.

The preferred way of doing the conversion (logging.getLevelNamesMapping) is used instead -- but as this isn't available in Python < 3.11 I'm using the alternative as shown above, when necessary.

The other reason I don't use getLevelName is that it can potentially return a string, and I need to guarantee an int.

I'd rather keep this as-is.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes it is accidental (hence why I said "unintended") and maaybe considered a hack... But it is documented and is far more likely to remain as is in a minor patch than an undocumented internal dict. That's why I suggested :). I don't feel that strongly about it, thus I can live with it as is. Why do you .copy() it though?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm just imitating what the standard lib does:

# logging.py
def getLevelNamesMapping():
    return _nameToLevel.copy()

I'm guessing its there to future proof against an accidental pass by reference?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah OK. Yeah, it's probably copying so that any modification on the returned dict doesn't affect the original version. It doesn't seem necessary here.

"""
try:
logger.addFilter(flt)
yield flt
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You don't actually use it anywhere but seems like yielding logger would be the more useful case.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The logger doesn't need to be yielded: the caller has already provided it, and it isn't even always referenced within the context.

The filter, however, is intended to be temporary; yielding the filter allows the caller to generate the filter on entering the context and have it garbage collected on exiting the context. It essentially allows this:

from pelican import log
import logging


with log.temp_filter(logging.getLogger("test_module", MyCustomFilter()) as tmp_filter:
   test_module.do_stuff()
 

# outer scope needs not keep reference to filter; it can be garbage collected

This is an interface; while not used called within this module, it is actually used in several places in testing, and can be useful in scenarios where a filter is needed only for a subsection of code.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What you yield has nothing to do with references. You don't need a reference to either of them outside with block regardless of what is yielded. Context manager is already doing that during with. You can yield None or bare yield if you wanted. yield only defines the value that will be assigned to the name after as.

What I'm saying is, you're far more likely to do something with the logger inside the with block than a filter or a handler. i.e. your example could become:

with log.temp_filter(logging.getLogger("test_module"), MyCustomFilter()) as logger:
    logger.warning(...)  # 
    test_module.do_stuff()

"""
try:
logger.addHandler(hnd)
yield hnd
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above, yield logger.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above ;)
The logger is already provided, it's more useful to yield the temporary handler to circumvent the need to keep reference to it outside the context.


"""
super().__init__()
self.limit_threshold = None
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This implies it can be None but setting it None will make it error in the comparisons below. e.g.:

if counter > self.limit_threshold:

I assume None disables it, but then the filter method needs to include that logic.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch! This needs to be changed, and I should add the None case to my tests.

Comment on lines +203 to +204
def ignore(self, level, message):
self._ignored.add((level, message))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unused

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While this might have been an artifact of one of the refactors I did in this branch earlier...I think I decided to leave that in as an interface.

Looking at this again now, I'd like to keep it -- I can use it to condense the implementation of OnceFilter.filter() which is essentially duplicating the same work as part of its process.

self.assertEqual(log.severity_from_name('trace'), 5)

with self.assertRaises(KeyError):
self.assertEqual(log.severity_from_name("wrong"), logging.NOTSET)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

assertEqual is misleading. It looks like this is expected to return NOTSET, but it's expected to raise KeyError. assertEqual will not run anyway. This is enough:

with self.assertRaises(KeyError):
    log.severity_from_name("wrong")

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

100% agree, thanks for spotting that!


def test_levels(self):
logger = self.logger
with log.temp_handler(self.logger, log.AbortHandler(logging.ERROR)):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could maybe use another test for logging.WARNING

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I can add another test to confirm it works for not just logging.ERROR.

Comment on lines +74 to +75
def test_alt_logger(self):
self.assertEqual(log.set_filters(self.logger), self.logger)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is tested here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's confirming that when a logger is specified in set_filters the returned logger matches the specified one. If the logger is not specified, the PKG_LOGGER is expected to return. I can add a comment here to clarify, and better still, a check that the PKG_LOGGER is returned when no logger is provided.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, comments would be good. While writing the comment it took me a minute to (maybe) figure out the intent and that was a guess at best :).

count_msgs(len(LOG_LVLS * 2), msg)


class classTestOnceFilter(_LoggingTestCase):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo in name

with LogCountHandler.examine(logger) as count_msgs:
for level in LOG_LVLS:
logger.log(level, msg)
count_msgs(5, msg)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

5 -> len(LOG_LVLS)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Gah! I missed one! Thanks! ':)

@copperchin
Copy link
Contributor Author

Thanks for the in-depth review @avaris ! This is a pretty big diff, so I really appreciate the effort.

I'm hoping to be able to get revisions up here in the next fews days at most, to keep the momentum :)

@avaris
Copy link
Member

avaris commented Oct 31, 2023

I also have a sneaking suspicion that filters will not be active for anything outside pelican.* namespace: e.g. legacy plugins. But I have not tested it. It would probably be resolved if they are also installed to the root logger.

@MinchinWeb
Copy link
Contributor

The topline issue can be solved with #3257, which changes only 2 lines. (I didn't realize this was underway, and in your defense, my fix relies on a Python 3.8 feature, which may not have been an acceptable solution a year ago when you started.) I realize this has grown in scope since...

At first, I was getting a very large number of test failures due to timezones. Turns out this is an issue in the main branch even without any additional changes. Installing tzdata resolves that. I was going to suggest that package be added to the requirements/test.pip file, but I didn't see anyone else reporting that issue so I'm a bit bewildered as to why.

Hopefully this was fixed by #3246. I suspect it might be a Windows-only issue, which would lower the number of people reporting it.

@egberts
Copy link
Contributor

egberts commented Jul 24, 2024

That's the point :). log.py sets things up (with global state or root logger) so that no other module needs to do any setup. They can just do:

import logging
logger = logging.getLogger(__name__)

and things will work.

Ooooh, this is nice. Can this also be applied toward the unittest/pytest unit test script files in tests directory, @avaris ?

One example is to have a test case that actually verifies not only the STDOUT but the logger as well in ensuring that the line number and column offset of a syntax error really does appear. Right now, I've marked it as TODO in tests/test_settings_syntax.py of #3381.

@egberts
Copy link
Contributor

egberts commented Jul 25, 2024

pytest registers custom handlers to the root logger to be able to capture log records emitted in your code, so you can test whether your program logging behaviour is correct.

So conftest.py a glue to hold common routines (whether it is scoped as "session"/"module" or not) will introduced a new rootHandler for logging, which smashes any prior existing unittests' rootHandler whose using assert(s) related to log output (as I discovered in #3375, #3377, #3379 as breaking test_log.py).

@egberts
Copy link
Contributor

egberts commented Jul 25, 2024

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Status: No status
Development

Successfully merging this pull request may close these issues.

Warning and Error logs messages show wrong source
5 participants