Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support for ospath/osstring #491

Open
amigalemming opened this issue Aug 16, 2024 · 8 comments
Open

Support for ospath/osstring #491

amigalemming opened this issue Aug 16, 2024 · 8 comments

Comments

@amigalemming
Copy link

Can we have support for OsPath/OsString?

The idea of those types is to avoid converting from system (filepath) strings to String and back. But currently this is what happens if we want to construct an argument parser that returns OsPath.

@tbidne
Copy link
Contributor

tbidne commented Sep 5, 2024

I, too, would like this. It's easy enough to provide the conversion on top of the string reader:

osPath :: ReadM OsPath
osPath = OA.str >>= \fp -> case OsPath.encodeUtf fp of
  Right txt -> pure txt
  Left ex -> fail $ "failed: " ++ displayException ex

But I imagine you want to skip the conversion entirely. Seems possible to me, but there are some tricky decisions.

1. Implement new getArgs'

This would require replacing optparse's usage of getArgs :: IO [String] with getArgs' :: IO [OsString]. Fortunately, the function itself is pretty straightforward. The unix library already does this for PosixString, which differs from base's getArgs only in the final conversion:

getArgs :: IO [PosixString]
getArgs =
  alloca $ \ p_argc ->
  alloca $ \ p_argv -> do
   getProgArgv p_argc p_argv
   p    <- fromIntegral <$> peek p_argc
   argv <- peek p_argv
   -- base uses 'peekCString :: TextEncoding -> CString -> IO String'
   peekArray (p - 1) (advancePtr argv 1) >>= mapM (fmap PS . B.packCString)

So, assuming we can just do the same thing for WindowsString, we'd have:

getArgs' :: IO [OsString]
getArgs' =
  alloca $ \ p_argc ->
  alloca $ \ p_argv -> do
   getProgArgv p_argc p_argv
   p    <- fromIntegral <$> peek p_argc
   argv <- peek p_argv
#if WINDOWS
   peekArray (p - 1) (advancePtr argv 1) >>= mapM (fmap (OsString . WS) . B.packCString)
#else
   peekArray (p - 1) (advancePtr argv 1) >>= mapM (fmap (OsString . PS) . B.packCString)
#endif

Armed with this, you have two choices:

  1. Replace ReadM = ReaderT String (Except ParseError) with ReadM = ReaderT OsString (Except ParseError).

    This is a breaking change because ReadM is fully exposed. It's unfortunate, but most of optparse's API doesn't expose these internals, so maybe it's palatable.

  2. Provide a second ReadOsStringM = ReaderT OsString (Except ParseError).

    At first glance this seems possible, as there aren't too many places users use ReadM directly. You'd have to copy a few API functions like:

    option :: ReadOsStringM a -> Mod OptionFields a -> Parser a
    argument :: ReadOsStringM a -> Mod ArgumentFields a -> Parser a

    but at least there aren't too many. The real annoyance is that the internals use ReadM quite a bit, so you'd probably have to write a sum type wrapper and modify types like:

    data CReader a = CReader
      { crCompleter :: Completer
      , crReader :: ReadM a }

Personally, I think 1 is the better solution.

2. Where does getArgs' live?

Assuming that's right, the next question is where getArgs :: IO [OsString] would live.

Background

  • OsString/OsPath first appeared in filepath-1.4.100.0 (GHC 9.6) and were later moved to os-string-1.0.0 in filepath-1.5.0.0 (GHC 9.10).

  • Currently, optparse transitively depends on filepath via process. Technically this isn't guaranteed, as the process dependency is hidden behind a flag. But the flag defaults to True, so the only people who would be affected by optparse adding a filepath dependency would be those who use optparse with the process flag explicitly disabled, and somehow do not have filepath somewhere in their dep tree. I would not be surprised if no such user exists, so I will assume that adding filepath is fine.

Dependency graph

deps

filepath

The best place for getArgs' to live -- from optparse's perspective -- would be filepath. Then optparse could provide the readers when the filepath version is high enough, using cpp.

os-string

On the other hand, os-string might be a more logical place for getArgs'. Unfortunately, this complicates the dependency story.

Right now optparse puts no direct restrictions on filepath. Requiring os-string would require the very new filepath-1.5.0.0, presumably a no-go. Really, what we want to express is:

-- cabal file
if (filepath >= 1.5.0.0)
  build-depends: os-string

But there's no way to do this sort of thing with cabal. The filepath and directory libraries handle this compat issue by providing an os-string flag, and we could do something similar here:

if flag(os-string)
  build-depends: os-string

But it would have to be off by default, at least for a long time. And providing different APIs based on flags would be controversial, even if the impact is mitigated since most users will be exes.

Something else?

If the maintainers don't want to add getArgs' to filepath or os-string, it could be vendored here since it only requires bytestring. This wouldn't solve the dependency story, since you need the internal OsString types from the os-string lib. But it's an option.

3. How to handle existing String arguments?

Suppose we go with one of the options above, and implement ReadM = ReaderT OsString (Except ParseError). How would you implement str? Currently it is:

str :: IsString s => ReadM s
str = fromString <$> readerAsk

It's trivial since ReadM's argument is String. But now it's OsString, so we need f :: OsString -> String. Unfortunately, there is no total function of that type (ignoring silly things like show or lenient decodes). The "right" thing to do is probably something like:

str :: IsString s => ReadM s
str = do
  s <- readerAsk
  case OsPath.decodeUtf s of
    Left ex -> fail $ displayException ex
    Right t -> pure $ fromString t

In spite of appearances, this is arguably not a (total to partial) downgrade since the original str was never total, as String conversions are inherently shady. But it still doesn't feel great.


I'm not sure what to do here. The conversion function I posted at the top is useful and easy to add in a backwards-compatible way, so that would be the low-effort "solution", even if it is suboptimal in that we're performing an unnecessary conversion. But it would be really nice to avoid the conversions, if there's an ergonomic way to do it.

I'd love to hear what more experienced people think.

CC @hasufell.

@hasufell
Copy link

hasufell commented Sep 5, 2024

If the maintainers don't want to add getArgs' to filepath or os-string

It could live in file-io. That's kind of the place where base-like IO functionality with OsString lives. Or we could create another package. filepath and os-string are both inappropriate, imo.

Unfortunately, there is no total function of that type (ignoring silly things like show or lenient decodes).

You want to keep the current behavior of base, which is decodeLE. That can throw an IO exception (same as base). That's as total as the current behavior.

But it would be really nice to avoid the conversions, if there's an ergonomic way to do it.

The only sensible way forward is to convert everything to OsString and break the API. Going from String to OsString is pointless and defeats the main purpose of OsString: not assuming encodings!

@hasufell
Copy link

hasufell commented Sep 5, 2024

For windows getArgs, I added this to Win32: haskell/win32#221

@tbidne
Copy link
Contributor

tbidne commented Sep 5, 2024

Edit: Sorry, missed the first message. I think a file-io dependency is sensible. And decodeLE seems spot-on. These comments renew my resolve that this is quite feasible, though it needs to be behind a flag, to keep optparse working with older versions of filepath.


Ah, nice. Thanks for pointing that out. So we can obtain getArgs' from Win32 and unix directly (dependencies of process -> directory).

It looks like the required optparse changes would be pretty significant. It wouldn't change the API much, I think, but we'd probably still have to depend on os-string since there is some low level manipulation of Char. Example: parseWord

parseWord :: String -> Maybe OptWord
parseWord ('-' : '-' : w) = Just $ let
  (opt, arg) = case span (/= '=') w of
    (_, "") -> (w, Nothing)
    (w', _ : rest) -> (w', Just rest)
  in OptWord (OptLong opt) arg
parseWord ('-' : w) = case w of
  [] -> Nothing
  (a : rest) -> Just $ let
    arg = rest <$ guard (not (null rest))
    in OptWord (OptShort a) arg
parseWord _ = Nothing

I presume you'd replicate this with unpack and span, with the latter requiring os-string.

With that in mind, I think you'd probably want to hide this functionality behind a flag, off by default. Normal optparse does what it always does. If os-string is enabled, then you use the new getArgs' and the types change accordingly. This requires using a very new version of filepath, but of course that's what you wanted by opting in.

@hasufell
Copy link

hasufell commented Sep 6, 2024

though it needs to be behind a flag, to keep optparse working with older versions of filepath.

What do you mean by this exactly?

Exposed API must not depend on cabal flags.

@tbidne
Copy link
Contributor

tbidne commented Sep 6, 2024

When I made that comment, I thought (hoped) that nothing related to the "intended API" would be affected. Sure, ReadM is affected -- and unfortunately it's fully exposed -- but that just seems to be an artifact of the implementation (i.e. I am skeptical that there are users who need ReadM's internal details). The fact that most of optparse's users are executables (thus unaffected by flag non-composability) is a bonus.

Alas, even if that hypothetical plan was palatable (I did say it would be controversial), the details leak in quite a few places, including some exported by the top-level Options.Applicative (e.g. ParseError would likely want OsString). So this idea is DOA.

Unfortunately, I think that means we're stuck, at least for some time. Switching to OsString would (transitively) require filepath >= 1.5, which was released at the end of 2023. Personally, I'm fine with this requirement, but for a library like optparse that is (justifiably) conservative with adding new dependencies, requiring such a recent release feels like a no-go. Especially for a boot library.

I believe the latest requirement is prettyprinter >=1.7 (2020), so the gap isn't extreme. But I still wouldn't expect optparse to pick up such a recent dependency. I'm happy to be wrong on this point.

@hasufell
Copy link

hasufell commented Sep 6, 2024

What exactly is the issue with it? filepath is reinstallable, unless you depend on the ghc package (unfortunately that's also the case when using doctests). So even with 8.10.7, it's usable.

@tbidne
Copy link
Contributor

tbidne commented Sep 6, 2024

Yes, I was thinking primarily of the unfortunate scenario where one depends on ghc.

Also, as a nix user, I am keenly aware that it is very difficult to override boot libraries. Effectively it means you wait until you can move to the ghc where the library is the default, meaning 9.10. Now, I don't expect sympathy for nix issues from the (majority of the) rest of the haskell universe. I'll just deal with it. But it's a pain point I'm aware of, and I don't want to presume that there are not others.

Like I said, I'd be happy with the change. But I understand if there is reticence.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants