automatic whitespace handling #49

ghazel · 2011-07-24T00:57:34Z

Parslet grammars are littered with whitespace checks, making them harder to read. Leaving the checks out means valid input fails to parse. Take the JavaScript parser as an example: https://github.com/matthewd/capuchin/blob/d47f4b19eb888b6a4fc5428d3d1fdfcdb551b183/lib/capuchin/parser.rb

There is sp? everywhere. There are very few cases where whitespace is not allowed, and decorating those cases with a different operator to join the atoms seems sufficient.

So this is a feature request for some form of automatic whitespace skipping. pyPEG has a skipws option which seems to work OK.

kschiess · 2011-07-26T06:13:40Z

I can see why you would want this, but I am not convinced that we really need it. After all, we can process parslet atoms as if they were data, so appending whitespace to each of them will not be hard. This really belongs on the mailing list, and if you provide a patch or an implementation idea, we'll consider it more thoroughly.

mikeyhew · 2016-11-15T04:45:30Z

I have some code that implements this: master...mikeyhew:ignore-whitespace. It changes the >> operator so that it consumes 0 or more spaces in between parslets, and adds << for when you don't want to allow spaces. I've been using it in this project and it has worked well so far, making it more pleasant to write the grammar.
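
To make the two-operator idea concrete, here is a minimal sketch in plain Ruby. It is a toy combinator written for illustration, not Parslet's implementation and not the code on the linked branch; the `Atom` class, its `matches?` method, and the `str` helper are all hypothetical names.

```ruby
# Toy parser combinator, NOT Parslet. An atom wraps a block that takes
# (input, pos) and returns the new position on success or nil on failure.
class Atom
  def initialize(&matcher)
    @matcher = matcher
  end

  def call(input, pos)
    @matcher.call(input, pos)
  end

  # Loose sequence: zero or more whitespace characters between the
  # two atoms are skipped before the second atom is tried.
  def >>(other)
    Atom.new do |input, pos|
      mid = call(input, pos)
      next nil unless mid
      mid += 1 while mid < input.length && input[mid].match?(/\s/)
      other.call(input, mid)
    end
  end

  # Strict sequence: the second atom must start exactly where the
  # first one ended.
  def <<(other)
    Atom.new do |input, pos|
      mid = call(input, pos)
      mid && other.call(input, mid)
    end
  end

  # True only if the atom consumes the whole input.
  def matches?(input)
    call(input, 0) == input.length
  end
end

# Hypothetical helper: an atom that matches a literal string.
def str(text)
  Atom.new { |input, pos| input[pos, text.length] == text ? pos + text.length : nil }
end

loose  = str("var") >> str("x") >> str("=") >> str("1")
strict = str("=") << str("=")

loose.matches?("var x = 1")  # true
loose.matches?("varx=1")     # true as well, since zero spaces are allowed
strict.matches?("==")        # true
strict.matches?("= =")       # false
```

Note that a loose `>>` which accepts zero spaces also accepts `varx=1`, so keyword boundaries would still need a strict check somewhere in a real grammar.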

@kschiess It would be interesting to hear what you think about the general idea, as well as whether this would break anything. (I think it caused an error with the infix_expression helper already, but didn't spend much time debugging.)

kschiess · 2016-11-24T13:41:21Z

I'll take a look soon.

kschiess · 2017-01-16T08:27:22Z

I like the idea that this is an option you give to the whole parse process. Perhaps we could (as an implementation) create a source that skips whitespace? I do realize this is a problem for a lot of people.

aaronlippold · 2017-09-09T14:53:55Z

Hi, any progress on this? This would be a valuable addition. Thanks.

kschiess · 2017-11-19T15:29:40Z

We would welcome a PR that solves this, however we won't be able to dedicate our time to this.

mikeyhew · 2017-11-19T17:54:54Z

@kschiess the problem with a global option is that it restricts what you can parse. Even if your grammar is mostly whitespace-insensitive, there are still times when you need >> without whitespace in between. For example, parsing identifiers:

rule(:ident) { match['a-zA-Z'] >> match['a-zA-Z0-9'].repeat }
# how would you do this if the `Source` ignores whitespace?
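
The concern can be illustrated without Parslet at all. The sketch below assumes a hypothetical source that drops all whitespace before parsing (`strip_whitespace` is an invented stand-in, not a Parslet API) and uses a plain regular expression in place of the :ident rule.

```ruby
# Stand-in for the :ident rule: a letter followed by letters or digits.
IDENT = /\A[a-zA-Z][a-zA-Z0-9]*\z/

# Hypothetical whitespace-skipping source: removes all whitespace up front.
def strip_whitespace(text)
  text.gsub(/\s+/, "")
end

IDENT.match?("foo bar")                    # false: two identifiers, not one
IDENT.match?(strip_whitespace("foo bar"))  # true: the boundary is lost
```

Once the whitespace is gone, the parser can no longer tell `foo bar` from `foobar`, which is why a per-sequence choice is more flexible than a global switch.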

kschiess · 2018-02-13T08:46:27Z

I'll merge any kind of solution that doesn't lock people into whitespace-agnostic parsers. The default should be not to ignore whitespace. But I think we can make it easy to have a choice.

kschiess closed this as completed Jul 26, 2011

kschiess reopened this Nov 24, 2016

kschiess added the discussion label Jan 11, 2019

kschiess self-assigned this Jan 11, 2019
