Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feature: extending the typescript tree-sitter grammar to enable more advanced and contextual parsing #10

Open
1 task done
Tracked by #3625
AlbertMarashi opened this issue Sep 13, 2024 · 5 comments
Labels
enhancement New feature or request

Comments

@AlbertMarashi
Copy link

AlbertMarashi commented Sep 13, 2024

Did you check the tree-sitter docs?

Is your feature request related to a problem? Please describe.

Currently, we have a few places where language injections will fail due to the nature of how tree-sitter works.

As far as I am aware, it does not seem possible to have tree-sitter treat a specific injection as a subnode within a sublanguage.

Let's take the following example:

<script generics="T extends string">

If we treat the attribute_value inside of the generics attribute as typescript, this will error in the typescript parser, since T extends string is not a valid top-level typescript statement.

image

This occurs in various other locations.

For example, we will want to treat the function in here as a function definition

{#snippet foo({bar})}

or likewise here as a function call expression

{@render foo({ bar: "baz" })}

This would bring the svelte tree-sitter more inline with how the tsx treesitter extends the typescript language definition to become a more precise and accurate parser that removes some of the complications around expression parsing.

Describe the solution you'd like

Per @savetheclocktower's suggestions (zed-industries/zed#17529 (comment)), there are a few ways to implement support this.

  1. Formally extend or reference the JS and TS grammars within tree-sitter-svelte (which would vastly increase the size of the parser, I imagine, but would allow for very granular usage of arbitrary rules)
  2. Create one-off grammars based on the JS and TS parsers (respectively) that include just the subset of each parser needed to parse solely inside of a function arguments context (less coupled than the option above, but no less wasteful)
  3. Inject the existing JS and TS grammars into these ranges and find a way to re-scope the ranges selectively (“if you're in a svelte snippet injection, scope this as function.definition even though it's a function call,” etc.; hacky and would involve lots of compromises)

I'm not so sure whether something like 3. is possible with treesitter unfortunately, therefore I think going with 1. will be best.

Describe alternatives you've considered

No response

Additional context

@savetheclocktower
Copy link
Contributor

I'll keep typing my way through this until I figure out a better idea, because I don't even like any of the options I proposed. Option 1 has the best chance of shipping soon, but it couples things that shouldn't be coupled.

Things that would ideally be highlighted by the editor's separate TS/JS Tree-sitter parsers:

  • SCRIPT tags

Things that would be highlighted by coupled, special-purpose TS/JS rules pulled into tree-sitter-svelte:

  • <script generic="T extends string[]">
  • {#snippet foo({ bar, baz })}
  • Possibly the condition blocks of #if, :else if, etc.

Things that are in between, but are almost certainly better off being highlighted by the editor’s separate TS/JS parsers:

  • Expression tags
  • Arbitrary JavaScript blocks in @render, @each, @await

The painful part here is:

  • A user will expect all the JS/TS highlights to behave similarly to one another…
  • …but we have to pick particular versions of tree-sitter-javascript and tree-sitter-typescript that may not agree with what the editor has installed…
  • …and whoever writes the Svelte integration for a given editor is in charge of making sure that the highlighting queries for (JavaScript|TypeScript)-fragments-in-Svelte agree as much as possible with that editor's highlighting queries for JavaScript and TypeScript proper

@savetheclocktower
Copy link
Contributor

Let's take the snippet example:

{#snippet foo({ bar, baz })}
{/snippet}

How much of this should be owned by a JavaScript/TypeScript injection?

One option is foo({ bar, baz }). In isolation, this gets interpreted as a function call with two shorthand properties. But that's wrong; what we want is to treat it like the foo({ bar, baz }) in…

function foo({ bar, baz }) { /* ... */ }

…in almost every way, since it's equivalent to a function definition. We want foo to be recognized as a major document symbol and bar and baz to be treated like function parameter definitions. But there's no good way to do this with injections — unless you adopt option 2 from the ticket description, but that's still a major headache.

Another option is what this grammar currently does: it parses as much as can be inferred by tree-sitter-svelte without it having to become a parser of arbitrary JS. It parses the foo, "(", and ")" as their own nodes, and leaves only the { bar, baz } as svelte_raw_text to be parsed by the injection. (That example would be trivial to parse on its own, but it's better to leave it to a JS parser, since each parameter can have an arbitrary default value.)

This results in worse injection parsing. That's because the bare code { foo, bar } in JS is not treated like an object literal — top-level braces are interpreted as statement blocks. JS takes this to mean that you are referencing two variables and doing nothing with them. If these parameters have default values, they get highlighted a bit better, but still not quite accurately.

The upside is that more of the structure of the snippet can be seen by tree-sitter-svelte. I went with this option because it's the only practical way to get snippet names to be recognized as document symbols. I don't think this is implemented in Zed yet, but in tags.scm this is expressed as…

(snippet_statement
  (snippet_start
    (snippet_name) @name)
) @definition.snippet

…and allows foo to show up in a symbol list in Pulsar:

Screenshot 2024-09-12 at 8 19 56 PM

But you're not obligated to inject only into svelte_raw_text! A third option is to keep the structure of option 2, but write the injection in such a way that it also includes the ( and ) adjacent to the svelte_raw_text. This at least makes the JS recognize bar and baz as shorthand object literal properties. (Still wrong, but less so.)


This is such an awkward problem because it breaks some assumptions of injections. A typical injection is saying, “hey, come parse this thing, because you know how to parse it and I don't.” All you know about the thing is that it's Language X, and you don't know how to parse it.

This is us saying “hey, come parse this thing, but parse it as though you're in the following situation…” — and any way of describing that situation involves knowing implementation details of the parser. My ideal feature here would be a way of desginating an arbitrary “entrance rule” for an injection — in this case, it'd be formal_parameters instead of program — but now there's a translucent window in what used to be a black box.

I'm honestly curious about how the Svelte folks want editors to solve this. I hope they've thought about it at least a little bit.

@AlbertMarashi
Copy link
Author

You're right, this isn't a simple issue to solve, even if we go with option 1. we will still need to replicate the typescript highlights.scm, which I'd ideally like to avoid for the reasons related to maintinance overhead.

Perhaps my issue in tree-sitter may address it

@clason
Copy link

clason commented Sep 13, 2024

I said it before, I'll say it again: Tree-sitter is simply a poor fit for templating languages, and you'll be fighting the tool all the way. Since you're coming from Zed, you may have Max's ear so you could try to explain (very generally!) what is lacking from tree-sitter; maybe he'll see a way around it.

(For the record, 2. seems to be the most popular option -- see glimmer-javascript etc.)

@AlbertMarashi
Copy link
Author

@savetheclocktower just linking this issue comment here

image

Currently, typescript thinks that this is a function call expression, so the child is being highlighted as a function.

zed-industries/zed#18033 (comment)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

3 participants