"Overlapping TTML documents" #4

Open
andreastai opened this issue Nov 24, 2015 · 13 comments

Comments

@andreastai
Collaborator

After reading the current version I noticed two sections which are from my point of view not correct or could be misunderstood:

In above workflow it may be difficult for tools that have only simple TTML capabilities, to process a TTML document for the purpose of creating small, self contained, non overlapping TTML documents (sometimes called intermediate synchronic document, ISD)

An ISD is not a TTML document and therefore the part

(sometimes called intermediate synchronic document, ISD)

should be deleted. Also, the term "no overlapping TTML documents" could be misleading. It could be read as TTML documents that do not overlap with each other, but what you mean is that there is no timewise overlapping subtitle content inside a TTML document (e.g. two p or span elements with different timings).

Also the following recommendation for me is only one option amongst others:

The MP4 standard assumes that only one sample at a time is active. This means that the MP4 parser will deliver one TTML document at a time to the TTML renderer and will assume that the previous TTML document will be replaced by the new one, and that it will be used for a given duration. This standard behavior thus constrains the upper part of the workflow, in the sense that samples cannot overlap and therefore the contained TTML document should not overlap. This improves interoperability by reducing the number of choices left.

It reads as if it is recommended that a TTML document has no timewise overlapping content (like the one in the TTML example). For the example given in the text, you would have to create two documents: one sample from 0 to 2s and another from 2 to 6s. I do not think that this is necessary, and I haven't seen any problems so far with players that get that kind of content.

@rbouqueau
Owner

(sometimes called intermediate synchronic document, ISD)

should be deleted.

If that's incorrect we should indeed remove it.

the term "no overlapping TTML documents" could be misleading. It could be read as TTML documents that do not overlap with each other, but what you mean is that there is no timewise overlapping subtitle content inside a TTML document (e.g. two p or span elements with different timings).

Would adding "timewise" prior to "overlapping" fix the issue?

It reads as if it is recommended that a TTML document has no timewise overlapping content (like the one in the TTML example). For the example given in the text, you would have to create two documents: one sample from 0 to 2s and another from 2 to 6s. I do not think that this is necessary, and I haven't seen any problems so far with players that get that kind of content.

I also read it this way.

The redundancy or overlapping of information (e.g. timings) at separate levels is a source of complexity and confusion for the implementors of players. The current text tells TTML producers to avoid producing such content if they can.

Maybe we should insist that this applies for MP4 and DASH packaging. Or do you think we have a stronger disagreement here (where it would be great to propose a replacement that we can discuss)?

What do you think?

rbouqueau added a commit that referenced this issue Nov 26, 2015
@andreastai
Collaborator Author

If that's incorrect we should indeed remove it.

Yes

Would adding "timewise" prior to "overlapping" fix the issue?

Only partly. It is not the documents that are overlapping but the subtitles content in a single document.

The redundancy or overlapping of information (e.g. timings) at separate levels is a source of complexity and confusion for the implementors of players. The current text tells TTML producers to avoid producing such content if they can.

I do not think that it is necessary to give this recommendation.

Maybe we should insist that this applies for MP4 and DASH packaging. Or do you think we have a stronger disagreement here (where it would be great to propose a replacement that we can discuss)?

Yes, maybe ;) In general I would not give this advice because I have not seen any proof that DASH players have problems with this subtitle content. You may leave it in as a point for discussion, but I think you should mark it as one option. I think it depends very much on the implementation which approach is best.

See below for some more details:

Doc1 (simplified just to show timing)

p begin:00:00:01.000 end:00:00:03.000
p begin:00:00:02.000 end:00:00:03.000

Doc2 (simplified just to show timing)

p begin:00:00:01.000 end:00:00:02.000
p begin:00:00:04.000 end:00:00:06.000

As I read your advice, you would recommend having neither TTML Doc1 nor TTML Doc2 in one sample, so the better way would be to split them. But at least for Doc2 I cannot see that current DASH players have problems with this kind of content. With documents like Doc1 I have not run tests yet, but I also do not expect any problems.

If you follow the current recommendation you would have TTML segments with different durations. But a common approach is to use the same segment length.
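A minimal sketch of what "timewise overlapping content inside a document" means for Doc1 and Doc2 above. It assumes each p is reduced to a (begin, end) pair in seconds, reads the second timestamp of each Doc2 line as the end time, and uses a hypothetical helper name `has_timewise_overlap`; it is an illustration, not code from any packager discussed here.

```python
from typing import List, Tuple

Cue = Tuple[float, float]  # (begin, end) in seconds; end is exclusive


def has_timewise_overlap(cues: List[Cue]) -> bool:
    """Return True if at least two cues are active at the same instant."""
    latest_end = float("-inf")
    for begin, end in sorted(cues):
        # A cue that starts before every earlier cue has ended overlaps one of them.
        if begin < latest_end:
            return True
        latest_end = max(latest_end, end)
    return False


doc1 = [(1.0, 3.0), (2.0, 3.0)]  # second p starts while the first is still active
doc2 = [(1.0, 2.0), (4.0, 6.0)]  # the two p elements never coexist

print(has_timewise_overlap(doc1))  # True
print(has_timewise_overlap(doc2))  # False
```

Under the reading of the current text discussed here, only Doc1 has internal timewise overlap; Doc2 merely spans a longer period.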

@rbouqueau
Owner

Would adding "timewise" prior to "overlapping" fix the issue?

Only partly. It is not the documents that are overlapping but the subtitles content in a single document.

I think I don't understand. Could you give me an example of "subtitles content" overlapping (or any overlapping that is not timely)?

Doc1 (simplified just to show timing)

p begin:00:00:01.000 end:00:00:03.000
p begin:00:00:02.000 end:00:00:03.000

Doc2 (simplified just to show timing)

p begin:00:00:01.000 end:00:00:02.000
p begin:00:00:04.000 end:00:00:06.000

Thank you for the examples.

As I read your advice, you would recommend having neither TTML Doc1 nor TTML Doc2 in one sample, so the better way would be to split them. But at least for Doc2 I cannot see that current DASH players have problems with this kind of content.

It all depends on the DASH segment duration, since we consider the timewise overlap between the DASH segments and the TTML document.

DASH duration = 3s:

  • Doc1: ok.
  • Doc2: ok.

DASH duration = 2s:

  • Doc1: we recommend splitting the first p at t=2s:

(1) p begin:00:00:01.000 end:00:00:02.000 //split at 2s
(1) p begin:00:00:02.000 end:00:00:03.000 //split from 2s
(2) p begin:00:00:02.000 end:00:00:03.000
  • Doc2: ok
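The splitting described for the 2s case can be sketched as follows. This is only an illustration under stated assumptions: cues are (begin, end) pairs in seconds, DASH segments have a fixed duration starting at t=0, and `split_at_boundaries` is a hypothetical name, not an MP4Box function.

```python
from typing import Dict, List, Tuple

Cue = Tuple[float, float]  # (begin, end) in seconds; end is exclusive


def split_at_boundaries(cues: List[Cue], segment_duration: float) -> Dict[int, List[Cue]]:
    """Cut every cue at fixed segment boundaries and group the pieces by segment index."""
    segments: Dict[int, List[Cue]] = {}
    for begin, end in cues:
        index = int(begin // segment_duration)
        # Walk over every segment the cue is active in.
        while index * segment_duration < end:
            seg_start = index * segment_duration
            seg_end = seg_start + segment_duration
            piece = (max(begin, seg_start), min(end, seg_end))
            segments.setdefault(index, []).append(piece)
            index += 1
    return segments


doc1 = [(1.0, 3.0), (2.0, 3.0)]
doc2 = [(1.0, 2.0), (4.0, 6.0)]

print(split_at_boundaries(doc1, 3.0))  # {0: [(1.0, 3.0), (2.0, 3.0)]}
print(split_at_boundaries(doc1, 2.0))  # {0: [(1.0, 2.0)], 1: [(2.0, 3.0), (2.0, 3.0)]}
print(split_at_boundaries(doc2, 2.0))  # {0: [(1.0, 2.0)], 2: [(4.0, 6.0)]}
```

With 2s segments only the first p of Doc1 is cut; Doc2 comes out unchanged, which matches the "ok" cases listed above.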

If you follow the current recommendation you would have TTML segments with different durations. But a common approach is to use the same segment length.

The recommendation asks to pre-process the TTML so that the segment boundaries are consistent between the different DASH adaptation sets/representations. You can pre-process for fixed-duration segments or variable duration (e.g. DASH segment-timeline or within the DASH-IF 50% segment duration tolerance).

The recommended pre-processing consists of creating independent TTML documents whose timings fit the DASH segment boundaries. Is that clear, or should we add or modify something in the document, in your opinion?

@nigelmegitt
Collaborator

Could you give me an example of "subtitles content" overlapping (or any overlapping that is not timely)?

Subtitles can spatially overlap, e.g. two overlapping regions both containing content at the same time.

The recommended pre-processing consists of creating independent TTML documents whose timings fit the DASH segment boundaries. Is that clear, or should we add or modify something in the document, in your opinion?

I do not recommend this, but I accept it is an option. Earlier in the document there's a statement about minimising duplication - this strategy actually maximises duplication because, in the case of @TairT's first example, you would need separate TTML documents from 1-2 seconds and 2-3 seconds. Maybe the challenge here is that most formats being packaged do not have any timing concept other than a continuous linear flow of time whereas TTML allows for arbitrarily timed content to be defined.

The statement (in the current document):

This standard behavior thus constrains the upper part of the workflow, in the sense that samples cannot overlap in time and therefore the contained TTML document should not overlap in time.

is similarly confusing. It's true that a consequence of the 'one active document at a time' rule is that receiving processors do not need to handle overlaps between documents, however it is false that the contained TTML content should not overlap in time - it's actually irrelevant whether it does or does not overlap in time since the 'one document at a time' rule will override that.

The important thing is to know when the segment boundaries are and select into the segment TTML document all content that overlaps temporally with the segment timings that are required. We should not hint or recommend that the timings of content are somehow better if they are manipulated to match the segment boundaries: doing so would actually prevent the behaviour proposed in the paragraph about sample_has_redundancy since on inspection of the two TTML documents they would appear to have different timings and therefore possibly be different documents.
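A minimal sketch of the selection described in the paragraph above, assuming cues are (begin, end) pairs in seconds and the segment covers the half-open interval [seg_start, seg_end). The name `select_for_segment` is hypothetical. Unlike splitting at boundaries, the cue timings are copied unchanged.

```python
from typing import List, Tuple

Cue = Tuple[float, float]  # (begin, end) in seconds on the media timeline


def select_for_segment(cues: List[Cue], seg_start: float, seg_end: float) -> List[Cue]:
    """Keep every cue active at some instant in [seg_start, seg_end), timings untouched."""
    return [(b, e) for (b, e) in cues if b < seg_end and e > seg_start]


doc1 = [(1.0, 3.0), (2.0, 3.0)]
first = select_for_segment(doc1, 0.0, 2.0)
second = select_for_segment(doc1, 2.0, 4.0)

print(first)   # [(1.0, 3.0)]
print(second)  # [(1.0, 3.0), (2.0, 3.0)]
```

The cue (1.0, 3.0) appears in both segment documents with its original timing, so the 'one document at a time' rule, rather than edited begin/end values, decides what is shown when.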

@rbouqueau
Owner

Could you give me an example of "subtitles content" overlapping (or any overlapping that is not timely)?

Subtitles can spatially overlap, e.g. two overlapping regions both containing content at the same time.

Ok, thank you. So it seems to me like the modification to the text removed the confusion.

Spatial overlapping considerations seem out of the scope of this document. I understand for now that it relies entirely on the Authoring Tool. Let me know if I'm wrong.

Maybe the challenge here is that most formats being packaged do not have any timing concept other than a continuous linear flow of time whereas TTML allows for arbitrarily timed content to be defined.

Agreed. The tools available for packaging (e.g. ISOBMFF) are not appropriate for overlapping samples. That's why we introduced the TTML Preprocessor in our discussions.

Earlier in the document there's a statement about minimising duplication - this strategy actually maximises duplication because, in the case of @TairT's first example, you would need separate TTML documents from 1-2 seconds and 2-3 seconds.

That's correct and that's a consequence of the previous remark about packaging.

Now about the example: sample_has_redundancy is, in my mind, mostly for packagers which won't parse TTML and will put the whole document in every segment. I don't expect TTML cues to be big in size so repeating them seems acceptable. Maybe we should clarify this, what do you think?

FYI the current implementation of EBU-TT-D import in MP4Box falls back to "full TTML document in a single sample" if it encounters any timewise overlap. Then, when DASH-ing, this unique sample is repeated according to the user-input segment duration (with sample_has_redundancy set to 1).

We should not hint or recommend that the timings of content are somehow better if they are manipulated to match the segment boundaries

I agree; the wording may be too general. I meant it to be true after the TTML Preprocessor if any. Would it be more acceptable?

@nigelmegitt
Collaborator

Spatial overlapping considerations seem out of the scope of this document.

Agreed.

I don't expect TTML cues to be big in size so repeating them seems acceptable. Maybe we should clarify this, what do you think?

Always good to clarify if there might be different interpretations of the text.

I meant it to be true after the TTML Preprocessor if any. Would it be more acceptable?

I just don't recognise that it should be true at all. Adjusting the TTML content, even in the preprocessor, limits what clever decoders can do in an unhelpful way, and doesn't affect minimally compliant decoders.

@cconcolato
Collaborator

Can we agree that it's easier for a TTML-unaware DASH packager to package TTML content if the documents are 'atomic', i.e. cannot be split further in time without duplicating some of the content? In that case, the task of the packager can be quite simple: aggregate documents (without deep inspection) into samples then into segments: if the end of the segment is in between documents, there is nothing to do; if the document spans the segment boundary, the packager has to either terminate the segment at a different time (variable segment duration) or duplicate the document in the 2 segments (indicating redundancy).
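A rough sketch of the "duplicate the document in the 2 segments" branch, for a packager that never looks inside the TTML. Everything here is an assumption made for illustration: the `Document` and `Sample` types, the `package` function, fixed-duration segments starting at t=0, and the `redundant` field standing in for the way this thread uses sample_has_redundancy. The variable-segment-duration branch is not shown.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class Document:
    doc_id: str   # opaque handle; the TTML payload itself is never parsed
    start: float  # activation time in seconds
    end: float    # deactivation time in seconds


@dataclass
class Sample:
    doc_id: str
    start: float
    end: float
    redundant: bool  # True when the same document was already carried earlier


def package(documents: List[Document], segment_duration: float) -> List[List[Sample]]:
    """Place each document in every fixed-duration segment it is active in."""
    if not documents:
        return []
    count = math.ceil(max(d.end for d in documents) / segment_duration)
    segments: List[List[Sample]] = [[] for _ in range(count)]
    for doc in documents:
        first = int(doc.start // segment_duration)
        for index in range(first, count):
            seg_start = index * segment_duration
            if seg_start >= doc.end:
                break
            seg_end = seg_start + segment_duration
            # The sample is clipped to the segment; the document is carried as-is.
            segments[index].append(
                Sample(doc.doc_id, max(doc.start, seg_start), min(doc.end, seg_end), index > first)
            )
    return segments


docs = [Document("A", 0.0, 2.0), Document("B", 2.0, 5.0)]
for number, segment in enumerate(package(docs, 4.0)):
    print(number, segment)
```

In this example document A ends before the 4s boundary, so nothing special happens; document B spans it and is carried a second time, flagged as redundant.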

Now, using atomic documents is probably an extreme case. One could apply the same strategy with any document (including a single, long-duration document), but with non-atomic documents the duplication of content will be more significant (although, I agree, probably still negligible in many cases), and the risk of having players misbehave will be higher (because some of them may not correctly apply the 'one document at a time' rule).

Maybe we can clarify the two extreme cases and simply hint at possible consequences.

@andreastai
Collaborator Author

Thanks all for this very useful discussion. It is exactly the debate we need.

For clarification we need more examples and a definition of the different use case scenarios. I will try to provide some more examples with text in this thread or a separate wiki page. It may take some days until this will be contributed.

For the statement we have to distinguish between options we list and which of the options we recommend. While we seem to agree on the options we may see different pros and cons of these options.

One further dimension I would like to bring into this discussion is current implementations. With dash.js and bitdash we have two implementations that can decode and present TTML in the EBU-TT-D form. We can check if there are problems with TTML segments with different, overlapping timings within. On the packaging and "dashing" side we also have existing implementations.

In general I agree with the points @nigelmegitt made about splitting documents. I am not sure about @cconcolato's assumption that a DASH packager that packages TTML can be TTML-unaware, but this falls more under system architecture design and may depend on the concrete implementation.

@cconcolato
Collaborator

For clarification we need more examples and a definition of the different use case scenarios.

I agree.

I am not sure about @cconcolato's assumption that a DASH packager that packages TTML can be TTML-unaware, but this falls more under system architecture design and may depend on the concrete implementation.

For me the whole purpose of this document is to make sure that it is possible to separate the layers (TTML, MP4, DASH) to foster the development of tools. If the only way to package TTML in DASHed MP4 is to have a single monolithic tool that understands all the technologies, I don't think there will be many such implementations and our document will therefore not be necessary.

One further dimension I would like to bring into this discussion is current implementations.

This is a good idea. What are the tools you know, besides MP4Box, even if they are not OSS or available, that support DASHing an MP4 file that contains a TTML track? How would such a tool behave if you fed it a two-hour movie with a single two-hour TTML document and asked it to create 10s segments? My guess is that most tools will either create side-car file(s) (à la HLS + WebVTT) or create MP4 segments in the TTML representation that each duplicate the input document; in other words, that few tools will inspect the TTML document and split it into smaller documents to avoid duplication. Note that given the size of TTML documents, it may not be so bad to duplicate it. Note also that players may support it as well. The question is: should we really recommend that implementations follow that path? Note that this is probably a bad example because using a side-car file or a single long MP4 segment would probably suffice here.

@nigelmegitt
Collaborator

@cconcolato wrote:

Can we agree that it's easier for a TTML-unaware DASH packager to package TTML content if the documents are 'atomic', i.e. cannot be split further in time without duplicating some of the content?

For me, no, I don't agree with this. I think there's a closely related thing I could agree to though, which is that there should be a thing that may be part of the encoder or part of the packager or separate (and located between them), which is a TTML Segmenter, whose output is a set of documents and possibly an index sidecar describing the activation/deactivation times for each document. This could then be packaged up cleanly by a DASH packager which would simply treat each document as a sample and use the times to create the segments.

@cconcolato
Collaborator

@nigelmegitt I think we're converging. Indeed the documents don't need to be 'atomic'. As I said, it was an extreme case to highlight the behavior. In fact, with the NHML approach, MP4Box exactly follows what you say: the index sidecar file is the NHML file and when NHML is used, the TTML content is not deep-parsed.

So, would you be ok with the following?

  • Renaming the current "TTML pre-processor" to a "TTML Segmenter"
  • Indicating its role: i.e. that it segments one (or more) TTML input document(s) into output TTML documents, if possible non-overlapping in time to avoid data duplication.
  • Describing 2 extreme cases with pros/cons:
    • the TTML segmenter does nothing and outputs a single document. This means that TTML-unaware packagers will have to either create a single segment or duplicate the entire document in each segment;
    • the TTML segmenter splits the input document(s) into atomic TTML documents. This has the benefit of avoiding content duplication across segments, but maybe unnecessarily making too many samples and also needlessly duplicating the document headers. This allows the TTML segmenter to be unaware of the DASH segment duration and allows a TTML-unaware packager to generate different DASH versions with different segment durations from the same TTML content with a single sidecar.
  • Indicating that an intermediate approach where the TTML segmenter segments the input document(s) with a known target DASH duration is probably a better solution, leading to a single or few samples per DASH segment.

@nigelmegitt
Collaborator

@cconcolato yes, looks like we're converging!

if possible non-overlapping in time to avoid data duplication

reword to:

each containing only the timed data needed for presentation within its segment time to avoid unnecessary data duplication

Then:

the TTML segmenter splits the input document(s) into atomic TTML documents. This has the benefit of avoiding content duplication across segments, but maybe unnecessarily making too many samples and also needlessly duplicating the document headers.

I don't think this is quite right - it is not guaranteed to avoid content duplication across segments; also it isn't really obvious what "atomic" actually means here.

Perhaps what you mean is that in this case the TTML segmenter splits up the input document into TTML documents while attempting to identify helpful segmentation boundary times to avoid duplication of content across segments (which is not guaranteed to be possible without simply outputting the unsplit input document, because it is possible to construct a TTML document that has all content temporally overlapping some other content). Not sure if I've understood correctly though?

The three options I can see are:

  • A single document is created for the whole "programme" and is marked as redundant in all but the first segment, using the sample_has_redundancy flag.
  • The TTML segmenter attempts to split the input documents using a strategy to choose the best times at which to begin and end each segment based on the content and other heuristics such as maximum segment size in data or in time. This strategy would typically try to avoid any content overlapping with other segments but this may not always be possible. The output of the TTML segmenter is both a set of TTML documents and some kind of manifest indicating the times of each segment in turn, suitable for use within the packager.
  • The TTML segmenter splits the input documents into predefined segment durations. For each segment it selects all of the content that overlaps in time with the period of interest, and then selects all referenced styles, regions etc in the head. Some segments may contain no content. Some adjacent segments may duplicate some or all of their content. Use of the sample_has_redundancy flag is not recommended.

Would it help to use that wording?
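To make the second option more concrete, here is one possible greedy heuristic, offered as a sketch only. It assumes cues are (begin, end) pairs in seconds and that the only heuristic is a maximum segment duration; `segment_at_gaps` is a hypothetical name. Cues that overlap are never separated, so a segment can exceed the maximum when the content leaves no clean cut point.

```python
from typing import List, Tuple

Cue = Tuple[float, float]                 # (begin, end) in seconds
Segment = Tuple[float, float, List[Cue]]  # (start, end, cues): one manifest entry


def segment_at_gaps(cues: List[Cue], max_duration: float) -> List[Segment]:
    """Cut only at instants where no cue is active, grouping up to max_duration."""
    open_segments: List[list] = []  # each entry is [start, end, cues]
    for cue in sorted(cues):
        begin, end = cue
        if open_segments:
            current = open_segments[-1]
            overlaps = begin < current[1]
            fits = max(end, current[1]) - current[0] <= max_duration
            # Overlapping cues must stay together; others join only if they fit.
            if overlaps or fits:
                current[1] = max(current[1], end)
                current[2].append(cue)
                continue
        open_segments.append([begin, end, [cue]])
    return [(start, end, grouped) for start, end, grouped in open_segments]


cues = [(1.0, 3.0), (2.0, 3.0), (4.0, 6.0), (9.0, 10.0)]
for start, end, grouped in segment_at_gaps(cues, 5.0):
    print(start, end, grouped)
# 1.0 6.0 [(1.0, 3.0), (2.0, 3.0), (4.0, 6.0)]
# 9.0 10.0 [(9.0, 10.0)]
```

The (start, end) pairs play the role of the manifest mentioned in the second option. The sketch leaves gaps between segments (6s to 9s here); how a packager covers them is outside its scope.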

@rbouqueau
Owner

I have updated the document based on our previous agreements.

The three options I can see are:

  • A single document is created for the whole "programme" and is marked as redundant in all but the first segment, using the sample_has_redundancy flag.
  • The TTML segmenter attempts to split the input documents using a strategy to choose the best times at which to begin and end each segment based on the content and other heuristics such as maximum segment size in data or in time. This strategy would typically try to avoid any content overlapping with other segments but this may not always be possible. The output of the TTML segmenter is both a set of TTML documents and some kind of manifest indicating the times of each segment in turn, suitable for use within the packager.
  • The TTML segmenter splits the input documents into predefined segment durations. For each segment it selects all of the content that overlaps in time with the period of interest, and then selects all referenced styles, regions etc in the head. Some segments may contain no content. Some adjacent segments may duplicate some or all of their content. Use of the sample_has_redundancy flag is not recommended.
    Would it help to use that wording?

I agree with this.

rbouqueau added a commit that referenced this issue Feb 15, 2016

4 participants