Add docs build

rhasspy · Jun 1, 2021 · c5ada76
1 parent 1a7681e
commit c5ada76
Showing 45 changed files with 20,224 additions and 24 deletions.
diff --git a/.gitignore b/.gitignore
@@ -7,8 +7,6 @@ __pycache__/
 dist/
 /etc/
 
-docs/build/
-
 coverage.xml
 .coverage
 

diff --git a/README.md b/README.md
@@ -35,8 +35,7 @@ read ɹ ˈi d
 
 Note that "wound" and "read" have different pronunciations when used in different contexts.
 
-Gruut includes a pre-trained U.S. English model with part-of-speech/tense aware pronunciations.
-[Pre-trained models](https://github.com/rhasspy/gruut/releases/tag/v1.0.0) are also available for the [supported languages](#support-languages).
+See [the documentation](https://rhasspy.github.io/gruut) for more details.
 
 ## Intended Audience
 
@@ -57,18 +56,16 @@ Some languages also include:
 
 gruut currently supports:
 
-* Czech (`cs-cz`)
-* German (`de-de`)
-* U.S. English (`en-us`)
-   * Supports part-of-speech aware pronunciations
-* U.K. English (`en-gb`)
-* Spanish (`es-es`)
+* Czech (`cs`)
+* German (`de`)
+* English (`en`)
+* Spanish (`es`)
 * Farsi/Persian (`fa`)
-* French (`fr-fr`)
-* Italian (`it-it`)
+* French (`fr`)
+* Italian (`it`)
 * Dutch (`nl`)
-* Russian (`ru-ru`)
-* Swedish (`sv-se`)
+* Russian (`ru`)
+* Swedish (`sv`)
 
 The goal is to support all of [voice2json's languages](https://github.com/synesthesiam/voice2json-profiles#supported-languages)
 
@@ -90,20 +87,12 @@ The goal is to support all of [voice2json's languages](https://github.com/synest
 $ pip install gruut
 ```
 
-For Raspberry Pi (ARM), you will first need to [manually install phonetisaurus](https://github.com/rhasspy/phonetisaurus-pypi/releases).
-
-## Language Download
-
-[Pre-trained models](https://github.com/rhasspy/gruut/releases/tag/v0.8.0) for gruut can be downloaded with:
+Additional languages can be added during installation. For example, with French and Italian support:
 
 ```sh
-$ python3 -m gruut <LANGUAGE> download
+$ pip install gruut[fr,it]
 ```
 
-A U.S. English model is included in the distribution.
-
-By default, models are stored in `$HOME/.config/gruut` (technically `$XDG_CONFIG_HOME/.gruut`). This can be overridden by passing a `--lang-dir` argument to all `gruut` commands.
-
 ## Command-Line Usage
 
 The `gruut` module can be executed with `python3 -m gruut <LANGUAGE> <COMMAND> <ARGS>`

diff --git a/docs/build/.buildinfo b/docs/build/.buildinfo
@@ -0,0 +1,4 @@
+# Sphinx build info version 1
+# This file hashes the configuration used when building these files. When it is not found, a full rebuild will be done.
+config: 90a8d147d7bb8b949280ed46e74eb2cb
+tags: 645f666f9bcd5a90fca523b33c5a78b7
diff --git a/docs/build/.doctrees/environment.pickle b/docs/build/.doctrees/environment.pickle
diff --git a/docs/build/.doctrees/gruut.doctree b/docs/build/.doctrees/gruut.doctree
diff --git a/docs/build/.doctrees/index.doctree b/docs/build/.doctrees/index.doctree
diff --git a/docs/build/.doctrees/modules.doctree b/docs/build/.doctrees/modules.doctree
diff --git a/docs/build/_sources/gruut.rst.txt b/docs/build/_sources/gruut.rst.txt
@@ -0,0 +1,85 @@
+gruut package
+=============
+
+Submodules
+----------
+
+gruut.commands module
+---------------------
+
+.. automodule:: gruut.commands
+   :members:
+   :undoc-members:
+   :show-inheritance:
+
+gruut.const module
+------------------
+
+.. automodule:: gruut.const
+   :members:
+   :undoc-members:
+   :show-inheritance:
+
+gruut.g2p module
+----------------
+
+.. automodule:: gruut.g2p
+   :members:
+   :undoc-members:
+   :show-inheritance:
+
+gruut.lang module
+-----------------
+
+.. automodule:: gruut.lang
+   :members:
+   :undoc-members:
+   :show-inheritance:
+
+gruut.lexicon2db module
+-----------------------
+
+.. automodule:: gruut.lexicon2db
+   :members:
+   :undoc-members:
+   :show-inheritance:
+
+gruut.phonemize module
+----------------------
+
+.. automodule:: gruut.phonemize
+   :members:
+   :undoc-members:
+   :show-inheritance:
+
+gruut.pos module
+----------------
+
+.. automodule:: gruut.pos
+   :members:
+   :undoc-members:
+   :show-inheritance:
+
+gruut.toksen module
+-------------------
+
+.. automodule:: gruut.toksen
+   :members:
+   :undoc-members:
+   :show-inheritance:
+
+gruut.utils module
+------------------
+
+.. automodule:: gruut.utils
+   :members:
+   :undoc-members:
+   :show-inheritance:
+
+Module contents
+---------------
+
+.. automodule:: gruut
+   :members:
+   :undoc-members:
+   :show-inheritance:
diff --git a/docs/build/_sources/index.rst.txt b/docs/build/_sources/index.rst.txt
@@ -0,0 +1,119 @@
+.. gruut documentation master file
+
+gruut
+=====
+
+A tokenizer and `IPA <https://en.wikipedia.org/wiki/International_Phonetic_Alphabet>`_ phonemizer for multiple human languages.
+
+.. code-block:: python
+
+    from gruut import text_to_phonemes
+
+    text = 'He wound it around the wound, saying "I read it was $10 to read."'
+
+    for sent_idx, word, word_phonemes in text_to_phonemes(text, lang="en-us"):
+        print(word, *word_phonemes)
+
+
+Output::
+
+    he h ˈi
+    wound w ˈaʊ n d
+    it ˈɪ t
+    around ɚ ˈaʊ n d
+    the ð ə
+    wound w ˈu n d
+    , |
+    saying s ˈeɪ ɪ ŋ
+    i ˈaɪ
+    read ɹ ˈɛ d
+    it ˈɪ t
+    was w ə z
+    ten t ˈɛ n
+    dollars d ˈɑ l ɚ z
+    to t ə
+    read ɹ ˈi d
+    . ‖
+
+
+Installation
+------------
+
+To install gruut with U.S. English support only::
+
+    pip install gruut
+
+
+Additional languages can be added during installation. For example, with French and Italian support::
+
+    pip install gruut[fr,it]
+
+
+Supported Languages
+^^^^^^^^^^^^^^^^^^^
+
+* Czech (``cs``)
+* German (``de``)
+* English (``en``)
+* Spanish (``es``)
+* Farsi/Persian (``fa``)
+* French (``fr``)
+* Italian (``it``)
+* Dutch (``nl``)
+* Russian (``ru``)
+* Swedish (``sv``)
+
+
+Usage
+-----
+
+gruut performs two main functions: tokenization and phonemization.
+The :py:meth:`gruut.text_to_phonemes` method performs both steps for you. See the :py:class:`~gruut.TextToPhonemesReturn` enum for ways to adjust the ``return_format``.
+
+If you need more control, see the language-specific classes in :py:mod:`gruut.lang` as well as :py:class:`~gruut.toksen.RegexTokenizer` and :py:class:`~gruut.lang.SqlitePhonemizer`.
+
+Tokenization operates on text and does the following:
+
+* Splits text into words by whitespace
+* Expands user-defined abbreviations
+* Breaks apart words and sentences further by punctuation (periods, commas, etc.)
+* Drops empty/non-word tokens
+* Expands numbers into words (100 -> one hundred)
+* Applies upper/lower case filter
+* Predicts part of speech tags (see :py:mod:`gruut.pos`)
+
+Once tokenized, phonemization predicts the phonetic pronunciation for each word by:
+
+* Looking up each word in an SQLite database
+* Guessing the pronunciation with a pre-trained model (see :py:mod:`gruut.g2p`)
+
+In cases where more than one pronunciation is possible for a word, the "best" pronunciation is chosen in this order:
+
+* The one specified by the user, when word indexes are enabled and the word has the form "word_N", where N is the 1-based pronunciation index
+* Otherwise, whichever pronunciation has the most compatible :ref:`features`
+* Otherwise, the first pronunciation
+
+
+.. _features:
+
+Features
+^^^^^^^^
+
+gruut tokens can contain arbitrary features. For now, part of speech tags are the only implemented feature, and they are available only for English and French.
+
+When determining the "best" pronunciation for a word, a phonemizer may consult these features. In English, for example, some word pronunciations in the lexicon contain "preferred" parts of speech. Words like "wind" may be pronounced differently depending on their use as a verb or noun. If a token "wind" is predicted to be a noun during tokenization, then the pronunciation "w ˈɪ n d" is selected instead of "w ˈaɪ n d".
+
+French uses part of speech tags differently. During the post-processing phase of phonemization, these features are used instead to add liaisons between words. For example, in the sentence "J’ai des petites oreilles.", "petites" will be pronounced "p ə t i t z" instead of "p ə t i t".
+
+.. toctree::
+   :maxdepth: 2
+   :caption: Contents:
+
+
+
+Indices and tables
+==================
+
+* :ref:`genindex`
+* :ref:`modindex`
+* :ref:`search`
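
Review note on the tokenization steps listed in `docs/build/_sources/index.rst.txt`: the sketch below is a rough, standalone illustration of that pipeline in plain Python. It does not use gruut's API. `tokenize`, `ABBREVIATIONS`, and `NUMBER_WORDS` are hypothetical names with toy data, and part-of-speech tagging is left out entirely.

```python
import re
from typing import Dict, List

# Hypothetical lookup tables, for illustration only (not gruut's data).
ABBREVIATIONS: Dict[str, str] = {"dr.": "doctor", "mr.": "mister"}
NUMBER_WORDS: Dict[str, str] = {"1": "one", "2": "two", "10": "ten", "100": "one hundred"}

# Either a run of word characters or a single punctuation character.
_WORD_OR_PUNCT = re.compile(r"\w+|[^\w\s]")


def tokenize(text: str) -> List[str]:
    """Toy version of the documented steps (no part-of-speech tagging)."""
    tokens: List[str] = []
    for raw in text.split():  # split text into words by whitespace
        # Expand user-defined abbreviations before touching punctuation.
        expanded = ABBREVIATIONS.get(raw.lower(), raw)
        for piece in expanded.split():
            # Break words apart by punctuation; findall never yields
            # empty strings, so empty tokens are dropped implicitly.
            for part in _WORD_OR_PUNCT.findall(piece):
                part = NUMBER_WORDS.get(part, part)  # expand numbers into words
                tokens.extend(part.lower().split())  # apply a lower-case filter
    return tokens


print(tokenize("Dr. Smith paid 100 dollars."))
```

The abbreviation lookup runs before punctuation splitting so that the period in "Dr." is not mistaken for a sentence break. Whether gruut orders the steps exactly this way is not something this diff confirms.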
diff --git a/docs/build/_sources/modules.rst.txt b/docs/build/_sources/modules.rst.txt
@@ -0,0 +1,7 @@
+gruut
+=====
+
+.. toctree::
+   :maxdepth: 4
+
+   gruut
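
Review note on the "best" pronunciation rules in `docs/build/_sources/index.rst.txt`: the sketch below shows one way the three rules could be applied as a fallback chain. Reading the list as a priority order is an assumption, and so is everything named here: `choose_pronunciation`, the `(phonemes, preferred_features)` tuple shape, the feature-matching score, and the `NN`/`VB` tag values. None of it is gruut's actual implementation.

```python
import re
from typing import Dict, List, Optional, Sequence, Tuple

# Hypothetical shape of one lexicon entry: (phonemes, preferred features).
Pronunciation = Tuple[List[str], Dict[str, str]]

_INDEXED_WORD = re.compile(r"^(.+)_(\d+)$")


def choose_pronunciation(
    word: str,
    candidates: Sequence[Pronunciation],
    token_features: Optional[Dict[str, str]] = None,
    word_indexes: bool = False,
) -> List[str]:
    """Pick one pronunciation: explicit index, then features, then first."""
    if not candidates:
        raise ValueError(f"no pronunciations for {word!r}")

    # 1. Explicit "word_N" form, where N is a 1-based pronunciation index.
    if word_indexes:
        match = _INDEXED_WORD.match(word)
        if match:
            index = int(match.group(2))
            if 1 <= index <= len(candidates):
                return candidates[index - 1][0]

    # 2. Most compatible features: count matching key/value pairs.
    if token_features:
        def score(candidate: Pronunciation) -> int:
            preferred = candidate[1]
            return sum(1 for k, v in preferred.items() if token_features.get(k) == v)

        best = max(candidates, key=score)
        if score(best) > 0:
            return best[0]

    # 3. Fall back to the first pronunciation.
    return candidates[0][0]


# Toy entries for "wind", using the two pronunciations quoted in the docs.
WIND: List[Pronunciation] = [
    (["w", "ˈɪ", "n", "d"], {"pos": "NN"}),
    (["w", "ˈaɪ", "n", "d"], {"pos": "VB"}),
]

print(choose_pronunciation("wind", WIND, {"pos": "VB"}))
print(choose_pronunciation("wind_2", WIND, word_indexes=True))
print(choose_pronunciation("wind", WIND))
```

The feature rule only wins when at least one feature actually matches. Otherwise the function falls through to the first entry, which keeps the default behavior predictable for tokens without tags.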