
Commit

doc-edits-for-notebooks (#3)
* doc-edits-for-notebooks

* Remove output cells

---------

Co-authored-by: Chris Chase <[email protected]>
MelissaFlinn and cfchase committed Nov 21, 2023
1 parent cf19b0b commit cb6384a
Showing 5 changed files with 93 additions and 89 deletions.
47 changes: 22 additions & 25 deletions 0_sandbox.ipynb
Original file line number Diff line number Diff line change
@@ -3,40 +3,45 @@
{
"cell_type": "markdown",
"id": "991808d0-7d79-47f2-b8cc-764fe86a6156",
"metadata": {},
"metadata": {
"tags": []
},
"source": [
"# Your First Jupyter Notebook\n",
"\n",
"### If you are familiar with Jupyter notebooks\n",
"NOTE: **If you are already familiar with Jupyter Notebook**, and you understand how to edit and run cells, and how to use other Jupyter features, you can skip this notebook and continue to the next one. Click the `x` at the top of the window next to the `0_sandbox.ipynb` filename, and then open `1_experiment_train.ipynb`.\n",
"\n",
"## What are Jupyter notebooks?\n",
"\n",
"If you are already very familiar with Jupyter notebooks, and you understand how to run cells, edit cells, and generally use Jupyter, you can skip this notebook, and move on to the next. Click the `x` at the top of the window where you see the filename `0_sandbox.ipynb`, and then open `1-experiment.ipnyb`.\n",
"This is a [Jupyter](https://jupyter.org/) notebook. You need to know only a few things to get started.\n",
"\n",
"## What Are Jupyter Notebooks?\n",
"* Jupyter notebooks are made up of _cells_ that can contain prose, executable code, or interactive UI elements. This cell is a prose cell.\n",
"\n",
"This is a [Jupyter](https://jupyter.org/) notebook. You can learn more about Jupyter notebooks [here](https://jupyter.readthedocs.io), but you only need to know a few things to get started.\n",
"* You can edit and execute cells. To edit a cell, double-click until the frame around it is highlighted. To execute a cell (whether you've edited it or not), click the play button at the top of the file window or select the cell and then press SHIFT+ENTER.\n",
"\n",
"1. Jupyter notebooks are made up of _cells_ that can contain prose, executable code, or interactive UI elements. This cell is a prose cell.\n",
"2. Cells can be edited and executed. To edit a cell, double-click until the frame around it turns green. To execute a cell (whether you've edited it or not), select it and press Shift+Enter. This will execute the cell, record its output, and advance to the next cell. \n",
" <figure>\n",
" <img src=\"./assets/play-button.png\" alt='missing' width=\"300\" >\n",
" </figure>\n",
"3. While a cell is executing, it will have an asterisk in the margin, like `In: [*]`. After a cell completes executing, the asterisk will be replaced with a sequence number.\n",
" <figure>\n",
" <img src=\"./assets/play-button.png\" alt='missing' width=\"300\" >\n",
" </figure>\n",
" \n",
" Jupyter executes the code in the cell, records the code output, and advances to the next cell. \n",
"\n",
"* While a cell is executing, an asterisk shows in the margin, for example: `In: [*]`. After a cell completes executing, the asterisk is replaced with a sequence number.\n",
"\n",
"\n",
"Let's try it out now! Execute the next cell."
]
},
{
"cell_type": "code",
"execution_count": 1,
"execution_count": null,
"id": "3b4dc1f1-8d6e-4869-ade8-cafc22baa817",
"metadata": {},
"outputs": [],
"source": [
"def print_some_text(entered_text):\n",
" print(\"This is what you entered: \\n\" + entered_text)\n",
" \n",
"my_text = \"This cell is for code. You can execute code by clicking on the play button above or hitting Shift+Enter\"\n",
"my_text = \"This cell is for code. You can execute code by clicking on the play button above or hitting SHIFT+ENTER.\"\n",
"\n",
"print_some_text(my_text)"
]
@@ -48,7 +53,7 @@
"metadata": {},
"outputs": [],
"source": [
"new_text = \"Next, let's see how you would write narrative text or documentation!\"\n",
"new_text = \"Next, let's see how you would write narrative text or documentation.\"\n",
"\n",
"# previously defined functions from another cell are still accessible.\n",
"print_some_text(new_text)"
@@ -61,9 +66,9 @@
"source": [
"## This is a markdown cell\n",
"### Chapter 1\n",
"You can use **markdown** formatting to enter your text.\n",
"You can use **markdown formatting** to enter your text.\n",
"\n",
"*Double-click* on the cell to modify it!\n",
"*Double-click* on a cell to modify it.\n",
"\n",
"[Markdown reference](https://www.markdownguide.org/basic-syntax/)"
]
@@ -75,20 +80,12 @@
"source": [
"## More Information\n",
"\n",
"For more information, click the **Help menu** above\n",
"For more information about Jupyter, click the **Help menu** in the Jupyter window's menu bar.\n",
"\n",
"<figure>\n",
" <img src=\"./assets/help-menu.png\" alt='missing' width=\"300\" >\n",
"</figure>"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "02c9349a-f522-4850-82e6-7af7c8a659c5",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
75 changes: 40 additions & 35 deletions 1_experiment_train.ipynb
@@ -9,9 +9,11 @@
},
{
"cell_type": "markdown",
"metadata": {},
"metadata": {
"tags": []
},
"source": [
"### Install Python Dependencies"
"## Install Python dependencies"
]
},
{
@@ -32,7 +34,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"Now we can import those dependencies we need to run the code"
"Import the dependencies for the model training code:"
]
},
{
@@ -59,13 +61,15 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"Load the CSV data which we will use to train the model.\n",
"It contains the following fields:\n",
"## Load the CSV data\n",
"\n",
"The CSV data that you use to train the model contains the following fields:\n",
"\n",
"* **distance_from_home** - The distance from home where the transaction happened.\n",
"* **distance_from_last_transaction** - The distance from last transaction happened.\n",
"* **ratio_to_median_purchase_price** - Ratio of purchased price compared to median purchase price.\n",
"* **distance_from_last_transaction** - The distance from the last transaction that happened.\n",
"* **ratio_to_median_purchase_price** - The ratio of purchased price compared to median purchase price.\n",
"* **repeat_retailer** - If it's from a retailer that already has been purchased from before.\n",
"* **used_chip** - If the (credit card) chip was used.\n",
"* **used_chip** - If the credit card chip was used.\n",
"* **used_pin_number** - If the PIN number was used.\n",
"* **online_order** - If it was an online order.\n",
"* **fraud** - If the transaction is fraudulent."
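The load cell itself is collapsed in this diff, but the schema described above can be illustrated with a tiny in-memory frame. The two rows and all values below are invented for illustration; the real notebook reads the training data from a CSV file:

```python
import pandas as pd

# Illustrative rows only -- the values below are invented; the real notebook
# loads the training data from a CSV file in a collapsed cell.
Data = pd.DataFrame({
    "distance_from_home": [5.2, 120.4],
    "distance_from_last_transaction": [0.3, 88.1],
    "ratio_to_median_purchase_price": [1.1, 6.7],
    "repeat_retailer": [1.0, 0.0],
    "used_chip": [1.0, 0.0],
    "used_pin_number": [1.0, 0.0],
    "online_order": [0.0, 1.0],
    "fraud": [0.0, 1.0],
})

# The same inspection works on the frame returned by pd.read_csv(...).
print(Data.dtypes)
print(Data["fraud"].value_counts())
```

Inspecting `dtypes` and `value_counts()` this way is a quick sanity check that the columns loaded as numeric and that the label column is as unbalanced as expected.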
@@ -88,19 +92,19 @@
"outputs": [],
"source": [
"# Set the input (X) and output (Y) data. \n",
"# The only output data we have is if it's fraudulent or not, and all other fields go as inputs to the model.\n",
"# The only output data is whether it's fraudulent. All other fields are inputs to the model.\n",
"\n",
"X = Data.drop(columns = ['repeat_retailer','distance_from_home', 'fraud'])\n",
"y = Data['fraud']\n",
"\n",
"# Split the data into training and testing sets so we have something to test the trained model with.\n",
"# Split the data into training and testing sets so you have something to test the trained model with.\n",
"\n",
"# X_train, X_test, y_train, y_test = train_test_split(X,y, test_size = 0.2, stratify = y)\n",
"X_train, X_test, y_train, y_test = train_test_split(X,y, test_size = 0.2, shuffle = False)\n",
"\n",
"X_train, X_val, y_train, y_val = train_test_split(X_train,y_train, test_size = 0.2, stratify = y_train)\n",
"\n",
"# Scale the data to remove mean and have unit variance. This means that the data will be between -1 and 1, which makes it a lot easier for the model to learn than random potentially large values.\n",
"# Scale the data to remove mean and have unit variance. The data will be between -1 and 1, which makes it a lot easier for the model to learn than random (and potentially large) values.\n",
"# It is important to only fit the scaler to the training data, otherwise you are leaking information about the global distribution of variables (which is influenced by the test set) into the training set.\n",
"\n",
"scaler = StandardScaler()\n",
@@ -113,7 +117,7 @@
"with open(\"artifact/scaler.pkl\", \"wb\") as handle:\n",
" pickle.dump(scaler, handle)\n",
"\n",
"# Since the dataset is unbalanced (it has many more non-fraud transactions than fraudulent ones), we set a class weight to weight the few fraudulent transactions higher than the many non-fraud transactions.\n",
"# Since the dataset is unbalanced (it has many more non-fraud transactions than fraudulent ones), set a class weight to weight the few fraudulent transactions higher than the many non-fraud transactions.\n",
"\n",
"class_weights = class_weight.compute_class_weight('balanced',classes = np.unique(y_train),y = y_train)\n",
"class_weights = {i : class_weights[i] for i in range(len(class_weights))}"
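To see concretely what the `compute_class_weight` call above produces, here is a minimal, self-contained sketch; the 8-to-2 label split is invented for illustration:

```python
import numpy as np
from sklearn.utils import class_weight

# Toy labels: 8 legitimate transactions (0) and 2 fraudulent ones (1).
# The 8-to-2 split is invented for illustration.
y_train = np.array([0, 0, 0, 0, 0, 0, 0, 0, 1, 1])

weights = class_weight.compute_class_weight(
    "balanced", classes=np.unique(y_train), y=y_train
)
class_weights = {i: float(weights[i]) for i in range(len(weights))}

# "balanced" computes n_samples / (n_classes * class_count), so here the
# rare fraud class gets weight 2.5 while the majority class gets 0.625.
print(class_weights)
```

The rare class ends up weighted four times higher than the majority class, which is exactly the effect the notebook relies on when it passes `class_weights` to training.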
@@ -123,9 +127,9 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### Build the model\n",
"## Build the model\n",
"\n",
"The model we build here is a simple fully connected deep neural network, containing 3 hidden layers and one output layer."
"The model is a simple, fully connected deep neural network with three hidden layers and one output layer."
]
},
{
@@ -154,7 +158,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"# Train"
"## Train the model"
]
},
{
@@ -173,6 +177,13 @@
"print(\"Training of model is complete\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Save the model file"
]
},
{
"cell_type": "code",
"execution_count": null,
@@ -189,15 +200,9 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## Save the model file"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"* Confirm the model file has been created successfully\n",
"* This should display the model file, with its size and date. "
"## Confirm the model file was created successfully\n",
"\n",
"The output should include the model name, size, and date. "
]
},
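The confirmation cell is collapsed here and most likely shells out to `ls`; an equivalent pure-Python check might look like this. The `models/fraud/model.onnx` path is an assumption; match it to the path used in your save cell:

```python
import datetime
from pathlib import Path

def describe_model_file(path_str):
    """Print the name, size, and modification date of a saved model file."""
    path = Path(path_str)
    if not path.exists():
        print(f"{path} not found -- run the training and save cells first")
        return None
    stat = path.stat()
    modified = datetime.datetime.fromtimestamp(stat.st_mtime)
    print(f"{path.name}  {stat.st_size} bytes  {modified:%Y-%m-%d %H:%M}")
    return stat.st_size

# The exact path is an assumption -- match it to your save cell:
describe_model_file("models/fraud/model.onnx")
```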
{
Expand All @@ -213,7 +218,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"Now let's also create a date-stamped folder as well"
"## Create a date-stamped folder"
]
},
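A sketch of how such a folder could be built; the `%Y-%m-%d` format string and the `models/fraud` base path are assumptions, not taken from the collapsed cell:

```python
import os
from datetime import datetime

# Build a date-stamped directory name. The "%Y-%m-%d" format and the
# "models/fraud" base path are assumptions, not taken from the collapsed cell.
timestamp = datetime.now().strftime("%Y-%m-%d")
stamped_dir = os.path.join("models", "fraud", timestamp)
os.makedirs(stamped_dir, exist_ok=True)
print("Created:", stamped_dir)
```

Keeping a date-stamped copy alongside the latest model makes it easy to roll back to a previous training run.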
{
Expand All @@ -240,8 +245,8 @@
"tags": []
},
"source": [
"* Confirm the model file has been created successfully\n",
"* This should display the model file(s), with its size and date. "
"Confirm that the model file was created successfully. \n",
"The output should include the model file name, size, and date. "
]
},
{
Expand All @@ -257,7 +262,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"# Test"
"## Test the model"
]
},
{
Expand All @@ -278,7 +283,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"Load the test data and scaler"
"Load the test data and scaler:"
]
},
{
Expand All @@ -297,7 +302,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"Create a onnx inference runtime session and predict values for all test inputs"
"Create an ONNX inference runtime session and predict values for all test inputs:"
]
},
{
Expand All @@ -319,7 +324,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"Show results"
"Show the results:"
]
},
{
Expand All @@ -343,9 +348,9 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"Trying with Sally's details\n",
"## Example: Is Sally's transaction likely to be fraudulent?\n",
"\n",
"Fields are in order: \n",
"Here is the order of the fields from Sally's transaction details:\n",
"* distance_from_last_transaction\n",
"* ratio_to_median_price\n",
"* used_chip \n",
@@ -369,10 +374,10 @@
"\n",
"prediction = sess.run([output_name], {input_name: scaler.transform(sally_transaction_details).astype(np.float32)})\n",
"\n",
"print(\"Was Sally's transaction predicted to be fraudulent? \")\n",
"print(\"Is Sally's transaction predicted to be fraudulent? (true = YES, false = NO) \")\n",
"print(np.squeeze(prediction) > threshold)\n",
"\n",
"print(\"How likely to be fraudulent was it? \")\n",
"print(\"How likely was Sally's transaction to be fraudulent? \")\n",
"print(\"{:.5f}\".format(np.squeeze(prediction) * 100) + \"%\")"
]
}
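The thresholding in the cell above can be demonstrated in isolation. Both values below are invented: a hypothetical sigmoid output for one transaction and a decision threshold (the notebook's actual threshold is set in a collapsed cell):

```python
import numpy as np

# Both values are invented: a hypothetical sigmoid output for one
# transaction, and a decision threshold.
p = np.float32(0.9982)
threshold = 0.95

is_fraud = bool(np.squeeze(p) > threshold)
print("Predicted fraudulent:", is_fraud)

# The model output is a probability in [0, 1]; multiply by 100 before
# appending "%" so that 0.9982 is displayed as 99.82%, not 0.9982%.
print("{:.2f}%".format(float(np.squeeze(p)) * 100))
```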
38 changes: 18 additions & 20 deletions 2_save_model.ipynb
@@ -6,14 +6,16 @@
"source": [
"# Save the Model\n",
"\n",
"To save this model to use from various locations, including other notebooks or serving the model, we need to upload it to s3 compatible storage."
"To save this model so that you can use it from various locations, including other notebooks or the model server, upload it to s3-compatible storage."
]
},
{
"cell_type": "markdown",
"metadata": {},
"metadata": {
"tags": []
},
"source": [
"## Install required packages and define a function for the upload"
"## Install the required packages and define a function for the upload"
]
},
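The upload function itself is collapsed in this diff. As a stdlib-only sketch of its recursive part — walking a local folder and computing the S3 key for each file — something like the following would work; the actual notebook presumably performs the upload with `boto3`, which is indicated only in a comment here:

```python
import os

def local_files_to_s3_keys(local_dir, s3_prefix):
    """Map every file under local_dir to the S3 key it would be uploaded to."""
    pairs = []
    for root, _, files in os.walk(local_dir):
        for name in files:
            local_path = os.path.join(root, name)
            relative = os.path.relpath(local_path, local_dir)
            # S3 keys always use forward slashes, regardless of OS.
            key = s3_prefix + "/" + relative.replace(os.sep, "/")
            # A real helper would upload here, e.g. with boto3:
            #   bucket.upload_file(local_path, key)
            pairs.append((local_path, key))
    return sorted(pairs)
```

For example, a local `models/fraud/model.onnx` maps to the key `models/fraud/model.onnx` when the function is called with the `models` folder and the `models` prefix.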
{
@@ -75,17 +77,13 @@
"tags": []
},
"source": [
"## List files\n",
"## Verify the upload\n",
"\n",
"List in your S3 bucket under the upload prefix `models` to make sure upload was successful. As a best practice, we'll only keep 1 model in a given prefix or directory. There are several models that will need multiple files in directory and we can download and serve the directory with all necessary files this way without mixing up our models.\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"* If this is the first time running the code, this cell will have no output: \n",
"* But if you've already uploaded your model, you should see: `models/fraud/model.onnx`"
"In your S3 bucket, under the `models` upload prefix, run the `list_objects` function. As a best practice, to avoid mixing up model files, keep only one model and its required files in a given prefix or directory. This practice allows you to download and serve a directory with all the files that a model requires. \n",
"\n",
"If this is the first time running the code, this cell will have no output.\n",
"\n",
"If you've already uploaded your model, you should see this output: `models/fraud/model.onnx`\n"
]
},
{
Expand All @@ -99,7 +97,9 @@
},
{
"cell_type": "markdown",
"metadata": {},
"metadata": {
"tags": []
},
"source": [
"## Upload and check again"
]
@@ -108,7 +108,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"And now, we use this function to upload the `models` folder in a rescursive fashion"
"Use the function to upload the `models` folder recursively:"
]
},
{
Expand All @@ -126,7 +126,7 @@
"tags": []
},
"source": [
"To confirm this worked, we run the `list_objects` function again:"
"To confirm this worked, run the `list_objects` function again:"
]
},
{
Expand All @@ -142,11 +142,9 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### Next Steps - Serve the Model\n",
"\n",
"Hopefully, you saw the model `models/fraud/model.onnx` listed above. Now that you've saved the model so s3 storage we can refer to the model using the same data connection to serve the model as an API.\n",
"### Next Step\n",
"\n",
"Return to the workshop instructions to deploy the model as an API.\n"
"Now that you've saved the model to s3 storage, you can refer to the model by using the same data connection to serve the model as an API.\n"
]
}
],
