Update featurization.ipynb #750

Open
wants to merge 1 commit into base: main
24 changes: 12 additions & 12 deletions docs/examples/featurization.ipynb
@@ -241,7 +241,7 @@
"source": [
"### Using feature hashing\n",
"\n",
"In fact, the `StringLookup` layer allows us to configure multiple OOV indices. If we do that, any raw value that is not in the vocabulary will be deterministically hashed to one of the OOV indices. The more such indices we have, the less likley it is that two different raw feature values will hash to the same OOV index. Consequently, if we have enough such indices the model should be able to train about as well as a model with an explicit vocabulary without the disdvantage of having to maintain the token list."
"In fact, the `StringLookup` layer allows us to configure multiple OOV indices. If we do that, any raw value that is not in the vocabulary will be deterministically hashed to one of the OOV indices. The more such indices we have, the less likley it is that two different raw feature values will hash to the same OOV index. Consequently, if we have enough such indices the model should be able to train about as well as a model with an explicit vocabulary without the disadvantage of having to maintain the token list."
]
},
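For illustration (not part of this diff), a minimal sketch of configuring multiple OOV indices on a `StringLookup` layer; the vocabulary and bucket count here are made up:

```python
import tensorflow as tf

# A lookup with a tiny explicit vocabulary and 4 OOV buckets: any raw
# value outside the vocabulary is hashed deterministically into one of
# the 4 OOV slots (indices 0-3); known values start at index 4.
title_lookup = tf.keras.layers.StringLookup(
    vocabulary=["Star Wars (1977)", "Toy Story (1995)"],
    num_oov_indices=4,
)

print(title_lookup(tf.constant(["Star Wars (1977)", "Unseen Movie (2020)"])))
```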
{
@@ -250,7 +250,7 @@
"id": "t0gOaMjJAC17"
},
"source": [
"We can take this to its logical extreme and rely entirely on feature hashing, with no vocabulary at all. This is implemented in the `tf.keras.layers.Hashing` layer."
"We can take this to its logical extreme and rely entirely on feature hashing, with no vocabulary at all. This is implemented in the [`tf.keras.layers.Hashing`](https://www.tensorflow.org/api_docs/python/tf/keras/layers/Hashing) layer."
]
},
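A hedged sketch of vocabulary-free hashing with this layer; the bin count is illustrative:

```python
import tensorflow as tf

# Hash raw strings straight into a fixed number of bins: no vocabulary
# to build or maintain, at the cost of occasional collisions.
movie_title_hashing = tf.keras.layers.Hashing(num_bins=200_000)

print(movie_title_hashing(tf.constant(["Star Wars (1977)"])))
```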
{
@@ -314,7 +314,7 @@
"source": [
"movie_title_embedding = tf.keras.layers.Embedding(\n",
" # Let's use the explicit vocabulary lookup.\n",
" input_dim=movie_title_lookup.vocab_size(),\n",
" input_dim=movie_title_lookup.vocabulary_size(),\n",
" output_dim=32\n",
")"
]
@@ -356,7 +356,7 @@
},
"outputs": [],
"source": [
"movie_title_model([\"Star Wars (1977)\"])"
"movie_title_model(tf.constant([\"Star Wars (1977)\"]))"
]
},
{
@@ -379,7 +379,7 @@
"user_id_lookup = tf.keras.layers.StringLookup()\n",
"user_id_lookup.adapt(ratings.map(lambda x: x[\"user_id\"]))\n",
"\n",
"user_id_embedding = tf.keras.layers.Embedding(user_id_lookup.vocab_size(), 32)\n",
"user_id_embedding = tf.keras.layers.Embedding(user_id_lookup.vocabulary_size(), 32)\n",
"\n",
"user_id_model = tf.keras.Sequential([user_id_lookup, user_id_embedding])"
]
@@ -426,7 +426,7 @@
"\n",
"[Standardization](https://en.wikipedia.org/wiki/Feature_scaling#Standardization_(Z-score_Normalization)) rescales features to normalize their range by subtracting the feature's mean and dividing by its standard deviation. It is a common preprocessing transformation.\n",
"\n",
"This can be easily accomplished using the [`tf.keras.layers.Normalization`](https://www.tensorflow.org/api_docs/python/tf/keras/layers/experimental/preprocessing/Normalization) layer:"
"This can be easily accomplished using the [`tf.keras.layers.Normalization`](https://www.tensorflow.org/api_docs/python/tf/keras/layers/Normalization) layer:"
]
},
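As a sketch (with synthetic data rather than the MovieLens timestamps the notebook adapts on), standardization with this layer typically looks like:

```python
import numpy as np
import tensorflow as tf

# Learn the feature's mean and variance from data, then standardize
# inputs to roughly zero mean and unit variance.
normalization = tf.keras.layers.Normalization(axis=None)

timestamps = np.random.uniform(0, 1e9, size=(1000,)).astype("float32")
normalization.adapt(timestamps)

print(normalization(timestamps[:3]))
```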
{
@@ -518,7 +518,7 @@
"\n",
"The first transformation we need to apply to text is tokenization (splitting into constituent words or word-pieces), followed by vocabulary learning, followed by an embedding.\n",
"\n",
"The Keras [`tf.keras.layers.TextVectorization`](https://www.tensorflow.org/api_docs/python/tf/keras/layers/experimental/preprocessing/TextVectorization) layer can do the first two steps for us:"
"The Keras [`tf.keras.layers.TextVectorization`](https://www.tensorflow.org/api_docs/python/tf/keras/layers/TextVectorization) layer can do the first two steps for us:"
]
},
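For orientation, a minimal sketch of tokenization with this layer; the adapt data here is a toy list rather than the MovieLens titles:

```python
import numpy as np
import tensorflow as tf

# Build a word-level vocabulary from raw strings, then map each title
# to a padded sequence of integer token ids.
title_text = tf.keras.layers.TextVectorization()
title_text.adapt(np.array(["Star Wars (1977)", "Toy Story (1995)"]))

print(title_text(tf.constant(["Star Wars (1977)"])))
```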
{
@@ -584,7 +584,7 @@
"source": [
"This looks correct: the layer is tokenizing titles into individual words.\n",
"\n",
"To finish the processing, we now need to embed the text. Because each title contains multiple words, we will get multiple embeddings for each title. For use in a donwstream model these are usually compressed into a single embedding. Models like RNNs or Transformers are useful here, but averaging all the words' embeddings together is a good starting point."
"To finish the processing, we now need to embed the text. Because each title contains multiple words, we will get multiple embeddings for each title. For use in a downstream model these are usually compressed into a single embedding. Models like RNNs or Transformers are useful here, but averaging all the words' embeddings together is a good starting point."
]
},
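A hedged sketch of that averaging step: embed each token and pool the per-word embeddings into one vector with `GlobalAveragePooling1D` (the token budget and embedding width are illustrative):

```python
import tensorflow as tf

max_tokens = 10_000

title_text_embedding = tf.keras.Sequential([
    tf.keras.layers.TextVectorization(max_tokens=max_tokens),
    # mask_zero=True so padding tokens are ignored when averaging.
    tf.keras.layers.Embedding(max_tokens, 32, mask_zero=True),
    # Average the per-word embeddings into a single title embedding.
    tf.keras.layers.GlobalAveragePooling1D(),
])

# The TextVectorization sub-layer still needs `adapt` before use, e.g.:
# title_text_embedding.layers[0].adapt(ratings.map(lambda x: x["movie_title"]))
```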
{
@@ -624,7 +624,7 @@
"\n",
" self.user_embedding = tf.keras.Sequential([\n",
" user_id_lookup,\n",
" tf.keras.layers.Embedding(user_id_lookup.vocab_size(), 32),\n",
" tf.keras.layers.Embedding(user_id_lookup.vocabulary_size(), 32),\n",
" ])\n",
" self.timestamp_embedding = tf.keras.Sequential([\n",
" tf.keras.layers.Discretization(timestamp_buckets.tolist()),\n",
@@ -665,7 +665,7 @@
"user_model = UserModel()\n",
"\n",
"user_model.normalized_timestamp.adapt(\n",
" ratings.map(lambda x: x[\"timestamp\"]).batch(128))\n",
" ratings.map(lambda x: x[\"timestamp\"]).batch(128,drop_remainder=True))\n",
"\n",
"for row in ratings.batch(1).take(1):\n",
" print(f\"Computed representations: {user_model(row)[0, :3]}\")"
@@ -698,7 +698,7 @@
"\n",
" self.title_embedding = tf.keras.Sequential([\n",
" movie_title_lookup,\n",
" tf.keras.layers.Embedding(movie_title_lookup.vocab_size(), 32)\n",
" tf.keras.layers.Embedding(movie_title_lookup.vocabulary_size(), 32)\n",
" ])\n",
" self.title_text_embedding = tf.keras.Sequential([\n",
" tf.keras.layers.TextVectorization(max_tokens=max_tokens),\n",
@@ -749,7 +749,7 @@
"source": [
"## Next steps\n",
"\n",
"With the two models above we've taken the first steps to representing rich features in a recommender model: to take this further and explore how these can be used to build an effective deep recomender model, take a look at our Deep Recommenders tutorial."
"With the two models above we've taken the first steps to representing rich features in a recommender model: to take this further and explore how these can be used to build an effective deep recommender model, take a look at our Deep Recommenders tutorial."
]
}
],