Custom NLU Component in Rasa 3.x
Rasa provides a variety of components for stages like tokenization, featurization and classification, which are very helpful when building a project, but sometimes we need some customisation. For this reason, Rasa allows us to create custom components such as tokenizers, featurizers, entity extractors, spell checkers, etc. For example, we might want to include a spell checker that corrects spelling in user messages, or to preprocess data before intents are identified and entities are extracted.
Let's put this into action! This tutorial details how to create custom components and integrate them into the Rasa NLU pipeline to improve the performance of our AI assistants.
These custom components are also known as custom graph components. To create a new component, we need to make sure that it fulfils the following requirements:
- It must implement the GraphComponent interface.
- It must be registered with the model configuration recipe currently in use.
- It must be included in the config.yml file.
- It must use type annotations (Rasa Open Source uses them to validate our model configuration).
Skeleton for designing a custom component, as per the Rasa documentation:
```python
from typing import Dict, Text, Any, List
from rasa.engine.graph import GraphComponent, ExecutionContext
from rasa.engine.recipes.default_recipe import DefaultV1Recipe
from rasa.engine.storage.resource import Resource
from rasa.engine.storage.storage import ModelStorage
from rasa.shared.nlu.training_data.message import Message
from rasa.shared.nlu.training_data.training_data import TrainingData

# TODO: Correctly register our component with its type
@DefaultV1Recipe.register(
    [DefaultV1Recipe.ComponentType.INTENT_CLASSIFIER], is_trainable=True
)
class CustomNLUComponent(GraphComponent):
    @classmethod
    def create(
        cls,
        config: Dict[Text, Any],
        model_storage: ModelStorage,
        resource: Resource,
        execution_context: ExecutionContext,
    ) -> GraphComponent:
        # TODO: Implement this
        ...

    def train(self, training_data: TrainingData) -> Resource:
        # TODO: Implement this if our component requires training
        ...

    def process_training_data(self, training_data: TrainingData) -> TrainingData:
        # TODO: Implement this if our component augments the training data with
        # tokens or message features which are used by other components
        # during training.
        return training_data

    def process(self, messages: List[Message]) -> List[Message]:
        # TODO: This is the method which Rasa Open Source will call during inference.
        return messages
```
@DefaultV1Recipe.register(): Rasa Open Source uses this decorator, together with the component's position in the config file, to schedule the execution of the graph component.
The decorator defines three things:
- Component types: a list of DefaultV1Recipe.ComponentType values that specifies the purpose the component fulfils. The possible types are MODEL_LOADER, MESSAGE_TOKENIZER, MESSAGE_FEATURIZER, INTENT_CLASSIFIER, ENTITY_EXTRACTOR, POLICY_WITHOUT_END_TO_END_SUPPORT and POLICY_WITH_END_TO_END_SUPPORT.
- is_trainable: a boolean that specifies whether or not the component needs to be trained.
- model_from: specifies whether a pretrained model (for example, the language model loaded by SpacyNLP) must be supplied to the train and process methods of the graph component.
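The decorator itself is ordinary Python: it records metadata about a class and hands the class back unchanged. The toy version below shows that pattern. ToyComponentType, Registration, TOY_REGISTRY and toy_register are all hypothetical names, and this is not Rasa's implementation. Rasa's real decorator feeds the recorded metadata into its model configuration recipe.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import Callable, Dict, List, Optional, Type


class ToyComponentType(Enum):
    # Hypothetical stand-in borrowing a few names from DefaultV1Recipe.ComponentType
    MESSAGE_TOKENIZER = auto()
    MESSAGE_FEATURIZER = auto()
    INTENT_CLASSIFIER = auto()
    ENTITY_EXTRACTOR = auto()


@dataclass
class Registration:
    # Metadata recorded for one registered class
    component_types: List[ToyComponentType]
    is_trainable: bool
    model_from: Optional[str] = None


# Hypothetical registry: class name -> registration metadata
TOY_REGISTRY: Dict[str, Registration] = {}


def toy_register(
    component_types: List[ToyComponentType],
    is_trainable: bool,
    model_from: Optional[str] = None,
) -> Callable[[Type], Type]:
    """Build a class decorator that records metadata and returns the class unchanged."""

    def decorator(cls: Type) -> Type:
        TOY_REGISTRY[cls.__name__] = Registration(
            list(component_types), is_trainable, model_from
        )
        return cls

    return decorator


@toy_register([ToyComponentType.MESSAGE_TOKENIZER], is_trainable=False)
class Preprocess:
    pass


if __name__ == "__main__":
    print(TOY_REGISTRY["Preprocess"])
```

Because the decorator returns the class itself, Preprocess can still be used as a normal class after registration.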
create(): This class method creates a new graph component. It accepts the four inputs given in its signature (the component's config, the model storage, the resource assigned to the component and the execution context) and returns an instantiated graph component.
train(self, training_data: TrainingData) -> Resource: This method is used when the component needs to be trained. For example, a custom classifier component has to be trained to learn to distinguish between different classes/intents.
process_training_data(self, training_data: TrainingData) -> TrainingData: As the name suggests, this method processes the training data before training. Any processing logic that later components rely on goes here, so that their train() methods receive the right kind of data.
process(self, messages: List[Message]) -> List[Message]: Rasa calls this method at inference time, so here we define what the component does to each incoming message and what it passes on.
Now that we've discussed how to create a custom graph component, let's look at a simple use case for one.
Suppose we're building a bot that needs to extract an entity of the form "<some text> (year-year) <some text>". The year range is enclosed in "()" and there is some text on both sides of it. For this example, let's take the user utterance "Davin (2000-2022) beckingham".
The problem is that when the user passes this information to the bot, the entity extractor splits the entity into two parts, as highlighted in the image below:
To keep the entity in one piece, we can create a custom component that preprocesses the user input and removes the "()" that causes the split. To develop such a component, let's use the template code shown earlier.
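To see the transformation on its own, outside Rasa, here is a plain-Python sketch of the cleanup. strip_parentheses mirrors the blanket removal this tutorial's component performs. strip_year_range_parentheses is a hypothetical, more targeted variant that is not part of the original repo. It only unwraps a parenthesised YYYY-YYYY range and leaves any other parentheses intact.

```python
import re

# Matches a parenthesised year range such as "(2000-2022)" and captures the inner range.
YEAR_RANGE_IN_PARENS = re.compile(r"\((\d{4}-\d{4})\)")


def strip_parentheses(text: str) -> str:
    """Blanket removal: drop every "(" and ")" character."""
    return text.replace("(", "").replace(")", "")


def strip_year_range_parentheses(text: str) -> str:
    """Hypothetical targeted variant: unwrap only "(YYYY-YYYY)", keep other parentheses."""
    return YEAR_RANGE_IN_PARENS.sub(r"\1", text)


if __name__ == "__main__":
    print(strip_parentheses("Davin (2000-2022) beckingham"))
    # Davin 2000-2022 beckingham
    print(strip_year_range_parentheses("Davin (2000-2022) beckingham (retired)"))
    # Davin 2000-2022 beckingham (retired)
```

The blanket version is simpler. The targeted one avoids altering user text where parentheses carry meaning unrelated to the entity.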
Let's name our class Preprocess; it inherits from the GraphComponent class.
BUILDING CUSTOM COMPONENT
The first step in building a component is to register the type of component we are going to create, be it a tokenizer, a featurizer, a classifier, etc. For our use case, we register it as a MESSAGE_TOKENIZER and set is_trainable to False.
We only want to preprocess the user text at inference time. So, besides the create() method that every graph component requires, the only method we need to implement is process(self, messages: List[Message]) -> List[Message], which graph components use at inference time.
```python
def process(self, messages: List[Message]) -> List[Message]:
    # Remove "(" and ")" from the user text so that the entity extractor
    # sees the whole entity (e.g. "Davin 2000-2022 beckingham") as one span.
    for message in messages:
        if "text" in message.data:
            msg = message.data["text"]
            # str.replace leaves the text unchanged if the character is absent.
            msg = msg.replace("(", "").replace(")", "")
            # Assign the preprocessed text back to Rasa's Message object
            message.data["text"] = msg
    # Return the modified messages so later components receive them
    return messages
```
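Rasa itself isn't needed to sanity-check the loop. The sketch below copies the body of process into a free function and runs it on a minimal stand-in object. FakeMessage is hypothetical and only imitates the data dictionary of Rasa's Message, so this is a quick check of the logic, not a Rasa test.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class FakeMessage:
    # Hypothetical stand-in for Rasa's Message: only the `data` dict
    # used by our process() method is imitated.
    data: Dict[str, Any] = field(default_factory=dict)


def process(messages: List[FakeMessage]) -> List[FakeMessage]:
    # Same logic as Preprocess.process, written as a free function.
    for message in messages:
        if "text" in message.data:
            message.data["text"] = message.data["text"].replace("(", "").replace(")", "")
    return messages


if __name__ == "__main__":
    msgs = [FakeMessage({"text": "I am Davin (2000-2022) Beckingham"}), FakeMessage({})]
    for m in process(msgs):
        print(m.data)
    # {'text': 'I am Davin 2000-2022 Beckingham'}
    # {}
```

The method mutates each message in place and returns the same list. Messages without a "text" key pass through untouched.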
Once this functionality is in place, we add the component to the pipeline in our config.yml file:
```yaml
pipeline:
  - name: Preprocessing_component.Preprocess
  # other components below
```
Here we place our custom component at the top of the pipeline because we want the user text to be preprocessed before any other component in the pipeline uses it.
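The effect of ordering can be illustrated with a deliberately simplified model in which each step receives the previous step's output, in the order listed. This sketch rests on that assumption and is not how Rasa's graph engine is implemented. naive_tokenize is a hypothetical tokenizer that keeps each parenthesis as its own token.

```python
import re
from typing import Any, Callable, List


def strip_parentheses(text: str) -> str:
    # Same cleanup as the custom Preprocess component.
    return text.replace("(", "").replace(")", "")


def naive_tokenize(text: str) -> List[str]:
    # Hypothetical tokenizer: runs of word characters/hyphens,
    # plus each parenthesis as a separate token.
    return re.findall(r"[\w-]+|[()]", text)


def run_pipeline(text: str, steps: List[Callable[[Any], Any]]) -> Any:
    # Apply the steps in order, feeding each one the previous output.
    result: Any = text
    for step in steps:
        result = step(result)
    return result


if __name__ == "__main__":
    utterance = "Davin (2000-2022) beckingham"
    print(run_pipeline(utterance, [naive_tokenize]))
    # ['Davin', '(', '2000-2022', ')', 'beckingham']
    print(run_pipeline(utterance, [strip_parentheses, naive_tokenize]))
    # ['Davin', '2000-2022', 'beckingham']
```

With the preprocessing step first, the tokenizer never sees the parentheses. A cleanup step placed after tokenization would be too late to change the tokens already produced.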
For Rasa to find our custom component, we reference it by its Python module path: the module (script) name followed by the class name. The module must be importable when Rasa runs. In this tutorial, the script is in the same directory as the config.yml file, so Preprocessing_component is the name of our Python script and Preprocess is the class in which the functionality is implemented.
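Resolving a dotted name like Preprocessing_component.Preprocess comes down to importing the module part and looking up the class on it. The helper below, load_class, is our own illustration of that idea using importlib; Rasa has its own loading code. It also shows why Preprocessing_component.py has to be importable as a Python module.

```python
import importlib
from typing import Any


def load_class(dotted_path: str) -> Any:
    """Resolve "package.module.ClassName" to the class object (hypothetical helper)."""
    module_name, _, class_name = dotted_path.rpartition(".")
    if not module_name:
        raise ValueError(f"Expected '<module>.<ClassName>', got {dotted_path!r}")
    # The module is searched for on Python's import path (sys.path).
    module = importlib.import_module(module_name)
    return getattr(module, class_name)


if __name__ == "__main__":
    print(load_class("collections.OrderedDict"))
    # With Preprocessing_component.py importable, this would return our class:
    # load_class("Preprocessing_component.Preprocess")
```

If the script is not importable, import_module raises ModuleNotFoundError. If the class name is misspelled, getattr raises AttributeError.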
After adding this component to our configuration file, we need to retrain our model (for example with rasa train). The outcome of this new model can be seen below.
To confirm that the component actually removed the "()", look at the first two highlighted lines in the image above. The user input is "I am Davin (2000-2022) Beckingham", whereas the text used by Rasa is "I am Davin 2000-2022 Beckingham", without the "()". The entity is now also extracted correctly.
In this tutorial, we learned how to develop custom components and add them to our Rasa NLU pipeline. Many kinds of custom components are possible, but it's crucial to understand how each one interacts with the other components in the pipeline and what output it produces.
Github repo - https://github.com/Bavalpreet/RASA-BLOG/tree/main/custom_component