Natural Language Processing
Natural Language Processing (NLP) is an area of artificial intelligence concerned with enabling computers to understand natural language in text or speech (Adamopoulou, 2020). In a chatbot, NLP serves as the pre-processing stage: it transforms the question asked by the user into an input that the chatbot can process in order to produce a suitable output. The data transformed in this step serves two main purposes. First, it forms the base knowledge of the system, that is, its database. Second, it yields insights about the user that can be used in the processing step (Maroengsit, 2019).
NLP mainly comprises four techniques: pattern matching, parsing, term frequency-inverse document frequency (TF-IDF) and word2vec. Pattern matching is an AI technique that matches the query raised by the user, which is the input, against the database (Dahiya, 2017). Parsing takes the text input and breaks it into parts according to predefined rules, using algorithms such as left-to-right and bottom-up parsing. TF-IDF is a statistic that weighs how often a word occurs in a document against how common that word is across the whole collection of documents. Finally, word2vec converts a text corpus into numerical form by mapping its words into a vector space, which is then used to create a knowledge base (Maroengsit, 2019).
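The TF-IDF weighting described above can be illustrated with a minimal Python sketch. It assumes the simplest common formulation (relative term frequency multiplied by the logarithm of the inverse document frequency) and a naive whitespace tokenizer; the function names and sample questions are hypothetical and are not taken from the cited survey. Production libraries typically use smoothed variants of the same idea.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase the text and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def tf_idf(documents):
    """Return one {term: weight} dict per document.

    tf  = occurrences of the term in the document / tokens in the document
    idf = log(number of documents / number of documents containing the term)
    """
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)

    # Document frequency: in how many documents each term appears at least once.
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical user questions, used only for illustration.
docs = [
    "how do i reset my password",
    "how do i change my email",
    "where is my order",
]
scores = tf_idf(docs)
```

In this sketch a word that occurs in every document, such as "my", receives a weight of zero, while a word that occurs in only one document, such as "password", receives the highest weight in that document. This is why TF-IDF is useful for picking out the words that distinguish one user query from another.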
Natural Language Understanding (NLU) is a core part of NLP that underpins natural user interfaces such as chatbots. It aims to extract the meaning of natural language input, which may be unstructured (Adamopoulou, 2020).
In short, NLU is the processing stage in the working of a chatbot. This stage also comprises a series of components: intent classification, dialogue planning, named entity recognition, vector recognition with cosine similarity, lexicon and Long Short-Term Memory (LSTM) (Maroengsit, 2019).
The pre-processed input from the user first passes to intent classification, which determines the intention of the user so that the request can be processed against the knowledge base. Dialogue planning is responsible for managing communication with the user: it handles conversations with an individual user or with multiple users at the same time, and it handles switches of subject within the same conversation. Named entity recognition identifies and labels names, persons, locations and other such entities in the query raised by the user so that they can be used to create a knowledge base, for example in psychiatric counselling. Vector recognition with cosine similarity measures the content-based similarity of two non-zero vectors as the cosine of the angle between them. A lexicon is a kind of dictionary, a collection of words and their meanings, that helps the model predict and segment the input. Long Short-Term Memory (LSTM) is a kind of recurrent neural network (RNN) in machine learning. Its gated memory cell lets it retain information over both long and short spans of a sequence.
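As an illustration of vector recognition with cosine similarity, the following Python sketch compares a user query with a few stored questions and returns the closest one. It assumes a simple bag-of-words representation (word counts) and whitespace tokenization; the function names and sample questions are hypothetical, and a real chatbot would more likely compare TF-IDF or word2vec vectors.

```python
import math
from collections import Counter


def bag_of_words(text):
    """Represent a text as word counts (a sparse vector keyed by word)."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero sparse vectors."""
    # Only words present in both vectors contribute to the dot product.
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)


def best_match(query, known_questions):
    """Return the stored question whose word vector is closest to the query."""
    query_vec = bag_of_words(query)
    return max(
        known_questions,
        key=lambda q: cosine_similarity(query_vec, bag_of_words(q)),
    )


# Hypothetical stored questions, used only for illustration.
known = [
    "how do i reset my password",
    "where is my order",
    "how do i change my email",
]
match = best_match("i forgot my password", known)
```

Because the measure depends only on the angle between the vectors, a short query and a longer stored question can still score highly when they share the same words, which suits the matching of brief user inputs against a knowledge base.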
After the pre-processing stage (NLP) and the processing stage (NLU), the next stage generates the output for the input given by the user. This stage is known as Natural Language Generation (NLG). Its output indicates whether the chatbot has understood the input raised by the user and processed it appropriately against its knowledge base (Maroengsit, 2019).
References:
Adamopoulou, E. and Moussiades, L., 2020, June. An overview of chatbot technology. In IFIP International Conference on Artificial Intelligence Applications and Innovations (pp. 373-383). Springer, Cham.
Dahiya, M., 2017. A tool of conversation: Chatbot. International Journal of Computer Sciences and Engineering, 5(5), pp.158-161.
Maroengsit, W., Piyakulpinyo, T., Phonyiam, K., Pongnumkul, S., Chaovalit, P. and Theeramunkong, T., 2019, March. A survey on evaluation methods for chatbots. In Proceedings of the 2019 7th International Conference on Information and Education Technology (pp. 111-119).