Monitoraggio ed esplorazione dei contenuti dinamici utilizzando gli spazi vettoriali / Wang, Benyou. - (2022 Mar 11).

Monitoraggio ed esplorazione dei contenuti dinamici utilizzando gli spazi vettoriali

WANG, Benyou
2022

Abstract

In modern Natural Language Processing (NLP) and Information Retrieval (IR), individual words are typically embedded in a vector space, as so-called `word vectors' or `word embeddings', to enable differentiable optimization in neural networks. This gives rise to a new NLP paradigm in which individual words are processed directly by neural networks. A first issue with this paradigm is that the components of neural networks (such as word vectors and hidden states) usually do not carry any concrete physical meaning. One typical remedy is to use probabilities, as well-constrained quantities, to better interpret neural network components. The challenge for traditional probability theory is that it cannot treat words as atomic discrete events, since words are embedded as dense vectors that are not necessarily mutually orthogonal. This thesis proposes a novel framework based on Quantum Probability Theory (QPT), which defines probability axioms directly in vector space, to probabilistically ground word representation, semantic composition, and semantic abstraction in a unified space.

Another issue with the paradigm is that the inductive bias for learning word vectors relies only on the distributional hypothesis, \textit{linguistic items with similar distributions have similar meanings}, while other aspects are usually ignored. This thesis focuses on one of the most nontrivial such aspects, namely the spatially or temporally sequential aspect of words. The spatially sequential aspect refers to capturing the spatial positions of words in otherwise bag-of-words document encoders, while the temporally sequential aspect refers to mining time-specific word meanings in scenarios where word meanings may evolve over time. Interestingly, the complex-valued word embedding (with amplitude terms and phase terms) induced from QPT can naturally model sequences, both spatial and temporal, by directly encoding sequential order in the phase terms. The benefit is that, owing to the rotational nature of phases in waves, the sequential encoding remains bounded no matter how long the sequence or the dynamics is.

Furthermore, a by-product of the thesis is to bridge the gap between \textit{complex-valued word embeddings} and \textit{sinusoidal position embeddings}; the commonly used yet seemingly `magic' sinusoidal position embedding is thereby reinterpreted in a principled way: it is a real-valued variant of the proposed complex-valued word embeddings. Beyond the spatial dimension, the thesis also explores sinusoidal embeddings in the temporal dimension, called `Word2Fun', to model the temporal evolution of words. Word2Fun is proven to be able to approximate any continuous evolution of word meaning.

The thesis implements the QPT framework with 1) a Quantum Probability Driven neural Network (QPDN) for document modeling, which achieves performance comparable to state-of-the-art (SOTA) approaches on text classification benchmarks; and 2) a further extension to text matching, called the `Complex-valued Network for Matching' (CNM), which achieves performance comparable to SOTA approaches on question answering (a typical matching task) benchmarks. This additionally shows the potential of complex-valued word embeddings for general document representation. For sequential modeling, the empirical studies also evidence the superiority of the complex-valued word embeddings in spatial sequence modeling and of Word2Fun in temporal sequence modeling.
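To make the two mechanisms summarized in the abstract more concrete, the following minimal NumPy sketches illustrate them under stated assumptions; they are illustrative only and are not the implementations used in the thesis (QPDN, CNM, or Word2Fun). The first sketch shows how quantum probability assigns well-formed probabilities to non-orthogonal word vectors: a toy document is represented as a density matrix, and the probability of a word event follows the Born rule. The toy vocabulary, dimension, and mixture weights are assumptions made for illustration.

import numpy as np

d = 4                                   # assumed toy embedding dimension
rng = np.random.default_rng(1)

def unit(v):
    return v / np.linalg.norm(v)

# Words as (generally non-orthogonal) unit vectors; a toy document as a density
# matrix, i.e. a probability-weighted mixture of projectors onto its word vectors.
word_vecs = {w: unit(rng.normal(size=d)) for w in ["bank", "river", "money"]}
weights = np.array([0.5, 0.3, 0.2])     # assumed mixture weights

rho = sum(p * np.outer(v, v) for p, v in zip(weights, word_vecs.values()))

# Born rule: probability of the word event given the document state rho.
# Each value lies in [0, 1], and the values sum to 1 over any orthonormal basis,
# even though the word vectors themselves need not be mutually orthogonal.
for w, v in word_vecs.items():
    print(w, float(v @ rho @ v))

The second sketch shows sequential order encoded in the phase of a complex-valued embedding, why the encoding stays bounded for arbitrarily long sequences, a real-valued variant built from the same phases in the spirit of sinusoidal position embeddings, and a Word2Fun-style embedding where a timestamp replaces the position so a word's vector becomes a function of time. Function names such as complex_embed and word2fun_embed, as well as the dimensions, are hypothetical.

import numpy as np

d, vocab_size = 8, 100                  # assumed embedding dimension and vocabulary size
rng = np.random.default_rng(0)

# Each word gets per-dimension amplitudes and angular frequencies; the token
# appearing at position pos is embedded as amplitude * exp(i * frequency * pos).
amplitudes  = rng.uniform(0.5, 1.0, size=(vocab_size, d))
frequencies = rng.uniform(0.0, np.pi, size=(vocab_size, d))

def complex_embed(word_id, pos):
    # Position rotates the phase; the amplitude (lexical content) is untouched.
    return amplitudes[word_id] * np.exp(1j * frequencies[word_id] * pos)

# Boundedness: a rotation never changes the magnitude, so the encoding stays
# bounded no matter how long the sequence is.
assert np.allclose(np.abs(complex_embed(3, 2)), np.abs(complex_embed(3, 10_000)))

def sinusoidal_variant(word_id, pos):
    # Real-valued variant: cos/sin of the same phases, in the spirit of the
    # sinusoidal position embeddings used in Transformers.
    phase = frequencies[word_id] * pos
    return np.concatenate([amplitudes[word_id] * np.cos(phase),
                           amplitudes[word_id] * np.sin(phase)])

def word2fun_embed(word_id, t):
    # Word2Fun-style temporal embedding (sketch): the timestamp t plays the role
    # of the position, so the word vector becomes a sinusoidal function of time.
    return amplitudes[word_id] * np.sin(frequencies[word_id] * t)

print(complex_embed(3, 2).shape, sinusoidal_variant(3, 2).shape, word2fun_embed(3, 1995.0).shape)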
English title: Dynamic Content Monitoring and Exploration using Vector Spaces
Date: 11 March 2022
Files in this product:
PhD_Thesis_benyou (22) (1).pdf (open access)
Description: thesis
Type: Doctoral thesis
Size: 3.15 MB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11577/3445087