About
Research and apply NLP/Machine Learning techniques to solve interesting problems. Enjoy…
Activity
-
As Microsoft marks its incredible 50th Anniversary, I find myself reflecting on my own journey with this amazing company. Having spent a decade at…
Liked by Wei Liu
-
Happy and excited for all the teams responsible for the launch of https://nova.amazon.com! Congrats! Read more at https://lnkd.in/erDXU9_j
Liked by Wei Liu
-
LLM and Human alignment with simple linear mapping
Shared by Wei Liu
Experience
Education
-
The University of Sheffield
-
Activities and Societies: CSSA (Chinese Students and Scholars Association)
Deployed, maintained, and upgraded the community website and discussion board.
-
-
Computer Science and Applications
Licenses & Certifications
Publications
-
Efficient Minimal Perfect Hash Language Models
LREC 2010
The availability of large collections of text has made it possible to build language models that incorporate counts of billions of n-grams. This paper proposes two new methods of efficiently storing large language models that allow O(1) random access and use significantly less space than all known approaches. We introduce two novel data structures that take advantage of the distribution of n-grams in corpora and make use of various numbers of minimal perfect hashes to compactly store language models containing full frequency counts of billions of n-grams using 2.5 bytes per n-gram and language models of quantized probabilities using 2.26 bytes per n-gram. These methods allow language processing applications to take advantage of much larger language models than was previously possible using the same hardware, and we additionally describe how they can be used in a distributed environment to store even larger models. We show that our approaches are simple to implement and can easily be combined with pruning and quantization to achieve additional reductions in the size of the language model.
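The abstract does not spell out the two data structures, so the following is only a toy sketch of the general idea it mentions: storing n-gram counts in an array indexed by a minimal perfect hash, so that a lookup is a single O(1) array access. The brute-force seed search, the 16-bit fingerprint check, and the names `build_table` and `lookup` are all assumptions made for illustration, not the paper's method.

```python
import hashlib


def _h(key: str, seed: int) -> int:
    """Deterministic 64-bit hash of `key` under a given seed."""
    digest = hashlib.blake2b(f"{seed}:{key}".encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def build_table(counts, max_seed=100_000):
    """Brute-force a seed that maps the n keys onto slots 0..n-1 without collisions.

    This naive search is only practical for tiny key sets (an assumption of this sketch).
    """
    keys = list(counts)
    n = len(keys)
    for seed in range(max_seed):
        slots = {_h(k, seed) % n for k in keys}
        if len(slots) == n:  # every key has its own slot: a minimal perfect hash
            table = [None] * n
            for k in keys:
                # Keep a 16-bit fingerprint instead of the n-gram text itself.
                table[_h(k, seed) % n] = (_h(k, -1) & 0xFFFF, counts[k])
            return seed, table
    raise RuntimeError("no collision-free seed found")


def lookup(ngram, seed, table):
    """Return the stored count, or None if the fingerprint does not match."""
    fingerprint, count = table[_h(ngram, seed) % len(table)]
    return count if fingerprint == (_h(ngram, -1) & 0xFFFF) else None


# Hypothetical bigram counts, used only to exercise the sketch.
counts = {"the cat": 12, "cat sat": 7, "sat on": 5, "on the": 30, "the mat": 3}
seed, table = build_table(counts)
print(lookup("on the", seed, table))  # 30
```

Because the n-gram strings are not stored, an unseen n-gram can occasionally match a fingerprint by chance and return a wrong count; a wider fingerprint lowers that risk at the cost of more bytes per entry.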
-
Professor or screaming beast? - Detecting Word Misuse in Chinese
LREC 2008
The Internet has become a very popular platform for communication around the world. However, because most modern computer keyboards are Latin-based, speakers of Asian languages (such as Chinese) cannot input characters (Hanzi) directly with these keyboards. As a result, methods for representing Chinese characters using the Latin alphabet were introduced. The most popular of these is the Pinyin input system. Pinyin is also called "Romanised" Chinese in that it phonetically resembles a Chinese character. Due to the highly ambiguous mapping from Pinyin to Chinese characters, word misuse can occur when using a standard computer keyboard, and more commonly so in internet chat-rooms or instant messengers, where the language used is less formal. In this paper we aim to develop a system that can automatically identify such anomalies, whether they are simple typos or intentional substitutions. After identifying them, the system should suggest the correct word to be used.
-
Chinese Text Classification without Automatic Word Segmentation
ALPIT
Due to the lack of word boundaries in Asian systems of writing, machine processing of these languages often involves segmenting text into word units. This paper tests the assumption that this segmentation is a necessary step for authorship attribution and topic classification tasks in Chinese, and demonstrates that it is not. We show extensive results for both tasks, considering both single words and short phrases as features, and examining the effect of document length on classification accuracy. Our experiments show that a naïve character bi-gram model of text performs as well as models generated using a state-of-the-art automatic segmenter.
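As a rough illustration of the character bi-gram representation mentioned in the abstract, the sketch below counts overlapping pairs of adjacent characters, which requires no word segmentation. The function name `char_bigrams`, the whitespace handling, and the sample sentence are assumptions; the paper's actual feature extraction and classifier are not described here.

```python
from collections import Counter


def char_bigrams(text: str) -> Counter:
    """Count overlapping character bi-grams, ignoring whitespace."""
    chars = [c for c in text if not c.isspace()]
    return Counter(a + b for a, b in zip(chars, chars[1:]))


# Hypothetical input: an 8-character sentence yields 7 overlapping bi-grams.
features = char_bigrams("我爱自然语言处理")
print(features["自然"])  # 1
```

The resulting counts could then be fed to any standard text classifier as a bag-of-features vector.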
Languages
-
English
Native or bilingual proficiency
-
Mandarin Chinese
Native or bilingual proficiency
-
Cantonese
Native or bilingual proficiency
Recommendations received
1 person has recommended Wei
More activity by Wei
-
After 14 years, it's almost time for a new chapter... I've had an incredible opportunity to grow, learn and work on some truly ground-breaking…
Liked by Wei Liu
-
BBC News Interview with Samantha Simmonds, following UK Prime Minister #RishiSunak's speech on #AI 🔒 Why Focus on risks now? AI is evolving fast…
Liked by Wei Liu
-
The weeks spent in the burning desert sun paid off! Excited to share that our publication "Stealthy Terrain-Aware Multi-Agent Active Search" has…
Liked by Wei Liu
-
Last night a team from across ComplyAdvantage took part in the J.P. Morgan Corporate Challenge in Battersea Park 🏃 With an average finish time of…
Liked by Wei Liu
-
When robots have GUTS, they are brave! Humbled and honoured that our work "GUTS: Generalised Uncertainty-Aware Thompson Sampling for Multi-Agent…
Liked by Wei Liu
-
NLP is getting more central in the tech industry and AI-based products. Especially with the support of language models powered by deep learning, the…
Liked by Wei Liu