Wei Liu

London, England, United Kingdom
848 followers 500+ connections

About

Research and apply NLP/Machine Learning techniques to solve interesting problems. Enjoy…

Articles by Wei

  • Don't ignore the bias in your ML system

    Just days before Christmas, I came across a Kaggle-like competition in China - CCF BDCI2019 Negative Financial Information and Entity Determination (金融信息负面及主体判定)…

  • On China's cashless payment

    I travel back to China every year to visit my family. Every time, I was overwhelmed by how fast things have changed, from…

Experience

  • Amazon

    London, England, United Kingdom

  • -

    London, United Kingdom

  • -

    London, United Kingdom

  • -

    London, United Kingdom

  • -

    London, United Kingdom

  • -

    London, United Kingdom

  • -

    Shenzhen, Guangdong, China

Education

  • The University of Sheffield

    -

    Activities and Societies: CSSA (Chinese Students and Scholars Association)

    Deployed, maintained, and upgraded the community website and discussion board.

  • -

    Computer Science and Applications

Publications

  • Efficient Minimal Perfect Hash Language Models

    LREC 2010

    The availability of large collections of text has made it possible to build language models that incorporate counts of billions of n-grams. This paper proposes two new methods of efficiently storing large language models that allow O(1) random access and use significantly less space than all known approaches. We introduce two novel data structures that take advantage of the distribution of n-grams in corpora and make use of various numbers of minimal perfect hashes to compactly store language models containing full frequency counts of billions of n-grams using 2.5 bytes per n-gram and language models of quantized probabilities using 2.26 bytes per n-gram. These methods allow language processing applications to take advantage of much larger language models than was previously possible using the same hardware, and we additionally describe how they can be used in a distributed environment to store even larger models. We show that our approaches are simple to implement and can easily be combined with pruning and quantization to achieve additional reductions in the size of the language model.

    Other authors
    • David Guthrie
  • Professor or screaming beast? - Detecting Word Misuse in Chinese

    LREC 2008

    The Internet has become a very popular platform for communication around the world. However, because most modern computer keyboards are Latin-based, speakers of Asian languages such as Chinese cannot input characters (Hanzi) directly with these keyboards. As a result, methods for representing Chinese characters using the Latin alphabet were introduced. The most popular of these is the Pinyin input system. Pinyin is also called “Romanised” Chinese in that it phonetically resembles a Chinese character. Due to the highly ambiguous mapping from Pinyin to Chinese characters, word misuse can occur when using a standard computer keyboard, and more commonly so in internet chat-rooms or instant messengers, where the language used is less formal. In this paper we aim to develop a system that can automatically identify such anomalies, whether they are simple typos or intentional substitutions. After identifying them, the system should suggest the correct word to be used.

  • Chinese Text Classification without Automatic Word Segmentation

    ALPIT

    Due to the lack of word boundaries in Asian systems of writing, machine processing of these languages often involves segmenting text into word units. This paper tests the assumption that this segmentation is a necessary step for authorship attribution and topic classification tasks in Chinese, and demonstrates that it is not. We show extensive results for both tasks, considering both single words and short phrases as features, and examining the effect of document length on classification accuracy. Our experiments show that a naïve character bi-gram model of text performs as well as models generated using a state-of-the-art automatic segmenter.
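
    As a rough illustration of what character bi-gram features look like for unsegmented Chinese text, here is a minimal sketch. It is not the paper's implementation: the function name `char_bigrams` and the sample sentence are assumptions made for this example, and a real classifier would feed these counts into a model of its choice.

```python
from collections import Counter


def char_bigrams(text: str) -> Counter:
    """Count overlapping character bi-grams, ignoring whitespace.

    No word segmentation is performed: every adjacent pair of
    characters becomes one feature.
    """
    chars = [c for c in text if not c.isspace()]
    return Counter(a + b for a, b in zip(chars, chars[1:]))


# Hypothetical sample sentence ("I love natural language processing").
features = char_bigrams("我爱自然语言处理")
for bigram, count in features.items():
    print(bigram, count)
```

    Because the features are plain adjacent character pairs, no automatic segmenter is needed before classification, which is the setting the abstract describes.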


Languages

  • English

    Native or bilingual proficiency

  • Mandarin Chinese

    Native or bilingual proficiency

  • Cantonese

    Native or bilingual proficiency
