    N-gram based language identification of individual words

    View/Open
    prasa2013-03.pdf (282.3Kb)
    Date
    2013
    Author
    Giwa, Oluwapelumi
    Davel, Marelie H.
    Abstract
    Various factors influence the accuracy with which the language of individual words can be classified using n-grams. We consider a South African text-based language identification (LID) task and experiment with two different types of n-gram classifiers: a Naïve Bayes classifier and a Support Vector Machine. Specifically, we investigate various factors that influence LID accuracy when identifying generic words (as opposed to running text) in four languages. These include: the importance of n-gram smoothing (Katz smoothing, absolute discounting and Witten-Bell smoothing) when training Naïve Bayes classifiers; the effect of training corpus size on classification accuracy; and the relationship between word length, n-gram length and classification accuracy. For the best variant of each of the two sets of algorithms, we achieve relatively comparable classification accuracies. The accuracy of the Support Vector Machine (88.16%, obtained with a Radial Basis function) is higher than that of the Naïve Bayes classifier (87.62%, obtained using Witten-Bell smoothing), but the latter result is associated with a significantly lower computational cost.
    Index Terms: text-based language identification, smoothing, character n-grams, Naïve Bayes classifier, support vector machine.
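The general idea of classifying a single word by its character n-grams can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes trigrams, a uniform prior over languages, and simple additive (add-alpha) smoothing as a stand-in for the Katz, absolute discounting and Witten-Bell methods named in the abstract. The class name `NgramNaiveBayes`, the boundary marker `#`, the labels `eng` and `afr`, and the toy word lists are all hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def char_ngrams(word, n):
    """Return the character n-grams of a word, padded with '#' boundary markers."""
    padded = "#" * (n - 1) + word.lower() + "#"
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


class NgramNaiveBayes:
    """Toy character n-gram Naive Bayes classifier for single words.

    Assumptions (not taken from the paper): uniform language prior and
    additive smoothing with parameter alpha.
    """

    def __init__(self, n=3, alpha=1.0):
        self.n = n
        self.alpha = alpha
        self.counts = {}    # language -> Counter of n-gram counts
        self.totals = {}    # language -> total number of n-grams seen
        self.vocab = set()  # all n-grams seen in any language

    def fit(self, words_by_language):
        for lang, words in words_by_language.items():
            counter = Counter(g for w in words for g in char_ngrams(w, self.n))
            self.counts[lang] = counter
            self.totals[lang] = sum(counter.values())
            self.vocab.update(counter)
        return self

    def log_score(self, word, lang):
        # One extra vocabulary slot reserves probability mass for unseen n-grams.
        v = len(self.vocab) + 1
        denom = self.totals[lang] + self.alpha * v
        return sum(
            math.log((self.counts[lang][g] + self.alpha) / denom)
            for g in char_ngrams(word, self.n)
        )

    def predict(self, word):
        # With a uniform prior, the best language maximises the n-gram log-likelihood.
        return max(self.counts, key=lambda lang: self.log_score(word, lang))


# Made-up toy training data; far too small for any real use.
toy_data = {
    "eng": ["the", "this", "that", "with", "water"],
    "afr": ["die", "hierdie", "daardie", "met", "water"],
}
model = NgramNaiveBayes(n=3).fit(toy_data)
print(model.predict("hierdie"), model.predict("this"))
```

Because every unseen n-gram receives the same small probability under additive smoothing, this sketch says nothing about how the smoothing methods compared in the paper behave; it only shows where a smoothing step enters the word-level score.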
    URI
    http://hdl.handle.net/10394/11525
    Collections
    • Conference Papers - Vaal Triangle Campus [84]
    • Faculty of Natural and Agricultural Sciences [4855]

    Copyright © North-West University
    Contact Us | Send Feedback
    Theme by 
    Atmire NV