twitter - How to train a MaxEnt classifier


[Project stack: Java, OpenNLP, Elasticsearch (datastore), twitter4j to read data from Twitter]

I intend to use the MaxEnt classifier to classify tweets. I understand that the initial step is to train the model. From the documentation I found that there is a GISTrainer-based train method to train the model. I have managed to put together a simple piece of code that makes use of OpenNLP's MaxEnt classifier to train the model and predict the outcome.

I have used two files, positive.txt and negative.txt, to train the model.

Contents of positive.txt:

positive
positive    best
positive    fantastic
positive    super
positive    fine
positive    nice

Contents of negative.txt:

negative    bad
negative    ugly
negative    worst
negative    worse
negative    sucks
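Each line in the two files above pairs a label with a sample. As a rough illustration of that layout, here is a minimal stdlib-only sketch that splits such lines into label and text. The class LabelledLines and its methods are hypothetical helpers, not part of OpenNLP, and the sketch assumes the first whitespace-delimited token is the label and the rest of the line is the sample.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper (not an OpenNLP class): reads "label<whitespace>sample" lines. */
class LabelledLines {

    /** One training example: a category label and its sample text. */
    static final class Sample {
        final String label;
        final String text;

        Sample(String label, String text) {
            this.label = label;
            this.text = text;
        }
    }

    /** Splits at the first run of whitespace; returns null when the line has no sample text. */
    static Sample parseLine(String line) {
        String[] parts = line.trim().split("\\s+", 2);
        if (parts.length < 2) {
            return null; // blank line, or a label with nothing after it
        }
        return new Sample(parts[0], parts[1]);
    }

    /** Parses every usable line and silently skips the rest. */
    static List<Sample> parseAll(List<String> lines) {
        List<Sample> samples = new ArrayList<>();
        for (String line : lines) {
            Sample sample = parseLine(line);
            if (sample != null) {
                samples.add(sample);
            }
        }
        return samples;
    }
}
```

Skipping label-only lines is a design choice made for this sketch; a stricter loader might reject the file instead.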

And the Java methods below generate the outcome.

import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;

import opennlp.tools.doccat.BagOfWordsFeatureGenerator;
import opennlp.tools.doccat.DoccatModel;
import opennlp.tools.doccat.DocumentCategorizer;
import opennlp.tools.doccat.DocumentCategorizerME;
import opennlp.tools.tokenize.SimpleTokenizer;
import opennlp.tools.tokenize.Tokenizer;

// FileUtil and CategoryDataStream are not shown in this post; they appear to be the project's own helper classes.

@Override
public void trainDataset(String source, String destination) throws Exception {
    File[] inputFiles = FileUtil.buildFileList(new File(source)); // picks up both positive.txt and negative.txt
    File modelFile = new File(destination);
    Tokenizer tokenizer = SimpleTokenizer.INSTANCE;
    CategoryDataStream ds = new CategoryDataStream(inputFiles, tokenizer);
    int cutoff = 5;
    int iterations = 100;
    BagOfWordsFeatureGenerator bowFG = new BagOfWordsFeatureGenerator();
    DoccatModel model = DocumentCategorizerME.train("en", ds, cutoff, iterations, bowFG);
    model.serialize(new FileOutputStream(modelFile));
}

@Override
public void predict(String text, String modelFile) {
    InputStream modelStream = null;
    try {
        Tokenizer tokenizer = SimpleTokenizer.INSTANCE;
        String[] tokens = tokenizer.tokenize(text);
        modelStream = new FileInputStream(modelFile);
        DoccatModel model = new DoccatModel(modelStream);
        BagOfWordsFeatureGenerator bowFG = new BagOfWordsFeatureGenerator();
        DocumentCategorizer categorizer = new DocumentCategorizerME(model, bowFG);
        double[] probs = categorizer.categorize(tokens);
        if (null != probs && probs.length > 0) {
            for (int i = 0; i < probs.length; i++) {
                // Print the probability for each category index.
                System.out.println("double[] probs index  " + i + " value " + probs[i]);
            }
        }
        String label = categorizer.getBestCategory(probs);
        System.out.println("label " + label);
        int bestIndex = categorizer.getIndex(label);
        System.out.println("bestIndex " + bestIndex);
        double score = probs[bestIndex];
        System.out.println("score " + score);
    } catch (Exception e) {
        e.printStackTrace();
    } finally {
        if (null != modelStream) {
            try {
                modelStream.close();
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
    }
}

public static void main(String[] args) {
    try {
        String outputModelPath = "/home/**/sd-sentiment-analysis/models/trainpostive";
        String source = "/home/**/sd-sentiment-analysis/sd-core/src/main/resources/datasets/";
        MaximunEntropyClassifier me = new MaximunEntropyClassifier();
        me.trainDataset(source, outputModelPath);
        me.predict("this bad", outputModelPath);
    } catch (Exception e) {
        e.printStackTrace();
    }
}
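The end of the predict method picks the category with the highest probability and then looks up its score. For readers who want to see that selection step in isolation, here is a minimal stdlib-only sketch. The class BestCategory and the arrays passed to it are hypothetical; it assumes the probabilities are index-aligned with the category names and makes no claim about how OpenNLP implements this internally.

```java
/** Hypothetical illustration of choosing the highest-scoring category. */
class BestCategory {

    /** Returns the index of the largest probability, or -1 for an empty array. */
    static int argMax(double[] probs) {
        int best = -1;
        for (int i = 0; i < probs.length; i++) {
            if (best < 0 || probs[i] > probs[best]) {
                best = i;
            }
        }
        return best;
    }

    /** Returns the category name at the best index, or null when there are no scores. */
    static String bestLabel(String[] categories, double[] probs) {
        int index = argMax(probs);
        return index < 0 ? null : categories[index];
    }
}
```

With two categories, a near 50/50 split in the scores is worth logging, since a tiny training set like the one above gives the model very little to separate the classes with.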

I have the following questions.

1) How do I iteratively train the model? Also, how do I add new sentences/words to the model? Is there a specific format for the data file? I found that the file needs to have a minimum of two words separated by a tab. Is my understanding valid?
2) Are there any publicly available data sets that I can use to train the model? I found some sources for movie reviews. The project I'm working on involves not just movie reviews but also other things such as product reviews, brand sentiments etc.
3) This helps to an extent. Is there a working example somewhere publicly available? I couldn't find the documentation for MaxEnt.

Please help me out. I'm kind of blocked on this.

1) You can store the samples in a database. I used Accumulo once for this. Then at some interval you rebuild the model and reprocess your data.
2) The format is: categoryname, space, sample, newline. No tabs.
3) It sounds like you want to combine general sentiment with a topic or entity. You could use a name finder or a regex to find the entity, or you could add the entity to your class labels for the doccat, so that they include a product name etc., and then your samples would have to be specific
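Points 1 and 2 of the answer can be sketched together. The following is a minimal stdlib-only illustration, not the answerer's code: SampleStore is a hypothetical in-memory stand-in for the database, and the count-based rebuild trigger is an assumption made here (the answer only says "at some interval"). Retraining itself is left to the caller.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical in-memory stand-in for a database of training samples. */
class SampleStore {
    private final List<String> lines = new ArrayList<>();
    private final int rebuildEvery; // assumed policy: rebuild after this many new samples
    private int addedSinceRebuild = 0;

    SampleStore(int rebuildEvery) {
        this.rebuildEvery = rebuildEvery;
    }

    /** Formats one sample as "categoryname sample" on a single line, with no tabs. */
    static String toLine(String category, String sample) {
        // Collapse tabs, newlines and repeated spaces so each sample stays on one line.
        String cleaned = sample.trim().replaceAll("\\s+", " ");
        return category.trim() + " " + cleaned;
    }

    /** Stores a new labelled sample in training-line form. */
    void add(String category, String sample) {
        lines.add(toLine(category, sample));
        addedSinceRebuild++;
    }

    /** True once enough new samples have arrived to justify retraining. */
    boolean rebuildDue() {
        return addedSinceRebuild >= rebuildEvery;
    }

    /** Returns a copy of the full training set and resets the counter. */
    List<String> snapshotForRebuild() {
        addedSinceRebuild = 0;
        return new ArrayList<>(lines);
    }
}
```

A caller would check rebuildDue() after each add, write the snapshot out as the training file, retrain on the whole set, and then reclassify the stored tweets with the new model.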

