Estimating class probabilities with hierarchical random forest models in R


I am using a random forest classifier (in R) to predict the spatial distribution of multiple native plant communities, using a variety of environmental variables as predictors. The classification system is hierarchical, with each successive level becoming more detailed in its class descriptions. For example, suppose a hierarchical classification system with 2 levels, where the upper level consists of 2 classes: forest (F) and grassland (G). Let's say that at the second level, each forest and grassland class is composed of 2 subclasses (F1, F2 and G1, G2). Using the forest class as an example, the subclasses might be conifer or deciduous forests.

I know this is pretty basic so far, but here's the challenge I've run into. I'd like to predict the spatial distribution of these classes at the finest classification level for which there is enough environmental variation to achieve acceptable accuracy. To reduce variability, I can train multiple random forest models: the first model (model #1) operates at the uppermost level, classifying observations as either F or G. At the second level, I subset the data into 2 groups based on the F/G class and train 2 models (models #2 and #3), each classifying its subset into the respective subclasses.
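As a sketch of this stacked structure: the question uses R's randomForest, but the same idea in Python with scikit-learn and made-up data (class labels, predictor count, and sample size are all illustrative assumptions) looks roughly like this:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))           # hypothetical environmental predictors
level1 = rng.choice(["F", "G"], 200)    # level-1 labels: forest vs. grassland
# level-2 labels: F1/F2 or G1/G2, nested inside the level-1 class
level2 = np.array([c + rng.choice(["1", "2"]) for c in level1])

# Model #1: classify F vs. G on all observations
m1 = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, level1)

# Models #2 and #3: one subclass model per level-1 class, trained on that subset
sub_models = {}
for cls in ["F", "G"]:
    mask = level1 == cls
    sub_models[cls] = RandomForestClassifier(
        n_estimators=100, random_state=0
    ).fit(X[mask], level2[mask])
```

To predict a new observation you would run it through `m1`, then through the subclass model matching the level-1 prediction.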

Using these stacked models, I'd like to predict the class probability of a new observation. With random forests, this value is the number of trees voting for a particular class divided by the number of trees in the forest. For a single new observation, the summarized random forest output might be:

Level 1 (model #1)
- F, G = 80, 20

Level 2 (models #2 and #3)
- F1, F2 = 80, 20
- G1, G2 = 70, 30
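If the levels are treated as a chain of conditional probabilities, so that for example P(F1) = P(F) × P(F1 | F), the combined probabilities for the votes above can be computed directly. A minimal sketch in Python (the question is in R, but the arithmetic is language-agnostic):

```python
# Combine hierarchical vote proportions via the chain rule:
# P(subclass) = P(parent class) * P(subclass | parent class)
level1 = {"F": 0.80, "G": 0.20}            # model #1 vote proportions
level2 = {"F1": 0.80, "F2": 0.20,          # model #2 votes (given F)
          "G1": 0.70, "G2": 0.30}          # model #3 votes (given G)

# Each subclass name starts with its parent class letter ("F1"[0] == "F")
combined = {sub: round(level1[sub[0]] * p, 4) for sub, p in level2.items()}
print(combined)  # {'F1': 0.64, 'F2': 0.16, 'G1': 0.14, 'G2': 0.06}
```

Note that the combined values still sum to 1, and F1 remains the most probable class, but its combined probability (0.64) is lower than either of the raw 0.80 votes taken alone.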

The output suggests that the new observation is a forest with subclass F1, but how confident can I be that F1 is the correct class?

My questions: firstly, is there an appropriate method for calculating the combined probability of a new observation being F1 given this modeling structure? Secondly, if so, how? (I suspect some sort of Bayesian approach using the upper-level probabilities as priors might work, but I'm far from proficient in Bayesian statistics.)

I apologize for the verbosity and for not posting actual data/code (it's hard to extract something both succinct and representative of the issues, given the dataset). Thanks!

I'm working on a similar issue and have codified an R package that runs randomForest as a local classifier along a pre-defined class hierarchy. You can find it on R-Forge under 'hie-ran-forest'. The package includes 2 ways to turn the local probabilities into a crisp class:

  1. Stepwise majority rule: choose the class with the highest proportion of votes in the level 1 model, then choose the class with the highest proportion of votes in the corresponding second-level model.
  2. Multiplicative majority rule: multiply the probabilities (proportions of votes) down the class hierarchy and choose the class with the highest multiplicative proportion of votes.

In the example you provided, both methods end up with F1. Yet consider these values:

F, G   = 0.6,  0.4
F1, F2 = 0.6,  0.4
G1, G2 = 0.95, 0.05

The stepwise majority rule chooses F1 (F in model 1 and F1 in model 2), while the multiplicative rule chooses G1, since

0.4 * 0.95 (G1) > 0.6 * 0.6 (F1) > 0.6 * 0.4 (F2) > 0.4 * 0.05 (G2)
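The two rules can be sketched in a few lines (hie-ran-forest itself is an R package; this is just a Python illustration of the two rules applied to the vote proportions above):

```python
# Stepwise vs. multiplicative majority rules on the example vote proportions
level1 = {"F": 0.6, "G": 0.4}
level2 = {"F": {"F1": 0.6, "F2": 0.4},
          "G": {"G1": 0.95, "G2": 0.05}}

# Stepwise: pick the winning level-1 class, then the winning subclass within it
top = max(level1, key=level1.get)
stepwise = max(level2[top], key=level2[top].get)

# Multiplicative: multiply down the hierarchy, pick the overall maximum
combined = {sub: level1[parent] * p
            for parent, subs in level2.items()
            for sub, p in subs.items()}
multiplicative = max(combined, key=combined.get)

print(stepwise, multiplicative)  # the two rules disagree here: F1 vs. G1
```

Here the stepwise rule commits to F at level 1 and never sees G1's very strong 0.95 vote, while the multiplicative rule lets it win overall (0.38 vs. 0.36).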

I don't think there is one 'correct' option, and in general I find that the 2 methods reach similar accuracy levels. Stepwise is more sensitive to misclassification near the root of the tree; yet when model 1 is correct, it tends to make less 'serious' misclassifications. On the other hand, multiplicative is less sensitive to the results of any specific local classifier, but it is sensitive to the depth of the class hierarchy and the number of siblings in each local classifier.

