Right now our development team is busy preparing a conference release of RTextTools for The 4th Annual Conference of the Comparative Policy Agendas Project at the University of Catania in Sicily. One of the key issues we've had thus far is memory consumption with very large datasets.
In the past week we've pushed out a slew of updates that allow the support vector machine and maximum entropy algorithms to run with low memory requirements, even on very large datasets. Unfortunately, not all the algorithms used in RTextTools support the changes we've made, so this leaves us with a two-algorithm ensemble for low-memory classification. However, SVM and maxent tend to be the most accurate algorithms in our tests, meaning that a large ensemble isn't necessary to get high consensus accuracy; a sketch of this two-algorithm workflow follows below.
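For anyone who wants to try the low-memory ensemble, here is a minimal sketch of the standard RTextTools workflow restricted to SVM and maxent. The data file, column names, and train/test split sizes are hypothetical placeholders; adjust them to your own data.

library(RTextTools)

# Hypothetical dataset with a 'text' column and a 'topic' label column
data <- read.csv("training_data.csv", stringsAsFactors = FALSE)

# Build the document-term matrix
doc_matrix <- create_matrix(data$text, language = "english",
                            removeNumbers = TRUE, stemWords = TRUE)

# Create a container: first 4000 documents for training, the rest for testing
container <- create_container(doc_matrix, data$topic,
                              trainSize = 1:4000, testSize = 4001:5000,
                              virgin = FALSE)

# Train only the two low-memory algorithms: SVM and maximum entropy
models <- train_models(container, algorithms = c("SVM", "MAXENT"))

# Classify the test set and summarize per-algorithm and consensus accuracy
results <- classify_models(container, models)
analytics <- create_analytics(container, results)
summary(analytics)

The create_analytics() summary reports both individual algorithm performance and ensemble agreement, so you can check whether the two-algorithm consensus reaches the accuracy you need.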