Waikato Environment for Knowledge Analysis
Excelsior JET Case Study
By: Fran Supek, BSc

Waikato Environment for Knowledge Analysis (Weka) is an open-source collection of machine learning algorithms for data mining tasks, maintained by the University of Waikato, New Zealand. The software is a widely accepted standard in the field and is commonly used in a variety of applications, ranging from biomedical to financial data analysis. Weka is written in the Java programming language and is normally run under a Java Virtual Machine. Parts of the software are very computationally intensive and might benefit from a native machine code implementation, such as a JET-compiled executable.

We examined the computation times for four common tasks in Weka 3.5.1 on a medium-sized dataset (the file "segment-challenge.arff" included with Weka) representing an image recognition problem. The dataset contains 1500 instances, divided into 7 classes and described by 19 attributes each. The first two tasks involved training an SVM classifier with a linear or a radial kernel (-R option); all other parameters were left at their default values. The third task used a Random Forest classifier with the size of the forest set to 100, and the fourth task involved attribute ranking using ReliefF with default settings. All tasks were set to use ten-fold cross-validation.

Testing was performed on an AMD Athlon 64 2800+ with 1 GB of RAM running Windows XP SP2 (32-bit). The "Client VM" and "Server VM" are from Sun Java SDK 1.5.0_05. An "-Xmx700m" flag was added to the command line to expand the memory available to the Sun Java VMs. The JET executable was created using "Moderate" as the function inlining level and the "Classic" compiler setting.

The following table contains the time in seconds Weka needs to perform each task. The median of three measurements is shown (individual measurements agreed very well).
The chart shows the speedup relative to the default setting, the Sun Java Client VM; for instance, a bar of height 2 indicates computation that is twice as fast.
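The two quantities used in this comparison, the median of three timings and the speedup relative to the Client VM, can be sketched in a few lines of Java. This is only an illustration of the arithmetic: the class name `SpeedupSketch`, the helper methods, and the timing values are all hypothetical and are not the measurements from this study.

```java
import java.util.Arrays;

class SpeedupSketch {

    /** Returns the median of the given timings (in seconds). */
    static double median(double... seconds) {
        double[] sorted = seconds.clone();
        Arrays.sort(sorted);
        int mid = sorted.length / 2;
        return sorted.length % 2 == 1
                ? sorted[mid]
                : (sorted[mid - 1] + sorted[mid]) / 2.0;
    }

    /** Speedup of a candidate over the baseline; 2.0 means twice as fast. */
    static double speedup(double baselineSeconds, double candidateSeconds) {
        return baselineSeconds / candidateSeconds;
    }

    public static void main(String[] args) {
        // Hypothetical timings in seconds, NOT the values measured in this study.
        double clientVm = median(40.0, 39.5, 40.5);
        double jet = median(20.0, 20.5, 19.5);

        // The ratio of the two medians is what a bar in the chart represents.
        System.out.println("Client VM median (s): " + clientVm);
        System.out.println("JET median (s):       " + jet);
        System.out.println("Speedup over Client VM: " + speedup(clientVm, jet));
    }
}
```

With these made-up inputs the speedup is 2.0, which would correspond to a bar of height 2. A value below 1 would mean the candidate is slower than the Client VM.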
Probably the most commonly used (and very powerful) classifier in Weka is its implementation of the SVM (Support Vector Machine) using a radial basis function kernel; here, the JET-compiled executable is over twice as fast as the Sun Client VM. The simpler linear kernel SVM is over 7 times faster, and the Random Forest, which repeatedly builds C4.5 decision trees, improves by some 40% over the Sun Client VM. The ReliefF attribute ranking scheme is actually a bit slower on the JET-compiled executable. Weka startup time and the responsiveness of its Explorer GUI are difficult to measure, but my impression is that both improve significantly when using the JET-compiled executable.

In conclusion, some commonly used and computationally intensive parts of the Weka machine learning software experience a 2- to 7-fold increase in speed when using the Excelsior JET-compiled executable in comparison to the Sun Client VM.
© 1999-2007 Excelsior LLC. All Rights Reserved.