Excelsior Forums
cheble

Performance Bug with JET Built EXE compared to Java Runtime


Hello,

I am having issues with my JET builds. When I run my Java program built with JET, performance is much worse than when I run it with Java 1.8.0_65. The built exe takes 4 to 5 times as long.

I have included a sample java program that demonstrates the issue. The general idea of the program is to read bytes from a file and parse them into two float arrays. The sample program hogs some memory (about 2GB) to simulate my actual (more complex) program and runs a couple of methods many times. The bug only seems to show up in my small sample program when I have the code using the 2GB of memory. An output file is created with the following lines:

* the number of bytes read from the file

* the total time for many repetitions of my readData() method

* the total time for many repetitions of Arrays.copyOfRange()

I ran this sample program with the JRE included in jdk1.8.0_65 and then ran a JET build of the program. The readData() times reported by the JET build were about 5 times longer than with the JRE.
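For readers without the attachment, the kind of harness described above can be sketched roughly as follows. This is not the attached program: the class name `TimingSketch`, the hand-rolled big-endian decoding in `readData()`, the interleaved layout of the two float arrays, and the optional ballast argument are all assumptions made for illustration.

```java
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;

public class TimingSketch {
    // Assemble a big-endian 32-bit int from four bytes (assumed byte order).
    static int readInt(byte[] d, int off) {
        return ((d[off] & 0xFF) << 24) | ((d[off + 1] & 0xFF) << 16)
             | ((d[off + 2] & 0xFF) << 8) | (d[off + 3] & 0xFF);
    }

    // Hypothetical stand-in for readData(): assumes the bytes hold
    // interleaved 4-byte floats, alternating between the two arrays.
    static void readData(byte[] data, float[] a, float[] b) {
        int pairs = data.length / 8;
        for (int i = 0; i < pairs; i++) {
            a[i] = Float.intBitsToFloat(readInt(data, i * 8));
            b[i] = Float.intBitsToFloat(readInt(data, i * 8 + 4));
        }
    }

    // Total nanoseconds spent in `reps` calls of readData().
    static long timeReadData(byte[] data, int reps) {
        float[] a = new float[data.length / 8];
        float[] b = new float[data.length / 8];
        long start = System.nanoTime();
        for (int r = 0; r < reps; r++) {
            readData(data, a, b);
        }
        return System.nanoTime() - start;
    }

    // Total nanoseconds spent in `reps` calls of Arrays.copyOfRange().
    static long timeCopyOfRange(byte[] data, int reps) {
        long start = System.nanoTime();
        for (int r = 0; r < reps; r++) {
            Arrays.copyOfRange(data, 0, data.length);
        }
        return System.nanoTime() - start;
    }

    public static void main(String[] args) throws Exception {
        // args[0]: optional data file; synthetic zero bytes are used otherwise.
        byte[] data = args.length > 0
                ? Files.readAllBytes(Paths.get(args[0]))
                : new byte[16 * 1024];
        // args[1]: optional ballast in MB, kept reachable to simulate a large heap.
        int ballastMb = args.length > 1 ? Integer.parseInt(args[1]) : 0;
        byte[][] ballast = new byte[ballastMb][1 << 20];
        int reps = 1000; // arbitrary; raise it for more stable timings

        System.out.println("Bytes read: " + data.length);
        System.out.println("readData() total ns: " + timeReadData(data, reps));
        System.out.println("copyOfRange() total ns: " + timeCopyOfRange(data, reps));
        System.out.println("Ballast held (MB): " + ballast.length);
    }
}
```

Passing a ballast size of around 2000 as the second argument (with a matching heap limit) would approximate the memory pressure under which the slowdown is reported; without it the sketch only measures the parsing loop itself.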

Can someone try to reproduce these results using the attached Java code and help me with a fix?

I will attach a zip containing the Java file and a data file. The data file should be placed in the directory the program is run from, or its path passed as a program argument.

Thanks,

My Environment Specs:

Windows 10 Home

Intel i7 64-bit

16 GB RAM

Excelsior Jet for Windows 11.0 with Maintenance Pack 1

JETSampleProgram.zip


Hello,

Thank you very much for the provided example!

We have successfully reproduced the performance degradation. We need some time to analyze this issue.


We just encountered similar performance issues with Jet 11.3. Performance on memory-intensive tasks is about 5-10x slower than with JRE 1.8.0_121, and 2-5x slower than with Jet 9. This happens when memory use exceeds 1-1.5 GB. Even worse, we observe early OutOfMemory errors if virtual memory use is turned off. "Early" means that the errors occur while there is still plenty of free memory for heap growth (e.g. after using 780 MB with > 2 GB of free RAM).

There are some indications that the slow performance is also due to memory management issues. Object allocation and garbage collection are very fast for smaller data sets, but GC becomes significantly slower than with the JRE once memory use goes above 1-1.5 GB.
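Figures like "780 MB used" are easier to compare across runtimes when the program logs them itself. A minimal sketch using the standard `java.lang.Runtime` counters follows; the class name `HeapSnapshot` and the 64 MB test allocation are illustrative, and it is an assumption that a JET-compiled executable reports these values in a way directly comparable to the JRE.

```java
public class HeapSnapshot {
    // Summarize heap use: used = committed minus free within the committed heap.
    static String describe() {
        Runtime rt = Runtime.getRuntime();
        long mb = 1024L * 1024L;
        long committed = rt.totalMemory();
        long used = committed - rt.freeMemory();
        return String.format("used=%d MB, committed=%d MB, max=%d MB",
                used / mb, committed / mb, rt.maxMemory() / mb);
    }

    public static void main(String[] args) {
        System.out.println(describe());
        // Allocate 64 chunks of 1 MB and keep them reachable.
        byte[][] chunks = new byte[64][1 << 20];
        System.out.println(describe() + " after allocating " + chunks.length + " MB");
    }
}
```

Calling `describe()` just before the point where an OutOfMemoryError tends to appear would show whether the committed heap had stopped growing well below the reported maximum.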

Apparently, the performance issues for large data sets originally reported here are still present in the latest Jet version, even though you have had a sample program illustrating this issue. Can you please provide an update on this issue?

 



We have extensively studied the provided example and there are some points we need to explain:

    1. This example can be optimized in a simple way by wrapping the data in a ByteBuffer as follows:

buf = ByteBuffer.wrap(data)

    After this, the ByteBuffer.put() calls can be replaced with buf.getInt() and buf.getFloat(), which speeds up execution by 3-5 times (on both Oracle HotSpot and JET).


    2. Benchmark results depend on inlining settings. Our performance analysis team built an example (based on the one provided above) on which the x86-compiled application performs comparably to HotSpot when the inlining level is left at its default. However, when the inlining level is set to "Very Aggressive" in the JET Control Panel, the resulting executable outperforms HotSpot by a factor of 1.5 on the same benchmark.

Inlining decisions have a very large influence on performance, and an ahead-of-time compiler may fail to predict the best ones because it lacks execution profile information. The default value is chosen to suit the majority of applications, but it is reasonable to try different inlining levels for a particular application and compare the resulting performance.

    3. The Excelsior JET compiler for x86 has more optimizations implemented than the x64 compiler. So if the application uses less than 2 GB of RAM, it is worth compiling it with the x86 edition of Excelsior JET as well and comparing the performance of the two executables.


The upcoming release of Excelsior JET is devoted to performance improvements in both the x86 and x64 compilers. Note that the example provided by cheble is on our short list for performance analysis.
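The ByteBuffer suggestion in point 1 can be sketched as below. The layout of the data file is not described in this thread, so the interleaved pairs of big-endian floats, the class name `ByteBufferParseSketch`, and the method `parse()` are assumptions for illustration only.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

public class ByteBufferParseSketch {
    // Wrap the raw bytes once and read floats directly, instead of copying
    // bytes into a scratch buffer before every value.
    static void parse(byte[] data, float[] a, float[] b) {
        ByteBuffer buf = ByteBuffer.wrap(data); // big-endian by default
        int pairs = data.length / 8;            // assumed: two 4-byte floats per record
        for (int i = 0; i < pairs; i++) {
            a[i] = buf.getFloat();
            b[i] = buf.getFloat();
        }
    }

    public static void main(String[] args) {
        // Build a small sample in the assumed layout.
        byte[] data = ByteBuffer.allocate(16)
                .putFloat(1.5f).putFloat(2.5f)
                .putFloat(3.5f).putFloat(4.5f)
                .array();
        float[] a = new float[2];
        float[] b = new float[2];
        parse(data, a, b);
        System.out.println(Arrays.toString(a) + " " + Arrays.toString(b));
    }
}
```

If the file is little-endian, `buf.order(ByteOrder.LITTLE_ENDIAN)` (from `java.nio.ByteOrder`) would be needed before the reads.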

On January 27, 2017 at 5:01 AM, AlexandrFIlatov said:


We have extensively studied the provided example and there are some points we need to explain:

    1. This example can be optimized in a simple way by wrapping the data in a ByteBuffer as follows:

buf = ByteBuffer.wrap(data)

    After this, the ByteBuffer.put() calls can be replaced with buf.getInt() and buf.getFloat(), which speeds up execution by 3-5 times (on both Oracle HotSpot and JET).

Considering that this was example code provided to illustrate a problem, optimizing makes little sense. Furthermore, if the speed improvement is also seen with the Oracle JRE, then the performance difference would remain.

On January 27, 2017 at 5:01 AM, AlexandrFIlatov said:


    2. Benchmark results depend on inlining settings. Our performance analysis team built an example (based on the one provided above) on which the x86-compiled application performs comparably to HotSpot when the inlining level is left at its default. However, when the inlining level is set to "Very Aggressive" in the JET Control Panel, the resulting executable outperforms HotSpot by a factor of 1.5 on the same benchmark.

Inlining decisions have a very large influence on performance, and an ahead-of-time compiler may fail to predict the best ones because it lacks execution profile information. The default value is chosen to suit the majority of applications, but it is reasonable to try different inlining levels for a particular application and compare the resulting performance.

Unfortunately, using "Very Aggressive" inlining makes no difference for our code. Jet-compiled code is still between 2x and 5x slower. Performance of x86 Jet code is less important for us, since using RAM beyond the x86 limits is essential for our software. That said, we find similar performance differences for x86-compiled code when using about 1 GB of RAM. In two tests, performance was 1.5x slower than the JRE for one data set, and 6.5x slower for a second data set that was about 2x larger. The performance penalty alone is bad enough, but seeing it get worse in what looks like quadratic (N-squared) fashion with larger data sets is a big problem, too. We will have to stop using Jet for future releases of our software.


Our results on your benchmark, using the 64-bit versions of Excelsior JET (smaller is better):

data length: 18413

Excelsior JET 11.3 (default)
Total time for 200000 Repetitions: 37,115,176,783

Excelsior JET 12 (default)
Total time for 200000 Repetitions: 29,748,295,723

Excelsior JET 12 (PGO enabled)
Total time for 200000 Repetitions: 14,051,042,243

------------

Performance of Arrays.copyOfRange() across these versions is basically the same.

The results will tend to improve in future releases too.

