New Optimized apps for 64-bit Linux

mdoerner
Volunteer developer
Volunteer tester
Joined: 30 Jul 08
Posts: 202
Credit: 6,998,388
RAC: 0
Message 1037 - Posted: 8 Jun 2009, 13:11:39 UTC

Just tried the statically linked file on my 2.4.31 kernel, but got a segmentation fault because the kernel is too old. I may try to compile one for 2.4.X, but I don't think there's much demand for it anyway.

Mike D

ebahapo
Joined: 11 Sep 07
Posts: 7
Credit: 306,962
RAC: 0
Message 1038 - Posted: 8 Jun 2009, 15:11:36 UTC - in response to Message 1036.  

(though you should consider installing the required libraries in the future).

This library is from PathScale, so it's better if you link that library statically instead.

mdoerner
Volunteer developer
Volunteer tester
Joined: 30 Jul 08
Posts: 202
Credit: 6,998,388
RAC: 0
Message 1039 - Posted: 8 Jun 2009, 16:10:19 UTC - in response to Message 1038.  
Last modified: 8 Jun 2009, 16:36:32 UTC

(though you should consider installing the required libraries in the future).

This library is from PathScale, so it's better if you link that library statically instead.



Maybe, but that was the 1st instance of someone needing it. Quite a few others have used the executables before without needing that library.

Anyway, I may just compile an additional 64-bit statically linked -march=anyx86 executable and call it good. Processor-specific optimization doesn't seem to buy much on the Phenom; only the Athlon64 that TJM tested showed a jaw-dropping improvement. But I'm curious how the anyx86 build runs on other processors compared with the optimized versions.

If the anyx86 build literally runs on all platforms with a 2.6.X kernel, maybe we can make it the standard (at least on 32-bit)? That way even the anonymous users would benefit, and everything could get done sooner on the awgly100's. Granted, the default app runs on my Pentium MMX and this optimized app does not (because my old computer has a 2.4.31 kernel), but I'd think the loss of a few 32-bit Pentiums would be OK, since the vast majority of Linux users are on the 2.6.X kernel and are i686 (i.e. Pentium II) and higher. The productivity gain should make up for the loss of any Pentium computers out there working on this.

Mike Doerner

mdoerner
Volunteer developer
Volunteer tester
Joined: 30 Jul 08
Posts: 202
Credit: 6,998,388
RAC: 0
Message 1040 - Posted: 8 Jun 2009, 16:20:24 UTC - in response to Message 1039.  

OK, I've added a statically linked 64-bit version to the file, so now we have 32-bit and 64-bit AnyX86 files. I won't bother with the other optimizations unless a specific architecture needs one. Maybe Athlon64, if -march=anyx86 doesn't show the same improvement as -march=athlon64.

Mike D

mdoerner
Volunteer developer
Volunteer tester
Joined: 30 Jul 08
Posts: 202
Credit: 6,998,388
RAC: 0
Message 1045 - Posted: 9 Jun 2009, 19:05:13 UTC - in response to Message 1039.  
Last modified: 9 Jun 2009, 19:08:46 UTC

If the anyx86 build literally runs on all platforms with a 2.6.X kernel, maybe we can make it the standard (at least on 32-bit)? That way even the anonymous users would benefit, and everything could get done sooner on the awgly100's. Granted, the default app runs on my Pentium MMX and this optimized app does not (because my old computer has a 2.4.31 kernel), but I'd think the loss of a few 32-bit Pentiums would be OK, since the vast majority of Linux users are on the 2.6.X kernel and are i686 (i.e. Pentium II) and higher. The productivity gain should make up for the loss of any Pentium computers out there working on this.

Mike Doerner



OK, I finally got the results I wanted...

hceyz72_0_7143677_r0 45,566.99 seconds on an Intel Pentium Mobile MMX 233 MHz - default app 32-bit

hceyz72_0_6673872_r0 1,106.93 seconds on an AMD Phenom 9950 overclocked to 3.1 GHz - Open64 -Ofast -march=anyx86 app 64-bit
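For a rough sense of scale, here is a minimal sketch of the arithmetic behind those two numbers. The two runs were different work units on very different hardware and different builds, so the ratio is only indicative; the variable and function names are made up for illustration.

```python
# Run times copied from the two results quoted above, in seconds.
pentium_mmx_default = 45566.99  # Pentium Mobile MMX 233 MHz, default 32-bit app
phenom_open64 = 1106.93         # Phenom 9950 at 3.1 GHz, Open64 anyx86 64-bit app


def speedup(slow_seconds, fast_seconds):
    """Return how many times shorter the fast run was than the slow run."""
    return slow_seconds / fast_seconds


ratio = speedup(pentium_mmx_default, phenom_open64)
print(f"The Phenom run finished roughly {ratio:.0f} times sooner")
```

Put differently, on these figures one Phenom core gets through about as much work as a few dozen Pentium MMX machines, which is the point behind the question that follows.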


Does it still make sense for the minimum requirements on the project to be a Pentium?!?! I'd think Pentium II (i.e. i686) makes more sense, along with a requirement for a 2.6.X kernel (which I think everyone on a Linux box has, except for my Pentium Mobile MMX). The two static apps could then become the default apps for the Linux side of the project. What do you think, TJM?

Mike D

PS Is there a way for the BOINC downloads to determine whether a host is i686 or x86_64? Then you could actually get the hot-rodded apps onto the appropriate 32/64-bit OS on the anonymous users' machines.
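On the host itself, the kernel's machine string is one way to tell the two cases apart. The following is a minimal sketch only, not a description of how the BOINC scheduler decides: the function name is hypothetical, `i686-pc-linux-gnu` is the suffix used by the file names in this thread, and `x86_64-pc-linux-gnu` is assumed here to be its 64-bit counterpart.

```python
import platform


def boinc_linux_platform(machine):
    """Map a uname-style machine string to a BOINC-style Linux platform name.

    Returns None for anything other than the two cases discussed here.
    The mapping is an assumption made for illustration.
    """
    machine = machine.lower()
    if machine in ("x86_64", "amd64"):
        return "x86_64-pc-linux-gnu"
    if machine == "i686":
        return "i686-pc-linux-gnu"
    return None


# platform.machine() reflects the running OS, so a 64-bit CPU under a
# 32-bit kernel reports i686 and would be given the 32-bit app.
detected = platform.machine()
print(detected, "->", boinc_linux_platform(detected))
```

Because the check looks at the OS rather than the CPU, it matches the goal of getting the right app onto the right 32/64-bit install, not onto the most capable hardware.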

rpmrg
Joined: 12 Jun 09
Posts: 2
Credit: 451
RAC: 0
Message 1051 - Posted: 12 Jun 2009, 19:23:06 UTC
Last modified: 12 Jun 2009, 19:24:36 UTC

Amazing work for all AMD crunchers, but not all of them run Linux, and plenty have Phenom IIs running 64-bit Windows. Is a Windows compile difficult? All those CPUs are simply wasted on 32-bit binaries.

mdoerner
Volunteer developer
Volunteer tester
Joined: 30 Jul 08
Posts: 202
Credit: 6,998,388
RAC: 0
Message 1052 - Posted: 12 Jun 2009, 19:54:31 UTC - in response to Message 1051.  

Correct, I was referring to the Linux machines. Windows boxes are a lost cause anyways.....;-) They can do as they please.

Mike D

rpmrg
Joined: 12 Jun 09
Posts: 2
Credit: 451
RAC: 0
Message 1053 - Posted: 12 Jun 2009, 20:52:10 UTC

It's unfortunate that you rule out the Windows AMD users. It's a pity that someone who wants to use his 64-bit Windows box to crunch at BOINC doesn't deserve some optimized binaries to boost his units and justify his choice of AMD over Intel.

mdoerner
Volunteer developer
Volunteer tester
Joined: 30 Jul 08
Posts: 202
Credit: 6,998,388
RAC: 0
Message 1054 - Posted: 13 Jun 2009, 3:29:59 UTC - in response to Message 1053.  
Last modified: 13 Jun 2009, 3:36:00 UTC

It's unfortunate that you rule out the Windows AMD users. It's a pity that someone who wants to use his 64-bit Windows box to crunch at BOINC doesn't deserve some optimized binaries to boost his units and justify his choice of AMD over Intel.



I feel your pain, but the thing is that Open64 is only available on Linux (not AMD's fault either; they just optimized an existing compiler). TJM has tried several different compilers on the "dark side", so to speak, but MinGW (gcc for Windows) gave the best results, and he didn't have to re-write the code for Linux (always a bonus). Also, licensing agreements can sometimes restrict how binaries may be distributed if a free compiler only allows "personal use".

Since you are on a 64-bit windows platform, maybe you could download MinGW for Windows 64-bit here.....

Sourceforge MinGW 64-Bit

...and try compiling the source yourself, using the 64-bit flags? This version of MinGW is based on gcc 4.4.0, which, going by the 1st message in this thread, did show some improvement over the 32-bit apps TJM compiled for the project. Why not give it a shot? Then you can have an optimized 64-bit app for Windows. I'm assuming you're not a programmer, and that's no big deal; neither am I. If you run into issues compiling the source code, either TJM or I would be happy to assist you.

Then we can see whether Windows or Linux runs WU's faster on AMD architecture....;-)

Mike D

PS You could always run Linux at night and use Windows during the day. Set up a dual-boot option when you install a Linux distribution and you can have the best of both worlds (even I have a WinXP 32-bit partition on my hard drive, as Solidworks doesn't run well under WINE on Linux).

TJM
Project administrator
Project developer
Project scientist
Joined: 25 Aug 07
Posts: 843
Credit: 70,705,112
RAC: 390,347
Message 1055 - Posted: 13 Jun 2009, 9:17:06 UTC
Last modified: 13 Jun 2009, 9:17:32 UTC

I tried most (if not all) of the free compilers for Windows, and the MinGW executables are the fastest I built.
I was going to try the Intel C Compiler for Windows (it comes with a 30-day trial period), but I had serious problems setting it up to work with MS Visual Studio. When I finally managed to build the executable, its performance was poor, only slightly faster than the default app built with MS VS 2005. But I'm not sure the compiler was working properly, so it would be nice if someone with a working Intel compiler could verify that.
M4 Project homepage
M4 Project wiki

plonk420
Joined: 5 Jun 09
Posts: 10
Credit: 422,558
RAC: 0
Message 1173 - Posted: 29 Jul 2009, 18:55:46 UTC
Last modified: 29 Jul 2009, 19:03:14 UTC

is this all the linux app_info.xml needs, now?

<app_info>
    <app>
        <name>enigma_m4_2</name>
        <user_friendly_name>Enigma 0.76b-Opt</user_friendly_name>
    </app>
    <file_info>
        <name>wrapper_5.22_i686-pc-linux-gnu</name>
        <executable/>
    </file_info>
    <file_info>
        <name>enigma_0.76_i686-pc-linux-gnu</name>
        <executable/>
    </file_info>
    <file_info>
        <name>job_1.16.xml</name>
    </file_info>
    <app_version>
        <app_name>enigma_m4_2</app_name>
        <version_num>522</version_num>
        <file_ref>
            <file_name>wrapper_5.22_i686-pc-linux-gnu</file_name>
            <main_program/>
        </file_ref>
        <file_ref>
            <file_name>enigma_0.76_i686-pc-linux-gnu</file_name>
            <open_name>enigma_0.76_i686-pc-linux-gnu</open_name>
        </file_ref>
        <file_ref>
            <file_name>job_1.16.xml</file_name>
            <open_name>job.xml</open_name>
        </file_ref>
    </app_version>
</app_info>


does it even need that open_name for the executable?
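One thing that can be checked mechanically, whatever the answer on open_name, is that the names inside the file line up with each other. A minimal sketch, assuming only what is visible in the file above: every file_ref should point at a declared file_info, and the app_version should name a declared app. The function name and the checks are illustrative, not BOINC's own validation.

```python
import xml.etree.ElementTree as ET

# Condensed copy of the app_info.xml posted above (same names and values).
APP_INFO = """
<app_info>
  <app><name>enigma_m4_2</name></app>
  <file_info><name>wrapper_5.22_i686-pc-linux-gnu</name><executable/></file_info>
  <file_info><name>enigma_0.76_i686-pc-linux-gnu</name><executable/></file_info>
  <file_info><name>job_1.16.xml</name></file_info>
  <app_version>
    <app_name>enigma_m4_2</app_name>
    <version_num>522</version_num>
    <file_ref>
      <file_name>wrapper_5.22_i686-pc-linux-gnu</file_name><main_program/>
    </file_ref>
    <file_ref>
      <file_name>enigma_0.76_i686-pc-linux-gnu</file_name>
      <open_name>enigma_0.76_i686-pc-linux-gnu</open_name>
    </file_ref>
    <file_ref>
      <file_name>job_1.16.xml</file_name><open_name>job.xml</open_name>
    </file_ref>
  </app_version>
</app_info>
"""


def check_app_info(xml_text):
    """Return a list of naming problems; an empty list means the names agree."""
    root = ET.fromstring(xml_text)
    declared_files = {fi.findtext("name") for fi in root.findall("file_info")}
    declared_apps = {app.findtext("name") for app in root.findall("app")}
    problems = []
    for version in root.findall("app_version"):
        app_name = version.findtext("app_name")
        if app_name not in declared_apps:
            problems.append(f"app_version refers to undeclared app {app_name!r}")
        for ref in version.findall("file_ref"):
            file_name = ref.findtext("file_name")
            if file_name not in declared_files:
                problems.append(f"file_ref {file_name!r} has no matching file_info")
    return problems


print(check_app_info(APP_INFO) or "names are consistent")
```

A typo in one of the long file names is the easiest mistake to make when editing this file by hand, and this kind of check catches it before the client ever reads the file.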

mdoerner
Volunteer developer
Volunteer tester
Joined: 30 Jul 08
Posts: 202
Credit: 6,998,388
RAC: 0
Message 1174 - Posted: 30 Jul 2009, 11:19:57 UTC
Last modified: 30 Jul 2009, 11:21:20 UTC

Hi Kids,

Well, I start work on Monday, but before I get buried I wanted to see if I could "tweak" a little more speed out of the Open64 64-bit app, and I think I have....

conf-cc: opencc -Wall -W -Ofast -m64 -march=anyx86 -fomit-frame-pointer

conf-ld: opencc -fomit-frame-pointer -s -m64 -ipa -IPA:field_reorder=ON

...nothing special here. Anyx86 runs faster than Barcelona on my Phenom. The -IPA:field_reorder=ON seems to take about 40-100 seconds off the AWGLY_0 tasks since I've started using it. That optimization helps reduce cache misses, and seems to be helping so far. Putting it in both conf-cc (compile) and conf-ld (linking) didn't help, but leaving it only on the linking stage seems to have helped a little.

I've tried all the other -IPA options, but that one showed the only measurable improvement. If there are any other optimization flags I should try, please let me know.

Mike Doerner

TJM
Project administrator
Project developer
Project scientist
Joined: 25 Aug 07
Posts: 843
Credit: 70,705,112
RAC: 390,347
Message 1175 - Posted: 30 Jul 2009, 11:35:55 UTC - in response to Message 1174.  

Could you tell me which flags you used to build the Athlon 64 app? I've tried a lot of combinations and my fastest was much slower than yours. And I tried with a modified source code, which is a bit faster than the default.


mdoerner
Volunteer developer
Volunteer tester
Joined: 30 Jul 08
Posts: 202
Credit: 6,998,388
RAC: 0
Message 1176 - Posted: 30 Jul 2009, 13:25:04 UTC - in response to Message 1175.  
Last modified: 30 Jul 2009, 13:38:50 UTC

The only things I changed were the -march=anyx86 to -march=barcelona, -march=athlon, -march=athlon64, or -march=athlon64fx (or the other architectures, like core, wolfdale, etc.) depending on which processor I was compiling for. The only other thing I changed was -m64 to -m32 for the 32-bit versions.

PS I also switched from gcc 4.3 to gcc 4.1 for most of the apps I compiled before (opencc uses these to help compile). But I didn't think it made that much of a difference on time to process data.
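Taking that description at face value, the per-architecture variants can be written out from the two lines posted earlier. A minimal sketch: the templates are the conf-cc and conf-ld lines from the earlier post, the function name is hypothetical, and it is an assumption that -m32 replaces -m64 on the link line as well as the compile line.

```python
# Templates taken from the conf-cc / conf-ld lines posted earlier in the thread.
CC_TEMPLATE = "opencc -Wall -W -Ofast -m{bits} -march={march} -fomit-frame-pointer"
LD_TEMPLATE = "opencc -fomit-frame-pointer -s -m{bits} -ipa -IPA:field_reorder=ON"


def build_conf(march="anyx86", bits=64):
    """Return the conf-cc and conf-ld lines for one architecture and word size."""
    if bits not in (32, 64):
        raise ValueError("bits must be 32 or 64")
    return {
        "conf-cc": CC_TEMPLATE.format(bits=bits, march=march),
        # Assumption: the word-size flag is switched on the link line too.
        "conf-ld": LD_TEMPLATE.format(bits=bits),
    }


for march, bits in [("anyx86", 64), ("athlon64", 64), ("athlon64", 32)]:
    conf = build_conf(march, bits)
    print(f"# {march}, {bits}-bit")
    print("conf-cc:", conf["conf-cc"])
    print("conf-ld:", conf["conf-ld"])
```

Only the two values named in the post vary, so comparing builds comes down to changing one argument at a time and re-running the same work unit.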

Mike D

mdoerner
Volunteer developer
Volunteer tester
Joined: 30 Jul 08
Posts: 202
Credit: 6,998,388
RAC: 0
Message 1177 - Posted: 30 Jul 2009, 15:31:26 UTC - in response to Message 1175.  
Last modified: 30 Jul 2009, 15:31:49 UTC

Could you tell me which flags you used to build the Athlon 64 app? I've tried a lot of combinations and my fastest was much slower than yours. And I tried with a modified source code, which is a bit faster than the default.



Modified source code?!?!?! Care to share?:-D

Mike D

TJM
Project administrator
Project developer
Project scientist
Joined: 25 Aug 07
Posts: 843
Credit: 70,705,112
RAC: 390,347
Message 1178 - Posted: 2 Aug 2009, 16:23:09 UTC

It's the same source that I posted somewhere on the forum: some function calls were removed and replaced by the functions' code. It gives a small but noticeable performance boost, but the output executable is huge. At least on Intel processors there's a performance gain. gcc has serious problems building from this source when -fschedule-insns is used, and unfortunately without this option the performance on AMD processors is poor.


mdoerner
Volunteer developer
Volunteer tester
Joined: 30 Jul 08
Posts: 202
Credit: 6,998,388
RAC: 0
Message 1179 - Posted: 2 Aug 2009, 19:07:26 UTC - in response to Message 1178.  

That's like looking for a needle in a haystack. Can you email it to me? Thanks.

Mike Doerner

mdoerner
Volunteer developer
Volunteer tester
Joined: 30 Jul 08
Posts: 202
Credit: 6,998,388
RAC: 0
Message 1244 - Posted: 13 Sep 2009, 15:36:18 UTC

FWIW, my wife's old iMac came up with these results (G4 800 MHz) when I compiled the app with these settings. -O3 is apparently unstable according to the Gentoo Wiki page...

gcc -Wall -W -O2 -mcpu=7450 -pipe -maltivec -mabi=altivec

...and here are the results: not bad, but still pokey.

michael-doerners-imac:~/enigma_benchmark mdoerner$ ./start
2009-09-13 11:12:34 enigma: working on range ...
2009-09-13 11:30:27 enigma: finished range

real 17m52.103s
user 16m16.691s
sys 0m6.287s
michael-doerners-imac:~/enigma_benchmark mdoerner$
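To set that next to the run times in seconds posted earlier in the thread, the fields can be converted as follows. A minimal sketch with a hypothetical helper name; this is a standalone benchmark range rather than a BOINC work unit, so the figures are not directly comparable.

```python
import re


def time_to_seconds(field):
    """Convert a field such as '17m52.103s' from the shell's time output to seconds."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", field.strip())
    if match is None:
        raise ValueError(f"unexpected time format: {field!r}")
    return int(match.group(1)) * 60 + float(match.group(2))


real = time_to_seconds("17m52.103s")
user = time_to_seconds("16m16.691s")
system = time_to_seconds("0m6.287s")

print(f"wall clock: {real:.3f} s")
print(f"CPU time:   {user + system:.3f} s")
print(f"CPU share of wall clock: {(user + system) / real:.0%}")
```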

TJM, can I use the Linux wrappers to use this app with BOINC? Let me know.

Mike D

TJM
Project administrator
Project developer
Project scientist
Joined: 25 Aug 07
Posts: 843
Credit: 70,705,112
RAC: 390,347
Message 1245 - Posted: 13 Sep 2009, 15:47:50 UTC

The Linux wrapper won't work on a Mac, but you can always try to build the default one from the BOINC sources.

Team kizb
Joined: 26 Oct 11
Posts: 1
Credit: 11,783
RAC: 0
Message 2139 - Posted: 27 Oct 2011, 18:17:08 UTC

Would the new AMD FX-8150 Zambezi 3.6 GHz 8-core processor work well with BOINC? I'm considering building a dedicated BOINC computer and am looking for ideas.




Copyright © 2017 TJM