Linux x86_64 optimizing tips

oh2hyt

Joined: 14 Jul 09
Posts: 53
Credit: 705,427,365
RAC: 0
Message 2243 - Posted: 19 Mar 2012, 11:03:52 UTC
Last modified: 19 Mar 2012, 11:38:01 UTC

Here is what I did to build the fastest executable I could for Debian 6.0.4 running on an Intel Core 2 Quad Q9450 (Yorkfield, 2x6MB L2 cache) machine. These are just my findings, maybe not the whole truth. Read at your own risk.

First, some benchmark CPU times measured with TJM's eb.tgz. I hope he can give a working download link for it. The times are rough averages, estimated by eye over several runs, because the code uses random numbers and so the benchmark times vary.

~2m10s icc -O2 -xHost -no-prec-div -ipo [best found options]
~2m13s icc -O3 -xHost -no-prec-div -ipo [O3 for comparison]
~2m26s opencc -Ofast -m64 -march=core -LNO:prefetch=0 [best found options]
~2m50s gcc-4.4.5 with source package default flags.
~3m35s executable distributed by enigma boinc server.
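
The averaging over several runs can also be done by a small helper instead of by eye. This is only a sketch: `./run_benchmark` in the usage note is a hypothetical name (the real entry point of eb.tgz may differ), and it measures wall-clock seconds, not the CPU time quoted above.

```sh
#!/bin/sh
# time_runs N CMD...: run CMD N times and print the elapsed
# wall-clock time of each run in whole seconds, one per line.
time_runs() {
    runs="$1"; shift
    i=0
    while [ "$i" -lt "$runs" ]; do
        start=$(date +%s)
        "$@" >/dev/null 2>&1
        end=$(date +%s)
        echo $((end - start))
        i=$((i + 1))
    done
}

# mean_seconds: read one time per line on stdin and print the
# number of runs, the mean, and the fastest and slowest run.
mean_seconds() {
    awk 'NF {
             n++; sum += $1
             if (n == 1 || $1 < min) min = $1
             if (n == 1 || $1 > max) max = $1
         }
         END {
             if (n) printf "runs=%d mean=%.1f min=%.1f max=%.1f\n", n, sum / n, min, max
         }'
}
```

Usage would look like `time_runs 5 ./run_benchmark | mean_seconds`. Printing min and max next to the mean shows how much the random numbers make the runs vary.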

I would like to build with the icc -static option, but static linking with icc doesn't work with my gcc libs. I'm not going to build compatible libs now.

icc = Intel composer_xe_2011_sp1.9.293
opencc = Open64 5.0


How?
Get Intel® C++ Composer XE 2011 for Linux from
http://software.intel.com/en-us/articles/non-commercial-software-development/
Yes, you have to register, because that is how you get the activation serial number.

Install it on your distribution; Google can help.

Get the source of the enigma program that Enigma@Home uses, and extract it.
http://www.bytereef.org/enigma-suite.html
http://www.bytereef.org/software/enigma-suite-0.76.tar.gz

How I built and installed the new optimized executable:
0) As the icc installer tells you, run "source /opt/intel/bin/compilervars.sh intel64" in a bash shell to set up the compiler environment. (The compiler is installed under /opt.)
1) Because I tested a lot of things, I first ran "make" once.
2) Then I edited the files compile and load, changing gcc to icc and the optimization options/flags to those mentioned above. These should be near the best for all Intel CPUs.
3) Then I did the real build with "make clean; make".
4) Then I copied the resulting enigma executable over "projects/www.enigmaathome.net/enigma2_0.76_i686-pc-linux-gnu".
5) Finally I added a suitable app_info.xml. One can be found, for example, at http://chomikuj.pl/rakowskipw/boinc/enigma/opty+enigma .
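
The edit in step 2 could be scripted roughly like this. It is a sketch under a loud assumption: that the files compile and load name the compiler as the plain word gcc. The real layout of those files in enigma-suite may differ, and the helper name swap_cc is made up for illustration. It only swaps the compiler name; the optimization flags still have to be edited by hand.

```sh
#!/bin/sh
# swap_cc FILE: replace the first "gcc" on each line of FILE with "icc".
# A copy of the original is kept as FILE.bak.
# Assumption: FILE invokes the compiler as the word "gcc".
swap_cc() {
    file="$1"
    cp -p "$file" "$file.bak" || return 1
    # Writing back into the existing file keeps its permissions.
    sed 's/gcc/icc/' "$file.bak" > "$file"
}

# Usage, from inside the extracted source directory after the first "make":
#   swap_cc compile
#   swap_cc load
```

Keeping the .bak copies makes it easy to diff the result and to go back to the gcc build for comparison runs.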

Done.

-With icc, -Ofast and -O3 were slower than -O2. -ipo improved the times. -xHost was best, and I found no negative effect from SSE versions newer than SSE2. Actually, with the higher ones the result was faster.
-Profiling did not change the benchmark times, but in real use I feel there is a really small improvement.
-My unrolling attempts gave no improvement. Neither did inlining. So icc handles those fine with -O2.
-Building the source with #define SIMPLESCORE gave clearly worse times, so the loop blocking done in the default source is effective.
-Disabling software prefetch slowed it down. (With opencc it improved the times.)
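
Since -xHost targets the instruction sets of the machine doing the compiling, it can help to see which SSE levels that machine reports. A minimal sketch, assuming a Linux /proc/cpuinfo with a "flags" line (x86); the helper name has_cpu_flag is made up here.

```sh
#!/bin/sh
# has_cpu_flag FLAG [CPUINFO]: succeed if FLAG appears as a whole word
# in the first "flags" line of CPUINFO (default: /proc/cpuinfo).
has_cpu_flag() {
    flag="$1"
    cpuinfo="${2:-/proc/cpuinfo}"
    awk -v want="$flag" '
        /^flags[ \t]*:/ {
            for (i = 2; i <= NF; i++) if ($i == want) found = 1
            exit
        }
        END { exit found ? 0 : 1 }' "$cpuinfo"
}

# Report a few instruction set extensions of the build machine.
for f in sse2 ssse3 sse4_1 sse4_2 avx; do
    if has_cpu_flag "$f" 2>/dev/null; then
        echo "$f: yes"
    else
        echo "$f: no"
    fi
done
```

An executable built with -xHost on one machine may not run on a CPU that lacks one of those extensions, so this check matters if the binary is copied to other hosts.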

Then.. on the Q9450, cores 0 and 1 use a different L2 cache than cores 2 and 3. The Linux scheduler has a habit of moving a program around between cores, and so from one L2 to the other. To avoid this I made a small background script. Use it carefully.
#!/bin/sh

oldpids=""
# at first pass asap and rest every minute
while [ -z "$pids" ] || $(sleep `date '+(60 - %S.%N)' | bc`)
do
    pids="$(pgrep -U boinc enigma2)"
    if [ "$pids" != "$oldpids" ]
    then
        oldpids="$pids"
        echo "$pids" | awk '{if (NR<5) system("taskset -cp "NR-1" "$1)}' > /dev/null
        echo "`date +'%Y-%m-%d %H:%M:%S'` : $pids" | perl -pe 's/\n/ /g'; echo
    fi
done

(Why doesn't the forum code tag show indenting?)
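
Which cores share an L2 cache can be read from sysfs before deciding how to pin. This is a sketch assuming a kernel that exposes cache/index*/level and shared_cpu_list under /sys/devices/system/cpu; the function name l2_groups is made up here.

```sh
#!/bin/sh
# l2_groups [SYSROOT]: for every CPU, print the list of CPUs that
# share its level-2 cache, e.g. "cpu0: 0-1".
l2_groups() {
    root="${1:-/sys/devices/system/cpu}"
    for cpu in "$root"/cpu[0-9]*; do
        for idx in "$cpu"/cache/index*; do
            # Skip anything without a readable cache level.
            [ -r "$idx/level" ] || continue
            if [ "$(cat "$idx/level")" = "2" ]; then
                echo "${cpu##*/}: $(cat "$idx/shared_cpu_list")"
            fi
        done
    done
}

l2_groups
```

If the layout is as described above for the Q9450, cores 0 and 1 should report one shared list and cores 2 and 3 another. On other CPUs the grouping can differ, so check it before reusing the pinning script.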

Results in workunits: original 76min -> gcc 58min -> opencc 52-53min -> icc 51-52min.
oh2hyt

Message 2244 - Posted: 20 Mar 2012, 7:23:36 UTC
Last modified: 20 Mar 2012, 7:24:23 UTC

Well, after 5.31 rolls out to users this week, the "instructions" are no longer valid - they are only tips!
oh2hyt

Message 2245 - Posted: 20 Mar 2012, 9:49:24 UTC
Last modified: 20 Mar 2012, 10:19:53 UTC

Correction to the workunit results in the first post: original 76min -> gcc 58min -> opencc 52min -> icc 47min.
So icc is about 62% faster than the original (76/47 = 1.62).
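
The percentage can be read two ways, and the arithmetic is easy to check. A small sketch (the function names are made up for illustration): throughput gain is old/new - 1, while time saved per workunit is (old - new)/old.

```sh
#!/bin/sh
# speedup_pct OLD NEW: throughput gain in percent when a workunit
# goes from OLD minutes to NEW minutes.
speedup_pct() {
    awk -v t_old="$1" -v t_new="$2" 'BEGIN { printf "%.0f\n", (t_old / t_new - 1) * 100 }'
}

# time_saved_pct OLD NEW: reduction of the run time in percent.
time_saved_pct() {
    awk -v t_old="$1" -v t_new="$2" 'BEGIN { printf "%.0f\n", (t_old - t_new) / t_old * 100 }'
}

speedup_pct 76 47      # prints 62
time_saved_pct 76 47   # prints 38
```

So the same result is "62% more workunits per day" or "38% less time per workunit", depending on which way it is stated.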

TJM, would you mind sharing the 5.31 source? Enigma-Suite-0.76 is GPL2 anyway.
matajan

Joined: 19 Apr 10
Posts: 6
Credit: 851,215
RAC: 0
Message 2246 - Posted: 21 Mar 2012, 0:41:59 UTC

Hi TJM,

The old 0.76 version:

Name: enigma2_0.76_windows_intelx86.exe
CRC-32: 13807b06
MD4: 4ddc6b07ebe9726390e8bef7111f4147
MD5: 85a51a8f3b2e79b3680028193032dd53
SHA-1: 296d6efba7c883353fc5e00531ba4d2434b6dcb9


The new 5.32 version:

Name: enigma_5.32_windows_intelx86.exe
CRC-32: 13807b06
MD4: 4ddc6b07ebe9726390e8bef7111f4147
MD5: 85a51a8f3b2e79b3680028193032dd53
SHA-1: 296d6efba7c883353fc5e00531ba4d2434b6dcb9


Is this the SAME 0.76 file with a new name, or is this a mistake?
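
For anyone who wants to repeat the comparison on Linux, a sketch using the standard cmp and sha1sum tools (the function names are made up here):

```sh
#!/bin/sh
# same_content A B: succeed if the two files are byte-for-byte identical.
same_content() {
    cmp -s "$1" "$2"
}

# sha1_of FILE: print only the SHA-1 digest of FILE.
sha1_of() {
    sha1sum "$1" | awk '{ print $1 }'
}

# Usage:
#   same_content enigma2_0.76_windows_intelx86.exe enigma_5.32_windows_intelx86.exe \
#       && echo "identical"
```

A direct byte comparison answers the question without relying on any hash at all.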


Best regards, matajan.
TJM
Project administrator
Project developer
Project scientist

Joined: 25 Aug 07
Posts: 843
Credit: 267,994,998
RAC: 0
Message 2250 - Posted: 21 Mar 2012, 12:53:48 UTC - in response to Message 2246.  
Last modified: 21 Mar 2012, 13:00:32 UTC

Yep, the core is the same, only the wrapper has changed. It now uses functions taken directly from Stefan Krah's enigma to read/parse checkpoints.
Perhaps I'll use the 'plan class' to release basic optimized apps, but the problem is that most of the apps here were built for specific processors (some even with code tweaking), not for specific instruction sets.

The sources used to build the apps I provided were the stock ones from Stefan Krah's enigma-suite. Well, mostly, because there were 2-3 experimental versions with the code reorganised a bit, like a special build for Athlon 64 with manually unrolled loops. I doubt I still have the source, as mdoerner's app made it obsolete (it's much faster).
mdoerner
Volunteer developer
Volunteer tester

Joined: 30 Jul 08
Posts: 202
Credit: 6,998,388
RAC: 0
Message 2251 - Posted: 22 Mar 2012, 1:03:30 UTC - in response to Message 2250.  

Or was much faster....since I've regressed to Win7 64-bit I don't have the old apps I compiled back then. That and my ISP yanked my web space so I don't have the old .tgz files either. I'd use the flags I published in my earlier post as a starting point....then see if you can get better results.

Mike Doerner
oh2hyt

Message 2253 - Posted: 22 Mar 2012, 23:31:46 UTC

Yeah, right.. my script had one bug. The -z test of [ only looks at the string length. So when no enigmas are running, $pids stays empty, the test stays true, the sleep after || is never run, and the loop eats CPU nicely.

Fixed:
#!/bin/sh

oldpids=""
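# loop every minute, waking at the start of each wall-clock minute
while $(sleep `date '+(60 - %S.%N)' | bc`)
do
    # PIDs of enigma2 processes running as user boinc, one per line
    pids="$(pgrep -U boinc enigma2)"
    # act only when the set of PIDs has changed
    if [ "$pids" != "$oldpids" ]
    then
        oldpids="$pids"
        echo "`date +'%Y-%m-%d %H:%M:%S'` : $pids" | perl -pe 's/\n/ /g'; echo
        # pin the first four PIDs to cores 0-3, one core each
        echo "$pids" | awk '{if (NF==1 && NR<5) system("taskset -cp "NR-1" "$1)}' > /dev/null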
    fi
done
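
To check that the pinning took effect, the allowed CPU list of each process can be read back from /proc. A sketch assuming a kernel that reports Cpus_allowed_list in /proc/PID/status; the function names are made up for illustration.

```sh
#!/bin/sh
# allowed_cpus PID [PROCROOT]: print the CPUs the kernel allows PID to run on.
allowed_cpus() {
    awk '/^Cpus_allowed_list:/ { print $2 }' "${2:-/proc}/$1/status"
}

# report_affinity: list every enigma2 process of user boinc with its CPU list.
report_affinity() {
    for pid in $(pgrep -U boinc enigma2); do
        echo "$pid: $(allowed_cpus "$pid")"
    done
}
```

After the script has run, `report_affinity` should show a single core number per process instead of a range such as 0-3.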


Copyright © 2024 TJM