optimized applications

Message boards : Number crunching : optimized applications

Paul D Harris

Joined: 15 Feb 12
Posts: 4
Credit: 72,585
RAC: 0
Message 2213 - Posted: 20 Feb 2012, 22:09:36 UTC

Are there any optimized applications for this project? At SETI I use Lunatics, but I do not see any used here. I believe an optimized app would speed up my work units; a work unit currently takes about 3 hours to complete.
Ageless
Volunteer moderator
Volunteer tester
Joined: 11 Sep 07
Posts: 104
Credit: 155,932
RAC: 0
Message 2216 - Posted: 26 Feb 2012, 14:33:13 UTC - in response to Message 2213.  

Peciak
Joined: 27 Aug 09
Posts: 9
Credit: 117,918,807
RAC: 0
Message 2236 - Posted: 10 Mar 2012, 9:27:17 UTC

WINDOWS INTEL
WINDOWS AMD
LINUX

http://chomikuj.pl/rakowskipw/boinc/enigma/opty+enigma
Paul D Harris

Joined: 15 Feb 12
Posts: 4
Credit: 72,585
RAC: 0
Message 2239 - Posted: 14 Mar 2012, 2:16:02 UTC - in response to Message 2236.  

Thanks, got it installed; just waiting on a WU now.
Paul D Harris

Joined: 15 Feb 12
Posts: 4
Credit: 72,585
RAC: 0
Message 2240 - Posted: 17 Mar 2012, 1:14:27 UTC - in response to Message 2239.  

I am now doing a WU in 1.5 hrs; it took off about 1 hr.
Peciak
Joined: 27 Aug 09
Posts: 9
Credit: 117,918,807
RAC: 0
Message 3930 - Posted: 9 May 2016, 17:19:07 UTC

Agbar
Joined: 10 Sep 09
Posts: 28
Credit: 690,568
RAC: 0
Message 3933 - Posted: 15 May 2016, 14:03:24 UTC - in response to Message 3930.  
Last modified: 15 May 2016, 14:03:52 UTC

Great that you've written about my project here, but please don't redistribute Enigma-Optima. I would like to know how many people are using it, and the GitHub stats will be distorted by downloads from chomikuj.
[AF>Amis des Lapins] Oncle Bob

Joined: 24 Feb 13
Posts: 18
Credit: 55,194,685
RAC: 0
Message 3936 - Posted: 20 May 2016, 18:00:00 UTC

Thank you, I will test this app (x64 beta 3).

Is there a chance of getting an ARM-optimized app?
Agbar
Joined: 10 Sep 09
Posts: 28
Credit: 690,568
RAC: 0
Message 3937 - Posted: 21 May 2016, 16:06:31 UTC - in response to Message 3936.  

Thanks for your interest.

I would like to concentrate on x86 (preferably 64-bit). I don't know much about the ARM architecture, so preparing an ARM-optimized app would be a huge challenge, and potentially time-consuming. There are a few other factors too. In summary (not going into details): no ARM version in the foreseeable future.
[AF>Amis des Lapins] Oncle Bob

Joined: 24 Feb 13
Posts: 18
Credit: 55,194,685
RAC: 0
Message 3938 - Posted: 21 May 2016, 20:34:42 UTC

Well, I compared this optimized app and the previous one on my i7 2600K running at 4.2 GHz, using two series of 100 WUs.

It seems that Optima is almost 25% faster than the previous optimized app.

Thank you for your work, I have spread the word on the Alliance Francophone forum.
Agbar
Joined: 10 Sep 09
Posts: 28
Credit: 690,568
RAC: 0
Message 3939 - Posted: 21 May 2016, 22:26:41 UTC - in response to Message 3938.  

> It seems that Optima is almost 25% faster than the previous optimized app.

The speed gain depends on the CPU model you have. My teammates reported the biggest wins on AVX2-enabled CPUs (e.g. Intel Core i7-4770K).

If you encounter any problems (like trashing work units), please reach me here or on the BOINC@Poland forum (we have an English-language board) or, even better, on GitHub.
[AF>Amis des Lapins] Oncle Bob

Joined: 24 Feb 13
Posts: 18
Credit: 55,194,685
RAC: 0
Message 3940 - Posted: 22 May 2016, 17:41:34 UTC
Last modified: 22 May 2016, 17:41:51 UTC

Excellent, you included the latest features of modern CPUs (in this case AVX2).

I suppose that this app runs AVX code on Sandy Bridge (or similar CPUs from the early 2010s). What does it exploit on older CPUs? SSE4? Is there an improvement on older CPUs that don't have AVX?

By the way, you may warn users that this app will increase the power consumption and temperature of the CPU because it exploits AVX.
Agbar
Joined: 10 Sep 09
Posts: 28
Credit: 690,568
RAC: 0
Message 3942 - Posted: 23 May 2016, 22:48:05 UTC - in response to Message 3940.  

There are several upgrades to original code.

1. (no SSE) The first thing I realized was that in the hot path (decode + score) there are many multiplications. They are used mainly to calculate addresses in multidimensional arrays. At the cost of a few percent of extra space (I don't remember the exact number, but it is trivial to calculate), I changed the multiplication by 26 to a multiplication by 32. Multiplication by 32 can be done with a "shift left" instruction. On most modern (>= Pentium ;) processors a shift takes 1 cycle, while a multiplication takes at least 3 (up to Nehalem; Sandy Bridge has a faster multiplier, but that doesn't matter 'cause it has AVX).

Using a well-optimizing compiler like the one from Intel, I had the fastest optimized Enigma app at the time (around 3 years ago). This version is included for processors that don't support any of the vectorized versions of the code.

2. SSSE3 (notice the triple S, for Supplemental SSE3). Basically, I owned first-gen Core i7 and i5 machines, so I started tinkering to get something faster than the "basic" code. It wasn't that easy, because I needed to use intrinsics (though they are much easier to write than real assembly). However, it turned out that GCC emits undeniably poor code for __builtin_shuffle(16x8bit_vector,32x8bit_permutation) (which turned out to be the most important function to make it work), so I was forced to write it on my own. This faster shuffle is crucial to getting any considerable speed improvement over the "basic" code. In simple terms, while the "basic" code decodes one character after another, the SSSE3 code decodes a group of up to 16 characters at a time.
AVX does not include the required operations, so Sandy/Ivy Bridge processors execute the same (SSSE3) code.

3. AVX2: Haswell adds operations on 256-bit integer vectors (including the PSHUFB instruction). The code is a fairly simple extrapolation of the SSSE3 code to wider registers.
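To illustrate why the shuffle in point 2 matters, here is a portable Python model of a PSHUFB-style byte lookup. It is only a sketch: the names `pshufb` and `lookup32` are hypothetical, this is not the Enigma-Optima code, and real SSSE3 code would build the two masked index vectors with vector compare/add instructions rather than a loop. A single PSHUFB selects with only the low 4 bits of each index, so it can address 16 table entries; a 26-letter alphabet needs a 32-entry table, which can be covered by two shuffles whose results are merged.

```python
def pshufb(table, idx):
    """Scalar model of SSSE3 PSHUFB (_mm_shuffle_epi8) on 16 bytes.

    Output byte i is 0 when bit 7 of idx[i] is set; otherwise it is
    table[idx[i] & 0x0F]. Only the low 4 bits select, so one shuffle
    can address just 16 table entries.
    """
    assert len(table) == 16 and len(idx) == 16
    return [0 if b & 0x80 else table[b & 0x0F] for b in idx]


def lookup32(table, idx):
    """Hypothetical helper: look up 16 indices (each 0..31) in a 32-entry
    byte table using two 16-entry shuffles, i.e. the job a shuffle with a
    32-entry permutation has to do."""
    assert len(table) == 32 and all(0 <= b < 32 for b in idx)
    # Indices 16..31 get bit 7 set, so the low-half shuffle yields 0 for them.
    idx_lo = [b if b < 16 else 0x80 for b in idx]
    # Indices 0..15 get bit 7 set, so the high-half shuffle yields 0 for them.
    idx_hi = [b - 16 if b >= 16 else 0x80 for b in idx]
    lo = pshufb(table[:16], idx_lo)
    hi = pshufb(table[16:], idx_hi)
    # In every position one of the two results was forced to 0, so OR merges them.
    return [a | b for a, b in zip(lo, hi)]


# Example: an arbitrary 26-letter substitution padded to 32 entries.
perm = [ord(c) for c in "EKMFLGDQVZNTOWYHXUSPAIBRCJ"] + [0] * 6
text = "HELLOWORLDABCDEF"  # exactly 16 characters, one SSSE3 register's worth
idx = [ord(c) - ord("A") for c in text]
print(bytes(lookup32(perm, idx)).decode())
```

All 16 characters are substituted by one pair of shuffles, which is the "group of up to 16 characters at a time" idea, rather than by 16 separate table reads.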

There is still room for improvement. I am currently testing a "basic" version that is almost as fast as SSSE3 on my computer: less than 2% slower! Seriously, GCC would do a better job, because all I have done was to fight one "optimization". I believe it can be even faster.
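The padding trick from point 1, which the basic code relies on, can be sketched as follows. Python is used purely to show the index arithmetic (the cycle savings only appear in compiled code), and the 26 x 26 table and every name here are hypothetical; the actual array layout in Enigma-Optima may differ. Widening each row from 26 to 32 entries leaves 6 unused slots per row but turns `row * 26` into `row << 5`.

```python
ALPHABET = 26  # letters per row in the original layout
STRIDE = 32    # padded row length; a power of two, so row * 32 == row << 5


def index_unpadded(row, col):
    # Original layout: each row holds 26 entries, so the address needs a multiply.
    return row * ALPHABET + col


def index_padded(row, col):
    # Padded layout: each row holds 32 entries (6 unused), so a shift suffices.
    return (row << 5) + col


def pad_rows(flat26):
    """Copy a row-major table with 26-entry rows into one with 32-entry rows,
    filling the 6 extra slots of each row with zeros."""
    rows = len(flat26) // ALPHABET
    padded = [0] * (rows * STRIDE)
    for r in range(rows):
        for c in range(ALPHABET):
            padded[index_padded(r, c)] = flat26[index_unpadded(r, c)]
    return padded


# Example: a hypothetical 26 x 26 score table, looked up through both layouts.
scores = [(r * 31 + c * 7) % 100 for r in range(26) for c in range(26)]
padded = pad_rows(scores)
print(scores[index_unpadded(7, 4)], padded[index_padded(7, 4)])
```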

The SSSE3 code isn't optimized very well, and I am sure the AVX2 code is even worse (I don't own anything THAT modern yet; I tested correctness on Intel SDE). Still, it is fast enough to beat the basic code ;)

AVX code might be marginally faster than SSSE3, but it's probably too much work compared to the results.

It is possible to "extrapolate" the ideas of this code to support AVX-512, but it is too early now, as it would be dead code for the next year. Purley and Cannonlake are expected to be released in 2017.
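One reading of why the SSSE3 code "extrapolates" to AVX2 (and possibly AVX-512) is that the wide VPSHUFB forms behave like several independent 16-byte PSHUFBs side by side: each index can only select bytes within its own 128-bit lane. If the 16-byte table is broadcast into every lane, a 32-byte (AVX2) or 64-byte (AVX-512BW) shuffle processes 32 or 64 characters with the same per-lane logic. The Python model below is an illustrative sketch with hypothetical names, not the author's implementation.

```python
def pshufb(table, idx):
    # 16-byte PSHUFB model: bit 7 of the index set -> 0, else table[low 4 bits].
    return [0 if b & 0x80 else table[b & 0x0F] for b in idx]


def vpshufb(table, idx):
    """Model of the wide VPSHUFB forms (32 bytes for AVX2, 64 for AVX-512BW):
    each 16-byte lane is shuffled independently, and an index can only
    select bytes from its own lane of the table register."""
    assert len(table) == len(idx) and len(idx) % 16 == 0
    out = []
    for lane in range(0, len(idx), 16):
        out += pshufb(table[lane:lane + 16], idx[lane:lane + 16])
    return out


# Broadcasting the same 16-byte table into every lane makes one wide
# shuffle equal to several SSSE3 shuffles run side by side.
table16 = list(range(100, 116))
idx32 = [(7 * i) % 16 for i in range(32)]
wide = vpshufb(table16 * 2, idx32)
print(wide == pshufb(table16, idx32[:16]) + pshufb(table16, idx32[16:]))
```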

> By the way, you may warn users that this app will increase the power consumption and temperature of the CPU because it exploits AVX.

You are probably right. But note that any modern CPU has Turbo Boost or an equivalent: processors automatically adjust for power, so there should be no real change.
jj666

Joined: 10 Mar 14
Posts: 8
Credit: 68,796,111
RAC: 0
Message 3973 - Posted: 15 Jun 2016, 6:52:33 UTC

Great work here! Works very well on my old Xeon ProLiant servers.

Cheers,

-jj-
Agbar
Joined: 10 Sep 09
Posts: 28
Credit: 690,568
RAC: 0
Message 3976 - Posted: 15 Jun 2016, 22:31:03 UTC - in response to Message 3973.  

Thanks!

I was about to ask everybody: does it work fine? Are there any issues with this app? If not, I would publish this version as v1.0.0 (stable).

As I said before, I have a faster basic (no SSSE3) version that would work better on older processors. I had the Pentium III in mind while writing it, but I noticed recently that some older Opterons don't have SSSE3 either (example and capabilities). I must write some tests first, as this app isn't a simple recompilation like most others of this kind, but, you know, I have a life besides programming ;)
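Choosing between the basic, SSSE3 and AVX2 code paths at run time can be sketched as a simple priority check on CPU feature flags. This is a hypothetical illustration, not Enigma-Optima's detection logic: the helper names are invented, and reading `/proc/cpuinfo` only works on Linux (elsewhere the flag set is empty and the sketch falls back to the basic path). It follows the thread's description: AVX2 CPUs get the AVX2 code, AVX-only CPUs such as Sandy/Ivy Bridge fall through to SSSE3, and CPUs without SSSE3 (Pentium III, some older Opterons) get the basic code.

```python
def best_code_path(cpu_flags):
    """Hypothetical dispatcher: pick a code path from a set of CPU feature
    flags, using Linux /proc/cpuinfo names such as 'avx2' and 'ssse3'."""
    if "avx2" in cpu_flags:
        return "avx2"
    if "ssse3" in cpu_flags:
        # AVX without AVX2 lacks the needed integer operations, so
        # Sandy/Ivy Bridge end up here too.
        return "ssse3"
    return "basic"


def read_linux_cpu_flags(path="/proc/cpuinfo"):
    """Return the feature flags of the first CPU listed in /proc/cpuinfo,
    or an empty set if the file is missing or has no 'flags' line."""
    try:
        with open(path) as f:
            for line in f:
                if line.startswith("flags"):
                    return set(line.split(":", 1)[1].split())
    except OSError:
        pass
    return set()


print(best_code_path(read_linux_cpu_flags()))
```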

I want to say thank you to all of you who use it, especially the guys from the top list. I didn't expect a deployment of this size at this stage!
jj666

Joined: 10 Mar 14
Posts: 8
Credit: 68,796,111
RAC: 0
Message 3977 - Posted: 16 Jun 2016, 6:22:29 UTC

As some benchmarking, comparing the previous optimised binaries with this one over the days I've been using it:

ProLiant G6: Xeon X5670 (x2) @ 2.93 GHz -> points per day around 31,500 to 41,500.
ProLiant G7: Xeon X5690 (x2) @ 3.47 GHz -> points per day around 35,000 to 47,000.

No broken WUs seen in the days I have been using it.

Given that it's one app, and not a bunch of different apps compiled differently, it works much better for deployment (I also have an AMD Bulldozer and an i5 machine I crunch on infrequently).

Thanks again!

Cheers,

-jj-
stiwi

Joined: 20 May 12
Posts: 19
Credit: 109,893,954
RAC: 0
Message 3979 - Posted: 19 Jun 2016, 13:45:34 UTC - in response to Message 3976.  
Last modified: 19 Jun 2016, 13:47:10 UTC

I had 4 errors, but I don't know whether they were caused by the optimized app or by something else:

<core_client_version>7.4.8</core_client_version>
<![CDATA[
<stderr_txt>
Wrapper v5.26 build 8: starting
03:11:14 (4644): wrapper: running enigma_0.76.exe (-R -o results.txt 00trigr.cur 00bigr.cur 00ciphertext)
Enigma Optima v1.0.0-beta.3 Windows64
Best ISA: AVX
Seed set to: 1466298699.
2016-06-19 03:11:39 enigma: working on range ...
2016-06-19 03:54:13 enigma: finished range
03:54:14 (4644): called boinc_finish

</stderr_txt>
<message>
finish file present too long
</message>
]]>

All other tasks work fine and fast :)
Ben

Joined: 1 Dec 15
Posts: 1
Credit: 399,478
RAC: 0
Message 3980 - Posted: 20 Jun 2016, 3:55:46 UTC

I've been doing a few work units with my i5 and Agbar's Intel app. The 30-point WUs used to take around 15 minutes; I can now do them in approximately 10 minutes. The 50- and 60-point WUs I can crunch in just over 20 minutes. Amazing work. I have done over 100 WUs without any issues.
LOVIT

Joined: 23 Apr 10
Posts: 2
Credit: 60,954
RAC: 0
Message 3982 - Posted: 21 Jun 2016, 15:59:08 UTC

Ho hoo, great work!
On my old (haha, main) computer, a C2Q 9300 @ 3 GHz, a 52-point WU took approx. 2500 s (41 min 10 s) with the standard app and now takes approx. 1550 s (25 min 50 s). That's an amazing 40% speed increase!
Thanks, mister.
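For reference, with the approximate times quoted above, the gain can be expressed two ways: as the fraction of time saved per WU (about 38%, which rounds to the 40% mentioned), or as the ratio of throughput (about 61% more WUs per unit of time).

```latex
\frac{2500\,\text{s} - 1550\,\text{s}}{2500\,\text{s}} = 0.38
\qquad\qquad
\frac{2500\,\text{s}}{1550\,\text{s}} \approx 1.61
```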
Agbar
Joined: 10 Sep 09
Posts: 28
Credit: 690,568
RAC: 0
Message 3983 - Posted: 22 Jun 2016, 12:31:48 UTC - in response to Message 3979.  
Last modified: 22 Jun 2016, 12:33:06 UTC

> I had 4 errors but I don't know if they were caused by the optimized app or something different is wrong:
>
> <message>
> finish file present too long
> </message>
> ]]>


By any chance, do you have the task IDs?

In 202892545 there is an error:
enigma: error: resume file is not in the right format 

I don't remember touching anything in this part of the code. Maybe a disk error or overclocking (OC)? Without the resume file I can't tell what went wrong. Unfortunately, when the task fails, that file is deleted.

I bet it is a disk/filesystem problem.




Copyright © 2024 TJM