Error while computing

Message boards : Number crunching : Error while computing

Sami

Joined: 8 Apr 10
Posts: 8
Credit: 11,933,109
RAC: 0
Message 2066 - Posted: 15 Jun 2011, 18:09:46 UTC
Last modified: 15 Jun 2011, 18:10:30 UTC

What can I do to avoid the "error while computing" status? The reason is shown here:

Stderr output

<core_client_version>6.12.26</core_client_version>
<![CDATA[
<message>
Maximum elapsed time exceeded
</message>
<stderr_txt>
wrapper: starting
Unrecognized XML in parse_init_data_file: userid
Skipping: 0
Skipping: /userid
Unrecognized XML in parse_init_data_file: teamid
Skipping: 0
Skipping: /teamid
Unrecognized XML in parse_init_data_file: hostid
Skipping: 26098
Skipping: /hostid
Unrecognized XML in parse_init_data_file: result_name
Skipping: rxpsb70-p5_0_13708584_1130_0
Skipping: /result_name
Unrecognized XML in parse_init_data_file: starting_elapsed_time
Skipping: 0.000000
Skipping: /starting_elapsed_time
running enigma2_0.76_i686-apple-darwin
wrapper: running ../../projects/www.enigmaathome.net/enigma2_0.76_i686-apple-darwin (-R)
2011-06-15 19:29:54  enigma: working on range ...

</stderr_txt>
]]>


This only happens on Mac, not Windows. Mac host is 26098.

BOINC on the Mac estimates that a WU will be done in 35 seconds(!), but it actually takes about 50 minutes, give or take a few. The event log shows this line:

Wed 15 Jun 20:54:53 2011 | Enigma@Home | Aborting task rxpsb70-p5_0_13708582_1114_0: exceeded elapsed time limit 2547.63 (1000000.00G/392.52G)
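The numbers in that abort message can be checked by hand. The sketch below assumes the limit is simply the bound divided by the speed figure, which is what the "(1000000.00G/392.52G)" part suggests; the helper name `parse_abort` is hypothetical. It pulls the three numbers out of the message and compares the limit with the roughly 50-minute runtime reported above:

```python
import re

# The abort message quoted above, without the timestamp.
LOG_LINE = ("Enigma@Home | Aborting task rxpsb70-p5_0_13708582_1114_0: "
            "exceeded elapsed time limit 2547.63 (1000000.00G/392.52G)")

def parse_abort(line: str) -> tuple[float, float, float]:
    """Return (limit in s, bound in G ops, speed in G ops/s) from an abort message."""
    m = re.search(r"elapsed time limit ([\d.]+) \(([\d.]+)G/([\d.]+)G\)", line)
    if m is None:
        raise ValueError("not an elapsed-time abort message")
    return float(m.group(1)), float(m.group(2)), float(m.group(3))

limit, bound, speed = parse_abort(LOG_LINE)

# Assumption: limit = bound / speed. The log prints rounded figures,
# so the recomputed value may differ in the second decimal place.
print(f"logged limit {limit:.2f} s, bound/speed {bound / speed:.2f} s")
print(f"that is about {limit / 60:.1f} minutes")

ACTUAL_RUNTIME_S = 50 * 60  # "50 minutes plus minus few minutes", per the post
print("task aborted before finishing:", ACTUAL_RUNTIME_S > limit)
```

With these figures the limit works out to roughly 42.5 minutes, a few minutes short of a typical run, which is consistent with the tasks being aborted before they could finish.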
Ageless
Volunteer moderator
Volunteer tester

Joined: 11 Sep 07
Posts: 104
Credit: 155,932
RAC: 0
Message 2067 - Posted: 16 Jun 2011, 15:23:40 UTC - in response to Message 2066.  

Can you please check the value of the Duration Correction Factor on that computer?
You can find that in the details of the computer, down near the bottom.

Task duration correction factor 1.10601

Like mine, it should be around 1.

Please check that first; once you report back, I can give you instructions on how to reset this value.
Jord.

BOINC FAQ Service.
Sami

Joined: 8 Apr 10
Posts: 8
Credit: 11,933,109
RAC: 0
Message 2068 - Posted: 16 Jun 2011, 17:13:43 UTC

The DCF is 0,8033. It used to be around 1 without any problems. I did upgrade the OS and BOINC, but that has nothing to do with this, or does it?

Isn't BOINC supposed to set the DCF value itself?
Ageless
Volunteer moderator
Volunteer tester

Joined: 11 Sep 07
Posts: 104
Credit: 155,932
RAC: 0
Message 2069 - Posted: 17 Jun 2011, 7:17:21 UTC - in response to Message 2068.  
Last modified: 17 Jun 2011, 7:18:54 UTC

BOINC normally sets the DCF value itself: with each task it completes, it adjusts the value up or down as part of learning how long tasks actually take.

A DCF of 0.8 isn't too bad, but let's reset it anyway, just to check whether that fixes the estimated run time for you.

First, exit BOINC completely: BOINC Manager->Advanced view->File->Exit->check "Stop running science applications when exiting the Manager"->OK.
Then navigate to your BOINC Data directory, which on OS X is by default /Library/Application Support/BOINC Data/

Edit the file called client_state.xml with a plain text editor. There is no need for a specialized XML editor: the XML that BOINC uses was developed especially for BOINC, and XML editors won't know how to deal with it.
Search the file for Enigma and stop at the first match.
Read down through the following lines until you reach the <duration_correction_factor>X</duration_correction_factor> line.
Change the number there from 0.8033 to 1.000000 (mind that you use a decimal point, not a comma!)
Make sure not to change anything else!
Save the client_state.xml file.
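For anyone who prefers to script this edit, here is a minimal Python sketch. It assumes each project sits in its own <project>...</project> block of client_state.xml with a <duration_correction_factor> line inside it, and that BOINC has been fully exited first. The helper name `set_project_dcf` and the sample text are hypothetical and only illustrate the layout. It works on the raw text with regular expressions instead of an XML parser, in keeping with the advice not to use XML editors on this file:

```python
import re

PROJECT_RE = re.compile(r"<project>.*?</project>", re.DOTALL)
DCF_RE = re.compile(r"(<duration_correction_factor>)[^<]*(</duration_correction_factor>)")

def set_project_dcf(state_text: str, url_fragment: str, new_dcf: float) -> str:
    """Return client_state.xml text with the DCF of one project replaced.

    Hypothetical helper: assumes every project is stored in its own
    <project>...</project> block containing a <duration_correction_factor> line.
    """
    def fix_block(match: re.Match) -> str:
        block = match.group(0)
        if url_fragment.lower() not in block.lower():
            return block  # a different project: leave it untouched
        # Always write a decimal point (never a comma), e.g. 1.000000.
        return DCF_RE.sub(
            lambda m: f"{m.group(1)}{new_dcf:.6f}{m.group(2)}", block, count=1)

    return PROJECT_RE.sub(fix_block, state_text)

# Illustrative input only; a real client_state.xml has many more fields.
sample = """<client_state>
<project>
    <master_url>http://www.enigmaathome.net/</master_url>
    <duration_correction_factor>0.803300</duration_correction_factor>
</project>
<project>
    <master_url>http://example.org/other_project/</master_url>
    <duration_correction_factor>0.500000</duration_correction_factor>
</project>
</client_state>
"""
print(set_project_dcf(sample, "enigmaathome", 1.0))
```

To use it on the real file, keep a backup copy, read client_state.xml into a string, pass it through `set_project_dcf(text, "enigmaathome", 1.0)`, and write the result back before restarting BOINC.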

Restart BOINC.
Now let it fetch work from here. What are the estimates on the work time, now?
Jord.

Sami

Joined: 8 Apr 10
Posts: 8
Credit: 11,933,109
RAC: 0
Message 2070 - Posted: 17 Jun 2011, 17:13:58 UTC - in response to Message 2069.  

Now let it fetch work from here. What are the estimates on the work time, now?

The estimates are now 44 seconds, a severe underestimate.

Settings I had when requesting new work:

- Number of usable CPUs has changed from 2 to 1.
- cache is default (0.25 days)
- Boinc says DCF is exactly 1

Guess how many WUs I got?

Fri 17 Jun 19:56:02 2011 | Enigma@Home | Scheduler request completed: got 248 new tasks

I now have work for at least 4-5 days. I will soon know whether these end up in the "error while computing" state. Maybe the DCF should be 8.0 or more?

(Sorry about my poor English)
Sami

Joined: 8 Apr 10
Posts: 8
Credit: 11,933,109
RAC: 0
Message 2071 - Posted: 17 Jun 2011, 18:54:01 UTC
Last modified: 17 Jun 2011, 18:57:04 UTC

No luck, still the same error :-( I tried a higher DCF value: it is now 20. BOINC now estimates that a WU will be done in 14 minutes 42 seconds. That is still only about 1/4 of the real computing time.
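The three runtime estimates reported in this thread can be checked against a simple linear relationship. Assuming the client's estimate is an uncorrected value multiplied by the DCF (the hypothesis this sketch tests, not something confirmed in the thread), dividing each estimate by its DCF should give roughly the same number:

```python
# Runtime estimates reported in this thread (seconds), keyed by the DCF in effect.
observed = {
    0.8033: 35.0,            # first post
    1.0: 44.0,               # after resetting the DCF
    20.0: 14 * 60 + 42.0,    # after raising the DCF to 20
}

# If estimate = uncorrected_estimate * DCF, each quotient should be roughly equal.
for dcf, seconds in observed.items():
    print(f"DCF {dcf:>6}: estimate {seconds:5.0f} s, estimate/DCF {seconds / dcf:5.1f} s")

ACTUAL_RUNTIME_S = 50 * 60   # roughly 50 minutes of real crunching
uncorrected = observed[1.0]
print(f"DCF that would make the estimate match: about {ACTUAL_RUNTIME_S / uncorrected:.0f}")
```

The quotients all land near 44 s, so raising the DCF does scale the estimate as expected, but a value of roughly 68 would be needed for it to match a 50-minute run.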
Sami

Joined: 8 Apr 10
Posts: 8
Credit: 11,933,109
RAC: 0
Message 2072 - Posted: 17 Jun 2011, 19:35:35 UTC

A DCF value of 20 did not help. Tasks are still being aborted:

Fri 17 Jun 22:31:11 2011 | Enigma@Home | Aborting task rxpsb70-p6_0_13708764_10_0: exceeded elapsed time limit 2547.63 (1000000.00G/392.52G)
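Both abort messages, before and after the DCF was raised to 20, show the same 2547.63 s limit. A minimal model explains why: assume the runtime estimate is scaled by the DCF, while the abort limit is the bound divided by the speed figure (matching what the log prints). The constant `FPOPS_EST` is hypothetical, back-calculated from the 44 s estimate at DCF 1:

```python
# Figures from the abort messages in this thread.
SPEED_FLOPS = 392.52e9        # "392.52G"
FPOPS_BOUND = 1e15            # "1000000.00G"
# Hypothetical: back-calculated from the 44 s estimate seen at DCF 1.
FPOPS_EST = 44 * SPEED_FLOPS
ACTUAL_RUNTIME_S = 50 * 60    # roughly 50 minutes, as reported

def estimated_runtime(dcf: float) -> float:
    """Runtime estimate: scaled by the DCF (assumed model)."""
    return FPOPS_EST / SPEED_FLOPS * dcf

def elapsed_limit() -> float:
    """Abort threshold: bound / speed, with no DCF term (assumed model)."""
    return FPOPS_BOUND / SPEED_FLOPS

for dcf in (1.0, 20.0):
    est, lim = estimated_runtime(dcf), elapsed_limit()
    print(f"DCF {dcf:4.1f}: estimate {est:5.0f} s, limit {lim:7.1f} s, "
          f"aborted: {ACTUAL_RUNTIME_S > lim}")
```

Under this model, raising the DCF only lengthens the estimate. Stopping the aborts would take a larger bound or a smaller, more realistic speed figure, and the DCF controls neither.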

What next?
Ageless
Volunteer moderator
Volunteer tester

Joined: 11 Sep 07
Posts: 104
Credit: 155,932
RAC: 0
Message 2075 - Posted: 22 Jun 2011, 16:08:47 UTC - in response to Message 2072.  

Sorry, I forgot all about this thread. I had some other things on my mind.
I'll ask TJM to check whether the resource estimates are correct for the Mac. I hope he can spare some time from his other project.
Jord.

Sami

Joined: 8 Apr 10
Posts: 8
Credit: 11,933,109
RAC: 0
Message 2103 - Posted: 13 Aug 2011, 16:45:11 UTC

Any new information about this problem?
TJM
Project administrator
Project developer
Project scientist

Joined: 25 Aug 07
Posts: 843
Credit: 267,994,998
RAC: 0
Message 2107 - Posted: 25 Aug 2011, 20:08:48 UTC

Resource estimates are the same for all platforms, as they're hardcoded in the WU template.
I think it might be a problem with the core client/manager - is the benchmarked CPU speed correct?

M4 Project homepage
M4 Project wiki
Sami

Joined: 8 Apr 10
Posts: 8
Credit: 11,933,109
RAC: 0
Message 2110 - Posted: 26 Aug 2011, 14:08:43 UTC - in response to Message 2107.  

I think it might be a problem with the core client/manager - is the benchmarked CPU speed correct?


Benchmark results are:

Wed 24 Aug 12:44:57 2011 | | Running CPU benchmarks
Wed 24 Aug 12:44:57 2011 | | Suspending computation - CPU benchmarks in progress
Wed 24 Aug 12:45:29 2011 | | Benchmark results:
Wed 24 Aug 12:45:29 2011 | | Number of CPUs: 2
Wed 24 Aug 12:45:29 2011 | | 3103 floating point MIPS (Whetstone) per CPU
Wed 24 Aug 12:45:29 2011 | | 7329 integer MIPS (Dhrystone) per CPU

They seem to be correct.

BOINC version is 6.12.26. If I remember correctly, this problem started after I upgraded the OS from Leopard to Snow Leopard. I upgraded BOINC at the same time.
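The benchmark figures can be compared with the speed figure in the abort messages. Assuming the client treats one Whetstone MIPS as one million floating-point operations per second, this rough comparison shows how far apart the two are:

```python
# Figures quoted in this thread.
WHETSTONE_MIPS_PER_CPU = 3103   # benchmark result above
APP_SPEED_GFLOPS = 392.52       # speed in the "exceeded elapsed time limit" message
FPOPS_BOUND = 1e15              # "1000000.00G" from the same message

# Assumption: one Whetstone MIPS counts as one million floating-point ops per second.
benchmark_flops = WHETSTONE_MIPS_PER_CPU * 1e6
ratio = APP_SPEED_GFLOPS * 1e9 / benchmark_flops
limit_at_benchmark_speed = FPOPS_BOUND / benchmark_flops

print(f"speed in the abort message is about {ratio:.0f}x the benchmark")
print(f"limit at benchmark speed: {limit_at_benchmark_speed:,.0f} s "
      f"(about {limit_at_benchmark_speed / 86400:.1f} days)")
```

The speed in the abort message is more than a hundred times the benchmarked speed of one core. Had the limit been computed from the benchmark instead, it would have been several days rather than about 42 minutes. That fits the benchmarks being fine and the inflated figure coming from somewhere else, although this thread does not establish where.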
TJM
Project administrator
Project developer
Project scientist

Joined: 25 Aug 07
Posts: 843
Credit: 267,994,998
RAC: 0
Message 2114 - Posted: 27 Aug 2011, 7:35:32 UTC
Last modified: 27 Aug 2011, 7:36:29 UTC

I think I know what's going on.
The wrapper doesn't return CPU time, because it can't read it on the Mac.
The BOINC core client then probably thinks that the tasks complete very fast, and so it underestimates the runtime (isn't the runtime stored somewhere in local files?).

The solution would be to rebuild the Mac wrapper. I'll try to get remote access to a Mac with dev tools; perhaps I'll be able to fix the problem.
Sami

Joined: 8 Apr 10
Posts: 8
Credit: 11,933,109
RAC: 0
Message 2183 - Posted: 1 Jan 2012, 20:08:09 UTC

I noticed that my Mac is crunching Enigma@Home again, so someone has managed to repair whatever the problem was. Thanks :-)
TJM
Project administrator
Project developer
Project scientist

Joined: 25 Aug 07
Posts: 843
Credit: 267,994,998
RAC: 0
Message 2184 - Posted: 4 Jan 2012, 15:41:29 UTC - in response to Message 2183.  

Nope, nothing was repaired on the app side. Maybe the newer server software just fixed something related to this problem.


Copyright © 2024 TJM