This is a growing list of questions; please feel free to suggest new items!
- 🤯 Oops Ramses crashed, what should I do?
- 🏃🏾 Running Ramses
- Can I restart my simulation with more cores?
- I’ve run my first simulation. I want to check the effect of some part of the setup on the results. Do I need to rerun the whole simulation?
- What are the units of X parameter in the nml file?
- What are the code units?
- My simulation is unexpectedly slow, or has suddenly slowed down, what can I do?
- 🏭 Processing Ramses simulations (🚧WIP)
🤯 Oops Ramses crashed, what should I do?
Was the code configured correctly?
1. Are the paths, such as those to the executable and to the initial conditions (if there are any), correct?
2. Are all the parameters in the .nml file correctly spelt?
3. Were ngridmax/npartmax high enough (grep your log file for “max”)?
4. Did you compile with the same libraries as are available in your computing environment?
5. Did you allocate enough hydro fields at compile time (is NVAR correct? Check the Makefile)? Are other such preprocessor directives set correctly?
6. If running with MPI (mpirun), did you compile with MPI=1?
Example error messages
Wrong ramses executable (1.):
HYD_spawn (../../../../../src/pm/i_hydra/libhydra/spawn/intel/hydra_spawn.c:146): execvp error on file /wrong_path/ramses3d (No such file or directory)
Wrong init file (1.):
forrtl: No such file or directory
forrtl: severe (29): file not found, unit 10, file wrong_path/ic_part
Incorrect nml parameter (2.):
forrtl: severe (19): invalid reference to variable in NAMELIST input, unit 1, file /some_path/nml.nml, line xx, position yy
ngridmax too small (same with npartmax) (3.):
No more free memory
Increase ngridmax
Wrong NVAR (5.):
Here with Z=0 instead of 1
rt_init(): Something wrong with NVAR.
Should have NVAR=2+ndim+1*metal+1*dcool+1*aton+IRtrap+nIons
Have NVAR= 8
Should have NVAR= 9
STOPPING!
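For runs with radiative transfer, the rt_init() message above spells out how many variables it expects. The sketch below only reproduces that printed formula in Python, so you can work out the expected NVAR before recompiling. It is an illustration, not RAMSES code: the argument names mirror the message, each switch is assumed to count as 0 or 1, and how these switches are set in your build (Makefile flags or namelist) is not covered.

```python
def expected_nvar(ndim, metal=False, dcool=False, aton=False, irtrap=0, n_ions=0):
    """Expected NVAR according to the formula printed by rt_init():

        NVAR = 2 + ndim + 1*metal + 1*dcool + 1*aton + IRtrap + nIons

    Assumption: boolean switches count as 0 or 1; irtrap and n_ions are integers.
    """
    return 2 + ndim + int(metal) + int(dcool) + int(aton) + irtrap + n_ions


# Illustrative values: a 3D run with metals and three ion species.
print(expected_nvar(ndim=3, metal=True, n_ions=3))  # 9
```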
Was there a problem with your computing environment?
1. Did it run out of RAM? Look for oom (out of memory) messages. If you’re using a computing cluster, consult its documentation and launch jobs with more RAM per MPI task.
2. If running on a cluster, did you load the same modules in your submission script as for compilation?
3. If you are restarting a simulation, is the number of MPI processes the same as in the run that wrote the snapshot?
4. Did RAMSES run in the right directory? Did you have the permissions and disk space to write outputs?
5. If you ran the code on a computing cluster, make a note of the nodes it was running on. If you encounter repeated but unexplained crashes, there could be an issue with the parallel setup or with specific compute nodes. Reach out to the technical support team of your computing environment.
Example error messages
Restart with a different number of MPI processes (3.), with no specific message:
Abort(2) on node 2 (rank 2 in comm 0): application called MPI_Abort(MPI_COMM_WORLD, 2) - process 2
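A rough first pass over the log of a crashed run can be automated. This sketch (hypothetical helper names, not part of RAMSES) searches a log for substrings taken from the example messages on this page and reports the checklist item each one most likely points to. The out-of-memory patterns are assumptions: their exact wording depends on your scheduler and system.

```python
# Lower-case substrings from the example messages on this page, mapped to the
# checklist item they most likely point to. The "out of memory" and "oom-kill"
# patterns are assumptions; adapt them to what your scheduler actually prints.
KNOWN_ERRORS = {
    "no such file or directory": "Config item 1: check the executable and initial-condition paths",
    "invalid reference to variable in namelist": "Config item 2: check the spelling of .nml parameters",
    "no more free memory": "Config item 3: increase ngridmax (or npartmax)",
    "something wrong with nvar": "Config item 5: check NVAR in the Makefile",
    "out of memory": "Environment item 1: request more RAM per MPI task",
    "oom-kill": "Environment item 1: request more RAM per MPI task",
    "mpi_abort": "Environment item 3, among other causes: check the MPI process count on restart",
}


def diagnose(log_text):
    """Return (line number, hint) pairs for log lines matching a known error."""
    hints = []
    for lineno, line in enumerate(log_text.splitlines(), start=1):
        lowered = line.lower()  # case-insensitive matching
        for pattern, hint in KNOWN_ERRORS.items():
            if pattern in lowered:
                hints.append((lineno, hint))
    return hints


def diagnose_file(path):
    """Read a log file (tolerating odd bytes) and diagnose it."""
    with open(path, errors="replace") as f:
        return diagnose(f.read())


# Usage (the log file name is hypothetical):
# for lineno, hint in diagnose_file("run.log"):
#     print(f"line {lineno}: {hint}")
```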
Was there a bug in the code?
1. If no other clear problem is apparent, try recompiling the code with debugging hooks and flags (set DEBUG=1 in the Makefile). The problem may become clearer.
2. Is the error raised in a part of the code that you are actively developing? If so, the problem most likely lies there. Add some checks to your code to verify that any new variables and calculations have the values you expect during execution. Pay attention to conversions between code units and physical units.
3. If the error appears to come from one of the core modules of RAMSES, it is more likely that something is wrong in the configuration of the code or the setup of the simulation (e.g. invalid or strange values in the initial conditions leading to unexpected negative numbers or divisions by zero). If the problem persists, try reaching out to the RAMSES community for help (see the Contact page) before opening an issue on GitHub.
🏃🏾Running Ramses
Can I restart my simulation with more cores?
No. (While not impossible in theory, no ready-to-use tool exists. Let us know if you develop one.)
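Since a restart has to use the same number of MPI processes as the run that wrote the snapshot, it can be worth checking before submitting a job. This sketch assumes that output_XXXXX/info_XXXXX.txt contains a line of the form "ncpu = N", as RAMSES info files usually do; the function names are hypothetical.

```python
import re
from pathlib import Path


def snapshot_ncpu(output_dir):
    """Read the ncpu entry from output_XXXXX/info_XXXXX.txt.

    Assumption: the info file is named after the directory's 5-digit number
    and contains a line such as 'ncpu        =          4'.
    """
    output_dir = Path(output_dir)
    number = output_dir.name.split("_")[-1]  # 'output_00012' -> '00012'
    info = output_dir / f"info_{number}.txt"
    match = re.search(r"^\s*ncpu\s*=\s*(\d+)", info.read_text(), re.MULTILINE)
    if match is None:
        raise ValueError(f"no ncpu entry found in {info}")
    return int(match.group(1))


def check_restart(output_dir, n_mpi_tasks):
    """Raise if the planned number of MPI processes differs from the snapshot's ncpu."""
    ncpu = snapshot_ncpu(output_dir)
    if ncpu != n_mpi_tasks:
        raise RuntimeError(
            f"{output_dir} was written by {ncpu} MPI processes; "
            f"restarting with {n_mpi_tasks} will not work"
        )


# Usage (hypothetical snapshot and process count):
# check_restart("output_00012", n_mpi_tasks=64)
```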
I’ve run my first simulation. I want to check the effect of some part of the setup on the results. Do I need to rerun the whole simulation?
Changes to the .nml file can be made between restarts. Depending on whether such a change is physically consistent, you may not need to rerun the whole simulation. For example, to investigate the impact of the black hole seed mass, you could restart from a snapshot written some margin of time before the first black hole forms in your simulation.
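Picking the restart point can be scripted once you know when the event of interest happens. The sketch below takes snapshot times (for example the time entry of each info_XXXXX.txt file, all in the same units) and returns the latest snapshot written at least a chosen margin before that event. The function name is hypothetical; you would then restart from that snapshot, typically by setting nrestart in &RUN_PARAMS to its number.

```python
def pick_restart_snapshot(snapshot_times, event_time, margin):
    """Return the number of the latest snapshot written at or before
    event_time - margin, or None if no snapshot is early enough.

    snapshot_times maps snapshot number -> simulation time. Assumption: all
    times (and the margin) are in the same units, e.g. code units.
    """
    cutoff = event_time - margin
    candidates = [num for num, t in snapshot_times.items() if t <= cutoff]
    if not candidates:
        return None
    # Choose by time rather than by number, in case numbering has gaps.
    return max(candidates, key=snapshot_times.get)


# Illustrative times: the first black hole forms at t = 0.35, keep a margin of 0.1.
print(pick_restart_snapshot({1: 0.1, 2: 0.2, 3: 0.3, 4: 0.4}, event_time=0.35, margin=0.1))  # 2
```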
What are the units of X parameter in the nml file?
Check the description of parameters in the documentation. If you can’t find the information, please report it to us.
Note
If all else fails, you could grep for the parameter name and look for where it is used in the code. There may be a helpful comment next to its definition in the corresponding ramses/X/X_parameters.f90 file (e.g. amr/amr_parameters.f90). Otherwise, find where it is converted to code units in the code; from there, you can work out its units.
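If grep is not at hand, the same search can be done from Python. This sketch walks a RAMSES source tree (the checkout path in the usage comment is an assumption) and lists every line of the .f90 files that mentions a given parameter as a whole word, ignoring case because Fortran identifiers are case-insensitive.

```python
import re
from pathlib import Path


def find_parameter(source_root, name):
    """Yield (file, line number, line) for every .f90 line that mentions
    `name` as a whole word, case-insensitively."""
    pattern = re.compile(rf"\b{re.escape(name)}\b", re.IGNORECASE)
    for path in sorted(Path(source_root).rglob("*.f90")):
        with open(path, errors="replace") as f:
            for lineno, line in enumerate(f, start=1):
                if pattern.search(line):
                    yield path, lineno, line.rstrip()


# Usage (the checkout path "ramses" is hypothetical):
# for path, lineno, line in find_parameter("ramses", "ngridmax"):
#     print(f"{path}:{lineno}: {line}")
```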
What are the code units?
If you run a cosmological simulation (cosmo=.true.), the code units are defined in amr/units.f90. For non-cosmological runs (cosmo=.false.), the conversion factors between code units and CGS units are set in the &UNITS_PARAMS namelist block (units_length, units_density, units_time). The default is CGS, i.e. all factors equal to 1.
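Only three base units are needed; the others follow from dimensional analysis. The sketch below derives velocity, mass, and pressure conversion factors from a length unit in cm, a density unit in g/cm^3, and a time unit in s. It assumes you take these from the unit_l, unit_d, and unit_t entries of an info_XXXXX.txt file (or from units_length, units_density, and units_time for non-cosmological runs); the function name is hypothetical.

```python
def derived_units(unit_l, unit_d, unit_t):
    """Return CGS conversion factors derived from the three base code units.

    Assumption: unit_l is in cm, unit_d in g/cm^3 and unit_t in s, e.g. the
    unit_l, unit_d and unit_t entries of an info_XXXXX.txt file.
    """
    unit_v = unit_l / unit_t  # velocity [cm/s]
    return {
        "length": unit_l,                # [cm]
        "density": unit_d,               # [g/cm^3]
        "time": unit_t,                  # [s]
        "velocity": unit_v,              # [cm/s]
        "mass": unit_d * unit_l**3,      # [g]
        "pressure": unit_d * unit_v**2,  # [erg/cm^3] = [g cm^-1 s^-2]
    }


# Made-up unit values: convert a velocity of 2.5 code units to km/s.
units = derived_units(unit_l=3.0857e21, unit_d=1.0e-24, unit_t=3.1557e13)
v_kms = 2.5 * units["velocity"] / 1.0e5
```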
My simulation is unexpectedly slow, or has suddenly slowed down, what can I do?
- Does the slowdown happen suddenly or gradually? Check the timestep length in the log files. There may be a physical explanation for the slowdown. For instance, feedback can heat gas and accelerate particles, which often leads to smaller timesteps and more computation.
- Maybe you are forming a very large number of particles or sinks. Check the density thresholds if you use them. Maybe the maximum resolution is higher than you thought.
- Check that the configured levelmin and levelmax are correct. Maybe too many cells are being refined to the maximum level. Check the occupation of the different levels in the log file and the refinement criteria in your .nml file.
- Maybe your load balancing is poor (check the log). Try adjusting sub-cycling, or nremap (the number of coarse steps between load-balancing calls) to load-balance more often.
- Are you using a module for the first time, or have you just developed something in RAMSES? Try reverting to a version you know well to see whether this change is causing the difference in performance.
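The first and third checks above can be partly automated by scanning the log. This sketch assumes log lines of the form "Fine step= ... t= ... dt= ..." and "Level L has N grids"; formats vary between versions, so adjust the regular expressions to your own log. It flags fine steps where dt drops sharply and reports the most recent level occupation. It only compares consecutive entries: if your log interleaves steps from several levels, choose a factor larger than the dt ratio between your coarsest and finest levels.

```python
import re

# Assumed log formats (adjust to your RAMSES version), e.g.
#   Fine step=    120 t= 1.23456E-01 dt= 4.567E-04 a= 1.000E+00 ...
#   Level  9 has       4096 grids ...
FINE_STEP = re.compile(r"Fine step=\s*(\d+)\s+t=\s*(\S+)\s+dt=\s*(\S+)")
LEVEL = re.compile(r"Level\s+(\d+)\s+has\s+(\d+)\s+grids")


def timesteps(log_text):
    """Return a list of (fine step, t, dt) tuples found in the log."""
    return [(int(s), float(t), float(dt)) for s, t, dt in FINE_STEP.findall(log_text)]


def sudden_drops(steps, factor=10.0):
    """Return the fine steps where dt fell by more than `factor`
    relative to the immediately preceding entry."""
    return [
        cur[0]
        for prev, cur in zip(steps, steps[1:])
        if cur[2] > 0 and prev[2] / cur[2] > factor
    ]


def last_level_occupation(log_text):
    """Return {level: number of grids}, keeping the latest report per level."""
    occupation = {}
    for level, ngrids in LEVEL.findall(log_text):
        occupation[int(level)] = int(ngrids)  # later reports overwrite earlier ones
    return occupation


# Usage (the log file name is hypothetical):
# with open("run.log", errors="replace") as f:
#     text = f.read()
# print(sudden_drops(timesteps(text)), last_level_occupation(text))
```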
🏭 Processing Ramses simulations (🚧WIP)
The code has written snapshots to output_XXXXX directories. How do I identify their redshift or time?
Each snapshot directory contains the data written by each MPI task for the different resolution elements in the code (hydro, part, etc.). It also contains a number of helpful .txt files:
- info_XXXXX.txt: cosmological and general information for this snapshot, including the current expansion factor (aexp) and simulation time (time), and the conversion factors from code units to CGS (unit_l, unit_d, unit_t). [here are some example bash functions for getting the redshift of a given snapshot]
- part_file_descriptor.txt: contents, format, and precision of the different data types in the particle files [version dependent]
- hydro_file_descriptor.txt: contents, format, and precision of the different data types in the hydro files [version dependent]
- A copy of the .nml file used for the simulation [version dependent?]
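The info_XXXXX.txt file can also be read with a few lines of Python. This sketch (hypothetical helper names) parses its "key = value" lines into a dictionary and computes the redshift as z = 1/aexp - 1, which is only meaningful for cosmological runs. It assumes numbers are written in Fortran E notation (e.g. 0.5E+00), which Python's float() accepts.

```python
from pathlib import Path


def read_info(path):
    """Parse the 'key = value' lines of an info_XXXXX.txt file into a dict.

    Values are converted to int or float when possible. Lines without '='
    (such as the domain table at the end of the file) are skipped, and only
    the first occurrence of each key is kept.
    """
    info = {}
    for line in Path(path).read_text().splitlines():
        if "=" not in line:
            continue
        key, _, value = line.partition("=")
        key, value = key.strip(), value.strip()
        if not key or key in info:
            continue
        for convert in (int, float):
            try:
                info[key] = convert(value)
                break
            except ValueError:
                pass
        else:
            info[key] = value  # keep non-numeric values as strings
    return info


def redshift(info):
    """Redshift z = 1/aexp - 1 (only meaningful for cosmological runs)."""
    return 1.0 / info["aexp"] - 1.0


# Usage (hypothetical snapshot):
# info = read_info("output_00042/info_00042.txt")
# print(info["time"], redshift(info))
```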
How can I reduce the amount of data for long term storage?
- If the simulation was run in double precision, some fields can likely be rewritten in single precision (but be careful with particle IDs, positions, and velocities).
- When all post-processing for the project is complete, snapshot files can be compressed into archives, e.g. with tar and gzip.
- Perhaps not all snapshots are required. Discarded snapshots could be regenerated by restarting the simulation from the saved ones.
- For some snapshots, and depending on your use cases, it may be viable to extract a sub-region of the simulation for permanent storage, and to discard the rest of the snapshot.
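Two of these points can be sketched with the Python standard library. The first function shows why particle IDs deserve care: single precision stores a 24-bit significand, so integers above 2^24 = 16 777 216 cannot all be represented exactly. The second writes a snapshot directory into a .tar.gz archive with tarfile. Both function names are hypothetical, and the archive helper leaves the original directory in place.

```python
import struct
import tarfile
from pathlib import Path


def float32_roundtrip(x):
    """Return x after storing it as a 32-bit float, as single-precision output would."""
    return struct.unpack("f", struct.pack("f", float(x)))[0]


def archive_snapshot(output_dir, archive_dir="."):
    """Write output_dir (e.g. output_00042) into a gzip-compressed tar archive
    and return the archive path. The original directory is not deleted."""
    output_dir = Path(output_dir)
    archive = Path(archive_dir) / f"{output_dir.name}.tar.gz"
    with tarfile.open(archive, "w:gz") as tar:
        tar.add(output_dir, arcname=output_dir.name)
    return archive


# Integers above 2**24 lose exactness in single precision, so large IDs collide.
print(float32_roundtrip(16_777_217))  # 16777216.0
```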