todo_list_please_dont_remove.txt


To-do list for the different flavors of SPECFEM:
------------------------------------------------

(items listed in no particular order)

- suggestion 48:
----------------
In the new merged CPU / GPU version of SPECFEM3D_GLOBE, fix all these warnings that we get when using the Cray compiler.
They correspond to Fortran statements that are correct but that induce memory copies in and out of subroutine arguments, which may slow down the code
(significantly if these calls are used very often i.e. are for instance inside a big loop; not that significantly if the call is used only once;
however, it would be better / cleaner for the code to compile without a single warning. Here is the full list of Cray warnings (as of the beginning of Oct 2013):

ftn  -eC -eD -ec -en -eI -ea -g -G0  -I./obj -I.  -I. -I./setup -c -o obj/parallel.sharedmpi.o src/shared/parallel.f90
ftn  -eC -eD -ec -en -eI -ea -g -G0  -I./obj -I.  -I. -I./setup -c -o obj/spline_routines.shared.o src/shared/spline_routines.f90
ftn  -eC -eD -ec -en -eI -ea -g -G0  -I./obj -I.  -I. -I./setup -c -o obj/write_VTK_file.shared.o src/shared/write_VTK_file.f90

  call gather_all_cr(glob_data(1,:),nglob,store_val_ux_all,nglob,NPROCTOT)
                              ^                                            
ftn-1438 crayftn: CAUTION WRITE_VTK_DATA_CR_ALL, File = src/shared/write_VTK_file.f90, Line = 483, Column = 31 
  This argument produces a copy in and copy out to a temporary variable.

  call gather_all_cr(glob_data(2,:),nglob,store_val_uy_all,nglob,NPROCTOT)
                              ^                                            
ftn-1438 crayftn: CAUTION WRITE_VTK_DATA_CR_ALL, File = src/shared/write_VTK_file.f90, Line = 484, Column = 31 
  This argument produces a copy in and copy out to a temporary variable.

  call gather_all_cr(glob_data(3,:),nglob,store_val_uz_all,nglob,NPROCTOT)
                              ^                                            
ftn-1438 crayftn: CAUTION WRITE_VTK_DATA_CR_ALL, File = src/shared/write_VTK_file.f90, Line = 485, Column = 31 
  This argument produces a copy in and copy out to a temporary variable.

ftn  -eC -eD -ec -en -eI -ea -g -G0  -I./obj -I.  -I. -I./setup -c -o obj/meshfem3D_par.check.o src/meshfem3D/meshfem3D_par.f90
ftn  -eC -eD -ec -en -eI -ea -g -G0  -I./obj -I.  -I. -I./setup -c -o obj/meshfem3D.check.o src/meshfem3D/meshfem3D.f90
ftn  -eC -eD -ec -en -eI -ea -g -G0  -I./obj -I.  -I. -I./setup -c -o obj/meshfem3D_models.check.o src/meshfem3D/meshfem3D_models.f90
ftn  -eC -eD -ec -en -eI -ea -g -G0  -I./obj -I.  -I. -I./setup -c -o obj/add_missing_nodes.check.o src/meshfem3D/add_missing_nodes.f90
ftn  -eC -eD -ec -en -eI -ea -g -G0  -I./obj -I.  -I. -I./setup -c -o obj/model_aniso_mantle.check.o src/meshfem3D/model_aniso_mantle.f90
ftn  -eC -eD -ec -en -eI -ea -g -G0  -I./obj -I.  -I. -I./setup -c -o obj/compute_forces_outer_core_noDev.solverstatic.o src/specfem3D/compute_forces_outer_core_noDev.f90
ftn  -eC -eD -ec -en -eI -ea -g -G0  -I./obj -I.  -I. -I./setup -c -o obj/compute_forces_outer_core_Dev.solverstatic.o src/specfem3D/compute_forces_outer_core_Dev.F90
ftn  -eC -eD -ec -en -eI -ea -g -G0  -I./obj -I.  -I. -I./setup -c -o obj/compute_kernels.solverstatic.o src/specfem3D/compute_kernels.F90
ftn  -eC -eD -ec -en -eI -ea -g -G0  -I./obj -I.  -I. -I./setup -c -o obj/compute_seismograms.solverstatic.o src/specfem3D/compute_seismograms.f90

                hxir_store(irec_local,:),hetar_store(irec_local,:),hgammar_store(irec_local,:), &
                          ^                                                                       
ftn-1438 crayftn: CAUTION COMPUTE_SEISMOGRAMS_ADJOINT, File = src/specfem3D/compute_seismograms.f90, Line = 276, Column = 27 
  This argument produces a copy in and copy out to a temporary variable.
                                                    ^                                             
ftn-1438 crayftn: CAUTION COMPUTE_SEISMOGRAMS_ADJOINT, File = src/specfem3D/compute_seismograms.f90, Line = 276, Column = 53 
  This argument produces a copy in and copy out to a temporary variable.
                                                                                ^                 
ftn-1438 crayftn: CAUTION COMPUTE_SEISMOGRAMS_ADJOINT, File = src/specfem3D/compute_seismograms.f90, Line = 276, Column = 81 
  This argument produces a copy in and copy out to a temporary variable.

                hpxir_store(irec_local,:),hpetar_store(irec_local,:),hpgammar_store(irec_local,:), &
                           ^                                                                         
ftn-1438 crayftn: CAUTION COMPUTE_SEISMOGRAMS_ADJOINT, File = src/specfem3D/compute_seismograms.f90, Line = 277, Column = 28 
  This argument produces a copy in and copy out to a temporary variable.
                                                      ^                                              
ftn-1438 crayftn: CAUTION COMPUTE_SEISMOGRAMS_ADJOINT, File = src/specfem3D/compute_seismograms.f90, Line = 277, Column = 55 
  This argument produces a copy in and copy out to a temporary variable.
                                                                                   ^                 
ftn-1438 crayftn: CAUTION COMPUTE_SEISMOGRAMS_ADJOINT, File = src/specfem3D/compute_seismograms.f90, Line = 277, Column = 84 
  This argument produces a copy in and copy out to a temporary variable.

ftn  -eC -eD -ec -en -eI -ea -g -G0  -I./obj -I.  -I. -I./setup -c -o obj/compute_stacey_crust_mantle.solverstatic.o src/specfem3D/compute_stacey_crust_mantle.f90
ftn  -eC -eD -ec -en -eI -ea -g -G0  -I./obj -I.  -I. -I./setup -c -o obj/compute_stacey_outer_core.solverstatic.o src/specfem3D/compute_stacey_outer_core.f90
ftn  -eC -eD -ec -en -eI -ea -g -G0  -I./obj -I.  -I. -I./setup -c -o obj/multiply_arrays_source.solverstatic.o src/specfem3D/multiply_arrays_source.f90
ftn  -eC -eD -ec -en -eI -ea -g -G0  -I./obj -I.  -I. -I./setup -c -o obj/noise_tomography.solverstatic.o src/specfem3D/noise_tomography.f90
ftn  -eC -eD -ec -en -eI -ea -g -G0  -I./obj -I.  -I. -I./setup -c -o obj/write_output_ASCII.solverstatic.o src/specfem3D/write_output_ASCII.f90
ftn  -eC -eD -ec -en -eI -ea -g -G0  -I./obj -I.  -I. -I./setup -c -o obj/write_output_SAC.solverstatic.o src/specfem3D/write_output_SAC.f90

      call write_n_real(seismogram_tmp(iorientation,1:seismo_current),seismo_current)
                                      ^                                               
ftn-1438 crayftn: CAUTION WRITE_OUTPUT_SAC, File = src/specfem3D/write_output_SAC.f90, Line = 624, Column = 39 
  This argument produces a copy in and copy out to a temporary variable.

      call write_n_real(real(seismogram_tmp(iorientation,1:seismo_current)),seismo_current)
                        ^                                                                   
ftn-1438 crayftn: CAUTION WRITE_OUTPUT_SAC, File = src/specfem3D/write_output_SAC.f90, Line = 626, Column = 25 
  This argument produces a copy in to a temporary variable.

Cray Fortran : Version 8.1.8 


- suggestion 47: (from Craig P Steffen at NCSA)
----------------
Make checkpointing/restarting easier to use and not limited to a NUMBER_OF_RUNS > 1;
i.e. on big machines we should be able to use the existing routines to checkpoint even when NUMBER_OF_RUNS == 1, in case the run fails and needs to be restarted.
This will be useful on very big machines, in particular in SPECFEM3D_GLOBE.

- suggestion 46:
----------------
In SPECFEM2D, implement the changes described and partially done in SPECFEM2D/UTILS/Ra_Cleave_record_pressure_in_addition_to_displacement to record pressure in addition to displacement at the receivers in the same run. Currently this can only be done by doing the same run twice.

- suggestion 45:
----------------
In SPECFEM3D_GLOBE, fix this known 64-bit issue when -mcmodel=medium -shared-intel is used in order to handle very large static memory size per core
(the 3D_Cartesian and 2D packages are fine because that compiler option is never used in them):

If you run very large meshes on a relatively small number
of processors, the static memory size needed on each processor might become
greater than 2 gigabytes, which is the upper limit for 32-bit addressing
(dynamic memory allocation is always OK, even beyond the 2 GB limit; only static memory has a problem).
In this case, on some compilers you may need to add -mcmodel=medium (if you do not use the Intel ifort / icc compiler)
or -mcmodel=medium -shared-intel (if you use the Intel ifort / icc compiler)
to the configure options of CFLAGS, FCFLAGS and LDFLAGS otherwise the compiler will display an error
message (for instance 'relocation truncated to fit: R\_X86\_64\_PC32 against .bss' or something similar);

BEWARE that using -mcmodel=medium -shared-intel is known for currently leading to incorrect seismograms,
at least when flag ATTENUATION is on (and maybe even without), at least in the case of the Intel ifort / icc compiler.
This likely comes from intrinsic functions such as size() that return and integer8 instead of an integer4, thus leading
to incorrect results when used in function calls is the -i8 flag is not added to the compiler options;
however, when adding -i8 the code does not compile because the MPI calls then refuse to compile (they need integer4 as arguments).
It seems that this problem is triggered when 3D models are used and mostly (only?) when attenuation is on, but not triggered for 1D Earth model benchmarks
even with attenuation (at least not triggered for EXAMPLES/benchmarks/attenuation_benchmark_GJI_2002_versus_normal_modes nor for Anne Sieminski's three 1D benchmarks), which makes it uneasy to locate.
Since most current users run the code with less than 2 GB of static memory per core, we have not investigated that problem carefully for now,
and thus recommend that you do NOT use these compiler flags until this problem is located and fixed.

- suggestion 44:
----------------
From Anne Obermann (ISTerre, Grenoble, France): add an option to record seismograms of div and curl at each receiver instead of (or in addition to) displacement.

- suggestion 43:
----------------
To make the three codes (2D, 3D_Cartesian, 3D_GLOBE) conform strictly to the Fortran standard,
we should replace all "call system(...)" statements with "call execute_command_line(...)", which is the
new standard way in Fortran2008. However, this intrinsic subroutine is currently (July 2013) not supported 
for instance by Intel ifort and thus we purposely do not do it yet, but around 2015 or so we should do it.

- suggestion 42:
----------------
if the material property of element adjacent to PML changed, make sure we also need to change
the material property of elements inside PML adjacent to this element correspondingly.

- suggestion 41:
----------------
make sure users avoid receivers located in PML region when using PML in adjoint simulation (i.e. add a stop statement if a receiver
is detected inside a PML element)

- suggestion 40:
----------------
Currently, the way PML is implemented for a fluid/solid simulation is memory consuming.
We should in the future reduce the memory usage in the following way:
1: replace the rmass_acoustic_interface(nglob_acoustic)  with rmass_acoustic_interface(nglob_acoustic_solid_interface)
2: replace the potential_dot_dot_acoustic_interface(nglob_acoustic)  with potential_dot_dot_acoustic_interface(nglob_acoustic_solid_interface)
where nglob_acoustic denotes the whole point set inside acoustic region, nglob_acoustic_solid_interface denotes the point set on acoustic_solid_interface

- suggestion 35:
----------------

Add full viscoelastic kernels in 3D_Cartesian, undoing full attenuation (already done in 3D_GLOBE in July 2013)

- suggestion 38:
----------------

Add full loop vectorization for critical calculation loops, to make sure we gain a factor of about 1.8 in terms of CPU time.
Already done by Dimitri in SPECFEM3D_GLOBE (fully, grep for "FORCE_VECTORIZATION" to see where and how),
but only partially done for now in SPECFEM3D_Cartesian.

- suggestion 39:
----------------

doublecheck carefully file memory_eval.f90 in SPECFEM3D_Cartesian (already done and fixed for SPECFEM3D_Globe by Dimitri in July 2013).
(in particular now that C-PML has been implemented in SPECFEM3D_Cartesian), as the code has undergone lots of changes in the past few months
and thus new big arrays have been used, in particular for C-PML (but only in the CPML layers for these arrays, not in the rest of the model);
Thus we should make sure that all these arrays are taken into account in the memory estimate computed in memory_eval.f90.

---------

A bug report received for SPECFEM2D:

Subject: [CIG-SEISMO] only one partition of "model_velocity.dat_output" will be generated in parallel run
Date: Mon, 12 Aug 2013 18:03:40 +0100
From: Yingzi Ying
To: cig-seismo

Dear SPECFEM Developers,

In parallel running of xspecfem2D, only one partition of "model_velocity.dat_output" will be generated.
Both mpirun and mpiexec executions have been tested.
The whole mesh can be generated in serial running.
Just a bug report.

Thanks,
Yingzi


------------------------------------------------
CPML :
------------------------------------------------

- suggestion 19:
----------------

use weights in Scotch decomposition for C-PML elements in the code;
because C-PML elements will be more expensive because they will compute more terms and will solve more equations (for the memory
variables / convolution terms) thus we should assign a higher weight to them when we call Scotch

the "elmnts_load" array up in decompose_mesh/part_decompose_mesh.f90 (around line 1432)
currently does not take into account C-PML elements weights. The matching array in the code is
called "CPML_type" and is defined as follows: 1 = face, 2 = edge, 3 = corner.

that is similar to what Daniel does for acoustic elements in the current version of SPECFEM3D: he uses a weight of 1 for acoustic
elements and 3 (or something like that) for elastic elements, which use a 3D vector instead of a scalar.

In principle the weighting factors can be computed analytically by counting the exact number of additional multiplications and
additions performed by C-PML elements

In PML corners this additional factor can be multiplied by 2 (for instance in the slice of PML that has X and Y damping) or by 3
for corners that combine an X, a Y and a Z PML.

easy to do once we have an expression for the weighting factors

------------------------------------------------
PT-SCOTCH and ParMETIS:
------------------------------------------------

- suggestion 06:
----------------

use PT-Scotch (the parallel version of SCOTCH) in SPECFEM3D instead of (or as an option in addition to) the serial version of
SCOTCH; because decomposing very large 3D meshes on a serial machine (on a single processor core) can be a severe limitation and
for some meshes that will be a major bottleneck.

Support ParMetis as well because there is a new, improved version that has been released by Karypis et al recently.
To do that, Jeroen sent a Fortran bug fix (a patch to use to call METIS from Fortran) to Matthieu by email on Jan 16, 2013.

What would need to be done is switch to PT-Scotch version 6.0 and/or to ParMETIS
(supporting both would be easy, since the subroutine calls are very similar; in the current serial code we support both Scotch and METIS).

To do that, one would need to rewrite "decompose_mesh.F90" in directory src/decompose_mesh.
That is the only file that would need to change (in addition to part_decompose_mesh.f90, which is called by decompose_mesh.F90).
Unfortunately that file is not very well written nor very well commented.
Thus unfortunately that set of routines is not so easy to understand and modify.
A bunch of things are hardwired.

Thus, this is not so easy to do (Daniel and I checked about a year ago). It would maybe be easier to rewrite 'decompose_mesh_scotch'
from scratch as a new, MPI program instead of trying to add MPI support to the current and not so well written serial version.
That program is relatively small, and what it does is relatively straightforward.

------------------------------------------------
mesh files in SPECFEM3D_GLOBE:
------------------------------------------------

- suggestion 11:
----------------

merge all the mesh data files into a single mesh file per MPI slice;
on large machines many people (including Dominik Goeddeke and I, also people at the UK supercomputing center) have found that
generating more than 4O smaller files per slice in GLOBE puts a very heavy load on LUSTRE or
GPFS shared file systems for no reason

the only reason why we have many files per slice is historical; we should definitely change that

Add ADIOS support for that

I have asked Jo to reduce that to a single file per process

we will need to do two things:

- convert all ASCII mesh files to 'unformatted' (in particular all the MPI buffer files: list of points etc); easy to do: just add
"form=unformatted" to the open statements, and change all write(unit_number,*) statements to simply write(unit_number)
i.e. just get rid of the ",*"

- merge all the files; easy to do in principle, just comment out all "close" and "open" statements in the code except the first
'open' and the last "close"; also make sure that the same unit number is used in all writes (which is not the case right now: I
remember we use 11, 12, 27, 34, IOUT and so on; you will need to make all of this a single value (called IOUT)

This should be done in both SPECFEM3D and SPECFEM3D_GLOBE

this is critical, because currently in GLOBE we write about 42 files per MPI slice, thus when running on 1000 cores we create
42,000 files for no reason. Let us reduce this to 1000.

IMPORTANT: see if this has an impact on processing scripts, if we decide to also merge the files output by the solver (for instance the
several different types of sensitivity kernel files) into a single file. If so, some kernel processing scripts and tools will need to be adapted
accordingly.

Feedback from Qinya: I can't agree more. We have had experience running big specfem jobs on our National Supercomputer Consortium
    (related to kernels for noise measurements, which generate even greater number of files). It crippled the entire file system,
    and the system administrators became really unfriendly to us after that ...

On the other hand, I know a lot of visualization programs (paraview) actually read the individual binary files and combine them
    to form bigger visualization domains. So if we rewrite this, we need to write it in a form so that it is easy/faster for an
    external program to access  bits of it (such as x,y,z, ibool, etc). Maybe direct-accessed C binary files?


------------------------------------------------
auto-time stepping in SPECFEM3D_GLOBE:
------------------------------------------------

- suggestion 13:
----------------

fix auto_NER() in GLOBE when a single chunk is used and has a size of less than 90 x 90 degrees; in that case, that (relatively
old) routine written by Brian Savage around 2006/2007 is called, and I noticed that it makes the code blow up in most cases because
the mesh it creates and/or the time step that it automatically selects is not right (probably because the mesh has changed
significantly since Brian wrote that routine)

probably not urgent (in our group at least we almost always use the full Earth or else a chunk of size 90 x 90 degrees); but if
other users change that the current code will likely blow up


------------------------------------------------
re-constructing wavefields:
------------------------------------------------

- suggestion 16:
----------------

when reading the absorbing conditions back to reconstruct the forward run backwards when running the adjoint step (i.e. when
SIMULATION_TYPE = 3), my postdoc Vadim Monteiller suggests to read the adjoint information back only every 50 or 100 iterations, by
chunks of 50 or 100 time steps at a time, to avoid slowing down the run

advantage: makes the code faster by avoiding doing I/Os at each time step

drawback: more memory is needed to store these edge conditions (exactly 50 or 100 times more, but only for a few 2D arrays)

useful to do? implement it as an option maybe? and then the user could choose to use 50, 100 or 1, and chosing 1 would be
equivalent to the current behavior?

Feedback from Qinya: The way I implemented this was for simplicity. It was just easier to write every step the absorbing boundary term and then read it back in reverse order in the kernel calculation. But obviously you can write in 50-step chunks, and read them back in 50-step chunks as well (make sure you still apply them in the reverse time order). We may have to be careful of the sizes of storage variables so they are not exceedingly large for some slices. For example, a model of 3x3 slices, slice 0 will have 2 absorbing boundary sides, slice 1 will have 1 a.b. side, while slice 4 will have no a.b.  We can make it a default option, but giving users the choice of 50 or 100 is just going to make the Par_file even more confusing. We could easily estimate a number from the max number of adjoint boundary elements in all slices.


------------------------------------------------
workflow inverse schemes:
------------------------------------------------

- suggestion 21:
----------------

Update from Dimitri, Oct 2012: after talking with many developers and users, we should definitely make the inversion tools in SPECFEM3D more standard and more user friendly. Even if there are many options to adjust when solving inverse problems, and different users currently have different tools (in Matlab, Perl, Fortran, Python and so on), sooner rather than later we should probably try to define a single tool in which all these different options could be implemented, but at least we would have a unified framework, which we could document in the manual.

Currently many users complain that computing sensitivity kernels is easy with SPECFEM3D but that then it is not so easy to use them (and different people give different answers when they are asked). So far, at least the following groups have local tools: the Princeton group, the Toronto group, Carl in Alaska, Emanuele and his group at INGV, Alessia (at least for Flexwin, there are now two versions: in Fortran and in Python), and the Marseille / CNRS group. We should try to address this issue relatively quickly, because the more we wait the more these sets of tools will diverge.

One thing we should do quickly for sure is make sure we maintain the current tools of SPECFEM3D inside the same directory as SPECFEM3D itself; it seems that currently everything is outside, in an "ADJOINT_TOMO" directory, which is thus not downloaded by users when they get SPECFEM3D through SVN or from the CIG tar file; which is problematic.

==================

Later, in a few months, maybe also see if compression (wavelets?) could help

Feedback from Qinya on the last point: I recently hosted a visit by Prof. Ling-yun Chiao (Chiao & Kuo 2001, Hung, Ping & Chiao 2011) from national Taiwan University, one of the first few people who started to apply wavelet to seismic inverse problem. I think we have some idea of how wavelet can potentially help with adjoint tomography, and I would love to share our thoughts with you all as well.

------------------------------------------------
GPU version:
------------------------------------------------

- suggestion 34:
----------------

Here are recent features of the code that have been implemented by different developers in the
regular Fortran + MPI version but that have not been ported to the GPU version yet:

- CPML

- modified fluid/solid formulation of Yang Luo to allow for adjoint sources located in fluid regions
                   (for instance for offshore simulations in the oil industry)

- QKappa attenuation support in addition to Qmu

- new (fixed / improved / more accurate) attenuation time scheme of Zhinan Xie


- suggestion 22: (not really critical, probably not worth doing, at least for now)
----------------

when I said that the future codes will support either CPUs or GPUs (in the same code but not simultaneously because we do not
compute part of the elements on the graphics cards and the rest on the CPUs): I realize that that is true but that there is an easy
and elegant way of running hybrid calculations anyway without changing the code:

- if you have N GPUs and M CPUs, start M + N MPI processes, and call CUDA only from the N processes and the CPU code only from the
M other processes; this way, N processes do GPU calculations only, M processes do CPU calculations only, and MPI can easily make
them run on the same nodes and communicate through MPI

- since GPUs are faster than CPUs (by a speedup factor that we can call S), to get close-to-perfect load balancing, just call
Scotch or ParScotch with that S factor as a weight for partitions that will run on CPUs and thus S times slower; by doing this,
Scotch will assign them approximately S times fewer elements, and thus we will get very good load balancing in a mixed/hybrid CPU +
GPU models

clever and easy to do.

limitation (of hybrid models in general, not of the above approach): if GPUs are 30 or 40 times faster than CPUs, doing this is
probably not very useful and using the GPUs only is probably sufficient because using the CPUs in addition will result in a gain of
a few percent only;
maybe not worth doing it


------------------------------------------------
time schemes:
------------------------------------------------

- suggestion 30:
----------------

Regarding the implicit time schemes that Zhinan has implemented in 2D, I agree that it would be great to put that in the official SVN version relatively soon to avoid losing the changes if we wait for too long. But I think we only need this in 2D for now, so let us not do it in 3D (at least for now in 2012). Let us just commit Zhinan's 2D version of the implicit routines to the official SVN code (making it off by default; the default should remain a purely explicit second-order Newmark scheme).


------------------------------------------------
attenuation in SPECFEM2D:
------------------------------------------------

- suggestion 23:
----------------

in the 2D code, the way we define viscoelastic or elastic element is not very nice: we need to set Q = 9999 to turn attenuation off;
it would be better to have a logical flag saying if a given layer has attenuation or not

this suggestion comes from notes that I took while talking to Paul Cristini a few months ago therefore I am not sure I remember
very well what Paul wanted to change about that. Paul, could you remind us more precisely the change that you had in mind?

Answer from Paul:

My concern was to be able to have in one domain several media which can be elastic or visoelastic. Currently, there is one flag for the whole domain which makes all media elastic or visoelastic.

Conclusion from Dimitri:

Now I remember more clearly what the problem is, based on what you told me:

- in SPECFEM2D the TURN_ATTENUATION_ON flag is global; thus either all the elastic elements become viscoelastic, or none of them

- and then, to have some viscoelastic regions and others that are purely elastic, we do something weird: we set Qkappa = Qmu = 9999 and we run a viscoelastic simulation but with Q = 9999, i.e. almost no loss of energy

Of course, what we should do is:

- suppress the TURN_ATTENUATION_ON flag

- for each material in the list of materials, add a flag that says if that material is elastic or viscoelastic;
or detect if Q > 9000 for instance and if so, consider that that material is elastic and turn off attenuation for it
(the same way we detect acoustic media by checking if the S wave velocity Vs is smaller than 0.0001)

Otherwise, in the current implementation, we waste memory and CPU time by doing viscoelastic simulations everywhere, even in elastic regions, when TURN_ATTENUATION_ON is on.

More precise analysis from Dimitri:
this means in particular changing this in DATA/Par_file:

ATTENUATION_VISCOELASTIC_SOLID  = .false.        # turn attenuation (viscoelasticity) on or off for non-poroelastic solid parts of the model
ATTENUATION_PORO_FLUID_PART     = .false.        # turn viscous attenuation on or off for the fluid part of poroelastic parts of the model

and making it local to each material set instead of global for the whole mesh.

Feedback from Qinya: I agree. The way attenuation appears in the 2D Par_file is quite confusing. It will be good to clean up, or at least explain well in the manual.


------------------------------------------------
databases in SPECFEM2D:
------------------------------------------------

- suggestion 32:
----------------

Convert the mesh database of the 2D code from ASCII to binary to produce smaller files?
For some of Paul Cristini's 2D simulations for instance the current ASCII file become really huge


------------------------------------------------
dynamic/kinematic faults:
------------------------------------------------

+ fault_solver_common:
      - make ordered version of subroutine assemble_MPI_vector_blocking, and use it in subroutine initialize_fault

+ fault_solver_dynamic:
      - many hard-coded ad hoc features need to be set instead as user options
        (e.g. rate-and-state friction)

+ rate-and-state friction:
      - make it a user-friendly option (currently a flag in fault_solver)
      - add documentation

+ cubit interface:
      - consolidate python scripts (eliminate redundancy)

+ verify consistency of fault edge nodes in kinematic solver:
  currently, these nodes are not split but they are included in the fault database

+ simplify mesh generation with faults:
      - eliminate fault edge nodes?
          Currently fault edge nodes must be defined as closed and non-split in CUBIT,
          SESAME asigns a single ibulk, but it adds it to the fault database.
      - develop a recipe for mesh coarsening in CUBIT based on half "cubed sphere" mesh
        - test tet-to-hex strategy

+ add snapshot outputs to kinematic solver (e.g. to export fault stresses for adjoint source inversion)

+ prepare automated tests


Older to-do lists of SPECFEM2D, SPECFEM3D and SPECFEM3D_GLOBE (some items are probably still useful to implement):
------------------------------------------------------------------------------------------------------------------

(items listed in no particular order)

- add a checkpointing/restarting system with CRC-32 as in
SPECFEM3D_GLOBE/tags/v4.1.0_beta_merged_mesher_solver_non_blocking_MPI/src , or using the fault tolerance library developed by
Leonardo Bautista and Franck Cappello at INRIA (see the content of directory "utils/fault_tolerance").

- Regarding memory size (getting an estimate of memory consumption in SPECFEM3D),
somebody should just cut and paste my SPECFEM3D_GLOBE routine
"SPECFEM3D_GLOBE/version41_beta/src/memory_eval.f90", which
I call from "SPECFEM3D_GLOBE/version41_beta/src/create_header_file.f90".
It would work for SPECFEM3D as well (with minor modifications).

- For binary files (for instance Carl Tape's m16 model) we should use NETCDF to store them and read them back; that is the only
clean way of ensuring portability between different systems; and NETCDF is free and open source therefore there is no reason not to
do it.

- in SPECFEM3D_GLOBE use a potential of (rho * u) instead of u in the fluid, in case of fluid-fluid discontinuities; probably
already done in SPECFEM3D because of purely acoustic oil industry problems, but not done in SPECFEM3D_GLOBE because there is no first-order
discontinuity inside the fluid outer core therefore it is not needed; thus this point is not needed except if we decide
to use SPECFEM3D_GLOBE for other problems one day, for instance helioseismology

- merge SPECFEM3D_GLOBE into SPECFEM3D, i.e. create a single package and see the mesh of the Earth as a general mesh that we would
decompose in parallel using PT-Scotch. Problem: would this scale for huge global Earth simulations, e.g. at a seismic period of 1
second on 200,000 processor cores?

- make the code compatible with helioseismology / general Cowling formulation

- could be done by Vala:
we could use heuristic rules to make source and receiver detection much
faster and make these routines use far less memory. For instance using the
latitude, longitude and depth of each source or receiver we can quickly
determine if there is no chance for this source or receiver
to be located in the current slice; then that processor would immediately
switch to the next source or receiver; this could speedup the whole process by
a factor of 10 to 100 I guess when many sources are used (e.g. Vala uses
50,000) and/or many stations (Bernhard uses 20,000). We would
get rid of the bottleneck in locate_sources and locate_receivers
when one uses a huge number of sources (e.g. Vala -> 100000 sources)
or a very large number of stations (e.g. Bernhard -> 20000 stations):
in such a case there is a bunch of MPI_GATHERS which waste a lot
of memory and can even use all the available memory and make the run crash.
Vala partially fixed this problem by using subsets of 1000 stations, but after
careful analysis we have found a way of doing better and getting rid of all
the GATHERs and *all* the extra memory (this way we won't have to limit the number of sources to a maximum of 100000,
as currently done in the solver).

- something that could be made more general in the 2D code:
at line 770 of specfem2D.F90:
!! DK DK if needed in the future, here the quality factor could be different for each point
i.e. they could be given at each (i,j,ispec) instead of at each (ispec) only
in the current version. Very easy to do if needed, just that line to change.

- we could add support for the TetGen mesh creation package, and cut each
tetrahedron into four hexahedra using the barycenter. This could help design
meshes for sedimentary basins, in particular near basin edges.
Celine Blitz has worked on that in 2010 while at TOTAL.

- we could add support for different polynomial degrees in different elements (p-adaptivity)

- we could add support for Discontinuous Galerkin, and/or for SEM on tetrahedra and pyramids


---------------------------------------------------------------------------------
---------------------------------------------------------------------------------
---------------------------------------------------------------------------------
---------------------------------------------------------------------------------

Things to add later to the manual:
----------------------------------

- update the manuals according to the changes made from the list above

---------------------------------------------------------------------------------
---------------------------------------------------------------------------------
---------------------------------------------------------------------------------
---------------------------------------------------------------------------------

TO DO ONLY IN 2013 OR 2014 BECAUSE WE DO NOT NEED THAT FOR URGENT PROJECTS:
---------------------------------------------------------------------------

- suggestion 07:
----------------

implement higher order time schemes in 3D by cutting and pasting what Zhinan has done in 2D; two schemes are useful: RK4 and
LDDRK4-6 (low storage low dissipation Runge Kutta, a bit more expensive than RK4 but much better)

implement these RK4 and LDDRK4-6 high-order time schemes including for viscoelastic media, as already done in the 2D code

probably not a high priority, could be done last because we do not need that for current projects

Feedback from Qinya: The same time scheme should be used for forward and adjoint simulations, while special care should be taken to implement the corresponding time scheme for the back-reconstruction of forward field in kernel calculations.


---------------------------------------------------------------------------------
---------------------------------------------------------------------------------
---------------------------------------------------------------------------------
---------------------------------------------------------------------------------

DONE (MOVE ITEMS ABOVE BELOW THIS LINE WHEN THEY ARE DONE):
-----------------------------------------------------------

- suggestion 01a:
----------------

Zhinan commits his current clean 2D C-PML code to SVN; because changes
made outside of SVN are useless for regular users of the application,
and are also quickly lost.

Zhinan, please also include the modifications that René Matzen made to "compute_forces"
and that he sent on May 10, 2012.

Here is a more detailed list of things to do in order to finish that:

- commit the acoustic CPML routines

- make sure that CPML elastic and CPML acoustic work with MPI

- make sure CPML can be used without Stacey; to do this, suppress the anyabs_local flag, whose name is confusing,
and use the PML_BOUNDARY_CONDITIONS and STACEY_ABSORBING_CONDITIONS flags instead;
maybe define anyabs_PML and anyabs_Stacey internally in the code instead of anyabs

- make sure CPML can be used with external meshes (coming from Gmsh or CUBIT) as well, using the numbering convention
that you defined with Paul two months ago; make sure that external meshes with fluid regions and solid regions also work fine with CPML

- clarify why the corners need additional arrays and additional code;
I still do not understand that; I thought that corners were not different from the rest in PML and that memory variables were simply summed there,
but I see that in your code you have specific arrays for the corners, called "rmemory_dux_dx_corner()" and so on; they make the code
(the implementation) longer and more complex, thus let us see if they are absolutely needed or if we could (maybe?) get rid of them;
we should explain this in the LaTeX file that you are writing now to explain the CPML equations

- see if CPML works with curved elements (checking the behavior with the default M2 UPPA example with topography for instance)

- suggestion 01b:
----------------

Zhinan is going to try to adapt CPML in the 2D code to tilted models, to see how to do that;
he will also try to see how many PML flags he needs.

- suggestion 03:
----------------

Zhinan (with the help of Jo and Daniel) implements C-PML in SPECFEM3D (in that code only for now; we can cut and paste that in
SPECFEM3D_GLOBE later if needed, but we do not need that now).
Paul Cristini and I will then test it extensively after the summer.
Zhinan, Daniel, Paul and Jo need to agree on a numbering convention (i.e. flags) for the different PML regions and corners (a
binary / bit encoding technique can be used; see more details about this at the end of this suggestion);
they could use the same idea as in the convention used in 2D.

See (with Emanuele and also Paul Cristini) how to define and handle the PML absorbing elements in CUBIT;
read this from input files in the solver

See also if the current arrays called isPML() or PML() or something like that in SPECFEM3D need to be suppressed or on the contrary could
be reused. They seem to correspond to an earlier attempt a few years ago that was never finished.

One thing that we will need to investigate in the specific case of GLOBE is how to make PML work for boundaries that are not aligned
with x / y / z (i.e. for one chunk of SPECFEM3D_GLOBE). That should not be a problem but we will get more terms because of products with
the three components of the normal vector. We already have the general (tensorial) formulation written in our PML paper of 2003,
but I never implemented all the terms.

The standard logical bit handling functions of Fortran are described for instance at http://www.ews.uiuc.edu/~mrgates2/docs/fortran.html
Thus here is the standard way of using a single array for all the logical flags for CPML in Fortran:

- define a single integer flag

integer, dimension(nspec) :: PML_flags

- clear it initially :  PML_flags(:) = 0

- use for instance bit #1 for Xmin, bit #2 for Xmax , bit #3 for Ymin .... bit #6 for Zmax

- thus, to indicate that an element belongs to the Xmax PML for instance, use :

PML_flags(ispec) = ibset(PML_flags(ispec), 2)     ! sets bit #2 of that integer to 1

- later, to test if this element belongs to the Xmax PML, use:

if(bset(PML_flags(ispec), 2)) then ! test if bit #2 is equal to 1
endif


Pretty simple, and then a single array is needed for all the flags.

(in fact, we could use this technique to group all the other logical flags of the code in a single array, to reduce memory size...)


- suggestion 04:
----------------

Zhinan sees if Dirichlet or Neumann conditions are needed on the outside edge of the C-PML layers in acoustic media; in elastic
media, a Dirichlet condition (zero displacement) is needed to prevent spurious Rayleigh waves from developing at the far end of the
PML and making it blow up; in the acoustic case, a Dirichlet condition for pressure (zero pressure) means a Neumann (rather than
Dirichlet) condition for the potential because pressure is related to its gradient;

thus Zhinan should see (by running a simple test in the 2D code) if a Neumann condition is OK for an acoustic PML, and/or if a
Dirichlet condition works fine as well; it could be that both are OK (because no such spurious surface waves can develop, contrary
to the elastic case) but it is worth checking (and it is simple to check)

- suggestion 18:
----------------

leave Stacey as an option even once C-PML is implemented;
because in many cases C-PML (with Dirichlet conditions on its outer edges) will be sufficient, but in principle it is possible to
replace Dirichlet with Stacey and thus use both C-PML and Stacey simultaneously to have even better absorption;
There is a paper by Collino and Monk in 1998 about this (for Maxwell)

So let us keep Stacey in the future codes, but let us make it off by default in DATA/Par_file (and C-PML on by default)


------------------------------------------------
27-node elements in SPECFEM3D:
------------------------------------------------

- suggestion 25:
----------------

Add support for 27-node elements in addition to 8-node elements;
that is useful for instance when handling large topographic variations or
significantly distorted layers; we do have support for 27-node elements in
SPECFEM3D_GLOBE but not in SPECFEM3D. I guess it would just be a
matter of cutting and pasting the 27-node routines from SPECFEM3D_GLOBE.

Users (for instance Vadim Monteiller and Paul Cristini) who use SPECFEM3D for mountain ranges for instance would be able to use curved elements, and since we have support for 27-node hexahedra in SPECFEM3D_GLOBE we could probably cut and paste that in SPECFEM3D.

Vadim could probably have a look at that because he will need that for his study of the Pyrénées.

As Emanuele mentioned, CUBIT has no real support for HEX27, but Gmsh does (and Paul uses Gmsh all the time; thus we could use it rather than CUBIT when curvature is needed).

I guess Vadim will work on that at some point.

Comment from Daniel:

keep in mind that we have two ways to mesh, in-house mesher xmeshfem3D and CUBIT/SCOTCH. if you insist on 27-node elements, i would start to implement it in the xmeshfem3D in-house mesher. that would probably show what is needed when combining it with CUBIT and SCOTCH which might be more of a problem.

Comment from Emanuele:

Theoretically, Cubit can generate mesh for HEX8, HEX20, HEX27 type of
hexahedra but higher-order nodes are created only when the mesh is
being exported to the Exodus II file (and persist in the CUBIT
database after file export) so after the meshing process... therefore
basically it is a regular hex mesh with more nodes inside each
elements (not a curvilinear element).

I'm not sure if the cubit's python interface is able to extract the
information about the internal nodes... I should take a try but in
 any case it is possible to recreate it during the exporting phase
from cubit/geocubit to specfem3d (or directly in specfem?)

I suppose that I can easily create a script that produce a hex
association of 27 nodes but the hard-for-me step is to adapt the
specfem3d reader and the scotch partitioning.


- suggestion 17:
----------------

in comments in DATA/Par_file and also in the LaTeX manual of the 2D code, Zhinan should mention the CFL upper bound of the
different time schemes that are now implemented and that can be selected by the user:
Newark, RK4 and LDDRK4-6

mention also in the manual and in these comments in DATA/Par_file that using Stacey absorbing conditions slightly reduces this maximum
CFL value i.e. the maximum time step (as a rule of thumb obtained by trial and error, it is reduced by a factor of about 0.90).

without such numbers as comments nor in the manual, it is difficult for users to decide what value to use for the time step (in the
2D code, the user selects the time step, it is not computed automatically by the code)


------------------------------------------------
Stacey absorbing boundaries:
------------------------------------------------

- suggestion 20:  note DK DK Jan 2013: not needed any more because we now have C-PML instead; thus purposely not done.
----------------

see with Zhinan if a spring should be added to Stacey in SPECFEM3D and SPECFEM3D_GLOBE; Stacey is a dashpot and thus Zhinan and his
advisors in China have shown that the condition can become inaccurate and have a static offset in some cases; they have shown how
to suppress this by adding a spring in parallel to the dashpot

already done in the 2D code

see with Zhinan if he thinks this should be done in SPECFEM3D and SPECFEM3D_GLOBE as well, and if it would be useful

should probably not be a high priority; PML and other things are more urgent


- suggestion 26:
----------------

Zhinan will try to back-propagate some waves in 1D with viscoelasticity;

Jeroen, Qinya, Zhinan and I discussed that a few months ago, some of us think the backward run is unstable when undoing attenuation but Zhinan remembers seeing some stable backward runs with a C viscous damping matrix in mechanical engineering at his institute in China, therefore it is worth trying using SPECFEM1D for instance.

(in 2D we could avoid this problem by saving all the timesteps of the forward run to disk and reading them back, but in 3D is it not possible yet because the amount of I/Os would be too big; this should change in 5 to 10 years, but for now we still need to back-propagate when SIMULATION_TYPE = 3 in 3D)


- suggestion 00: about VERCE project modifications
----------------

> Possibly set the name of the tomography file in the Par_file instead
> of hard-coding it in model_tomography.f90 or creating a separate
> parfile
>
>
> Some interesting addition will be the:
>
> ---- multifiles tomography
> Piero told me that the run for turkey/europe mesh had this possibility
> so probably it is matter to commit that files in the svn version
>

-----------------------------------------------------------
GPU related:
-----------------------------------------------------------

- suggestion 02:
----------------

Max merges the GPU / CUDA improvements made by Peter Messmer into
SUNFLOWER; Daniel and Jo then merge SUNFLOWER into SPECFEM3D to get a
single version that supports both CPUs and GPUs (more precisely: either
CPUs or GPUs, depending on the case; no mixed / hybrid model);
we then "svn delete" SUNFLOWER and keep a single code

------------------------------------------------
Par_file modifications:
------------------------------------------------

- suggestion 00: about adding new CPML flags and also new (or modified) movie and shakemap flags + VERCE project modifications
----------------

From Dimitri: here are two more parameters to rename and/or move from the constants.h file to the input Par_file:

could you also rename "USE_RICKER_IPATI" to "USE_RICKER" or "USE_RICKER_TIME_FUNCTION" everywhere in the source code and in the input file?
Because "Ipati" is the name of an industrial field and thus we should make the name of that parameter less specific.

Let us also move that parameter from the constants.h file to the input file (Par_file), to avoid having to recompile the code
(see my previous email about this)

================

As discussed yesterday during our Skype call, let us make the
modifications below in the input Par_file of SPECFEM3D.
NGNOD should be added and set to 8 or 27, and then when reading the
Par_file the code should check that it is equal to one or the other, and
otherwise exit with an error message.
The code should then automatically set NGNOD2D to 4 if NGNOD == 8 and to 9 if NGNOD == 27.

For the new movie flags, I suggest changing MOVIE_SURFACE and
EXTERNAL_MESH_MOVIE_SURFACE, which are confusing, to MOVIE_SURFACE only,
and then define a second (integer) parameter, called for instance
MOVIE_TYPE, which could be equal to 1 to show the top surface
(topography + oceans) only, to 2 to show all the external faces of the
mesh (i.e. topography + vertical edges + bottom), and so on.

Let me add this to the todo list. Let us also decide who does it
(since new flags will be needed in the Par_file for PML, I guess it will
be easy and better to do all the modifications at the same time).

Let us also modify the code to allow for shakemaps and movies at the
same time, rather than having to run the code twice.

> add NGNOD = 27 or 8   to the input Par_file (move it from shared/constants.h.in to the Par_file, since now either value can be used depending on the mesh,
whether it is HEX8 or HEX27)
>
>
> In the VERCE framework they ask some modifications in order to improve
> the flexibility of the code if installed as a module....
>
> Move from constants.h to Par_file the flags:
> EXTERNAL_MESH_MOVIE_SURFACE
> EXTERNAL_MESH_CREATE_SHAKEMAP
> OLSEN_ATTENUATION_RATIO
>
> Allow shakemaps and surface movies to be generated during the same run
> (EXTERNAL_MESH_MOVIE_SURFACE and EXTERNAL_MESH_CREATE_SHAKEMAP not
> mutually exclusive)
>
> We believe also that
>
> Explain in the manual the difference in using
> EXTERNAL_MESH_MOVIE_SURFACE instead of MOVIE_SURFACE, or
> EXTERNAL_MESH_CREATE_SHAKEMAP instead of CREATE_SHAKEMAP


- suggestion 05:
----------------

Subject: Re: [CIG-SEISMO] simulation of seismic wave propagation excited by a force
Date: Tue, 11 Sep 2012 18:34:21 +0200
From: Dimitri Komatitsch
To: Christina Morency
CC: Yang Luo, Yingzi Ying

Hi Christina and Yang, Hi all,

OK, great. This is excellent news.

So let us just move this "USE_FORCE_POINT_SOURCE" flag from
setup/constants.h.in to DATA/Par_file, and let us mention it in the manual.

Jo, could you please do that and commit the changes?

Thank you,
Dimitri.

On 09/11/2012 06:17 PM, Christina Morency wrote:
> Hi Dimitri,
>
> Yes, I can confirm what Yang is saying, it is already in the package,
> with interpolation if the source is not located on a GLL point (as it is
> indeed done in the 2D btw).
>
> Christina
>
> On 09/11/2012 09:14 AM, Yang Luo wrote:
>> I think Daniel (or someone else) has already copied necessary lines from old package.
>>
>> There is an option "USE_FORCE_POINT_SOURCE" in constant.h, which allows one to use a point force
>> (which Yingzi already figured out, or maybe it's already in the manual)
>>    logical, parameter :: USE_FORCE_POINT_SOURCE = .false.
>>    double precision, parameter :: FACTOR_FORCE_SOURCE = 1.d15
>>    integer, parameter :: COMPONENT_FORCE_SOURCE = 3  ! takes direction in comp E/N/Z = 1/2/3
>>
>> Interpolation is included, if the point force is not located exactly on a GLL point.

=================

add an option to use a point force source in SPECFEM3D instead of a CMTSOLUTION; some users need that (very classical for instance
in non destructive testing) and it is easy to do; the only thing to decide is what convention to use in DATA/Par_file: add a flag
for that?
put a flag to choose between a Ricker source and a Heaviside source when a force source is used
(we can cut and paste the Heaviside implementation from the case of a CMT source)

Feedback from Qinya: I think this is a great idea. In the past, we always had multiple versions of the same specfem code, just to be able to run point force sources. We can add a parameter `SOURCE_TYPE' in Par file, which is = 0 for moment-tensor source ( reads CMTSOLUTION, and assumes Heavside/err as source time function), and = 1 for point force source (and read something like PNTSOLUTION, including force location, direction, time shift and half duration). Now there is no standard stf to use for point force, but Heavside function does not seem to be a good idea because it will produce a static offset on all seismograms, which is not desirable. On the other hand, I don't see why Richer wavelet (2nd order derivative of Gaussian) is necessary better than Gaussian itself. One way to give user the freedom of choice is to put the stf info also in the PNTSOLUTION.

Subject: Re: [CIG-SEISMO] simulation of seismic wave propagation excited by a force
Date: Tue, 11 Sep 2012 18:01:38 +0200
From: Dimitri Komatitsch
To: Yingzi Ying

Dear Yingzi,

You are right, in the current code there is no option in the Par_file to
use a point source, but we added this to the todo list right before the
summer and thus we should/will definitely do it. It is easy to do: just
add the Ricker wavelet to the accel() vector (which is in fact a force
vector before we divide it by the mass matrix), to the component you
want (vertical or horizontal) and you are all set.

Joseph, could you maybe do it? below are a few lines from one of my test
codes, in which I do just that; let us cut and paste these lines in
SPECFEM3D (it is just a matter of selecting the right grid point; which
we already do for a CMT source, thus we could add a force instead of a
CMT at that location; the only difference being that the CMT source is
added to all the points of a given spectral element, while a force
vector should be added to a single point).

Let me talk to Joseph tomorrow to see if we can release a patch of the
source code soon.

I also cc Yang to see if he has already done that in a local version at
Princeton maybe (for instance to model a force for some of his oil
industry simulations in foothill regions?).

Another difficulty is if the location of the force does not correspond
to a grid point; then there are two options:

- select the closest grid point; i.e. slightly change the location of
the source (can be done automatically by the code)

- use Christina Morency's way of adding a force source between grid
points; I think I remember she did that in the 2D code, but I am not
sure how; thus I cc her.

(that second option would be better, if Christina can confirm that it
works fine)

Best wishes,
Dimitri.

==================================================================

! ipoinsource is the global grid point at which we add the source
! "3" means a vertical force (i.e. we add the force to the vertical
! component of the force vector, which is later divided by the mass
! matrix to get the acceleration vector)
! factor_force is the norm of the force vector to add
! ricker(t) is the Ricker wavelet at time t = (it-1)*deltat

  accel(3,ipoinsource) = accel(3,ipoinsource) - factor_force * ricker(t)

==================================================================

On 09/11/2012 12:59 PM, Yingzi Ying wrote:
> Hi all,
>
> I am trying to use specfem3d to simulate waves excited by a vertical
> force with user-defined source forms such as ricker wavelet.
>
> I tried to do it through adjoint simulation by setting 'SIMULATION_TYPE
> = 2' and put source position at STATIONS_ADJOINT and receivers at
> CMTSOLUTION. But I failed.
>
> Does anyone have an idea or suggestion how to do active simulations with
> specfem3d?
>
> Thanks in advance.
>
> Best regards,
> Yingzi
>


-------- Original Message --------
Subject: Re: force in SPECFEM3D
Date: Wed, 18 Jul 2012 23:50:19 +0200
From: Dimitri Komatitsch
To:  Carl Tape,  daniel peter

Hi all,

Yes, let us definitely add it and make it an option in the Par_file.
This will be very useful also in the case of non destructive testing of
materials (in which case the source is almost always a set of forces).

What probably exists is a routine to add a force at a GLL grid point
(which is easy to do). If we also want to have an option to model a
source located somewhere between grid points, then I guess we will need
to see how the discrete source term looks like (I think Daniel once told
me that Christina had done it in the 2D code; probably summing the
Lagrange interpolants at all the GLL points of the element? (as we do
for receivers located between grid points?).

Thanks,
Dimitri.

On 07/18/2012 11:01 PM, Jeroen Tromp wrote:
> Hi Daniel:
>
> The adjoint source involves a "force", so the necessary code exists. But
> it can't hurt to explicitly allow a source that is a force rather than a
> moment tensor.
>
> Best regards,
>
> Jeroen
>
> On 7/18/12 8:56 PM, Carl Tape wrote:
>> Hi Daniel, Dimitri, Jeroen--
>>
>> I am continuing some benchmark simulations for static offsets in
>> SPECFEM3D: 0-D, 1-D, 3-D models. One of the applications is to make a
>> Green's function database similar to what is used in GPS inversions of
>> fault slip. This means using a source that is a force, not a moment
>> tensor. This is easy in the 2D code (elastic force option), but it
>> doesn't appear to be present in the 3D code. Can you confirm? Is that
>> something we want? (It seems like it.)
>>
>> Thanks,
>> Carl

-------- Original Message --------
Subject: Re: force in SPECFEM3D
Date: Thu, 19 Jul 2012 02:15:05 +0200
From: Dimitri Komatitsch
To: Carl Tape, daniel peter
CC: Jeroen Tromp

Hi Carl and Daniel,

Yes, I think that would be very useful;
and we should also put these parameters in the input file (Par_file)
rather than in constants.h, so that users do not need to recompile the code.

Thanks,
Best wishes,

Dimitri.

On 07/19/2012 01:06 AM, Carl Tape wrote:
> Hi Daniel,
>
> Great. This is it:
>
>    integer, parameter :: COMPONENT_FORCE_SOURCE = 3  ! takes direction
> in comp E/N/Z = 1/2/3
>
> But perhaps we can generalize this to specify a unit vector that
> doesn't have to be normalized, like
>
>   COMPONENT_FORCE_SOURCE_X = 1
>   COMPONENT_FORCE_SOURCE_Y = -2
>   COMPONENT_FORCE_SOURCE_Z = -1
>
> Then in the code we could normalize (one less thing to worry about).
> This would be closer to the 2D code (which takes and angle), and it
> would allow for computing Green's functions on fault with arbitrary
> orientation.
>
> Let me know what you think.
>
> Thanks,
> Carl
>
> On Wed, Jul 18, 2012 at 2:45 PM, daniel peter<dpeter@princeton.edu>  wrote:
>> Hi Carl
>>
>> there is such an option in the constants.h file for the 3D codes (you will have to set the flag, something like "USE_POINT_FORCE" and direction, and then recompile the code).
>>
>> I'll see what we could put into the manual as well.
>>
>> Best wishes
>> Daniel

-----------------------------------------------------------
Stacey conditions:
-----------------------------------------------------------

- suggestion 08:
----------------

fix the Stacey stability bug: already done by Zhinan in 2D and also in a demo 3D code; Jo is now done merging that into GLOBE;
should be done (by Daniel? by Jo?) in SPECFEM3D as well, because very useful to increase the time step.

Update on June 19, 2012: Vadim will do it in SPECFEM3D + MPI, he will test it and then send the modified code to Daniel, Jo and Dimitri,
and then Daniel can commit it to SVN.

- suggestion 14:
----------------

when fixing the Stacey bug in 3D, make sure only three mass matrices are used, not four (in order to save as much memory as
possible);
in the acoustic elements, a single mass matrix is probably still sufficient, as already checked by Zhinan

of course, when absorbing conditions are turned off, a single mass matrix should be used and the two others should be declared with
a dummy size of 1

almost already implemented by Jo in SPECFEM3D_GLOBE; can then be cut and pasted in SPECFEM3D as well

already done in the 2D version

- suggestion 15:
----------------

fix the Stacey / Sommerfield equations (stability issue) in acoustic elements: already done by Zhinan in 2D, Jo will do it in 3D
but will need Zhinan to send him the equations i.e. help him do it in the 3D code

as Zhinan says, it is probably simple for Jo to make the modification by taking the code Zhinan added to modify
"rmass_x" and making the same modifications for "rmass_acoustic"
(only when absorbing conditions are turned on of course).

should be done in both SPECFEM3D and SPECFEM3D_GLOBE; already done in SPECFEM2D

-----------------------------------------------------------
attenuation:
-----------------------------------------------------------

- suggestion 09:
----------------

fix the attenuation time stepping bug (my RK4 time scheme from 1999 was not a true RK4); already done by Zhinan in 2D and in a demo
3D code, Jo and Daniel should merge that in SPECFEM3D and in SPECFEM3D_GLOBE because that makes the codes significantly more
accurate for viscoelasticity, in particular for long runs

to validate that we could/should use figures 21 (a, b and c) and 22 of our 2002 GJI paper with Jeroen


- suggestion 10:
----------------

the attenuation arrays are always fully 3D in the current 3D codes (at least in SPECFEM3D_GLOBE, please doublecheck in SPECFEM3D);
i.e. even for 1D attenuation models (layer cakes with constant Q per geological layer) all these arrays are declared as (i,j,k,ispec),
while (1,1,1,ispec) would be sufficient;

this was done by Hejun, it is useful for 3D attenuation models but wastes a lot (really a lot) of memory in the case of 1D
attenuation

that would save a factor of 125 on attenuation arrays in these (very common) cases of constant Q per regions

I think switching from (NGLLX,NGLLY,NGLLZ) to (1,1,1) i.e. using the lower left corner of each element in the case of 1D
attenuation would be easy to implement and sufficient to fix the problem (since array sizes would decrease by a factor of 125).

Jo should fix that in SPECFEM3D_GLOBE (I am not sure if there is the same problem in SPECFEM3D; Daniel will know); 3D attenuation
should be optional, and off by default

note: turned on by default, honoring 80km discontinuity crucial for reliable results

To do that in SPECFEM3D_GLOBE, the following command will be very useful because it will show the largest arrays used (more
precisely: statically allocated)
in the code ("allocatable" arrays will *NOT* appear, but we use no such big allocatable arrays in SPECFEM3D_GLOBE):
nm --print-size --size-sort --reverse-sort --radix=d ./bin/xspecfem3D | cut -f 2- -d ' ' | less

-----------------------------------------------------------
mesh files:
-----------------------------------------------------------

- suggestion 12:
----------------

check if file 'array_dims.txt' is used or not (it seems to be unused; it seems to have been an old file for ParaView, but let us
doublecheck)

currently each MPI slice writes its own version of that file, thus if using 1000 cores we get 1000 copies of that file; that puts a
heavy load on shared file systems for no reason

Jo and Daniel, please check if it is used (in particular in ParaView postprocessing scripts) and if not please suppress it; if it
is used, I think only rank 0 could create it, no need to have 1000 identical copies

this applies to SPECFEM3D_GLOBE; I am not sure about SPECFEM3D, please doublecheck

Feedback from Qinya: I chose to output the exact nspec/nglob numbers for each slice to array_dims.txt in save_array_solver.txt,
    because the nspec/nglob number in values_from_mesher.h are only the maximum values for all slices. We need the exact numbers
    to read properly the topology/kernel binary files for combine_vol_data.f90, and generate the .mesh and .vtu files for paraview.
    If the new version of the code has exactly the same number of nspec/nglob number for all slices, it is no longer needed.
    Otherwise, I am sure there is a way around it. As long as we can get this info from other files, we can eliminate the array_dims.txt file.

------------------------------------------------
record length in SPECFEM3D_GLOBE:
------------------------------------------------

- suggestion 24:
----------------

internally in the three codes we compute a time shift t0 to shift the source and have null initial conditions, and we subtract it
when we save the seismograms and also the source time function; that is the right thing to do, but we forgot to also add it to
"RECORD_LENGTH_IN_MINUTES" after reading it from DATA/Par_file.

As a result, the seismograms stop at RECORD_LENGTH_IN_MINUTES*60 - t0 seconds rather than the expected RECORD_LENGTH_IN_MINUTES*60
duration.

when running simulations at short periods t0 is small compared to RECORD_LENGTH_IN_MINUTES*60 and thus one hardly notices the
problem, but the other day I ran a very long period calculation with hdur = 150 seconds and then my seismograms were 150 or 200
seconds too short and thus I had to delete them and redo the whole simulation.

Solution (easy): add t0 to RECORD_LENGTH_IN_MINUTES*60 when computing the number of time steps (NSTEPS) in the three codes

Jo, could you please do that at some point? (not the first priority of course)

Feedback from Qinya: This should not affect kernel calculation, as long as the same record length is used for forward and adjoint simulations.

-----------------------------------------------------------
documentation:
-----------------------------------------------------------

- suggestion 27:
----------------

For Emanuele and Federica:

I agree that read_external_mesh() with varying parameters (for instance to import external tomographic models) is described on page 13 of the manual, but I think the description could / should be made a bit more precise.

In the current version of page 13 of the manual it seems to be a relatively minor point, while in practice many users will use that feature (many/most applications will use such external models).

I therefore suggest that Federica and you make it a slightly longer paragraph, and probably even make it a separate section or sub-section.
This way, the item will appear in the table of contents.

-----------------------------------------------------------
packaging:
-----------------------------------------------------------

- suggestion 28:
----------------

two small problems in the "configure" script (of SPECFEM2D, SPECFEM3D and also SPECFEM3D_GLOBE):

* CFLAGS for the C compiler is hardwired in the different Makefiles (and set to -O2 or something simple like that); it is not defined in "flags.guess"; this is problematic when running on more than 2 GB of memory per core, which is not uncommon these days; in this case many compilers require users to add "-mcmodel=medium" to the Fortran and to the C compiler flags. In the current flags.guess this can be done for Fortran but not for C and thus we get an error message from the linker (with gfortran / gcc, with Intel ifort / icc, probably others as well); the solution would be to move CFLAGS from the Makefiles to flags.guess (only)


* when we use Intel icc (for instance with "./configure FC=ifort CC=icc") the script seems to guess that GNU gcc is used instead:

checking for gcc... icc
checking whether we are using the GNU C compiler... yes
checking for gcc... (cached) icc

Obviously the script should print "checking for CC" instead, and "checking whether we are using the GNU C compiler" should return "no".

note: our configure scripts are createad by autoconf (see suggestion 29) which seems to cause this behavior.

Another option (at least in the meantime) would be to tell users in the manual to use this: "./configure FC=ifort CC=icc CFLAGS="-O2 -mcmodel=medium -shared-intel".

- suggestion 29:
----------------

Elliott Sales de Andrade will fix the configure.ac autoconf scripts in SPECFEM2D, SPECFEM3D and SPECFEM3D_GLOBE.


------------------------------------------------
sensitivity kernel output:
------------------------------------------------

- suggestion 31:
----------------

Elliott Sales de Andrade and Qinya will implement a routine to interpolate and output the sensitiviy kernels
on a topologically-regular grid instead of the non-structured SEM grid (this can be useful for many processing
packages that require a topologically-regular grid of points as input).


- suggestion 37:
----------------

Add QKappa attenuation in solids and in fluids, in addition to Qmu

- suggestion 36:
----------------

Add Deville for acoustic media; currently there is Deville for viscoelastic media only:  done by Dimitri in March 2013

- suggestion 33:
----------------

DK DK new suggestion by Dimitri, Jan 2013: now that C-PML is implemented for forward runs, make it work for adjoint runs as well
(more precisely, the problem is just to reinject the right boundary conditions at the entrance of the PML when we undo the forward run
in the second set of simulations, starting from the last time frame and going backwards)