SHORT SUMMARY
=============
This HOWTO briefly summarises how to process MegaPrime@CFHT multi-colour
data sets with THELI. The file gives you an overview of the processing
and walks through a concrete reduction example on two-colour data of the
galaxy cluster A1835.

File version: 28/02/2013 (Thomas Erben)

THELI GLOSSARY
==============
In the following we use these terms, with which you should be familiar:

- THELI paper: A description of the THELI algorithms published in
  Astronomical Notes (AN 326, 432)
- CARS paper: A description of the processing of parts of the CFHTLS-Wide
  survey (A&A 493, 1197)
- RUN or RUN processing: please see the THELI paper
- SET or SET processing: please see the THELI paper
- SURVEY: All data processing described within this file concerns
  'surveys'. For practical reasons, each data set to be processed with the
  methods described here belongs to a survey. This can be an imaging
  survey such as CFHTLS or NGVS, but also observations of individual
  objects. For instance, we attribute observations of individual galaxy
  clusters to a survey called CLUSTERS.
- reduce directory: The directory where all THELI scripts reside and from
  where you launch all the processing. It is created during the software
  setup (see the file 'HOWTO_CFHTLS_collab_installpipe.txt').
- somedir: an arbitrary, but fixed, directory where all data from a THELI
  survey processing session are stored.

SOFTWARE INSTALLATION
=====================
The necessary software setup is described in the HOWTO
'HOWTO_CFHTLS_collab_installpipe.txt'.

DATA DIRECTORY SETUP
====================
The current pipeline setup requires the data to be processed within one
directory (with lots of free disk space)!

- Go to the top of the directory where you intend to reduce your data. We
  call this directory 'somedir' henceforth. Call the script
  'create_CFHTLS_set_dirs.sh', which is located within your reduce
  directory. We want to process an object within the CLUSTERS survey, so
  call:

    .../create_CFHTLS_set_dirs.sh CLUSTERS mode1

  This creates a subdirectory CLUSTERS with one directory (and
  subdirectories) for each potential target within this survey. If the
  target you want to process is not yet included, you need to modify the
  script 'create_CFHTLS_set_dirs.sh'. In this case you also need to modify
  the following files within your reduce directory:

  - doall_set_CFHTLS_survey_coord.txt
  - link_sets_CFHTLS_para.sh

  The scripts basically need to know the names and the coordinates of the
  targets within each survey.

- Copy the file GLOBALWEIGHTS.tar.gz from your reduce directory to the
  survey subdirectory, in our case 'CLUSTERS'. Unpack the file with
  'tar xvfz GLOBALWEIGHTS.tar.gz'. After this step you should have the
  directory structure:

    somedir/CLUSTERS/GLOBALWEIGHTS/globalweight_1.fits ....
                    .
                    .
                    /A1835/i
                    /A1835/WEIGHTS_i
                    .
                    .
                    /A1942/u
                    /A1942/WEIGHTS_u
                    .
                    .

  In this case A1835, A1942 etc. represent possible targets of the
  CLUSTERS survey.

THE A1835 DATA
==============
- Obtain Elixir preprocessed MEGAPRIME@CFHT data of the cluster A1835 from
  the CADC archive! As of 21/09/2009 the cluster had been observed with
  MEGAPRIME in the two colours g and r, and the data set contains the
  following 14 exposures:

    784537p.fits 784538p.fits 788576p.fits 788575p.fits 788577p.fits
    788070p.fits 788579p.fits 788069p.fits 788578p.fits 784535p.fits
    784536p.fits 788574p.fits 784539p.fits 784540p.fits

  Put those images in a directory (say 'requestdir') and call the script
  'sort_MEGAPRIME_survey.sh' (located within your reduce directory). It
  gives you instructions and details on its processing.
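The following is a minimal shell sketch of the setup and data-preparation
commands described above. The variables REDUCE and SOMEDIR are placeholders
introduced only for this sketch (they are not used by THELI itself), and
'sort_MEGAPRIME_survey.sh' is called without arguments here because the
script itself prints instructions on what it needs; adapt everything to
your own paths:

  # --- sketch of the setup steps above; all paths are placeholders ---
  REDUCE=/path/to/reduce     # your reduce directory
  SOMEDIR=/somedir           # base data directory with lots of free disk space

  # create the CLUSTERS survey directory structure below 'somedir'
  cd ${SOMEDIR}
  ${REDUCE}/create_CFHTLS_set_dirs.sh CLUSTERS mode1

  # install the global weight maps for the survey
  cp ${REDUCE}/GLOBALWEIGHTS.tar.gz ${SOMEDIR}/CLUSTERS/
  cd ${SOMEDIR}/CLUSTERS && tar xvfz GLOBALWEIGHTS.tar.gz

  # sort the Elixir preprocessed exposures (downloaded to 'requestdir')
  # into RUN directories; the script explains the required details itself
  cd ${REDUCE}
  ./sort_MEGAPRIME_survey.sh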
THE RUN PROCESSING
==================
In the case of Elixir preprocessed MEGAPRIME@CFHT data, the THELI RUN
processing (see the THELI paper and the CARS paper) consists of the steps
below. It needs to be done separately for each run created by the
'sort_MEGAPRIME_survey.sh' script. For the whole RUN processing procedure,
the superscript 'doall_run_CFHTLS_single.sh' is available (see below).

- 'splitting the MEF Elixir FITS images into single-chip frames'
  This step also checks the file integrity and rejects unusable exposures.
- 'trimming the images and conversion to FLOAT format'
  The Elixir preprocessed images (integer files) still contain the
  overscan regions.
- 'subtraction of the sky of the single-chip frames'
- 'creation of small (8x8 binned) mosaics from the single-chip frames
  belonging to one exposure'
  They serve for quick inspection and are used for the RUN WWW pages.
- 'creation of weight images'
  The starting point for the weights is simply bad pixel maps for the
  individual chips - see also the CARS paper. These files are located in
  the GLOBALWEIGHTS directory created in the 'DATA DIRECTORY SETUP' step.
- 'identification of satellite tracks within the exposures'
  This step creates ds9 region files for each single frame in which
  satellite tracks are identified. It is by far the most time- and
  disk-space-consuming step of THELI RUN processing. It is therefore
  advisable to store the region files as soon as they have been created
  and not to repeat this step when redoing the RUN processing.
- 'redoing the weights with the inclusion of satellite tracks'
  Note that the satellite track identification already needs a first
  version of the WEIGHTS; hence, the weight creation has to be done twice.
- 'correction of weights'
  We saw that the cosmic ray detector within our WEIGHTS creation setup is
  too aggressive: it also flags the cores of stars. This step tries to
  correct this.
- 'object catalogue creation'
  This step creates object catalogues which are mainly used for the
  astrometric and photometric calibration during SET processing. We need
  them here to extract statistics and PSF information for the RUN WWW
  pages.
- 'creation of WWW pages of the RUN processing' (skip this at the moment!)
- 'distribution of the RUN products to the SET structure'

CONCRETE RUN PROCESSING FOR A1835
=================================
During the data preparation step, the A1835 data were distributed for RUN
processing. You should have the directory structure:

  /somedir/CLUSTERS/g/run_05AQ02_A1835/SCIENCE_g/ORIGINALS/*p.fits
  ................./g/run_05AQ02_A1835/WEIGHTS
  .
  .
  ................./r/run_05AQ02_A1835/SCIENCE_r/ORIGINALS/*p.fits
  ................./r/run_05AQ02_A1835/WEIGHTS
  .
  ................./A1835/r
  ................./A1835/g
  .
  ................./A1835/WEIGHTS_r
  ................./A1835/WEIGHTS_g

Here, the Elixir preprocessed frames are located under the ORIGINALS
directories. All THELI RUN processing can be done with the superscript

  doall_run_CFHTLS_single.sh

A basic call to it looks as follows (assuming the directory structure
above); note that all THELI calls should be done from within the reduce
directory!

  ./doall_run_CFHTLS_single.sh -b /somedir -f r -r run_05AQ02_A1835 \
     -psf Y -reg /somedir/regs \
     -set /somedir \
     -w /somedir/CLUSTERS/GLOBALWEIGHTS \
     -www /somedir -s CLUSTERS -m "PREPARE"

Please consult the script for details (the comments/remarks there are
probably a bit confusing :-)). The call above performs part of the RUN
processing for run 'run_05AQ02_A1835', filter 'r', within the CLUSTERS
survey.
The arguments mean the following:

- '-b': 'base directory'; the survey directories are assumed to be below
  this one
- '-s': which survey to process (here CLUSTERS)
- '-f': which filter to process (here 'r')
- '-r': which RUN to process (here run_05AQ02_A1835)
- '-psf Y': do PSF analysis (we basically always want this)
- '-reg': The satellite track region files are copied to the location
  /somedir/regs for easy backup. When the masterscript is asked to perform
  satellite track detection, it checks at that location whether region
  files for the run under consideration are already present. In that case
  the detection is skipped and the old region files are just copied (this
  behaviour can be altered with command-line options to the script).
- '-www': RUN WWW pages are created under /somedir/CLUSTERSWWW (the name
  is modified accordingly if the survey is not CLUSTERS)
- '-set': The SET directory structure will be created under
  /somedir/CLUSTERS (the name is modified accordingly if the survey is not
  CLUSTERS); RUNs and SETs are usually created under the same structure.
- '-m PREPARE': The individual RUN processing steps outlined above are
  carried out in so-called modes within 'doall_run_CFHTLS_single.sh'. The
  PREPARE mode does the MEF data splitting and the data verification by
  examining pixel statistics. All the other tasks described above are done
  in one of these modes. Their names are:

    PREPARE, SCIENCE, SKYSUB, MOSAICS, TESTREG (checks whether region
    files for the current run are already available; it should always be
    used together with any mode concerning weight image creation),
    GLOBALWEIGHTS, WEIGHTS, STRACK, REGIONWEIGHTS (weight 'recreation'
    with satellite track region files), CORRECTWEIGHTS, SINGLEASTROM
    (catalogue creation), RUNSCIENCEPAGES, RUNDISTRIBUTE

You can run the script with several processing modes at the same time. A
complete RUN processing for run_05AQ02_A1835 in filter 'r' would, for
instance, be done with:

  ./doall_run_CFHTLS_single.sh ...... \
     -m "PREPARE SCIENCE SKYSUB MOSAICS GLOBALWEIGHTS TESTREG WEIGHTS STRACK REGIONWEIGHTS CORRECTWEIGHTS SINGLEASTROM RUNDISTRIBUTE RUNSCIENCEPAGES"

Each mode populates the SCIENCE, WEIGHTS and /somedir directories with new
files, deletes obsolete ones, etc. It would go much too far to describe
these processes in detail here. To get familiar with the whole procedure
you are advised to run the following commands, one after the other, and to
inspect the SCIENCE etc. directories to see what is created (a small shell
loop automating this step-by-step approach is sketched below):

- ./doall_run_CFHTLS_single.sh .. -f r -r run_05AQ02_A1835 -m "PREPARE"
- ./doall_run_CFHTLS_single.sh .. -f r -r run_05AQ02_A1835 -m "SCIENCE"
- ./doall_run_CFHTLS_single.sh .. -f r -r run_05AQ02_A1835 -m "SKYSUB"
- ./doall_run_CFHTLS_single.sh .. -f r -r run_05AQ02_A1835 -m "MOSAICS"
- ./doall_run_CFHTLS_single.sh .. -f r -r run_05AQ02_A1835 -m "GLOBALWEIGHTS"
- ./doall_run_CFHTLS_single.sh .. -f r -r run_05AQ02_A1835 -m "TESTREG WEIGHTS"
- ./doall_run_CFHTLS_single.sh .. -f r -r run_05AQ02_A1835 -m "TESTREG STRACK"
- ./doall_run_CFHTLS_single.sh .. -f r -r run_05AQ02_A1835 -m "REGIONWEIGHTS"
- ./doall_run_CFHTLS_single.sh .. -f r -r run_05AQ02_A1835 -m "CORRECTWEIGHTS"
- ./doall_run_CFHTLS_single.sh .. -f r -r run_05AQ02_A1835 -m "SINGLEASTROM"
- ./doall_run_CFHTLS_single.sh .. -f r -r run_05AQ02_A1835 -m "RUNDISTRIBUTE"

The last step populates the directories

  /somedir/CLUSTERS/A1835/r
  /somedir/CLUSTERS/A1835/WEIGHTS_r

In these directories the SET 'A1835 r-band' will be SET processed later
on. This very last step also needs a properly set up script
'link_sets_CFHTLS_para.sh'.
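Here is the shell loop mentioned above. It is only a convenience sketch:
the options are exactly those of the basic call shown earlier (adjust the
paths to your setup), and the loop simply pauses after each mode so you
can inspect the SCIENCE and WEIGHTS directories before continuing.

  # run the RUN-processing modes one after the other for run_05AQ02_A1835,
  # filter 'r', and pause after each mode for a manual inspection
  for MODE in "PREPARE" "SCIENCE" "SKYSUB" "MOSAICS" "GLOBALWEIGHTS" \
              "TESTREG WEIGHTS" "TESTREG STRACK" "REGIONWEIGHTS" \
              "CORRECTWEIGHTS" "SINGLEASTROM" "RUNDISTRIBUTE"
  do
    ./doall_run_CFHTLS_single.sh -b /somedir -s CLUSTERS \
       -f r -r run_05AQ02_A1835 \
       -psf Y -reg /somedir/regs -set /somedir \
       -w /somedir/CLUSTERS/GLOBALWEIGHTS -www /somedir \
       -m "${MODE}"
    echo "mode '${MODE}' finished; inspect the directories, then press ENTER"
    read -r
  done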
The distribution from the RUN data to the SET directories is done on the
basis of the target coordinates configured in 'link_sets_CFHTLS_para.sh'
(see also the section 'DATA DIRECTORY SETUP' above)!

Repeat the RUN processing for the remaining RUNs of the A1835 cluster.
These are: run_05AQ03_A1835 (r-band), run_05AQ02_A1835 (g-band) and
run_05AQ03_A1835 (g-band).

It is difficult for me to give a more precise overview here of what you
should expect from each step. I only provide some examples that should be
sufficient to see whether things fail completely:

- The PREPARE step creates, within SCIENCE_? ('?' stands for a filter),
  one image per chip and exposure with a name of the form '....._??.fits';
  the '??' stands for a number between 1 and 36 (MEGAPRIME has 36 chips).
  The images are in integer format. If images do not pass a simple
  verification test, SCIENCE_? contains a subdirectory BADMODE with the
  bad images in it.
- The SCIENCE step creates, within SCIENCE_?, one image per chip and
  exposure with a name of the form '...._??C.fits'; the 'C' stands for CUT
  (image trimming). The images are in float format.
- The MOSAICS step creates a subdirectory '..../SCIENCE_?/BINNED' which
  contains binned FITS files, one for each 36-chip exposure.
- The GLOBALWEIGHTS step copies, for each chip, a file
  globalweight_??.fits to the .../run.../WEIGHTS directory. Furthermore,
  corresponding global flag files with names globalflag_??.fits are
  created.
- The WEIGHTS step (the same holds for REGIONWEIGHTS) creates, for each
  single-chip image, one weight and one flag image in the
  .../run.../WEIGHTS directory. Both products are 8-bit FITS images.
- The STRACK step should finally result in the following region files:

  - r-band, run_05AQ02_A1835:
      784535p_28.reg 784535p_8.reg  784536p_2.reg  784538p_13.reg 784538p_32.reg
      784535p_29.reg 784535p_9.reg  784536p_3.reg  784538p_22.reg 784538p_3.reg
      784535p_30.reg 784536p_1.reg  784536p_4.reg  784538p_23.reg 784538p_9.reg
      784535p_7.reg  784536p_28.reg 784538p_12.reg 784538p_2.reg
  - r-band, run_05AQ03_A1835:
      788575p_15.reg 788575p_16.reg 788576p_1.reg 788576p_2.reg 788576p_3.reg
  - g-band, run_05AQ02_A1835:
      784539p_10.reg 784539p_24.reg 784539p_32.reg 784539p_3.reg
      784539p_1.reg  784539p_2.reg  784539p_33.reg 784539p_4.reg
  - g-band, run_05AQ03_A1835: no region files!

- The SINGLEASTROM step creates subdirectories '.../SCIENCE_?/cat'.
  Therein you should find (amongst others):
  - one catalogue '...C.cat' per chip for each exposure
  - a subdirectory PSFcheck with a postscript file showing the PSF
    anisotropy structure for each exposure

Note that doall_run_CFHTLS_single.sh and most of the THELI scripts do not
react intelligently to errors! Most of them also have confusing output!

THE SET PROCESSING
==================
The SET processing concerns the final astrometric and photometric
calibration of targets/filters, the image co-addition and first quality
assessments. In the case of A1835 all SET processing will happen within
the directories:

  /somedir/CLUSTERS/A1835/r
  ................./A1835/g
  .
  ................./A1835/WEIGHTS_r
  ................./A1835/WEIGHTS_g

In this case we have exactly two SETs (A1835 r-band and A1835 g-band) of
one target (A1835). Similar to the RUN processing, the superscript
'doall_set_CFHTLS_single.sh' is available for the SET processing.
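Before going through the individual SET steps you may want to convince
yourself that the RUNDISTRIBUTE step of all four RUN/filter combinations
actually populated the SET directories listed above. The following check
is purely illustrative (plain shell, not a THELI command) and only assumes
that the distributed science and weight products are FITS files:

  # count the distributed FITS files per SET of A1835 (illustrative only)
  for FILTER in r g
  do
    SCIENCE=/somedir/CLUSTERS/A1835/${FILTER}
    WEIGHTS=/somedir/CLUSTERS/A1835/WEIGHTS_${FILTER}
    echo "SET A1835 ${FILTER}:"
    echo "  $(ls ${SCIENCE}/*.fits 2>/dev/null | wc -l) science frames in ${SCIENCE}"
    echo "  $(ls ${WEIGHTS}/*.fits 2>/dev/null | wc -l) weight/flag maps in ${WEIGHTS}"
  done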
- The first step is the astrometric and the final photometric calibration,
  which is done together on all SETs of a target (we can use all sources
  from the different filters to obtain an optimal astrometric solution):

    ./doall_set_CFHTLS_single.sh -b /somedir -s CLUSTERS -am SCAMP \
       -a 2MASS -fi A1835 -mf "r g" \
       -m "ASTROMPHOTOM"

  The arguments '-b', '-s' and '-m' are the same as for the
  doall_run_CFHTLS_single.sh script.

  - '-fi': target (or field) to process (here A1835)
  - '-mf': multiple filters to process; note that you need to give
    multiple filters within quotes. The ASTROMPHOTOM mode is the only one
    that supports multiple filters! For all other modes you have to give
    '-f' and a single filter, as for doall_run_CFHTLS_single.sh.
  - '-am': method used for the astrometric calibration (here SCAMP). We
    now exclusively use 'scamp' for this purpose. The script still
    contains a lot of (obsolete) code for astrometry with ASTROMETRIX.
  - '-a': astrometric standard star catalogue to use. This should be
    SDSS-R6 if the target has Sloan overlap (as of 22/09/2009 'scamp' does
    not yet support Sloan Release 7) or 2MASS in all other cases. Stay
    away from the USNO catalogues! Although they represent the densest
    all-sky catalogues, their astrometric accuracy turned out to be poor.

  The step runs Emmanuel Bertin's scamp and several other THELI programs.
  You should check the following:

  - In the SET directory of the first filter you provide (here
    .../CLUSTERS/A1835/r) scamp creates the directories:

      astrom_photom_scamp_2MASS_astrom/cat
                                      /headers
                                      /plots
      astrom_photom_scamp_2MASS_photom/cat
                                      /headers
                                      /plots
      headers_scamp_2MASS_astrom
      headers_scamp_2MASS_photom
      headers_scamp_2MASS

    We run scamp twice because it turned out that different parameter sets
    are optimal for the astrometric and for the relative photometric
    calibration. You absolutely need to check that the plots in the plots
    directories look fine. If this is the case, you can be confident that
    the scamp run was successful. The final scamp headers, needed for the
    image co-addition, are located in headers_scamp_2MASS. They are an
    appropriate combination of the results in headers_scamp_2MASS_astrom
    and headers_scamp_2MASS_photom. You find corresponding header
    directories/files in the directories of the other filters of your
    target.

  - Furthermore, you should find files with the name chips_phot.cat5 in
    the cat directories (e.g. ../CLUSTERS/A1835/r/cat). If those are
    present, the absolute photometric calibration took place. To see
    whether it is reasonable, call in the corresponding cat directories:

      ldactoasc -s -i chips_phot.cat5 -t ABSPHOTOM

    It is important that the quantity COADDZPSDEV is small. It gives the
    dispersion of the zeropoints of the individual frames. It is beyond
    the scope of this HOWTO to discuss in more detail what you can do to
    improve this calibration - just ask in case.

- The second step of the SET processing is the generation of some check
  plots. This needs to be done for each filter individually:

    ./doall_set_CFHTLS_single.sh -b /somedir -s CLUSTERS -am SCAMP -a 2MASS \
       -fi A1835 -f "r" -m "SETCHECKPLOT"

  and correspondingly for 'g'. Note that you should ALWAYS pass the '-am'
  and '-a' options, even if your mode does not seem to need them. They are
  used to update image headers appropriately, to label plots, etc. The
  plots of this command go to

    ../CLUSTERS/A1835/r/plots

  Have a look at them.
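  Since the check-plot step has to be repeated for every filter, the two
  calls above can be wrapped in a small loop. This is just a convenience
  sketch using exactly the options given above:

    # second SET step (check plots) for both filters of A1835
    for FILTER in r g
    do
      ./doall_set_CFHTLS_single.sh -b /somedir -s CLUSTERS -am SCAMP -a 2MASS \
         -fi A1835 -f "${FILTER}" -m "SETCHECKPLOT"
    done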
- Finally, perform the image co-addition and auxiliary operations - for
  each filter separately:

    ./doall_set_CFHTLS_single.sh -b /somedir -s CLUSTERS -am SCAMP -a 2MASS \
       -cc "(SEEING<1.5)AND(EXPTIME>100);" -cm WEIGHTED -www /somedir -ci V2.0A \
       -fi A1835 -f "r" -m "COADD POSTCOADD COADDHEADERUPDATE CUTCOADD SETPAGES"

  - '-cc': co-addition condition; you can give several conditions on the
    data to exclude certain exposures from the co-addition process. In our
    case we only include images with a seeing of less than 1.5 arcsec and
    with an exposure time of 100s or more. The syntax of the condition
    follows that of the ldacfilter command within the LDACtools.
  - '-ci': co-addition identifier; this is the version number the co-added
    images will get.
  - '-cm': co-addition method; this should always be WEIGHTED (weighted
    mean co-addition).
  - '-www': base directory for the SET WWW pages (see
    doall_run_CFHTLS_single.sh)

  The given modes roughly do the following:

  - COADD: perform the image co-addition
  - POSTCOADD: extract first object catalogues and create galaxy
    number-count and size-magnitude plots
  - COADDHEADERUPDATE: update the image header of the co-added image
    (magnitude zeropoint, seeing, etc.)
  - CUTCOADD: cut the co-added image to a size of 21000x21000 pixels. This
    is essential to extract sources from a multi-colour observation with
    the SExtractor double-image mode etc.
  - SETPAGES: create WWW pages for the final co-added images (do not run
    this mode at the moment!)

  The essential products of this step are the following (given for the 'r'
  filter; similar files are created for the g-band):

  - A subdirectory .../A1835/?/coadd_V2.0A is created (V2.0A is the
    co-addition identifier you supplied with the '-ci' option). Therein
    you should find the images:

    - A1835_r.V2.0A.swarp.fits: co-added science image
    - A1835_r.V2.0A.swarp.weight.fits: co-added weight image
    - A1835_r.V2.0A.swarp.flag.fits: co-added flag image; it contains a
      '1' where the weight is zero and a '0' otherwise, i.e. it marks all
      pixels to which none of the input images contributed
    - A1835_r.V2.0A.swarp.sum.fits: it contains in each pixel the number
      of input images that contributed to that pixel; it DOES NOT take the
      weights into account for this purpose.

    In addition, you find images with names such as
    A1835_r.V2.0A.swarp.cut.fits representing the cut versions. When these
    images are fine we typically delete the uncut ones.

  - A subdirectory .../A1835/postcoadd with the subdirectories plots and
    cats. Have a look at the plots!
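  The co-addition step also has to be run once per filter. As with the
  check plots, the two calls can be wrapped in a small loop; this is only
  a convenience sketch reusing the options given above (SETPAGES is left
  out here because the text above asks you not to run that mode at the
  moment):

    # final SET step: co-addition and auxiliary modes for both filters
    for FILTER in r g
    do
      ./doall_set_CFHTLS_single.sh -b /somedir -s CLUSTERS -am SCAMP -a 2MASS \
         -cc "(SEEING<1.5)AND(EXPTIME>100);" -cm WEIGHTED -www /somedir -ci V2.0A \
         -fi A1835 -f "${FILTER}" \
         -m "COADD POSTCOADD COADDHEADERUPDATE CUTCOADD"
    done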