# Isoplane Reduction Theory and Strategy (End-to-End)

This document explains what is considered at each reduction stage, why each step exists, and what mathematical model is used in the current `kspecdr` implementation for Isoplane data.

It complements the procedure-oriented guide in [Isoplane Data Reduction Workflow](../tutorials/isoplane-end-to-end.md).

## 1. End-to-End View

```{mermaid}
flowchart TD
    A["Raw Isoplane FITS"] --> B["Header Conversion + Orientation Normalization"]
    B --> C["IM Preprocessing (Bias/Dark/Variance/Masking)"]
    C --> D["Calibration Frame Combination"]
    D --> E["Fiber Flat IM -> Tramline Map (TLM)"]
    E --> F["Arc IM -> Extraction (EX)"]
    F --> G["Arc Wavelength Calibration -> RED (WAVELA, SHIFTS)"]
    E --> H["Science IM -> Extraction (EX)"]
    G --> I["Science Rebin/Scrunch using Arc Solution"]
    H --> I
    I --> J["Final Calibrated Science Product"]
```

## 2. Mathematical Notation

We use the following symbols throughout:

$$
I(x,y): \text{image intensity at spatial pixel }x\text{ and spectral pixel }y
$$

$$
V(x,y): \text{variance of }I(x,y)
$$

$$
g: \text{detector gain (e}^- / \text{ADU}), \quad \sigma_r: \text{read noise (e}^-)
$$

$$
\lambda(y): \text{wavelength as a function of spectral pixel}
$$

$$
T_f(y): \text{tramline center of fiber }f\text{ at spectral pixel }y
$$

## 3. Stage 1: Isoplane Header Conversion and Normalization

### 3.1 What is considered

- Raw Isoplane metadata is instrument-specific and not fully compatible with downstream modules.
- Downstream logic requires standardized keys such as `INSTRUME`, `RO_GAIN`, `RO_NOISE`, `LAMBDAC`, `DISPERS`, and `DISPAXIS`.
- Orientation inconsistencies must be corrected before trace detection and extraction.

### 3.2 Strategy

- Sanitize invalid FITS cards.
- Map readout mode keywords to calibrated gain/noise values.
- Convert timing to standard seconds and UTC fields.
- Parse grating settings and compute dispersion estimate.
- Flip image axes when orientation keywords indicate inversion.
- Attach a valid `FIBRES` table (dummy or user-supplied).

### 3.3 Core equations

Exposure time:

$$
t_{\mathrm{exp}} = \frac{t_{\mathrm{ms}}}{1000}
$$

Reciprocal linear dispersion model:

$$
\mathrm{RLD}\ [\text{\AA/mm}] = 23.0 \times \frac{1200}{L_{\mathrm{mm}}}
$$

Pixel-size conversion:

$$
p_{\mathrm{mm}} = p_{\mu \mathrm{m}} \times 10^{-3}
$$

Pixel dispersion:

$$
\mathrm{DISPERS}\ [\text{\AA/pixel}] = \mathrm{RLD} \times p_{\mathrm{mm}}
$$

Central wavelength:

$$
\lambda_c\ [\text{\AA}] = 10 \times \lambda_c\ [\text{nm}]
$$

## 4. Stage 2: IM Preprocessing (`make_im`)

### 4.1 What is considered

- Hot/bad/saturated pixels can destabilize trace and extraction.
- Bias and dark are additive detector/system backgrounds.
- Variance must be explicit for statistically meaningful extraction/calibration.
- Flat-field and cosmic-ray operations should be optional and controlled.

### 4.2 Strategy

- Apply bad-pixel masks and mark saturated regions.
- Subtract bias using either:
  - scalar median level, or
  - full 2D master bias.
- Subtract dark after exposure-time scaling.
- Initialize or update variance.
- Optionally divide by long-slit flat and remove cosmic rays.

### 4.3 Equations

Bias subtraction:

$$
I_b(x,y) = I_{\mathrm{raw}}(x,y) - B(x,y)
$$

Dark scaling and subtraction:

$$
s_d = \frac{t_{\mathrm{obj}}}{t_{\mathrm{dark}}}, \qquad
I_{bd}(x,y) = I_b(x,y) - s_d D(x,y)
$$

Initial variance model:

$$
V_0(x,y) = \sigma_r^2 + \frac{\max(I_{bd}(x,y), 0)}{g}
$$

Flat-field correction (`F` = flat image):

$$
I'(x,y) = \frac{I_{bd}(x,y)}{F(x,y)}
$$

Implemented variance update for flat division:

$$
V'(x,y) = \frac{V_0(x,y)}{F(x,y)^2}
          + \frac{I'(x,y)^2\,V_F(x,y)}{F(x,y)^2}
$$

### 4.4 Flow

```{mermaid}
flowchart TD
    A["Converted FITS"] --> B["Create IM (float image + VARIANCE HDU)"]
    B --> C["Apply bad/saturated pixel masking"]
    C --> D["Bias correction"]
    D --> E["Dark correction"]
    E --> F["Variance initialization/update"]
    F --> G["Optional: LFLAT division"]
    G --> H["Optional: CR cleaning (LACOSMIC)"]
    H --> I["Output *_im.fits"]
```

## 5. Stage 3: Calibration-Frame Combination (`combine_image`, `reduce_*`)

### 5.1 What is considered

- Multiple exposures reduce random noise and reject outliers.
- Different frame classes require different combination policies.
- Darks may need separate masters by exposure time.

### 5.2 Strategy

- Build stack over input IM frames.
- Optionally normalize level per frame (central median) before combining.
- Combine using `MEAN`, `MEDIAN`, or `SIGMA_CLIP`.
- Propagate or estimate variance for combined frame.

### 5.3 Equations

Mean combine:

$$
I_{\mathrm{mean}} = \frac{1}{N}\sum_{i=1}^{N} I_i
$$

$$
V_{\mathrm{mean}} = \frac{1}{N^2}\sum_{i=1}^{N} V_i
$$

Median combine variance approximation:

$$
V_{\mathrm{median}} \approx \frac{\pi}{2}\,V_{\mathrm{mean}}
$$

Sigma-clipped variance on surviving samples:

$$
V_{\mathrm{clip}} = \frac{\sum V_i^{(\mathrm{valid})}}{N_{\mathrm{valid}}^2}
$$

## 6. Stage 4: Tramline Map Generation (`make_tlm`)

### 6.1 What is considered

- Fiber traces are curved and may shift with optical distortion.
- Peak finding must be robust against noise and local background variation.
- Traces must be linked consistently across sampled columns.

### 6.2 Strategy

- Sample image in spectral direction (`STEP=50` in current code).
- Build spatial profile in each sampled column window.
- Detect candidate peaks (Isoplane default uses wavelet-based detection).
- Link peaks with multi-target tracking (Hungarian assignment with gating).
- Fit polynomial trace model and interpolate/extrapolate missing fibers.
- Write TLM and predicted wavelength extension.

### 6.3 Peak detection mathematics

The wavelet threshold uses a robust scale:

$$
\sigma_{\mathrm{robust}} = 1.4826 \cdot \mathrm{MAD}(R)
$$

$$
z_{\mathrm{tol}} = k_{\mathrm{MAD}} \cdot \sigma_{\mathrm{robust}}
$$

where \(R\) is the wavelet response and \(k_{\mathrm{MAD}}\) is a tunable multiplier.

Sub-pixel peak center from zero crossings:

$$
x_{\mathrm{peak}} = \frac{x_{\mathrm{L0}} + x_{\mathrm{R0}}}{2}
$$

### 6.4 Tracking mathematics

Assignment cost between existing track and current peak:

$$
C = \sqrt{(\Delta x)^2 + (\Delta s)^2}
$$

with gating constraints:

$$
|\Delta x| \le d_{\max}, \qquad \Delta s \le s_{\max}
$$

### 6.5 Predicted wavelength in TLM

For spectral pixel center \(y\):

$$
\lambda_{\mathrm{pred}}(y) = \lambda_c + d_\lambda\,(y-y_{\mathrm{mid}})
$$

where \(d_\lambda = \mathrm{DISPERS}\), \(y_{\mathrm{mid}} = 0.5\,N_{\mathrm{spec}}\).

### 6.6 Flow

```{mermaid}
flowchart TD
    A["Input flat IM"] --> B["Column sampling + spatial profile"]
    B --> C["Wavelet peak detection"]
    C --> D["Track linking (MTT + Hungarian)"]
    D --> E["Polynomial trace fit per track"]
    E --> F["Trace-to-fiber matching"]
    F --> G["Interpolate/extrapolate missing fibers"]
    G --> H["Write TLM + predicted WAVELA"]
```

## 7. Stage 5: Extraction (`make_ex`, SUM path)

### 7.1 What is considered

- Extraction aperture must follow tramline center for each fiber and spectral pixel.
- Partial-pixel overlap is needed for sub-pixel center locations.
- Invalid pixels and disabled fiber classes should be handled explicitly.

### 7.2 Strategy

- For each \((y,f)\), define aperture centered at \(T_f(y)\) with width \(W\).
- Integrate all touched spatial pixels by overlap fraction.
- Accumulate flux and variance over aperture.
- Zero fibers of types `F`, `N`, `U`.

### 7.3 Equations

Aperture bounds:

$$
x_{\mathrm{low}} = T_f(y) - \frac{W}{2}, \qquad
x_{\mathrm{high}} = T_f(y) + \frac{W}{2}
$$

Flux (implemented):

$$
F_f(y) = \sum_{x \in \mathcal{A}(y,f)} I(x,y)\,w_x
$$

Variance (implemented):

$$
V_f(y) = \sum_{x \in \mathcal{A}(y,f)} V(x,y)\,w_x
$$

where \(w_x \in [0,1]\) is the overlap fraction for pixel \(x\).

## 8. Stage 6: Arc Wavelength Calibration (`reduce_arc`, `reduce_arcs`)

### 8.1 What is considered

- The predicted wavelength from header is approximate.
- Arc line matching needs robust handling of blends, saturation, and outliers.
- Fiber-to-fiber distortion requires landmark synchronization.

### 8.2 Strategy

- Select a robust reference fiber near detector center.
- Build a high-S/N template by fiber synchronization and averaging.
- Read lamp line table (`*.arc`) and mask problematic blends.
- Generate synthetic arc model and estimate shift field by cross-correlation.
- Refine line centroids and fit global polynomial wavelength model robustly.
- Propagate calibration to all fibers and write `WAVELA`/`SHIFTS`.

### 8.3 Key equations

Line-width estimate from wavelet zero-crossing gap \(w\):

$$
\sigma_{\mathrm{line}} = \sqrt{\left(\frac{w}{2}\right)^2 - a^2}
$$

Resonance scale:

$$
a_{\mathrm{res}} = \sqrt{5}\,\sigma_{\mathrm{line}}
$$

Adaptive wavelet threshold:

$$
z_{\mathrm{tol}} =
\left(3\,\sigma_{\mathrm{noise}}\right)
\frac{\max(\mathrm{CWT}_{a_{\mathrm{res}}})}{\max(S)}
$$

Synthetic arc model:

$$
M(\lambda)=\sum_i A_i
\exp\!\left(-\frac{(\lambda-\mu_i)^2}{2\sigma_\lambda^2}\right)
$$

Blend criterion:

$$
\Delta\lambda < 3\,\sigma_\lambda
$$

Robust residual rejection:

$$
\left|r_i-\mathrm{median}(r)\right| \ge 3\cdot \mathrm{MAD}(r)
\;\Rightarrow\; \text{outlier}
$$

### 8.4 Flow

```{mermaid}
flowchart TD
    A["Arc EX + predicted WAVELA + line list"] --> B["Reference fiber selection"]
    B --> C["Wavelet landmark detection per fiber"]
    C --> D["Landmark tracking across fibers"]
    D --> E["Fiber synchronization + template construction"]
    E --> F["Synthetic model from lamp table"]
    F --> G["Cross-correlation shift path estimation"]
    G --> H["Peak refinement + robust polynomial fit"]
    H --> I["Propagate solution to all fibers"]
    I --> J["Write RED with calibrated WAVELA and SHIFTS"]
```

## 9. Stage 7: Science Path in Current Implementation

### 9.1 What is considered

- Full `reduce_object` wrapper is not yet complete.
- Reliable current approach is staged processing using implemented modules.

### 9.2 Strategy

1. Convert raw science frames.
2. Run `make_im`.
3. Run `make_ex` with the same TLM used for arcs/flats.
4. Rebin/scrunch science EX using calibrated arc RED solution.

### 9.3 Rebinning equation (current interpolation approach)

For each fiber \(f\):

$$
S^{\mathrm{out}}_f(\lambda_j) =
\mathcal{I}\!\left(S^{\mathrm{in}}_f(\lambda), \lambda_j\right)
$$

where \(\mathcal{I}\) is linear interpolation on wavelength coordinates.

### 9.4 Flow

```{mermaid}
flowchart TD
    A["Science converted FITS"] --> B["Science IM"]
    B --> C["Science EX (SUM extraction)"]
    C --> D["Apply arc RED wavelength solution"]
    D --> E["Scrunched/calibrated science output"]
```

## 10. Cross-Stage Quality Control Logic

Recommended checks:

- Conversion:
  - required header keys and orientation validity.
  - `FIBRES` extension count.
- IM:
  - expected bias/dark subtraction levels.
  - saturation fraction and variance scale sanity.
- TLM:
  - trace count and continuity vs expected fibers.
- Extraction:
  - NaN/bad fraction and fiber consistency.
- Arc calibration:
  - matched-line count, residual median/MAD/RMS.
  - wavelength monotonicity and smoothness.
- Science:
  - skyline placement and repeatability across frames/fibers.

## 11. Assumptions and Limits

Current assumptions:

- Read-noise + Poisson-like variance model is adequate for commissioning.
- Isoplane matching can be approximated by ordered 1-to-1 trace mapping.
- SUM extraction is acceptable for current use.
- Interpolation-based scrunch is acceptable for current goals.

Current limits:

- No production optimal extraction path yet.
- `reduce_object` full chain is not complete.
- Some heuristics (thresholds, blend masks, width criteria) may require setup-specific tuning.

## 12. Practical Interpretation

The current architecture is strongest as a transparent staged reduction framework:

- each intermediate product can be inspected,
- statistical logic is explicit,
- and the mathematical structure is already suitable for incremental upgrades (optimal extraction, complete object wrapper, stricter flux-conserving rebinning).