Understanding pyqz

For a set of nebular emission line fluxes and errors, pyqz measures the associated value of the oxygen abundance 12+log(O/H) and ionization parameters log(Q), given a set of MAPPINGS simulations of HII regions. The code uses flat(-ish) emission line diagnostic grids to disentangle and interpolate the values of log(Q) and 12+log(O/H).

As pyqz wraps around MAPPINGS simulations, it can provide estimates of the total abundance (Tot[O]+12) or the gas-phase abundance of the HII region (gas[O]+12). In the reminder of this document, whenever the former is used, it is understood that it is replaceable by the latter.

If you have read this doc from the start, you probably have pyqz installed on your machine by now, and managed to run the basic examples described in Running pyqz I. But before you move on to process your own data, there are a few critical elements that you cannot ignore any longer.

Warning

We’re serious here - read this page or be doomed !

A note on the pyqz syntax

The pyqz module is intimately linked to the MAPPINGS code. While both are stand alone and distinct programs, pyqz was designed to employ the same notation conventions than that of the MAPPINGS code for clarity, both from a user and programming perspective.

These conventions, designed to maximise clarity while minimizing the overall character counts, are as follows:

  • the ionization parameter is LogQ
  • the oxygen abundance is Tot[O]+12 (total) or gas[O]+12 (for the gas-phase)
  • the Balmer lines from Hydrogen are Ha, Hb, etc …
  • the main forbidden lines are marked as [OIII], [NII], [SII], [OI], etc …
  • other strong lines are tagged with their wavelength, i.e. 4363, 3726, 3729, etc …
  • for the usual strong line doublets, when the doublet line fluxes are considered together ( i.e. [OIII]5007 + 4959), a + is appended to the said emission line, e.g. [OIII]+. By convention, the single line is always the strongest within the doublet. In short, [OIII] corresponds to [OIII]5007, [OIII]+ corresponds to [OIII]5007+4959, [NII] corresponds to [NII]6584, [NII]+ corresponds to [NII]6584+6548, etc …

This syntax must be followed carefully when using pyqz, or errors will arise.

The spirit of pyqz

The pyqz module is composed of a core function: pyqz.interp_qz. This function is responsible for interpolating the MAPPINGS V grid of simulations of HII regions (using scipy.interpolate.griddata) and returns the corresponding value of z or q for a given pair of line ratios. This function is basic, in that it does not propagate errors on its own. You feed it a pair of line ratio, it returns LogQ, Tot[O]+12 or gas[O]+12, and that’s it.

The function pyqz.get_global_qz is a wrapper around pyqz.interp_qz. It is designed as a top interaction layer for the pyqz module, and can propagate errors or upper-limits on the line flux measurements. You feed it your measured line fluxes and associated errors, and it returns all the LogQ and Tot[O]+12 or gas[O]+12 estimates and associated errors.

Yep, that’s right: estimateS. What are these ?

Direct estimates

pyqz uses a well defined set of line ratio diagnostic grids (the list of which can be seen using pyqz.diagnostics.keys()) to interpolate LogQ and Tot[O]+12. Given a set of line fluxes, pyqz can therefore compute 1 estimate of LogQ and Tot[O]+12 per diagnostic diagram chosen by the user, e.g. [NII]/[SII]+;[OIII]/[SII]+. These single direct estimates (labelled with |LogQ and |Tot[O]+12 for each diagnostic diagram, e.g. [NII]/[SII]+;[OIII]/[SII]+|LogQ) are the most straightforward ones computed by pyqz.

Of course, because all line ratio diagnostic grids are constructed from the same set of MAPPINGS simulations, all these individual direct estimates ought to be consistent, so that computing their mean value is a sensible thing to do. These global direct estimates are labelled <LogQ>, <Tot[O]+12>, etc. and the associated standard deviations are labelled std(LogQ), std(Tot[O]+12), etc.

KDE estimates

As we do not live in a perfect world, some errors are usually associated with the measurement of line fluxes (sigh!). The direct estimates do not take any errors into account - the KDE estimates (KDE = Kernel Density Estimation) do.

The idea is as follows. First, a set of srs (where srs=400 is the default) random flux values (for each emission line) sampling the probability density function of each measurement is generated. Each of these srs pseudo-sets of line fluxes are fed through pyqz.interp_qz(), which returns srs random estimates of LogQ and Tot[O]+12. pyqz then uses a Kernel Density Estimation tool to reconstruct

  1. the probability density function (PDF) in the LogQ and Tot[O]+12 plane for every single diagnostic grid selected by the user, and
  2. the full probability density function in the LogQ and Tot[O]+12 plane resulting
from the combination of all srs estimates for all chosen diagnostic grids.

Python users have the ability to pickle these (individual and global) reconstructed PDFs for external use (via the KDE_save_PDFs keyword), e.g. to draw some diagnostics plots later on.

From the reconstructed probability density functions, pyqz computes the 0.61% (i.e. the \(1-{\sigma}\) contour for a log normal distribution) level contour in the LogQ vs Tot[O]+12 plane, with respect to the peak. pyqz subsequently returns as an (individual or global) KDE estimate the mean of the 0.61% contour and its associated half spatial extent along the LogQ and Tot[O]+12 directions.

These single KDE estimates are referred to (accordingly) using |LogQ{KDE} and |Tot[O]+12{KDE} for the individual diagnostic grids (e.g. [NII]/[SII]+;[OIII]/[SII]+|LogQ{KDE} with an error err([NII]/[SII]+;[OIII]/[SII]+|LogQ{KDE})). The global KDE estimates are labelled as <LogQ{KDE}> and <Tot[O]+12>, with associated errors err(LogQ{KDE}) and err(Tot[O]+12{KDE}).

At this point, things are most likely more confused than ever, and one may be wondering …

What estimates of LogQ and Tot[O]+12 should one use ?

Unfortunately, there is no definite answer to this question. If all goes well (i.e. your measurements are reliable and have reasonable errors), the global KDE estimates (<LogQ{KDE}> and <Tot[O]+12>) are the values one should use: these combine all requested diagnostic grids estimates and observational errors down to one number.

But many things can go wrong: one (or more) of your line fluxes might be unknowingly off, or perhaps the choice of MAPPINGS simulations is not quite appropriate for the HII regions one may be working with (in terms of pressure, abundances, structure, depletion, etc.), or perhaps real HII regions may simply not behave quite like MAPPINGS is predicting (sigh!).

In all those cases, one must use extreme caution with the global KDE estimates. A lot of information lies in the individual estimates of LogQ and Tot[O]+12, and especially in bad cases.

So, how does one identify the good cases from the bad cases ?

Comparing the averaged direct estimates (e.g. <LogQ>) with the global KDE estimates (e.g. <LogQ{KDE}>) is a good way to spot problem. For each set of line ratios fed to pyqz.get_global_qz(), the code checks how similar those estimates are, and issues a flag if they are not. The possible flag values are as follows:

  • 9: the PDF is multipeaked. This indicates a likely mismatch between some of the

    diagnostic grids in their estimates of LogQ and Tot[O]+12.

  • 8: the observed set of line fluxes is located outside the valid region of one or

    more of the chosen diagnostic grids.

  • -1: no KDE was computed (either srs was set to 0, or a line flux errors was

    set to 0).

  • 1 to 4: these flags are raised when the averaged direct estimates are offset by

    more than flag_level times their standard deviations, e.g.:

    • 1 \({\leftrightarrow}\) \({|}\) <LogQ> - <LogQ{KDE}> \({|}\) \({<}\) std(LogQ) \({\cdot}\) flag_level
    • 2 \({\leftrightarrow}\) \({|}\) <LogQ> - <LogQ{KDE}> \({|}\) \({<}\) err(LogQ{KDE}) \({\cdot}\) flag_level
    • 3 \({\leftrightarrow}\) \({|}\) <Tot[O]+12> - <Tot[O]+12{KDE}> \({|}\) \({<}\) std(Tot[O]+12) \({\cdot}\) flag_level
    • 4 \({\leftrightarrow}\) \({|}\) <Tot[O]+12> - <Tot[O]+12{KDE}> \({|}\) \({<}\) err(Tot[O]+12{KDE}) \({\cdot}\) flag_level

Looking at the flags can be helpful in identifying potentially problematic sets of line fluxes and (maybe?) the cause. Is one diagnostic grid estimates consistently off ? Then maybe some errors in one of the associated line ratio measurements is not properly accounted for.

In the end, it remains to the user to decide which estimate(s) to use. The final choice will significantly depend on the intended usage, the importance given to the LogQ and Tot[O]+12 estimates in a subsequent analysis, and the ability to construct a precise model of the said HII region in the first place.

It cannot be stressed enough that choosing appropriate HII regions parameters (in terms of pressure, spatial structure, abundances, etc.) for the MAPPINGS simulations can and will influence the final estimates of “LogQ“ and “Tot[O]+12“, both single and global ones.

If you are using pyqz, chances are that you do not possess enough information to define these elements with certainty, and simply use the default diagnostic grids provided. This is fine. But in case of estimates mismatch, one must then keep this fact in mind.