Frequently asked questions

0. Please see this paper: Leito, I., Helm, I. Metrology in chemistry: some questions and answers. J. Chem. Metrol. 2020, 14:2, 83-87 for a number of questions and answers relevant to practical chemical analysis situations.


1. How many decimal places should we leave after the decimal point when presenting results?

The number of decimals after the decimal point depends on the order of magnitude of the result and can be very different. It is more appropriate to ask how many significant digits should be in the uncertainty estimate. This is explained in the video in Section 4.5. The number of decimals according to that video is OK for the results, unless there are specific instructions on how many decimals should be presented. When presenting a result together with its uncertainty, the number of decimals in the result and in the uncertainty must be the same. For example, if the expanded uncertainty rounds to 0.023 g, then the result should also be given with three decimal places, e.g. (12.345 ± 0.023) g.


2. If we need to find the standard deviation of those within-lab reproducibility measurements, do we necessarily have to use the pooled one? Can we not take the simple standard deviation, calculated by the usual standard deviation formula?

The within-lab reproducibility standard deviation sRW characterises how well the measurement procedure can reproduce the same results on different days with the same sample. If the sample is not the same (as in this self-test) and you just calculate the standard deviation of the results, then the obtained standard deviation includes both the reproducibility of the procedure and the difference between the samples. In this self-test the difference between the samples is much larger than the within-lab reproducibility. So, if you simply calculate the standard deviation over all the results, you will not obtain the within-lab reproducibility but rather the variability of analyte concentrations in the samples, with a (small) within-lab reproducibility component added.
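As an illustration, here is a minimal Python sketch (the data values are made up) comparing the plain standard deviation over all results with the pooled standard deviation calculated per sample:

    import numpy as np

    # Hypothetical replicate results for three different samples (made-up numbers).
    samples = [
        np.array([4.10, 4.15, 4.08, 4.12]),
        np.array([7.90, 7.84, 7.95]),
        np.array([2.55, 2.60, 2.52, 2.58, 2.54]),
    ]

    # Plain standard deviation over all results mixes the sample-to-sample
    # differences with the within-lab reproducibility.
    s_plain = np.std(np.concatenate(samples), ddof=1)

    # Pooled standard deviation: combine the per-sample variances,
    # weighted by their degrees of freedom (n_i - 1).
    dfs = np.array([len(s) - 1 for s in samples])
    variances = np.array([np.var(s, ddof=1) for s in samples])
    s_pooled = np.sqrt(np.sum(dfs * variances) / np.sum(dfs))

    print(f"plain SD over all results: {s_plain:.3f}")        # large
    print(f"pooled SD (reproducibility only): {s_pooled:.3f}")  # small

With these numbers the plain standard deviation is dominated by the concentration differences between the samples, while the pooled standard deviation reflects only the scatter within each sample.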


3. In estimation of uncertainty via the modelling approach: when can we use the Kragten approach and when should we just use the combination of uncertainties?

In principle, you can always use the Kragten approach. However, if the relative uncertainties of the input quantities are large, and especially if such a quantity happens to be in the denominator, then the uncertainty found with the Kragten approach can differ from that found using equation 4.11. This is because the Kragten approach is an approximate approach.
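For readers who want to experiment, a minimal Python sketch of the Kragten logic (the model function and the numbers are made up for illustration):

    import numpy as np

    def kragten_uncertainty(f, values, uncertainties):
        # Kragten approach: shift each input by its standard uncertainty,
        # record the change in the result, combine the changes in quadrature.
        y0 = f(*values)
        contributions = []
        for i, u in enumerate(uncertainties):
            shifted = list(values)
            shifted[i] += u
            contributions.append(f(*shifted) - y0)
        return y0, np.sqrt(sum(c**2 for c in contributions))

    # Illustrative model: c = m * P / V (mass, purity, volume; made-up numbers).
    f = lambda m, P, V: m * P / V
    y, u = kragten_uncertainty(f, [0.1005, 0.999, 0.1000], [0.0002, 0.0006, 0.00008])
    print(f"result = {y:.4f}, u(result) = {u:.4f}")

Because the derivatives are approximated by finite differences of size u(xi), the result agrees with equation 4.11 only when the relative uncertainties are small.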


4. Exactly what is the human factor? I thought that it may be, for example, a person's psychological condition, personal experience and so on? These will definitely influence the measurement, but are they then taken into account?

The "human factor" is not a strict term. It collectively refers to different sources of uncertainty that are due to the person performing the analysis. These uncertainty sources can either cause random variation of the results or systematic shift (bias). In the table below are some examples, what uncertainty sources can be caused by the "human factor". In correct measurement uncertainty estimation the “human factor” will be automatically taken into account if the respective uncertainty sources are taken into account.

Uncertainty source | Type | Taken into account by

Variability of filling a volumetric flask to the mark, variability of filling the pipette to the mark | Random | Repeatability of filling the flask/pipetting

Systematically titrating until the indicator is very strongly coloured | Systematic (causes systematically higher titration results) | Uncertainty of the titration end-point determination

Systematically grinding the sample for a shorter time than should be done, leading to a less dispersed sample and lowered recovery | Systematic | Uncertainty due to sample preparation (uncertainty due to recovery)


5. Can we report V = (10.006 ± 0.016) mL at 95% confidence level with a coverage factor of 2?

In this course we use the conventional rounding rules for uncertainty. Therefore the uncertainty ±0.0154 mL is rounded to ±0.015 mL. Sometimes it is recommended to round uncertainties only upwards (leading in this case to ±0.016 mL). However, in the graded test quizzes please use the conventional rounding rules.


6. How can I attach my photo to my Moodle profile?

This is done from your profile in Moodle. Click on your name at the right of the status bar, then click "Profile", then "Edit profile".


7. In the case of a simple titration, if replicate titrations are carried out, then the repeatability contribution is omitted from the uncertainty of pipetting. Why do we ignore the repeatability effect in this case when calculating the result of the titration?

In this case we have results of repeated titrations. Their scatter is caused, among other effects, also by pipetting repeatability. I.e. one of the reasons why different amounts of titrant were consumed in replicate titrations is that the amount of pipetted acidic liquid slightly differed from titration to titration. For this reason, the repeatability of the consumed titrant volume automatically takes into account also the pipetting repeatability. If we also took it into account in the uncertainty of the pipetted volume, we would account for it twice.


8. Can systematic effects really count as uncertainty sources? The GUM says that the recognized systematic effects should be corrected for and the uncertainties of the resulting corrections should be taken into account.

Indeed, systematic effects (sources of bias) can often be reduced significantly by determining corrections and applying them. The corrections are never perfect and have uncertainties themselves, and these uncertainties are mostly caused by different random effects.

However, the fact that systematic effects influence measurement results automatically means that they cause uncertainty and are thus uncertainty sources. Furthermore, although the GUM (https://www.bipm.org/en/publications/guides/gum.html) says that known systematic effects should preferably be corrected for, in many cases – in particular in chemistry and especially at routine lab level – correcting for the systematic effects is either impossible to do reliably or is not practical, as it would make the measurement much more expensive. It is also often unclear whether a systematic effect exists at all – in this course we often speak about possible systematic effects. In conclusion, it is often more practical to include the possible systematic effects as additional uncertainty components, rather than try to correct for all of them. Probably the best practical guide on this issue is the Eurachem leaflet Treatment of observed bias (https://www.eurachem.org/index.php/publications/leaflets/bias-trt-01).


9. What is the difference between confidence interval and measurement uncertainty?

Measurement uncertainty defines a range (also called an interval) around the measured value within which the true value of the measurand lies with some predefined probability. This interval is called the coverage interval, and measurement uncertainty is (usually) its half-width. The coverage interval has to take into account all possible effects that cause uncertainty, i.e. both random and systematic effects.

A confidence interval is somewhat similar to a coverage interval. It typically refers to a statistical interval estimate. It expresses the level of confidence that the true value of a certain statistical parameter resides within the interval. A typical example is the confidence interval of a mean value found from a limited number of replicates, which is calculated from the standard deviation of the mean and the respective Student coefficient. The main difference is that we speak only of the mean value, not the true value, and only random effects are accounted for – i.e. all replicate measurements can be biased, but the confidence interval does not account for that in any way.
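As a minimal illustration of the confidence interval of a mean mentioned above (the replicate values are made up):

    import numpy as np
    from scipy import stats

    data = np.array([10.02, 10.05, 9.98, 10.01, 10.04])  # made-up replicates
    n = len(data)
    s_mean = data.std(ddof=1) / np.sqrt(n)   # standard deviation of the mean
    t = stats.t.ppf(0.975, df=n - 1)         # two-tailed 95% Student coefficient
    print(f"mean = {data.mean():.3f} +/- {t * s_mean:.3f} (95% CI, random effects only)")

Note that this interval characterises only the random scatter of the replicates; any bias common to all of them is invisible to it.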


10. What is the basis for the rule (explained in Section 4.5) that when the first significant digit of uncertainty is 1 .. 4 then it is presented with 2 significant digits and when it is 5 .. 9 then it is presented with one significant digit?

The rationale behind this rule is that the uncertainty should change by less than 10% relative when rounding it. If the uncertainty were e.g. 0.15 g, then rounding it to 0.2 g would change it by 33% relative. At the same time, if it is e.g. 0.55 g, then rounding it to 0.6 g changes it by only 9% relative.
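A minimal Python sketch of this rounding rule (an illustration, not an official course tool):

    import math

    def round_uncertainty(u):
        # 2 significant digits if the first significant digit is 1..4,
        # otherwise 1 significant digit.
        exponent = math.floor(math.log10(abs(u)))
        first_digit = int(abs(u) / 10**exponent)
        sig = 2 if first_digit <= 4 else 1
        return round(u, -exponent + sig - 1)

    print(round_uncertainty(0.0154))  # 0.015 (first digit 1 -> 2 significant digits)
    print(round_uncertainty(0.55))    # 0.6   (first digit 5 -> 1 significant digit)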


11. The true value lies within the uncertainty range with some probability. Therefore, is it OK if it is sometimes outside that range?

The situation that the true value is outside the uncertainty range is not impossible, but its probability is low. If it is strongly outside (i.e. far from the uncertainty range) or if it is outside for several measurement results obtained with the same method during a short period then the most probable reason is underestimated uncertainty.

Of course, we (almost) never know the true value, so instead of true values we usually operate with their highly reliable estimates, such as e.g. certified values of certified reference materials.


12. Why do we use two-tailed t values in calculating expanded uncertainty, not one-tailed values?

One-tailed t values would be justified if we knew for sure that the true value is smaller (or larger) than our measured value. This is usually not the case and thus it is not justified to use one-tailed values. One-tailed values are also smaller than two-tailed values (for example: ca 1.7 vs ca 2.0, in the case of a large number of degrees of freedom and 95% coverage probability), so the use of one-tailed t values would artificially decrease the uncertainty estimate, possibly leading to underestimated uncertainty.
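These values are easy to check, e.g. with SciPy (a minimal sketch):

    from scipy import stats

    df = 1000  # large number of degrees of freedom
    print(stats.t.ppf(0.95, df))    # one-tailed 95%: ca 1.65
    print(stats.t.ppf(0.975, df))   # two-tailed 95% (2.5% in each tail): ca 1.96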


13. When converting from rectangular or triangular distribution to the Normal distribution, where do the rules of dividing by SQRT(3) and SQRT(6) come from?

This is clearly beyond the scope of our course. The derivation can be found in specialised books, e.g.: Rein Laaneots, Olev Mathiesen, An Introduction to Metrology, Tallinn University of Technology Press, Tallinn, 2006.

Unfortunately I do not have a freely available source in English. There is one in Estonian: http://tera.chem.ut.ee/~ivo/metro/Room/II_vihik.pdf (the derivation is on pages 12-13). You will probably understand the mathematical equations, and you can try to translate the text with Google Translate.
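For the rectangular case the key step is short enough to sketch here (a standard result, stated for convenience): for a rectangular distribution of half-width a centred at x0, the density is f(x) = 1/(2a), so the variance is

    u^2 = \int_{x_0-a}^{x_0+a} (x - x_0)^2 \, \frac{1}{2a} \, dx = \frac{a^2}{3}
    \quad\Longrightarrow\quad u = \frac{a}{\sqrt{3}}

The analogous integral with the triangular density gives u = a/sqrt(6).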


14. Please explain the triangular and rectangular distribution functions with some real laboratory examples. What is the concept behind saying that this is a triangular and this is a rectangular distribution?

There are, in broad terms, two types of situations where the rectangular or triangular distribution is used:

--1-- When the quantity in question is indeed distributed according to one of these distributions. In chemistry this occurs first of all in the case of rounding of a digital reading. Example: if a thermometer shows 22 °C then, because of rounding, the value could be anywhere between 21.5 and 22.5 °C. Thus, the rounding uncertainty in this case is ±0.5 °C. If the rounding uncertainty is the dominant uncertainty component, then we could say that this temperature is distributed according to the rectangular distribution. 0.5 °C is half of the last digit of the digital reading. And this is a general rule: the rounding uncertainty of a digital reading is "± half of the last digit". In order to convert this uncertainty estimate to a standard uncertainty, it has to be divided by the square root of 3.
It can be shown that if two rectangularly distributed quantities (with equal uncertainty) are added or subtracted, then the resulting quantity is distributed according to the triangular distribution.
These were examples of situations where these distribution functions are “real”.

--2-- It is, however, much more common that these distribution functions are “assumed” or “postulated” (see Section 3.5). This need arises whenever you need to use some uncertainty estimate that is presented in the form “± X” and we have no knowledge of the underlying distribution of that quantity. In such a case we usually recommend assuming the rectangular distribution, as it is safer (lower probability of underestimating the uncertainty) than assuming the triangular distribution. Examples can be: calibration uncertainties of volumetric ware, uncertainties of purchased standard solution concentrations, uncertainty due to possible interferents (see Section 9.5), uncertainties of educated guesses/expert opinions, uncertainties of various systematic effects of measurement instruments, etc. The course materials contain quite a few examples on the use of these distributions, as well as self-tests. Please see Sections 3.5, 4.1 and 9.5 and self-tests 3.5, 9A and 9B.
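A quick Monte Carlo sketch (illustrative only) of converting "± a" with rectangular distribution to a standard uncertainty, and of the sum of two equal rectangular quantities becoming triangular:

    import numpy as np

    rng = np.random.default_rng(1)
    a = 0.5                                # half-width of the "± a" interval
    x = rng.uniform(-a, a, 1_000_000)      # rectangularly distributed quantity
    print(x.std(), a / np.sqrt(3))         # both ca 0.289, i.e. u = a/sqrt(3)

    y = rng.uniform(-a, a, 1_000_000)
    s = x + y                              # sum of two equal rectangular quantities
    print(s.std(), 2 * a / np.sqrt(6))     # both ca 0.408: triangular, u = a/sqrt(6)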


15. Does failing even one graded test quiz mean total failure (even though the rest of the quizzes are successful), so that the participant does not receive the digital certificate of completion?

Exactly, failing one graded test means failing the whole course and not getting the certificate of completion for this edition of the course. But of course, you are welcome to attend again next year.

Failing one test means that you have not acquired all the knowledge that you should have acquired from this course and the learning outcomes are not fulfilled. For an analogous example: should you get a driver's license when you know really well how to change gears, but steering would be an obstacle for you? Probably not – for successful driving you need to be able to handle all aspects of controlling the car. It is the same with uncertainty.

Therefore, we strongly suggest not wasting attempts. Before starting a new attempt, please try with the last dataset to reproduce the answer provided by the system and find your mistake.


16. Is it always preferable to use your own calibration data of volumetric instruments?

This depends on how high an accuracy of volumetric measurement you need.

If high accuracy of volumetric measurements is needed, then it is more correct for each person to calibrate the glassware themselves. Why? Because the uncertainty of the calibration consists to a large extent of the so-called “human factor”, so the calibration and the working manner should be the same. If several people use the same glassware, then everyone should calibrate it for themselves; e.g. person X should not use the pipette with the calibration data obtained by person Y.

If high accuracy of volumetric measurement is not needed (i.e. if in the method used much more uncertainty comes from sources other than volumetric measurement), then the uncertainties assigned to the glassware by manufacturers are usually sufficiently low.


17. Is "Incomplete sample matrix decomposition during digestion" a systematic or a random effect?

“Incomplete sample matrix decomposition during digestion” causes a systematic effect. Your result will always be somewhat lower than it should be, because you effectively lose some analyte.

But it is important to add that the “extent of incompleteness” will almost certainly vary from sample to sample. So, you always get a lower result (and there is a systematic effect), but sometimes it is “more lower”, sometimes “less lower”. This means that there is additionally a random effect “sitting on top” of the systematic effect. It is actually quite common that analyte losses by decomposition, incomplete extraction, etc., or analyte addition by contamination, and other similar systematic effects are accompanied by (often quite large) random effects.


18. How can we check the validity of our uncertainty estimates? Can we use sRW < uc or PT z-score < 2 as criteria?

The best check for the validity of your uncertainty estimate is to compare with an independent result obtained for the same sample. A very common way is, e.g., analyzing a CRM and then comparing your result with the reference value of the CRM, e.g. using the zeta score as described in Section 12. Also, if you participate in a PT, then comparing your result with the PT consensus value is useful. In the case of a PT the consensus values usually do not have uncertainty estimates. Then a simple, although not 100% rigorous, approach is to see whether the consensus value is within the k = 2 uncertainty range of your result.

Concerning the two criteria proposed by you: the mere fact that sRW < uc does not mean that uc has been correctly estimated – it can still be underestimated (or overestimated). And z-scores of PTs do not say anything about the validity of your uncertainty estimate. But of course z-scores are still useful for getting an idea of how similar your measurement result is to those of other laboratories.


19. I understand the different sources of uncertainty in the example well, but it strikes me that the standard deviation value is used to calculate the repeatability uncertainty and then taken into account again in calculating the calibration uncertainty. Is this not an overestimate of the uncertainty?

Repeatability indeed influences pipetting twice: once when the pipette is calibrated and a second time when the actual pipetting is done. So, indeed, it has to be accounted for in both cases.

However, as you could see, repeatability is taken into account differently in the two cases. In the case of the actual pipetting you take it into account as the standard deviation of an individual measurement; in the case of calibration – as the standard deviation of the mean. The more individual measurements have been done for a pipette calibration, the more reliable is the correction value. Therefore, the uncertainty of calibration is also smaller: we use the standard deviation of the mean for the calibration uncertainty, and the standard deviation of the mean decreases as the number of individual measurements grows. Moreover, the calibration uncertainty given by the manufacturer is usually much higher: in our example in Section 4.6 it is approximately 10 times higher than the one we have obtained.


20. I cannot figure out how the standard deviations of b1 and b0 are calculated. The solved Excel file in Section 9.7 has the same formula for s(b1) and s(b0) in all of the cells: "=LINEST(C7:C11,B7:B11,1,1)".

The calculation in the original file is carried out with the LINEST function. It is a peculiar function in that it returns a matrix (i.e. a small table of values), not a single value.

Its usage is quite well described in the Excel help. Let me give here just the main steps:

(1)    Mark the matrix area – two columns, three rows.

(2)    Immediately start typing the function, e.g. “=LINEST(C7:C11,B7:B11,1,1)” (without quotation marks). Instead of commas “,” you may need to use semicolons “;” as separators, depending on your language settings. C7:C11 stands for the analytical signals, B7:B11 stands for the concentrations. The two “1” arguments are for not forcing the intercept to zero and for giving the full set of data about the regression.

(3)    While typing, the typed text will go to just one of the six marked cells and this is OK. It does not matter which one.

(4)    Press CTRL-SHIFT-ENTER. (Not just ENTER!)

(5)    The sample file uncertainty_of_photometric_nh4_determination_kragten_initial.xls in Section 9.7 shows which parameters are in which cells.

Now you have an “automatic” function which is linked to the calibration data: every time you change something in the calibration data, all regression parameters are immediately recalculated.
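If you prefer to do this calculation outside Excel, here is an equivalent sketch in Python (the cell ranges are replaced by made-up example arrays):

    import numpy as np
    from scipy import stats

    conc = np.array([0.0, 0.5, 1.0, 1.5, 2.0])          # concentrations (B7:B11)
    signal = np.array([0.02, 0.21, 0.40, 0.59, 0.81])   # signals (C7:C11)

    res = stats.linregress(conc, signal)
    print(f"b1 = {res.slope:.4f} +/- {res.stderr:.4f}")                # slope, s(b1)
    print(f"b0 = {res.intercept:.4f} +/- {res.intercept_stderr:.4f}")  # intercept, s(b0)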


21. Does the density of most liquids decrease with temperature? From the context, would the "density" parameter refer to the amount (in mass) of the liquid or to its volume that affects the pipette's delivered volume?

Yes, in the case of most liquids the density decreases when the temperature increases. The main idea behind “uncertainty of volume due to temperature” is that in almost all practical cases in analytical chemistry liquid volume is defined as the volume at 20 °C. I.e., if a 10.00 mL pipette is calibrated at 20 °C and then used, for example, at 25 °C, the pipetted volume of liquid at 25 °C is indeed 10.00 mL (in this temperature range the volume of the glassware itself changes so little that it can be neglected), but the amount of liquid (in terms of mass or number of molecules) is smaller than the amount pipetted at 20 °C, although the pipetted volume is the same at both temperatures. So, if the volume of liquid that was 10.00 mL at 25 °C were cooled to 20 °C, its volume would be 9.99 mL.
Since temperature differences from 20 °C are usually small, the changes in density are not very large and therefore the bias is also relatively small in most situations. Therefore, in most cases we do not have to correct the volume, but we take this small effect into account as a measurement uncertainty component.
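A worked check of the 9.99 mL figure, using approximate handbook densities of water (0.99821 g/mL at 20 °C, 0.99705 g/mL at 25 °C):

    V_{20} = V_{25} \cdot \frac{\rho_{25}}{\rho_{20}}
           = 10.00\ \mathrm{mL} \times \frac{0.99705}{0.99821} \approx 9.99\ \mathrm{mL}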


22. It is still unclear to me when to use the standard deviation of the mean and when to use the standard deviation of an individual value in uncertainty evaluation. For example, if I have these values in an experiment: 3.2, 3.6, 3.4, 3.0, 3.9 and I calculate the standard deviation, I get a value of 0.349. Can I report any individual value, e.g. 3.4, with a standard deviation of 0.349?

This question is explained in Section 3.4, but let me try to give some additional explanations here.

The general rule: whenever it is feasible to make replicate measurements of the quantity you are measuring, please do it. And in such cases for the quantity value you should use the mean value and for estimating repeatability you should use the standard deviation of the mean.

Thus, in the example that you are giving, you should certainly report the mean value, not a random individual value, and as repeatability (assuming you did the measurements on the same day) you should use the standard deviation of the mean.

Now, when do you use the standard deviation of an individual value? This is done in cases where, for your concrete measurement with your concrete object, you cannot do replicates (or it is not feasible or reasonable). Therefore, you do your measurement just once, and you estimate the repeatability of your measurement from some other experiment that can be repeated.

Two examples:

--- Pipetting: if you need to pipet 10 mL of some solution during your analysis, then you cannot do averaging: you cannot pipet 5 times and then somehow “average” the volume. Instead, you do the pipetting in the course of that analysis just once and you estimate the repeatability separately (e.g. by pipetting the same amount of water numerous times). In this case, since you pipetted just once in your analysis, you will use the standard deviation of an individual result.

--- Overall repeatability or within-lab reproducibility of an analysis: if you typically analyze your routine samples without replicates, then you can estimate the repeatability (or within-lab reproducibility) separately with some control sample which you analyze several times. If that control sample is sufficiently similar to your routine samples, then the obtained standard deviation can also be applied to your routine samples (this approach is, for example, used in the Nordtest uncertainty approach). Since you analyze your routine samples just once, you should use the standard deviation of a single analysis, not the standard deviation of the mean, for quantifying repeatability (or within-lab reproducibility).
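To make the distinction concrete with the numbers from the question (a minimal sketch):

    import numpy as np

    data = np.array([3.2, 3.6, 3.4, 3.0, 3.9])  # replicates from the question
    s = data.std(ddof=1)                         # SD of an individual value, ca 0.349
    s_mean = s / np.sqrt(len(data))              # SD of the mean, ca 0.156

    # If you report the mean of these replicates, use s_mean as repeatability.
    # If a future sample is measured only once, use s for that single result.
    print(f"mean = {data.mean():.2f}, s = {s:.3f}, s_mean = {s_mean:.3f}")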


23. How many repetitions should we perform for estimating the uncertainty due to non-ideal repeatability?

It depends on your setup, needs and possibilities. When using some standard procedure, the number of replicates required may be given in the standard. If your aim is to achieve a very low uncertainty level using the mean value and (importantly) if uncertainty due to repeatability is an important uncertainty source, then the more replicates you can do, the better the result you obtain.

However, in practice we usually cannot perform many replicate measurements, for several practical reasons: we do not have a sufficient amount of sample, we are limited in time and/or finances, etc. Therefore, as soon as you are able to calculate the standard deviation, e.g. already with just 3 measurements, you can have a first rough estimate of the repeatability. But do not stop there! You should collect more data, e.g. on similar samples, and you can pool the data using the pooled standard deviation approach (see question 2 above).


24. What is the uncertainty of a measurement result that is an average of two results of which both have uncertainties?

This is a more difficult issue than one might think. What is presented here is a simplistic and conservative approach.

If the individual results are X1 and X2 with combined standard uncertainties uc(X1) and uc(X2), then, provided the uncertainties are not too different, the value to be presented as the final result can be the simple average of the values X1 and X2: X = (X1 + X2) / 2. If the uncertainties are very different, then a weighted average should be used, whereby 1/uc(X1)² and 1/uc(X2)² are used as weights.

The combined standard uncertainty uc(X) can be conservatively estimated as follows. Take the higher of the values X1 and X2 and add to it its combined standard uncertainty: what you get is the upper uncertainty limit LU. Then take the lower of the values and subtract from it its combined standard uncertainty: this way you get the lower uncertainty limit LL. Calculate the distances of the limits from X. The larger of the two distances can be used as the combined standard uncertainty estimate.

Example (data with arbitrary units): X1 = 154; uc(X1) = 7; X2 = 160; uc(X2) = 9. In this case X = 157 and uc(X) = 12.
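The same example in a few lines of Python (a sketch of the conservative rule above):

    x1, u1 = 154.0, 7.0
    x2, u2 = 160.0, 9.0

    x = (x1 + x2) / 2                 # simple average: 157
    upper = max(x1 + u1, x2 + u2)     # upper uncertainty limit LU: 169
    lower = min(x1 - u1, x2 - u2)     # lower uncertainty limit LL: 147
    u = max(upper - x, x - lower)     # larger distance from X: 12
    print(f"X = {x}, uc(X) = {u}")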


25. For intermediate precision, if we happen to have an anomalous value due to a gross error, is it safe to just omit the value? Or is it necessary to perform a statistical treatment to justify this?

Leaving data points out is a very tricky thing. As a very general recommendation: avoid leaving data points out on a statistical basis if you can. However, leave them out if a physical/chemical reason is found. Just some examples of what such reasons could be: there was a precipitate in the derivatization reagent solution, which has never been there before; the slope of the calibration graph on that day was lower than usual; the retention time of your analyte differed from what it has usually been.


26. It has been said that for determining within-lab reproducibility, a longer timeframe with fewer replicates is typically preferable to a shorter time with more replicates. Since this is opinion-based, what could be an argument for preferring the other scenario?

Possible example: 4 data points over 6 months as opposed to 14 data points over 4 months. In this case I would probably prefer the latter.


27. For within-lab reproducibility evaluation, it was emphasised that it should be the same sample (stable, homogeneous, and available in sufficient amount). But given the case that we don't have sufficiently large samples and can only do a few replicates, can we use the pooled SD for evaluation of reproducibility?

Indeed, both in the case of repeatability and within-lab reproducibility, a standard deviation has to be calculated with the same sample. But pooling of standard deviations can also be done if the samples are not the same but are similar.


28. When you are starting to implement a method, you don't have the long-term data set needed for a viable sRW analysis. Let's say your laboratory was hired to do a new analysis, and you have a strict time frame to develop and implement this method. You have neither routine samples nor old data, because this is a new method. However, you are required to present uncertainty calculations to your customer in order to start analysing samples. But you need samples and data to calculate the uncertainty using the Nordtest approach. How do we get out of this vicious circle, in which you need the uncertainty to start the analysis and need the analysis to calculate the uncertainty?

This is a typical “Start with little but do not stop there!” situation in the case of the single-lab validation approach (see also I. Leito, I. Helm J. Chem. Metrol. 2020, 14:2, 83-87). So, in the very beginning, just one or two weeks of data can be used. This is not good, but it is much better than nothing. And as time goes on, you get more data and more reliable uncertainty estimates.


29. Aren't we underestimating the actual purity value when assuming the middle of the range as the most probable value for purity? If a minimum is given, shouldn't we assume that the probability distribution leans towards this minimum instead of being equally spread between this value and 100%? I'm thinking that, business-wise, providers may elect to keep their purer lots to sell as a higher purity grade.

Indeed, if we do not have any other information than, say, “at least 98% purity”, then we can only assume what the distribution is and where the actual purity can be. A rectangular distribution covering the whole interval from 98% to 100% is quite conservative/safe.

Moreover, having spoken to a person familiar with the chemical industry, I learned that producers typically want “to play it safe”, i.e. they will declare a minimum purity that they can safely achieve. This means that a chemical declared as having at least 98% purity can in fact often be 99% or more. But in some batches it is below 99%, and therefore, in order to be safe and avoid any disputes or accusations, they declare “at least 98%”.

In this course we recommend that “at least 98% purity” should be interpreted as (99 ± 1)%, assuming the rectangular distribution. This means that the standard uncertainty of the purity is 0.58% and the k = 2 expanded uncertainty is 1.15%. The uncertainty range with roughly 95% coverage thus extends down to 97.85%, i.e. the low-probability situation where the purity is slightly below 98% is in fact also covered.
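In compact form, the arithmetic above:

    u = \frac{1\%}{\sqrt{3}} \approx 0.58\%, \qquad
    U = k \cdot u = 2 \times 0.58\% \approx 1.15\%, \qquad
    99\% - 1.15\% = 97.85\%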

However, importantly, all of the above holds if there is no additional information. Wherever you have more information, e.g. that the actual purity is around 98%, you are welcome to use different estimates for the purity and its uncertainty.


30. I don't understand how the degrees of freedom are found for different types of uncertainty estimates.

In the case of B-type uncertainty estimates the formal number of degrees of freedom is infinity. This applies also to those cases where we do not have clear knowledge about the exact distribution of the uncertainty and assume a rectangular distribution.
It is not possible to do calculations with infinity. Therefore, people usually pick some number that is large in the context of numbers of replicate measurements. 30 and 50 are quite common and both are OK. You can also pick 37, 48 or 61 – all are OK. Why do people typically not pick e.g. 100 000 or billions? By picking a “realistic” number (in terms of numbers of replicate measurements) we introduce into the calculation of expanded uncertainty a small probability that in some rare cases the true value can be slightly outside the interval of the rectangular distribution.

In the case of A-type uncertainty estimates the generalized way of interpreting the number of degrees of freedom (df) is: df = n – m. Here n is the number of parallel measurements and m is the number of parameters that are obtained from those measurements by data analysis.
If you do, e.g., a number of titrations and calculate the mean value, then m = 1, because there is only one parameter value that you are getting: the mean value. If you do linear regression without forcing the intercept to zero, then m = 2, because you are getting the values of two parameters: the slope and the intercept.


31. In the lecture it was explained that RMSbias is the average bias. However, in equation 10.3 it is calculated as the square root of the sum of the squared biases divided by n (where n is the number of bias determinations carried out) and is therefore always positive. Shouldn't the average bias be found as SUM(bias)/n, so that it can be positive or negative?

Indeed, RMSbias is the average bias. However, the word “average” here does not mean the arithmetic mean (which is expressed by SUM(bias)/n) but the root mean square (RMS, also known as the quadratic mean). There are also numerous other means (see e.g. https://en.wikipedia.org/wiki/Mean).
Mathematically, RMS means that we calculate the arithmetic mean of the squared values and then take the square root.

There are three reasons why in the case of averaging biases we use RMS and not arithmetic mean:

(1)    The arithmetic mean of bias can be positive or negative. It also has the important property that positive and negative values cancel each other out. Thus, if we have determined two (relative) bias values as -10% and 10%, then the arithmetic mean is 0%. At the same time their RMS is 10%. Obviously 0% is not an adequate estimate of the uncertainty due to possible bias in this case, as our real sample might also have a positive or a negative bias.

(2)    RMS amplifies the influence of larger (in absolute terms) values, thereby making the RMSbias a more conservative estimate of the uncertainty due to possible bias, as opposed to just the arithmetic mean of biases. As an example, let us assume that two bias determinations gave 2% and 8%. Their arithmetic mean is 5%, but their RMS is 5.8%.

(3)    Finally, and most importantly, there is a fundamental mathematical reason why we almost never add standard deviations, uncertainties and other similar parameters, but instead almost always make all calculations with their squares and then take the square root. If you look at the equations used in this course, you will see that this is a pervasive pattern. The reason is that standard deviations are in mathematical terms not additive (i.e. you are not allowed to add them), but their squares (statisticians prefer calling them variances) are additive. Thus, from the fundamental mathematical standpoint, just adding uncertainties, repeatabilities, biases, etc. is incorrect.
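A minimal sketch of the two kinds of averaging, using the numbers from the examples above:

    import numpy as np

    for biases in ([-10.0, 10.0], [2.0, 8.0]):
        b = np.array(biases)
        mean = b.mean()                  # arithmetic mean: 0 and 5
        rms = np.sqrt((b**2).mean())     # RMS: 10 and ca 5.8
        print(f"biases {biases}: mean = {mean:.1f}%, RMS = {rms:.1f}%")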

