Andreas Luedeke edited sectionPROPOSED_SECO.tex  over 8 years ago

Commit id: 2d4aaa0404a0c9d119484c4d50bbd7a16ace6669


Failure data for these modes is rarely published, and therefore a common metric that fits all facilities is difficult to define. We therefore give some examples of possible definitions and would consider it useful if facilities published statistics for these failure modes.\subsection{Low-beam-lifetime} Facilities operating in top-up mode can keep the beam current constant even with a low beam lifetime. However, this causes more frequent injections and therefore more distortions and background radiation for the experiments. The limit for this failure mode depends on the facility and the specific operation mode. We propose to define a minimal lifetime $\tau_{\hbox{low}}$ for each user operation mode of a facility and to record whenever the lifetime is below that limit for more than a minute. {\em ALBA} has a nominal beam lifetime that is determined by the combination of the filling pattern and the RF voltage. The typical lifetime at 100$\,$mA is 22 hours. ALBA operates with 6 RF cavities, each fed by 2 IOTs, and a typical ``low-beam-lifetime'' event is the trip of one IOT. Since this is the trip of a sub-system, the operator records the event as a beam incident, with the label ``low-beam-lifetime'' in the ``consequences to the beam'' field. Normally the operator recovers the IOT, and the ``low-beam-lifetime'' event is closed once the nominal lifetime is recovered. Low-lifetime events are recorded when the lifetime is below 18 hours; if the lifetime drops below 10 hours, top-up is stopped. At {\em BESSY II} the beam loss rate becomes excessive if the lifetime at 300$\,$mA drops below five hours. The radiation protection interlock then prevents injections and enforces ``decaying beam'' operation. As a failure, this mode is covered by the ``Low-beam-current'' events.
{\em Elettra} has a superconducting third harmonic cavity (S3HC) installed, which raises the beam lifetime at 2$\,$GeV to 3.5 times the theoretical lifetime of this storage ring. In normal operation at 2$\,$GeV the typical lifetime is about 23 hours. At 2.4$\,$GeV the S3HC has a smaller effect on the beam lifetime, because the beam current is only 160$\,$mA, compared with 310$\,$mA at 2$\,$GeV. The nominal lifetime at 2.4$\,$GeV is 27 hours. As agreed with the users, injections lasting longer than 1 minute are not tolerated by the experiments, so they cause downtime due to low beam lifetime. At 310$\,$mA (2$\,$GeV) this limit is reached below 10 hours of beam lifetime; at 160$\,$mA it corresponds to a beam lifetime below 5.3 hours. Since this situation is already counted as downtime, Elettra does not record ``low-beam-lifetime'' as a failure mode. {\em LNLS-UVX} has a typical beam lifetime of about 14 hours at the initial current, with low vertical coupling. As the current decays, the lifetime increases. At UVX, ``low-beam-lifetime'' events are usually triggered by the trip of a power supply or by a vacuum problem and usually lead to a ``no-beam'' event: the beam is dumped so that the source of the problem can be identified and corrected. Micro-drops in the stored current, which usually lead to a reduced lifetime, are sometimes observed for short periods of time. These events are not accounted for as ``low-beam-lifetime'' events but as machine faults, and they are not considered in the reliability calculations. {\em PETRA III} has a typical beam lifetime of about 12 hours in the continuous mode; due to strong Touschek scattering the lifetime is only about 1.5 hours in the 40-bunch timing mode. ``Low-beam-lifetime'' events are not recorded at PETRA III, and low lifetime is not used as a fault criterion.
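The two Elettra limits can be cross-checked with a short calculation. Assuming (our assumption, not stated in the text) that the tolerated injection duration corresponds to a fixed tolerated current loss rate $I/\tau$, the limiting lifetime scales linearly with the stored current. The sketch below computes the loss rates implied by the quoted numbers; the function names are hypothetical.

```python
def loss_rate_mA_per_h(current_mA: float, lifetime_h: float) -> float:
    """Instantaneous current loss rate |dI/dt| = I / tau for a beam with lifetime tau."""
    return current_mA / lifetime_h


def limit_lifetime_h(current_mA: float, max_loss_rate_mA_per_h: float) -> float:
    """Lifetime below which the loss rate exceeds the tolerated maximum."""
    return current_mA / max_loss_rate_mA_per_h


# Limits quoted for Elettra: 10 h at 310 mA (2 GeV) and 5.3 h at 160 mA (2.4 GeV).
rate_2_0_gev = loss_rate_mA_per_h(310.0, 10.0)
rate_2_4_gev = loss_rate_mA_per_h(160.0, 5.3)

# Hypothetical: carry the loss-rate limit implied at 2 GeV over to 160 mA.
predicted_2_4_gev = limit_lifetime_h(160.0, rate_2_0_gev)

if __name__ == "__main__":
    print(f"loss rate at the 2.0 GeV limit: {rate_2_0_gev:.1f} mA/h")
    print(f"loss rate at the 2.4 GeV limit: {rate_2_4_gev:.1f} mA/h")
    print(f"predicted 2.4 GeV lifetime limit: {predicted_2_4_gev:.2f} h")
```

Both quoted limits correspond to roughly 30$\,$mA/h. Carrying the 2$\,$GeV value over to 160$\,$mA gives about 5.2 hours, close to the quoted 5.3 hours. The small difference may come from factors this simple model ignores.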
{\em SPring-8} has a beam lifetime between 15 and 50 hours, depending on the operation mode and the gap settings of the insertion devices (IDs). Accordingly, the injection interval varies between 20 seconds and 1 minute. Beam lifetime is not considered to be very important in normal user operation. However, the allowed electron losses are limited by radiation safety: if the beam lifetime is too low, for example when one of the four RF stations goes down, SPring-8 would increase the lifetime at the expense of beam performance, i.e. by increasing the vertical beam size or decreasing the stored current. Such ``low-beam-lifetime'' events caused by the drop-out of an RF station formerly occurred about once every few years, until a new set-point was found that increases the RF power of the remaining three stations in those cases. {\em SLS} has a typical beam lifetime of about 8 hours in both operation modes. Depending on the vacuum and the selected coupling, this lifetime varies in practice between 6 and 10 hours during normal operation. A low beam lifetime leads to more frequent injections in top-up. A ``low-beam-lifetime'' event is automatically recorded when the beam lifetime stays below 4.5 hours for longer than 5 minutes. The event stops as soon as the lifetime has been above 4.5 hours again for longer than 1 minute. The failure mode ``low-beam-lifetime'' is not independently recorded at BESSY II, Elettra, LNLS-UVX, PETRA III or SPring-8. Table~\ref{tab:lifetime-limits} shows a comparison of the ``low-beam-lifetime'' limits.
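The SLS rule is a threshold with separate start and stop delays. The following is a minimal Python sketch of such a detector. It assumes chronologically ordered lifetime samples given as (time in seconds, lifetime in hours) pairs. The function and parameter names are hypothetical. We also assume that an event is dated from the first sample below the limit to the first sample back at or above it. Other facilities' rules, including the generic one-minute proposal, follow by changing \texttt{tau\_low\_h} and the two delays.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Tuple


@dataclass
class LifetimeEvent:
    start_s: float            # first sample below the limit
    end_s: Optional[float]    # first sample back at/above the limit; None if still open


def low_lifetime_events(
    samples: Iterable[Tuple[float, float]],
    tau_low_h: float,
    start_delay_s: float,
    stop_delay_s: float,
) -> List[LifetimeEvent]:
    """Detect low-beam-lifetime events with separate start and stop delays.

    An event opens once the lifetime has stayed below tau_low_h for longer
    than start_delay_s, and closes once it has stayed at or above tau_low_h
    for longer than stop_delay_s.
    """
    events: List[LifetimeEvent] = []
    open_event: Optional[LifetimeEvent] = None
    below_since: Optional[float] = None  # candidate start while no event is open
    above_since: Optional[float] = None  # candidate end while an event is open

    for t, tau in samples:
        low = tau < tau_low_h
        if open_event is None:
            if not low:
                below_since = None
                continue
            if below_since is None:
                below_since = t
            if t - below_since > start_delay_s:
                open_event = LifetimeEvent(start_s=below_since, end_s=None)
                above_since = None
        else:
            if low:
                above_since = None  # recovery interrupted, keep the event open
                continue
            if above_since is None:
                above_since = t
            if t - above_since > stop_delay_s:
                open_event.end_s = above_since
                events.append(open_event)
                open_event = None
                below_since = None

    if open_event is not None:
        events.append(open_event)  # event still open at the end of the data
    return events


if __name__ == "__main__":
    # SLS-like parameters: below 4.5 h for > 5 min starts, recovered for > 1 min stops.
    minute = 60.0
    samples = [(i * minute, 4.0 if 10 <= i < 21 else 8.0) for i in range(30)]
    print(low_lifetime_events(samples, 4.5, 5 * minute, 1 * minute))
```

The stop delay prevents a single good sample during a low-lifetime period from splitting one failure into several.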
\begin{table}  \centering  \caption{\label{tab:lifetime-limits} Low-beam-lifetime limits}  \scriptsize  \begin{ruledtabular}  \begin{tabular}{lcrrl}
\textbf{Facility}&\textbf{Mode}&\textbf{$\tau_{\hbox{normal}}$}  &\textbf{$\tau_{\hbox{low}}$}  &\textbf{Remark}\\  & & (h) & (h) & \\\hline
ALBA & decay & 22 & 18 & \\
ALBA & top-up & 22 & 18 & \\
BESSY II & MB & 7 & 5 & stops top-up \\
BESSY II & SB & 2 & 1 & stops top-up \\
Elettra & 2.0$\,$GeV & 23 & 10 & \\
Elettra & 2.4$\,$GeV & 27 & 10 & \\
LNLS-UVX & decay & \textgreater 14 & not defined & \\
PETRA III & continuous & 12 & not defined & \\
PETRA III & timing & 1.5 & not defined & \\
SPring-8 & all modes & 15 \dots 50 & 10 & \\
SLS & top-up & 8 & 4.5 & yearly evaluation \\
\end{tabular}  \end{ruledtabular}  \end{table}
\subsection{Distorted-orbit} A stable orbit is a prerequisite for most experiments. A possible failure mode definition could be to record orbit deviations above 20\% of the beam size. However, this would require a different limit for each beam position monitor. We rather suggest a simpler definition: the RMS orbit distortion should stay below nominal values $dx_{\hbox{nom}}$, $dy_{\hbox{nom}}$ that depend on the facility and the operation mode. {\em ALBA} provides the RMS orbit distortion in both the horizontal and the vertical plane to the beamlines. The beamlines are informed whenever this deviation is larger than usual. The beam position at the photon beam position monitors in the front ends (FEs) is also used as a figure of merit for the orbit, and beamlines are informed if the beam position at their FE source point differs from nominal by more than 20\% in any plane. Outages of the slow orbit feedback (SOFB) are recorded by the operator and can be cross-checked with a log file generated by the SOFB, which registers any interruption. Operators generate a beam incident entry in the logbook for each SOFB interruption.
If the problem persists, it will usually lead to a ``no-beam'' event, because the beam will be dumped to solve the problem. {\em BESSY II} covers all ``orbit-out-of-control'' situations as a ``Distorted-orbit'' failure: if none of the orbit corrections (FOFB or SOFB) is usable or active (orbit-feedback-outage), or if the RMS deviation from the nominal orbit exceeds 0.08$\,$mm. Typical RMS orbit deviations range between 0.00 and 0.01$\,$mm right after the installation of a new ``Golden Orbit'' based on a new beam-based alignment (BBA) measurement, and between 0.04 and 0.05$\,$mm a year or more later. Orbit-feedback outages are recorded if they last longer than 60$\,$s. Succeeding failures are counted as one if the feedback runs for less than 10$\,$min in between. {\em Elettra} currently has a long-term orbit stability (over 2 to 5 days) of at most $\pm 5\,\mu$m. Orbit feedback outages are recorded if they are longer than 10$\,$s. {\em LNLS-UVX} warns the beamlines and records a failure whenever the orbit distortion exceeds 10\% of the beam size in any plane, measured relative to a ``golden orbit'' defined by beam-based alignment; a fault event is then registered in the operation log. These failures are rare but may lead to a ``no-beam'' event to correct the problem. The limits are 8$\,\mu$m in the vertical and 30$\,\mu$m in the horizontal plane. {\em PETRA III} has a fast orbit feedback to keep the orbit stable. If this feedback fails and certain limits for the orbit deviation are exceeded, the beam is dumped automatically to protect the machine. At the insertion devices the limits for a beam dump are a deviation from the nominal orbit of 250$\,\mu$m in the vertical plane or 500$\,\mu$m in the horizontal plane. A warning is issued to the operator if the beam position or angle deviates from the nominal orbit by more than an individually set limit, which is of the order of 5$\,\mu$m in the vertical plane and 20$\,\mu$m in the horizontal plane. All orbit-related beam dumps have to be investigated to find the root cause of the orbit deviation (e.g. a faulty magnet power supply or a fault in the orbit feedback). {\em SPring-8} stabilises the closed orbit deviation (COD) to the sub-micron level with an orbit feedback that corrects once per second. Abrupt orbit changes, e.g. caused by a gap change of an insertion device, are immediately corrected by the feedback. The COD data are archived in the database at every tenth correction. The users most sensitive to changes of the photon axis require variations below 1$\,\mu$rad. At the SPring-8 storage ring this corresponds to 10$\,\mu$m in the horizontal and 5$\,\mu$m in the vertical plane. The monthly variation of the COD grows to almost this value in both planes. These slow variations are ignored, and only abrupt changes of the COD, caused for example by an ID gap change, are taken into account. All COD data are recorded in the database, so that a distorted orbit can be checked later. Noticeable orbit changes are noted in the logbook. {\em SLS} currently does not record closed orbit deviations. If all beam position monitors (BPMs) are used in the orbit feedback, the deviation is always ``zero'' as long as the fast orbit feedback is running and correcting every 250$\,\mu$s. Instead, orbit-feedback outages are automatically recorded if they last longer than 10 seconds. Large transient or persistent closed orbit deviations switch off the orbit feedback to avoid beam losses due to malfunctioning BPMs. Succeeding outages are counted as one if the feedback runs for less than 2 minutes in between. Number and duration of orbit-feedback outages are reported in the yearly operation statistics. We are planning to implement an additional event based on our independent analogue BPMs adjacent to the insertion devices. These BPMs have no absolute calibration and a limited resolution, but we can take a reference whenever the ID has been closed below a defined gap; a deviation of the position at these BPMs from their reference position by more than 10$\,\mu$m will then be recorded as an orbit-deviation event.
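The ``Distorted-orbit'' definitions above have two ingredients. The first is the proposed RMS check against $dx_{\hbox{nom}}$ and $dy_{\hbox{nom}}$. The second is the rule for recording orbit-feedback outages, with a minimum duration and a merge gap (60$\,$s and 10$\,$min at BESSY II, 10$\,$s and 2$\,$min at the SLS). Both can be sketched in a few lines of Python. The names are hypothetical, and outages are assumed to be (start, end) pairs in seconds. The text does not specify whether short outages are discarded before merging or after. This sketch assumes they are filtered first and then merged.

```python
import math
from typing import List, Sequence, Tuple

Interval = Tuple[float, float]  # (start, end) in seconds


def rms(values: Sequence[float]) -> float:
    """Root-mean-square of the orbit deviations at all BPMs of one plane."""
    if not values:
        raise ValueError("need at least one BPM reading")
    return math.sqrt(sum(v * v for v in values) / len(values))


def orbit_distorted(dx: Sequence[float], dy: Sequence[float],
                    dx_nom: float, dy_nom: float) -> bool:
    """True if the RMS distortion exceeds the nominal value in either plane."""
    return rms(dx) > dx_nom or rms(dy) > dy_nom


def recorded_outages(outages: Sequence[Interval], min_duration_s: float,
                     merge_gap_s: float) -> List[Interval]:
    """Keep outages longer than min_duration_s, then merge outages separated by
    less than merge_gap_s of running feedback into a single failure."""
    kept = sorted((s, e) for s, e in outages if e - s > min_duration_s)
    merged: List[Interval] = []
    for start, end in kept:
        if merged and start - merged[-1][1] < merge_gap_s:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged
```

For example, with the SLS parameters, outages from 100 to 130$\,$s and from 200 to 230$\,$s count as one failure from 100 to 230$\,$s. With the BESSY II parameters, neither of them would be recorded.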
Table~\ref{tab:distorted-orbit-limits} compares how ``Distorted-orbit'' failures are handled at the different facilities. \begin{table}  \centering  \caption{\label{tab:distorted-orbit-limits} ``Distorted-orbit'' failure mode for different facilities, covering orbit feedback outages and deviations from the nominal orbit} \scriptsize  \begin{ruledtabular}  \begin{tabular}{lll}
\end{tabular}  \end{ruledtabular}

\end{table}
\subsection{Beam-blow-up} The beam size of a light source should stay constant, since the emittance is an important parameter. We propose to define vertical and horizontal beam size limits for each operation mode and to record whenever the beam dimensions exceed these limits for more than a minute. While this failure mode is easy to define, it is very hard to detect for many facilities: continuously measuring a beam height of 10$\,\mu$m to 10\% precision requires costly diagnostics. Table~\ref{tab:blow-up-limits} compares the detection of the ``Beam-blow-up'' failure mode at the different facilities. \begin{table}  \centering  \caption{\label{tab:blow-up-limits} ``Beam-blow-up'' limits} \scriptsize  \begin{ruledtabular}  \begin{tabular}{lrrl}
\textbf{Facility}&\textbf{typical dimension}&\textbf{size increase}&\textbf{Remark}\\  & ($\mu$m) & & \\\hline

ALBA & 70 $\times$ 30 & 20\% & recorded \\
BESSY II & 250 $\times$ 14 & 30\% & recorded \\
Elettra & 260 $\times$ 10 & 10\% & no on-line measurement yet \\
LNLS-UVX & 1000 $\times$ 120 & 10\% & recorded \\
PETRA III & 140 $\times$ 7 & - & no on-line measurement \\
SPring-8 & 100 $\times$ 12 & - & not an independent failure mode \\
SLS & 50 $\times$ 10 & 50\% & recorded and evaluated \\
\end{tabular}  \end{ruledtabular}  \end{table}
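The proposed one-minute rule for ``Beam-blow-up'' can be sketched as follows. The sketch assumes beam size samples given as (time in seconds, horizontal size, vertical size), in the same unit as the nominal sizes. It also assumes a relative limit such as the values in the ``size increase'' column. The function name and the choice to measure the duration up to the first sample back within the limit are our assumptions.

```python
from typing import List, Optional, Sequence, Tuple


def blow_up_intervals(
    samples: Sequence[Tuple[float, float, float]],
    nominal: Tuple[float, float],
    max_increase: float,
    min_duration_s: float = 60.0,
) -> List[Tuple[float, float]]:
    """Return (start, end) of periods where the horizontal or vertical beam size
    exceeded nominal * (1 + max_increase) for longer than min_duration_s.

    samples: chronologically ordered (t_s, size_x, size_y) tuples.
    An exceedance still ongoing at the last sample ends at that sample.
    """
    x_limit = nominal[0] * (1.0 + max_increase)
    y_limit = nominal[1] * (1.0 + max_increase)
    intervals: List[Tuple[float, float]] = []
    start: Optional[float] = None
    for t, size_x, size_y in samples:
        exceeded = size_x > x_limit or size_y > y_limit
        if exceeded and start is None:
            start = t
        elif not exceeded and start is not None:
            if t - start > min_duration_s:
                intervals.append((start, t))
            start = None
    if start is not None and samples[-1][0] - start > min_duration_s:
        intervals.append((start, samples[-1][0]))
    return intervals
```

With the SLS values from the table (50 $\times$ 10, 50\%), a vertical size of 16 measured for 150$\,$s would be recorded. A single 30$\,$s spike would not.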

\subsection{Distorted-filling and Bunch-purity}
Any deviation from the desired bunch filling may cause problems for some experiments. Some experiments have very strict requirements on the ratio between the charge in a filled single bunch and the residual charge in the neighbouring buckets; the tolerable ratio depends on the specific experiment. This failure mode is mainly relevant to time-resolved measurements, and the usefulness of any definition depends on the requirements of the specific users. For each operation mode an allowed maximum charge deviation $dQ_{\hbox{max}}$ should be defined. Measuring a bunch purity of $10^{-8}$ to a useful precision requires a lengthy procedure.
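The two checks discussed here are sketched below. The sketch assumes that bunch charges are available per RF bucket, in any consistent unit, together with the desired fill pattern. The function names are hypothetical. The purity ratio used (largest residual charge in the two neighbouring buckets divided by the charge of the filled bunch) is only one possible definition.

```python
from typing import Sequence


def neighbour_purity_ratio(charges: Sequence[float], bunch_index: int) -> float:
    """Largest residual charge in the two neighbouring buckets relative to the
    charge of the filled bunch. Bucket indices wrap around the ring."""
    n = len(charges)
    main = charges[bunch_index]
    if main <= 0.0:
        raise ValueError("selected bucket is not filled")
    left = charges[(bunch_index - 1) % n]
    right = charges[(bunch_index + 1) % n]
    return max(left, right) / main


def fill_distorted(measured: Sequence[float], desired: Sequence[float],
                   dq_max: float) -> bool:
    """True if any bucket deviates from the desired fill by more than dq_max."""
    if len(measured) != len(desired):
        raise ValueError("fill patterns must cover the same number of buckets")
    return max(abs(m - d) for m, d in zip(measured, desired)) > dq_max
```

In practice the hard part is the measurement itself, which must resolve residual charges at the $10^{-8}$ level.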

\subsection{Beam-unrelated}
Some failures do not affect the beam, but they do affect the user experiments. Even if the beam is stored and all beam parameters are within the desired limits, there can still be problems that prevent most users from running any experiments. Infrastructure outages such as massive control system and IT infrastructure failures, or photon shutter interlocks, can lead to such situations. There cannot be a simple rule to determine the start and stop of this type of failure, but such failures should be recorded if they affect a significant number of the experiments. Currently, beam-unrelated incidents are considered to be ``downtime'' at some facilities if they prevent all beamlines from continuing their measurements. This is the case at ALBA, Elettra and the SLS. Other facilities, for example PETRA III, neglect these failures in their downtime calculation as long as the electron beam was not affected. At most facilities these failures are evaluated on a case-by-case basis: for example, an interlock of all photon shutters would clearly be considered ``downtime'', at least at ALBA, BESSY II and LNLS-UVX; but a problem with the IT infrastructure might not, even if the majority of the users were affected. \subsection{Short-user-time} Many facilities have a cut-off for a minimal time to store the beam. For example, if less than one hour passes between two beam trips, the time in between is counted as downtime. This can be defined as an extra failure mode: ``short-user-time''. The limit of what time is too short for user experiments depends on the time the facility needs to reach thermal equilibrium and on the typical length of a measurement at an experiment. Each facility should define this cut-off time $T_{\hbox{short-user-time}}$; it may depend on the operation mode.
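Recording short deliveries explicitly makes it easy to compute the beam availability both with and without the cut-off $T_{\hbox{short-user-time}}$. A minimal sketch follows. It assumes beam delivery periods given as (start, end) pairs in hours within a known scheduled user time. The names are hypothetical. Other rules, such as adding a fixed time to each beam outage, are not modelled.

```python
from typing import List, Sequence, Tuple

Period = Tuple[float, float]  # (start, end) in hours


def short_user_time_periods(deliveries: Sequence[Period],
                            cutoff_h: float) -> List[Period]:
    """Beam delivery periods shorter than the cut-off T_short-user-time."""
    return [(s, e) for s, e in deliveries if e - s < cutoff_h]


def availability(deliveries: Sequence[Period], scheduled_h: float,
                 cutoff_h: float = 0.0) -> float:
    """Fraction of scheduled user time with beam; deliveries shorter than
    cutoff_h are counted as downtime. cutoff_h=0 disables the cut-off."""
    counted = sum(e - s for s, e in deliveries if e - s >= cutoff_h)
    return counted / scheduled_h
```

For example, deliveries of 40, 0.5 and 58.5 hours in 100 hours of scheduled beam time give an availability of 99.0\% without a cut-off and 98.5\% with a one-hour cut-off. The 0.5 hour period would be returned as a ``short-user-time'' event.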
BESSY II, Elettra, LNLS-UVX and the SLS consider a beam delivery with a total length of less than one hour to be ``downtime''; at ALBA the cut-off is 30 minutes. PETRA III does not record short-user-time as a separate fault criterion, but covers it by the rule which adds one hour (or one downtime) to the length of each beam outage. SPring-8 does not have a cut-off for the beam delivery time. {\em ALBA} considers ``short-user-time'' as a ``no-beam'' event if the interval between beam delivery and trip is less than about 30 minutes; at the moment there is no official number for this cut-off, but a common-sense rule is used. {\em BESSY II} does not count beam up-times of less than an hour. The start of such an event is only known after it has ended. {\em Elettra} does not consider beam deliveries of less than one hour to be user beam time; such a period is not associated with a particular system, but is counted as a ``general injection setup'' failure. The second beam drop is not counted in the evaluation of the mean time between failures (MTBF). {\em LNLS-UVX} considers short-user-time a ``no-beam'' event if the interval between beam delivery and beam trip is less than one hour. However, this is not automatically accounted for by the database analysis software in the calculation of machine reliability: the software prompts the user that a ``short-user-time'' was recorded, but the short period has to be eliminated manually. {\em SPring-8} does not have a cut-off for the beam storage time, although a minimum storage time is necessary for user experiments. Therefore, when a short user time is expected, for example due to voltage drops caused by lightning, the resumption of user operation is postponed until the thunderstorm has passed. The waiting time is then recorded as downtime. {\em SLS} does not count beam up-times of less than an hour. Those short up-times are automatically neglected in the calculation of the beam availability. Since this calculation is done independently of the recording of operation events, it can create a mismatch between the downtime calculated for ``no-beam'' events and the downtime from the beam availability calculation. We would therefore like to record short-user-time events with the operation event logging system, for a better comparability of the operation event data and the calculated beam availability. None of these facilities currently records ``short-user-time'' as a separate failure mode. \subsection{Secondary failure modes overview} Table~\ref{tab:sf-limits} shows a comparison of the secondary failure modes:

\begin{table}  \centering  \caption{\label{tab:sf-limits} Secondary failure modes}
%\scriptsize
\begin{ruledtabular}  \begin{tabular}{lccccr}
\textbf{Facility}&\textbf{Distorted-orbit}&\textbf{Low-beam-lifetime}&\textbf{Beam-blow-up}&\textbf{Distorted-fill}&\textbf{Short-user-time}\\  & & & & & (h) \\\hline
ALBA & on-line & on-line & on-line & - & 0.5 \\
BESSY II & on-line & on-line & on-line & on-line & 1.0 \\
Elettra & on-line & on-line & on-line & - & 1.0 \\
LNLS-UVX & on-line & on-line & on-line & - & 1.0 \\
PETRA III & on-line & - & - & - & $\le$1 \\
SPring-8 & on-line & on-line & on-line & on-line & 0 \\
SLS & on-line & on-line & report & on-line & 1.0 \\
\end{tabular}  \end{ruledtabular}  \end{table}

\subsection{Discussion of the Secondary Failure Modes} {\em ``Distorted-orbit''} failures are recorded at most facilities but are rarely taken into account in the yearly failure statistics. The limits at which an orbit is considered out of specification vary by orders of magnitude between different facilities. Publication of these limits would be very useful to compare facilities. {\em ``Low-beam-lifetime''} failures are apparently rare at most facilities. The example of ALBA shows that it still makes sense to record and evaluate them. The normal beam lifetime varies considerably between facilities and operation modes: the nominal beam lifetime at the SLS would be considered a very low lifetime at ALBA or SPring-8. However, a significant decrease in the beam lifetime can cause problems at many facilities and should therefore be recorded to evaluate the reliability of the facility in this respect. {\em ``Beam-blow-up''} failures are again infrequent at most facilities. Betatron coupling can affect the occurrence of this failure type, since at very low coupling even small errors can lead to relatively large changes of the vertical beam size. For some facilities the vertical beam size is difficult to measure. Nevertheless, limits for the tolerated variation of the beam size need to be defined. The number of reported failures outside these limits is an essential measure for the reliability of the facility. {\em ``Distorted-filling and Bunch-purity''} faults are not relevant at all facilities; they matter for time-resolved measurements, which depend on bunch purity.
At present only a few facilities have the means to measure the bunch purity on-line. Sophisticated procedures are required to measure a bunch charge ratio of $10^{-8}$. Many facilities do have the means to detect deviations from the nominal bunch charge distribution. Where these means exist, we encourage facilities to publish failure limits and associated data. {\em ``Beam-unrelated''} failures should be recorded whenever they have an impact on a significant number of beamlines. These faults turned out to be rather rare and are often facility specific. {\em ``Short-user-time''} periods are, at most facilities, simply subtracted in the beam availability calculations; they are not recorded as a failure mode. Independent recording of these periods would make it possible to calculate the beam availability with and without accounting for ``short-user-time''. This would improve the comparison between facilities that handle ``short-user-time'' differently.