The “Data()” class

The Data class is the main container for smFRET measurements. It contains timestamps, detectors and all the results of data processing such as background estimation, burst data, fitted FRET and so on.

The reference documentation of the class follows.

“Data()” class: description and attributes

A description of the Data class and its main attributes.

class fretbursts.burstlib.Data(leakage=0.0, gamma=1.0, dir_ex=0.0, **kwargs)

Container for all the information (timestamps, bursts) of a dataset.

Data() contains all the information of a dataset (name, timestamps, bursts, correction factors) and provides several methods to perform analysis (background estimation, burst search, FRET fitting, etc...).

When loading a measurement file, a Data() object is created by one of the loader functions in loaders.py. Data() objects can also be created with Data.copy(), Data.fuse_bursts() or Data.select_bursts().

To add or delete data-attributes use .add() or .delete() methods. All the standard data-attributes are listed below.

Note

Attributes of type “list” contain one element per channel. Each element, in turn, can be an array. For example .ph_times_m[i] is the array of timestamps for channel i; or .nd[i] is the array of donor counts in each burst for channel i.

Measurement attributes

fname

string

measurement file name

nch

int

number of channels

clk_p

float

clock period in seconds for timestamps in ph_times_m

ph_times_m

list

list of timestamp arrays (int64). Each array contains all the timestamps (donor+acceptor) in one channel.

A_em

list

list of boolean arrays marking acceptor timestamps. Each array is a boolean mask for the corresponding ph_times_m array.

leakage

float or array of floats

leakage (or bleed-through) fraction. May be scalar or same size as nch.

gamma

float or array of floats

gamma factor. May be scalar or same size as nch.

D_em

list of boolean arrays

[ALEX-only] boolean mask for .ph_times_m[i] for donor emission

D_ex, A_ex

list of boolean arrays

[ALEX-only] boolean mask for .ph_times_m[i] during donor or acceptor excitation

D_ON, A_ON

2-element tuples of int

[ALEX-only] start-end values for donor and acceptor excitation selection.

alex_period

int

[ALEX-only] duration of the alternation period in clock cycles.
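
As a rough illustration of how these attributes fit together, the sketch below uses plain Python lists as stand-ins for the per-channel arrays (in a real Data object they are arrays). The numeric values and the helper name acceptor_times_s are made up for this example and are not part of FRETBursts.

```python
# Stand-ins for Data attributes (hypothetical values, single channel).
clk_p = 12.5e-9                      # clock period in seconds
ph_times_m = [[80, 160, 400, 800]]   # one list of timestamps per channel
A_em = [[False, True, False, True]]  # True marks acceptor timestamps


def acceptor_times_s(ich=0):
    """Return the acceptor timestamps of channel `ich` in seconds."""
    return [t * clk_p
            for t, is_acceptor in zip(ph_times_m[ich], A_em[ich])
            if is_acceptor]


nch = len(ph_times_m)                # number of channels
print(nch, acceptor_times_s(0))
```

The same pattern (index the per-channel list first, then apply the mask) applies to the other list-type attributes.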

Background Attributes

The background is computed with Data.calc_bg() and is estimated in chunks of equal duration called background periods. Estimations are performed in each spot and photon stream. The following attributes contain the estimated background rate.

bg

dict

background rates for the different photon streams, channels and background periods. Keys are Ph_sel objects and values are lists (one element per channel) of arrays (one element per background period) of background rates.

bg_mean

dict

mean background rates across the entire measurement for the different photon streams and channels. Keys are Ph_sel objects and values are lists (one element per channel) of background rates.

nperiods

int

number of periods in which timestamps are split for background calculation

bg_fun

function

function used to compute the background rates

Lim

list

each element of this list is a list of index pairs for .ph_times_m[i] for first and last photon in each period.

Ph_p

list

each element in this list is a list of timestamps pairs for first and last photon of each period.

bg_ph_sel

Ph_sel object

photon selection used by Lim and Ph_p. See fretbursts.ph_sel for details.

Th_us

dict

thresholds in us used to select the tail of the interphoton delay distribution. Keys are Ph_sel objects and values are lists (one element per channel) of arrays (one element per background period).

Additionally, there are a few deprecated attributes (bg_dd, bg_ad, bg_da, bg_aa, rate_dd, rate_ad, rate_da, rate_aa and rate_m) which will be removed in a future version. Please use Data.bg and Data.bg_mean instead.
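
To make the layout of Lim and Ph_p more concrete, here is a minimal plain-Python sketch that splits one channel's sorted timestamps into periods of equal duration. The function name split_periods is hypothetical, and counting periods from time 0 is an assumption of this sketch, not a statement about how FRETBursts does it.

```python
from bisect import bisect_left


def split_periods(times, period):
    """Split sorted `times` into consecutive chunks of duration `period`.

    Returns (lim, ph_p): for each non-empty period, the index pair and the
    timestamp pair of its first and last photon (one channel only).
    """
    lim, ph_p = [], []
    if not times:
        return lim, ph_p
    start = 0  # assumption: periods are counted from time 0
    while start <= times[-1]:
        i0 = bisect_left(times, start)               # first photon in period
        i1 = bisect_left(times, start + period) - 1  # last photon in period
        if i1 >= i0:
            lim.append((i0, i1))
            ph_p.append((times[i0], times[i1]))
        start += period
    return lim, ph_p


lim, ph_p = split_periods([0, 1, 2, 10, 11, 25], period=10)
print(lim)   # index pairs, one per period
print(ph_p)  # timestamp pairs, one per period
```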

Burst search parameters (user input)

These are the parameters used to perform the burst search (see burst_search()).

ph_sel

Ph_sel object

photon selection used for burst search. See fretbursts.ph_sel for details.

m

int

number of consecutive timestamps used to compute the local rate during burst search

L

int

min. number of photons for a burst to be identified and saved

P

float, probability

valid values [0..1]. Probability that a burst-start is due to a Poisson background. The employed Poisson rate is the one computed by .calc_bg().

F

float

(F * background_rate) is the minimum rate for burst-start

Burst search data (available after burst search)

Unless otherwise specified, attributes marked as “list of arrays” contain arrays with one element per burst. mburst arrays contain one “row” per burst. TT arrays contain one element per period (see above: background attributes).

mburst

list of Bursts objects

list of Bursts() objects, one element per channel. See fretbursts.phtools.burstsearch.Bursts.

TT

list of arrays

list of arrays of T values (in sec.). A T value is the maximum delay between m photons to have a burst-start. Each channel has an array of T values, one for each background “period” (see above).

T

array

per-channel mean of TT

nd, na

list of arrays

number of donor or acceptor photons during donor excitation in each burst

nt

list of arrays

total number of photons (nd+na+naa)

naa

list of arrays

number of acceptor photons in each burst during acceptor excitation [ALEX only]

bp

list of arrays

time period for each burst. Same shape as nd. This is needed to identify the background rate for each burst.

bg_bs

list

background rates used for threshold computation in burst search (is a reference to bg, bg_dd or bg_ad).

fuse

None or float

if not None, the burst separation in ms below which bursts have been fused (see .fuse_bursts()).

E

list

FRET efficiency value for each burst: E = na/(na + gamma*nd).

S

list

stoichiometry value for each burst: S = (gamma*nd + na) /(gamma*nd + na + naa)
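
The two formulas above can be written out directly. The following is a minimal sketch for a single burst with scalar counts; the function names are chosen for this example and are not FRETBursts API.

```python
def fret_efficiency(nd, na, gamma=1.0):
    """E = na / (na + gamma*nd) for one burst."""
    return na / (na + gamma * nd)


def stoichiometry(nd, na, naa, gamma=1.0):
    """S = (gamma*nd + na) / (gamma*nd + na + naa) for one burst."""
    return (gamma * nd + na) / (gamma * nd + na + naa)


# Example burst: 30 donor, 10 acceptor and 40 AexAem counts.
print(fret_efficiency(30, 10))      # 0.25
print(stoichiometry(30, 10, 40))    # 0.5
```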

Summary information

List of Data attributes and methods providing summary information on the measurement:

class fretbursts.burstlib.Data
time_max

The last recorded time in seconds.

time_min

The first recorded time in seconds.

ph_data_sizes

Array of total number of photons (ph-data) for each channel.

num_bursts

Array of number of bursts in each channel.

burst_sizes(gamma=1.0, add_naa=False, beta=1.0, donor_ref=True)

Return gamma corrected burst sizes for all the channels.

Compute burst sizes by calling burst_sizes_ich() for each channel. See burst_sizes_ich() for a description of the arguments.

Returns
List of arrays of burst sizes, one array per channel.

burst_sizes_ich(ich=0, gamma=1.0, add_naa=False, beta=1.0, donor_ref=True)

Return gamma corrected burst sizes for channel ich.

If donor_ref == True (default) the gamma corrected burst size is computed according to:

1)    nd + na / gamma

Otherwise, if donor_ref == False, the gamma corrected burst size is:

2)    nd * gamma  + na

With definition (1) the corrected burst size is equal to the raw burst size for zero-FRET or D-only bursts (hence the name donor_ref). With definition (2) the corrected burst size is equal to the raw burst size for 100%-FRET bursts.

In an ALEX measurement, use add_naa = True to add counts from the AexAem stream to the returned burst size. The arguments gamma and beta are used to correctly scale naa so that it becomes commensurate with the Dex corrected burst size. In particular, when using definition (1) (i.e. donor_ref = True), the total burst size is:

(nd + na/gamma) + naa / (beta * gamma)

Conversely, when using definition (2) (donor_ref = False), the total burst size is:

(nd * gamma + na) + naa / beta

Parameters:
  • ich (int) – the spot number, only relevant for multi-spot. In single-spot data there is only one channel (ich=0) so this argument may be omitted. Default 0.
  • add_naa (boolean) – when True, add a term for AexAem photons when computing burst size. Default False.
  • gamma (float) – coefficient for gamma correction of burst sizes. Default: 1. For more info see explanation above.
  • beta (float) – beta correction factor used for the AexAem term of the burst size. Default 1. If add_naa = False or measurement is not ALEX this argument is ignored. For more info see explanation above.
Returns
Array of burst sizes for channel ich.

See also fretbursts.burstlib.Data.get_naa_corrected().
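
The burst-size conventions described above can be summarized in a few lines. This is a sketch for scalar counts of a single burst that only restates the formulas in this section; the function name corrected_burst_size is hypothetical and the code is not the library implementation.

```python
def corrected_burst_size(nd, na, naa=0.0, gamma=1.0, beta=1.0,
                         add_naa=False, donor_ref=True):
    """Gamma-corrected burst size following definitions (1) and (2)."""
    if donor_ref:
        size = nd + na / gamma          # definition (1)
        naa_term = naa / (beta * gamma)
    else:
        size = nd * gamma + na          # definition (2)
        naa_term = naa / beta
    return size + naa_term if add_naa else size


# Example burst with nd=20, na=10, naa=30, gamma=0.5, beta=2.
print(corrected_burst_size(20, 10, 30, gamma=0.5, beta=2.0))
print(corrected_burst_size(20, 10, 30, gamma=0.5, beta=2.0, add_naa=True))
print(corrected_burst_size(20, 10, 30, gamma=0.5, beta=2.0, add_naa=True,
                           donor_ref=False))
```

With gamma = 1 and beta = 1 both conventions reduce to the raw sum of counts.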

get_naa_corrected(ich=0, gamma=1.0, beta=1.0, donor_ref=True)

Return corrected naa array for channel ich.

Parameters:
  • ich (int) – the spot number, only relevant for multi-spot.
  • gamma (floats) – gamma-factor to use in computing the corrected naa.
  • beta (float) – beta-factor to use in computing the corrected naa.
  • donor_ref (bool) – Select the convention for naa correction. If True (default), uses naa / (beta * gamma). Otherwise, uses naa / beta. A consistent convention should be used for the corrected Dex burst size in order to make it commensurable with naa.

See also fretbursts.burstlib.Data.burst_sizes_ich().

burst_widths

List of arrays of burst duration in seconds. One array per channel.

ph_in_bursts_ich(ich=0, ph_sel=Ph_sel(Dex='DAem', Aex='DAem'))

Return timestamps of photons inside bursts for channel ich.

Returns
Array of photon timestamps in channel ich and photon selection ph_sel that are inside any burst.

ph_in_bursts_mask_ich(ich=0, ph_sel=Ph_sel(Dex='DAem', Aex='DAem'))

Return mask of all photons inside bursts for channel ich.

Returns
Boolean array for photons in channel ich and photon selection ph_sel that are inside any burst.

status(add='', noname=False)

Return a string with burst search, corrections and selection info.

name

Measurement name: last subfolder + file name with no extension.

Name(add='')

Return short filename + status information.

Analysis methods

The following methods perform background estimation, burst search and burst-data calculations. Their documentation follows:

class fretbursts.burstlib.Data
calc_bg(fun, time_s=60, tail_min_us=500, F_bg=2, error_metrics=None, fit_allph=True)

Compute time-dependent background rates for all the channels.

Compute background rates for donor, acceptor and both detectors. The rates are computed every time_s seconds, which makes it possible to track variations during the measurement.

Parameters:
  • fun (function) – function for background estimation (example bg.exp_fit)
  • time_s (float, seconds) – compute background each time_s seconds
  • tail_min_us (float, tuple or string) – min threshold in us for photon waiting times to use in background estimation. If a float, the same threshold is used for ‘all’, DD, AD and AA photons and for all the channels. If a 3 or 4 element tuple, each value is used for ‘all’, DD, AD or AA photons, with the same value for all the channels. If ‘auto’, the threshold is computed for each stream (‘all’, DD, DA, AA) and for each channel as F_bg * rate_ml0, where rate_ml0 is an initial estimation of the rate performed using bg.exp_fit() and a fixed threshold (default 250us).
  • F_bg (float) – when tail_min_us is ‘auto’, the factor by which the initial background estimation is multiplied to compute the threshold.
  • error_metrics (string) – Specifies the error metric to use. See fretbursts.background.exp_fit() for more details.
  • fit_allph (bool) – if True (default) the background for the all-photon is fitted. If False it is computed as the sum of backgrounds in all the other streams.

The background estimation functions are defined in the module background (conventionally imported as bg).
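
For intuition about tail fitting of inter-photon delays, the sketch below shows the generic maximum-likelihood estimate of an exponential rate using only the delays above a threshold. It is a textbook formula written in plain Python under the assumption that the tail is exponential; it is not the code of bg.exp_fit, and the function name exp_tail_rate is hypothetical.

```python
def exp_tail_rate(delays, tail_min):
    """MLE of the rate of an exponential tail.

    Only delays longer than `tail_min` are used. For an exponential
    distribution truncated at `tail_min`, the MLE of the rate is
    1 / (mean(tail) - tail_min).
    """
    tail = [d for d in delays if d > tail_min]
    if not tail:
        raise ValueError("no delays above the threshold")
    mean_excess = sum(tail) / len(tail) - tail_min
    return 1.0 / mean_excess


# Delays and threshold in the same time unit; the rate is in 1/unit.
rate = exp_tail_rate([1, 2, 3, 10, 12, 14], tail_min=5)
print(rate)
```

Short delays are dominated by photons emitted during bursts, which is why only the tail of the distribution is used.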

Example

Compute background with bg.exp_fit (inter-photon delays MLE tail fitting), every 20 s, with automatic tail-threshold:

d.calc_bg(bg.exp_fit, time_s=20, tail_min_us='auto')
Returns: None, all the results are saved in the object itself.

burst_search(...)

Performs a burst search with the specified parameters.

This method performs a sliding-window burst search without binning the timestamps. The burst starts when the rate of m photons is above a minimum rate, and stops when the rate falls below the threshold. The result of the burst search is stored in the mburst attribute (a list of Bursts objects, one per channel) containing start/stop times and indexes. By default, after burst search, this method computes donor and acceptor counts, it applies burst corrections (background, leakage, etc...) and computes E (and S in case of ALEX). You can skip these steps by passing computefret=False.

The minimum rate can be explicitly specified with the min_rate_cps argument, or computed as a function of the background rate with the F argument.
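
The sliding-window idea can be illustrated with a short plain-Python sketch. It assumes the rate criterion is expressed as a maximum duration T for m consecutive photons (as for the T values in the TT attribute), works on a single channel, and returns inclusive (istart, istop) index pairs. The function name and these conventions are assumptions of the example, not the FRETBursts implementation.

```python
def burst_search_sketch(times, m, T, L=None):
    """Find bursts where m consecutive photons span at most T.

    Returns a list of (istart, istop) index pairs (istop inclusive),
    keeping only bursts with at least L photons (L = m if None).
    """
    if L is None:
        L = m
    bursts = []
    in_burst = False
    istart = 0
    for i in range(len(times) - m + 1):
        above = (times[i + m - 1] - times[i]) <= T
        if above and not in_burst:
            in_burst = True
            istart = i
        elif not above and in_burst:
            in_burst = False
            istop = i + m - 2  # last photon of the previous window
            if istop - istart + 1 >= L:
                bursts.append((istart, istop))
    if in_burst:  # burst still open at the end of the data
        istop = len(times) - 1
        if istop - istart + 1 >= L:
            bursts.append((istart, istop))
    return bursts


times = [0, 100, 200, 201, 202, 203, 204, 300, 400, 401, 402, 500]
print(burst_search_sketch(times, m=3, T=5))       # [(2, 6), (8, 10)]
print(burst_search_sketch(times, m=3, T=5, L=4))  # [(2, 6)]
```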

Parameters:
  • m (int) – number of consecutive photons used to compute the photon rate. Typical values 5-20. Default 10.
  • L (int or None) – minimum number of photons in burst. If None (default) L = m is used.
  • F (float) – defines how many times higher than the background rate is the minimum rate used for burst search (min rate = F * bg_rate), assuming that P = None (default). Typical values are 3-9. Default 6.
  • P (float) – threshold for burst detection expressed as a probability that a detected burst is not due to a Poisson background. If not None, P overrides F. Note that the background process is experimentally super-Poisson so this probability is not physically very meaningful. Using this argument is discouraged.
  • min_rate_cps (float or list/array) – minimum rate in cps for burst start. If not None, it takes precedence over P and F. If non-scalar, it contains one rate per multispot channel. Typical values range from 20e3 to 100e3.
  • ph_sel (Ph_sel object) – defines the “photon selection” (or stream) to be used for burst search. Default: all photons. See fretbursts.ph_sel for details.
  • compact (bool) – if True, a photon selection of only one excitation period is required and the timestamps are “compacted” by removing the “gaps” between each excitation period.
  • index_allph (bool) – if True (default), the indexes of burst start and stop (istart, istop) are relative to the full timestamp array. If False, the indexes are relative to timestamps selected by the ph_sel argument.
  • c (float) – correction factor used in the rate vs time-lags relation. c affects the computation of the burst-search parameter T. When F is not None, T = (m - 1 - c) / (F * bg_rate). When using min_rate_cps, T = (m - 1 - c) / min_rate_cps.
  • computefret (bool) – if True (default) compute donor and acceptor counts, apply corrections (background, leakage, direct excitation) and compute E (and S). If False, skip all these steps and stop just after the initial burst search.
  • max_rate (bool) – if True compute the max photon rate inside each burst using the same m used for burst search. If False (default) skip this step.
  • dither (bool) – if True applies dithering corrections to burst counts. Default False. See Data.dither().
  • pure_python (bool) – if True, uses the pure python functions even when optimized Cython functions are available.
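
The relation between the burst-search parameter T and the arguments m, c, F and min_rate_cps given above can be checked numerically. The helper below is a sketch with a hypothetical name (burst_search_T), and the value c=-1 and the background rate are example inputs only.

```python
def burst_search_T(m, c, F=None, bg_rate=None, min_rate_cps=None):
    """Return T (seconds) from the relations given for the `c` argument.

    If min_rate_cps is given it takes precedence; otherwise the minimum
    rate is F * bg_rate (rates in counts per second).
    """
    if min_rate_cps is not None:
        return (m - 1 - c) / min_rate_cps
    return (m - 1 - c) / (F * bg_rate)


# Example: m=10, c=-1 (example value), background rate 2000 cps, F=6.
print(burst_search_T(m=10, c=-1, F=6, bg_rate=2000.0))
# Same m and c with an explicit minimum rate of 50 kcps.
print(burst_search_T(m=10, c=-1, min_rate_cps=50e3))
```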

Note

when using P or F the background rates are needed, so .calc_bg() must be called before the burst search.

Example

d.burst_search(m=10, F=6)

Returns: None, all the results are saved in the Data object.

calc_fret(count_ph=False, corrections=True, dither=False, mute=False, pure_python=False)

Compute FRET (and stoichiometry if ALEX) for each burst.

This is a high-level function that can be run after burst search. It can count donor and acceptor photons (count_ph=True), apply corrections (background, leakage), and compute gamma-corrected FRET efficiencies (and stoichiometry if ALEX).

Parameters:
  • count_ph (bool) – if True, calls calc_ph_num() to count donor and acceptor photons in each burst. Default False.
  • corrections (bool) – if True (default), applies background and bleed-through correction to burst data
  • dither (bool) – whether to apply dithering to burst size. Default False.
  • mute (bool) – whether to mute all the printed output. Default False.
  • pure_python (bool) – if True, uses the pure python functions even when the optimized Cython functions are available.
Returns:

None, all the results are saved in the object.

calc_ph_num(alex_all=False, pure_python=False)

Computes number of D, A (and AA) photons in each burst.

Parameters:
  • alex_all (bool) – if True and self.ALEX is True, computes also the donor channel photons during acceptor excitation (nda)
  • pure_python (bool) – if True, uses the pure python functions even when the optimized Cython functions are available.
Returns:

Saves nd, na, nt (and, when computed, naa and nda) in self. Returns None.

fuse_bursts(ms=0, process=True, mute=False)

Return a new Data object with nearby bursts fused together.

Parameters:
  • ms (float) – fuse all bursts separated by less than ms milliseconds. If < 0 no burst is fused. Note that with ms = 0, overlapping bursts are fused.
  • process (bool) – if True (default), reprocess the burst data in the new object applying corrections and computing FRET.
  • mute (bool) – if True, suppress any printed output.

calc_sbr(ph_sel=Ph_sel(Dex='DAem', Aex='DAem'), gamma=1.0)

Return Signal-to-Background Ratio (SBR) for each burst.

Parameters:
  • ph_sel (Ph_sel object) – object defining the photon selection for which to compute the sbr. Changes the photons used for burst size and the corresponding background rate. Valid values here are Ph_sel('all'), Ph_sel(Dex='Dem'), Ph_sel(Dex='Aem'). See fretbursts.ph_sel for details.
  • gamma (float) – gamma value used to compute corrected burst size in the case ph_sel is Ph_sel('all'). Ignored otherwise.
Returns:

A list of arrays (one per channel) with one value per burst. The list is also saved in sbr attribute.

calc_max_rate(m, ph_sel=Ph_sel(Dex='DAem', Aex='DAem'), compact=False, c=1)

Compute the max m-photon rate reached in each burst.

Parameters:
  • m (int) – number of timestamps to use to compute the rate. As for burst search, typical values are 5-20.
  • ph_sel (Ph_sel object) – object defining the photon selection. See fretbursts.ph_sel for details.
  • c (float) – this parameter is used in the definition of the rate estimator which is (m - 1 - c) / (t[last] - t[first]). For more details see phtools.phrates.mtuple_rates().
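
As an illustration of the m-photon rate estimator, the sketch below slides a window of m timestamps over the photons of one burst and keeps the largest rate. It is plain Python with a hypothetical function name, and it assumes timestamps already expressed in seconds.

```python
def max_rate_in_burst(times, m, c=1):
    """Max of (m - 1 - c) / (t[last] - t[first]) over all m-photon windows.

    `times` are the sorted timestamps (in seconds) of the photons in one
    burst; the returned rate is in counts per second.
    """
    if len(times) < m:
        raise ValueError("need at least m timestamps")
    rates = [(m - 1 - c) / (times[i + m - 1] - times[i])
             for i in range(len(times) - m + 1)]
    return max(rates)


# Five photons; the first four are closely spaced.
print(max_rate_in_burst([0.0, 0.25, 0.5, 0.75, 2.5], m=3, c=1))  # 2.0
```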

Burst corrections

Correction factors

The following are the various burst correction factors. They are Data properties, so setting their value automatically updates all the burst quantities (including E and S).

class fretbursts.burstlib.Data
gamma

Gamma correction factor (compensates DexDem and DexAem unbalance).

leakage

Spectral leakage (bleed-through) of D emission in the A channel.

dir_ex

Direct excitation correction factor.

chi_ch

Per-channel relative gamma factor.

Correction methods

List of Data methods used to apply burst corrections.

class fretbursts.burstlib.Data
background_correction(relax_nt=False, mute=False)

Apply background correction to burst sizes (nd, na,...)

leakage_correction(mute=False)

Apply leakage correction to burst sizes (nd, na,...)

dither(lsb=2, mute=False)

Add dithering (uniform random noise) to burst counts (nd, na,...).

The dithering amplitude is the range -0.5*lsb .. 0.5*lsb.
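
A minimal sketch of this kind of dithering, using the standard-library random module on a plain list of counts. The function name dither_counts and the optional rng argument are choices made for this example, not FRETBursts API.

```python
import random


def dither_counts(counts, lsb=2, rng=None):
    """Add uniform noise in the range -0.5*lsb .. 0.5*lsb to each count."""
    rng = rng or random.Random()
    return [n + rng.uniform(-0.5 * lsb, 0.5 * lsb) for n in counts]


# A seeded generator makes the example reproducible.
dithered = dither_counts([10, 20, 30], lsb=2, rng=random.Random(0))
print(dithered)
```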

Burst selection methods

Data methods for filtering bursts according to different rules. See also Burst selection.

class fretbursts.burstlib.Data
select_bursts(filter_fun, negate=False, computefret=True, args=None, **kwargs)

Return an object with bursts filtered according to filter_fun.

This is the main method to select bursts according to different criteria. The selection rule is defined by the selection function filter_fun. FRETBursts provides several predefined selection functions (see Burst selection). New selection functions can be defined and passed to this method to implement arbitrary selection rules.

Parameters:
  • filter_fun (function) – function used for burst selection
  • negate (boolean) – If True, negates (i.e. takes the complement of) the selection returned by filter_fun. Default False.
  • computefret (boolean) – If True (default) recompute donor and acceptor counts, corrections and FRET quantities (i.e. E, S) in the new returned object.
  • args (tuple or None) – positional arguments for filter_fun()
kwargs:
Additional keyword arguments passed to filter_fun().
Returns: A new Data object containing only the selected bursts.

Note

In order to save RAM, the timestamp arrays (ph_times_m) of the new Data() point to the same arrays as the original Data(). Conversely, all the burst data (mburst, nd, na, etc...) are new distinct objects.

select_bursts_mask(filter_fun, negate=False, return_str=False, args=None, **kwargs)

Returns mask arrays to select bursts according to filter_fun.

The function filter_fun is called to compute the mask arrays for each channel.

This method is useful when you want to apply a selection from one object to a second object. Otherwise use Data.select_bursts().

Parameters:
  • filter_fun (function) – function used for burst selection
  • negate (boolean) – If True, negates (i.e. takes the complement of) the selection returned by filter_fun. Default False.
  • return_str – if True return, for each channel, a tuple with a bool array and a string that can be added to the measurement name to indicate the selection. If False returns only the bool array. Default False.
  • args (tuple or None) – positional arguments for filter_fun()
kwargs:
Additional keyword arguments passed to filter_fun().
Returns: A list of boolean arrays (one per channel) that define the burst selection. If return_str is True returns a list of tuples, where each tuple is a bool array and a string.

select_bursts_mask_apply(masks, computefret=True, str_sel='')

Returns a new Data object with bursts selected according to masks.

This method selects bursts using a list of boolean arrays as input. Since the user needs to create the boolean arrays first, this method is useful when experimenting with new selection criteria that don’t have a dedicated selection function. Usually, however, it is easier to select bursts through Data.select_bursts() (using a selection function).

Parameters:
  • masks (list of arrays) – each element in this list is a boolean array that selects bursts in a channel.
  • computefret (boolean) – If True (default) recompute donor and acceptor counts, corrections and FRET quantities (i.e. E, S) in the new returned object.
Returns:

A new Data object containing only the selected bursts.

Note

In order to save RAM, the timestamp arrays (ph_times_m) of the new Data() point to the same arrays as the original Data(). Conversely, all the burst data (mburst, nd, na, etc...) are new distinct objects.

See also

Data.select_bursts(), Data.select_bursts_mask()
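
The mask-based selection can be pictured with plain Python lists standing in for per-channel burst arrays. The helper apply_masks and the example E values are hypothetical; the sketch only illustrates the "one boolean mask per channel" convention and the effect of negating a selection.

```python
def apply_masks(burst_values, masks):
    """Keep, in each channel, only the bursts whose mask entry is True."""
    return [[v for v, keep in zip(vals, mask) if keep]
            for vals, mask in zip(burst_values, masks)]


# Hypothetical per-burst E values for a two-channel measurement.
E = [[0.1, 0.5, 0.9], [0.4, 0.8]]

# One boolean mask per channel, here selecting bursts with E > 0.45.
masks = [[e > 0.45 for e in e_ch] for e_ch in E]

selected = apply_masks(E, masks)
negated = apply_masks(E, [[not keep for keep in mask] for mask in masks])
print(selected)  # [[0.5, 0.9], [0.8]]
print(negated)   # [[0.1], [0.4]]
```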

Fitting methods

Some fitting methods for burst data. Note that E and S histogram fitting with generic models is now handled with the new fitting framework.

class fretbursts.burstlib.Data
fit_E_generic(E1=-1, E2=2, fit_fun=<function two_gaussian_fit_hist>, weights=None, gamma=1.0, **fit_kwargs)

Fit E in each channel with fit_fun using bursts in the [E1,E2] range. All the fitting functions are defined in fretbursts.fit.gaussian_fitting.

Parameters:
  • weights (string or None) – specifies the type of weights. If not None, weights will be passed to fret_fit.get_weights(). weights can be not-None only when using fit functions that accept weights (the ones ending in _hist or _EM)
  • gamma (float) – passed to fret_fit.get_weights() to compute weights

All the additional arguments are passed to fit_fun. For example p0 or mu_fix can be passed (see fit.gaussian_fitting for details).

Note

Use this method for CDF/PDF or hist fitting. For EM fitting use fit_E_two_gauss_EM().

fit_E_m(E1=-1, E2=2, weights='size', gamma=1.0)

Fit E in each channel with the mean using bursts in [E1,E2] range.

Note

These two fits are equivalent (but the first is much faster):

fit_E_m(weights='size')
fit_E_minimize(kind='E_size', weights='sqrt')

However fit_E_minimize() does not provide a model curve.

fit_E_ML_poiss(E1=-1, E2=2, method=1, **kwargs)

ML fit for E modeling size ~ Poisson, using bursts in [E1,E2] range.

fit_E_minimize(kind='slope', E1=-1, E2=2, **kwargs)

Fit E using method kind (‘slope’ or ‘E_size’) and bursts in the [E1,E2] range. If kind is ‘slope’ the fit function is fret_fit.fit_E_slope(). If kind is ‘E_size’ the fit function is fret_fit.fit_E_E_size(). Additional arguments in kwargs are passed to the fit function.

fit_E_two_gauss_EM(fit_func=<function two_gaussian_fit_EM>, weights='size', gamma=1.0, **kwargs)

Fit the E population to a Gaussian mixture model using EM method. Additional arguments in kwargs are passed to the fit_func().

Data access methods

The following methods are used to access (or iterate over) the arrays of timestamps (for different photon streams), timestamps masks and burst data. Their documentation follows:

class fretbursts.burstlib.Data
get_ph_times(ich=0, ph_sel=Ph_sel(Dex='DAem', Aex='DAem'), compact=False)

Returns the timestamps array for channel ich.

This method always returns in-memory arrays, even when ph_times_m is a disk-backed list of arrays.

Parameters:
  • ph_sel (Ph_sel object) – object defining the photon selection. See fretbursts.ph_sel for details.
  • compact (bool) – if True, a photon selection of only one excitation period is required and the timestamps are “compacted” by removing the “gaps” between each excitation period.

iter_ph_times(ph_sel=Ph_sel(Dex='DAem', Aex='DAem'), compact=False)

Iterator that returns the arrays of timestamps in .ph_times_m.

Parameters: Same arguments as get_ph_mask() except for ich.

get_ph_mask(ich=0, ph_sel=Ph_sel(Dex='DAem', Aex='DAem'))

Returns a mask for ph_sel photons in channel ich.

The masks are either boolean arrays or slices (full or empty). In both cases they can be used to index the timestamps of the corresponding channel.

Parameters: ph_sel (Ph_sel object) – object defining the photon selection. See fretbursts.ph_sel for details.

iter_ph_masks(ph_sel=Ph_sel(Dex='DAem', Aex='DAem'))

Iterator returning masks for ph_sel photons.

Parameters: ph_sel (Ph_sel object) – object defining the photon selection. See fretbursts.ph_sel for details.

iter_bursts_ph(ich=0)

Iterate over (start, stop) indexes to slice photons for each burst.

expand(ich=0, alex_naa=False, width=False)

Return per-burst D and A sizes (nd, na) and their background counts.

This method returns, for each burst, the corrected signal counts and background counts in donor and acceptor channels. Optionally, the burst width is also returned.

Parameters:
  • ich (int) – channel for the bursts (can be not 0 only in multi-spot)
  • alex_naa (bool) – if True and self.ALEX, returns burst sizes and background also for acceptor photons during acceptor excitation
  • width (bool) – whether to return the burst duration (in seconds).
Returns:

List of arrays – nd, na, donor bg, acceptor bg. If alex_naa is True returns: nd, na, naa, bg_d, bg_a, bg_aa. If width is True returns the bursts duration (in sec.) as last element.

copy(mute=False)

Copy data in a new object. All arrays are copied except for ph_times_m.

slice_ph(time_s1=0, time_s2=None, s='slice')

Return a new Data object with ph in [time_s1, time_s2] (seconds)

If ALEX, this method must be called right after fretbursts.loader.alex_apply_periods() (with delete_ph_t=True) and before any background estimation or burst search.