Data Manipulation

When data sets are read into SeqExpress, the data is processed using a cascade of filters, constraints and transformations, applied in that order. Each stage is described below.

Filters

This is the first stage of the filter, constrain and transform cascade that a data set undergoes within SeqExpress; for more information see the Data menu.
Filtering removes genes from the data set and so decreases the number of genes to be examined. Any number of filters can be applied to a data set. Adding, modifying or removing a filter affects the whole data set, so any existing subsets of genes and visualizations are invalidated (genes that were represented in them may have been filtered out).

Filter By Value

To apply a new filter, select the filter option from the data menu - this provides a means to add new filters, to modify existing ones, and to remove filters from the list.

Filters are defined using matching rules of the type constant operator value (see Figure 2). The filters can produce a wide range of effects. It is possible to:
Filter all genes so that their values in all the experiments meet certain conditions (e.g. all values less than or equal to 100). Typically used to filter genes that show low or no expression (SAGE tags with zero count).
Filter genes so that their values in certain experiments meet certain conditions (e.g. their value in experiment 1 is greater than 1). Typically used to filter out genes from a control experiment.
Filter genes so that their range of values meets certain conditions (e.g. the max-min intensity for individual genes is greater than or equal to 10). Typically used to filter out genes that do not show differential expression in the experiments.
Filter genes so that their degree of change meets certain conditions (e.g. the max/min intensity for individual genes is greater than or equal to 100). Typically used to keep only genes that show significant change in their expression levels.
Filter genes so that each of their values in an individual experiment is greater than the mean for that experiment (those that are always highly expressed). Alternatively, the mean plus or minus multiples of the standard deviation can be used; for example, it is possible to find all genes that have values less than or equal to the mean minus one standard deviation, representing those that are relatively lowly expressed in all the experiments. The bands are: mean, mean plus one standard deviation, mean plus two standard deviations, mean minus one standard deviation and mean minus two standard deviations.
The mean and standard deviation filters can also be applied to individual experiments (get all genes that have values greater than the mean value in experiment one).
Filtering using gene means and standard deviations is also available, so it is possible to select genes that have values greater than the mean value of all genes.
It is possible to combine some of these selection criteria within a filter (for example such that the range of values is greater than the mean value). If a filter is defined that produces no results a warning is given. The filters are applied sequentially to the data set.
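
The rule types listed above can be illustrated with a short Python sketch. This is not SeqExpress code: the data layout (a dictionary mapping gene names to one value per experiment), the function names, the use of the population standard deviation, and the treatment of a zero minimum as an infinite max/min ratio are all assumptions made for illustration.

```python
import operator
from statistics import mean, pstdev

# Hypothetical layout: gene name -> list of values, one per experiment.
data = {
    "geneA": [0, 5, 12],
    "geneB": [20, 25, 22],
    "geneC": [1, 150, 3],
}

OPS = {"<": operator.lt, "<=": operator.le, ">": operator.gt, ">=": operator.ge}

def all_values(op, threshold):
    """Rule: the value in every experiment satisfies `op threshold`."""
    return lambda values: all(OPS[op](v, threshold) for v in values)

def one_experiment(index, op, threshold):
    """Rule: the value in one experiment (0-based index) satisfies the condition."""
    return lambda values: OPS[op](values[index], threshold)

def value_range(op, threshold):
    """Rule: max - min of the gene's values satisfies the condition."""
    return lambda values: OPS[op](max(values) - min(values), threshold)

def fold_change(op, threshold):
    """Rule: max / min of the gene's values satisfies the condition."""
    def rule(values):
        lo, hi = min(values), max(values)
        # Assumption: a zero minimum is treated as an infinite ratio.
        ratio = float("inf") if lo == 0 else hi / lo
        return OPS[op](ratio, threshold)
    return rule

def above_experiment_band(data, k=0.0):
    """Keep genes whose value in each experiment exceeds mean + k * sd
    of that experiment (k = 0, 1, 2, -1 or -2 gives the five bands)."""
    columns = list(zip(*data.values()))
    cutoffs = [mean(c) + k * pstdev(c) for c in columns]
    return {g: v for g, v in data.items()
            if all(x > c for x, c in zip(v, cutoffs))}

def apply_filters(data, filters):
    """Apply the rules one after another, warning if nothing is left."""
    for keep in filters:
        data = {g: v for g, v in data.items() if keep(v)}
        if not data:
            print("warning: filter produced no results")
            break
    return data

# Keep genes expressed in every experiment, then those with a range of at least 10.
result = apply_filters(data, [all_values(">", 0), value_range(">=", 10)])
print(sorted(result))
```

Because the rules are applied sequentially, each rule only sees the genes that survived the previous one, which mirrors the behaviour described above.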

The filter is defined using matching rules. In the case shown above, the filter is defined so that only genes with intensities greater than zero in all the experiments are used.

Filter By Selection

Unwanted genes can also be filtered out by selecting one of the two filter by selection options. These options provide a means either to remove all the genes that have been selected (or are within a cluster) or to remove all genes that are not within a specific selection or cluster. To undo these filters, a reset filter menu option is provided.
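
The two filter by selection options amount to removing a set of genes or keeping only that set. A minimal Python sketch follows, assuming the same hypothetical layout of gene name to list of values; the function names are illustrative and are not SeqExpress identifiers.

```python
def remove_selected(data, selection):
    """Drop every gene that is in the selection (or cluster)."""
    return {g: v for g, v in data.items() if g not in selection}

def keep_selected(data, selection):
    """Drop every gene that is NOT in the selection (or cluster)."""
    return {g: v for g, v in data.items() if g in selection}

# Hypothetical data and selection.
data = {"geneA": [0, 5], "geneB": [20, 25], "geneC": [1, 150]}
selection = {"geneA", "geneC"}

without_selection = remove_selected(data, selection)
only_selection = keep_selected(data, selection)
print(sorted(without_selection), sorted(only_selection))
```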

Constraints

This is the second stage of the filter, constrain and transform cascade that a data set undergoes within SeqExpress; for more information see the Data menu. Constraints provide a mechanism to ensure that all the results from the collection of experiments reside within a certain range.
It is possible to add, modify or remove a constraint by selecting the constraint option in the data menu. For either all the experiments or for individual ones it is possible to enter either floor or ceiling values.

The above shows the dialog that is used to define a constraint that sets a floor value on the specified experiment, so that all values that are less than 10 are automatically set to 10.
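
A floor or ceiling constraint can be sketched as a simple clamp. The Python below is an illustration only, not SeqExpress code: the function name, its parameters, and the 0-based experiment indices are assumptions.

```python
def constrain(data, floor=None, ceiling=None, experiments=None):
    """Clamp values to [floor, ceiling].

    experiments: 0-based indices of the experiments to constrain,
    or None to constrain all of them (hypothetical convention).
    """
    out = {}
    for gene, values in data.items():
        clamped = []
        for i, v in enumerate(values):
            if experiments is None or i in experiments:
                if floor is not None and v < floor:
                    v = floor
                if ceiling is not None and v > ceiling:
                    v = ceiling
            clamped.append(v)
        out[gene] = clamped
    return out

# Floor of 10 on the first experiment only: 3 becomes 10, 50 is untouched.
floored = constrain({"geneA": [3, 50]}, floor=10, experiments=[0])
print(floored)
```

Unlike a filter, a constraint never removes a gene; it only changes the values that fall outside the permitted range.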

Transformations

The data can be transformed in a number of ways by using the transform option in the data menu. The transformations discussed in this section are normalization, proportional adjustment, ranking and log.

The transformations can be specified using the dialog from the data/transform menu. The transformations are applied in the following order: normalization, proportional adjustment, log.

Proportional adjustment and ranking of the data are the more commonly used techniques, as they allow for the comparison of data values that are independent of different experimental conditions.

Ranking is a useful technique, although it will adjust the values so that the differences between the more extreme values are minimised (it effectively exaggerates the mediocre). Local ranking is useful for comparing different gene chip experiments, whilst global ranking is more applicable to SAGE experiments.
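
The effect of ranking can be seen in a small Python sketch. The text does not define local and global ranking precisely, so the sketch assumes that local ranking ranks values within each experiment and global ranking ranks all values in the data set together; tied values share a rank. The function names and data layout are hypothetical.

```python
def rank_values(values):
    """Dense ranking: the smallest value gets rank 1, ties share a rank."""
    order = {v: r for r, v in enumerate(sorted(set(values)), start=1)}
    return [order[v] for v in values]

def local_rank(data):
    """Assumed meaning of local ranking: rank within each experiment."""
    genes = list(data)
    columns = zip(*(data[g] for g in genes))
    ranked_cols = [rank_values(list(c)) for c in columns]
    return {g: [col[i] for col in ranked_cols] for i, g in enumerate(genes)}

def global_rank(data):
    """Assumed meaning of global ranking: rank across the whole data set."""
    flat = [v for values in data.values() for v in values]
    order = {v: r for r, v in enumerate(sorted(set(flat)), start=1)}
    return {g: [order[v] for v in values] for g, values in data.items()}

# Hypothetical data: geneA has one extreme value (1000).
data = {"geneA": [1, 1000], "geneB": [2, 10], "geneC": [3, 20]}
local = local_rank(data)
overall = global_rank(data)
print(local)
print(overall)
```

In the second experiment the raw gap between 1000 and 20 is far larger than the gap between 20 and 10, yet after ranking both gaps are a single rank, which is the flattening of extreme values described above.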

Proportional adjustment will alter the data so that unusual profiles are exaggerated (genes that are highly expressed in only a few experiments). If only one dimension is used, the choice between per gene and per experiment adjustment results in major differences (a per experiment proportional adjustment adjusts the values so that the sum of all the expression profiles in each experiment is the same).
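
The difference between the two dimensions can be sketched in Python. This is an illustration under assumptions, not the SeqExpress implementation: values are scaled so that each experiment (or each gene) sums to a chosen total, a zero sum is mapped to zeros, and the function names are hypothetical.

```python
def per_experiment(data, total=1.0):
    """Scale values so that every experiment (column) sums to `total`."""
    genes = list(data)
    sums = [sum(col) for col in zip(*(data[g] for g in genes))]
    return {g: [total * v / s if s else 0.0 for v, s in zip(data[g], sums)]
            for g in genes}

def per_gene(data, total=1.0):
    """Scale values so that every gene profile (row) sums to `total`."""
    out = {}
    for gene, values in data.items():
        s = sum(values)
        out[gene] = [total * v / s if s else 0.0 for v in values]
    return out

# Hypothetical data: the second experiment has ten times the overall signal.
data = {"geneA": [1, 30], "geneB": [3, 10]}
by_experiment = per_experiment(data)
by_gene = per_gene(data)
print(by_experiment)
print(by_gene)
```

With the per experiment adjustment, geneA moves from a quarter of the first experiment to three quarters of the second; with the per gene adjustment, almost all of its profile falls in the second experiment. The same raw data therefore gives quite different pictures depending on the dimension chosen.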