Data set groups¶
Data availability for this tutorial: the medium-sized data set of 614 genes and 48 taxa that will be used can be downloaded here.
What are active data sets¶
Most operations in TriFusion can be applied either to the total data set (all files and taxa currently loaded) or to custom-made data sets, called active data sets. When an active data set is specified, operations are applied only to the active files and/or taxa, and all others are ignored. Active data sets can be defined in TriFusion in several ways, and they make it easy to apply different operations to different sets of files/taxa.
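The difference between the total and an active data set can be pictured as a simple filter over files and taxa. The sketch below is only a conceptual illustration in plain Python, not TriFusion's own code: the function `select_active`, the file names and the short sequences are all hypothetical.

```python
def select_active(total, active_files=None, active_taxa=None):
    """Return the part of the data an operation would see.

    `total` maps file names to {taxon: sequence} dictionaries.
    Passing None for a filter means "use everything" (the total data set).
    """
    selected = {}
    for file_name, sequences in total.items():
        # Skip files that are not part of the active file set
        if active_files is not None and file_name not in active_files:
            continue
        # Keep only the taxa that are part of the active taxa set
        selected[file_name] = {
            taxon: seq
            for taxon, seq in sequences.items()
            if active_taxa is None or taxon in active_taxa
        }
    return selected


# Hypothetical total data set: two files, three taxa overall
total = {
    "gene1.fas": {"Agaricus_bisporus": "ATG", "Botrytis_cinerea": "ATA"},
    "gene2.fas": {"Agaricus_bisporus": "GGC", "Coniophora_puteana": "GGT"},
}

# Active data set: one file and one taxon; everything else is ignored
active = select_active(
    total,
    active_files={"gene1.fas"},
    active_taxa={"Agaricus_bisporus"},
)
```

Calling `select_active(total)` with no filters returns every file and taxon, which corresponds to operating on the total data set.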
How to define active data sets¶
Active data sets can be created/modified in two main ways:
Create data set groups¶
When the workflow requires applying operations to several different sets of
files/taxa, it is more convenient to define all data set groups first and
then use the dropdown menus (see How to apply data set groups below) to
select the desired active data set. Data set groups can be defined in
TriFusion by navigating to Menu > Dataset Groups.
File and taxa groups are sorted into two tabs, as in the side panel, and
clicking the Set new file/taxa group button will start the creation of the
group.
Here you can choose to create the data set group either manually in TriFusion, or by providing the names of the files/taxa in a text file.
Manual creation in TriFusion¶
This option is discouraged for larger data sets (>500 items). In these cases, it is recommended to use the Group creation from file method.
The creation of groups is the same for both files and taxa. In this tutorial,
we will create a taxa group by clicking the Taxa tab and then the
Set new taxa group button at the bottom of the side panel.
Here, groups can be created by selecting the desired taxa from the
All taxa column and using the arrow buttons to move them to the
Selected taxa column. Once the selection is complete, give the group a unique
name and it is ready to be created. If you wish to create multiple groups
in one sitting, click the Apply button to create the group but remain
in the dialog.
Any previously created group will be listed under the Created groups column. These can be selected to move their corresponding taxa to the Selected taxa column and continue a new group definition from there.
Group creation from file¶
Here, we only have to provide a text file with the names of the files/taxa we wish to select for the group. The text file is the same as the one described in the Import selection from file example.
    # Example of a text file for taxa selection in TriFusion
    Agaricus_bisporus
    Botrytis_cinerea
    Coniophora_puteana
After providing the file with the file/taxa names, specify a unique name for the new data set group, and that’s it!
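For large groups, the text file can be generated with a short script instead of being typed by hand. The following is a minimal sketch that assumes the file simply lists one file/taxon name per line, as the example above suggests; the helper `write_group_file` and the output name `taxa_group.txt` are hypothetical and not part of TriFusion.

```python
import tempfile
from pathlib import Path


def write_group_file(names, path):
    """Write one file/taxon name per line, skipping blanks and duplicates.

    Assumes a one-name-per-line layout for the group file.
    Returns the list of names that were actually written.
    """
    seen = set()
    lines = []
    for name in names:
        name = name.strip()
        if name and name not in seen:
            seen.add(name)
            lines.append(name)
    Path(path).write_text("\n".join(lines) + "\n", encoding="utf-8")
    return lines


# Taxa names taken from the example; the duplicate entry is dropped
taxa = [
    "Agaricus_bisporus",
    "Botrytis_cinerea",
    "Coniophora_puteana",
    "Botrytis_cinerea",
]

# Write to a temporary directory so the sketch does not touch existing files
group_file = Path(tempfile.mkdtemp()) / "taxa_group.txt"
written = write_group_file(taxa, group_file)
```

The resulting file can then be provided in the group creation dialog. Names must match the loaded files/taxa exactly, so it is worth checking spelling and underscores before importing.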
How to apply data set groups¶
Now that we know how to create data set groups, the final step is to specify which one an operation should use.
When using the Orthology module, only the active proteome files are used for the Orthology search operation.
Process and Statistics¶
For both the Process and Statistics modules, the active data set is selected by default (that is, the files/taxa whose buttons are active in the side panel). You can switch to the total data set or to any user-made data set group by clicking the group’s name in the corresponding dropdown menu.
Dropdown menu in the Process screen:
Dropdown menu in the Statistics screen: