Registering Samples


Interactive

For this, we will use the materials in the samples/interactive/ folder of the example data. It contains the same spreadsheet in two formats:

  • sample_spreadsheet.xlsx : this is in Excel format and has been annotated with colour-coding to highlight important features

  • sample_spreadsheet.tsv : tab-separated version of the above spreadsheet; this is the format accepted by Webin

In both cases, each row represents a sample and each column represents a metadata field.
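Before uploading, it can help to confirm that every row of the TSV has the same number of columns as the header, since a stray tab or a merged cell exported from Excel is an easy mistake to make. The function below is a minimal sketch of such a check. The name `check_tsv` is hypothetical, and it assumes that directive lines in the Webin spreadsheet (such as the checklist line) begin with `#` and that the first remaining line holds the field names.

```shell
# Hypothetical helper: sanity-check a Webin sample spreadsheet (TSV) before upload.
# Assumptions: lines starting with "#" are directives (e.g. checklist, units)
# and the first other non-empty line is the header row of field names.
check_tsv() {
  awk -F'\t' '
    /^#/    { next }                 # skip directive lines
    NF == 0 { next }                 # skip blank lines
    !cols   { cols = NF; next }      # first remaining line: header row
    NF != cols {                     # data row with the wrong number of fields
      printf "line %d: %d fields, expected %d\n", NR, NF, cols
      bad = 1
    }
    { rows++ }                       # count every data row (one per sample)
    END {
      printf "%d sample row(s), %d column(s)\n", rows + 0, cols + 0
      exit (bad ? 1 : 0)
    }
  ' "$1"
}

# Usage (from $WORKSHOP):
#   check_tsv samples/interactive/sample_spreadsheet.tsv
```

A non-zero exit status means at least one row was reported as malformed.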

To upload, visit the Webin Submissions Portal again and click the green ‘Register Samples’ button. Choose ‘Upload filled spreadsheet’ and upload the example sheet provided, samples/interactive/sample_spreadsheet.tsv. A popup should confirm the successful submission and list your 3 new sample accessions.

Programmatic

In this section, we will use the materials in the samples/programmatic/ folder of the example data, which contains several XML files. Navigate to the directory and list its contents:

cd "$WORKSHOP"
cd samples/programmatic
ls

  • samples.xml : this contains the same set of samples as those submitted interactively in the previous section.

  • submission.xml : this XML is the same as that used to submit your study. It defines the <ADD/> action to create new samples.

  • submission_modify.xml : this submission XML defines the <MODIFY/> action, which allows us to update existing samples.

Samples XML

The samples XML format allows us to define many samples inside a <SAMPLE_SET> tag. Each sample, enclosed in <SAMPLE> tags, contains:

  • <TITLE> tags : defining the title of the sample

  • <SAMPLE_NAME> tags : defining the taxonomic information (taxon ID and scientific name)

  • <DESCRIPTION> tags : providing a description of what’s been sampled and

  • many <SAMPLE_ATTRIBUTE> tags (grouped inside <SAMPLE_ATTRIBUTES>) : each pairs a <TAG> with a <VALUE> to define one of the other metadata fields
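To make the structure concrete, here is a minimal, hypothetical single-sample XML written to a scratch file with a shell heredoc, so the workshop's samples.xml is left untouched. The alias, title, description and attribute values are invented for illustration, and the `ENA-CHECKLIST` attribute with the `ERC000033` checklist accession is an assumption. A real virus sample needs more mandatory fields than shown here, so compare it with samples.xml from the example data for the exact set.

```shell
# Write an illustrative one-sample XML to a scratch file (example_sample.xml is
# a hypothetical name chosen so that samples.xml is not overwritten).
# All values below are placeholders, not real sample metadata.
cat > example_sample.xml <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<SAMPLE_SET>
  <SAMPLE alias="example_sample_1_programmatic">
    <TITLE>Example SARS-CoV-2 sample</TITLE>
    <SAMPLE_NAME>
      <TAXON_ID>2697049</TAXON_ID>
      <SCIENTIFIC_NAME>Severe acute respiratory syndrome coronavirus 2</SCIENTIFIC_NAME>
    </SAMPLE_NAME>
    <DESCRIPTION>Hypothetical swab sample used to illustrate the format</DESCRIPTION>
    <SAMPLE_ATTRIBUTES>
      <SAMPLE_ATTRIBUTE>
        <TAG>ENA-CHECKLIST</TAG>
        <VALUE>ERC000033</VALUE>
      </SAMPLE_ATTRIBUTE>
      <SAMPLE_ATTRIBUTE>
        <TAG>collection date</TAG>
        <VALUE>2020-04-01</VALUE>
      </SAMPLE_ATTRIBUTE>
      <SAMPLE_ATTRIBUTE>
        <TAG>geographic location (country and/or sea)</TAG>
        <VALUE>United Kingdom</VALUE>
      </SAMPLE_ATTRIBUTE>
    </SAMPLE_ATTRIBUTES>
  </SAMPLE>
</SAMPLE_SET>
EOF
```

Further samples would be added as additional <SAMPLE> elements inside the same <SAMPLE_SET>.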

Note

Sample aliases are defined as an attribute of the <SAMPLE> tag, e.g. <SAMPLE alias='this_alias'>. In the example data, each alias has been suffixed with the word ‘programmatic’ to avoid clashes with the same samples submitted interactively in the previous section.

Aliases must be unique within your submission account.
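Since aliases must be unique, it can be worth checking samples.xml for repeats before sending it. The hypothetical `find_duplicate_aliases` function below is a rough, grep-based sketch rather than a real XML parser: it assumes each `<SAMPLE ...>` opening tag sits on a single line, accepts either quote style, and only catches duplicates within the file, not clashes with samples already registered in your account.

```shell
# Hypothetical helper: print every alias that appears on more than one
# <SAMPLE> opening tag. Handles alias="..." and alias='...'.
# The trailing space in "<SAMPLE " excludes <SAMPLE_SET>, <SAMPLE_NAME>, etc.
find_duplicate_aliases() {
  grep -o "<SAMPLE [^>]*alias=[\"'][^\"']*[\"']" "$1" \
    | sed -E "s/.*alias=[\"']([^\"']*)[\"']$/\1/" \
    | sort | uniq -d
}

# Usage:
#   dups=$(find_duplicate_aliases samples.xml)
#   [ -z "$dups" ] && echo "aliases are unique" || printf 'duplicated:\n%s\n' "$dups"
```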

Submit the samples

As with study registration, send the samples XML and the submission XML (with the <ADD/> action) to the test service using cURL:

curl -u username:password -F "SUBMISSION=@submission.xml" -F "SAMPLE=@samples.xml" "https://wwwdev.ebi.ac.uk/ena/submit/drop-box/submit/"

Again, you should receive a receipt XML reporting whether the submission succeeded, along with the assigned accession numbers. This time, the receipt contains a <SAMPLE> tag for each submitted sample (3 in this case). Take note of each sample alias and accession, as we will use them later when submitting data files.
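Rather than copying the accessions by hand, we can save the receipt with cURL's `-o` option and extract each alias/accession pair. The sketch below assumes that each `<SAMPLE>` element in the receipt carries `alias` and `accession` attributes in double quotes, with its opening tag on a single line; the function name and the receipt.xml and sample_accessions.tsv file names are hypothetical.

```shell
# Hypothetical helper: print "alias<TAB>accession" for each <SAMPLE> in a receipt.
# Attribute order does not matter because each attribute is extracted separately.
list_sample_accessions() {
  grep -o '<SAMPLE [^>]*>' "$1" | while IFS= read -r tag; do
    sample_alias=$(printf '%s\n' "$tag" | sed -n 's/.* alias="\([^"]*\)".*/\1/p')
    sample_acc=$(printf '%s\n' "$tag" | sed -n 's/.* accession="\([^"]*\)".*/\1/p')
    printf '%s\t%s\n' "$sample_alias" "$sample_acc"
  done
}

# Usage: add -o to the submission command to keep the receipt, then list the pairs.
#   curl -u username:password -F "SUBMISSION=@submission.xml" -F "SAMPLE=@samples.xml" \
#     -o receipt.xml "https://wwwdev.ebi.ac.uk/ena/submit/drop-box/submit/"
#   list_sample_accessions receipt.xml > sample_accessions.tsv
```

Keeping the pairs in a file makes them easy to look up when we submit data files against these samples.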

For more general information on programmatic sample registration, please see our documentation.

Modifying a sample

Sometimes erroneous metadata is uploaded and a sample needs to be updated later. To do this, edit the samples XML file to update the relevant fields, then resubmit it with a submission XML containing the <MODIFY/> action in place of <ADD/>.
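The paragraph above describes the MODIFY submission as the ADD one with <MODIFY/> in place of <ADD/>, so one can be derived from the other. The example data already provides submission_modify.xml, so this is only a sketch of that relationship: `make_modify_submission` is a hypothetical helper, and it assumes the action is written as a self-closing `<ADD/>` tag (optionally with whitespace before the slash).

```shell
# Hypothetical helper: copy a submission XML, swapping the <ADD/> action for <MODIFY/>.
# Usage: make_modify_submission submission.xml my_submission_modify.xml
make_modify_submission() {
  sed -E 's#<ADD[[:space:]]*/>#<MODIFY/>#' "$1" > "$2"
  # Fail if the output has no <MODIFY/> action (e.g. the input had no <ADD/>).
  if ! grep -q '<MODIFY/>' "$2"; then
    echo "no <ADD/> action found in $1" >&2
    return 1
  fi
}
```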

  1. First, open the samples.xml file and update a metadata field of your choice, e.g. the collection date. Save the file.

  2. This time, submit using submission_modify.xml, which instructs the service to update existing samples. The service uses the alias to identify existing samples, so do not change the aliases.

curl -u username:password -F "SUBMISSION=@submission_modify.xml" -F "SAMPLE=@samples.xml" "https://wwwdev.ebi.ac.uk/ena/submit/drop-box/submit/"

Check the receipt to confirm that the update succeeded. Note that it will also report any samples that were not updated.
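To check the receipt from the command line, we can save it with `-o` and inspect its `success` attribute and messages. The sketch below assumes the receipt's root element carries `success="true"` or `success="false"` and that messages appear as single-line `<INFO>` and `<ERROR>` elements; the function name and the receipt_modify.xml file name are hypothetical, and the message wording comes from the service, not from this script.

```shell
# Hypothetical helper: print INFO/ERROR messages from a receipt, then return 0
# only if the receipt reports success="true".
summarise_receipt() {
  grep -oE '<(INFO|ERROR)>[^<]*</(INFO|ERROR)>' "$1" \
    | sed -E 's#^<(INFO|ERROR)>(.*)</(INFO|ERROR)>$#\1: \2#'
  grep -q 'success="true"' "$1"
}

# Usage:
#   curl -u username:password -F "SUBMISSION=@submission_modify.xml" -F "SAMPLE=@samples.xml" \
#     -o receipt_modify.xml "https://wwwdev.ebi.ac.uk/ena/submit/drop-box/submit/"
#   summarise_receipt receipt_modify.xml && echo "update succeeded"
```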

Warning

Although sample metadata can be updated, for computational reasons these updates are not automatically propagated to the EMBL files of the associated sequences.

Sample updates will be reflected in the sample page in the ENA browser, the BioSamples record, and the COVID-19 Data Portal.

If your EMBL files must reflect the new metadata, please contact our helpdesk at virus-dataflow@ebi.ac.uk and we will endeavour to assist you.

Tip

Now that our samples are registered to our project, it’s time to add some data files. We’ll start by submitting raw read data.