UCSC track hub tutorial

How to make a track hub for the UCSC browser using an Amazon S3 bucket

2025-03-26

In this tutorial, we're going to learn how to display a collection of genomic files in the UCSC Genome Browser.

Usually, when I need to look at genomic data I'm analyzing, I use the IGV genome browser installed locally. So, why UCSC?

The answer is quite simple: sharing. Sure, it's possible to send a link to a publicly available file so that another person can load it in IGV, but track hubs are a bit more convenient, in my opinion. They let you share a whole session with a multitude of tracks using just one hyperlink.

So, the first obstacle we run into is hosting the files we want to share in a UCSC Browser session. If you're lucky enough to have a server that can be reached from the outside (e.g., over HTTPS), you can just upload your files there, share them with anybody, and skip to step 2. The rest of us first need to find a solution for storing our data.

1. Creating an AWS S3 bucket and uploading your files

AWS S3 (Simple Storage Service) is true to its name and provides a cheap and simple option for storing and sharing your files. If you don't have an account yet, you're in for a treat, because, as of this writing, AWS offers a free tier that lasts for a year and includes S3. If I remember correctly, there's a limit on how much data you can store, but I don't think we'll come anywhere close with our data, unless it's something very heavy, like BAM files. Even once your free tier expires, AWS offers a variety of alarms for exceeding your budget, which defaults to zero. To put the cost into perspective (and this was something I found very difficult to pin down beforehand): I'm now out of my free tier, have about 4 buckets that aren't accessed often, and I'm paying less than a US dollar per month. If you're going to experiment with Elastic Compute Cloud (EC2) services on your free tier, be very careful, because they cost more than S3.

For creating your AWS account and S3 bucket, I'll refer you to this guide. The process is not very complicated. Two things that are highlighted there, and that I'll repeat here: the name of your bucket must be globally unique, and you need to grant public access to your files so that the UCSC Browser can "see" them.
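If you prefer the command line, here's a minimal sketch of the same setup using the AWS CLI. The bucket name and region below are placeholders, and depending on when your bucket was created, you may need to relax the default public-access and ACL settings before per-file ACLs work:

#!/bin/bash
# Placeholders: pick your own globally unique bucket name and region
BUCKET="my-track-hub-bucket"
REGION="eu-north-1"

# Create the bucket
aws s3 mb s3://$BUCKET --region $REGION

# New buckets block public ACLs by default; relax this so
# per-file "public-read" ACLs can take effect
aws s3api put-public-access-block \
    --bucket $BUCKET \
    --public-access-block-configuration \
    BlockPublicAcls=false,IgnorePublicAcls=false,BlockPublicPolicy=false,RestrictPublicBuckets=false

# On newer buckets, ACLs may be disabled entirely; this re-enables them
aws s3api put-bucket-ownership-controls \
    --bucket $BUCKET \
    --ownership-controls 'Rules=[{ObjectOwnership=ObjectWriter}]'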

2. Filling your bucket with tracks and hub info

Our next mission is to prepare our hub. First, we upload the tracks using either the web interface or the AWS CLI. If you use the web interface, don't forget to grant public access to your files by selecting them all, pressing the "Actions" button, and then "Make public using ACL".
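With the CLI, the upload and the public ACL can be done in one step. A sketch, assuming your tracks sit in a local tracks/ directory and the bucket name is the placeholder from before:

# Upload every file in tracks/ and make each one publicly readable
aws s3 cp tracks/ s3://my-track-hub-bucket/tracks/ --recursive --acl public-read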

But the tracks aren't the only things we need to upload. Your bucket should also contain a folder with three files:

hub.txt
genomes.txt
trackDb.txt

Let's go over the files' contents, starting with genomes.txt:


genome hg38
trackDb trackDb.txt

In the first line, we write the name of the genome we used to process our files. Usually, you know it before you map your reads; if you downloaded the data from GEO or some other repository, it should be stated in the data description. Let's assume that we're going to share some bigWig files from a ChIP-seq experiment mapped to the most popular human genome build, hg38.
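If you're ever unsure which build a file was made against, one quick sanity check is to compare its chromosome names and sizes with the assembly's. For bigWig files, the bigWigInfo tool from the UCSC kent utilities can list them (the file name here is just a hypothetical example):

# List the chromosomes and their sizes stored in the bigWig header
bigWigInfo -chroms p300_rep1.bw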

The second line is the location of our trackDb.txt file. We're planning to put it in the same folder as genomes.txt, so we just need to enter the file name.

Now on to hub.txt, which is a bit more complicated:


hub p300_ChIP_hub
shortLabel My Track Hub
longLabel My Custom UCSC Track Hub
genomesFile genomes.txt
email me@myself.com

But not to worry! You can leave all the lines as they are, except for the first one: it's generally a good idea to give your hub a proper name (no spaces allowed there). The shortLabel and longLabel on the second and third lines can then repeat that name in a more readable form, since they may contain spaces.
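The third file, trackDb.txt, describes each track: its name, type, labels, visibility, and the URL of the underlying data file. A minimal stanza for a single bigWig track might look like this (the track name and URL are placeholders matching what the script below generates):

track p300_rep1
type bigWig
shortLabel p300_rep1
longLabel p300 ChIP-seq replicate 1
visibility full
bigDataUrl https://your-bucket-name.s3.eu-north-1.amazonaws.com/tracks/p300_rep1.bw
autoScale on
maxHeightPixels 80:80:80

Stanzas are separated by blank lines, and you need one per track.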

These files are easy enough to write by hand, but trackDb.txt gets tedious once you have more than a few tracks. So here's a bash script that generates all three files from the contents of your bucket and uploads them to a hub/ folder:

#!/bin/bash

# Configuration variables
BUCKET_NAME="your-bucket-name"
GENOME="hg38"
EMAIL="me@myself.com"
HUB_NAME="my_track_hub"
LOCAL_DIR="/tmp/track_hub"
S3_HUB_PATH="s3://$BUCKET_NAME/hub/"
REGION="eu-north-1"
PUBLIC_URL="https://$BUCKET_NAME.s3.$REGION.amazonaws.com"

# Create local directory for hub files
mkdir -p $LOCAL_DIR

# Create hub.txt file
cat <<EOL > $LOCAL_DIR/hub.txt
hub $HUB_NAME
shortLabel My Track Hub
longLabel My Custom UCSC Track Hub
genomesFile genomes.txt
email $EMAIL
EOL

# Create genomes.txt file
cat <<EOL > $LOCAL_DIR/genomes.txt
genome $GENOME
trackDb trackDb.txt
EOL

# Initialize trackDb.txt file
TRACKDB="$LOCAL_DIR/trackDb.txt"
: > $TRACKDB

# List files from S3 bucket and generate tracks based on file extensions
aws s3 ls s3://$BUCKET_NAME/ --recursive | while read -r line;
do
    # Extract file name and path
    file_path=$(echo $line | awk '{print $4}')
    file_name=$(basename "$file_path")
    
    # Skip non-track files (track hubs only accept indexed binary formats)
    if [[ ! $file_name =~ \.(bw|bb|bam)$ ]]; then
        continue
    fi

    # Get the track type from the extension
    if [[ $file_name == *.bw ]]; then
        track_type="bigWig"
        additional_settings="autoScale on
maxHeightPixels 80:80:80"
    elif [[ $file_name == *.bb ]]; then
        # Plain BED isn't supported in hubs; convert BED to bigBed first
        track_type="bigBed"
        additional_settings=""
    elif [[ $file_name == *.bam ]]; then
        # BAM tracks also need their .bam.bai index uploaded alongside
        track_type="bam"
        additional_settings=""
    else
        continue
    fi

    # Generate track entry
    # Strip the extension for track name
    track_name=$(basename "$file_name" | sed 's/\.[^.]*$//')  
    bigDataUrl="$PUBLIC_URL/$file_path"

    # Append the track entry to trackDb.txt (>> so earlier entries aren't overwritten)
    cat <<EOL >> $TRACKDB
track $track_name
type $track_type
shortLabel $track_name
longLabel $track_name
visibility full
bigDataUrl $bigDataUrl
$additional_settings

EOL

done

# Upload generated hub files to the S3 bucket
aws s3 cp $LOCAL_DIR/hub.txt $S3_HUB_PATH --acl public-read
aws s3 cp $LOCAL_DIR/genomes.txt $S3_HUB_PATH --acl public-read
aws s3 cp $LOCAL_DIR/trackDb.txt $S3_HUB_PATH --acl public-read

# Output the UCSC track hub URL
echo "Track Hub URL: $PUBLIC_URL/hub/hub.txt"

# Clean up temporary files
rm -rf $LOCAL_DIR
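Once the script has run, open the UCSC Genome Browser, go to My Data > Track Hubs, and paste the printed URL. Or, to get the one-link sharing promised at the start, pass the hub URL directly via the hubUrl parameter (bucket name and region are placeholders):

https://genome.ucsc.edu/cgi-bin/hgTracks?db=hg38&hubUrl=https://your-bucket-name.s3.eu-north-1.amazonaws.com/hub/hub.txt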