UCSC track hub tutorial
How to make a track hub for UCSC browser using Amazon S3 Bucket
2025-03-26
In this tutorial, we're going to learn how to display a collection of genomic files in the UCSC Genome Browser.
Usually, when I need to look at the genomic data I'm analyzing, I use the IGV genome browser installed locally. So, why UCSC?
The answer is quite simple: sharing. Sure, it's possible to send a link to your publicly available file so that another person can load it in IGV, but track hubs are a bit more convenient, in my opinion. They let you share a whole session with a multitude of tracks using just one hyperlink.
So, the first obstacle we run into is hosting the files we want to share in a UCSC Browser session. If you're one of the lucky ones with a server that can be accessed from the outside, e.g., over SFTP, you can just upload your files there, share them with anybody, and skip to step 2. The rest of us first need to find a solution for storing our data.
AWS S3 (Simple Storage Service) is true to its name and provides a cheap and simple option for storing and sharing your files. If you don't have an account yet, you're in for a treat, because, as of the time of writing, AWS offers a free tier that lasts for a year and includes S3. If I remember correctly, there's a limit on how much data you can store, but I don't think we'll come even close with our data, unless it's something very heavy, like BAM files. Even when your free tier expires, AWS offers a variety of alarms for exceeding your budget, which by default amounts to 0. To put it more into perspective (and this was something very difficult for me to find): I'm now out of my free tier, have about four buckets that aren't accessed often, and I'm paying less than a US dollar per month. If you're going to experiment with Elastic Compute Cloud (EC2) services on your free tier, be very careful, because they cost more than S3.
For creating your AWS account and S3 bucket, I'll refer you to this guide. The process is not very complicated. Two things highlighted there are worth repeating: the name of your bucket must be globally unique (not just within your region), and you need to grant public access to your files so that the UCSC Browser can "see" them.
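If you'd rather stay in the terminal, the same setup can be done with the AWS CLI. Here's a minimal sketch; the bucket name and region are placeholders, and the last two commands reflect my assumption that your bucket was created with the current AWS defaults (public access blocked and ACLs disabled):

# Create the bucket (the name must be globally unique)
aws s3 mb s3://my-track-bucket --region eu-north-1

# Lift the default block on public access
aws s3api put-public-access-block \
    --bucket my-track-bucket \
    --public-access-block-configuration \
    "BlockPublicAcls=false,IgnorePublicAcls=false,BlockPublicPolicy=false,RestrictPublicBuckets=false"

# Re-enable object ACLs so that "--acl public-read" works on upload
aws s3api put-bucket-ownership-controls \
    --bucket my-track-bucket \
    --ownership-controls 'Rules=[{ObjectOwnership=ObjectWriter}]'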
Our next mission is to prepare our hub. First, we upload the tracks using either the web interface or the AWS CLI tool. If you use the web interface, don't forget to grant public access to your files by selecting them all, pressing the "Actions" button, and then "Make public using ACL".
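With the CLI, the upload and the ACL can be handled in one command. A sketch, assuming your bigWig, bigBed, and BAM files sit in a local tracks/ directory (the paths and bucket name are placeholders):

# Upload all track files and make them publicly readable
aws s3 cp ./tracks/ s3://my-track-bucket/ \
    --recursive \
    --exclude "*" \
    --include "*.bw" --include "*.bb" \
    --include "*.bam" --include "*.bai" \
    --acl public-read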
But the tracks aren't the only things we need to upload. Your bucket should also contain a folder with three files: hub.txt, genomes.txt, and trackDb.txt.
Let's go over the files' contents, starting with genomes.txt:
genome hg38
trackDb trackDb.txt
In the first line, we write the name of the genome assembly we used to process our files. (Usually, you know it before you map your reads; if you downloaded the data from GEO or some other repository, it should be stated in the data description.) Let's assume that we're going to share some bigWig files from a ChIP-seq experiment mapped to the most popular human genome version, hg38.
The second line is the location of our trackDb.txt file. We're planning to put it in the same folder as genomes.txt, so we just need to enter the file name.
Now onto the hub.txt, which is a bit more complicated:
hub p300_ChIP_hub
shortLabel My Track Hub
longLabel My Custom UCSC Track Hub
genomesFile genomes.txt
email me@myself.com
But not to worry! You can leave all the lines as they are here, except for the first one: it's generally a good idea to give your hub a proper name, and the hub name can't contain spaces. You can reiterate the same name on the shortLabel and longLabel lines, where spaces are allowed. It's also worth putting a real address in the email line, so people can reach you if something breaks.
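The last of the three files, trackDb.txt, describes every track in the hub. For a single bigWig track, a minimal stanza could look like this (the track name and URL are placeholders):

track p300_rep1
type bigWig
shortLabel p300 rep1
longLabel p300 ChIP-seq replicate 1
visibility full
bigDataUrl https://your-bucket-name.s3.eu-north-1.amazonaws.com/p300_rep1.bw

Writing a stanza like that by hand for every file gets tedious quickly, so here's a script that generates all three hub files from the contents of the bucket and uploads them: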
#!/bin/bash
# Configuration variables
BUCKET_NAME="your-bucket-name"
GENOME="hg38"
EMAIL="me@myself.com"
HUB_NAME="my_track_hub"
LOCAL_DIR="/tmp/track_hub"
S3_HUB_PATH="s3://$BUCKET_NAME/hub/"
REGION="eu-north-1"
PUBLIC_URL="https://$BUCKET_NAME.s3.$REGION.amazonaws.com"
# Create a local working directory for the hub files
mkdir -p "$LOCAL_DIR"
# Create hub.txt
cat <<EOL > "$LOCAL_DIR/hub.txt"
hub $HUB_NAME
shortLabel My Track Hub
longLabel My Custom UCSC Track Hub
genomesFile genomes.txt
email $EMAIL
EOL
# Create genomes.txt
cat <<EOL > "$LOCAL_DIR/genomes.txt"
genome $GENOME
trackDb trackDb.txt
EOL
# Start with an empty trackDb.txt
TRACKDB="$LOCAL_DIR/trackDb.txt"
: > "$TRACKDB"
# List all files in the bucket and generate a track entry for each
aws s3 ls "s3://$BUCKET_NAME/" --recursive | while read -r line;
do
    # Extract the S3 key and the bare file name
    # (awk splits on whitespace, so this assumes no spaces in file names)
    file_path=$(echo "$line" | awk '{print $4}')
    file_name=$(basename "$file_path")
    # Keep only hub-compatible data files: track hubs require indexed
    # binary formats (bigWig, bigBed, BAM), so plain BED files are skipped
    if [[ ! $file_name =~ \.(bw|bb|bam)$ ]]; then
        continue
    fi
    # Pick the trackDb type from the file extension
    if [[ $file_name == *.bw ]]; then
        track_type="bigWig"
        additional_settings="autoScale on
maxHeightPixels 80:80:80"
    elif [[ $file_name == *.bb ]]; then
        track_type="bigBed"
        additional_settings=""
    elif [[ $file_name == *.bam ]]; then
        # BAM tracks also need a .bam.bai index next to each file
        track_type="bam"
        additional_settings=""
    else
        continue
    fi
    # Use the file name without its extension as the track name,
    # and build the public URL for bigDataUrl
    track_name="${file_name%.*}"
    bigDataUrl="$PUBLIC_URL/$file_path"
    # Append (>>, not >) the track entry so earlier entries are kept;
    # stanzas in trackDb.txt are separated by a blank line
    cat <<EOL >> "$TRACKDB"
track $track_name
type $track_type
shortLabel $track_name
longLabel $track_name
visibility full
bigDataUrl $bigDataUrl
$additional_settings

EOL
done
# Upload the generated hub files to the S3 bucket
# (--acl public-read assumes object ACLs are enabled on the bucket)
aws s3 cp "$LOCAL_DIR/hub.txt" "$S3_HUB_PATH" --acl public-read
aws s3 cp "$LOCAL_DIR/genomes.txt" "$S3_HUB_PATH" --acl public-read
aws s3 cp "$LOCAL_DIR/trackDb.txt" "$S3_HUB_PATH" --acl public-read
# Output the UCSC track hub URL
echo "Track Hub URL: $PUBLIC_URL/hub/hub.txt"
# Clean up temporary files
rm -rf $LOCAL_DIR
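When the script finishes, paste the printed URL into the UCSC Browser under My Data > Track Hubs to load your session. Before sharing the link, it's also worth validating the hub with UCSC's hubCheck utility, if you have the kent tools installed; the URL below is a placeholder for whatever the script printed:

# Report problems in hub.txt, genomes.txt, and trackDb.txt
hubCheck "https://your-bucket-name.s3.eu-north-1.amazonaws.com/hub/hub.txt"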