GeneProf Manual

Florian Halbritter, 27-Nov-2012

Table of Contents

Chapter: Introduction

Chapter: Concepts Explained

Chapter: Tutorials

Chapter: Modules

Chapter: Pages

Chapter: Advanced Topics

Chapter: Frequently Asked Questions (FAQ)


What is GeneProf?

GeneProf is a web-based, graphical software suite that allows users to analyze data produced using next-generation sequencing platforms.
Some of GeneProf's highlights include:

System Requirements

GeneProf does not require you to install any additional software. A public instance of the software can be accessed via your web browser. GeneProf uses modern, JavaScript-based browser technologies to provide a rich, user-friendly interface. JavaScript is supported in all modern web browsers, but might need to be enabled (it is usually enabled by default). Please refer to this tutorial from Google to find out how to enable JavaScript in your browser.
At the moment, GeneProf has been tested on these browsers:
Firefox 3.x+ (this is the recommended browser)
Chrome (any version)
Internet Explorer 7+
Safari 5+
Opera 11+
We also recommend a screen resolution of at least 1024x768 pixels for optimal use of GeneProf. A higher resolution will provide for an even better user experience.
Additional system requirements apply if you intend to install GeneProf locally. Please refer to Section 'Installing GeneProf Locally' for more information.

How can GeneProf help me with my research?

GeneProf has been designed to facilitate biological research across a number of fields. Anyone interested in high-throughput sequencing or using it as a tool for their own research might benefit from this software. At the moment, GeneProf focuses on transcriptomic (gene expression, miRNAs, etc.) and regulatory (transcription factor binding, gene regulation, etc.) data, but we are working on expanding GeneProf's scope further. Also, researchers dealing with other kinds of sequencing data might find GeneProf's modular workflows (cp. Concept 'Workflows') useful to address parts of the analysis issues they encounter (e.g. for quality control and alignment).
GeneProf might be used in two ways:

Who is behind GeneProf?

The GeneProf web application and computational back-end were designed and implemented by Florian Halbritter under the supervision of Simon Tomlinson as part of his PhD project at the Institute for Stem Cell Research / MRC Centre for Regenerative Medicine, University of Edinburgh, UK.
Florian Halbritter has been funded by the Medical Research Council (MRC). Additional funding was provided to Simon Tomlinson by the EU FP7 project EuroSyStem to support the work. Additional contributions to the project came from Harsh J. Vaidya (data analysis), funded by EuroSyStem, and Duncan R. I. Godwin (software development and data analysis), funded by the MRC.

How to cite?

If you use GeneProf for your data analysis and publish your research, please cite the following paper:
Halbritter F, Vaidya HJ, Tomlinson SR. GeneProf: analysis of high-throughput sequencing experiments. Nature Methods 9, 7-8 (2011).
If you use GeneProf as a data resource or use the GeneProf web services, please cite:
Halbritter F, Kousa AI, Tomlinson SR. GeneProf data: a resource of curated, integrated and reusable high-throughput genomics experiments. Nucleic Acids Research (2013).
In your paper, you can link to your GeneProf experiment using its permalink (cp. this page) and/or accession number, provided that you have made the experiment public (cp. SubConcept 'Finalization and Publication').
Additionally, in your Materials & Methods section, you should cite the references for any modules used in your workflows that are based on externally developed algorithms, programs and resources. Please refer to the modules' documentation (cp. Chapter 'Modules') for the correct references.

Getting Started..

To get started using GeneProf, we recommend having a look at the GeneProf tutorials (cp. Chapter 'Tutorials'). In particular, the tutorials on how to find public data available in GeneProf's database (cp. Tutorial 'Examining Public Next-Gen Data..') and on how to get started with your own experiment (cp. Tutorial 'How to Create a GeneProf Exper..') might be a good start. If you are unclear about any of the terms or concepts used in this document or anywhere in GeneProf, you might also like to take a look at the next chapter (cp. Chapter 'Concepts Explained'), in which we try to explain some of the basics.

Reporting Bugs and Feature Requests

While we've made every effort to make GeneProf as good as possible, it's unfortunately not always possible to find (and resolve) every little fault in advance. If you encounter a problem using GeneProf, please consider submitting a report to the bug tracker system to make us aware of the issue and we'll attempt to solve it as quickly as possible.
Have a great idea how to improve GeneProf, missing an important feature or completely lost? You can use the same system to suggest new features, ask questions, suggest collaborations or just to get in touch with the developers. However, please take a moment to read the following information before you submit any report (also refer to the documentation of the bug tracker page, cp. this page):

Natively Supported Organisms

Currently, the organisms listed below are natively supported by GeneProf, i.e. our team provides up-to-date gene and genome reference datasets for them (cp. SubConcept 'Reference Data'). However, this does not mean that you cannot use GeneProf with other organisms, because you can easily create your own reference data (cp. Tutorial 'Creating a Custom Reference Set').
EnsemblPlants 12 A. thaliana Genes, TAIR10 Assembly
Ensembl 71 Cow Genes, UMD3.1 Assembly
Ensembl 59 C. elegans Genes, WS210 Assembly
Ensembl 59 Fruitfly Genes, BDGP5.13 Assembly
Ensembl 59 Zebrafish Genes, Zv8 Assembly
Ensembl 59 Chicken Genes, WASHUC2 Assembly
Ensembl 59 Human Genes, GRCh37 Assembly
Ensembl 58 Mouse Genes, NCBIM37 Assembly
EnsemblPlants 12 Rice Genes, MSU6 Assembly
Ensembl 59 Rat Genes, RGSC3.4 Assembly
Ensembl 59 Yeast Genes, EF2 Assembly
Ensembl 66 Pig Genes, Sscrofa9 Assembly
That amounts to a total of 355,061 genes (and other transcriptional features) from 12 organisms, or about 9,848,776 data points, obtained by mining publicly available data from the Ensembl projects.

Public Data Summary

The repository of public data in GeneProf currently boasts (as of Mar 24, 2017).
[Charts: Experiments by Organism (non-exclusive); Experiments by Platform / Technology (non-exclusive); Browser Tracks by Category]


The GeneProf logo is Copyright (c) Mei Sze Lam, 2009-2014.
GeneProf makes use of a number of open-source or otherwise free software packages, tools and concepts (in no particular order):
JavaScript, AJAX & CSS libraries:
(Web) Programming & Database:
Conceptual Influences & Miscellaneous:
GeneProf Modules:
We thank the authors of the respective products and all others that contributed to this project.
Did we forget you in the list of acknowledgements? Dreadfully sorry! Drop us a line!

