Florian Halbritter, 27-Nov-2012
Introduction
What is GeneProf?
GeneProf is a web-based, graphical software suite that allows users to analyze data produced using next-generation sequencing platforms.
Some of GeneProf's highlights include:
- Easy-to-use web-based interface: Access your data at any time from any reasonably modern computer with a working internet connection -- no need to install any software (cp. Section 'System Requirements').
- Analysis wizards make your life easy: Pre-defined, step-by-step workflows make it possible for anybody to analyze their short read data with a minimum of hands-on time (cp. SubConcept 'Analysis Wizards').
- Versatile modules: Advanced users and data analysis experts benefit from GeneProf's broad range of analysis modules, which can be combined freely (cp. Concept 'Workflows').
- Integrated Analysis: Analysis of ChIP-seq and RNA-seq data in one place, plus support for the integration of other external data (e.g. from microarrays).
- Comprehensive Resource: GeneProf provides a comprehensive resource of analyzed next-generation sequencing data. Experimental results can be easily accessed and compared, and the analysis procedures used to produce the data are fully transparent and readily accessible (cp. Tutorial 'Examining Public Next-Gen Data..').
- Extensibility: Algorithm developers and computer programmers can develop their own modules and extend the functionality of GeneProf. Existing software can be easily wrapped and integrated into the GeneProf framework (cp. Section 'Module Development: Adding new..'), and data from GeneProf may be used externally (cp. Section 'Web API: Retrieving Data from ..').
System Requirements
GeneProf does not require you to install any additional software. A public instance of the software can be accessed via your web browser. GeneProf makes use of modern browser technologies based on JavaScript to provide you with a rich, user-friendly interface. JavaScript is supported by all modern web browsers and is usually enabled by default, but it might need to be switched on manually. Please refer to this tutorial from Google to find out how to enable JavaScript in your browser.
At the moment, GeneProf has been tested on these browsers:
- Firefox 3.x+ (this is the recommended browser)
- Chrome (any version)
- Internet Explorer 7+
- Safari 5+
- Opera 11+
We also recommend a screen resolution of at least 1024x768 pixels for optimal use of GeneProf. A higher resolution will provide for an even better user experience.
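The list of tested browsers above can be read as a small lookup table. The following is only an illustrative sketch, not part of GeneProf: the names `MIN_TESTED` and `is_tested` are hypothetical, and the version numbers are simply transcribed from the list (with "3.x+" read as major version 3 or later, and "any version" as no minimum).

```python
from typing import Optional

# Minimum tested major version per browser, transcribed from the list above.
# None means "any version" (as stated for Chrome).
# MIN_TESTED and is_tested are hypothetical names used for illustration only.
MIN_TESTED: dict[str, Optional[int]] = {
    "firefox": 3,
    "chrome": None,
    "internet explorer": 7,
    "safari": 5,
    "opera": 11,
}


def is_tested(browser: str, major_version: int) -> bool:
    """Return True if the browser/version pair falls within the tested range."""
    key = browser.strip().lower()
    if key not in MIN_TESTED:
        # Browser is not in the tested list at all.
        return False
    minimum = MIN_TESTED[key]
    return minimum is None or major_version >= minimum


if __name__ == "__main__":
    for name, version in [("Firefox", 3), ("Internet Explorer", 6), ("Opera", 12)]:
        status = "tested" if is_tested(name, version) else "not tested"
        print(f"{name} {version}: {status}")
```

A browser that is reported as "not tested" here may still work; the list only states which browsers GeneProf has been tested on.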
How can GeneProf help me with my research?
GeneProf has been designed to facilitate biological research across a number of fields. Anyone interested in high-throughput sequencing or using it as a tool for their own research might benefit from this software. At the moment, GeneProf focuses on transcriptomic (gene expression, miRNAs, etc.) and regulatory (transcription factor binding, gene regulation, etc.) data, but we are working on expanding GeneProf's scope further. Also, researchers dealing with other kinds of sequencing data might find GeneProf's modular workflows (cp. Concept 'Workflows') useful to address parts of the analysis issues they encounter (e.g. for quality control and alignment).
GeneProf might be used in two ways:
Who is behind GeneProf?
Additional funding was provided to Simon Tomlinson from the EU FP7 project EuroSyStem to support the work. Additional contributions to the project came from Harsh J. Vaidya (data analysis), funded by EuroSyStem, and Duncan R. I. Godwin (software development and data analysis), funded by the Medical Research Council (MRC). Florian Halbritter has been funded by the MRC.
Centre for Regenerative Medicine
The University of Edinburgh
EuroSyStem
Medical Research Council
How to cite?
If you use GeneProf for your data analysis and publish your research, please cite the following paper:
Halbritter F, Vaidya HJ, Tomlinson SR. GeneProf: analysis of high-throughput sequencing experiments. Nature Methods 9, 7-8 (2012).
Additionally, in your Materials & Methods section you should cite the references for any modules used in your workflows that are based on externally developed algorithms, programs and resources. Please refer to the modules' documentation (cp. Chapter 'Modules') for the correct references.
Getting Started..
Reporting Bugs and Feature Requests
While we've made every effort to make GeneProf as good as possible, it's unfortunately not always possible to find (and resolve) every little fault in advance. If you encounter a problem using GeneProf, please consider submitting a report to the bug tracker system to make us aware of the issue, and we'll attempt to solve it as quickly as possible.
Have a great idea for improving GeneProf, missing an important feature, or completely lost? You can use the same system to suggest new features, ask questions, suggest collaborations or just get in touch with the developers. However, please take a moment to read the following information before you submit any report (also refer to the documentation of the bug tracker page, cp. this page):
- Bug Report:
  - In general, please provide as much information about the problem at hand as possible. Please also take a moment to check whether anybody else has previously reported the same issue and, if so, do not post a new report, but support the previous report by adding a note to it.
  - Front-end / web-application issues: On which page did you encounter the problem? Which operating system and web browser do you use? Did you try to repeat the same step and do you encounter the error every time?
  - Back-end / data analysis issues: Which module caused the problem? What sort of input data and parameter settings did you use? Did you create the workflow 'by hand' or did you use a wizard (if so, which)?
  - Has GeneProf given you an error message or an error type?
  - If your report contains sensitive information (e.g. if it appears to be closely related to your research data), make the report private; otherwise allow access to all users.
  - If possible, allow administrators to check your log files / history to discover the source of the problem. If you do not agree to this, we will not look at any of your private data or records, but it might be difficult to track down errors.
- Feature Request:
  - Firstly: Great! We like to hear about new ideas!
  - Please take a moment to check existing feature requests before you issue a new one. If there already is one for a similar feature, consider supporting this request by adding your comments to the other report.
  - Describe what you had in mind in as much detail as possible.
  - Why do you think this is an important feature for GeneProf to have?
  - Any ideas how to go about it?
- Other:
  - This category of reports takes anything that doesn't fall under the headings 'bug' or 'feature' :-).
  - Use this report type to ask questions, suggest collaborations, arrange a tutorial session at your department, or just get in touch about any subject you like.
  - However, bear in mind that -- unless you un-tick the corresponding checkbox -- your message will be public and all registered GeneProf users will be able to see it.
Natively Supported Organisms
Currently, the organisms listed below are natively supported by GeneProf, i.e. our team provides up-to-date gene and genome reference datasets for them (cp. SubConcept 'Reference Data'). However, this does not mean that you cannot use GeneProf with other organisms, because you can easily create your own reference data (cp. Tutorial 'Creating a Custom Reference Set').
- Ensembl 58 Mouse Genes, NCBIM37 Assembly
- Ensembl 59 Human Genes, GRCh37 Assembly
- Ensembl 66 Pig Genes, Sscrofa9 Assembly
- Ensembl 59 C. elegans Genes, WS210 Assembly
- Ensembl 59 Chicken Genes, WASHUC2 Assembly
- Ensembl 59 Fruitfly Genes, BDGP5.13 Assembly
- Ensembl 59 Rat Genes, RGSC3.4 Assembly
- Ensembl 59 Yeast Genes, EF2 Assembly
- Ensembl 59 Zebrafish Genes, Zv8 Assembly
- EnsemblPlants 12 A. thaliana Genes, TAIR10 Assembly
- EnsemblPlants 12 Rice Genes, MSU6 Assembly
That amounts to a total of 330,445 genes (and other transcriptional features) from 11 organisms, or about 9,536,857 data points, obtained by mining publicly available data from the Ensembl projects.
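To put these totals into perspective, the short calculation below derives two rough averages from the figures quoted above. It is only a back-of-the-envelope sketch: the variable names are our own, and the averages are simple ratios that say nothing about how genes or data points are actually distributed across organisms.

```python
# Totals as quoted in the text above.
total_genes = 330_445         # genes and other transcriptional features
total_organisms = 11          # natively supported organisms
total_data_points = 9_536_857

# Simple ratios; real per-organism counts will differ widely from the mean.
genes_per_organism = total_genes / total_organisms
data_points_per_gene = total_data_points / total_genes

print(f"Mean genes per organism:   {genes_per_organism:,.0f}")
print(f"Mean data points per gene: {data_points_per_gene:.1f}")
```

On average this works out to roughly 30,000 genes per organism and just under 29 data points per gene.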
Public Data Summary
The repository of public data in GeneProf currently boasts 140 public experiments and 8,372 public datasets (as of 19-Jun-2013).
[Charts: Experiments by Organism (non-exclusive); Experiments by Platform / Technology (non-exclusive); Browser Tracks by Category]
Acknowledgements
The GeneProf logo (below) is Copyright (c) Mei Sze Lam, 2009-2013.
GeneProf makes use of a number of open-source or otherwise free software packages, tools and concepts (in no particular order):
JavaScript, AJAX & CSS libraries:
(Web) Programming & Database:
Statistics:
Conceptual Influences & Miscellaneous:
GeneProf Modules:
- BEDTools: intersectBed [12], URL: BEDTools Homepage
- Assign TFBS to Genes / Put Aligned Reads into Bins / Map Regions to Genes / Basic Feature Annotations Filter / Basic Genomic Region Annotations Filter / Basic Sequence Annotations Filter / Calculate Additional Columns / Calculate Additional Columns (Region Data) / Complex Feature Annotations Filter / Complex Genomic Region Annotations Filter / Complex Sequence Annotations Filter / Extract Sequences from Regions / Merge Genomic Region Data / Merge Sequence Data / Modify Genomic Regions / Modify and Filter Sequences / Separate Mate Sequences / Split Sequences into Mate Pairs / Feature Annotations Parser / Bowtie Output Parser / Bowtie Output Parser (Mate-Paired) / CCAT Peaks Parser / ChIPSeqPeakFinder Output Parser / MACS Peaks.xls Parser / CisGenome 1-sample .cod Parser / CisGenome 2-sample .cod Parser / Genomic Region Parser (BED, TXT, TSV) / FASTA Parser / FASTQ Parser / FASTQ Paired-End Parser / FASTQ Paired-End Parser (2 Files) / Raw Sequence Parser / Add Annotations to Reference / Define a new Reference Set / Quantitate Gene Expression / Differential Expression by Fold Change / Differential Expression by Fold Change (for Region Data) / Random Sample of Features / Random Sample of Genomic Regions / Random Sample of Sequences / Select Regions for Regions / Select Regions for Sequences / Select Sequences for Regions / Select Sequences for Sequences / Gene Expression Summary / General Genomic Region Statistics / General Sequence Statistics / ChIP-seq Peak Summary [16]
- Find Peaks with CCAT [14], URL: CCAT Software Homepage @ GIS
- Find Peaks with ChIPSeqPeakFinder [4,16], URL: ChIPSeqPeakFinder Homepage @ GIS
- Find Peaks with MACS [6], URL: MACS Homepage
- MEME Motif Discovery [1], URL: MEME Homepage
- Find Peaks with SISSRs v1.4 [5,16], URL: SISSRs Homepage
- Calculate TFAS [11,16]
- FASTX Toolkit: Artifacts Filter / FASTX Toolkit: Clip Adapter Sequences / FASTX Toolkit: Reverse Complement, URL: FASTX-Toolkit Homepage
- Align against DNA with Bowtie (v0.12.3) / Align against cDNA with Bowtie (v0.12.3) / Align against Sequences with Bowtie (v0.12.3) [8,16], URL: Bowtie Homepage
- TopHat 1.2 Alignment [9,16], URLs: TopHat Homepage, Picard Tools (for SAM parsing)
- Quality Control + Bowtie Iterative Trimming Alignment / Quality Control + Bowtie Alignment [16,8], URL: Bowtie Homepage
- Quality Control + Tophat 1.20 Alignment [16,9], URLs: TopHat Homepage, Picard Tools (for SAM parsing)
- MACS + Gene Association + Statistics [6,16], URL: MACS Homepage
- SAM/BAM Region Parser [16], URL: Picard Project
- SRA File Parser [16], URL: SRA Homepage
- DESeq / DESeq (for Region Data) [15,16], URL: DESeq @ BioConductor
- EdgeR / EdgeR (for Region Data) [10,16], URL: edgeR @ BioConductor
- GOSeq Enrichment Analysis [13], URL: goseq Bioconductor Page
- Quantile Normalization [2]
We thank the authors of the respective products and all others that contributed to this project.
Did we forget you in the list of acknowledgements? We're dreadfully sorry! Drop us a line!
Bibliography
[1] Bailey, TL and Elkan, C. Fitting a mixture model by expectation maximization to discover motifs in biopolymers. Proc Int Conf Intell Syst Mol Biol (1994). PMID: 7584402.
[2] Bolstad, BM and Irizarry, RA and Astrand, M and Speed, TP. A comparison of normalization methods for high density oligonucleotide array data based on variance and bias. Bioinformatics (2003). PMID: 12538238.
[3] Mortazavi, A and Williams, BA and McCue, K and Schaeffer, L and Wold, B. Mapping and quantifying mammalian transcriptomes by RNA-Seq. Nat. Methods (2008). PMID: 18516045.
[4] Chen, X and Xu, H and Yuan, P and Fang, F and Huss, M and Vega, VB and Wong, E and Orlov, YL and Zhang, W and Jiang, J and Loh, YH and Yeo, HC and Yeo, ZX and Narang, V and Govindarajan, KR and Leong, B and Shahab, A and Ruan, Y and Bourque, G and Sung, WK and Clarke, ND and Wei, CL and Ng, HH. Integration of external signaling pathways with the core transcriptional network in embryonic stem cells. Cell (2008). PMID: 18555785.
[5] Jothi, R and Cuddapah, S and Barski, A and Cui, K and Zhao, K. Genome-wide identification of in vivo protein-DNA binding sites from ChIP-Seq data. Nucleic Acids Res. (2008). PMID: 18684996.
[6] Zhang, Y and Liu, T and Meyer, CA and Eeckhoute, J and Johnson, DS and Bernstein, BE and Nusbaum, C and Myers, RM and Brown, M and Li, W and Liu, XS. Model-based analysis of ChIP-Seq (MACS). Genome Biol. (2008). PMID: 18798982.
[7] Durinck, S and Bullard, J and Spellman, PT and Dudoit, S. GenomeGraphs: integrated genomic data visualization with R. BMC Bioinformatics (2009). PMID: 19123956.
[8] Langmead, B and Trapnell, C and Pop, M and Salzberg, SL. Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol. (2009). PMID: 19261174.
[9] Trapnell, C and Pachter, L and Salzberg, SL. TopHat: discovering splice junctions with RNA-Seq. Bioinformatics (2009). PMID: 19289445.
[10] Robinson, MD and McCarthy, DJ and Smyth, GK. edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics (2010). PMID: 19910308.
[11] Ouyang, Z and Zhou, Q and Wong, WH. ChIP-Seq of transcription factors predicts absolute and differential gene expression in embryonic stem cells. Proc. Natl. Acad. Sci. U.S.A. (2009). PMID: 19995984.
[12] Quinlan, AR and Hall, IM. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics (2010). PMID: 20110278.
[13] Young, MD and Wakefield, MJ and Smyth, GK and Oshlack, A. Gene ontology analysis for RNA-seq: accounting for selection bias. Genome Biol. (2010). PMID: 20132535.
[14] Xu, H and Handoko, L and Wei, X and Ye, C and Sheng, J and Wei, CL and Lin, F and Sung, WK. A signal-noise model for significance analysis of ChIP-seq with negative control. Bioinformatics (2010). PMID: 20371496.
[15] Anders, S and Huber, W. Differential expression analysis for sequence count data. Genome Biol. (2010). PMID: 20979621.
[16] Halbritter, F and Vaidya, HJ and Tomlinson, SR. GeneProf: analysis of high-throughput sequencing experiments. Nat. Methods (2011). PMID: 22205509.