GeneProf Web Services

The GeneProf Web Services enable programmatic access to the public data stored in GeneProf's databases via a simple web service API. GeneProf is a web-based data analysis platform for RNA-seq and ChIP-seq experiments that is coupled with a database of ready-analysed experiments. As such, GeneProf constitutes a rich, growing, high-quality resource for information about gene expression and regulation, and the GeneProf web services allow computational biologists and software developers to use these data from external software and websites.

Check out the introductory chapter of the manual for more information about GeneProf! You might also like to have a look at the 'Concepts Explained' chapter in order to better understand what we mean by the terms 'experiments', 'datasets', 'workflows' and the like. There's also a video tutorial demonstrating the use of GeneProf web services for loading data into R.

Examples below!

General Usage Instructions

The GeneProf WebAPI services are RESTful web services, exposing much of GeneProf's databases via a set of URLs with a fixed pattern. Each URL pattern may contain a number of required parameters, which are given as part of the path (indicated in curly brackets in the URLs listed below, e.g. {ID}), and a number of optional parameters which may be passed into the request as query parameters (e.g. ..?opt-param1=true&opt-param2=true). Most web services can return the results in either XML or JSON format.
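The URL pattern described above can be sketched as a small helper. This is a minimal illustration rather than an official client: the function name build_url is hypothetical, and the base URL is an assumption taken from the example links in this document.

```python
from urllib.parse import quote, urlencode

# Assumption: base URL as used by the example links in this document.
BASE_URL = "http://www.geneprof.org/GeneProf/api"


def build_url(*path_parts, fmt="json", **query):
    """Assemble a request URL from path parameters, a format and query parameters."""
    # Required parameters become path segments; ':' is kept readable.
    path = "/".join(quote(str(part), safe=":") for part in path_parts)
    url = f"{BASE_URL}/{path}.{fmt}"
    if query:
        params = {}
        for name, value in query.items():
            if isinstance(value, bool):
                value = str(value).lower()  # True -> "true"
            # Query names use hyphens (e.g. with-outputs); callers pass underscores.
            params[name.replace("_", "-")] = value
        url += "?" + urlencode(params)
    return url


print(build_url("exp", "list", fmt="xml"))
print(build_url("exp", "list", fmt="json", with_outputs=True))
```

The helper only builds the URL string; fetching it is left to whichever HTTP client you prefer.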

Some web services expect the ID of a GeneProf reference dataset as one input parameter. These services accept a number of (more memorable) aliases for easier use. You may use any of these aliases in place of the actual dataset IDs, as you please:

Alias Dataset ID
arabidopsis pub_at_ensp12_tair10
at pub_at_ensp12_tair10
ce pub_ce_ens59_ws210
celegans pub_ce_ens59_ws210
chick pub_gg_ens59_washuc2
chicken pub_gg_ens59_washuc2
danio pub_dr_ens59_zv8
dm pub_dm_ens59_bdgp5_13
dmel pub_dm_ens59_bdgp5_13
dr pub_dr_ens59_zv8
drosphila pub_dm_ens59_bdgp5_13
ef pub_sc_ens59_ef2
fruitfly pub_dm_ens59_bdgp5_13
gg pub_gg_ens59_washuc2
hs pub_hs_ens59_grch37
hsapiens pub_hs_ens59_grch37
human pub_hs_ens59_grch37
mm pub_mm_ens58_ncbim37
mmusculus pub_mm_ens58_ncbim37
mouse pub_mm_ens58_ncbim37
os pub_os_ensp12_mus6
pig ss_ens66_sscrofa9
rat pub_rn_ens59_rgsc3_4
rice pub_os_ensp12_mus6
rn pub_rn_ens59_rgsc3_4
ss ss_ens66_sscrofa9
sscrofa ss_ens66_sscrofa9
tair pub_at_ensp12_tair10
yeast pub_sc_ens59_ef2
zebrafish pub_dr_ens59_zv8
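The services accept aliases directly, so no client-side translation is required. If you want to record the actual dataset ID (for example in logs or cache keys), a lookup along the following lines may help. This is a sketch: the dictionary holds only a subset of the table above, and the name resolve_reference is hypothetical.

```python
# A subset of the alias table above; extend it as needed.
ALIASES = {
    "human": "pub_hs_ens59_grch37",
    "hs": "pub_hs_ens59_grch37",
    "mouse": "pub_mm_ens58_ncbim37",
    "mm": "pub_mm_ens58_ncbim37",
    "zebrafish": "pub_dr_ens59_zv8",
    "pig": "ss_ens66_sscrofa9",
}


def resolve_reference(name):
    """Return the dataset ID for a known alias; pass any other name through unchanged."""
    return ALIASES.get(name, name)


print(resolve_reference("mouse"))
```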

If you're using the GeneProf web services in your public-facing tools or websites, we'd kindly ask you to acknowledge GeneProf by including one of the following images with a link back to the GeneProf homepage in the appropriate place:

If you're using the GeneProf web services in conjunction with a publication, please cite the original GeneProf paper:

GeneProf: analysis of high-throughput sequencing experiments. Nature Methods 9(1), 7-8. PubMed ID: 22205509, Full Article @ Nature Methods.

GeneProf Web Services (WADL)

Most of the web services listed below can be used without registering for a GeneProf account. However, using a GeneProf WebAPI key (see below) gives you additional access to your private data stored in GeneProf.

To obtain a WebAPI key, you will need to sign up for a (free!) GeneProf user account. After signing into your new user account, you can get a key from your user profile page.

When you issue new API keys, there are a number of points to be aware of. The API keys are meant to identify you to GeneProf. Each key is a rather long (128 symbols), random string of symbols, which is quite hard to guess, but not impossible per se. Thus, in order to avoid any misuse, keys are usually only valid for a limited period of time (1 day by default, but you can change this as you see fit). Also, by default an API key will only give you access to the public data in GeneProf, but, again, you may change this if you need to access your own private data. Furthermore, the number of requests you may send to the API is limited. At the moment, you are allowed to send up to 250 requests per day, but we may increase this in future (if feasible).
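Because the number of requests per day is limited, a client may want to keep its own count and attach the key in one place. The following is a rough sketch: the class RequestBudget and the function with_key are hypothetical names, the limit of 250 is the figure quoted above and may change, and how the server delimits a "day" is not stated here, so the calendar date is used as an assumption.

```python
from datetime import date
from urllib.parse import urlencode

DAILY_LIMIT = 250  # figure quoted in the text above; may change


class RequestBudget:
    """Count requests per calendar day and refuse to exceed the limit."""

    def __init__(self, limit=DAILY_LIMIT):
        self.limit = limit
        self.day = None
        self.used = 0

    def spend(self, today):
        """Return True and count one request if the budget for `today` allows it."""
        if today != self.day:
            # Assumption: the quota resets when the calendar date changes.
            self.day, self.used = today, 0
        if self.used >= self.limit:
            return False
        self.used += 1
        return True


def with_key(url, api_key):
    """Append the WebAPI key as the `key` query parameter."""
    separator = "&" if "?" in url else "?"
    return url + separator + urlencode({"key": api_key})


budget = RequestBudget()
if budget.spend(date.today()):
    print(with_key("http://www.geneprof.org/GeneProf/api/exp/list.json", "YOUR_KEY"))
```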

List of GeneProf Experiments (XML, JSON, TXT, RDATA)

URL(s): http://www.geneprof.org/api/exp/list.{FORMAT}
Summary: Use this web service to retrieve a list of GeneProf experiments.
Full Description: 'Experiments' are what GeneProf calls each individual data analysis project. An experiment typically consists of a set of input data (e.g. raw high-throughput sequencing reads), some experimental sample annotation, an analysis workflow and a selection of main outputs. Please check the manual for further information about experiments. This web service simply retrieves a list of all the experiments available in the database along with a range of metadata.
Required URL parameters:
{FORMAT} The file format requested, one of: json, xml, txt, rdata. N.B. the txt and rdata versions of the output report a flattened version of the experiment metadata and do not support any of the additional output parameters (with-ats, with-samples, etc.)!
Additional query parameters:
Parameter Default Value Description
with-ats false Include descriptions for all datasets' annotation types (data columns).
with-samples false Include information about the sample annotation per experiment.
with-inputs false Include a listing of all input datasets per experiment.
with-outputs false Include a listing of the main output datasets per experiment.
with-workflow false Include the analysis workflow per experiment.
with-all-data false Include ALL datasets linked with the experiment (large response!)
only-user-experiments false List only experiments owned by the user identified by the WebAPI key.
key N/A An optional WebAPI key, required to access non-public data.
Examples:
Retrieve a concise list of all experiments as XML: http://www.geneprof.org/GeneProf/api/exp/list.xml
Retrieve a list of all experiments including their main outputs as JSON: http://www.geneprof.org/GeneProf/api/exp/list.json?with-outputs=true
Retrieve a concise list of all experiments as plain text: http://www.geneprof.org/GeneProf/api/exp/list.txt
Retrieve a concise list of all experiments for R: http://www.geneprof.org/GeneProf/api/exp/list.rdata
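The restriction that txt and rdata output ignores the additional output parameters is easy to overlook, so a client might check for it before sending a request. Below is a minimal sketch of such a check; the function name experiment_list_url is hypothetical and the base URL is assumed from the examples above.

```python
from urllib.parse import urlencode

# Assumption: base URL as used by the example links above.
BASE_URL = "http://www.geneprof.org/GeneProf/api"

# Boolean query parameters listed in the table above.
FLAGS = {
    "with-ats", "with-samples", "with-inputs", "with-outputs",
    "with-workflow", "with-all-data", "only-user-experiments",
}
FLAT_FORMATS = {"txt", "rdata"}  # flattened output, no with-* parameters


def experiment_list_url(fmt="json", flags=(), key=None):
    """Build the list-experiments URL, rejecting combinations the text rules out."""
    unknown = set(flags) - FLAGS
    if unknown:
        raise ValueError(f"unknown parameters: {sorted(unknown)}")
    if fmt in FLAT_FORMATS and any(flag.startswith("with-") for flag in flags):
        raise ValueError(f"{fmt} output does not support the with-* parameters")
    params = [(flag, "true") for flag in flags]
    if key:
        params.append(("key", key))
    url = f"{BASE_URL}/exp/list.{fmt}"
    return url + ("?" + urlencode(params) if params else "")


print(experiment_list_url("json", ["with-outputs"]))
```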

Metadata about a GeneProf Experiment (XML, JSON, TXT, RDATA)

URL(s): http://www.geneprof.org/api/exp/{ID}.{FORMAT}
Summary: Use this web service to retrieve metadata (names, descriptions, IDs, references, etc) about GeneProf experiments.
Full Description: 'Experiments' are what GeneProf calls each individual data analysis project. An experiment typically consists of a set of input data (e.g. raw high-throughput sequencing reads), some experimental sample annotation, an analysis workflow and a selection of main outputs. Please check the manual for further information about experiments. This web service retrieves metadata about a specific GeneProf experiment given the experiment's accession ID (a string of the form gpXP_XXXXXX).
Required URL parameters:
Parameter Description
{ID} The identifier of the experiment of interest. Either the entire accession ID (e.g. gpXP_000003) or just the numeric part (e.g. 3).
{FORMAT} The file format requested, one of: json, xml, txt, rdata. N.B. the txt and rdata versions of the output report a flattened version of the experiment metadata and do not support any of the additional output parameters (with-ats, with-samples, etc.)!
Additional query parameters:
Parameter Default Value Description
with-ats false Include descriptions for all datasets' annotation types (data columns).
with-samples false Include information about the sample annotation per experiment.
with-inputs false Include a listing of all input datasets per experiment.
with-outputs false Include a listing of the main output datasets per experiment.
with-workflow false Include the analysis workflow per experiment.
with-all-data false Include ALL datasets linked with the experiment (large response!)
key N/A An optional WebAPI key, required to access non-public data.
Examples:
Retrieve basic metadata about experiment gpXP_000385 as XML: http://www.geneprof.org/GeneProf/api/exp/385.xml
Retrieve metadata including the analysis workflow for experiment gpXP_000023 as JSON: http://www.geneprof.org/GeneProf/api/exp/gpXP_000023.json?with-workflow=true
Retrieve basic metadata about experiment gpXP_000385 as a plain text file: http://www.geneprof.org/GeneProf/api/exp/385.txt
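Since the service accepts either the full accession ID or just its numeric part, client code that stores or compares IDs may want to normalise them to one form. A small sketch follows, assuming the six-digit, zero-padded form suggested by gpXP_XXXXXX; the function name experiment_accession is hypothetical.

```python
import re


def experiment_accession(identifier):
    """Return the full accession (gpXP_ plus six digits) for either accepted form."""
    match = re.fullmatch(r"(?:gpXP_)?(\d{1,6})", str(identifier))
    if match is None:
        raise ValueError(f"not a GeneProf experiment ID: {identifier!r}")
    # Assumption: the numeric part is zero-padded to six digits.
    return f"gpXP_{int(match.group(1)):06d}"


print(experiment_accession(3))
print(experiment_accession("gpXP_000385"))
```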

Metadata about a GeneProf Dataset (XML, JSON, TXT, RDATA)

URL(s): http://www.geneprof.org/api/ds/{ID}.{FORMAT}
Summary: Use this web service to retrieve metadata (names, descriptions, IDs, etc) about GeneProf datasets.
Full Description: 'Datasets', in GeneProf, are collections of data of the same type generated as the output of a component of a data analysis workflow. There are six generic types of datasets: FILE, SEQUENCES, GENOMIC_REGIONS, FEATURES, REFERENCE and SPECIAL. Please check the manual for further information about datasets. This web service retrieves metadata about a specific GeneProf dataset given the dataset's accession ID (a string of the form gpDS_XXX_XXX_XXX_XXX).
Required URL parameters:
Parameter Description
{ID} The identifier of the dataset of interest. Either the entire accession ID (e.g. gpDS_11_385_44_1) or just the dataset-specific part (e.g. 11_385_44_1).
{FORMAT} The file format requested, one of: json, xml, txt, rdata. N.B. the txt and rdata versions of the output report a flattened version of the dataset metadata and do not support any of the additional output parameters (with-ats)!
Additional query parameters:
Parameter Default Value Description
with-ats false Include descriptions for all datasets' annotation types (data columns).
key N/A An optional WebAPI key, required to access non-public data.
Examples:
Retrieve metadata about the dataset gpDS_11_385_44_1 as XML: http://www.geneprof.org/GeneProf/api/ds/gpDS_11_385_44_1.xml
Retrieve metadata about the dataset gpDS_11_12_122_1 as JSON: http://www.geneprof.org/GeneProf/api/ds/11_12_122_1.json?with-ats=true

List of Public Reference Datasets (XML, JSON, TXT, RDATA)

URL(s): http://www.geneprof.org/api/ds/pubref.{FORMAT}
Summary: Use this web service to retrieve a list of public GeneProf-recommended reference datasets.
Full Description: GeneProf provides a number of recommended reference datasets for several organisms (human, mouse, rat, etc.). These reference datasets provide genomic sequence assemblies and genic annotations that serve as a scaffold for GeneProf's analyses, so most of GeneProf's datasets are based on one of these reference datasets. This web service simply retrieves a list of all the public, recommended reference datasets currently available in the database.
Required URL parameters:
{FORMAT} The file format requested, one of: json, xml, txt, rdata. N.B. the txt and rdata versions of the output report a flattened version of the dataset metadata and omit some information available in the other formats!
Examples:
Retrieve a list of all reference datasets as XML: http://www.geneprof.org/GeneProf/api/ds/pubref.xml
Retrieve a list of all reference datasets as JSON: http://www.geneprof.org/GeneProf/api/ds/pubref.json
Retrieve a list of all reference datasets as plain text: http://www.geneprof.org/GeneProf/api/ds/pubref.txt

List of Public Experiment Samples (XML, JSON, TXT, RDATA)

URL(s): http://www.geneprof.org/api/gene.info/list.samples/{REF}.{FORMAT}
Summary: Use this web service to retrieve a list of public experiment samples for a GeneProf-recommended reference dataset.
Full Description: All public data in the GeneProf databases has been annotated with the biological sample of origin, described in terms of cell type, tissue, treatment, and so on. This web service simply retrieves a list of all the public sample annotations in the database for a specific reference dataset (see the List Public Reference Datasets service).
Required URL parameters:
{REF} The identifier of a public GeneProf reference dataset. You may use aliases here. Check the list public references service for all available reference datasets.
{FORMAT} The file format requested, one of: json, xml, txt, rdata.
Examples:
Retrieve a list of all samples for mouse as XML: http://www.geneprof.org/GeneProf/api/gene.info/list.samples/mouse.xml
Retrieve a list of all samples for human as JSON: http://www.geneprof.org/GeneProf/api/gene.info/list.samples/human.json
Retrieve a list of all samples for human as tab-delimited text: http://www.geneprof.org/GeneProf/api/gene.info/list.samples/human.txt
Retrieve a list of all samples for human as RData: http://www.geneprof.org/GeneProf/api/gene.info/list.samples/human.rdata

Search Genes (XML, JSON, TXT, RDATA)

URL(s):
http://www.geneprof.org/api/search/gene/{QUERY}.{FORMAT}
Summary: Use this web service to search for genes using search terms against the genes' description, name and accession IDs.
Full Description: GeneProf uses well-defined sets of gene annotations based on those from Ensembl. Using this web service, you can search for genes of interest using arbitrarily complex search queries against the names and identifiers (from Ensembl, RefSeq and more) of those genes. The search results are categorised by the reference dataset the genes belong to (also see the List Public Reference Datasets service).
Required URL parameters:
Parameter Description
{QUERY} The search term to look for, e.g. a gene name or paper title. You can narrow down the fields to be searched by prefixing the query with a field name. Valid search fields for genes are: id, label, description, type and reference. You can also use boolean logic in your queries using the keywords AND and OR, brackets, and quotes (") for exact matches of whole phrases. Advanced search options and examples are documented on GeneProf's search page.
{FORMAT} The file format requested, one of: json, xml, txt, rdata.
Additional query parameters:
Parameter Default Value Description
taxons N/A Only return matches from experiments dealing with organisms matching these NCBI taxonomy IDs (comma-separated list).
Examples:
Search for all genes matching the query 'sox2' (in XML format): http://www.geneprof.org/GeneProf/api/search/gene/sox2.xml
Search for all genes matching the query 'sox2' (in JSON format), but only in human (taxon 9606): http://www.geneprof.org/GeneProf/api/search/gene/sox2.json?taxons=9606
Search for genes matching both terms 'brca2' and 'cancer' and only those with reference 'mouse', in plain text format: http://www.geneprof.org/GeneProf/api/search/gene/brca2 AND cancer AND reference:mouse.txt
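Queries containing spaces, quotes or brackets generally need percent-encoding before an HTTP client will send them as part of a URL path. The sketch below shows one way to do this with Python's standard library. The function name search_url is hypothetical, the base URL is assumed from the examples, and whether the server expects exactly this encoding is an assumption.

```python
from urllib.parse import quote, urlencode

# Assumption: base URL as used by the example links above.
BASE_URL = "http://www.geneprof.org/GeneProf/api"
SEARCH_KINDS = {"gene", "experiment", "dataset", "sample"}


def search_url(kind, query, fmt="json", taxons=()):
    """Build a search URL with the query percent-encoded into the path."""
    if kind not in SEARCH_KINDS:
        raise ValueError(f"unknown search type: {kind}")
    # Encode spaces, quotes and brackets; keep ':' readable for field prefixes.
    url = f"{BASE_URL}/search/{kind}/{quote(query, safe=':')}.{fmt}"
    if taxons:
        # taxons is a comma-separated list of NCBI taxonomy IDs.
        taxon_list = ",".join(str(taxon) for taxon in taxons)
        url += "?" + urlencode({"taxons": taxon_list}, safe=",")
    return url


print(search_url("gene", "sox2", "json", taxons=[9606]))
print(search_url("gene", "brca2 AND cancer AND reference:mouse", "txt"))
```

The same function covers the other search services in this document, since they share the URL pattern and differ only in the path segment after search/.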

Search Experiments (XML, JSON, TXT, RDATA)

URL(s): http://www.geneprof.org/api/search/experiment/{QUERY}.{FORMAT}
Summary: Use this web service to search for experiments using search terms against the experiments' names, descriptions and citations.
Full Description: 'Experiments' are what GeneProf calls each individual data analysis project. An experiment typically consists of a set of input data (e.g. raw high-throughput sequencing reads), some experimental sample annotation, an analysis workflow and a selection of main outputs. Please check the manual for further information about experiments. Using this web service, you can search for experiments of interest using arbitrarily complex search queries against the names, descriptions, linked citations, linked reference dataset, and so on of those experiments. The search results are categorised by the reference dataset the experiments belong to (also see the List Public Reference Datasets service).
Required URL parameters:
Parameter Description
{QUERY} The search term to look for, e.g. a gene name or paper title. You can narrow down the fields to be searched by prefixing the query with a field name. Valid search fields for experiments are: id, label, description, type, reference, user, dataset, citation, platform and sample. You can also use boolean logic in your queries using the keywords AND and OR, brackets, and quotes (") for exact matches of whole phrases. Advanced search options and examples are documented on GeneProf's search page.
{FORMAT} The file format requested, one of: json, xml, txt, rdata.
Additional query parameters:
Parameter Default Value Description
taxons N/A Only return matches from experiments dealing with organisms matching these NCBI taxonomy IDs (comma-separated list).
Examples:
Search for experiments mentioning 'sox2' anywhere (in XML format): http://www.geneprof.org/GeneProf/api/search/experiment/sox2.xml
Search for experiments mentioning 'cancer' in their description (in JSON format): http://www.geneprof.org/GeneProf/api/search/experiment/Summary:cancer.json
Search for experiments mentioning 'cell stem cell' in a linked citation (in plain text format): http://www.geneprof.org/GeneProf/api/search/experiment/citation:("cell stem cell").txt

Search Datasets (XML, JSON, TXT, RDATA)

URL(s):
http://www.geneprof.org/api/search/dataset/{QUERY}.{FORMAT}
Summary: Use this web service to search for datasets using search terms against the dataset name.
Full Description: 'Datasets', in GeneProf, are collections of data of the same type generated as the output of a component of a data analysis workflow. There are six generic types of datasets: FILE, SEQUENCES, GENOMIC_REGIONS, FEATURES, REFERENCE and SPECIAL. Please check the manual for further information about datasets. Using this web service, you can search for datasets of interest using arbitrarily complex search queries against the names and types of these datasets.
Required URL parameters:
Parameter Description
{QUERY} The search term to look for, e.g. a gene name or cell type. You can narrow down the fields to be searched by prefixing the query with a field name. Valid search fields for datasets are: id, label, description, datatype, user and experiment. You can also use boolean logic in your queries using the keywords AND and OR, brackets, and quotes (") for exact matches of whole phrases. Advanced search options and examples are documented on GeneProf's search page.
{FORMAT} The file format requested, one of: json, xml, txt, rdata.
Additional query parameters:
Parameter Default Value Description
taxons N/A Only return matches from experiments dealing with organisms matching these NCBI taxonomy IDs (comma-separated list).
Examples:
Search for datasets mentioning 'sox2' (in XML format): http://www.geneprof.org/GeneProf/api/search/dataset/sox2.xml
Search for datasets mentioning 'gene expression': http://www.geneprof.org/GeneProf/api/search/dataset/gene expression.json
Search for genomic data for 'sox2' in plain text format: http://www.geneprof.org/GeneProf/api/search/dataset/datatype:GENOMIC_REGIONS AND sox2.txt

Search Public Samples (XML, JSON, TXT, RDATA)

URL(s): http://www.geneprof.org/api/search/sample/{QUERY}.{FORMAT}
Summary: Use this web service to search for public experiment samples using search terms against their annotations.
Full Description: All public data in the GeneProf databases has been annotated with the biological sample of origin, described in terms of cell type, tissue, treatment, and so on. Using this web service, you can search for samples of interest using arbitrarily complex search queries against the annotations of these samples.
Required URL parameters:
Parameter Description
{QUERY} The search term to look for, e.g. a gene name or cell type. You can narrow down the fields to be searched by prefixing the query with a field name. Valid search fields for samples are: id, label, description, Age, Antibody, Cell_Line, Cell_Type, Description, Developmental_Stage, Gender, Gene, Label, Organism, Platform, Sample_Group, SRA_Accession, Strain, Time, Tissue, Treatment. You can also use boolean logic in your queries using the keywords AND and OR, brackets, and quotes (") for exact matches of whole phrases. Advanced search options and examples are documented on GeneProf's search page.
{FORMAT} The file format requested, one of: json, xml, txt, rdata.
Additional query parameters:
Parameter Default Value Description
taxons N/A Only return matches from experiments dealing with organisms matching these NCBI taxonomy IDs (comma-separated list).
Examples:
Search for samples annotated 'ChIP' in any of the default search fields (in XML format): http://www.geneprof.org/GeneProf/api/search/sample/ChIP.xml
Search for samples annotated with the gene 'sox2': http://www.geneprof.org/GeneProf/api/search/sample/Gene:sox2.json
Search for samples annotated 'human' in any of the default search fields in plain text format: http://www.geneprof.org/GeneProf/api/search/sample/human.txt

Get the GeneProf ID of a Gene (TXT, XML, JSON, RDATA)

URL(s):
http://www.geneprof.org/api/gene.info/gp.id/{REF}/{IDTYPE}/{ID}.{FORMAT}
Summary: Use this web service to find out the GeneProf ID of a certain gene.
Full Description: GeneProf uses well-defined sets of gene annotations based on those from Ensembl. Using this web service, you can get the GeneProf-internal ID of any gene in the reference annotation by matching it against an external name (official gene symbol) or one of the supported accession ID types (e.g. Ensembl Gene IDs, RefSeq IDs, etc. -- use the list ID types service to find out which types are supported for a dataset).
Required URL parameters:
Parameter Description
{ID} The external identifier or name of the gene to look up (e.g. an Ensembl Gene ID or a RefSeq ID).
{REF} The identifier of a public GeneProf reference dataset. You may use aliases here. Check the list public references service for all available reference datasets.
{IDTYPE} The identifier of an annotation column storing IDs or the term any to use any available identifier type. Check the list ID types service to find out which types are supported for a dataset.
{FORMAT} The file format requested, one of: json, txt, xml, rdata.
Examples:
Get the GeneProf ID of the mouse gene with Ensembl ID ENSMUSG00000059552, as plain text: http://www.geneprof.org/GeneProf/api/gene.info/gp.id/mouse/C_ENSG/ENSMUSG00000059552.txt
Get the GeneProf IDs of all human genes with RefSeq ID NM_005657, as JSON: http://www.geneprof.org/GeneProf/api/gene.info/gp.id/human/C_RSEQ/NM_005657.json
Get the GeneProf IDs of all human genes with any ID matching "NM_005657" (should, in this case, be same as the previous query), as XML: http://www.geneprof.org/GeneProf/api/gene.info/gp.id/human/any/NM_005657.xml

Get an External ID/Name of a Gene (TXT, XML, JSON, RDATA)

URL(s):
http://www.geneprof.org/api/gene.info/external.id/{REF}/{ID}/{IDTYPE}.{FORMAT}
Summary: Use this web service to translate a GeneProf gene ID into an external identifier or name.
Full Description: GeneProf uses well-defined sets of gene annotations based on those from Ensembl. Using this web service, you can look up an external name (official gene symbol) or one of the supported accession ID types (e.g. Ensembl Gene IDs, RefSeq IDs, etc. -- use the list ID types service to find out which types are supported for a dataset) for any given internal GeneProf gene ID.
Required URL parameters:
Parameter Description
{ID} The GeneProf ID of a gene (an integer number).
{REF} The identifier of a public GeneProf reference dataset. You may use aliases here. Check the list public references service for all available reference datasets.
{IDTYPE} The identifier of an annotation column storing IDs. Check the list ID types service to find out which types are supported for a dataset.
{FORMAT} The file format requested, one of: json, txt, xml, rdata.
Additional query parameters:
Parameter Default Value Description
quote (empty) For plain text output only: Enclose all IDs in this sort of quote (e.g. double quote: ").
Examples:
Get the Ensembl Gene ID(s) of the mouse gene #715, as plain text: http://www.geneprof.org/GeneProf/api/gene.info/external.id/mouse/715/C_ENSG.txt
Get the RefSeq ID(s) of the human gene #2981, as JSON: http://www.geneprof.org/GeneProf/api/gene.info/external.id/human/2981/C_RSEQ.json
Get the name(s) of the human gene #2981, as XML: http://www.geneprof.org/GeneProf/api/gene.info/external.id/human/2981/C_NAME.xml

List the supported ID types for a Dataset (TXT, XML, JSON, RDATA)

URL(s):
http://www.geneprof.org/api/gene.info/list.id.types/{REF}.{FORMAT}
Summary: Use this web service to list all the ID types available for a dataset.
Full Description: GeneProf reference datasets provide a number of alternative ID annotations (e.g. Ensembl Gene IDs, RefSeq IDs, UniGene IDs, etc.) for each of the genes in the reference annotation. This service simply lists all the ID types available for a dataset.
Required URL parameters:
Parameter Description
{REF} The identifier of a public GeneProf reference dataset. You may use aliases here. Check the list public references service for all available reference datasets.
{FORMAT} The file format requested, one of: json, txt, xml, rdata.
Examples:
List all the ID types for the mouse reference dataset, as plain text: http://www.geneprof.org/GeneProf/api/gene.info/list.id.types/mouse.txt
List all the ID types for the human reference dataset, as JSON: http://www.geneprof.org/GeneProf/api/gene.info/list.id.types/human.json

Get Gene Expression Values for a Gene (TXT, XML, JSON, RDATA)

URL(s):
http://www.geneprof.org/api/gene.info/expression/{REF}/{ID}.{FORMAT}
Summary: Use this web service to retrieve gene expression values for a gene based on public RNA-seq data in the GeneProf databases.
Full description: GeneProf's databases contain many pre-calculated gene expression values stemming from a reanalysis of a large collection of RNA-seq (and similar) experiments. You can use this web service to retrieve all the expression values for a single gene of interest by giving the name of the reference dataset the gene belongs to and its internal GeneProf gene ID -- use the list reference datasets, get GeneProf ID and/or search genes services to look up these identifiers. You may retrieve the values either as raw read counts (the total number of short reads that were aligned to the gene's locus), RPM (reads per million -- the raw counts rescaled to account for differences in library size) or RPKM (reads per kilobase per million -- like RPM, but also accounting for transcript length bias). All gene expression values have been calculated using the Calculate Gene Expression module. Full details for the analysis pipeline that was used to calculate each value are available from the individual experiments the values come from (the JSON and XML output contain a link to the experiment of origin).
Required URL parameters:
Parameter Description
{ID} The GeneProf ID of a gene (an integer number).
{REF} The identifier of a public GeneProf reference dataset. You may use aliases here. Check the list public references service for all available reference datasets.
{FORMAT} The file format requested, one of: json, txt, xml, rdata.
Additional query parameters:
Parameter Default Value Description
type RPKM The type of values to obtain, one of: RAW | RPM | RPKM
with-sample-info false Include additional annotations about the tissue, cell type, etc. of the expression values.
Examples:
Retrieve gene expression values for the mouse gene #715 in JSON format, including additional annotation data: http://www.geneprof.org/GeneProf/api/gene.info/expression/mouse/715.json?with-sample-info=true
Retrieve raw read count values for the mouse gene #715 in XML format: http://www.geneprof.org/GeneProf/api/gene.info/expression/mouse/715.xml?type=RAW
Retrieve gene expression values for the mouse gene #715 as a tab-delimited text file, including additional annotation data: http://www.geneprof.org/GeneProf/api/gene.info/expression/mouse/715.txt?with-sample-info=true
Retrieve gene expression values for the mouse gene #715 as an RData file, including additional annotation data: http://www.geneprof.org/GeneProf/api/gene.info/expression/mouse/715.rdata?with-sample-info=true
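To make the relationship between the three value types concrete, the following sketch applies the commonly used definitions of RPM and RPKM to a raw count. It is an illustration only: how the Calculate Gene Expression module determines library size and transcript length is not described here, and the numbers are made up for the example.

```python
def rpm(raw_count, library_size):
    """Reads per million: the raw count rescaled by the total number of reads."""
    return raw_count * 1_000_000 / library_size


def rpkm(raw_count, library_size, transcript_length_bp):
    """Reads per kilobase per million: RPM further divided by length in kilobases."""
    return rpm(raw_count, library_size) * 1_000 / transcript_length_bp


# Made-up numbers: 500 reads on a 2,500 bp transcript in a library of 20 million reads.
print(rpm(500, 20_000_000))          # 25.0
print(rpkm(500, 20_000_000, 2_500))  # 10.0
```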

Get Targets of a Transcription Factor (TXT, XML, JSON, RDATA)

URL(s):
http://www.geneprof.org/api/gene.info/regulation/binary/by.gene/{REF}/{ID}.{FORMAT}
Summary: Use this web service to retrieve putative target genes for a transcription factor (or other transcriptional regulator) based on public ChIP-seq data in the GeneProf databases by querying for the targets discovered in all available ChIP-seq experiments (identified by the ID of a gene).
Full description: GeneProf's databases contain lots of information about putative gene regulatory interactions from a reanalysis of a large collection of ChIP-seq experiments. You can use this web service to retrieve a list of putative target genes for a transcription factor (TF) or other DNA-binding protein, by giving the name of the reference dataset the TF gene belongs to and its internal GeneProf gene ID -- use the list reference datasets, get GeneProf ID and/or search genes services to look up these identifiers. The assignment of putative target genes to TFs has been done by calling enriched binding peaks on the aligned ChIP-seq reads using MACS and subsequently assigning the peaks to target genes if they were within a permissible window of the transcription start site (as by current wizard default: 20kb up- and 1kb down-stream of the TSS; in an upcoming release of the web service, you will be able to redefine these thresholds dynamically, so watch this space!). The GeneProf workflow modules corresponding to these two steps are documented here: Find Peaks with MACS and Map Regions to Genes. Full details for the analysis pipeline that was used to calculate each value are available from the individual experiments the values come from (the JSON and XML output contain a link to the experiment of origin). For some TFs there might be more than one dataset available, in which case the output returned by the web service will contain the status in all available datasets (distinguished by the experimental sample they belong to, see list public samples service).
Required URL parameters:
Parameter Description
{ID} The GeneProf ID of a gene/feature (an integer number).
{REF} The identifier of a public GeneProf reference dataset. You may use aliases here. Check the list public references service for all available reference datasets.
{FORMAT} The file format requested, one of: json, txt, xml, rdata.
Additional query parameters:
Parameter Default Value Description
ats C_NAME A selection of column IDs (from the reference) to be included in the output.
include-unbound false Include not only putative target genes in the output, but also those genes that show no evidence of regulation.
Examples:
Get all the putative targets of the mouse TF Smad1 in JSON format: http://www.geneprof.org/GeneProf/api/gene.info/regulation/binary/by.gene/mouse/9885.json
Get all the putative targets of the human TF MEIS1 in XML format, also include unbound genes for comparison: http://www.geneprof.org/GeneProf/api/gene.info/regulation/binary/by.gene/human/36958.xml?include-unbound=true
Get all the putative targets of the mouse TF Nanog as tab-delimited text and include a column for gene name and Ensembl ID (there are TWO ChIP-seq datasets available for this TF!): http://www.geneprof.org/GeneProf/api/gene.info/regulation/binary/by.gene/mouse/14899.txt?ats=C_NAME,C_ENSG
Get all the putative targets of the human TF MEIS1 as an RData file: http://www.geneprof.org/GeneProf/api/gene.info/regulation/binary/by.gene/human/36958.rdata
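The window rule used for assigning peaks to genes (20kb upstream and 1kb downstream of the TSS) can be illustrated as follows. This is a simplified sketch and not the Map Regions to Genes module itself: it assumes each peak is reduced to a single position, that window boundaries are inclusive, and that "upstream" is interpreted relative to the gene's strand. The function name is hypothetical.

```python
def is_within_tss_window(peak_pos, tss, strand, upstream=20_000, downstream=1_000):
    """Return True if a peak position falls inside the window around the TSS.

    strand is '+' or '-'; distances are in base pairs.
    """
    # Offset along the direction of transcription:
    # negative values are upstream of the TSS, positive values downstream.
    offset = peak_pos - tss if strand == "+" else tss - peak_pos
    return -upstream <= offset <= downstream


# Forward-strand gene with its TSS at position 100,000.
print(is_within_tss_window(85_000, 100_000, "+"))   # 15kb upstream
print(is_within_tss_window(102_000, 100_000, "+"))  # 2kb downstream
```

For a gene on the reverse strand, upstream positions have larger coordinates than the TSS, which is why the offset is mirrored.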

Get Targets by Experiment Sample (TXT, XML, JSON, RDATA)

URL(s):
http://www.geneprof.org/api/gene.info/regulation/binary/by.sample/{REF}/{ID}.{FORMAT}
Summary: Use this web service to retrieve putative target genes for a transcription factor (or other transcriptional regulator) based on public ChIP-seq data in the GeneProf databases by querying for the targets discovered in a specific ChIP-seq experiment (identified by the ID of a public sample).
Full description: GeneProf's databases contain lots of information about putative gene regulatory interactions from a reanalysis of a large collection of ChIP-seq experiments. You can use this web service to retrieve a list of putative target genes for a transcription factor (TF) or other DNA-binding protein (incl. histone modifications), by giving the identifier of a public GeneProf sample -- use the list public samples or the search public samples service to look up these identifiers. The assignment of putative target genes to TFs has been done by calling enriched binding peaks on the aligned ChIP-seq reads using MACS and subsequently assigning the peaks to target genes if they were within a permissible window of the transcription start site (as by current wizard default: 20kb up- and 1kb down-stream of the TSS; in an upcoming release of the web service, you will be able to redefine these thresholds dynamically, so watch this space!). The GeneProf workflow modules corresponding to these two steps are documented here: Find Peaks with MACS and Map Regions to Genes. Full details for the analysis pipeline that was used to calculate each value are available from the individual experiments the values come from (the JSON and XML output contain a link to the experiment of origin).
Required URL parameters:
Parameter Description
{ID} The GeneProf ID of a public sample (an integer number).
{REF} The identifier of a public GeneProf reference dataset. You may use aliases here. Check the list public references service for all available reference datasets.
{FORMAT} The requested file format, one of: json, txt, xml, rdata.
Additional query parameters:
Parameter Default Value Description
ats C_NAME A selection of column IDs (from the reference) to be included in the output.
include-unbound false Include not only putative target genes in the output, but also those genes that show no evidence of regulation.
Examples:
Get all the putative targets of the mouse TF Smad1 in JSON format: http://www.geneprof.org/GeneProf/api/gene.info/regulation/binary/by.sample/mouse/541.json
Get all the putative targets of the human TF MEIS1 in XML format, also include unbound genes for comparison: http://www.geneprof.org/GeneProf/api/gene.info/regulation/binary/by.sample/human/784.xml?include-unbound=true
Get all the putative targets of the mouse TF Smad1 as tab-delimited text and include a column for gene name and Ensembl ID: http://www.geneprof.org/GeneProf/api/gene.info/regulation/binary/by.sample/mouse/541.txt?ats=C_NAME,C_ENSG
Get all the putative targets of the human TF MEIS1 as an RData file: http://www.geneprof.org/GeneProf/api/gene.info/regulation/binary/by.sample/human/784.rdata
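
The window rule described above (20kb upstream and 1kb downstream of the TSS) can be illustrated with a small Python sketch. It is not GeneProf's implementation: the Gene class, the function name and the use of a single peak position (e.g. the peak midpoint) are assumptions made for illustration, and the Map Regions to Genes module may measure distances differently.

```python
from dataclasses import dataclass

UPSTREAM_BP = 20_000   # upstream window, as quoted in the description above
DOWNSTREAM_BP = 1_000  # downstream window, as quoted in the description above


@dataclass
class Gene:
    """Hypothetical minimal gene record used only for this sketch."""
    name: str
    chrom: str
    tss: int     # coordinate of the transcription start site
    strand: str  # "+" or "-"


def peak_targets_gene(peak_chrom, peak_pos, gene,
                      upstream=UPSTREAM_BP, downstream=DOWNSTREAM_BP):
    """Return True if a peak position lies inside the gene's TSS window."""
    if peak_chrom != gene.chrom:
        return False
    # Signed offset from the TSS; positive means downstream in the
    # direction of transcription, so the sign flips on the minus strand.
    offset = peak_pos - gene.tss
    if gene.strand == "-":
        offset = -offset
    return -upstream <= offset <= downstream


plus_gene = Gene("geneA", "1", 100_000, "+")
print(peak_targets_gene("1", 85_000, plus_gene))   # 15kb upstream
print(peak_targets_gene("1", 102_000, plus_gene))  # 2kb downstream
```

Because the window is asymmetric, the strand matters: the same genomic offset can be "upstream" for one gene and "downstream" for a gene on the opposite strand.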

Get TFAS of a Transcription Factor (TXT, XML, JSON, RDATA)

URL(s):
http://www.geneprof.org/api/gene.info/regulation/tfas/by.gene/{REF}/{ID}.{FORMAT}
Summary: Use this web service to retrieve transcription factor association strength (TFAS) scores for a transcription factor (or other transcriptional regulator) based on public ChIP-seq data in the GeneProf databases by querying for the data in all available ChIP-seq experiments (identified by the ID of a gene).
Full description: GeneProf's databases contain lots of information about putative gene regulatory interactions from a reanalysis of a large collection of ChIP-seq experiments. Use this web service to retrieve a list of TFAS scores for a transcription factor (TF) or other DNA-binding protein, by giving the name of the reference dataset the TF gene belongs to and its internal GeneProf gene ID -- use the list reference datasets, get GeneProf ID and/or search genes services to look up these identifiers. 'TFAS' (= transcription factor association strength) scores are continuous values that give an indication of how strongly a transcription factor (or other DNA-binding protein) is associated with a target gene. The TFAS is calculated as a function of the intensity and the distance of all binding sites (ChIP-seq peaks) near a gene; for details, please refer to the publication by Ouyang et al. (PubMed: 19995984). As the intensity score, we use the fold-change enrichment of the ChIP-seq signal over the control background, as calculated by MACS when calling peaks for the input ChIP-seq data. The GeneProf workflow modules corresponding to these two steps are documented here: Find Peaks with MACS and Calculate TFAS. Full details for the analysis pipeline that was used to calculate each value are available from the individual experiments the values come from (the JSON and XML output contain a link to the experiment of origin). For some TFs there might be more than one dataset available, in which case the output returned by the web service will contain the scores in all available datasets (distinguished by the experimental sample they belong to, see list public samples service).
Required URL parameters:
Parameter Description
{ID} The GeneProf ID of a gene/feature (an integer number).
{REF} The identifier of a public GeneProf reference dataset. You may use aliases here. Check the list public references service for all available reference datasets.
{FORMAT} The requested file format, one of: json, txt, xml, rdata.
Additional query parameters:
Parameter Default Value Description
ats C_NAME A selection of column IDs (from the reference) to be included in the output.
Examples:
Get all TFAS scores for the mouse TF Smad1 in JSON format: http://www.geneprof.org/GeneProf/api/gene.info/regulation/tfas/by.gene/mouse/9885.json
Get all TFAS scores for the human TF MEIS1 in XML format, also include unbound genes for comparison: http://www.geneprof.org/GeneProf/api/gene.info/regulation/tfas/by.gene/human/36958.xml?include-unbound=true
Get all TFAS scores for the mouse TF Nanog as tab-delimited text and include a column for gene name and Ensembl ID (there are TWO ChIP-seq datasets available for this TF!): http://www.geneprof.org/GeneProf/api/gene.info/regulation/tfas/by.gene/mouse/14899.txt?ats=C_NAME,C_ENSG
Get all TFAS scores for the human TF MEIS1 as an RData file: http://www.geneprof.org/GeneProf/api/gene.info/regulation/tfas/by.gene/human/36958.rdata
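
To give a feel for what "a function of the intensity and the distance of all binding sites" means, here is a rough Python sketch of a distance-weighted sum of peak intensities. It is an illustration only: the exponential decay form follows our reading of the cited publication, the function name tfas and the default d0 of 5000 bp are hypothetical, and GeneProf's Calculate TFAS module may differ in its details.

```python
import math


def tfas(peaks, tss, d0=5000.0):
    """Sum peak intensities, down-weighted exponentially by distance to the TSS.

    peaks: iterable of (position, intensity) pairs for one factor near one gene.
    tss:   coordinate of the gene's transcription start site.
    d0:    characteristic decay distance in bp (assumed value, not GeneProf's).
    """
    return sum(g * math.exp(-abs(pos - tss) / d0) for pos, g in peaks)


# A strong peak right at the TSS contributes its full intensity,
# while a peak 5kb away contributes about 37% (1/e) of its intensity.
score = tfas([(100_000, 12.0), (105_000, 8.0)], tss=100_000)
print(round(score, 3))
```

The practical consequence is that a weak peak close to the TSS can outweigh a strong peak far away, which is why TFAS scores are continuous rather than binary.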

Get TFAS by Experiment Sample (TXT, XML, JSON, RDATA)

URL(s):
http://www.geneprof.org/api/gene.info/regulation/tfas/by.sample/{REF}/{ID}.{FORMAT}
Summary: Use this web service to retrieve transcription factor association strength (TFAS) scores for a transcription factor (or other transcriptional regulator) based on public ChIP-seq data in the GeneProf databases by querying for data in a specific ChIP-seq experiment (identified by the ID of a public sample).
Full description: GeneProf's databases contain lots of information about putative gene regulatory interactions from a reanalysis of a large collection of ChIP-seq experiments. Use this web service to retrieve a list of TFAS scores for a transcription factor (TF) or other DNA-binding protein, by giving the identifier of a public GeneProf sample -- use the list public samples or the search public samples service to look up these identifiers. 'TFAS' (= transcription factor association strength) scores are continuous values that give an indication of how strongly a transcription factor (or other DNA-binding protein) is associated with a target gene. The TFAS is calculated as a function of the intensity and the distance of all binding sites (ChIP-seq peaks) near a gene; for details, please refer to the publication by Ouyang et al. (PubMed: 19995984). As the intensity score, we use the fold-change enrichment of the ChIP-seq signal over the control background, as calculated by MACS when calling peaks for the input ChIP-seq data. The GeneProf workflow modules corresponding to these two steps are documented here: Find Peaks with MACS and Calculate TFAS. Full details for the analysis pipeline that was used to calculate each value are available from the individual experiments the values come from (the JSON and XML output contain a link to the experiment of origin).
Required URL parameters:
Parameter Description
{ID} The GeneProf ID of a public sample (an integer number).
{REF} The identifier of a public GeneProf reference dataset. You may use aliases here. Check the list public references service for all available reference datasets.
{FORMAT} The requested file format, one of: json, txt, xml, rdata.
Additional query parameters:
Parameter Default Value Description
ats C_NAME A selection of column IDs (from the reference) to be included in the output.
Examples:
Get TFAS scores for the mouse TF Smad1 in JSON format: http://www.geneprof.org/GeneProf/api/gene.info/regulation/tfas/by.sample/mouse/541.json
Get TFAS scores for the human TF MEIS1 in XML format, also include unbound genes for comparison: http://www.geneprof.org/GeneProf/api/gene.info/regulation/tfas/by.sample/human/784.xml?include-unbound=true
Get TFAS scores for the mouse TF Smad1 as tab-delimited text and include a column for gene name and Ensembl ID: http://www.geneprof.org/GeneProf/api/gene.info/regulation/tfas/by.sample/mouse/541.txt?ats=C_NAME,C_ENSG
Get TFAS scores for the human TF MEIS1 as an RData file: http://www.geneprof.org/GeneProf/api/gene.info/regulation/tfas/by.sample/human/784.rdata

Get Transcription Factors by Target Gene (TXT, XML, JSON, RDATA)

URL(s):
http://www.geneprof.org/api/gene.info/regulation/binary/by.target/{REF}/{ID}.{FORMAT}
Summary: Use this web service to retrieve transcription factors (and other regulatory inputs) putatively targeting a specific gene, based on public ChIP-seq data in the GeneProf databases.
Full description: GeneProf's databases contain lots of information about putative gene regulatory interactions from a reanalysis of a large collection of ChIP-seq experiments. Use this web service to retrieve a list of transcription factors and other DNA-binding proteins that might possibly be regulating a gene of interest, by giving the name of the reference dataset the gene belongs to and its internal GeneProf gene ID -- use the list reference datasets, get GeneProf ID and/or search genes services to look up these identifiers. The assignment of putative target genes to TFs has been done by calling enriched binding peaks on the aligned ChIP-seq reads using MACS and subsequently assigning the peaks to target genes if they were within a permissible window of the transcription start site (as per the current wizard default: 20kb up- and 1kb down-stream of the TSS; in an upcoming release of the web service, you will be able to redefine these thresholds dynamically, so watch this space!). The GeneProf workflow modules corresponding to these two steps are documented here: Find Peaks with MACS and Map Regions to Genes. Full details for the analysis pipeline that was used to calculate each value are available from the individual experiments the values come from (the JSON and XML output contain a link to the experiment of origin).
Required URL parameters:
Parameter Description
{ID} The GeneProf ID of a gene (an integer number).
{REF} The identifier of a public GeneProf reference dataset. You may use aliases here. Check the list public references service for all available reference datasets.
{FORMAT} The requested file format, one of: json, txt, xml, rdata.
Additional query parameters:
Parameter Default Value Description
with-sample-info false Include additional annotations about the tissue, cell type, etc. of the expression values.
Examples:
Get information about factors putatively targeting gene #715 in JSON format, including additional annotation data: http://www.geneprof.org/GeneProf/api/gene.info/regulation/binary/by.target/mouse/715.json?with-sample-info=true
Get information about factors putatively targeting gene #715 in XML format, including additional annotation data: http://www.geneprof.org/GeneProf/api/gene.info/regulation/binary/by.target/mouse/715.xml?with-sample-info=true
Get information about factors putatively targeting gene #715 as a tab-delimited text file: http://www.geneprof.org/GeneProf/api/gene.info/regulation/binary/by.target/mouse/715.txt
Get information about factors putatively targeting gene #715 as an RData file, including additional annotation data: http://www.geneprof.org/GeneProf/api/gene.info/regulation/binary/by.target/mouse/715.rdata?with-sample-info=true

Get TFAS Scores by Target Gene (TXT, XML, JSON, RDATA)

URL(s):
http://www.geneprof.org/api/gene.info/regulation/tfas/by.target/{REF}/{ID}.{FORMAT}
Summary: Use this web service to retrieve transcription factor association strength (TFAS) scores between transcription factors (and other regulatory inputs) and a specific target gene of interest, based on public ChIP-seq data in the GeneProf databases.
Full description: GeneProf's databases contain lots of information about putative gene regulatory interactions from a reanalysis of a large collection of ChIP-seq experiments. Use this web service to retrieve a list of TFAS scores quantitating the association between transcription factors (TFs) and other DNA-binding proteins and a gene of interest, by giving the name of the reference dataset the gene belongs to and its internal GeneProf gene ID -- use the list reference datasets, get GeneProf ID and/or search genes services to look up these identifiers. 'TFAS' (= transcription factor association strength) scores are continuous values that give an indication of how strongly a transcription factor (or other DNA-binding protein) is associated with a target gene. The TFAS is calculated as a function of the intensity and the distance of all binding sites (ChIP-seq peaks) near a gene; for details, please refer to the publication by Ouyang et al. (PubMed: 19995984). As the intensity score, we use the fold-change enrichment of the ChIP-seq signal over the control background, as calculated by MACS when calling peaks for the input ChIP-seq data. The GeneProf workflow modules corresponding to these two steps are documented here: Find Peaks with MACS and Calculate TFAS. Full details for the analysis pipeline that was used to calculate each value are available from the individual experiments the values come from (the JSON and XML output contain a link to the experiment of origin).
Required URL parameters:
Parameter Description
{ID} The GeneProf ID of a gene (an integer number).
{REF} The identifier of a public GeneProf reference dataset. You may use aliases here. Check the list public references service for all available reference datasets.
{FORMAT} The requested file format, one of: json, txt, xml, rdata.
Additional query parameters:
Parameter Default Value Description
with-sample-info false Include additional annotations about the tissue, cell type, etc. of the expression values.
Examples:
Get TFAS scores to gene #715 in JSON format, including additional annotation data: http://www.geneprof.org/GeneProf/api/gene.info/regulation/tfas/by.target/mouse/715.json?with-sample-info=true
Get TFAS scores to gene #715 in XML format, including additional annotation data: http://www.geneprof.org/GeneProf/api/gene.info/regulation/tfas/by.target/mouse/715.xml?with-sample-info=true
Get TFAS scores to gene #715 as a tab-delimited text file: http://www.geneprof.org/GeneProf/api/gene.info/regulation/tfas/by.target/mouse/715.txt
Get TFAS scores to gene #715 as an RData file, including additional annotation data: http://www.geneprof.org/GeneProf/api/gene.info/regulation/tfas/by.target/mouse/715.rdata?with-sample-info=true
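
All of the regulation services above share one URL pattern, so the request URLs can be assembled programmatically. The following Python sketch shows one way to do this; the helper names regulation_url and _render are hypothetical, and the base URL is simply the prefix used in the examples above.

```python
from urllib.parse import urlencode

# Prefix taken from the example URLs above.
BASE = "http://www.geneprof.org/GeneProf/api"


def _render(value):
    """Convert a Python value to the textual form used in the query string."""
    if isinstance(value, bool):
        return "true" if value else "false"
    if isinstance(value, (list, tuple)):
        return ",".join(str(v) for v in value)
    return str(value)


def regulation_url(kind, by, ref, item_id, fmt, **params):
    """Build a URL for one of the gene regulation services.

    kind: "binary" (putative targets) or "tfas" (TFAS scores)
    by:   "by.gene", "by.sample" or "by.target"
    Keyword arguments become query parameters; underscores in their
    names are turned into hyphens (with_sample_info -> with-sample-info).
    """
    if kind not in ("binary", "tfas"):
        raise ValueError(f"unknown kind: {kind!r}")
    if by not in ("by.gene", "by.sample", "by.target"):
        raise ValueError(f"unknown lookup: {by!r}")
    url = f"{BASE}/gene.info/regulation/{kind}/{by}/{ref}/{item_id}.{fmt}"
    query = {name.replace("_", "-"): _render(value)
             for name, value in params.items() if value is not None}
    if query:
        # Keep commas readable, as in the ats=C_NAME,C_ENSG examples.
        url += "?" + urlencode(query, safe=",")
    return url


print(regulation_url("tfas", "by.target", "mouse", 715, "json",
                     with_sample_info=True))
```

The resulting string can be passed to any HTTP client, for instance urllib.request.urlopen from the Python standard library.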

Metadata about a User (XML, JSON)

URL(s): http://www.geneprof.org/api/usr/{ID}.{FORMAT}
Summary: Use this web service to retrieve metadata about a GeneProf user (name, email, user experiments, etc.). In the interest of privacy, the service can only be used to retrieve information about yourself.
Full description: This web service retrieves metadata about registered users of GeneProf. Other than personal details (name, email, etc.), this information contains a list of all experiments owned by the user. In order to not jeopardise the privacy of GeneProf users, we have restricted access to this servlet currently only to your own data and you will need an API key to make use of the service.
Required URL parameters:
Parameter Description
{ID} The identifier of the user of interest (works only with your own user ID, in the interest of privacy).
{FORMAT} The requested file format, one of: json, xml.
Additional query parameters:
Parameter Default Value Description
key N/A A valid WebAPI key. Required for this service.
Examples:
Retrieve metadata about yourself as XML: http://www.geneprof.org/GeneProf/api/usr/MY-USER-ID.xml?key=MY-API-KEY

Data as Plain Text Files (TXT)

URL(s):
http://www.geneprof.org/api/data/{ID}.txt
http://www.geneprof.org/api/data/{ID}.txt.gz
Summary: Use this web service to retrieve data from a GeneProf dataset as plain text (optionally compressed as GZIP). Maximum size of datasets without API key = 1,000,000, with API key = unlimited.
Full description: This web service retrieves the entire contents of an arbitrary GeneProf dataset as a tab-delimited, plain text file. The dataset of interest is identified by its GeneProf accession ID (something of the form gpDS_XXX_XXX_XXX_X). You can get a list of datasets belonging to a certain experiment of interest using the metadata for an experiment service, or you can use the search datasets service to query datasets globally. In order to avoid overloading of the GeneProf servers by anonymous requests, the maximum size of datasets retrieved without an API key is restricted to 1,000,000 entries. With an API key, the maximum size is unlimited.
Required URL parameters:
Parameter Description
{ID} The identifier of the dataset of interest. Either the entire accession ID (e.g. gpDS_11_385_44_1) or just the dataset-specific part (e.g. 11_385_44_1).
Additional query parameters:
Parameter Default Value Description
ats (default displayed columns) A selection of column IDs to be included in the output.
sep \t (TAB) Symbol to be used as a column separator. By default, the output will be a tab-separated text file.
key N/A An optional WebAPI key, required to access non-public data.
Examples:
Retrieve data from all visible columns of the dataset gpDS_11_119_18_1 (example RNA-seq data): http://www.geneprof.org/GeneProf/api/data/11_119_18_1.txt.gz
Retrieve only the Ensembl Gene IDs and RPKM values from the same dataset: http://www.geneprof.org/GeneProf/api/data/11_119_18_1.txt.gz?ats=C_ENSG,C_11_119_16_1_RPKM0,C_11_119_16_1_RPKM1,C_11_119_16_1_RPKM2,C_11_119_16_1_RPKM3
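
Once a .txt.gz file has been downloaded, it can be decompressed and parsed with the Python standard library alone. The sketch below assumes that the first line of the output is a header row with column names; this is not stated above, so please check it against a real response. The function name and the sample columns are made up for illustration.

```python
import csv
import gzip
import io


def read_gzipped_table(raw, sep="\t"):
    """Turn gzip-compressed, delimited text (as bytes) into a list of dicts.

    Assumes the first line holds the column names (an assumption of this
    sketch, not a documented guarantee of the web service).
    """
    text = gzip.decompress(raw).decode("utf-8")
    reader = csv.DictReader(io.StringIO(text), delimiter=sep)
    return list(reader)


# Stand-in for a downloaded response body; the columns are hypothetical.
sample = gzip.compress(b"Ensembl Gene ID\tRPKM\nENSMUSG00000000001\t12.5\n")
for row in read_gzipped_table(sample):
    print(row["Ensembl Gene ID"], float(row["RPKM"]))
```

For a real request, the bytes could come from urllib.request.urlopen(url).read(). If you pass a different sep parameter to the web service, pass the same character to the parser.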

Data as Spreadsheets (XLS)

URL(s):
http://www.geneprof.org/api/data/{ID}.xls
http://www.geneprof.org/api/data/{ID}.xls.gz
Summary: Use this web service to retrieve data from a GeneProf dataset as Excel-compatible spreadsheets (optionally compressed as GZIP). Maximum size of datasets without API key = 50,000, with API key = 50,000.
Full description: This web service retrieves the entire contents of an arbitrary GeneProf dataset as an Excel-compatible spreadsheet. The dataset of interest is identified by its GeneProf accession ID (something of the form gpDS_XXX_XXX_XXX_X). You can get a list of datasets belonging to a certain experiment of interest using the metadata for an experiment service, or you can use the search datasets service to query datasets globally. In order to avoid overloading of the GeneProf servers by anonymous requests and due to size restrictions of XLS documents, the maximum size of datasets retrieved is restricted to 50,000 entries.
Required URL parameters:
Parameter Description
{ID} The identifier of the dataset of interest. Either the entire accession ID (e.g. gpDS_11_385_44_1) or just the dataset-specific part (e.g. 11_385_44_1).
Additional query parameters:
Parameter Default Value Description
ats (default displayed columns) A selection of column IDs to be included in the output.
key N/A An optional WebAPI key, required to access non-public data.
Examples:
Retrieve data from all visible columns of the dataset gpDS_11_119_18_1 (example RNA-seq data): http://www.geneprof.org/GeneProf/api/data/11_119_18_1.xls.gz
Retrieve only the Ensembl Gene IDs and RPKM values from the same dataset: http://www.geneprof.org/GeneProf/api/data/11_119_18_1.xls.gz?ats=C_ENSG,C_11_119_16_1_RPKM0,C_11_119_16_1_RPKM1,C_11_119_16_1_RPKM2,C_11_119_16_1_RPKM3

Data as XML (XML)

URL(s):
http://www.geneprof.org/api/data/{ID}.xml
http://www.geneprof.org/api/data/{ID}.xml.gz
Summary: Use this web service to retrieve data from a GeneProf dataset as XML (optionally compressed as GZIP). Maximum size of datasets without API key = 1,000,000, with API key = unlimited.
Full description: This web service retrieves the entire contents of an arbitrary GeneProf dataset as a computer-readable XML file. The dataset of interest is identified by its GeneProf accession ID (something of the form gpDS_XXX_XXX_XXX_X). You can get a list of datasets belonging to a certain experiment of interest using the metadata for an experiment service, or you can use the search datasets service to query datasets globally. In order to avoid overloading of the GeneProf servers by anonymous requests, the maximum size of datasets retrieved without an API key is restricted to 1,000,000 entries. With an API key, the maximum size is unlimited.
Required URL parameters:
Parameter Description
{ID} The identifier of the dataset of interest. Either the entire accession ID (e.g. gpDS_11_385_44_1) or just the dataset-specific part (e.g. 11_385_44_1).
Additional query parameters:
Parameter Default Value Description
ats (default displayed columns) A selection of column IDs to be included in the output.
key N/A An optional WebAPI key, required to access non-public data.
Examples:
Retrieve data from all visible columns of the dataset gpDS_11_119_18_1 (example RNA-seq data): http://www.geneprof.org/GeneProf/api/data/11_119_18_1.xml.gz
Retrieve only the Ensembl Gene IDs and RPKM values from the same dataset: http://www.geneprof.org/GeneProf/api/data/11_119_18_1.xml.gz?ats=C_ENSG,C_11_119_16_1_RPKM0,C_11_119_16_1_RPKM1,C_11_119_16_1_RPKM2,C_11_119_16_1_RPKM3

Data as R Binary Files (RData)

URL(s): http://www.geneprof.org/api/data/{ID}.rdata
Summary: Use this web service to retrieve data from a GeneProf dataset as binary files that can be loaded into R. Maximum size of datasets without API key = 1,000,000, with API key = 1,000,000.
Full description: This web service retrieves the entire contents of an arbitrary GeneProf dataset as a binary R file. The dataset of interest is identified by its GeneProf accession ID (something of the form gpDS_XXX_XXX_XXX_X). You can get a list of datasets belonging to a certain experiment of interest using the metadata for an experiment service, or you can use the search datasets service to query datasets globally. In order to avoid overloading of the GeneProf servers by anonymous requests and due to size limitations, the maximum size of datasets retrieved is restricted to 1,000,000 entries.
These binary files can be loaded into R simply by issuing the command load(FILENAME). Check out the advanced example below to find out how to load data into R directly from the web services.
Required URL parameters:
Parameter Description
{ID} The identifier of the dataset of interest. Either the entire accession ID (e.g. gpDS_11_385_44_1) or just the dataset-specific part (e.g. 11_385_44_1).
Additional query parameters:
Parameter Default Value Description
ats (default displayed columns) A selection of column IDs to be included in the output.
key N/A An optional WebAPI key, required to access non-public data.
Examples:
Retrieve data from all visible columns of the dataset gpDS_11_119_18_1 (example RNA-seq data): http://www.geneprof.org/GeneProf/api/data/11_119_18_1.rdata
Retrieve only the Ensembl Gene IDs and RPKM values from the same dataset: http://www.geneprof.org/GeneProf/api/data/11_119_18_1.rdata?ats=C_ENSG,C_11_119_16_1_RPKM0,C_11_119_16_1_RPKM1,C_11_119_16_1_RPKM2,C_11_119_16_1_RPKM3

Get Chromosome Names (XML, JSON, TXT, RDATA)

URL(s): http://www.geneprof.org/api/data/chromosome.names/{ID}.{FORMAT}
Summary: Use this web service to retrieve the IDs and names of all chromosomes in a genomic dataset. This service can only be used for genomic datasets, i.e. for datasets with type GENOMIC_REGIONS or REFERENCE.
Full description: The names different genome databases use to refer to chromosomes, even of well-known organisms, are not always the same. For example, the mitochondrial (pseudo-)chromosome is usually called 'chrMT' in Ensembl, but 'chrM' in the UCSC databases. The data as BED and data as WIG services might therefore require you to rename the chromosomes in the output before using them with other applications. This web service retrieves the identifiers and names of all chromosomes used in a genomic dataset. You can inspect those and see whether any change will be required.
Required URL parameters:
Parameter Description
{ID} The identifier of the dataset of interest. Either the entire accession ID (e.g. gpDS_11_385_44_1) or just the dataset-specific part (e.g. 11_385_44_1).
Additional query parameters:
key N/A An optional WebAPI key, required to access non-public data.
Examples:
Get all chromosomes for the mouse reference dataset in plain text format: http://www.geneprof.org/GeneProf/api/data/chromosome.names/pub_mm_ens58_ncbim37.txt
Get all chromosomes for the human reference dataset in JSON format: http://www.geneprof.org/GeneProf/api/data/chromosome.names/pub_hs_ens59_grch37.json
Get the chromosome names from the ChIP-seq peaks dataset gpDS_11_3_7_2 in XML format: http://www.geneprof.org/GeneProf/api/data/chromosome.names/11_3_7_2.xml

Genomic Data as BED Files (BED)

URL(s):
http://www.geneprof.org/api/data/{ID}.bed.gz
http://www.geneprof.org/api/data/{CHROMNAMES}/{ID}.bed.gz
Summary: Use this web service to retrieve data from a GeneProf dataset as BED (compressed as GZIP). Maximum size of datasets without API key = 10,000,000, with API key = unlimited. This service can only be used for genomic datasets, i.e. for datasets with type GENOMIC_REGIONS.
Full description: This web service retrieves the entire contents of a genomic GeneProf dataset in BED file format. This will only work for datasets of type GENOMIC_REGIONS, i.e. those containing genomic data! The dataset of interest is identified by its GeneProf accession ID (something of the form gpDS_XXX_XXX_XXX_X). You can get a list of datasets belonging to a certain experiment of interest using the metadata for an experiment service, or you can use the search datasets service to query datasets globally. In order to avoid overloading of the GeneProf servers by anonymous requests, the maximum size of datasets retrieved without an API key is restricted to 10,000,000 entries. With an API key, the maximum size is unlimited. N.B. chromosomes in the output BED can be dynamically renamed in order to make the names compatible with other applications (that's because, unfortunately, not all genome databases use the same names, see also the get chromosome names service).
Required and Optional URL parameters:
Parameter Description
{ID} The identifier of the dataset of interest. Either the entire accession ID (e.g. gpDS_11_385_44_1) or just the dataset-specific part (e.g. 11_385_44_1).
{CHROMNAMES} An optional parameter that may be used to rename chromosomes in the output. The value should be a comma-separated map from chromosome ID to its name in the output, where key and value are to be separated with a hyphen (-), e.g. 1-chr1,2-chr2,3-chr12. Any chromosome not mentioned in the map will not be exported, so you can use this as a filtering mechanism, too. Use the Get Chromosome Names service to get a list of all the available chromosomes in a dataset with their default names.
Additional query parameters:
Parameter Default Value Description
filter-column N/A The ID of a column / annotation type holding boolean flags. Only entries for which this boolean flag is true will be exported.
with-track-description true Include a track description header.
only-distinct false Export only one entry if there are multiple with the same coordinates.
key N/A An optional WebAPI key, required to access non-public data.
Examples:
Retrieve ChIP-seq peaks for FoxA1 from dataset gpDS_11_3_7_2: http://www.geneprof.org/GeneProf/api/data/11_3_7_2.bed.gz
Retrieve only the ChIP-seq peaks on chromosome 3 for FoxA1 from dataset gpDS_11_3_7_2: http://www.geneprof.org/GeneProf/api/data/3-chr3/11_3_7_2.bed.gz
Retrieve gene coordinates from the zebrafish reference dataset without a track header: http://www.geneprof.org/GeneProf/api/data/zebrafish.bed.gz?with-track-description=false
Retrieve only the ChIP-seq peaks for Stat3 (identified by the column $C_11_12_125_2_14_TFBS) from a dataset containing peaks for many different factors (gpDS_11_12_125_2): http://www.geneprof.org/GeneProf/api/data/11_12_125_2.bed.gz?filter-column=C_11_12_125_2_14_TFBS
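
The {CHROMNAMES} path segment is easy to get wrong by hand, so it can help to generate it from a mapping. Below is a small Python sketch; the function names chromnames_segment and bed_url are hypothetical, and the base URL is the prefix used in the examples above.

```python
# Prefix taken from the example URLs above.
BASE = "http://www.geneprof.org/GeneProf/api"


def chromnames_segment(mapping):
    """Render {chromosome ID: output name} as 'id-name,id-name,...'."""
    parts = []
    for chrom_id, name in mapping.items():
        chrom_id, name = str(chrom_id), str(name)
        # Hyphens and commas are the separators of the map itself.
        if "-" in chrom_id or "," in chrom_id or "," in name:
            raise ValueError(f"cannot encode {chrom_id!r} -> {name!r}")
        parts.append(f"{chrom_id}-{name}")
    return ",".join(parts)


def bed_url(dataset_id, mapping=None):
    """Build the BED download URL, optionally renaming/filtering chromosomes."""
    if mapping:
        return f"{BASE}/data/{chromnames_segment(mapping)}/{dataset_id}.bed.gz"
    return f"{BASE}/data/{dataset_id}.bed.gz"


# Export only chromosome 3, named 'chr3' in the output.
print(bed_url("11_3_7_2", {"3": "chr3"}))
```

Remember that chromosomes missing from the mapping are not exported, so pass the complete mapping if you only want to rename.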

Genomic Data as WIG Files (WIG)

URL(s):
http://www.geneprof.org/api/data/{ID}.wig.gz
http://www.geneprof.org/api/data/{CHROMNAMES}/{ID}.wig.gz
Summary: Use this web service to retrieve data from a GeneProf dataset as WIG (compressed as GZIP). Maximum size of datasets without API key = 1,000,000,000, with API key = unlimited. This service can only be used for genomic datasets, i.e. for datasets with type GENOMIC_REGIONS.
Full description: This web service retrieves the entire contents of a genomic GeneProf dataset in WIG file format. This will only work for datasets of type GENOMIC_REGIONS, i.e. those containing genomic data! The dataset of interest is identified by its GeneProf accession ID (something of the form gpDS_XXX_XXX_XXX_X). You can get a list of datasets belonging to a certain experiment of interest using the metadata for an experiment service, or you can use the search datasets service to query datasets globally. In order to avoid overloading of the GeneProf servers by anonymous requests, the maximum size of datasets retrieved without an API key is restricted to 1,000,000,000 entries. With an API key, the maximum size is unlimited. N.B. chromosomes in the output WIG can be dynamically renamed in order to make the names compatible with other applications (that's because, unfortunately, not all genome databases use the same names, see also the get chromosome names service).
Required and Optional URL parameters:
Parameter Description
{ID} The identifier of the dataset of interest. Either the entire accession ID (e.g. gpDS_11_385_44_1) or just the dataset-specific part (e.g. 11_385_44_1).
{CHROMNAMES} An optional parameter that may be used to rename chromosomes in the output. The value should be a comma-separated map from chromosome ID to its name in the output, where key and value are to be separated with a hyphen (-), e.g. 1-chr1,2-chr2,3-chr12. Any chromosome not mentioned in the map will not be exported, so you can use this as a filtering mechanism, too. Use the Get Chromosome Names service to get a list of all the available chromosomes in a dataset with their default names.
Additional query parameters:
Parameter Default Value Description
with-track-description true Include a track description header.
only-distinct false Include only one entry in the coverage count if there are multiple with the same coordinates.
frag-length -1 The "fragment length" to calculate the coverage with, use -1 to use the actual size of the regions.
bin-size 25 The bin size / resolution of the tracks.
key N/A An optional WebAPI key, required to access non-public data.
Examples:
Retrieve genomic coverage data from an RNA-seq assay of gene expression in human liver (gpDS_11_58_16_2): http://www.geneprof.org/GeneProf/api/data/11_58_16_2.wig.gz
Retrieve genomic coverage data from a ChIP-seq experiment for Smad1 (gpDS_11_12_112_2), using only distinct alignments: http://www.geneprof.org/GeneProf/api/data/11_12_112_2.wig.gz?with-track-description=false&only-distinct=true&frag-length=200

Sequence Data as FASTA Files (FASTA)

URL(s): http://www.geneprof.org/api/data/{ID}.fasta.gz
Summary: Use this web service to retrieve data from a GeneProf dataset as FASTA (compressed as GZIP). Maximum size of datasets without API key = 10,000,000, with API key = unlimited. This service can only be used for nucleotide sequence datasets, i.e. for datasets with type SEQUENCES.
Full description: This web service retrieves the entire contents of a nucleotide sequence dataset in FASTA format. This will only work for datasets of type SEQUENCES, i.e. those containing sequence data! The dataset of interest is identified by its GeneProf accession ID (something of the form gpDS_XXX_XXX_XXX_X). You can get a list of datasets belonging to a certain experiment of interest using the metadata for an experiment service, or you can use the search datasets service to query datasets globally. In order to avoid overloading of the GeneProf servers by anonymous requests, the maximum size of datasets retrieved without an API key is restricted to 10,000,000 entries. With an API key, the maximum size is unlimited.
Required URL parameters:
Parameter Description
{ID} The identifier of the dataset of interest. Either the entire accession ID (e.g. gpDS_11_385_44_1) or just the dataset-specific part (e.g. 11_385_44_1).
Additional query parameters:
key N/A An optional WebAPI key, required to access non-public data.
Example:
Retrieve unprocessed Tag-seq sequence data from gpDS_11_385_6_1: http://www.geneprof.org/GeneProf/api/data/11_385_6_1.fasta.gz

Sequence Data as FASTQ Files (FASTQ)

URL(s): http://www.geneprof.org/api/data/{ID}.fastq.gz
Summary: Use this web service to retrieve data from a GeneProf dataset as FASTQ (compressed as GZIP). Maximum size of datasets without API key = 10,000,000, with API key = unlimited. This service can only be used for nucleotide sequence datasets, i.e. for datasets with type SEQUENCES.
Full description: This web service retrieves the entire contents of a nucleotide sequence dataset in FASTQ format. This will only work for datasets of type SEQUENCES, i.e. those containing sequence data! The dataset of interest is identified by its GeneProf accession ID (something of the form gpDS_XXX_XXX_XXX_X). You can get a list of datasets belonging to a certain experiment of interest using the metadata for an experiment service, or you can use the search datasets service to query datasets globally. In order to avoid overloading of the GeneProf servers by anonymous requests, the maximum size of datasets retrieved without an API key is restricted to 10,000,000 entries. With an API key, the maximum size is unlimited.
Required URL parameters:
Parameter Description
{ID} The identifier of the dataset of interest. Either the entire accession ID (e.g. gpDS_11_385_44_1) or just the dataset-specific part (e.g. 11_385_44_1).
Additional query parameters:
key N/A An optional WebAPI key, required to access non-public data.
Example:
Retrieve unprocessed Tag-seq sequence data from gpDS_11_385_6_1: http://www.geneprof.org/GeneProf/api/data/11_385_6_1.fastq.gz
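
Both sequence services share the same URL layout, so a download URL can be assembled from the dataset ID and the desired format. The following is a minimal Python sketch and not part of GeneProf: the helper name sequence_data_url is made up for illustration, and the base URL is simply copied from the example requests above.

```python
from urllib.parse import urlencode

# Base URL as it appears in the example requests above.
API_ROOT = "http://www.geneprof.org/GeneProf/api"

def sequence_data_url(dataset_id, fmt="fasta", key=None):
    """Build the download URL for a SEQUENCES dataset (hypothetical helper)."""
    if fmt not in ("fasta", "fastq"):
        raise ValueError("fmt must be 'fasta' or 'fastq'")
    # The service accepts 'gpDS_11_385_6_1' as well as '11_385_6_1';
    # the examples use the short form, so the prefix is stripped here.
    prefix = "gpDS_"
    short_id = dataset_id[len(prefix):] if dataset_id.startswith(prefix) else dataset_id
    url = f"{API_ROOT}/data/{short_id}.{fmt}.gz"
    if key is not None:
        # Optional WebAPI key, passed as a query parameter.
        url += "?" + urlencode({"key": key})
    return url

print(sequence_data_url("gpDS_11_385_6_1", "fastq"))
```

The response is GZIP-compressed, so once downloaded it can be read with Python's gzip module.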

Advanced Examples

Using GeneProf Data with Pipes and Unix

If you're running a Unix-like operating system, you're probably familiar with the concept of 'piping' the output of one commandline program into another (cf. this Wikipedia article on Unix pipelines). You can use wget to retrieve GeneProf data as a stream like this:

wget -q -O - -F "{URL}"
Important: Mind the quotes (") around the URL! If you leave these out, URL parameters will not be read properly. The output of wget will be written to the standard output stream, which means you can pipe it into many Unix commands or use it with commandline programs that are able to receive input from the standard input.

Here's an example of how to retrieve a FASTQ sequence file, which is then filtered for sequences containing the nucleotides ACTG (in order) and written to a file called filtered-seqs.fq:

wget -q -O - -F "http://www.geneprof.org/GeneProf/api/data/11_119_5_1.fastq.gz" | gunzip | grep -B1 -A2 -h ACTG | sed '/--/d' > filtered-seqs.fq
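
The same filter can be sketched in Python. This is an illustrative alternative to the pipeline above, not an official client: it assumes each FASTQ record spans exactly four lines, and unlike grep it only checks the sequence line of each record. The reads used here are made up, and an in-memory GZIP stream stands in for the downloaded file.

```python
import gzip
import io

def filter_fastq(lines, motif):
    """Yield complete 4-line FASTQ records whose sequence line contains motif."""
    record = []
    for line in lines:
        record.append(line.rstrip("\n"))
        if len(record) == 4:
            # record[1] is the nucleotide sequence of the read.
            if motif in record[1]:
                yield record
            record = []

# Made-up reads, compressed in memory to mimic a .fastq.gz download.
raw = "@r1\nGGACTGTT\n+\nIIIIIIII\n@r2\nTTTTTTTT\n+\nIIIIIIII\n"
stream = io.BytesIO(gzip.compress(raw.encode("ascii")))

with gzip.open(stream, "rt") as handle:
    kept = list(filter_fastq(handle, "ACTG"))

for rec in kept:
    print("\n".join(rec))
```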

You can combine the outputs of several web services to achieve more advanced results. For instance, let's first look up the GeneProf ID of the mouse gene with the Ensembl ID ENSMUSG00000024406 (that's the transcription factor (TF) Pou5f1), and then look for all genes that are putatively regulated by this TF across all ChIP-seq datasets available in the GeneProf database and then check for genes from the "Sox" family:

wget -q -O - -F "http://www.geneprof.org/GeneProf/api/gene.info/regulation/binary/by.gene/mouse/`wget -q -O - -F "http://www.geneprof.org/GeneProf/api/gene.info/gp.id/mouse/C_ENSG/ENSMUSG00000024406.txt"`.txt?include-unbound=true" | grep -i "sox"
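
The nested call above can be easier to follow when written as two explicit steps. The Python sketch below is only an illustration: the function names are hypothetical, the download function is passed in as a parameter, and the offline stand-in returns made-up responses whose layout is an assumption rather than the documented output of the services.

```python
API_ROOT = "http://www.geneprof.org/GeneProf/api"

def regulated_genes(fetch, ensembl_id, ref="mouse", pattern="sox"):
    """Chain the ID lookup and the regulation query; `fetch` maps a URL to text."""
    id_url = f"{API_ROOT}/gene.info/gp.id/{ref}/C_ENSG/{ensembl_id}.txt"
    gene_id = fetch(id_url).strip()
    reg_url = (f"{API_ROOT}/gene.info/regulation/binary/by.gene/{ref}/"
               f"{gene_id}.txt?include-unbound=true")
    # Case-insensitive substring match, comparable to `grep -i`.
    return [line for line in fetch(reg_url).splitlines()
            if pattern.lower() in line.lower()]

def http_fetch(url):
    """Fetcher for real requests, standard library only (not called below)."""
    from urllib.request import urlopen
    with urlopen(url) as response:
        return response.read().decode("utf-8")

def fake_fetch(url):
    """Offline stand-in with made-up responses so the sketch runs anywhere."""
    if "/gp.id/" in url:
        return "12345\n"
    return "Sox2\t1\nNanog\t1\nSOX17\t0\n"

print(regulated_genes(fake_fetch, "ENSMUSG00000024406"))
```

Passing http_fetch instead of fake_fetch would send the two requests to the live services.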

Hint: Of course, you're not limited at all to using standard Unix commands! There's nothing holding you back from making use of great bioinformatics commandline tools such as, for example, Biopieces, FASTX-Toolkit or BEDTools.

Using GeneProf Data in R

Hint: Check out the related GeneProf screencast on YouTube!

Another use of GeneProf's web services is for loading data into R. Many types of data can be exported directly as binary RData objects, which can be easily loaded into an existing R session like this:

url.con <- url(description='URL-GOES-HERE')
load(url.con)
close(url.con)

Let's put this into a function (called loadGeneProfData) for even easier use:

loadGeneProfData <- function(dl.url) { require(RCurl); url.con <- url(description=dl.url); load(url.con); close(url.con); geneprof.data }

We can use this to write a function that combines two web service calls to look up the GeneProf Gene ID for a gene symbol and then to retrieve gene expression data for this gene:

getExpression <- function(gene.symbol,ref='mouse',sample.info=F) {
 url <- paste('http://www.geneprof.org/GeneProf/api/gene.info/gp.id/', ref, '/C_NAME/',gene.symbol,'.rdata',sep='')
 id <- loadGeneProfData(url)
 url <- paste('http://www.geneprof.org/GeneProf/api/gene.info/expression/', ref, '/', id,'.rdata', ifelse(sample.info,'?with-sample-info=true',''),sep='')
 expression <- loadGeneProfData(url)
 expression
}

Let's combine these methods to retrieve some expression data and generate a few plots:

sox2.expression <- getExpression('sox2')
pou5f1.expression <- getExpression('pou5f1')
par(mfrow=c(1,3))
hist(log2(sox2.expression$RPKM+1))
hist(log2(pou5f1.expression$RPKM+1))
plot(log2(pou5f1.expression$RPKM+1),log2(sox2.expression$RPKM+1))

In the example above, we first look up the GeneProf IDs of the mouse genes with the common symbols Sox2 and Pou5f1 (this happens inside the getExpression function) and then use these IDs to query expression data (in RPKM format, by default) via the Get Gene Expression Values for a Gene service described above. We could now do anything we like with these values; here, we have chosen to plot a histogram of the RPKM values for each gene and a scatterplot comparing the two.

In addition to the expression values, the same web service makes it possible to retrieve additional annotation data, e.g. for the cell type of each observation. Let's get the additional annotations for one of the genes (the annotations for the other would be the same, so no need to get them twice) and then use these annotations to plot an annotated scatter plot for a selection of cell types (the dots for each cell type will have a different colour in this plot):

pou5f1.expression <- getExpression('pou5f1',sample.info=T)
library(RColorBrewer)
my.types.of.interest <- c('embryonic stem cell','neuronal precursor cell','lung fibroblast','oocyte','sperm','embryoid body')
selection <- pou5f1.expression$Cell_Type %in% my.types.of.interest
par(mfrow=c(1,1))
plot(log2(pou5f1.expression$RPKM+1),log2(sox2.expression$RPKM+1),col=brewer.pal(length(my.types.of.interest),'Set1')[match(pou5f1.expression$Cell_Type,my.types.of.interest)],xlab='Pou5f1',ylab='Sox2',xlim=c(0,12),ylim=c(0,12))
legend('bottomright',my.types.of.interest,fill=brewer.pal(length(my.types.of.interest),'Set1'))

The complete R script for this example is available here.

Make sure to also check out the GeneProf web services video tutorial!

Loading Genomic Data into a Genome Browser (UCSC, IGV, ..)

GeneProf stores quite a lot of genomic data, e.g. from alignments of short reads or enriched regions, such as peaks detected in ChIP-seq experiments. It is possible to load this data directly into external applications that understand the commonly used WIG and/or BED file formats such as genome browsers, like the UCSC genome browser, the Integrative Genomics Viewer, and others. The key web services required for this task are Genomic Data as BED Files (BED) and Genomic Data as WIG Files (WIG).

We'll try it with the UCSC Genome Browser first. On the genome browser homepage, we need to click the Add/Manage Custom Tracks button to start up the interface for uploading custom tracks. There's a big text box which allows users to enter the URLs under which custom tracks can be found and this is where the URLs for the relevant GeneProf web service calls are meant to go. Let's load in alignments of ChIP-seq datasets (Pol2 ChIP-seq and Input DNA control from this experiment based on data from Sultan et al. (2008)) as WIG tracks to see the coverage of reads in these datasets across the genome as well as the coordinates of ChIP-seq peaks that were identified in the analysis of these tracks. Let's focus on chromosome 3 only, for faster loading times. Copy and paste the URLs below into the UCSC's text box:

http://www.geneprof.org/GeneProf/api/data/3-chr3/11_683_29_3.wig.gz
http://www.geneprof.org/GeneProf/api/data/3-chr3/11_683_30_3.wig.gz
http://www.geneprof.org/GeneProf/api/data/3-chr3/11_683_33_2.bed.gz
Hint: In some cases, the UCSC genome browser uses different chromosome names than GeneProf and other databases (e.g. Ensembl). For instance, the mitochondrial "chromosome" for mouse and human is called chrMT in GeneProf, but chrM in UCSC. It might therefore be necessary to rename chromosomes in order to make the tracks work with the genome browser, but don't worry, that's easily achieved using the {CHROMNAMES} parameter of the Genomic Data as BED Files (BED) and Genomic Data as WIG Files (WIG) services (also consider using Get Chromosome Names (XML, JSON, TXT, RDATA) to get a list of all the chromosomes for a dataset).

Just click the Submit button and that's it. Your data should be imported into the UCSC browser rather quickly. You can then continue using the genome browser as you are used to, e.g. have a look at the region chr3:37,253,759-37,439,292 with a nice peak at the TSS of GOLGA4.

Fancy another browser? The same URL patterns should work with IGV (see screenshots below) and probably pretty much every other genome browser. Note that, unfortunately, none of the browsers we've tried seem to work very well with URLs that contain query parameters (that's the part after the question mark (?)), so further configuration of the tracks will not be possible unless you download the tracks first and upload them afterwards.
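
If a track is downloaded first, as suggested above, chromosome names can also be adjusted locally before the upload. The Python sketch below is an alternative to the {CHROMNAMES} parameter, not a description of it; the function name and the example BED lines are made up for illustration.

```python
def rename_chromosomes(bed_lines, mapping):
    """Rewrite the first column of BED lines; header and blank lines pass through."""
    for line in bed_lines:
        if line.startswith(("track", "browser", "#")) or not line.strip():
            yield line
            continue
        fields = line.rstrip("\n").split("\t")
        # Replace the chromosome name only if it has an entry in the mapping.
        fields[0] = mapping.get(fields[0], fields[0])
        yield "\t".join(fields) + "\n"

# Made-up BED content using the GeneProf name for the mitochondrial chromosome.
bed = ["track name=peaks\n", "chrMT\t100\t200\tpeak1\n", "chr3\t5\t50\tpeak2\n"]
renamed = list(rename_chromosomes(bed, {"chrMT": "chrM"}))
print("".join(renamed), end="")
```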

Image loading..

Using GeneProf Data in Galaxy

Galaxy is an open-source, web-based platform for genomic data analysis that has attracted a vibrant user base in recent years. GeneProf data, especially genomic data, can easily be loaded into Galaxy for further analysis.

As a simple example, we'll load two sets of ChIP-seq peaks for the factors Stat3 and Klf4 into Galaxy using a call to the Genomic Data as BED Files (BED) service. We'll then use Galaxy's built-in tools to intersect those regions to discover regions of potential interaction between the two factors.

On the Galaxy homepage, we start by opening the Upload File tool (found under Get Data). We'll upload two sets of ChIP-seq peaks, both contained in the GeneProf dataset gpDS_11_12_125_2 using web service calls to the URLs:

http://www.geneprof.org/GeneProf/api/data/11_12_125_2.bed.gz?filter-column=C_11_12_125_2_13_TFBS
http://www.geneprof.org/GeneProf/api/data/11_12_125_2.bed.gz?filter-column=C_11_12_125_2_14_TFBS

Simply copy & paste both URLs into the text box marked "URL / Text". The files we're retrieving are from the mouse genome and come in BED format, so let's select bed where it says "File Format" and Mouse July 2007 (NCBI37/mm9) where it says "Genome". In the end, click the Execute button to start the upload.

This should only take a moment (depending on how busy the servers are just now) and then two new datasets should be available in the Galaxy workspace for you to play with. For instance, we can now use the Intersect tool (under Operate on Genomic Intervals) to overlap both sets of peaks. In the configuration of this tool, just pick any one of the two datasets under "of:" and the other one under "that intersect:" and then execute the tool. After a few minutes you should be left with a new dataset of all overlapping peaks.
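
To make clear what such an intersection computes, here is a small Python sketch of the idea. It is not Galaxy's implementation, just a naive comparison of every region in one set against the regions of the other on the same chromosome; the function names and coordinates are made up.

```python
def parse_bed(text):
    """Parse BED text into (chrom, start, end) tuples, skipping header lines."""
    regions = []
    for line in text.splitlines():
        if not line.strip() or line.startswith(("track", "browser", "#")):
            continue
        chrom, start, end = line.split("\t")[:3]
        regions.append((chrom, int(start), int(end)))
    return regions

def overlapping(first, second):
    """Return the regions of `first` that overlap at least one region of `second`."""
    by_chrom = {}
    for chrom, start, end in second:
        by_chrom.setdefault(chrom, []).append((start, end))
    hits = []
    for chrom, start, end in first:
        # BED intervals are half-open, so regions that merely touch do not overlap.
        if any(start < e and s < end for s, e in by_chrom.get(chrom, [])):
            hits.append((chrom, start, end))
    return hits

# Made-up peak coordinates standing in for the two uploaded datasets.
stat3 = parse_bed("chr1\t100\t200\nchr1\t500\t600\nchr2\t10\t20\n")
klf4 = parse_bed("chr1\t150\t250\nchr2\t20\t30\n")
print(overlapping(stat3, klf4))
```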

That's just one example, but in the same way you can upload lots of other genomic or sequence data from GeneProf directly into Galaxy enabling you to exploit both tools to their best potential!

Wiring GeneProf Data into Taverna Pipelines

General-purpose pipeline execution engines have many prospective uses and have enjoyed immense popularity in many scientific fields. One of the most well-known solutions in the life sciences is probably Taverna. Using GeneProf's web services, it is possible to wire GeneProf data together with other tools into arbitrary pipelines. Let's look at a simple example.

Image loading..

The key to using GeneProf data in Taverna is the REST Service template offered by Taverna. You can use this template to connect to the GeneProf web services listed above, filling in wild cards ({mywildcard}) with user inputs as required.

In the example workflow shown above, a concatenation of two web service calls is employed to retrieve gene expression data for an arbitrary gene. We first use the web service Get the GeneProf ID of a Gene to get the internal GeneProf gene ID for a gene based on an external identifier or name. We configure the Taverna module with the URL pattern of the web service leaving space for three wildcards (step 1):

Taverna REST Service URL Template 1:

http://www.geneprof.org/GeneProf/api/gene.info/gp.id/{ref}/{idtype}/{id}.txt

Please also change the 'accept' type to text/plain, because this is the type of data returned by that service. The wild cards are to be filled with user-defined inputs, so we next add three user input modules and connect them up to the web service box (step 2).

The GeneProf web service returns a plain text file with one GeneProf ID per line. We need to parse these into a list of Strings in order to use them further on. The local Taverna service Split string into string list by regular expression will do the job (step 3): Connect the responseBody output of the REST service module to the string input of the regular expression splitter and define a string constant to feed into the regex input. The string constant should read \n, i.e. the regular expression is to split the input along all "new line" characters.

Finally, we can add another REST service module to address the Get Gene Expression Values for a Gene service. Again, we leave space for two wildcards:

Taverna REST Service URL Template 2:

http://www.geneprof.org/GeneProf/api/gene.info/expression/{ref}/{id}.xml

The value of the {id} wild card comes from the parsed output of the regular expression string splitter module and the {ref} wild card receives its input from the same user input parameter as in the first module (step 4). The 'accept' type of this module should be application/xml since we're requesting XML data this time.
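
For readers without Taverna at hand, the data flow of steps 1 to 4 can be mimicked in a few lines of Python. This is only a sketch of the wiring: the two templates are copied from above, the function name is hypothetical, and the response body is made up.

```python
import re

ID_TEMPLATE = "http://www.geneprof.org/GeneProf/api/gene.info/gp.id/{ref}/{idtype}/{id}.txt"
EXPR_TEMPLATE = "http://www.geneprof.org/GeneProf/api/gene.info/expression/{ref}/{id}.xml"

def expression_urls(id_response, ref):
    """Steps 3 and 4: split the plain-text ID list, then fill the second template."""
    ids = [s.strip() for s in re.split(r"\n", id_response) if s.strip()]
    return [EXPR_TEMPLATE.format(ref=ref, id=gene_id) for gene_id in ids]

# Steps 1 and 2: fill the three wild cards of the first template with user inputs.
lookup_url = ID_TEMPLATE.format(ref="mouse", idtype="C_NAME", id="sox2")
print(lookup_url)

# Made-up response body standing in for the output of the first service.
for url in expression_urls("101\n202\n", ref="mouse"):
    print(url)
```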

In the end, we define the workflow outputs of the new Taverna pipeline and we can now rename the modules used and annotate inputs and outputs to make the workflow a little nicer.

You can download the complete example Taverna workflow here.

Using GeneProf Data Live on Other Web Pages (HTML / AJAX with jQuery and d3.js)

Other web sites across the globe can dynamically retrieve live GeneProf data and include it in their web pages using AJAX and JSONP cross-domain requests, for example via jQuery (also see this StackOverflow post).

Important: It's crucial to use JSONP requests rather than plain JSON requests, since cross-domain requests are not allowed by most (all?) modern browsers due to security issues! All of GeneProf's web services that can retrieve JSON data can also retrieve JSONP data if an additional parameter callback is passed into the request.
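
By the usual JSONP convention, the server wraps the JSON document in a call to the function named by the callback parameter. The Python sketch below shows what such a body looks like and how the padding can be removed again; the callback name and the response are made up, and the exact shape of GeneProf's output is an assumption here.

```python
import json
import re

def unwrap_jsonp(body):
    """Strip the `callback( ... )` padding from a JSONP body and parse the JSON."""
    match = re.fullmatch(r"\s*[\w$.]+\s*\((.*)\)\s*;?\s*", body, flags=re.DOTALL)
    if match is None:
        raise ValueError("not a JSONP response")
    return json.loads(match.group(1))

# Made-up body for a request sent with ?callback=handleGenes
body = 'handleGenes({"matches_per_dataset": [{"genes": []}]});'
data = unwrap_jsonp(body)
print(data["matches_per_dataset"][0]["genes"])
```

In a browser this unwrapping is never needed, because the response is executed as a script and the callback function receives the parsed object directly.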

As an example, we'll implement a simple HTML page that displays a search form for genes. Using jQuery, the form will request matching genes from the GeneProf web services (Search Genes) and list them on the page along with their GeneProf IDs, a description and a plot of their gene expression values in a selection of cell types.

Instead of listing the entire source code for the page, let's just look at its most important component, the JSONP request (the complete source code is available here):

$.ajax({
    url: 'http://www.geneprof.org/GeneProf/api/search/gene/'+gene+'.json',
    dataType: 'jsonp',
    success: function(json) {
        var matchingGenes = json.matches_per_dataset[0].genes;
        // ... do something with matchingGenes ...
    },
    error: function() {
        // ... do something else ...
    }
});

As you'll notice the url attribute points to the URL of the GeneProf web service and dataType is set to jsonp (not json!). There are two important callback functions: success and error, the implementation of which defines in which way the response data is being dealt with.

In our case, we parse out a part of the response (gene names, etc.) and create some HTML code accordingly, and then fire up another AJAX call to another web service to retrieve some expression data for each gene. We then use the d3.js library to render an SVG bar plot showing the average expression in all available observations for a selection of cell types (this requires a compatible browser -- pretty much anything but IE8 or earlier should do!).

Check out the complete source code of the example page here and have a good read through the jQuery documentation and you'll find you can easily integrate GeneProf data into your own web site, too!

Accessing GeneProf Web Services with Perl, Simple Example

GeneProf web services can be easily integrated with many popular scripting languages, e.g. Python or Perl. Most languages provide libraries for parsing XML files, so that's usually the kind of output you'd want to retrieve from the GeneProf web services.

Let's look at a simple example in Perl: We'll retrieve a bunch of gene expression values (RPKM) for an arbitrary gene and calculate the average gene expression per cell type. To make this as easy as possible we'll be using the REST::Client library.

Complete example script: Perl example script, Required Perl modules: XML::LibXML and REST::Client (along with their dependencies)

So, let's write some Perl scripts..

First, we want to set up a new REST web service client and configure it with the root URL of the GeneProf web services. We then use this client to perform the search and retrieve the data in XML format from the Get Gene Expression Values for a Gene service (we'll arbitrarily pick the human gene with the ID #2981 here as an example, but you may, of course, substitute other values as you please).

use REST::Client; use XML::LibXML;
# set up new client:
my $client = REST::Client->new();
$client->setHost('http://www.geneprof.org/GeneProf/api');
# retrieve expression data from GeneProf (including sample annotation):
$client->GET("/gene.info/expression/human/2981.xml?with-sample-info=true");

REST::Client provides a mechanism to automatically interpret the response as XML, so we can easily use XPath to get a list of all returned observations, which we can iterate over one by one:

# use XPath to find all observations in the XML output:
my @observations = $client->responseXpath()->findnodes('//root/values/values_item');
# then loop through, one observation at a time:
foreach my $observation (@observations) {
    ...
}

In order to calculate the average expression per cell type, we need to get (a) the expression value for each observation and (b) the cell type from the sample annotation of the observation. The expression value can be retrieved from the RPKM child node of the observation:

my $rpkm = $observation->getChildrenByTagName('RPKM')->to_literal()->value();

In order to get the Cell_Type sample annotation, we first get the sample child node and then retrieve the value from there:

my $sampleInfo = $observation->getChildrenByTagName('sample')->get_node(0);
my $cellType = $sampleInfo->getChildrenByTagName('Cell_Type')->to_literal->value();

Some observations might not have a cell type annotated, so we need to fill in missing values:

if(!$cellType) { $cellType = 'N/A'; }

Assuming we had already defined a hash map groupAverages, we could now add up the totals and observation count for the current cell type (which we use later on to calculate the average):

$groupAverages{$cellType}{'total'} += $rpkm;
$groupAverages{$cellType}{'n'}++;

After the main loop and the data retrieval work is done, we just need to print out the results:

# print header:
print "Cell Type\tAverage\n";
# print data:
for (keys %groupAverages) {
    my $cellType = $_;
    my $avg = $groupAverages{$cellType}{'total'} / $groupAverages{$cellType}{'n'};
    print "$cellType\t$avg\n";
}

You can download the complete Perl example script here.
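
For comparison, the same calculation can be sketched in Python using only the standard library. This is not a translation of the downloadable script: the function name is hypothetical, and the XML document below is made up and only mimics the element names used in the Perl XPath expressions above.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

def average_rpkm_by_cell_type(xml_text):
    """Average the RPKM values of all observations per Cell_Type annotation."""
    root = ET.fromstring(xml_text)
    totals = defaultdict(lambda: [0.0, 0])  # cell type -> [sum, count]
    # Same path as the Perl XPath //root/values/values_item, relative to <root>.
    for obs in root.findall("./values/values_item"):
        rpkm = float(obs.findtext("RPKM"))
        # Missing or empty annotations are grouped under 'N/A'.
        cell_type = obs.findtext("sample/Cell_Type") or "N/A"
        totals[cell_type][0] += rpkm
        totals[cell_type][1] += 1
    return {ct: total / n for ct, (total, n) in totals.items()}

# Made-up response; only the element names follow the Perl example.
SAMPLE = """<root><values>
  <values_item><RPKM>2.0</RPKM><sample><Cell_Type>oocyte</Cell_Type></sample></values_item>
  <values_item><RPKM>4.0</RPKM><sample><Cell_Type>oocyte</Cell_Type></sample></values_item>
  <values_item><RPKM>1.5</RPKM><sample/></values_item>
</values></root>"""

averages = average_rpkm_by_cell_type(SAMPLE)
print("Cell Type\tAverage")
for cell_type, avg in averages.items():
    print(f"{cell_type}\t{avg}")
```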

Accessing GeneProf Web Services with Perl, Advanced Example

In this advanced example, we'll be using the same techniques as in the simple example above to achieve a more complex task: We'll search GeneProf for all experiments with linked publications in Cell Stem Cell, retrieve metadata about these experiments, from which we'll find out what the main outputs of the analyses were and then we take all the mouse-specific gene-centric data from these datasets and merge them all together into one big table, which will be printed out as a tab-separated text file.

Complete example script: Perl example script, Required Perl modules: XML::LibXML and REST::Client (along with their dependencies)

Step 1: Search for experiments

So in the first step, we need to set up a new REST client and configure it with the root URL of the GeneProf web services. We then use this client to perform the search and retrieve the data in XML format from the Search Experiments service.

use REST::Client; use XML::LibXML;
# set up new client:
my $client = REST::Client->new();
$client->setHost('http://www.geneprof.org/GeneProf/api');
# get a list of all GeneProf experiments linked to publications in 'Cell Stem Cell':
$client->GET('/search/experiment/(citation:"cell stem cell").xml');

REST::Client provides a mechanism to automatically interpret the response as XML, so we can easily use XPath to get a list of all returned experiments, which we can iterate over one by one:

# use XPath to find all experiments in the XML output:
my @experiments = $client->responseXpath()->findnodes('//root/experiments/experiments_item');
foreach my $exp (@experiments) {
    # get experiment attributes and print progress:
    my $expId = $exp->getChildrenByTagName('rigid_id')->to_literal;
    my $expName = $exp->getChildrenByTagName('name')->to_literal;
    ...
}

Step 2: Find the main outputs of each experiment

Next, we need to get metadata about each experiment using the Metadata about a GeneProf Experiment service. We'll use the with-outputs=true parameter to include a list of all output datasets in the metadata.

To retrieve the metadata we just use another call with the REST client:

# retrieve metadata about the current experiment:
$client->GET("/exp/$expId.xml?with-outputs=true");

Using XPath we can now easily parse out all the outputs -- we can even directly select only those datasets with data type FEATURES that are based on the reference dataset pub_mm_ens58_ncbim37 (that's the mouse reference dataset!):

# use XPath to find the IDs of all relevant output datasets:
my @outputs = $client->responseXpath()->findnodes('//root/outputs/outputs_item[data_type="FEATURES" and reference="pub_mm_ens58_ncbim37"]/rigid_id');
# then loop through, one dataset at a time:
foreach my $output (@outputs) {
    my $dsId = $output->to_literal;
    ...
}
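
The predicate-based selection is not specific to Perl. As a hedged illustration, the limited XPath dialect of Python's xml.etree.ElementTree supports the same kind of filter when the two conditions are written as chained predicates; the function name, the dataset IDs and the XML below are made up, and only the element names follow the XPath expression above.

```python
import xml.etree.ElementTree as ET

def feature_outputs(xml_text, reference="pub_mm_ens58_ncbim37"):
    """Return the rigid_id of every FEATURES output based on `reference`."""
    root = ET.fromstring(xml_text)
    # ElementTree has no 'and' operator, so the predicates are chained instead.
    path = ("./outputs/outputs_item[data_type='FEATURES']"
            f"[reference='{reference}']/rigid_id")
    return [node.text for node in root.findall(path)]

# Made-up metadata document with three outputs, only one of which qualifies.
SAMPLE = """<root><outputs>
  <outputs_item><rigid_id>gpDS_0_0_1_1</rigid_id><data_type>FEATURES</data_type><reference>pub_mm_ens58_ncbim37</reference></outputs_item>
  <outputs_item><rigid_id>gpDS_0_0_2_1</rigid_id><data_type>SEQUENCES</data_type><reference>pub_mm_ens58_ncbim37</reference></outputs_item>
  <outputs_item><rigid_id>gpDS_0_0_3_1</rigid_id><data_type>FEATURES</data_type><reference>other_reference</reference></outputs_item>
</outputs></root>"""

print(feature_outputs(SAMPLE))
```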

Step 3: Retrieve and merge the actual data

The last web service we'll be using is Data as XML. In the previous steps, we've only been retrieving metadata about experiments and datasets, whereas this service retrieves actual (sort of tabular) data, in this case gene expression values and regulatory data specific to genes.

Again, we'll use the previously set up REST client to retrieve the data:

# retrieve the data in XML format:
$client->GET("/data/$dsId.xml");

The response will contain lots of data for genes organised in rows and columns. At the beginning, the XML file contains additional meta information about the individual columns. We'll parse both separately:

# use XPath to parse out column metadata:
my @columns = $client->responseXpath()->findnodes('//fs/columns/column');
# use XPath to parse out the data items:
my @features = $client->responseXpath()->findnodes('//fs/f');

In order to merge all data together, we'll build up two large hashes-of-hashes, called allData and allColumns. Since all datasets are based on the same reference, the internal GeneProf IDs can be used as a merge criterion for the data rows. At the same time, the column identifiers should be unique unless the columns are shared between datasets, e.g. C_NAME defines gene names and is identical for all datasets, so it can be safely overwritten:

# add all the column annotations to the hash-of-hashes 'allColumns':
foreach my $column (@columns) {
    my @childnodes = $column->childNodes();
    for my $child (@childnodes) {
        $allColumns{ $column->getAttribute('id') }{ $child->nodeName } = $child->to_literal;
    }
}
# add all the data to the hash-of-hashes 'allData':
foreach my $feature (@features) {
    my @childnodes = $feature->childNodes();
    for my $child (@childnodes) {
        $allData{ $feature->getAttribute('id') }{ $child->getAttribute('c') } = $child->to_literal;
    }
}

Step 4: Write out the results

At this point, the two variables allData and allColumns should point to two large hashes-of-hashes containing all the data we've retrieved, merged into one, so all that's left to do is write out the data!

We'll write the data as tab-delimited plain text, so let's first write a header line:

my @columnIds = keys %allColumns;
print "GeneProfID";
for my $columnId (@columnIds) {
    print "\t", $allColumns{$columnId}{'label'};
}
print "\n";

And now the data:

for (keys %allData) {
    my $featureId = $_;
    print $featureId;
    for my $columnId (@columnIds) {
        print "\t", $allData{$featureId}{$columnId};
    }
    print "\n";
}

Yo, that's it. You can download the complete Perl example script here.
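
The merge-and-print logic of steps 3 and 4 can also be sketched in Python. The code below is an illustration under several assumptions: <fs> is taken to be the document element, the value elements are named v here (the Perl code only relies on their c attribute), and both documents are made up. Gaps in the merged table are written as empty fields.

```python
import csv
import io
import xml.etree.ElementTree as ET

def merge_datasets(xml_documents):
    """Merge several 'Data as XML' responses into column and row dictionaries."""
    all_columns, all_data = {}, {}
    for xml_text in xml_documents:
        fs = ET.fromstring(xml_text)  # assumption: <fs> is the document element
        for column in fs.findall("./columns/column"):
            # Shared columns such as C_NAME are simply overwritten.
            all_columns[column.get("id")] = column.findtext("label")
        for feature in fs.findall("./f"):
            row = all_data.setdefault(feature.get("id"), {})
            for child in feature:
                row[child.get("c")] = child.text
    return all_columns, all_data

def to_tsv(all_columns, all_data):
    """Render the merged table as tab-separated text with a header line."""
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    column_ids = sorted(all_columns)
    writer.writerow(["GeneProfID"] + [all_columns[c] for c in column_ids])
    for feature_id in sorted(all_data):
        row = all_data[feature_id]
        writer.writerow([feature_id] + [row.get(c, "") for c in column_ids])
    return out.getvalue()

# Two made-up datasets sharing the C_NAME column.
DOC_A = ('<fs><columns><column id="C_NAME"><label>Name</label></column>'
         '<column id="C_A"><label>RPKM A</label></column></columns>'
         '<f id="1"><v c="C_NAME">Sox2</v><v c="C_A">5.0</v></f></fs>')
DOC_B = ('<fs><columns><column id="C_NAME"><label>Name</label></column>'
         '<column id="C_B"><label>RPKM B</label></column></columns>'
         '<f id="1"><v c="C_NAME">Sox2</v><v c="C_B">7.5</v></f>'
         '<f id="2"><v c="C_NAME">Myc</v><v c="C_B">1.0</v></f></fs>')

columns, data = merge_datasets([DOC_A, DOC_B])
tsv = to_tsv(columns, data)
print(tsv, end="")
```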

Accessing GeneProf Web Services with Java

Lastly, let's look at an example using an object-oriented programming language, Java. We'll write a little program that looks up genes by name and then calculates the average gene expression value (RPKM) for each of these genes per cell type -- based on all the RNA-seq data in GeneProf. In this example, we'll only use basic Java classes, but in practice you might want to give any of the REST client libraries out there a try.

Complete example code: Java Code, Compiled Example Program: GeneProfWebServicesJavaClient

We'll not go through the entire code of the program here (the complete source code is available here), but just focus on the most important parts. So let's start by defining a new class that will act as our web service client:

public class GeneProfWebServicesJavaClient {
    private String host;
    
    public GeneProfWebServicesJavaClient(String host) {
        this.host = host;
    }
}

We can now define some generic methods to retrieve data as XML or plain text from the GeneProf web services (the code's abbreviated a bit -- of course, you should always make sure to close streams and connections properly!):

public Document getXML(String request) throws ... {
    URL url = new URL(host + request);
    HttpURLConnection connection = (HttpURLConnection) url.openConnection();
    connection.setDoInput(true);
    connection.setRequestMethod("GET");
    connection.setRequestProperty("Content-Type", "application/xml");
    InputStream is = connection.getInputStream();
    DocumentBuilderFactory domFactory = DocumentBuilderFactory.newInstance();
    DocumentBuilder builder = domFactory.newDocumentBuilder();
    Document doc = builder.parse(is);
    return doc;
}
public String getText(String request) throws ... {
    URL url = new URL(host + request);
    HttpURLConnection connection = (HttpURLConnection) url.openConnection();
    connection.setDoInput(true);
    connection.setRequestMethod("GET");
    connection.setRequestProperty("Content-Type", "text/plain");
    BufferedReader br = new BufferedReader(new InputStreamReader(connection.getInputStream()));
    StringBuilder sb = new StringBuilder();
    String line;
    while((line = br.readLine()) != null) {
        sb.append(line).append('\n'); // keep the line breaks so callers can split on "\n"
    }
    return sb.toString();
}

Using those generic methods, we can define others that perform more specific operations using the GeneProf web services, like this function which looks up internal GeneProf gene IDs corresponding to a given gene name / symbol:

public List<Integer> getGeneProfIdsForName(String referenceId, String name) throws ... {
    List<Integer> ids = new ArrayList<Integer>();
    for(String id : getText(String.format("/gene.info/gp.id/%s/C_NAME/%s.txt", referenceId, name)).split("\n")) {
        if(id.length()>0) ids.add(Integer.valueOf(id));
    }
    return ids;
}

The remainder of the source code is a bit too bulky to post here, but please just refer to the (commented) source code available here.

Once you compile the code into a JAR file (or download it here), you can run the program like this:

java -jar GeneProfWebServicesJavaClient.jar http://www.geneprof.org/GeneProf/api mouse Trp53 Myc Mycn

Using BioServices to Access GeneProf Data from Python

Thomas Cokelaer over at the EBI has integrated GeneProf's web services into his BioServices tool. BioServices is a Python package that makes it incredibly simple to use various bioinformatics web resources (apart from GeneProf it supports, for example, EUtils, KEGG, PDB or PathwayCommons).

Based on some material from the , Thomas has put together a good set of tutorials that help you get started. The full documentation of all implemented services is available here.

If you use BioServices to access GeneProf data, please remember to cite Thomas' paper (in addition to the GeneProf database paper):

Cokelaer T, Pultz D, Harder LM, Serra-Musach J & Saez-Rodriguez J. BioServices: a common Python package to access biological Web Services programmatically, Bioinformatics, 2013. Pubmed: 24064416.