The GeneProf Web Services enable programmatic access to the public data stored in GeneProf's databases via a simple web service API. GeneProf is a web-based data analysis platform for RNA-seq and ChIP-seq experiments, coupled with a database of ready-analysed experiments. As such, GeneProf constitutes a rich and growing, high-quality resource for information about gene expression and regulation, and the GeneProf web services allow computational biologists and software developers to use these data from external software and websites.
Check out the introductory chapter of the manual for more information about GeneProf! You might also like to have a look at the 'Concepts Explained' chapter in order to better understand what we mean by the terms 'experiments', 'datasets', 'workflows' and the like.
The GeneProf WebAPI services are RESTful web services, exposing much of GeneProf's databases via a set of URLs with a fixed pattern. Each URL pattern may contain a number of required parameters, which are given as part of the path (indicated in curly brackets in the URLs listed below, e.g. {ID}), and a number of optional parameters which may be passed into the request as query parameters (e.g. ..?opt-param1=true&opt-param2=true). Most web services can return the results in either XML or JSON format.
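The URL pattern described above can be sketched as a small helper that joins the required path parameters, appends the format extension and adds any optional query parameters. This is only an illustration: `build_url` and `BASE_URL` are hypothetical names, and the base path is assumed from the example URLs listed for the individual services.

```python
from urllib.parse import quote, urlencode

# Assumption: base path as it appears in the example URLs of this document.
BASE_URL = "http://www.geneprof.org/GeneProf/api"


def build_url(path_parts, fmt, **query):
    """Build a service URL from required path parts, a format and optional query parameters."""
    # Required parameters are part of the path; percent-encode each segment.
    path = "/".join(quote(str(part), safe="") for part in path_parts)
    url = f"{BASE_URL}/{path}.{fmt}"
    if query:
        params = {}
        for name, value in query.items():
            # Python keyword names cannot contain '-', so with_outputs maps to with-outputs.
            if isinstance(value, bool):
                value = "true" if value else "false"
            params[name.replace("_", "-")] = value
        url += "?" + urlencode(params)
    return url


# Example: the experiment list as JSON, including the main outputs.
print(build_url(["exp", "list"], "json", with_outputs=True))
```

Keeping the required parameters in a list and the optional ones as keyword arguments mirrors the split between path and query parameters in the service descriptions.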
Some web services expect the ID of a GeneProf reference dataset as one input parameter. These services accept a number of (more memorable) aliases for easier use. You may use any of these aliases in place of the actual dataset IDs, as you please:
| Alias | Dataset ID |
|---|---|
| arabidopsis | pub_at_ensp12_tair10 |
| at | pub_at_ensp12_tair10 |
| ce | pub_ce_ens59_ws210 |
| celegans | pub_ce_ens59_ws210 |
| chick | pub_gg_ens59_washuc2 |
| chicken | pub_gg_ens59_washuc2 |
| danio | pub_dr_ens59_zv8 |
| dm | pub_dm_ens59_bdgp5_13 |
| dmel | pub_dm_ens59_bdgp5_13 |
| dr | pub_dr_ens59_zv8 |
| drosphila | pub_dm_ens59_bdgp5_13 |
| ef | pub_sc_ens59_ef2 |
| fruitfly | pub_dm_ens59_bdgp5_13 |
| gg | pub_gg_ens59_washuc2 |
| hs | pub_hs_ens59_grch37 |
| hsapiens | pub_hs_ens59_grch37 |
| human | pub_hs_ens59_grch37 |
| mm | pub_mm_ens58_ncbim37 |
| mmusculus | pub_mm_ens58_ncbim37 |
| mouse | pub_mm_ens58_ncbim37 |
| os | pub_os_ensp12_mus6 |
| pig | ss_ens66_sscrofa9 |
| rat | pub_rn_ens59_rgsc3_4 |
| rice | pub_os_ensp12_mus6 |
| rn | pub_rn_ens59_rgsc3_4 |
| ss | ss_ens66_sscrofa9 |
| sscrofa | ss_ens66_sscrofa9 |
| tair | pub_at_ensp12_tair10 |
| yeast | pub_sc_ens59_ef2 |
| zebrafish | pub_dr_ens59_zv8 |
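Since the services accept the aliases directly, client code rarely needs to translate them. If you do want the actual dataset ID (for example, to compare it against IDs returned elsewhere), a lookup along the following lines would work. It is a minimal sketch: `ALIASES` holds only a few rows copied from the table above, and `resolve_reference` is a hypothetical helper name.

```python
# A subset of the alias table above; extend with further rows as needed.
ALIASES = {
    "human": "pub_hs_ens59_grch37",
    "hs": "pub_hs_ens59_grch37",
    "mouse": "pub_mm_ens58_ncbim37",
    "mm": "pub_mm_ens58_ncbim37",
    "zebrafish": "pub_dr_ens59_zv8",
    "yeast": "pub_sc_ens59_ef2",
}


def resolve_reference(name):
    """Return the dataset ID for a known alias, or the input unchanged otherwise."""
    return ALIASES.get(name, name)


print(resolve_reference("mouse"))
```

Unknown names are passed through unchanged, so a full dataset ID can be supplied wherever an alias is accepted.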
If you're using the GeneProf web services in your public-facing tools or websites, we'd kindly ask you to acknowledge GeneProf by including one of the following images with a link back to the GeneProf homepage in the appropriate place:
If you're using the GeneProf web services in conjunction with a publication, please cite the original GeneProf paper:
Most of the web services listed below can be used without registering for a GeneProf account. However, using a GeneProf WebAPI key (see below) gives you additional access to your private data stored in GeneProf.
To obtain an API key, you will need to sign up for a (free!) GeneProf user account. After signing into your new user account, you can get a WebAPI key from your user profile page.
When you issue new API keys, there are a few points to keep in mind. The API keys identify you to GeneProf. Each key is a long (128 symbols), random string, which is hard to guess but not impossible per se. Thus, to avoid misuse, keys are usually valid only for a limited period of time (1 day by default, but you can change this as you see fit). Also, by default an API key only gives you access to the public data in GeneProf, but, again, you may change this if you need to access your own private data. Furthermore, the number of requests you may send to the API is limited. At the moment, you are allowed to send up to 50 requests per day, but we may increase this in future (if feasible).
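As a rough sketch of how a client might handle keys and the request limit, the snippet below appends the `key` query parameter (documented for the individual services) to a URL and keeps a simple client-side count of requests. The names `with_key` and `RequestBudget` are hypothetical, the key value is a placeholder, and the limit of 50 is simply the figure quoted above; the server remains the authority on what is actually allowed.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

DAILY_LIMIT = 50  # figure quoted in the text above; may change


def with_key(url, api_key):
    """Return the URL with the WebAPI key added as the 'key' query parameter."""
    parts = urlsplit(url)
    query = parse_qsl(parts.query, keep_blank_values=True)
    query.append(("key", api_key))
    return urlunsplit(parts._replace(query=urlencode(query)))


class RequestBudget:
    """Client-side counter to avoid exceeding the daily request limit."""

    def __init__(self, limit=DAILY_LIMIT):
        self.limit = limit
        self.used = 0

    def spend(self):
        """Record one request and return how many remain; raise when exhausted."""
        if self.used >= self.limit:
            raise RuntimeError("daily request limit reached")
        self.used += 1
        return self.limit - self.used


budget = RequestBudget()
url = with_key(
    "http://www.geneprof.org/GeneProf/api/exp/list.json?only-user-experiments=true",
    "YOUR_KEY",  # placeholder, not a real key
)
remaining = budget.spend()
```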
| URL(s): | http://www.geneprof.org/api/exp/list.{FORMAT} | |
| Summary: | Use this web service to retrieve a list of GeneProf experiments. | |
| Full Description: | 'Experiments' are what GeneProf calls each individual data analysis project. An experiment typically consists of a set of input data (e.g. raw high-throughput sequencing reads), some experimental sample annotation, an analysis workflow and a selection of main outputs. Please check the manual for further information about experiments. This web service simply retrieves a list of all the experiments available in the database along with a range of metadata. | |
| Required URL parameters: | ||
|---|---|---|
| {FORMAT} | The requested file format, one of: json, xml, txt, rdata. N.B. the txt and rdata versions of the output report a flattened version of the experiment metadata and do not support any of the additional output parameters (with-ats, with-samples, etc.)! | |
| Additional query parameters: | ||
| Parameter | Default Value | Description |
| with-ats | false | Include descriptions for all datasets' annotation types (data columns). |
| with-samples | false | Include information about the sample annotation per experiment. |
| with-inputs | false | Include a listing of all input datasets per experiment. |
| with-outputs | false | Include a listing of the main output datasets per experiment. |
| with-workflow | false | Include the analysis workflow per experiment. |
| with-all-data | false | Include ALL datasets linked with the experiment (large response!) |
| only-user-experiments | false | List only experiments owned by the user identified by the WebAPI key. |
| key | N/A | An optional WebAPI key, required to access non-public data. |
| Examples: | ||
| Retrieve a concise list of all experiments as XML: http://www.geneprof.org/GeneProf/api/exp/list.xml | ||
| Retrieve a list of all experiments including their main outputs as JSON: http://www.geneprof.org/GeneProf/api/exp/list.json?with-outputs=true | ||
| Retrieve a concise list of all experiments as plain text: http://www.geneprof.org/GeneProf/api/exp/list.txt | ||
| Retrieve a concise list of all experiments for R: http://www.geneprof.org/GeneProf/api/exp/list.rdata | ||
| URL(s): | http://www.geneprof.org/api/exp/{ID}.{FORMAT} | |
| Summary: | Use this web service to retrieve metadata (names, descriptions, IDs, references, etc) about GeneProf experiments. | |
| Full Description: | 'Experiments' are what GeneProf calls each individual data analysis project. An experiment typically consists of a set of input data (e.g. raw high-throughput sequencing reads), some experimental sample annotation, an analysis workflow and a selection of main outputs. Please check the manual for further information about experiments. This web service retrieves metadata about a specific GeneProf experiment given the experiment's accession ID (a string of the form gpXP_XXXXXX). | |
| Required URL parameters: | ||
|---|---|---|
| Parameter | Description | |
| {ID} | The identifier of the experiment of interest. Either the entire accession ID (e.g. gpXP_000003) or just the numeric part (e.g. 3). | |
| {FORMAT} | The requested file format, one of: json, xml, txt, rdata. N.B. the txt and rdata versions of the output report a flattened version of the experiment metadata and do not support any of the additional output parameters (with-ats, with-samples, etc.)! | |
| Additional query parameters: | ||
| Parameter | Default Value | Description |
| with-ats | false | Include descriptions for all datasets' annotation types (data columns). |
| with-samples | false | Include information about the sample annotation per experiment. |
| with-inputs | false | Include a listing of all input datasets per experiment. |
| with-outputs | false | Include a listing of the main output datasets per experiment. |
| with-workflow | false | Include the analysis workflow per experiment. |
| with-all-data | false | Include ALL datasets linked with the experiment (large response!) |
| key | N/A | An optional WebAPI key, required to access non-public data. |
| Examples: | ||
| Retrieve basic metadata about experiment gpXP_000385 as XML: http://www.geneprof.org/GeneProf/api/exp/385.xml | ||
| Retrieve metadata including the analysis workflow for experiment gpXP_000023 as JSON: http://www.geneprof.org/GeneProf/api/exp/gpXP_000023.json?with-workflow=true | ||
| Retrieve basic metadata about experiment gpXP_000385 as a plain text file: http://www.geneprof.org/GeneProf/api/exp/385.txt | ||
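Because the service accepts either the full accession or just its numeric part, a client may want to normalise IDs before storing or comparing them. The following is a minimal sketch; `experiment_accession` is a hypothetical helper, and the six-digit zero padding is an assumption inferred from the examples (gpXP_000003, gpXP_000385).

```python
import re


def experiment_accession(exp_id):
    """Return the full accession for inputs such as 3, '385' or 'gpXP_000023'."""
    match = re.fullmatch(r"(?:gpXP_)?(\d+)", str(exp_id))
    if match is None:
        raise ValueError(f"not a valid experiment ID: {exp_id!r}")
    # Assumption: accessions are zero-padded to six digits, as in the examples.
    return f"gpXP_{int(match.group(1)):06d}"


print(experiment_accession(385))
```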
| URL(s): | http://www.geneprof.org/api/ds/{ID}.{FORMAT} | |
| Summary: | Use this web service to retrieve metadata (names, descriptions, IDs, etc) about GeneProf datasets. | |
| Full Description: | 'Datasets', in GeneProf, are collections of data of the same type generated as the output of a component of a data analysis workflow. There are six generic types of datasets: FILE, SEQUENCES, GENOMIC_REGIONS, FEATURES, REFERENCE and SPECIAL. Please check the manual for further information about datasets. This web service retrieves metadata about a specific GeneProf dataset given the dataset's accession ID (a string of the form gpDS_XXX_XXX_XXX_XXX). | |
| Required URL parameters: | ||
|---|---|---|
| Parameter | Description | |
| {ID} | The identifier of the dataset of interest. Either the entire accession ID (e.g. gpDS_11_385_44_1) or just the dataset-specific part (e.g. 11_385_44_1). | |
| {FORMAT} | The requested file format, one of: json, xml, txt, rdata. N.B. the txt and rdata versions of the output report a flattened version of the dataset metadata and do not support any of the additional output parameters (with-ats)! | |
| Additional query parameters: | ||
| Parameter | Default Value | Description |
| with-ats | false | Include descriptions for all datasets' annotation types (data columns). |
| key | N/A | An optional WebAPI key, required to access non-public data. |
| Examples: | ||
| Retrieve metadata about the dataset gpDS_11_385_44_1 as XML: http://www.geneprof.org/GeneProf/api/ds/gpDS_11_385_44_1.xml | ||
| Retrieve metadata about the dataset gpDS_11_12_122_1 as JSON: http://www.geneprof.org/GeneProf/api/ds/11_12_122_1.json?with-ats=true | ||
| URL(s): | http://www.geneprof.org/api/ds/pubref.{FORMAT} | |
| Summary: | Use this web service to retrieve a list of public GeneProf-recommended reference datasets. | |
| Full Description: | GeneProf provides a number of recommended reference datasets for several organisms (human, mouse, rat, etc.). These reference datasets provide genomic sequence assemblies and genic annotations that serve as a scaffold for GeneProf's analyses, so most of GeneProf's datasets are based on one of these reference datasets. This web service simply retrieves a list of all the public, recommended reference datasets currently available in the database. | |
| Required URL parameters: | ||
|---|---|---|
| {FORMAT} | The requested file format, one of: json, xml, txt, rdata. N.B. the txt and rdata versions of the output report a flattened version of the dataset metadata and omit some information available in the other formats! | |
| Examples: | ||
| Retrieve a list of all reference datasets as XML: http://www.geneprof.org/GeneProf/api/ds/pubref.xml | ||
| Retrieve a list of all reference datasets as JSON: http://www.geneprof.org/GeneProf/api/ds/pubref.json | ||
| Retrieve a list of all reference datasets as plain text: http://www.geneprof.org/GeneProf/api/ds/pubref.txt | ||
| URL(s): | http://www.geneprof.org/api/gene.info/list.samples/{REF}.{FORMAT} | |
| Summary: | Use this web service to retrieve a list of public experiment samples for a GeneProf-recommended reference dataset. | |
| Full Description: | All public data in the GeneProf databases has been annotated with the biological sample of origin, described in terms of cell type, tissue, treatment, and so on. This web service simply retrieves a list of all the public sample annotations in the database for a specific reference dataset (see the List Public Reference Datasets service). | |
| Required URL parameters: | ||
|---|---|---|
| {REF} | The identifier of a public GeneProf reference dataset. You may use aliases here. Check the list public references service for all available reference datasets. | |
| {FORMAT} | The requested file format, one of: json, xml, txt, rdata. | |
| Examples: | ||
| Retrieve a list of all samples for mouse as XML: http://www.geneprof.org/GeneProf/api/gene.info/list.samples/mouse.xml | ||
| Retrieve a list of all samples for human as JSON: http://www.geneprof.org/GeneProf/api/gene.info/list.samples/human.json | ||
| Retrieve a list of all samples for human as tab-delimited text: http://www.geneprof.org/GeneProf/api/gene.info/list.samples/human.txt | ||
| Retrieve a list of all samples for human as RData: http://www.geneprof.org/GeneProf/api/gene.info/list.samples/human.rdata | ||
| URL(s): | http://www.geneprof.org/api/search/gene/{QUERY}.{FORMAT} | |
| Summary: | Use this web service to search for genes using search terms against the genes' description, name and accession IDs. | |
| Full Description: | GeneProf uses well-defined sets of gene annotations based on those from Ensembl. Using this web service, you can search for genes of interest using arbitrarily complex search queries against the names and identifiers (from Ensembl, RefSeq and more) of those genes. The search results are categorised by the reference dataset the genes belong to (also see the List Public Reference Datasets service). | |
| Required URL parameters: | ||
|---|---|---|
| Parameter | Description | |
| {QUERY} | The search term to look for, e.g. a gene name or paper title. You can narrow down the fields to be searched by prefixing the query with a field name. Valid search fields for genes are: id, label, description, type and reference. You can also use boolean logic in your queries using the keywords AND and OR, brackets and quotes (") for exact matches of whole phrases. Advanced search options and examples are documented on GeneProf's search page. | |
| {FORMAT} | The requested file format, one of: json, xml, txt, rdata. | |
| Additional query parameters: | ||
| Parameter | Default Value | Description |
| taxons | N/A | Only return matches from experiments dealing with organisms matching these NCBI taxonomy IDs (comma-separated list). |
| Examples: | ||
| Search for all genes matching the query 'sox2' (in XML format): http://www.geneprof.org/GeneProf/api/search/gene/sox2.xml | ||
| Search for all genes matching the query 'sox2' (in JSON format), but only in human (taxon 9606): http://www.geneprof.org/GeneProf/api/search/gene/sox2.json?taxons=9606 | ||
| Search for genes matching both terms 'brca2' and 'cancer' and only those with reference 'mouse', in plain text format: http://www.geneprof.org/GeneProf/api/search/gene/brca2 AND cancer AND reference:mouse.txt | ||
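Queries that contain spaces, quotes or brackets (as in the last example above) generally need to be percent-encoded before they can be sent by an HTTP client. The sketch below shows one way to do this with the standard library. `search_url` and `BASE_URL` are hypothetical names, the base path is assumed from the example URLs, and it is an assumption that the server decodes percent-encoded queries in the usual way.

```python
from urllib.parse import quote, urlencode

# Assumption: base path as it appears in the example URLs of this document.
BASE_URL = "http://www.geneprof.org/GeneProf/api"
SEARCH_KINDS = {"gene", "experiment", "dataset", "sample"}


def search_url(kind, query, fmt="json", taxons=None):
    """Build a search URL, percent-encoding the query and adding an optional taxon filter."""
    if kind not in SEARCH_KINDS:
        raise ValueError(f"unknown search kind: {kind!r}")
    # Keep ':' readable for field prefixes; spaces, quotes and brackets are encoded.
    url = f"{BASE_URL}/search/{kind}/{quote(query, safe=':')}.{fmt}"
    if taxons:
        taxon_list = ",".join(str(t) for t in taxons)
        url += "?" + urlencode({"taxons": taxon_list}, safe=",")
    return url


print(search_url("gene", "brca2 AND cancer AND reference:mouse", "txt"))
print(search_url("gene", "sox2", "json", taxons=[9606]))
```

The same helper covers the experiment, dataset and sample search services, since they share the `{QUERY}.{FORMAT}` pattern and the `taxons` parameter.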
| URL(s): | http://www.geneprof.org/api/search/experiment/{QUERY}.{FORMAT} | |
| Summary: | Use this web service to search for experiments using search terms against the experiments' names, descriptions and citations. | |
| Full Description: | 'Experiments' are what GeneProf calls each individual data analysis project. An experiment typically consists of a set of input data (e.g. raw high-throughput sequencing reads), some experimental sample annotation, an analysis workflow and a selection of main outputs. Please check the manual for further information about experiments. Using this web service, you can search for experiments of interest using arbitrarily complex search queries against the names, descriptions, linked citations, linked reference dataset, and so on of those experiments. The search results are categorised by the reference dataset the experiments belong to (also see the List Public Reference Datasets service). | |
| Required URL parameters: | ||
|---|---|---|
| Parameter | Description | |
| {QUERY} | The search term to look for, e.g. a gene name or paper title. You can narrow down the fields to be searched by prefixing the query with a field name. Valid search fields for experiments are: id, label, description, type, reference, user, dataset, citation, platform and sample. You can also use boolean logic in your queries using the keywords AND and OR, brackets and quotes (") for exact matches of whole phrases. Advanced search options and examples are documented on GeneProf's search page. | |
| {FORMAT} | The requested file format, one of: json, xml, txt, rdata. | |
| Additional query parameters: | ||
| Parameter | Default Value | Description |
| taxons | N/A | Only return matches from experiments dealing with organisms matching these NCBI taxonomy IDs (comma-separated list). |
| Examples: | ||
| Search for experiments mentioning 'sox2' anywhere (in XML format): http://www.geneprof.org/GeneProf/api/search/experiment/sox2.xml | ||
| Search for experiments mentioning 'cancer' in their description (in JSON format): http://www.geneprof.org/GeneProf/api/search/experiment/Summary:cancer.json | ||
| Search for experiments mentioning 'cell stem cell' in a linked citation (in plain text format): http://www.geneprof.org/GeneProf/api/search/experiment/citation:("cell stem cell").txt | ||
| URL(s): | http://www.geneprof.org/api/search/dataset/{QUERY}.{FORMAT} | |
| Summary: | Use this web service to search for datasets using search terms against the dataset name. | |
| Full Description: | 'Datasets', in GeneProf, are collections of data of the same type generated as the output of a component of a data analysis workflow. There are six generic types of datasets: FILE, SEQUENCES, GENOMIC_REGIONS, FEATURES, REFERENCE and SPECIAL. Please check the manual for further information about datasets. Using this web service, you can search for datasets of interest using arbitrarily complex search queries against the names and types of these datasets. | |
| Required URL parameters: | ||
|---|---|---|
| Parameter | Description | |
| {QUERY} | The search term to look for, e.g. a gene name or cell type. You can narrow down the fields to be searched by prefixing the query with a field name. Valid search fields for datasets are: id, label, description, datatype, user and experiment. You can also use boolean logic in your queries using the keywords AND and OR, brackets and quotes (") for exact matches of whole phrases. Advanced search options and examples are documented on GeneProf's search page. | |
| {FORMAT} | The requested file format, one of: json, xml, txt, rdata. | |
| Additional query parameters: | ||
| Parameter | Default Value | Description |
| taxons | N/A | Only return matches from experiments dealing with organisms matching these NCBI taxonomy IDs (comma-separated list). |
| Examples: | ||
| Search for datasets mentioning 'sox2' (in XML format): http://www.geneprof.org/GeneProf/api/search/dataset/sox2.xml | ||
| Search for datasets mentioning 'gene expression': http://www.geneprof.org/GeneProf/api/search/dataset/gene expression.json | ||
| Search for genomic data for 'sox2' in plain text format: http://www.geneprof.org/GeneProf/api/search/dataset/datatype:GENOMIC_REGIONS AND sox2.txt | ||
| URL(s): | http://www.geneprof.org/api/search/sample/{QUERY}.{FORMAT} | |
| Summary: | Use this web service to search for public experiment samples using search terms against their annotations. | |
| Full Description: | All public data in the GeneProf databases has been annotated with the biological sample of origin, described in terms of cell type, tissue, treatment, and so on. Using this web service, you can search for samples of interest using arbitrarily complex search queries against the annotations of these samples. | |
| Required URL parameters: | ||
|---|---|---|
| Parameter | Description | |
| {QUERY} | The search term to look for, e.g. a gene name or cell type. You can narrow down the fields to be searched by prefixing the query with a field name. Valid search fields for samples are: id, label, description, Age, Antibody, Cell_Line, Cell_Type, Description, Developmental_Stage, Gender, Gene, Label, Organism, Platform, Sample_Group, SRA_Accession, Strain, Time, Tissue, Treatment. You can also use boolean logic in your queries using the keywords AND and OR, brackets and quotes (") for exact matches of whole phrases. Advanced search options and examples are documented on GeneProf's search page. | |
| {FORMAT} | The requested file format, one of: json, xml, txt, rdata. | |
| Additional query parameters: | ||
| Parameter | Default Value | Description |
| taxons | N/A | Only return matches from experiments dealing with organisms matching these NCBI taxonomy IDs (comma-separated list). |
| Examples: | ||
| Search for samples annotated 'ChIP' in any of the default search fields (in XML format): http://www.geneprof.org/GeneProf/api/search/sample/ChIP.xml | ||
| Search for samples annotated with the gene 'sox2': http://www.geneprof.org/GeneProf/api/search/sample/Gene:sox2.json | ||
| Search for samples annotated 'human' in any of the default search fields in plain text format: http://www.geneprof.org/GeneProf/api/search/sample/human.txt | ||
| URL(s): | http://www.geneprof.org/api/gene.info/gp.id/{REF}/{IDTYPE}/{ID}.{FORMAT} | |
| Summary: | Use this web service to find out the GeneProf ID of a certain gene. | |
| Full Description: | GeneProf uses well-defined sets of gene annotations based on those from Ensembl. Using this web service, you can get the GeneProf-internal ID of any gene in the reference annotation by matching it against an external name (official gene symbol) or one of the supported accession ID types (e.g. Ensembl Gene IDs, RefSeq IDs, etc. -- use the list ID types service to find out which types are supported for a dataset). | |
| Required URL parameters: | ||
|---|---|---|
| Parameter | Description | |
| {ID} | The external identifier or name of the gene to look up (e.g. the Ensembl gene ID ENSMUSG00000059552). | |
| {REF} | The identifier of a public GeneProf reference dataset. You may use aliases here. Check the list public references service for all available reference datasets. | |
| {IDTYPE} | The identifier of an annotation column storing IDs or the term any to use any available identifier type. Check the list ID types service to find out which types are supported for a dataset. | |
| {FORMAT} | The requested file format, one of: json, txt, xml, rdata. | |
| Examples: | ||
| Get the GeneProf ID of the mouse gene with Ensembl ID ENSMUSG00000059552, as plain text: http://www.geneprof.org/GeneProf/api/gene.info/gp.id/mouse/C_ENSG/ENSMUSG00000059552.txt | ||
| Get the GeneProf IDs of all human genes with RefSeq ID NM_005657, as JSON: http://www.geneprof.org/GeneProf/api/gene.info/gp.id/human/C_RSEQ/NM_005657.json | ||
| Get the GeneProf IDs of all human genes with any ID matching "NM_005657" (should, in this case, be same as the previous query), as XML: http://www.geneprof.org/GeneProf/api/gene.info/gp.id/human/any/NM_005657.xml | ||
| URL(s): | http://www.geneprof.org/api/gene.info/external.id/{REF}/{IDTYPE}/{ID}.{FORMAT} | |
| Summary: | Use this web service to translate a GeneProf gene ID into an external identifier or name. | |
| Full Description: | GeneProf uses well-defined sets of gene annotations based on those from Ensembl. Using this web service, you can look up an external name (official gene symbol) or one of the supported accession ID types (e.g. Ensembl Gene IDs, RefSeq IDs, etc. -- use the list ID types service to find out which types are supported for a dataset) for any given internal GeneProf gene ID. | |
| Required URL parameters: | ||
|---|---|---|
| Parameter | Description | |
| {ID} | The GeneProf ID of a gene (an integer number). | |
| {REF} | The identifier of a public GeneProf reference dataset. You may use aliases here. Check the list public references service for all available reference datasets. | |
| {IDTYPE} | The identifier of an annotation column storing IDs. Check the list ID types service to find out which types are supported for a dataset. | |
| {FORMAT} | The requested file format, one of: json, txt, xml, rdata. | |
| Additional query parameters: | ||
| Parameter | Default Value | Description |
| quote | (empty) | For plain text output only: Enclose all IDs in this sort of quote (e.g. double quote: "). |
| Examples: | ||
| Get the Ensembl Gene ID(s) of the mouse gene #715, as plain text: http://www.geneprof.org/GeneProf/api/gene.info/external.id/mouse/715/C_ENSG.txt | ||
| Get the RefSeq ID(s) of the human gene #2981, as JSON: http://www.geneprof.org/GeneProf/api/gene.info/external.id/human/2981/C_RSEQ.json | ||
| Get the name(s) of the human gene #2981, as XML: http://www.geneprof.org/GeneProf/api/gene.info/external.id/human/2981/C_NAME.xml | ||
| URL(s): | http://www.geneprof.org/api/gene.info/list.id.types/{REF}.{FORMAT} | |
| Summary: | Use this web service to list all the ID types available for a dataset. | |
| Full Description: | GeneProf reference datasets provide a number of alternative ID annotations (e.g. Ensembl Gene IDs, RefSeq IDs, UniGene IDs, etc.) for each of the genes in the reference annotation. This service simply lists all the ID types available for a dataset. | |
| Required URL parameters: | ||
|---|---|---|
| Parameter | Description | |
| {REF} | The identifier of a public GeneProf reference dataset. You may use aliases here. Check the list public references service for all available reference datasets. | |
| {FORMAT} | The requested file format, one of: json, txt, xml, rdata. | |
| Examples: | ||
| List all the ID types for the mouse reference dataset, as plain text: http://www.geneprof.org/GeneProf/api/gene.info/list.id.types/mouse.txt | ||
| List all the ID types for the human reference dataset, as JSON: http://www.geneprof.org/GeneProf/api/gene.info/list.id.types/human.json | ||
| URL(s): | http://www.geneprof.org/api/gene.info/expression/{REF}/{ID}.{FORMAT} | |
| Summary: | Use this web service to retrieve gene expression values for a gene based on public RNA-seq data in the GeneProf databases. | |
| Full Description: | GeneProf's databases contain many pre-calculated gene expression values stemming from a reanalysis of a large collection of RNA-seq (and similar) experiments. You use this web service to retrieve all the expression values for a single gene of interest by giving the name of the reference dataset the gene belongs to and its internal GeneProf gene ID -- use the list reference datasets, get GeneProf ID and/or search genes services to look up these identifiers. You may retrieve the values either as raw read counts (the total number of short reads that were aligned to the gene's locus), RPM (reads per million -- the raw counts rescaled to account for differences in library size) or RPKM (reads per kilobase per million -- like RPM, but also accounting for transcript length bias). All gene expression values have been calculated using the Calculate Gene Expression module. Full details for the analysis pipeline that was used to calculate each value are available from the individual experiments the values come from (the JSON and XML output contain a link to the experiment of origin). | |
| Required URL parameters: | ||
|---|---|---|
| Parameter | Description | |
| {ID} | The GeneProf ID of a gene (an integer number). | |
| {REF} | The identifier of a public GeneProf reference dataset. You may use aliases here. Check the list public references service for all available reference datasets. | |
| {FORMAT} | The requested file format, one of: json, txt, xml, rdata. | |
| Additional query parameters: | ||
| Parameter | Default Value | Description |
| type | RPKM | The type of values to obtain, one of: RAW | RPM | RPKM |
| with-sample-info | false | Include additional annotations about the tissue, cell type, etc. of the expression values. |
| Examples: | ||
| Retrieve gene expression values for the mouse gene #715 in JSON format, including additional annotation data: http://www.geneprof.org/GeneProf/api/gene.info/expression/mouse/715.json?with-sample-info=true | ||
| Retrieve raw read count values for the mouse gene #715 in XML format: http://www.geneprof.org/GeneProf/api/gene.info/expression/mouse/715.xml?type=RAW | ||
| Retrieve gene expression values for the mouse gene #715 as a tab-delimited text file, including additional annotation data: http://www.geneprof.org/GeneProf/api/gene.info/expression/mouse/715.txt?with-sample-info=true | ||
| Retrieve gene expression values for the mouse gene #715 as an RData file, including additional annotation data: http://www.geneprof.org/GeneProf/api/gene.info/expression/mouse/715.rdata?with-sample-info=true | ||
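To make the relationship between the three value types concrete, the sketch below applies the commonly used definitions of RPM and RPKM to a raw read count. It is an illustration only: the function names and the numbers are made up, and the exact normalisation used by GeneProf's Calculate Gene Expression module may differ in detail (for example, in how library size or transcript length is determined).

```python
def rpm(raw_count, library_size):
    """Reads per million: raw count rescaled by the total number of reads in the library."""
    return raw_count * 1e6 / library_size


def rpkm(raw_count, library_size, transcript_length_bp):
    """Reads per kilobase per million: RPM additionally divided by transcript length in kb."""
    return raw_count * 1e9 / (library_size * transcript_length_bp)


# Illustrative numbers only: 500 reads on a 2 kb transcript in a library of 20 million reads.
raw = 500
print(rpm(raw, 20_000_000))
print(rpkm(raw, 20_000_000, 2_000))
```

RPKM equals RPM divided by the transcript length in kilobases, which is why longer transcripts with the same read count receive lower values.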
| URL(s): | http://www.geneprof.org/api/gene.info/regulation/binary/by.gene/{REF}/{ID}.{FORMAT} | |
| Summary: | Use this web service to retrieve putative target genes for a transcription factor (or other transcriptional regulator) based on public ChIP-seq data in the GeneProf databases by querying for the targets discovered in all available ChIP-seq experiments (identified by the ID of a gene). | |
| Full Description: | GeneProf's databases contain lots of information about putative gene regulatory interactions from a reanalysis of a large collection of ChIP-seq experiments. You use this web service to retrieve a list of putative target genes for a transcription factor (TF) or other DNA-binding protein, by giving the name of the reference dataset the TF gene belongs to and its internal GeneProf gene ID -- use the list reference datasets, get GeneProf ID and/or search genes services to look up these identifiers. The assignment of putative target genes to TFs has been done by calling enriched binding peaks on the aligned ChIP-seq reads using MACS and subsequently assigning the peaks to target genes if they were within a permissible window of the transcription start site (as per the current wizard default: 20kb up- and 1kb down-stream of the TSS; in an upcoming release of the web service, you will be able to redefine these thresholds dynamically, so watch this space!). The GeneProf workflow modules corresponding to these two steps are documented here: Find Peaks with MACS and Map Regions to Genes. Full details for the analysis pipeline that was used to calculate each value are available from the individual experiments the values come from (the JSON and XML output contain a link to the experiment of origin). For some TFs there might be more than one dataset available, in which case the output returned by the web service will contain the status in all available datasets (distinguished by the experimental sample they belong to, see list public samples service). | |
| Required URL parameters: | ||
|---|---|---|
| Parameter | Description | |
| {ID} | The GeneProf ID of a gene/feature (an integer number). | |
| {REF} | The identifier of a public GeneProf reference dataset. You may use aliases here. Check the list public references service for all available reference datasets. | |
| {FORMAT} | The requested file format, one of: json, txt, xml, rdata. | |
| Additional query parameters: | ||
| Parameter | Default Value | Description |
| ats | C_NAME | A selection of column IDs (from the reference) to be included in the output. |
| include-unbound | false | Include not only putative target genes in the output, but also those genes that show no evidence of regulation. |
| Examples: | ||
| Get all the putative targets of the mouse TF Smad1 in JSON format: http://www.geneprof.org/GeneProf/api/gene.info/regulation/binary/by.gene/mouse/9885.json | ||
| Get all the putative targets of the human TF MEIS1 in XML format, also include unbound genes for comparison: http://www.geneprof.org/GeneProf/api/gene.info/regulation/binary/by.gene/human/36958.xml?include-unbound=true | ||
| Get all the putative targets of the mouse TF Nanog as tab-delimited text and include a column for gene name and Ensembl ID (there are TWO ChIP-seq datasets available for this TF!): http://www.geneprof.org/GeneProf/api/gene.info/regulation/binary/by.gene/mouse/14899.txt?ats=C_NAME,C_ENSG | ||
| Get all the putative targets of the human TF MEIS1 as an RData file: http://www.geneprof.org/GeneProf/api/gene.info/regulation/binary/by.gene/human/36958.rdata | ||
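The request URLs in the examples above follow a fixed pattern, so they can be assembled programmatically. The following is a minimal sketch, not an official client: the function name `binary_targets_by_gene_url` is hypothetical, and the base path is copied from the example URLs rather than from the URL(s) row.

```python
from urllib.parse import urlencode

# Base path as it appears in the example URLs above (an assumption of this sketch).
BASE = "http://www.geneprof.org/GeneProf/api"


def binary_targets_by_gene_url(ref, gene_id, fmt="json", ats=None, include_unbound=False):
    """Build a request URL for the binary/by.gene regulation service."""
    if fmt not in ("json", "txt", "xml", "rdata"):
        raise ValueError(f"unsupported format: {fmt}")
    url = f"{BASE}/gene.info/regulation/binary/by.gene/{ref}/{int(gene_id)}.{fmt}"
    params = {}
    if ats:
        # Column IDs are passed as one comma-separated value.
        params["ats"] = ",".join(ats)
    if include_unbound:
        params["include-unbound"] = "true"
    if params:
        # Keep commas literal so the URL looks like the documented examples.
        url += "?" + urlencode(params, safe=",")
    return url


print(binary_targets_by_gene_url("mouse", 14899, "txt", ats=["C_NAME", "C_ENSG"]))
```

Optional parameters are only appended when they differ from their documented defaults, which keeps the generated URLs identical to the hand-written ones.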
| URL(s): | http://www.geneprof.org/api/gene.info/regulation/binary/by.sample/{REF}/{ID}.{FORMAT} |
|
| Summary: | Use this web service to retrieve putative target genes for a transcription factor (or other transcriptional regulator) based on public ChIP-seq data in the GeneProf databases by querying for the targets discovered in a specific ChIP-seq experiment (identified by the ID of a public sample). | |
| Full description: | GeneProf's databases contain lots of information about putative gene regulatory interactions from a reanalysis of a large collection of ChIP-seq experiments. Use this web service to retrieve a list of putative target genes for a transcription factor (TF) or other DNA-binding protein (incl. histone modifications) by giving the identifier of a public GeneProf sample -- use the list public samples or the search public samples service to look up these identifiers. Putative target genes were assigned to TFs by calling enriched binding peaks on the aligned ChIP-seq reads using MACS and then assigning each peak to a target gene if it lay within a permissible window around the transcription start site (current wizard default: 20kb upstream and 1kb downstream of the TSS; in an upcoming release of the web service you will be able to redefine these thresholds dynamically, so watch this space!). The GeneProf workflow modules corresponding to these two steps are documented here: Find Peaks with MACS and Map Regions to Genes. Full details of the analysis pipeline used to calculate each value are available from the individual experiments the values come from (the JSON and XML output contain a link to the experiment of origin). | |
| Required URL parameters: | ||
|---|---|---|
| Parameter | Description | |
| {ID} | The GeneProf ID of a public sample (an integer number). | |
| {REF} | The identifier of a public GeneProf reference dataset. You may use aliases here. Check the list public references service for all available reference datasets. | |
| {FORMAT} | The requested file format, one of: json, txt, xml, rdata. | |
| Additional query parameters: | ||
| Parameter | Default Value | Description |
| ats | C_NAME | A selection of column IDs (from the reference) to be included in the output. |
| include-unbound | false | Include not only putative target genes in the output, but also those genes that show no evidence of regulation. |
| Examples: | ||
| Get all the putative targets of the mouse TF Smad1 in JSON format: http://www.geneprof.org/GeneProf/api/gene.info/regulation/binary/by.sample/mouse/541.json | ||
| Get all the putative targets of the human TF MEIS1 in XML format, also include unbound genes for comparison: http://www.geneprof.org/GeneProf/api/gene.info/regulation/binary/by.sample/human/784.xml?include-unbound=true | ||
| Get all the putative targets of the mouse TF Smad1 as tab-delimited text and include a column for gene name and Ensembl ID: http://www.geneprof.org/GeneProf/api/gene.info/regulation/binary/by.sample/mouse/541.txt?ats=C_NAME,C_ENSG | ||
| Get all the putative targets of the human TF MEIS1 as an RData file: http://www.geneprof.org/GeneProf/api/gene.info/regulation/binary/by.sample/human/784.rdata | ||
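The peak-to-gene assignment described for the binary regulation services (a window of 20kb upstream and 1kb downstream of the TSS) can be illustrated with a small strand-aware check. This is only a sketch of the rule as stated in the text: representing a peak by a single position, and the function name `in_tss_window`, are assumptions made for illustration; the actual Map Regions to Genes module may treat peak extents differently.

```python
def in_tss_window(peak_pos, tss, strand, upstream=20_000, downstream=1_000):
    """Return True if peak_pos lies within the window around the TSS.

    'Upstream' points towards lower coordinates on the '+' strand and
    towards higher coordinates on the '-' strand.
    """
    if strand == "+":
        return tss - upstream <= peak_pos <= tss + downstream
    if strand == "-":
        return tss - downstream <= peak_pos <= tss + upstream
    raise ValueError("strand must be '+' or '-'")


# A peak 15kb upstream of a '+' strand gene is inside the default window ...
print(in_tss_window(85_000, 100_000, "+"))
# ... whereas a peak 2kb downstream is not.
print(in_tss_window(102_000, 100_000, "+"))
```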
| URL(s): | http://www.geneprof.org/api/gene.info/regulation/tfas/by.gene/{REF}/{ID}.{FORMAT} |
|
| Summary: | Use this web service to retrieve transcription factor association strength (TFAS) scores for a transcription factor (or other transcriptional regulator) based on public ChIP-seq data in the GeneProf databases by querying for the data in all available ChIP-seq experiments (identified by the ID of a gene). | |
| Full description: | GeneProf's databases contain lots of information about putative gene regulatory interactions from a reanalysis of a large collection of ChIP-seq experiments. Use this web service to retrieve a list of TFAS scores for a transcription factor (TF) or other DNA-binding protein by giving the name of the reference dataset the TF gene belongs to and its internal GeneProf gene ID -- use the list reference datasets, get GeneProf ID and/or search genes services to look up these identifiers. 'TFAS' (= transcription factor association strength) scores are continuous values that give an indication of how strongly a transcription factor (or other DNA-binding protein) is associated with a target gene. The TFAS is calculated as a function of the intensity and the distance of all binding sites (ChIP-seq peaks) near a gene; for details, please refer to the publication by Ouyang et al. (PubMed: 19995984). As the intensity score we use the fold-change enrichment of the ChIP-seq signal over the control background, as calculated by MACS when calling peaks for the input ChIP-seq data. The GeneProf workflow modules corresponding to these two steps are documented here: Find Peaks with MACS and Calculate TFAS. Full details of the analysis pipeline used to calculate each value are available from the individual experiments the values come from (the JSON and XML output contain a link to the experiment of origin). For some TFs more than one dataset might be available, in which case the output returned by the web service will contain the scores in all available datasets (distinguished by the experimental sample they belong to, see the list public samples service). | |
| Required URL parameters: | ||
|---|---|---|
| Parameter | Description | |
| {ID} | The GeneProf ID of a gene/feature (an integer number). | |
| {REF} | The identifier of a public GeneProf reference dataset. You may use aliases here. Check the list public references service for all available reference datasets. | |
| {FORMAT} | The requested file format, one of: json, txt, xml, rdata. | |
| Additional query parameters: | ||
| Parameter | Default Value | Description |
| ats | C_NAME | A selection of column IDs (from the reference) to be included in the output. |
| Examples: | ||
| Get all TFAS scores for the mouse TF Smad1 in JSON format: http://www.geneprof.org/GeneProf/api/gene.info/regulation/tfas/by.gene/mouse/9885.json | ||
| Get all TFAS scores for the human TF MEIS1 in XML format, also include unbound genes for comparison: http://www.geneprof.org/GeneProf/api/gene.info/regulation/tfas/by.gene/human/36958.xml?include-unbound=true | ||
| Get all TFAS scores for the mouse TF Nanog as tab-delimited text and include a column for gene name and Ensembl ID (there are TWO ChIP-seq datasets available for this TF!): http://www.geneprof.org/GeneProf/api/gene.info/regulation/tfas/by.gene/mouse/14899.txt?ats=C_NAME,C_ENSG | ||
| Get all TFAS scores for the human TF MEIS1 as an RData file: http://www.geneprof.org/GeneProf/api/gene.info/regulation/tfas/by.gene/human/36958.rdata | ||
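To give a feel for how a score can combine peak intensity and distance, the sketch below sums peak intensities weighted by an exponential decay in the distance to the TSS, which is the general form described by Ouyang et al. It is an illustration only: the decay constant `d0` and its default value are assumptions of this sketch, and the Calculate TFAS module may use different parameters or peak selection rules.

```python
import math


def tfas(peaks, d0=5000.0):
    """Sum peak intensities, down-weighted exponentially by distance to the TSS.

    peaks: iterable of (intensity, distance_to_tss_in_bp) pairs, where the
           intensity would be the MACS fold-change enrichment of the peak.
    d0:    decay constant in bp (an assumed default, not a GeneProf setting).
    """
    return sum(g * math.exp(-abs(d) / d0) for g, d in peaks)


# One strong peak at the TSS plus a weaker, more distant one.
print(tfas([(10.0, 0), (4.0, 5000)]))
```

A peak directly at the TSS contributes its full intensity, while a peak at distance `d0` contributes its intensity divided by e, so distant peaks add little to the score.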
| URL(s): | http://www.geneprof.org/api/gene.info/regulation/tfas/by.sample/{REF}/{ID}.{FORMAT} |
|
| Summary: | Use this web service to retrieve transcription factor association strength (TFAS) scores for a transcription factor (or other transcriptional regulator) based on public ChIP-seq data in the GeneProf databases by querying for data in a specific ChIP-seq experiment (identified by the ID of a public sample). | |
| Full description: | GeneProf's databases contain lots of information about putative gene regulatory interactions from a reanalysis of a large collection of ChIP-seq experiments. Use this web service to retrieve a list of TFAS scores for a transcription factor (TF) or other DNA-binding protein by giving the identifier of a public GeneProf sample -- use the list public samples or the search public samples service to look up these identifiers. 'TFAS' (= transcription factor association strength) scores are continuous values that give an indication of how strongly a transcription factor (or other DNA-binding protein) is associated with a target gene. The TFAS is calculated as a function of the intensity and the distance of all binding sites (ChIP-seq peaks) near a gene; for details, please refer to the publication by Ouyang et al. (PubMed: 19995984). As the intensity score we use the fold-change enrichment of the ChIP-seq signal over the control background, as calculated by MACS when calling peaks for the input ChIP-seq data. The GeneProf workflow modules corresponding to these two steps are documented here: Find Peaks with MACS and Calculate TFAS. Full details of the analysis pipeline used to calculate each value are available from the individual experiments the values come from (the JSON and XML output contain a link to the experiment of origin). | |
| Required URL parameters: | ||
|---|---|---|
| Parameter | Description | |
| {ID} | The GeneProf ID of a public sample (an integer number). | |
| {REF} | The identifier of a public GeneProf reference dataset. You may use aliases here. Check the list public references service for all available reference datasets. | |
| {FORMAT} | The requested file format, one of: json, txt, xml, rdata. | |
| Additional query parameters: | ||
| Parameter | Default Value | Description |
| ats | C_NAME | A selection of column IDs (from the reference) to be included in the output. |
| Examples: | ||
| Get TFAS scores for the mouse TF Smad1 in JSON format: http://www.geneprof.org/GeneProf/api/gene.info/regulation/tfas/by.sample/mouse/541.json | ||
| Get TFAS scores for the human TF MEIS1 in XML format, also include unbound genes for comparison: http://www.geneprof.org/GeneProf/api/gene.info/regulation/tfas/by.sample/human/784.xml?include-unbound=true | ||
| Get TFAS scores for the mouse TF Smad1 as tab-delimited text and include a column for gene name and Ensembl ID: http://www.geneprof.org/GeneProf/api/gene.info/regulation/tfas/by.sample/mouse/541.txt?ats=C_NAME,C_ENSG | ||
| Get TFAS scores for the human TF MEIS1 as an RData file: http://www.geneprof.org/GeneProf/api/gene.info/regulation/tfas/by.sample/human/784.rdata | ||
| URL(s): | http://www.geneprof.org/api/gene.info/regulation/binary/by.target/{REF}/{ID}.{FORMAT} |
|
| Summary: | Use this web service to retrieve transcription factors (and other regulatory inputs) putatively targeting a specific gene, based on public ChIP-seq data in the GeneProf databases. | |
| Full description: | GeneProf's databases contain lots of information about putative gene regulatory interactions from a reanalysis of a large collection of ChIP-seq experiments. Use this web service to retrieve a list of transcription factors and other DNA-binding proteins that might possibly be regulating a gene of interest by giving the name of the reference dataset the gene belongs to and its internal GeneProf gene ID -- use the list reference datasets, get GeneProf ID and/or search genes services to look up these identifiers. Putative target genes were assigned to TFs by calling enriched binding peaks on the aligned ChIP-seq reads using MACS and then assigning each peak to a target gene if it lay within a permissible window around the transcription start site (current wizard default: 20kb upstream and 1kb downstream of the TSS; in an upcoming release of the web service you will be able to redefine these thresholds dynamically, so watch this space!). The GeneProf workflow modules corresponding to these two steps are documented here: Find Peaks with MACS and Map Regions to Genes. Full details of the analysis pipeline used to calculate each value are available from the individual experiments the values come from (the JSON and XML output contain a link to the experiment of origin). | |
| Required URL parameters: | ||
|---|---|---|
| Parameter | Description | |
| {ID} | The GeneProf ID of a gene (an integer number). | |
| {REF} | The identifier of a public GeneProf reference dataset. You may use aliases here. Check the list public references service for all available reference datasets. | |
| {FORMAT} | The requested file format, one of: json, txt, xml, rdata. | |
| Additional query parameters: | ||
| Parameter | Default Value | Description |
| with-sample-info | false | Include additional annotations about the tissue, cell type, etc. of the underlying samples. |
| Examples: | ||
| Get information about factors putatively targeting gene #715 in JSON format, including additional annotation data: http://www.geneprof.org/GeneProf/api/gene.info/regulation/binary/by.target/mouse/715.json?with-sample-info=true | ||
| Get information about factors putatively targeting gene #715 in XML format, including additional annotation data: http://www.geneprof.org/GeneProf/api/gene.info/regulation/binary/by.target/mouse/715.xml?with-sample-info=true | ||
| Get information about factors putatively targeting gene #715 as a tab-delimited text file: http://www.geneprof.org/GeneProf/api/gene.info/regulation/binary/by.target/mouse/715.txt | ||
| Get information about factors putatively targeting gene #715 as an RData file, including additional annotation data: http://www.geneprof.org/GeneProf/api/gene.info/regulation/binary/by.target/mouse/715.rdata?with-sample-info=true | ||
| URL(s): | http://www.geneprof.org/api/gene.info/regulation/tfas/by.target/{REF}/{ID}.{FORMAT} |
|
| Summary: | Use this web service to retrieve transcription factor association strength (TFAS) scores between transcription factors (and other regulatory inputs) and a specific target gene of interest, based on public ChIP-seq data in the GeneProf databases. | |
| Full description: | GeneProf's databases contain lots of information about putative gene regulatory interactions from a reanalysis of a large collection of ChIP-seq experiments. Use this web service to retrieve a list of TFAS scores quantifying the association of transcription factors (TFs) and other DNA-binding proteins with a gene of interest by giving the name of the reference dataset the gene belongs to and its internal GeneProf gene ID -- use the list reference datasets, get GeneProf ID and/or search genes services to look up these identifiers. 'TFAS' (= transcription factor association strength) scores are continuous values that give an indication of how strongly a transcription factor (or other DNA-binding protein) is associated with a target gene. The TFAS is calculated as a function of the intensity and the distance of all binding sites (ChIP-seq peaks) near a gene; for details, please refer to the publication by Ouyang et al. (PubMed: 19995984). As the intensity score we use the fold-change enrichment of the ChIP-seq signal over the control background, as calculated by MACS when calling peaks for the input ChIP-seq data. The GeneProf workflow modules corresponding to these two steps are documented here: Find Peaks with MACS and Calculate TFAS. Full details of the analysis pipeline used to calculate each value are available from the individual experiments the values come from (the JSON and XML output contain a link to the experiment of origin). | |
| Required URL parameters: | ||
|---|---|---|
| Parameter | Description | |
| {ID} | The GeneProf ID of a gene (an integer number). | |
| {REF} | The identifier of a public GeneProf reference dataset. You may use aliases here. Check the list public references service for all available reference datasets. | |
| {FORMAT} | The requested file format, one of: json, txt, xml, rdata. | |
| Additional query parameters: | ||
| Parameter | Default Value | Description |
| with-sample-info | false | Include additional annotations about the tissue, cell type, etc. of the underlying samples. |
| Examples: | ||
| Get TFAS scores to gene #715 in JSON format, including additional annotation data: http://www.geneprof.org/GeneProf/api/gene.info/regulation/tfas/by.target/mouse/715.json?with-sample-info=true | ||
| Get TFAS scores to gene #715 in XML format, including additional annotation data: http://www.geneprof.org/GeneProf/api/gene.info/regulation/tfas/by.target/mouse/715.xml?with-sample-info=true | ||
| Get TFAS scores to gene #715 as a tab-delimited text file: http://www.geneprof.org/GeneProf/api/gene.info/regulation/tfas/by.target/mouse/715.txt | ||
| Get TFAS scores to gene #715 as an RData file, including additional annotation data: http://www.geneprof.org/GeneProf/api/gene.info/regulation/tfas/by.target/mouse/715.rdata?with-sample-info=true | ||
| URL(s): | http://www.geneprof.org/api/usr/{ID}.{FORMAT} | |
| Summary: | Use this web service to retrieve metadata about a GeneProf user (name, email, user experiments, etc.). In the interest of privacy, the service can only be used to retrieve information about yourself. | |
| Full description: | This web service retrieves metadata about registered users of GeneProf. In addition to personal details (name, email, etc.), this information contains a list of all experiments owned by the user. So as not to jeopardise the privacy of GeneProf users, access to this service is currently restricted to your own data, and you will need an API key to use it. | |
| Required URL parameters: | ||
|---|---|---|
| Parameter | Description | |
| {ID} | The identifier of the user of interest (works only with your own user ID, in the interest of privacy). | |
| {FORMAT} | The requested file format, one of: json, xml. | |
| Additional query parameters: | ||
| Parameter | Default Value | Description |
| key | N/A | A valid WebAPI key. Required for this service. |
| Examples: | ||
| Retrieve metadata about yourself as XML: http://www.geneprof.org/GeneProf/api/usr/MY-USER-ID.xml?key=MY-API-KEY | ||
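Because this service always needs a key, it is convenient to keep the key out of source code. The sketch below reads it from an environment variable when none is passed explicitly; the variable name `GENEPROF_API_KEY` and the function name `user_metadata_url` are hypothetical choices for this example, and the base path is taken from the example URL above.

```python
import os
from urllib.parse import urlencode

# Base path as it appears in the example URL above (an assumption of this sketch).
BASE = "http://www.geneprof.org/GeneProf/api"


def user_metadata_url(user_id, fmt="xml", key=None):
    """Build the URL for the user metadata service; a WebAPI key is mandatory."""
    # GENEPROF_API_KEY is a made-up variable name used only in this sketch.
    key = key or os.environ.get("GENEPROF_API_KEY")
    if not key:
        raise ValueError("this service requires a WebAPI key")
    if fmt not in ("json", "xml"):
        raise ValueError("format must be 'json' or 'xml'")
    return f"{BASE}/usr/{user_id}.{fmt}?" + urlencode({"key": key})


print(user_metadata_url("MY-USER-ID", "xml", key="MY-API-KEY"))
```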
| URL(s): | http://www.geneprof.org/api/data/{ID}.txt http://www.geneprof.org/api/data/{ID}.txt.gz |
|
| Summary: | Use this web service to retrieve data from a GeneProf dataset as plain text (optionally compressed as GZIP). Maximum size of datasets without API key = 1,000,000, with API key = unlimited. | |
| Full description: | This web service retrieves the entire contents of an arbitrary GeneProf dataset as a tab-delimited, plain text file. The dataset of interest is identified by its GeneProf accession ID (something of the form gpDS_XXX_XXX_XXX_X). You can get a list of datasets belonging to a certain experiment of interest using the metadata for an experiment service, or you can use the search datasets service to query datasets globally. In order to avoid overloading of the GeneProf servers by anonymous requests, the maximum size of datasets retrieved without an API key is restricted to 1,000,000 entries. With an API key, the maximum size is unlimited. | |
| Required URL parameters: | ||
|---|---|---|
| Parameter | Description | |
| {ID} | The identifier of the dataset of interest. Either the entire accession ID (e.g. gpDS_11_385_44_1) or just the dataset-specific part (e.g. 11_385_44_1). | |
| Additional query parameters: | ||
| Parameter | Default Value | Description |
| ats | (default displayed columns) | A selection of column IDs to be included in the output. |
| sep | \t (TAB) | Symbol to be used as a column separator. By default, the output will be a tab-separated text file. |
| key | N/A | An optional WebAPI key, required to access non-public data. |
| Examples: | ||
| Retrieve data from all visible columns of the dataset gpDS_11_119_18_1 (example RNA-seq data): http://www.geneprof.org/GeneProf/api/data/11_119_18_1.txt.gz | ||
| Retrieve only the Ensembl Gene IDs and RPKM values from the same dataset: http://www.geneprof.org/GeneProf/api/data/11_119_18_1.txt.gz?ats=C_ENSG,C_11_119_16_1_RPKM0,C_11_119_16_1_RPKM1,C_11_119_16_1_RPKM2,C_11_119_16_1_RPKM3 | ||
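Once a `.txt` or `.txt.gz` response body has been downloaded, it can be decoded with the standard library alone. The following sketch is deliberately offline: the sample bytes are synthetic and do not reproduce real GeneProf output, UTF-8 decoding is an assumption, and the helper names `dataset_part` and `read_table` are hypothetical.

```python
import csv
import gzip
import io


def dataset_part(accession):
    """Strip the optional 'gpDS_' prefix from a dataset accession ID."""
    prefix = "gpDS_"
    return accession[len(prefix):] if accession.startswith(prefix) else accession


def read_table(payload, sep="\t"):
    """Decode a (possibly gzipped) delimited payload into a list of rows."""
    if payload[:2] == b"\x1f\x8b":  # gzip magic number
        payload = gzip.decompress(payload)
    text = payload.decode("utf-8")  # assumed encoding
    return list(csv.reader(io.StringIO(text), delimiter=sep))


# Synthetic stand-in for a downloaded .txt.gz body (not real GeneProf output).
sample = gzip.compress(b"gene\tvalue\nA\t1.5\nB\t2.0\n")
rows = read_table(sample)
print(dataset_part("gpDS_11_119_18_1"), rows)
```

Checking for the gzip magic number lets the same function handle both the plain and the compressed variant of the service; pass the value you used for `sep` if you requested a non-default separator.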
| URL(s): | http://www.geneprof.org/api/data/{ID}.xls http://www.geneprof.org/api/data/{ID}.xls.gz |
|
| Summary: | Use this web service to retrieve data from a GeneProf dataset as Excel-compatible spreadsheets (optionally compressed as GZIP). Maximum size of datasets without API key = 50,000, with API key = 50,000. | |
| Full description: | This web service retrieves the entire contents of an arbitrary GeneProf dataset as an Excel-compatible spreadsheet. The dataset of interest is identified by its GeneProf accession ID (something of the form gpDS_XXX_XXX_XXX_X). You can get a list of datasets belonging to a certain experiment of interest using the metadata for an experiment service, or you can use the search datasets service to query datasets globally. In order to avoid overloading of the GeneProf servers by anonymous requests and due to size restrictions of XLS documents, the maximum size of datasets retrieved is restricted to 50,000 entries. | |
| Required URL parameters: | ||
|---|---|---|
| Parameter | Description | |
| {ID} | The identifier of the dataset of interest. Either the entire accession ID (e.g. gpDS_11_385_44_1) or just the dataset-specific part (e.g. 11_385_44_1). | |
| Additional query parameters: | ||
| Parameter | Default Value | Description |
| ats | (default displayed columns) | A selection of column IDs to be included in the output. |
| key | N/A | An optional WebAPI key, required to access non-public data. |
| Examples: | ||
| Retrieve data from all visible columns of the dataset gpDS_11_119_18_1 (example RNA-seq data): http://www.geneprof.org/GeneProf/api/data/11_119_18_1.xls.gz | ||
| Retrieve only the Ensembl Gene IDs and RPKM values from the same dataset: http://www.geneprof.org/GeneProf/api/data/11_119_18_1.xls.gz?ats=C_ENSG,C_11_119_16_1_RPKM0,C_11_119_16_1_RPKM1,C_11_119_16_1_RPKM2,C_11_119_16_1_RPKM3 | ||
| URL(s): | http://www.geneprof.org/api/data/{ID}.xml http://www.geneprof.org/api/data/{ID}.xml.gz |
|
| Summary: | Use this web service to retrieve data from a GeneProf dataset as XML (optionally compressed as GZIP). Maximum size of datasets without API key = 1,000,000, with API key = unlimited. | |
| Full description: | This web service retrieves the entire contents of an arbitrary GeneProf dataset as a computer-readable XML file. The dataset of interest is identified by its GeneProf accession ID (something of the form gpDS_XXX_XXX_XXX_X). You can get a list of datasets belonging to a certain experiment of interest using the metadata for an experiment service, or you can use the search datasets service to query datasets globally. In order to avoid overloading of the GeneProf servers by anonymous requests, the maximum size of datasets retrieved without an API key is restricted to 1,000,000 entries. With an API key, the maximum size is unlimited. | |
| Required URL parameters: | ||
|---|---|---|
| Parameter | Description | |
| {ID} | The identifier of the dataset of interest. Either the entire accession ID (e.g. gpDS_11_385_44_1) or just the dataset-specific part (e.g. 11_385_44_1). | |
| Additional query parameters: | ||
| Parameter | Default Value | Description |
| ats | (default displayed columns) | A selection of column IDs to be included in the output. |
| key | N/A | An optional WebAPI key, required to access non-public data. |
| Examples: | ||
| Retrieve data from all visible columns of the dataset gpDS_11_119_18_1 (example RNA-seq data): http://www.geneprof.org/GeneProf/api/data/11_119_18_1.xml.gz | ||
| Retrieve only the Ensembl Gene IDs and RPKM values from the same dataset: http://www.geneprof.org/GeneProf/api/data/11_119_18_1.xml.gz?ats=C_ENSG,C_11_119_16_1_RPKM0,C_11_119_16_1_RPKM1,C_11_119_16_1_RPKM2,C_11_119_16_1_RPKM3 | ||
| URL(s): | http://www.geneprof.org/api/data/{ID}.rdata | |
| Summary: | Use this web service to retrieve data from a GeneProf dataset as binary files that can be loaded into R. Maximum size of datasets without API key = 1,000,000, with API key = 1,000,000. | |
| Full description: | This web service retrieves the entire contents of an arbitrary GeneProf dataset as a binary R file. The dataset of interest is identified by its GeneProf accession ID (something of the form gpDS_XXX_XXX_XXX_X). You can get a list of datasets belonging to a certain experiment of interest using the metadata for an experiment service, or you can use the search datasets service to query datasets globally. In order to avoid overloading of the GeneProf servers by anonymous requests and due to size limitations, the maximum size of datasets retrieved is restricted to 1,000,000 entries. These binary files can be loaded into R simply by issuing the command load(FILENAME). Check out the advanced example below to find out how to load data into R directly from the web services. | |
| Required URL parameters: | ||
|---|---|---|
| Parameter | Description | |
| {ID} | The identifier of the dataset of interest. Either the entire accession ID (e.g. gpDS_11_385_44_1) or just the dataset-specific part (e.g. 11_385_44_1). | |
| Additional query parameters: | ||
| Parameter | Default Value | Description |
| ats | (default displayed columns) | A selection of column IDs to be included in the output. |
| key | N/A | An optional WebAPI key, required to access non-public data. |
| Examples: | ||
| Retrieve data from all visible columns of the dataset gpDS_11_119_18_1 (example RNA-seq data): http://www.geneprof.org/GeneProf/api/data/11_119_18_1.rdata | ||
| Retrieve only the Ensembl Gene IDs and RPKM values from the same dataset: http://www.geneprof.org/GeneProf/api/data/11_119_18_1.rdata?ats=C_ENSG,C_11_119_16_1_RPKM0,C_11_119_16_1_RPKM1,C_11_119_16_1_RPKM2,C_11_119_16_1_RPKM3 | ||
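The four tabular export formats above differ in how many entries they will deliver, so it can help to pick a format from the dataset size before requesting it. The sketch below encodes the limits exactly as stated in the service summaries; the function name `formats_for` is hypothetical, and how you obtain the entry count for a dataset is left open.

```python
# Entry limits as stated in the service summaries above; None means unlimited.
LIMITS = {
    "txt":   {"anonymous": 1_000_000, "with_key": None},
    "xml":   {"anonymous": 1_000_000, "with_key": None},
    "xls":   {"anonymous": 50_000,    "with_key": 50_000},
    "rdata": {"anonymous": 1_000_000, "with_key": 1_000_000},
}


def formats_for(n_entries, has_key=False):
    """List the tabular formats able to deliver a dataset of n_entries rows."""
    who = "with_key" if has_key else "anonymous"
    return [fmt for fmt, lim in LIMITS.items()
            if lim[who] is None or n_entries <= lim[who]]


print(formats_for(60_000))
print(formats_for(2_000_000, has_key=True))
```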
| URL(s): | http://www.geneprof.org/api/data/chromosome.names/{ID}.{FORMAT} | |
| Summary: | Use this web service to retrieve the IDs and names of all chromosomes in a genomic dataset. This service can only be used for genomic datasets, i.e. for datasets with type GENOMIC_REGIONS or REFERENCE. | |
| Full description: | The names different genome databases use to refer to chromosomes, even of well-known organisms, are not always the same. For example, the mitochondrial (pseudo-)chromosome is usually called 'chrMT' in Ensembl, but 'chrM' in the UCSC databases. The data as BED and data as WIG services might therefore require you to rename the chromosomes in the output before using them with other applications. This web service retrieves the identifiers and names of all chromosomes used in a genomic dataset. You can inspect those and see whether any change will be required. | |
| Required URL parameters: | ||
|---|---|---|
| Parameter | Description | |
| {ID} | The identifier of the dataset of interest. Either the entire accession ID (e.g. gpDS_11_385_44_1) or just the dataset-specific part (e.g. 11_385_44_1). | |
| Additional query parameters: | ||
| Parameter | Default Value | Description |
| key | N/A | An optional WebAPI key, required to access non-public data. |
| Examples: | ||
| Get all chromosomes for the mouse reference dataset in plain text format: http://www.geneprof.org/GeneProf/api/data/chromosome.names/pub_mm_ens58_ncbim37.txt | ||
| Get all chromosomes for the human reference dataset in JSON format: http://www.geneprof.org/GeneProf/api/data/chromosome.names/pub_hs_ens59_grch37.json | ||
| Get the chromosome names from the ChIP-seq peaks dataset gpDS_11_3_7_2 in XML format: http://www.geneprof.org/GeneProf/api/data/chromosome.names/11_3_7_2.xml | ||
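Inspecting the returned names for required changes can be automated by comparing them with the names your target application accepts. This is a minimal sketch: the name lists are typed in by hand here rather than parsed from a response, because the layout of the service output is not shown in this section, and `unmatched_names` is a hypothetical helper.

```python
def unmatched_names(dataset_names, accepted_names):
    """Return dataset chromosome names that the target application does not know."""
    accepted = set(accepted_names)
    return sorted(n for n in dataset_names if n not in accepted)


# Ensembl-style names in the dataset versus UCSC-style names expected downstream.
print(unmatched_names(["chr1", "chr2", "chrMT"], ["chr1", "chr2", "chrM"]))
```

Any name reported here is a candidate for renaming before the exported file is passed on.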
| URL(s): | http://www.geneprof.org/api/data/{ID}.bed.gz http://www.geneprof.org/api/data/{CHROMNAMES}/{ID}.bed.gz |
|
| Summary: | Use this web service to retrieve data from a GeneProf dataset as BED (compressed as GZIP). Maximum size of datasets without API key = 10,000,000, with API key = unlimited. This service can only be used for genomic datasets, i.e. for datasets with type GENOMIC_REGIONS. | |
| Full description: | This web service retrieves the entire contents of a genomic GeneProf dataset in BED file format. This will only work for datasets of type GENOMIC_REGIONS, i.e. those containing genomic data! The dataset of interest is identified by its GeneProf accession ID (something of the form gpDS_XXX_XXX_XXX_X). You can get a list of datasets belonging to a certain experiment of interest using the metadata for an experiment service, or you can use the search datasets service to query datasets globally. In order to avoid overloading of the GeneProf servers by anonymous requests, the maximum size of datasets retrieved without an API key is restricted to 10,000,000 entries. With an API key, the maximum size is unlimited. N.B. chromosomes in the output BED can be dynamically renamed in order to make the names compatible with other applications (that's because, unfortunately, not all genome databases use the same names, see also the get chromosome names service). | |
| Required and Optional URL parameters: | ||
|---|---|---|
| Parameter | Description | |
| {ID} | The identifier of the dataset of interest. Either the entire accession ID (e.g. gpDS_11_385_44_1) or just the dataset-specific part (e.g. 11_385_44_1). | |
| {CHROMNAMES} | An optional parameter that may be used to rename chromosomes in the output. The value should be a comma-separated map from chromosome ID to its name in the output, where key and value are separated by a hyphen (-), e.g. 1-chr1,2-chr2,3-chr12. Any chromosome not mentioned in the map will not be exported, so you can use this as a filtering mechanism, too. Use the Get Chromosome Names service to get a list of all the available chromosomes in a dataset with their default names. | |
| Additional query parameters: | ||
| Parameter | Default Value | Description |
| filter-column | N/A | The ID of a column / annotation type holding boolean flags. Only entries for which this boolean flag is true will be exported. |
| with-track-description | true | Include a track description header. |
| only-distinct | false | Export only one entry if there are multiple with the same coordinates. |
| key | N/A | An optional WebAPI key, required to access non-public data. |
| Examples: | ||
| Retrieve ChIP-seq peaks for FoxA1 from dataset gpDS_11_3_7_2: http://www.geneprof.org/GeneProf/api/data/11_3_7_2.bed.gz | ||
| Retrieve only the ChIP-seq peaks on chromosome 3 for FoxA1 from dataset gpDS_11_3_7_2: http://www.geneprof.org/GeneProf/api/data/3-chr3/11_3_7_2.bed.gz | ||
| Retrieve gene coordinates from the zebrafish reference dataset without a track header: http://www.geneprof.org/GeneProf/api/data/zebrafish.bed.gz?with-track-description=false | ||
| Retrieve only the ChIP-seq peaks for Stat3 (identified by the column $C_11_12_125_2_14_TFBS) from a dataset containing peaks for many different factors (gpDS_11_12_125_2): http://www.geneprof.org/GeneProf/api/data/11_12_125_2.bed.gz?filter-column=C_11_12_125_2_14_TFBS | ||
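The `{CHROMNAMES}` path segment is a small text encoding of a map, so it is easy to generate from a dictionary. The sketch below is an illustration under assumptions: the helper names are hypothetical, the base path is taken from the example URLs, and rejecting hyphens in chromosome IDs is a conservative guess, since the text does not say how the server splits ambiguous entries.

```python
# Base path as it appears in the example URLs above (an assumption of this sketch).
BASE = "http://www.geneprof.org/GeneProf/api"


def chromnames_segment(mapping):
    """Encode {chromosome ID: output name} as 'id-name,id-name,...'."""
    for cid, name in mapping.items():
        # Commas separate entries and the hyphen separates ID from name, so
        # these characters would make the encoded map ambiguous.
        if "," in str(cid) or "," in str(name) or "-" in str(cid):
            raise ValueError(f"cannot encode entry {cid!r}: {name!r}")
    return ",".join(f"{cid}-{name}" for cid, name in mapping.items())


def bed_url(dataset_id, mapping=None):
    """Build a BED download URL, optionally renaming/filtering chromosomes."""
    middle = f"{chromnames_segment(mapping)}/" if mapping else ""
    return f"{BASE}/data/{middle}{dataset_id}.bed.gz"


print(bed_url("11_3_7_2", {"3": "chr3"}))
```

Because chromosomes missing from the map are not exported, passing a one-entry dictionary doubles as a per-chromosome filter, as in the chromosome 3 example above.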
| URL(s): | http://www.geneprof.org/api/data/{ID}.wig.gz http://www.geneprof.org/api/data/{CHROMNAMES}/{ID}.wig.gz |
|
| Summary: | Use this web service to retrieve data from a GeneProf dataset as WIG (compressed as GZIP). Maximum size of datasets without API key = 10,000,000, with API key = unlimited. This service can only be used for genomic datasets, i.e. for datasets with type GENOMIC_REGIONS. | |
| Full description: | This web service retrieves the entire contents of a genomic GeneProf dataset in WIG file format. This will only work for datasets of type GENOMIC_REGIONS, i.e. those containing genomic data! The dataset of interest is identified by its GeneProf accession ID (something of the form gpDS_XXX_XXX_XXX_X). You can get a list of datasets belonging to a certain experiment of interest using the metadata for an experiment service, or you can use the search datasets service to query datasets globally. In order to avoid overloading of the GeneProf servers by anonymous requests, the maximum size of datasets retrieved without an API key is restricted to 10,000,000 entries. With an API key, the maximum size is unlimited. N.B. chromosomes in the output WIG can be dynamically renamed in order to make the names compatible with other applications (that's because, unfortunately, not all genome databases use the same names, see also the get chromosome names service). | |
| Required and Optional URL parameters: | ||
|---|---|---|
| Parameter | Description | |
| {ID} | The identifier of the dataset of interest. Either the entire accession ID (e.g. gpDS_11_385_44_1) or just the dataset-specific part (e.g. 11_385_44_1). | |
| {CHROMNAMES} | An optional parameter that may be used to rename chromosomes in the output. The value should be a comma-separated map from chromosome ID to its name in the output, where key and value are separated by a hyphen (-), e.g. 1-chr1,2-chr2,3-chr12. Any chromosome not mentioned in the map will not be exported, so you can use this as a filtering mechanism, too. Use the Get Chromosome Names service to get a list of all the available chromosomes in a dataset with their default names. | |
| Additional query parameters: | ||
| Parameter | Default Value | Description |
| with-track-description | true | Include a track description header. |
| only-distinct | false | Include only one entry in the coverage count if there are multiple with the same coordinates. |
| frag-length | -1 | The "fragment length" to calculate the coverage with, use -1 to use the actual size of the regions. |
| bin-size | 25 | The bin size / resolution of the tracks. |
| key | N/A | An optional WebAPI key, required to access non-public data. |
| Examples: | ||
| Retrieve genomic coverage data from an RNA-seq assay of gene expression in human liver (gpDS_11_58_16_2): http://www.geneprof.org/GeneProf/api/data/11_58_16_2.wig.gz | ||
| Retrieve genomic coverage data from a ChIP-seq experiment for Smad1 (gpDS_11_12_112_2), without a track header, using only distinct alignments and a fragment length of 200: http://www.geneprof.org/GeneProf/api/data/11_12_112_2.wig.gz?with-track-description=false&only-distinct=true&frag-length=200 | ||
| URL(s): | http://www.geneprof.org/api/data/{ID}.fasta.gz | |
| Summary: | Use this web service to retrieve data from a GeneProf dataset as FASTA (compressed as GZIP). Maximum size of datasets without API key = 10,000,000, with API key = unlimited. This service can only be used for nucleotide sequence datasets, i.e. for datasets with type SEQUENCES. | |
| Full description: | This web service retrieves the entire contents of a nucleotide sequence dataset in FASTA format. This will only work for datasets of type SEQUENCES, i.e. those containing sequence data! The dataset of interest is identified by its GeneProf accession ID (something of the form gpDS_XXX_XXX_XXX_X). You can get a list of datasets belonging to a certain experiment of interest using the metadata for an experiment service, or you can use the search datasets service to query datasets globally. In order to avoid overloading of the GeneProf servers by anonymous requests, the maximum size of datasets retrieved without an API key is restricted to 10,000,000 entries. With an API key, the maximum size is unlimited. | |
| Required URL parameters: | ||
|---|---|---|
| Parameter | Description | |
| {ID} | The identifier of the dataset of interest. Either the entire accession ID (e.g. gpDS_11_385_44_1) or just the dataset-specific part (e.g. 11_385_44_1). | |
| Additional query parameters: | ||
| key | N/A | An optional WebAPI key, required to access non-public data. |
| Example: | ||
| Retrieve unprocessed Tag-seq sequence data from gpDS_11_385_6_1: http://www.geneprof.org/GeneProf/api/data/11_385_6_1.fasta.gz | ||
| URL(s): | http://www.geneprof.org/api/data/{ID}.fastq.gz | |
| Summary: | Use this web service to retrieve data from a GeneProf dataset as FASTQ (compressed as GZIP). Maximum size of datasets without API key = 10,000,000, with API key = unlimited. This service can only be used for nucleotide sequence datasets, i.e. for datasets with type SEQUENCES. | |
| Full description: | This web service retrieves the entire contents of a nucleotide sequence dataset in FASTQ format. This will only work for datasets of type SEQUENCES, i.e. those containing sequence data! The dataset of interest is identified by its GeneProf accession ID (something of the form gpDS_XXX_XXX_XXX_X). You can get a list of datasets belonging to a certain experiment of interest using the metadata for an experiment service, or you can use the search datasets service to query datasets globally. In order to avoid overloading of the GeneProf servers by anonymous requests, the maximum size of datasets retrieved without an API key is restricted to 10,000,000 entries. With an API key, the maximum size is unlimited. | |
| Required URL parameters: | ||
|---|---|---|
| Parameter | Description | |
| {ID} | The identifier of the dataset of interest. Either the entire accession ID (e.g. gpDS_11_385_44_1) or just the dataset-specific part (e.g. 11_385_44_1). | |
| Additional query parameters: | ||
| key | N/A | An optional WebAPI key, required to access non-public data. |
| Example: | ||
| Retrieve unprocessed Tag-seq sequence data from gpDS_11_385_6_1: http://www.geneprof.org/GeneProf/api/data/11_385_6_1.fastq.gz | ||
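All of the export services above share one URL shape, so request URLs can be assembled programmatically. Here's a minimal Python sketch that does this for the dataset ID and the optional query parameters listed in the tables. The helper names `normalise_id` and `data_url` are made up for this illustration, and the root URL is copied from the example links (which include a /GeneProf part that the URL pattern rows omit).

```python
from urllib.parse import urlencode

# Root URL as it appears in the example links above (an assumption: the
# URL pattern rows show the same path without "/GeneProf").
API_ROOT = "http://www.geneprof.org/GeneProf/api/data"


def normalise_id(dataset_id):
    """Strip the optional 'gpDS_' prefix; the services accept either form."""
    prefix = "gpDS_"
    if dataset_id.startswith(prefix):
        return dataset_id[len(prefix):]
    return dataset_id


def data_url(dataset_id, fmt, **query):
    """Build an export URL for fmt in e.g. 'bed', 'wig', 'fasta', 'fastq'.

    Keyword arguments use underscores in place of hyphens
    (bin_size -> bin-size); booleans are written as true/false.
    """
    url = "{}/{}.{}.gz".format(API_ROOT, normalise_id(dataset_id), fmt)
    params = []
    for name, value in query.items():
        if isinstance(value, bool):
            value = "true" if value else "false"
        params.append((name.replace("_", "-"), value))
    if params:
        url += "?" + urlencode(params)
    return url
```

Calling `data_url("11_12_112_2", "wig", with_track_description=False, only_distinct=True, frag_length=200)` reproduces the Smad1 example URL from the WIG table.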
If you're running a Unix-like operating system, you're probably familiar with the concept of 'piping' the output of one command-line program into another (cf. this Wikipedia article on Unix pipelines). You can use wget to retrieve GeneProf data as a stream like this:
Here's an example of how to retrieve a FASTQ sequence file, which is then filtered for sequences containing the nucleotides ACTG (in order) and written to a file called filterd-seqs.fq:
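If you'd rather not rely on shell tools, the same stream-and-filter idea can be sketched in Python using only the standard library. This is a rough equivalent under a few assumptions: records are taken to be plain four-line FASTQ entries, the motif is matched as a contiguous substring of the sequence line only, and the function names are invented for this example.

```python
import gzip
import urllib.request


def fastq_records(lines):
    """Group an iterable of text lines into 4-line FASTQ records.

    An incomplete trailing record is silently dropped.
    """
    record = []
    for line in lines:
        record.append(line.rstrip("\r\n"))
        if len(record) == 4:
            yield tuple(record)
            record = []


def filter_fastq(lines, motif="ACTG"):
    """Yield only the records whose sequence line contains the motif."""
    for header, sequence, separator, quality in fastq_records(lines):
        if motif in sequence:
            yield header, sequence, separator, quality


def download_filtered(url, out_path, motif="ACTG"):
    """Stream a .fastq.gz URL, decompress on the fly and save the matches."""
    with urllib.request.urlopen(url) as response:
        with gzip.open(response, mode="rt", encoding="utf-8") as text, \
                open(out_path, "w") as out:
            for record in filter_fastq(text, motif):
                out.write("\n".join(record) + "\n")
```

For instance, `download_filtered("http://www.geneprof.org/GeneProf/api/data/11_385_6_1.fastq.gz", "filterd-seqs.fq")` would fetch the Tag-seq example dataset from the table above and keep only the matching reads.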
You can combine the outputs of several web services to achieve more advanced results. For instance, let's first look up the GeneProf ID of the mouse gene with the Ensembl ID ENSMUSG00000024406 (that's the transcription factor (TF) Pou5f1), and then look for all genes that are putatively regulated by this TF across all ChIP-seq datasets available in the GeneProf database and then check for genes from the "Sox" family:
Hint: Of course, you're not limited at all to using standard Unix commands! There's nothing holding you back from making use of great bioinformatics commandline tools such as, for example, Biopieces, FASTX-Toolkit or BEDTools.
Another use of GeneProf's web services is for loading data into R. Many types of data can be exported directly as binary RData objects -- which can be easily loaded into an existing R session like this:
Let's put this into a function (called loadGeneProfData) for even easier use:
We can use this to write a function that combines two web service calls to look up the GeneProf Gene ID for a gene symbol and then to retrieve gene expression data for this gene:
Let's combine these methods to retrieve some expression data and generate a few plots:
In the example above, we first get the GeneProf IDs of the mouse genes with the common symbols Sox2 and Pou5f1 (with the getGeneProfID function) and then use these IDs to query expression data (in RPKM format, by default) via the Get Gene Expression Values for a Gene servlet described above. We could now do anything we like with these values; for example, we have chosen here to plot two histograms, one of the RPKM values for each gene, and a scatterplot comparing them.
In addition to the expression values, the same web service makes it possible to retrieve additional annotation data, e.g. for the cell type of each observation. Let's get the additional annotations for one of the genes (the annotations for the other would be the same, so no need to get them twice) and then use these annotations to plot an annotated scatter plot for a selection of cell types (the dots for each cell type will have a different colour in this plot):
GeneProf stores quite a lot of genomic data, e.g. from alignments of short reads or enriched regions, such as peaks detected in ChIP-seq experiments. It is possible to load this data directly into external applications that understand the commonly used WIG and/or BED file formats such as genome browsers, like the UCSC genome browser, the Integrative Genomics Viewer, and others. The key web services required for this task are Genomic Data as BED Files (BED) and Genomic Data as WIG Files (WIG).
We'll try it with the UCSC Genome Browser first. On the genome browser homepage, we need to click the Add/Manage Custom Tracks button to start up the interface for uploading custom tracks. There's a big text box which allows users to enter the URLs under which custom tracks can be found and this is where the URLs for the relevant GeneProf web service calls are meant to go. Let's load in alignments of ChIP-seq datasets (Pol2 ChIP-seq and Input DNA control from this experiment based on data from Sultan et al. (2008)) as WIG tracks to see the coverage of reads in these datasets across the genome as well as the coordinates of ChIP-seq peaks that were identified in the analysis of these tracks. Let's focus on chromosome 3 only, for faster loading times. Copy and paste the URLs below into UCSC's text box:
Just click the Submit button and that's it. Your data should be imported into the UCSC browser rather quickly. You can then continue using the genome browser as you are used to, e.g. have a look at the region chr3:37,253,759-37,439,292 with a nice peak at the TSS of GOLGA4.
Fancy another browser? The same URL patterns should work with IGV (see screenshots below) and probably pretty much every other genome browser. Note that, unfortunately, none of the browsers we've tried seem to work very well with URLs that contain query parameters (that's the arguments behind the question mark (?)), so further configuration of the tracks will not be possible unless you download the tracks first and upload them afterwards.
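Since the {CHROMNAMES} part of the WIG service sits in the URL path rather than behind the question mark, it offers one way of restricting a track to a single chromosome without any query parameters. Here's a small Python sketch of how such a URL can be put together. It assumes that the chromosome ID for chromosome 3 is simply 3 (the Get Chromosome Names service lists the real IDs for a dataset), borrows the Smad1 dataset ID from the WIG examples purely for illustration, and uses invented helper names.

```python
# Root URL as used in the example links of the service tables.
API_ROOT = "http://www.geneprof.org/GeneProf/api/data"


def chrom_map(mapping):
    """Format {'1': 'chr1', '2': 'chr2'} as '1-chr1,2-chr2'."""
    return ",".join("{}-{}".format(cid, name) for cid, name in mapping.items())


def wig_track_url(dataset_id, mapping):
    """Build a {CHROMNAMES}/{ID}.wig.gz URL without any query parameters.

    Chromosomes missing from the mapping are not exported by the service,
    so a one-entry mapping doubles as a chromosome filter.
    """
    return "{}/{}/{}.wig.gz".format(API_ROOT, chrom_map(mapping), dataset_id)


# Illustration only: restrict the Smad1 ChIP-seq coverage track to chr3.
example = wig_track_url("11_12_112_2", {"3": "chr3"})
```

The resulting string can be pasted into a genome browser's custom track box just like the URLs discussed above.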
Galaxy is an open-source, web-based platform for genomic data analysis that has attracted a vibrant user-base over the recent years. GeneProf data, especially genomic data, can easily be loaded into Galaxy for further analysis.
As a simple example, we'll load two sets of ChIP-seq peaks for the factors Stat3 and Klf4 into Galaxy using a call to the Genomic Data as BED Files (BED) service. We'll then use Galaxy's built-in tools to intersect those regions to discover regions of potential interaction between the two factors.
On the Galaxy homepage, we start by opening the Upload File tool (found under Get Data). We'll upload two sets of ChIP-seq peaks, both contained in the GeneProf dataset gpDS_11_12_125_2 using web service calls to the URLs:
Simply copy & paste both URLs into the text box marked "URL / Text". The files we're retrieving are from the mouse genome and come in BED format, so let's select bed where it says "File Format" and Mouse July 2007 (NCBI37/mm9) where it says "Genome". In the end, click the Execute button to start the upload.
This should only take a moment (depending on how busy the servers are just now) and then two new datasets should be available in the Galaxy workspace for you to play with. For instance, we can now use the Intersect tool (under Operate on Genomic Intervals) to overlap both sets of peaks. In the configuration of this tool, just pick any one of the two datasets under "of:" and the other one under "that intersect:" and then execute the tool. After a few minutes you should be left with a new dataset of all overlapping peaks.
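To make clear what "overlapping peaks" means here, the basic idea behind such an intersection can be sketched in a few lines of Python. This is not Galaxy's implementation (its tool may offer further options), just a minimal illustration that assumes peaks are given as (chromosome, start, end) tuples with BED-style half-open coordinates.

```python
def overlapping(peaks_a, peaks_b):
    """Return the peaks from peaks_a that overlap at least one peak in peaks_b.

    Peaks are (chrom, start, end) tuples; the end coordinate is exclusive,
    so two regions that merely touch do not count as overlapping.
    """
    # Index the second set by chromosome to avoid comparing across chromosomes.
    by_chrom = {}
    for chrom, start, end in peaks_b:
        by_chrom.setdefault(chrom, []).append((start, end))

    hits = []
    for chrom, start, end in peaks_a:
        candidates = by_chrom.get(chrom, [])
        if any(start < b_end and b_start < end for b_start, b_end in candidates):
            hits.append((chrom, start, end))
    return hits
```

Swapping the two arguments returns the overlapping peaks of the other set instead, which mirrors the choice between "of:" and "that intersect:" described above.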
That's just one example, but in the same way you can upload lots of other genomic or sequence data from GeneProf directly into Galaxy enabling you to exploit both tools to their best potential!
General-purpose pipeline execution engines have many prospective uses and have enjoyed immense popularity in many scientific fields. One of the most well-known solutions in the life sciences is probably Taverna. Using GeneProf's web services it is possible to wire in GeneProf data with other tools into arbitrary pipelines. Let's look at a simple example.
The key to using GeneProf data in Taverna is the REST Service template offered by Taverna. You can use this template to connect to the GeneProf web services listed above, filling in wild cards ({mywildcard}) with user inputs as required.
In the example workflow shown above, a concatenation of two web service calls is employed to retrieve gene expression data for an arbitrary gene. We first use the web service Get the GeneProf ID of a Gene to get the internal GeneProf gene ID for a gene based on an external identifier or name. We configure the Taverna module with the URL pattern of the web service leaving space for three wildcards (step 1):
Please also change the 'accept' type to text/plain, because this is the type of data returned by that service. The wild cards are to be filled with user-defined inputs, so we next add three user input modules and connect them up to the web service box (step 2).
The GeneProf web service returns a plain text file with one GeneProf ID per line. We need to parse these into a list of Strings in order to use them further on. The local Taverna service Split string into string list by regular expression will do the job (step 3): Connect the responseBody output of the REST service module to the string input of the regular expression splitter and define a string constant to feed into the regex input. The string constant should read \n, i.e. the regular expression is to split the input along all "new line" characters.
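Outside of Taverna, the same parsing step is a one-liner. The Python sketch below splits a plain-text response on the regular expression \n, as configured above; dropping empty lines and stray carriage returns is an extra convenience added here, and the function name is invented for this example.

```python
import re


def parse_id_list(response_body):
    """Split a plain-text response into a list of IDs, one per line."""
    pieces = re.split(r"\n", response_body)
    # Ignore blank lines (e.g. after a trailing newline) and surrounding whitespace.
    return [piece.strip() for piece in pieces if piece.strip()]
```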
Finally, we can add another REST service module to address the Get Gene Expression Values for a Gene service. Again, we leave space for two wildcards:
The value of the {id} wild card comes from the parsed output of the regular expression string splitter module, and the other wild card receives its input from the same user input parameter as in the first module (step 4). The 'accept' type of this module should be application/xml since we're requesting XML data this time.
In the end, we define the workflow outputs of the new Taverna pipeline and we can now rename the modules used and annotate inputs and outputs to make the workflow a little nicer.
You can download the complete example Taverna workflow here.
Other web sites across the globe can dynamically retrieve live GeneProf data and include it in their web pages using AJAX and JSONP cross-domain requests, for example via jQuery (also see this StackOverflow post).
As an example, we'll implement a simple HTML page that displays a search form for genes. Using jQuery, the form will request matching genes from the GeneProf web services (Search Genes) and list them on the page along with their GeneProf IDs, a description and a plot of their gene expression values in a selection of cell types.
Instead of listing the entire source code for the page, let's just look at its most important component, the JSONP request (the complete source code is available here):
As you'll notice the url attribute points to the URL of the GeneProf web service and dataType is set to jsonp (not json!). There are two important callback functions: success and error, the implementation of which defines in which way the response data is being dealt with.
In our case, we parse out a part of the response (gene names, etc.) and create some HTML code accordingly, and then fire up another AJAX call to another web service to retrieve some expression data for each gene. We then use the d3.js library to render an SVG bar plot showing the average expression in all available observations for a selection of cell types (this requires a compatible browser -- pretty much anything but IE8 or earlier should do!).
Check out the complete source code of the example page here and have a good read through the jQuery documentation and you'll find you can easily integrate GeneProf data into your own web site, too!
GeneProf web services can be easily integrated with many popular scripting languages, e.g. Python or Perl. Most languages provide libraries for parsing XML files, so that's usually the kind of output you'd want to retrieve from the GeneProf web services.
Let's look at a simple example in Perl: We'll retrieve a bunch of gene expression values (RPKM) for an arbitrary gene and calculate the average gene expression per cell type. To make this as easy as possible we'll be using the REST::Client library.
Complete example script: Perl example script, Required Perl modules: XML::LibXML and REST::Client (along with their dependencies)
So, let's write some Perl scripts..
First, we want to set up a new REST web service client and configure it with the root URL of the GeneProf web services. We then use this client to perform the search and retrieve the data in XML format from the Get Gene Expression Values for a Gene service (we'll arbitrarily pick the human gene with the ID #2981 here as an example, but you may, of course, substitute other values as you please).
REST::Client provides a mechanism to automatically interpret the response as XML, so we can easily use XPath to get a list of all returned observations, which we can iterate one by one:
In order to calculate the average expression per cell type, we need to get (a) the expression value for each observation and (b) the cell type from the sample annotation of the observation. The expression value can be retrieved from the RPKM child node of the observation:
In order to get the Cell_Type sample annotation, we first get the sample child node and then retrieve the value from there:
Some observations might not have a cell type annotated, so we need to fill in missing values:
Assuming we had already defined a hash map groupAverages, we could now add up the totals and observation count for the current cell type (which we use later on to calculate the average):
After the main loop and the data retrieval work is done, we just need to print out the results:
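Since Python was mentioned as an alternative, here is how the same calculation might look there, using only the standard library. Treat it as a sketch: the element names (observation, RPKM, sample, Cell_Type) are assumptions based on the description above rather than a verified copy of the service's XML layout, and the function name is made up.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def average_rpkm_by_cell_type(xml_text, missing="unknown"):
    """Return {cell type: mean RPKM} for all observations in an XML response."""
    root = ET.fromstring(xml_text)
    totals = defaultdict(float)
    counts = defaultdict(int)
    for observation in root.iter("observation"):
        rpkm = observation.findtext("RPKM")
        if rpkm is None:
            continue  # no expression value recorded for this observation
        # Fill in a placeholder if the cell type annotation is absent or empty.
        cell_type = observation.findtext("sample/Cell_Type") or missing
        totals[cell_type] += float(rpkm)
        counts[cell_type] += 1
    return {cell_type: totals[cell_type] / counts[cell_type] for cell_type in totals}
```

Keeping running totals and counts per cell type means the response only needs to be traversed once, just like in the loop described above.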
In this advanced example, we'll be using the same techniques as in the simple example above to achieve a more complex task: We'll search GeneProf for all experiments with linked publications in Cell Stem Cell, retrieve metadata about these experiments, from which we'll find out what the main outputs of the analyses were and then we take all the mouse-specific gene-centric data from these datasets and merge them all together into one big table, which will be printed out as a tab-separated text file.
Complete example script: Perl example script, Required Perl modules: XML::LibXML and REST::Client (along with their dependencies)
So in the first step, we need to set up a new REST client and configure it with the root URL of the GeneProf web services. We then use this client to perform the search and retrieve the data in XML format from the Search Experiments service.
REST::Client provides a mechanism to automatically interpret the response as XML, so we can easily use XPath to get a list of all returned experiments, which we can iterate one by one:
Next, we need to get metadata about each experiment using the Metadata about a GeneProf Experiment service. We'll use the with-outputs=true parameter to include a list of all output datasets in the metadata.
To retrieve the metadata we just use another call with the REST client:
Using XPath we can now easily parse out all the outputs -- we can even directly filter out datasets with data type FEATURES based on the reference dataset pub_mm_ens58_ncbim37 (that's the mouse reference dataset!):
The last web service we'll be using is Data as XML. In the previous steps, we've only been retrieving metadata about experiments and datasets, whereas this service retrieves actual (sort of tabular) data, in this case gene expression values and regulatory data specific to genes.
Again, we'll use the previously set up REST client to retrieve the data:
The response will contain lots of data for genes organised in rows and columns. At the beginning, the XML file contains additional meta information about the individual columns. We'll parse both separately:
In order to merge all data together, we'll build up two large hashes-of-hashes, called allData and allColumns. Since all datasets are based on the same reference, the internal GeneProf IDs can be used as a merge criterion for the data rows. At the same time, the column identifiers should be unique unless the columns are shared between datasets, e.g. C_NAME defines gene names and is identical for all datasets, so it can be safely overridden:
At this point, the two variables allData and allColumns should point to two large hashes-of-hashes containing all the retrieved data merged into one, so all that's left to do is write out the data!
We'll write the data as tab-delimited plain text, so let's first write a header line:
And now the data:
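For comparison, the merge-and-export logic can be sketched in Python with nested dictionaries. This is a simplified illustration, not a translation of the Perl script: column metadata is reduced to a single label per column ID, the function names and the GeneProf_ID header are invented, and the input is assumed to have been parsed from the XML already.

```python
def merge_datasets(datasets):
    """Merge several parsed datasets into one table.

    datasets is an iterable of (columns, rows) pairs, where columns maps
    column IDs to labels and rows maps GeneProf gene IDs to
    {column ID: value} dictionaries. Shared columns such as C_NAME are
    simply overwritten, which is safe because they are identical.
    """
    all_columns, all_data = {}, {}
    for columns, rows in datasets:
        all_columns.update(columns)
        for gene_id, values in rows.items():
            all_data.setdefault(gene_id, {}).update(values)
    return all_columns, all_data


def to_tsv(all_columns, all_data, empty=""):
    """Render the merged table as tab-delimited text with a header line."""
    column_ids = sorted(all_columns)
    lines = ["\t".join(["GeneProf_ID"] + [all_columns[c] for c in column_ids])]
    for gene_id in sorted(all_data):
        row = all_data[gene_id]
        cells = [str(row.get(c, empty)) for c in column_ids]
        lines.append("\t".join([str(gene_id)] + cells))
    return "\n".join(lines) + "\n"
```

Genes that are missing from a dataset simply get empty cells in that dataset's columns.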
Yo, that's it. You can download the complete Perl example script here.
Lastly, let's look at an example using an object-oriented programming language, Java. We'll write a little program that looks up genes by name and then calculates the average gene expression value (RPKM) for each of these genes per cell type -- based on all the RNA-seq data in GeneProf. In this example, we'll only use basic Java classes, but in practice you might want to give any of the REST client libraries out there a try.
Complete example code: Java Code, Compiled Example Program: GeneProfWebServicesJavaClient
We'll not go through the entire code of the program here (the complete source code is available here), but just focus on the most important parts. So let's start by defining a new class that will act as our web service client:
We can now define some generic methods to retrieve data as XML or plain text from the GeneProf web services (the code's abbreviated a bit -- of course, you should always make sure to close streams and connections properly!):
Using those generic methods, we can define others that perform more specific operations using the GeneProf web services, like this function which looks up internal GeneProf gene IDs corresponding to a given gene name / symbol:
The remainder of the source code is a bit too bulky to post here, but please just refer to the (commented) source code available here.
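If you work in a scripting language instead, the generic retrieval helpers described above (one for plain text, one for XML) follow the same pattern. Here's a minimal Python sketch of the idea; the function names and the `opener` parameter are assumptions made for this illustration (the latter only exists so the helpers can be tried out without a network connection), and the `with` blocks take care of closing the connection.

```python
import urllib.request
import xml.etree.ElementTree as ET


def fetch_text(url, opener=urllib.request.urlopen, encoding="utf-8"):
    """Return the response body as text; the connection is closed on exit."""
    with opener(url) as response:
        return response.read().decode(encoding)


def fetch_xml(url, opener=urllib.request.urlopen):
    """Return the root element of an XML response."""
    with opener(url) as response:
        return ET.parse(response).getroot()
```

More specific operations, such as looking up gene IDs, can then be built on top of these two functions by passing in the URL of the relevant web service.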
Once you compile the code into a JAR file (or download it here), you can run the program like this: