JASPAR is a collection of transcription factor DNA-binding preferences, modeled as matrices. These can be converted into Position Weight Matrices (PWMs or PSSMs), used for scanning genomic sequences.
JASPAR is the only database with this scope where the data can be used with no restrictions (open-source). For a comprehensive review of models and how they can be used, please see the following reviews
The JASPAR database consists of smaller subsets of profiles known as collections. Each of these collections has a different goal, as described below. The main collection is known as JASPAR CORE and is the collection most scientists use.
Since JASPAR 4, all matrix models have versions. This is primarily to keep track of improvements, which can be anything from correcting typos to building a new model from new data. Version control works as follows: each ID consists of a stable ID and a version number, so that the whole ID is [stable ID].[version]. The stable ID follows a certain transcription factor, or another logical unit such as a dimer pair. For instance, the stable ID for the factor GATA1 is MA0035. However, the GATA1 matrix has been updated twice with new data, so there are currently three versions: MA0035.1, MA0035.2 and MA0035.3. By default, only the latest version is shown, but it is possible to list all versions of a matrix with the same stable ID.
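The versioning scheme above can be sketched in a few lines of Python. This is only an illustration of the [stable ID].[version] convention; the function names `split_matrix_id` and `latest_versions` are hypothetical and not part of any JASPAR API.

```python
def split_matrix_id(matrix_id):
    """Split '[stable ID].[version]' into (stable_id, version)."""
    stable_id, _, version = matrix_id.rpartition(".")
    if not stable_id or not version.isdigit():
        raise ValueError(f"not a versioned matrix ID: {matrix_id!r}")
    return stable_id, int(version)


def latest_versions(matrix_ids):
    """Return the highest-versioned full ID for each stable ID."""
    newest = {}
    for matrix_id in matrix_ids:
        stable_id, version = split_matrix_id(matrix_id)
        if version > newest.get(stable_id, -1):
            newest[stable_id] = version
    return {sid: f"{sid}.{ver}" for sid, ver in newest.items()}


# The three GATA1 versions mentioned in the text.
print(latest_versions(["MA0035.1", "MA0035.2", "MA0035.3"]))
```

Keeping the version as an integer rather than a string avoids mis-sorting once a model reaches version 10 or higher.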
The JASPAR CORE collection contains a curated, non-redundant set of TF binding profiles. All profiles are derived from published collections of experimentally defined transcription factor binding sites for multi-cellular eukaryotes. The TF binding profiles were historically determined from SELEX experiments or the collection of data from the experimentally determined binding regions of actual regulatory regions. More recent profiles are derived from high-throughput techniques such as ChIP-sequencing, Protein Binding Microarray, or High-Throughput SELEX. One of the central goals of the JASPAR CORE is to provide a single, “best” model for each transcription factor. This means that the database is non-redundant in the sense that there are not many models for the same factor (with some few exceptions motivated by the recognition of significantly different motifs).
The prime differences from similar resources (TRANSFAC, etc.) are open data access, non-redundancy and quality: JASPAR CORE is a smaller set that is non-redundant and curated.
JASPAR CORE is what most scientists mean when referring to JASPAR in manuscripts.
For convenience, JASPAR CORE is divided by larger groups of species. This distinction is mainly used in the web interface and, optionally, in the download section. Currently these larger taxonomic groups are: vertebrates, insects, nematodes, fungi, plants and urochordates.
What annotation data does each entry hold?
Entry | Note |
---|---|
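ID | A unique identifier for each model. CORE matrices always have MAnnnn IDs. |
Version | The version number of the model; together with the stable ID it forms the full ID, [stable ID].[version] |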
Name | The name of the transcription factor. As far as possible, the name is based on the standardized Entrez gene symbols. In the case the model describes a transcription factor hetero-dimer, two names are concatenated, such as RXR-VDR. In a few cases, different splice forms of the same gene have different binding specificity: in this case the splice form information is added to the name, based on the relevant literature. |
Class | Structural class of the transcription factor, based on the TFClass system |
Family | Structural sub-class of the transcription factor, based on the TFClass system |
Species | The species source for the sequences, in Latin. Linked to the NCBI Taxonomic browser. The actual database entries are the NCBI tax IDs; the Latin conversion is only in the web interface. |
Tax_group | Group of species, currently consisting of 4 larger groups: vertebrate, insect, plant, chordate |
Acc | A representative protein accession number in GenBank for the transcription factor. Human takes precedence if several exist. |
Type | Methodology used for matrix construction (see below) |
Pubmed ID | A link to the relevant publication reporting the sites used in the model building |
Pazar_tf_id | A link to the PAZAR database |
Comment | For some matrices, a curator comment is added |
When should it be used?
This is the main JASPAR collection and should be used when curated, non-redundant binding profile models for specific factors derived from experimental data are required.
The other JASPAR collections are collections of matrices that do not fit under the JASPAR CORE scope. Examples include splice forms, computationally derived patterns with no linked transcription factors, meta-models etc.
The JASPAR FAM database consists of 11 models describing shared binding properties of structural classes of transcription factors. These types of models can be called “familial profiles”, “consensus matrices” or metamodels. The models have two prime benefits: 1) Since many factors have similar target sequences, scans often produce multiple predictions at the same location that correspond to the same site; familial models reduce the complexity of such results. 2) The models can be used to classify newly derived profiles (or to predict which structural class the cognate transcription factor belongs to). The construction of the models is based on the JASPAR CORE collection and is described in detail in
Sandelin A, Wasserman WW. Constrained binding site diversity within families of transcription factors enhances pattern discovery bioinformatics. J Mol Biol. 2004 Apr 23;338(2):207-15. A recent, comprehensive study of familial binding profiles and associated methods is available in the PLoS paper by Mahony et al.
What data does each entry hold?
Entry | Note |
---|---|
ID | A unique identifier for each model. FAM matrices always have MFnnnn IDs |
Name | The name of the model. In this database, models were built by first partitioning JASPAR CORE matrices into structural classes; therefore, the names are essentially structural class names |
PubMed ID | The source article (always J Mol Biol. 2004 Apr 23;338(2):207-15) |
Included models | The JASPAR CORE matrices used to construct the model |
Type | Always “Metamodel” |
When should it be used?
When searching large genomic sequences with no prior knowledge. For classification of new user-supplied profiles.
The JASPAR PHYLOFACTS database consists of 174 profiles that were extracted from phylogenetically conserved gene upstream elements.
For a detailed description, see Xie et al., Systematic discovery of regulatory motifs in human promoters and 3’ UTRs by comparison of several mammals., Nature 434, 338-345 (2005) and supplementary material.
In short, the authors used the following strategy. Promoters (defined as the 4-kb region around the TSS) of human genes from the RefSeq database were aligned against the genomes of mouse, rat and dog. Every consensus sequence of length between 6 and 26, defined over an alphabet of 4 unique (A, C, G, T) and 7 degenerate (R, Y, K, M, S, W, N) nucleotides, was scanned over the alignments. A motif is regarded as conserved when it appears in the alignment both for human and for the other three mammalian species. The conservation rate p is defined as the number of times a motif is conserved divided by the number of times it occurs in the human sequence alone. This conservation rate is compared to the expected conservation rate p0, estimated from random motifs, which gives the motif conservation score (MCS). Only motifs with an MCS > 6 were retained, resulting in a list of 174 highly conserved motifs (see supplementary Table S2 of Xie et al.). The count matrices for these 174 motifs were extracted from the downloaded alignments. They were further annotated according to their resemblance to TRANSFAC and JASPAR CORE motifs. For TRANSFAC, the annotation of Xie et al. was used. For comparison with the JASPAR CORE matrices, the Pearson Correlation Coefficient (PCC) was used to define matrix similarity. All PHYLOFACTS matrices were scanned against the JASPAR CORE matrices, and matrices were regarded as similar when PCC > 0.8. When multiple hits were found, only the one with the highest PCC was retained.
What data does each entry hold?
Entry | Note |
---|---|
ID | A unique identifier for each model. PHYLOFACTS matrices always have PFnnnn IDs |
Name | The name of the model. In this database, models are based on over-represented words, which are unique. The name is simply the consensus sequence. |
Jaspar | The JASPAR CORE motif that has the best similarity score when compared to this model. Only hits with a similarity score over 0.8 are considered. |
Transfac | The TRANSFAC (public version) motif that has the best similarity score when compared to this model. Only hits with a similarity score over 0.8 are considered. |
Sysgroup | Group of species. Always “mammals” |
Type | Always “phylogenetic” |
PubMed ID | The source article (always Nature 434, 338-345 (2005)) |
When should it be used?
The JASPAR PHYLOFACTS matrices are a mix of motifs corresponding to motifs for known and undefined transcription factors. They are useful when one expects that other factors might determine promoter characteristics, such as structural aspects and tissue specificity. They are highly complementary to the JASPAR CORE matrices, so are best used in combination with this matrix set.
The deluge of novel data presented recently pertaining to transcription start sites (reviewed in (13,14)) motivates computational studies of core promoters. The JASPAR_POLII sub-database holds 13 known DNA patterns linked to RNA polymerase II core promoters, such as the Inr and BRE elements, each based on experimental evidence: each model must be constructed using 5 or more experimentally verified sites. An important difference to the transcription factor profiles in JASPAR CORE is that patterns here do not necessarily have a specified protein interactor (see (15) for a review on core promoter patterns). When possible, profiles were extended by two nucleotides beyond the core motif. We consistently report positions relative to the TSS as the positions of the 5’ and 3’ edges of the matrix.
What data does each entry hold?
Entry | Note |
---|---|
ID | a unique identifier for each model. POLII matrices always have POLnnn IDs |
Name | The reported name of the pattern (not necessarily the binding protein, if this is known) |
Species | The species source for the sequences, in Latin. “-” generally signifies that several species were used in the model construction |
PubMed ID | A link to the relevant publication reporting the sites used in the model building |
Start relative to TSS | Reported bias (if any) on position relative to the dominant transcription start site in the promoter. This is counted from the 5’ end of the pattern (the left side). As we have added some flanking nucleotides, these are sometimes not the exact numbers shown in the source publications. |
End relative to TSS | See above. Distance is counted from the 3’ end of the matrix (the right side). |
When should it be used?
When analyzing properties of core promoters.
Highly conserved non-coding elements are a distinctive feature of metazoan genomes. Many of them can be shown to act as long-range enhancers that drive expression of genes that are themselves regulators of core aspects of metazoan development and differentiation. Since they act as regulatory inputs, attempts at deciphering the regulatory content of these elements have started. JASPAR CNE is a collection of 233 matrix profiles derived by Xie et al based on clustering of overrepresented motifs from human conserved non-coding elements. While the biochemical and biological role of most of these patterns is still unknown, Xie et al. have shown that the most abundant ones correspond to known DNA-binding proteins, most notably insulator-binding protein CTCF. These matrix profiles will be useful for further characterization of regulatory inputs in long-range developmental gene regulation in vertebrates.
What data does each entry hold?
Entry | Note |
---|---|
ID | A unique identifier for each model. CNE matrices always have CNnnnn IDs |
Name | The name of the model. |
Consensus sequence | the consensus sequence of the motif - important as it is the basis for clustering over-represented sites in this study |
PubMed ID | The source article (always Xie et al) |
When should it be used?
When analyzing properties of potential enhancers.
This small collection contains matrix profiles of human canonical and non-canonical splice sites, as matching donor:acceptor pairs. It currently contains only 6 highly reliable profiles, derived from the human genome by Chong et al. In the future, we shall include additional eukaryotic species, as well as new models for exonic splicing enhancers (ESE) and inhibitors (ESI).
What data does each entry hold?
Entry | Note |
---|---|
ID | a unique identifier for each model. SPLICE matrices always have SPnnnn IDs |
Name | The name of the model. |
PubMed ID | The source article (always Chong et al ) |
When should it be used?
When analyzing splice sites and alternative splicing
All the PBM collections are built by using new in-vitro techniques, based on k-mer microarrays. PBM matrix models have their own database which is specialized for the data: UniPROBE.
The PBM collection is the set derived by Badis et al. from binding preferences of 104 mouse transcription factors. One profile (IRC900814) was excluded because the transcription factor could not be identified.
What data does each entry hold?
Entry | Note |
---|---|
ID | A unique identifier for each model. PBM matrices always have PBnnnn IDs |
Name | The name of the model. |
Class | Structural class of the transcription factor, based on the TFClass system |
Family | Structural sub-class of the transcription factor, based on the TFClass system |
Species | The species source for the sequences, in Latin. Linked to the NCBI Taxonomic browser. The actual database entries are the NCBI tax IDs – the latin conversion is only in the web interface. |
Tax_group | Group of species, currently consisting of 4 larger groups: vertebrate, insect, plant, chordate |
PubMed ID | A link to the relevant publication reporting the sites used in the model building |
Type | Methodology used for matrix construction |
Comment | For some matrices, a curator comment is added |
When should it be used?
Where it is important that each matrix was derived using the same protocol
The PBM HOMEO collection is the set derived by Berger et al., comprising 176 profiles of mouse homeodomains.
What data does each entry hold?
Entry | Note |
---|---|
ID | A unique identifier for each model. PBM HOMEO matrices always have PHnnnn IDs |
Name | The name of the model. |
Class | Structural class of the transcription factor, based on the TFClass system |
Family | Structural sub-class of the transcription factor, based on the TFClass system |
Species | The species source for the sequences, in Latin. Linked to the NCBI Taxonomic browser. The actual database entries are the NCBI tax IDs – the latin conversion is only in the web interface. |
Tax_group | Group of species, currently consisting of 4 larger groups: vertebrate, insect, plant, chordate |
PubMed ID | A link to the relevant publication reporting the sites used in the model building |
Type | Methodology used for matrix construction |
Comment | For some matrices, a curator comment is added |
When should it be used?
Where it is important that each matrix was derived using the same protocol, focused on homeobox factors
The PBM HLH collection is the set derived by Grove et al. It holds 19 C. elegans bHLH transcription factor models.
What data does each entry hold?
Entry | Note |
---|---|
ID | A unique identifier for each model. PBM HLH matrices always have PLnnnn IDs |
Name | The name of the model. |
Class | Structural class of the transcription factor, based on the TFClass system |
Family | Structural sub-class of the transcription factor, based on the TFClass system |
Species | The species source for the sequences, in Latin. Linked to the NCBI Taxonomic browser. The actual database entries are the NCBI tax IDs – the latin conversion is only in the web interface. |
Tax_group | Group of species, currently consisting of 4 larger groups: vertebrate, insect, plant, chordate |
PubMed ID | A link to the relevant publication reporting the sites used in the model building |
Type | Methodology used for matrix construction |
Comment | For some matrices, a curator comment is added |
When should it be used?
Where it is important that each matrix was derived using the same protocol, focused on bHLH factors
The JASPAR database can now be reached remotely through a new Web Service interface. Current functionality includes retrieval of profiles by name, by identifier and by searching profile annotations. Profiles can be retrieved as position frequency matrices, position weight matrices or information content matrices. The purpose of providing an external application programming interface (API) is to simplify the use of JASPAR in distributed applications and in scientific workflows created in workflow editors like Triana, BPEL, or Taverna. Other benefits include platform- and language-independent access, as well as constantly up-to-date access to the database over time. The API is implemented as a WS-I compliant Web service, identical to the technology used for the services made available through the EMBRACE Network of Excellence, and the Web service technology chosen by the European Bioinformatics Institute (EBI). The WSDL describing this service can be found here. Further information about the Web service is available in the WSDL file, including example clients in Java and Python.
Please check this Tools page.
JASPAR is downloadable in two different formats, from the DOWNLOAD link in the start page:
flat files resulting from the TFBS::DB::MatrixDir function in the Perl API, which are easily parsable
flat files corresponding to the MySQL tables used internally in the database. The create table statements are here
In the DOWNLOAD directory, most matrix collections have a SITE subdirectory, which for each model lists all sites used for the model construction as a fasta file. The alignments are implicit – the used sub-parts of sequences are in capitals. Note that in the majority of cases, this is an interpretation – we use pattern finders to find the most likely alignment, but this might not always be the most correct. This is the principal reason we make these collections available – users can make their own models based on the raw files.
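As an illustration of how the implicit alignments can be used, the sketch below extracts the upper-case sub-part of each FASTA record and tallies a count matrix. It assumes each record contains one contiguous upper-case site and that all sites have the same length; the example records are invented, and the function names are hypothetical rather than part of the JASPAR tools.

```python
import re


def aligned_sites(fasta_text):
    """Yield the upper-case sub-part of each sequence in a FASTA string."""
    sequence = []
    # A trailing ">" sentinel flushes the last record.
    for line in fasta_text.splitlines() + [">"]:
        if line.startswith(">"):
            if sequence:
                match = re.search(r"[ACGT]+", "".join(sequence))
                if match:
                    yield match.group(0)
            sequence = []
        elif line.strip():
            sequence.append(line.strip())


def count_matrix(sites):
    """Count each base per position; rows are keyed by A, C, G, T."""
    sites = list(sites)
    width = len(sites[0])
    if any(len(site) != width for site in sites):
        raise ValueError("sites differ in length")
    return {base: [sum(site[i] == base for site in sites) for i in range(width)]
            for base in "ACGT"}


# Invented example records; lower-case letters are flanking sequence.
example = """>site1
acgTGACtt
>site2
ttTGTCaag
>site3
gTGACccta
"""
print(count_matrix(aligned_sites(example)))
```

Users who disagree with the curated alignment can realign the full sequences with their own pattern finder and rebuild the counts the same way.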
The start page has four major “tabs” that determine how you interact with the selected JASPAR database. First, use the “SELECT A JASPAR SUB-DATABASE” tab (this will also give a brief summary of the respective database). After this, you can use one of the following:
BROWSE tab: The whole selected database will be shown (see the below for information about the browse page), sorted by the selected attribute (default attribute is ID). As many users are interested in the JASPAR CORE collection, you can click the “Browse the JASPAR CORE collection right away” just under the JASPAR image.
SEARCH BY tab: To select subsets of profiles using user-set criteria, use the search-by fields.
Multiple inputs are acceptable, if submitted with commas in between ‘,’ (this will in effect be interpreted as an OR statement)
The criteria will be interpreted from top to bottom by using the boolean statement at each row:
an AND statement will perform an intersect between two query results
an OR statement will perform a union of two query results
a NOT statement will filter out results from the first query
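The top-to-bottom evaluation described above maps naturally onto set operations. The following sketch is not the web interface's actual code; the profile records, field names and the `combine` function are hypothetical and only mirror the AND/OR/NOT semantics stated in the text.

```python
def match_field(profiles, field, query):
    """IDs of profiles whose field equals any comma-separated value (an OR)."""
    wanted = {value.strip().lower() for value in query.split(",")}
    return {pid for pid, annotation in profiles.items()
            if annotation.get(field, "").lower() in wanted}


def combine(profiles, rows):
    """Evaluate rows top to bottom; each row is (operator, field, query).

    The operator of the first row is ignored because there is nothing
    to combine it with yet.
    """
    result = None
    for operator, field, query in rows:
        hits = match_field(profiles, field, query)
        if result is None:
            result = hits
        elif operator == "AND":
            result = result & hits   # intersect
        elif operator == "OR":
            result = result | hits   # union
        elif operator == "NOT":
            result = result - hits   # filter out
        else:
            raise ValueError(f"unknown operator: {operator!r}")
    return result if result is not None else set()


# Hypothetical annotation records, for illustration only.
profiles = {
    "P1": {"class": "Zinc-coordinating", "tax_group": "vertebrates"},
    "P2": {"class": "Zinc-coordinating", "tax_group": "insects"},
    "P3": {"class": "Helix-Turn-Helix", "tax_group": "vertebrates"},
    "P4": {"class": "Helix-Turn-Helix", "tax_group": "plants"},
}
rows = [
    (None, "tax_group", "vertebrates, insects"),
    ("AND", "class", "Zinc-coordinating"),
    ("NOT", "tax_group", "insects"),
]
print(combine(profiles, rows))
```

Because rows are applied in order, moving a NOT row earlier can change the result, which matches the "first query" wording above.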
ALIGN to custom matrix tab: In some cases, it is beneficial to assess similarity to input data (as with using BLAST for sequence data comparison when using Genbank). The input profile can consist of actual counts or be normalized (each column sum =1). Log-odds matrices should be avoided. For an example of the input format, press the “fill in an example matrix”. The A[ ] etc characters are optional, so:
A [13 13 3 1 54 1 1 1 0 3 2 5 ]
C [13 39 5 53 0 1 50 1 0 37 0 17 ]
G [17 2 37 0 0 52 3 0 53 8 37 12 ]
T [11 0 9 0 0 0 0 52 1 6 15 20 ]
is equivalent to
13 13 3 1 54 1 1 1 0 3 2 5
13 39 5 53 0 1 50 1 0 37 0 17
17 2 37 0 0 52 3 0 53 8 37 12
11 0 9 0 0 0 0 52 1 6 15 20
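A small parser makes the equivalence of the two input forms concrete. This is a sketch that assumes four rows in A, C, G, T order and simply ignores the optional labels and brackets; `parse_matrix` is a hypothetical helper, not the parser used by the web site.

```python
import re


def parse_matrix(text):
    """Parse four rows (A, C, G, T order) of counts or frequencies."""
    rows = []
    for line in text.strip().splitlines():
        # Row labels and brackets contain no digits, so they are skipped.
        numbers = re.findall(r"\d+\.?\d*|\.\d+", line)
        if numbers:
            rows.append([float(n) for n in numbers])
    if len(rows) != 4 or len({len(row) for row in rows}) != 1:
        raise ValueError("expected four rows of equal length")
    return dict(zip("ACGT", rows))


labelled = """
A [13 13 3 1 54 1 1 1 0 3 2 5 ]
C [13 39 5 53 0 1 50 1 0 37 0 17 ]
G [17 2 37 0 0 52 3 0 53 8 37 12 ]
T [11 0 9 0 0 0 0 52 1 6 15 20 ]
"""
plain = """
13 13 3 1 54 1 1 1 0 3 2 5
13 39 5 53 0 1 50 1 0 37 0 17
17 2 37 0 0 52 3 0 53 8 37 12
11 0 9 0 0 0 0 52 1 6 15 20
"""
print(parse_matrix(labelled) == parse_matrix(plain))
```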
All profiles in the selected database will be compared to the input profile, using a modified Needleman-Wunsch algorithm described in
Sandelin A, Hoglund A, Lenhard B, Wasserman WW. Integrated analysis of yeast regulatory sequences for biologically linked clusters of genes. Funct Integr Genomics. 2003 Jul;3(3):125-34 and sorted by raw comparison score (for reference, the maximum score is 2*the width of the smallest matrix in the compared pair). Both the score and fraction of potential maximal score is reported.
This page presents a list of selected profiles. At the top, it is possible to select a subset of the matrices, much like in the start page described above.
Note that the columns of this page will differ between databases, but ID and Name attributes will always be shown, as well as a sequence logo for each pattern. For detailed information regarding any profile model, press the view link – this will give a pop-up window with detailed matrix information (see below).
At the right-hand side, a number of functional analyses can be made with selected profiles (see the selection field in the left-most column). Currently, it is possible to i) cluster selected matrices into familial binding profiles using the STAMP tool, ii) create permuted matrices using column shuffling of the selected matrices, iii) create random matrices using a Bayesian sampling procedure, iv) perform basic sequence analysis (scanning an input sequence with matrices).
These features are described in detail below, in section d) : extended functionality.
These pop-up pages are results of clicking on the logos of chosen models in the browse page, and show detailed information about the model: both annotation data (which is different in different databases – see respective database entry above), and a sequence logo, a count matrix and hits/bp statistics:
Sequence logos are graphical representations of the matrix model, based on information content. The information content of a matrix column ranges from 0 (no base preference) to 2 bits (only one base used). A sequence logo is basically a barplot showing the total information content in each position, where the bar is replaced by stacked letters (A, C, G, T), which are sized and sorted relative to their occurrence. See Schneider et al. for a more comprehensive description.
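The per-column calculation behind a logo can be sketched as follows. This minimal version uses the plain Shannon formulation (2 bits minus the column entropy) and omits the small-sample correction that some logo programs apply, so it is an assumption about the general technique rather than a description of the exact JASPAR rendering code.

```python
from math import log2


def column_information(counts):
    """Information content in bits of one column of A, C, G, T counts."""
    total = sum(counts)
    return 2.0 + sum((c / total) * log2(c / total) for c in counts if c > 0)


def letter_heights(counts):
    """Letter heights for one logo column, smallest (bottom of stack) first."""
    total = sum(counts)
    information = column_information(counts)
    heights = [(base, information * c / total)
               for base, c in zip("ACGT", counts) if c > 0]
    return sorted(heights, key=lambda item: item[1])


print(column_information([54, 0, 0, 0]))  # only one base used
print(column_information([5, 5, 5, 5]))   # no base preference
print(letter_heights([13, 39, 2, 0]))
```

The letter heights in a column always add up to that column's information content, which is why strongly conserved positions stand out in the logo.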
The “Make SVG button” gives an SVG version logo suitable for publication images (SVG is a vector format that does not have the pixel edges of the .png’s used in browsers – it can be read by many drawing programs and most web-browsers with proper plugins)
The underlying model showing the DNA pattern. In most databases, the cell numbers indicate the number of sequences having base x in column y. These matrices can be used for a number of different analyses, including site searching, if suitably converted, See Wasserman and Sandelin for a review.
The reverse complement button makes a reverse complement version of the matrix (as DNA is two-stranded, the two models are functionally equivalent). If the matrix is reverse-complemented, the logo will change accordingly.
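Reverse-complementing a matrix amounts to swapping the A/T and C/G rows and reversing the column order. A minimal sketch, assuming the matrix is held as a dictionary of rows keyed by base (a representation chosen here for illustration):

```python
def reverse_complement(matrix):
    """Swap complementary rows and reverse the column order."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return {complement[base]: row[::-1] for base, row in matrix.items()}


matrix = {"A": [1, 2], "C": [3, 4], "G": [5, 6], "T": [7, 8]}
flipped = reverse_complement(matrix)
print(flipped["A"], flipped["C"], flipped["G"], flipped["T"])
```

Applying the operation twice returns the original matrix, which is a convenient sanity check.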
For some transcription factors, there are multiple models; usually this is due to new data becoming available. Clicking on this link gives a listing of all the models for the factor in question.
In order to visualize the binding properties of each JASPAR matrix we calculate the average number of hits per 1000 base pairs on three distinctly different sequence sets. We do this by converting the count matrix to a log-odds matrix using a uniform background model over the four bases. For a series of threshold values ([1, 0.95, … , 0.65, 0.60]) of the scoring range of the log-odds matrix we count the number of hits equal to or greater than the current threshold. We count the number of hits treating each sequence set as one string and then convert this number to a mean value per 1000 base pairs on both strands, that is, we search both the leading strand and the reverse complement. All means are for practical purposes rounded to one decimal.
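The procedure above can be sketched as follows. Several details are assumptions not stated in the text: a pseudocount is added to avoid taking the logarithm of zero, the relative threshold is interpreted as a fraction of the range between the minimum and maximum achievable score, and the hit count over both strands is divided by the sequence length. The function names are hypothetical.

```python
from math import log2

BASES = "ACGT"


def log_odds(counts, pseudocount=1.0):
    """Convert {base: [counts]} to log-odds columns, uniform background 0.25."""
    pwm = []
    for i in range(len(counts["A"])):
        total = sum(counts[b][i] for b in BASES) + pseudocount
        pwm.append({b: log2((counts[b][i] + pseudocount / 4) / total / 0.25)
                    for b in BASES})
    return pwm


def revcomp(sequence):
    return sequence.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def count_hits(pwm, sequence, threshold):
    """Count windows on both strands scoring at or above the relative threshold."""
    low = sum(min(col.values()) for col in pwm)
    high = sum(max(col.values()) for col in pwm)
    cutoff = low + threshold * (high - low)
    width, hits = len(pwm), 0
    for strand in (sequence, revcomp(sequence)):
        for start in range(len(strand) - width + 1):
            window = strand[start:start + width]
            if any(base not in BASES for base in window):
                continue  # skip windows with ambiguous bases
            score = sum(col[base] for col, base in zip(pwm, window))
            if score >= cutoff - 1e-9:  # tolerance for float rounding
                hits += 1
    return hits


def hits_per_kb(pwm, sequence, threshold):
    """Mean number of hits per 1000 bp, rounded to one decimal."""
    return round(1000 * count_hits(pwm, sequence, threshold) / len(sequence), 1)


toy_counts = {"A": [10, 0, 0], "C": [0, 10, 0], "G": [0, 0, 10], "T": [0, 0, 0]}
toy_pwm = log_odds(toy_counts)
for threshold in (1.0, 0.8, 0.6):
    print(threshold, hits_per_kb(toy_pwm, "TTACGAA", threshold))
```

Lowering the threshold can only keep or increase the number of hits, which is the trade-off between sensitivity and specificity that the hits/bp statistics are meant to visualize.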
We use three distinct sequence sets, known promoters, CpG islands and random DNA respectively. The known promoters consist of all plant, arthropod and vertebrate promoters in the -1000 to +100 region from the EPD database [ref 1]. This sequence set totals 4735 promoters concatenated into one string. The CpG sequence set consists of all regions from the UCSC genome browser (hg18) with an epigenetic score above 0.5 (See Bock et al). This totals 8,559,418 nucleotides. Finally the random DNA sequences are randomly picked 1000 base pair windows from hg18 across all chromosomes and totals 8,000,000 nucleotides. The randomly picked DNA is not repeat-masked or in any way filtered.
Using a subset of profiles, a submitted sequence can be analyzed. Sensitivity and specificity will be affected by the relative score threshold, which is 80% by default (see Wasserman and Sandelin for a review on scoring of matrices to sequences). This is the most basic form of sequence analysis: dedicated systems such as ConSite are preferable for anything more than a casual analysis.
The CLUSTER button provides the user with a means of investigating the relationship between the various matrices. This functionality is provided by the STAMP tool available as a webservice at http://www.benoslab.pitt.edu/stamp/.
Hierarchical clustering is performed on a selected set of matrices using the UPGMA algorithm with a Pearson Correlation Coefficient distance metric. Then the optimal number of clusters is selected using a log variant of the Calinski and Harabasz statistic (See this link for details). Finally the clusters are partitioned and a familial binding profile is created for each cluster using an iterative refinement, multiple alignment method. Further details can be found in the STAMP manuscript.
This option simply shuffles the columns in matrices. This can either be done by just shuffling columns within each selected matrix, or by shuffling columns among all selected matrices.
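Both shuffling modes can be illustrated with a short sketch. It assumes matrices are stored as dictionaries of rows keyed by base and, for the pooled mode, that each output matrix keeps the width of the corresponding input; the function names are hypothetical.

```python
import random


def to_columns(matrix):
    """Return the matrix as a list of (A, C, G, T) column tuples."""
    return list(zip(*(matrix[base] for base in "ACGT")))


def from_columns(columns):
    return {base: [column[i] for column in columns]
            for i, base in enumerate("ACGT")}


def shuffle_within(matrices, rng=random):
    """Permute the columns of each matrix independently."""
    shuffled = []
    for matrix in matrices:
        columns = to_columns(matrix)
        rng.shuffle(columns)
        shuffled.append(from_columns(columns))
    return shuffled


def shuffle_among(matrices, rng=random):
    """Pool the columns of all matrices, permute them, and redistribute."""
    pool = [column for matrix in matrices for column in to_columns(matrix)]
    rng.shuffle(pool)
    shuffled, start = [], 0
    for matrix in matrices:
        width = len(matrix["A"])
        shuffled.append(from_columns(pool[start:start + width]))
        start += width
    return shuffled


first = {"A": [9, 0, 1], "C": [0, 8, 1], "G": [1, 1, 7], "T": [0, 1, 1]}
second = {"A": [5, 0], "C": [0, 5], "G": [0, 0], "T": [0, 0]}
print(shuffle_within([first, second], random.Random(1)))
print(shuffle_among([first, second], random.Random(1)))
```

Shuffling whole columns preserves each column's base composition and information content while destroying the order of positions.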
This feature of the database enables the users to generate random Position Frequency Matrices (PFMs) from selected profiles.
We assume that each column in the profile is independent and described by a mixture of Dirichlet multinomials, in which the letters are drawn from a multinomial and the multinomial parameters are drawn from a mixture of Dirichlets. Within this model each column has its own set of multinomial parameters, but the higher level parameters (those of the mixture prior) are assumed to be common to all JASPAR matrices. We can therefore use a maximum likelihood approach to learn these from the observed column counts of all JASPAR matrices. The maximum likelihood approach automatically ensures that each matrix receives a weight relative to the number of counts it contains.
Drawing samples from the prior distribution will generate PWMs with the same statistical properties as the Jaspar matrices as a whole. PWMs with statistical properties like those of the selected profiles can be obtained by drawing from a posterior distribution which is proportional to the prior times a multinomial likelihood term with counts taken from one of the columns of the selected profiles.
Each 4-dimensional column is sampled by the following three-step procedure: 1. draw the mixture component according to the distribution of mixing proportions, 2. draw an input column randomly from the concatenated selected profiles and 3. draw the probability vector over nucleotides from a 4-dimensional Dirichlet distribution. The parameter vector alpha of the Dirichlet is equal to the sum of the counts (of the drawn input column) and the parameters of the Dirichlet prior (of the drawn component).
Draws from a Dirichlet can be obtained in the following way from Gamma distributed samples:
(X1,X2,X3,X4) = (Y1/V,Y2/V,Y3/V,Y4/V) ~ Dir(α1,α2,α3,α4)
where the Yi ~ Gamma(shape = αi, scale = 1) are drawn independently and V = sum(Yi) ~ Gamma(shape = sum(αi), scale = 1).
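The three-step procedure and the Gamma construction can be sketched with the Python standard library. The mixing proportions and prior parameters used in the example are invented placeholders; the values actually learned from the JASPAR matrices are not given here, and the function names are hypothetical.

```python
import random


def sample_dirichlet(alpha, rng=random):
    """Draw from Dir(alpha) by normalizing independent Gamma(alpha_i, 1) draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [draw / total for draw in draws]


def sample_column(input_columns, mixing_proportions, component_alphas, rng=random):
    """Sample one probability vector over A, C, G, T."""
    # 1. Draw the mixture component according to the mixing proportions.
    component = rng.choices(range(len(mixing_proportions)),
                            weights=mixing_proportions)[0]
    # 2. Draw an input column from the concatenated selected profiles.
    counts = rng.choice(input_columns)
    # 3. Posterior Dirichlet parameters: counts plus the component's prior.
    alpha = [c + a for c, a in zip(counts, component_alphas[component])]
    return sample_dirichlet(alpha, rng)


# Placeholder prior (NOT the learned JASPAR parameters), for illustration only.
mixing = [0.6, 0.4]
alphas = [[0.3, 0.3, 0.3, 0.3], [2.0, 2.0, 2.0, 2.0]]
columns = [(13, 13, 17, 11), (13, 39, 2, 0), (54, 0, 0, 0)]
rng = random.Random(42)
random_matrix = [sample_column(columns, mixing, alphas, rng) for _ in range(6)]
for column in random_matrix:
    print([round(p, 3) for p in column])
```

All prior parameters must be strictly positive, since `gammavariate` requires a positive shape even when an observed count is zero.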
For both permuted and randomly generated matrices, you have the choice between three different output formats:
Raw - Each matrix is separated by a fasta like header starting with the > symbol and then a matrix ID. The count for each base (ACGT) is specified on its own space separated line where each element corresponds to one column. The order of the lines for the bases is A,C,G and finally T.
13 13 3 1 54 1 1 1 0 3 2 5
13 39 5 53 0 1 50 1 0 37 0 17
17 2 37 0 0 52 3 0 53 8 37 12
11 0 9 0 0 0 0 52 1 6 15 20
JASPAR - This is similar to the raw format, having an identical header. The lines for each base however starts with a label for the nucleotide (A,C,G or T) and then the columns follow enclosed in brackets: [].
A [13 13 3 1 54 1 1 1 0 3 2 5 ]
C [13 39 5 53 0 1 50 1 0 37 0 17 ]
G [17 2 37 0 0 52 3 0 53 8 37 12 ]
T [11 0 9 0 0 0 0 52 1 6 15 20 ]
TRANSFAC - This is a TRANSFAC-like format having a header starting with “DE” then the matrix ID, the matrix name and the matrix class. The data itself is transposed as compared to the other formats, meaning that each line corresponds to a column in the matrix. The column lines start with a number denoting the column index (counting from 0). After that follow tab-separated counts for each base in that column in the order: A, C, G and T. After the lines with the counts follows a final line containing the string: “XX”.
DE MA0048 NHLH1 bHLH
00 13 13 17 11
01 13 39 2 0
02 3 5 37 9
03 1 53 0 0
04 54 0 0 0
05 1 1 52 0
06 1 50 3 0
07 1 1 0 52
08 0 0 53 1
09 3 37 8 6
10 2 0 37 15
11 5 17 12 20
XX
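The three formats differ only in layout, as the following writer sketch shows. It follows the descriptions above; the single-space separators in the header lines and the two-digit zero padding of the TRANSFAC column index are assumptions read off the examples, and the function names are hypothetical.

```python
def to_raw(matrix_id, matrix):
    """Header line, then one space-separated line per base (A, C, G, T)."""
    lines = [f">{matrix_id}"]
    lines += [" ".join(str(n) for n in matrix[base]) for base in "ACGT"]
    return "\n".join(lines)


def to_jaspar(matrix_id, matrix):
    """Same header; each row is labelled and enclosed in brackets."""
    lines = [f">{matrix_id}"]
    lines += [f"{base} [{' '.join(str(n) for n in matrix[base])} ]"
              for base in "ACGT"]
    return "\n".join(lines)


def to_transfac(matrix_id, name, tf_class, matrix):
    """Transposed layout: one tab-separated line per column, closed by XX."""
    lines = [f"DE {matrix_id} {name} {tf_class}"]
    columns = zip(*(matrix[base] for base in "ACGT"))
    for index, column in enumerate(columns):
        lines.append(f"{index:02d}\t" + "\t".join(str(n) for n in column))
    lines.append("XX")
    return "\n".join(lines)


nhlh1 = {
    "A": [13, 13, 3, 1, 54, 1, 1, 1, 0, 3, 2, 5],
    "C": [13, 39, 5, 53, 0, 1, 50, 1, 0, 37, 0, 17],
    "G": [17, 2, 37, 0, 0, 52, 3, 0, 53, 8, 37, 12],
    "T": [11, 0, 9, 0, 0, 0, 0, 52, 1, 6, 15, 20],
}
print(to_transfac("MA0048", "NHLH1", "bHLH", nhlh1))
```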
It depends on what you have used it for. If you simply want to acknowledge that you used the latest version, use
Mathelier, A., Fornes, O., Arenillas, D.J., Chen, C., Denay, G., Lee, J., Shi, W., Shyr, C., Tan, G., Worsley-Hunt, R., et al. (2015). JASPAR 2016: a major expansion and update of the open-access database of transcription factor binding profiles. Nucleic Acids Res. 2016 44: D110-D115.
Otherwise:
This is due to historical reasons. JASPAR CORE was originally built in order to create familial binding profiles for as many structural classes of transcription factor classes as possible. In some experimental literature, only matrices and not sequences are available. For this project, we were forced to include some matrices to gain coverage of certain binding site classes. For recent additions, it is a requirement to have the sequences available.
There are two principal explanations. The most likely is that we were not aware of your work: please let us know! The other possible reason is that the publication did not live up to the demands of the curators. As we have human curation of all JASPAR CORE matrices, this is to some degree an arbitrary call – we are happy to discuss it with you.
We appreciate that other services want to link to JASPAR. However, if you are using the CPU-intensive services (matrix comparison, randomization or clustering), please ask the maintainers (see contact information below) before you do this; otherwise your server might be rejected without warning. In that case, we strongly suggest setting up a local JASPAR database, as the database and resources are freely available.
JASPAR was originally the name of a master student project algorithm for comparing matrix profiles, an obscure tribute to an even more obscure dialog from the Black Adder episode “The Black Seal” between the Seven Most Evil Men in the Kingdom:
…and with all haste, we will meet at Old Jaspar’s tavern
How is old Jaspar these days?
Dead.
How?
I killed him.
[Loud cheer].
We appreciate feedback – criticism as well as suggestions for new content. Development and supervision of the JASPAR project is coordinated by Albin Sandelin, Boris Lenhard and Wyeth Wasserman.