MCScanX No Matches: Gene ID Mismatch in A. thaliana FASTA & GFF? #83

@Fantacooke2

Hi everyone,

I'm running BLASTP (all_vs_all) as part of my analysis. I merged all FASTA files from different species into a single file (merge.fast) and used it both as a query and as the database. The BLASTP execution itself went fine.

However, I noticed a discrepancy in gene naming for Arabidopsis thaliana between the protein FASTA file and the annotation GFF file (both downloaded from NCBI). A subset of each file is provided below for reference.

FASTA File Sample:

>NP_001030613.1 hypothetical protein 1 [Arabidopsis thaliana]
...
>NP_001030614.1 Phosphoglycerate mutase-like family protein [Arabidopsis thaliana]
...
>NP_001030615.2 ECA1-like gametogenesis related family protein [Arabidopsis thaliana]
...

GFF File Sample:

NC_003070.9	RefSeq	gene	3631	5899	.	+	.	ID=gene-AT1G01010;Dbxref=Araport:AT1G01010,TAIR:AT1G01010,GeneID:839580
NC_003070.9	RefSeq	mRNA	3631	5899	.	+	.	ID=rna-NM_099983.2;Parent=gene-AT1G01010;Dbxref=Araport:AT1G01010,GenBank:NM_099983.2
...

I also converted the GFF file into a 4-column format as required for MCScanX:

at003070.9	AT1G01010	3631	5899
at003070.9	AT1G01020	6788	9130
...
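A minimal sketch of such a conversion, reproducing the mapping visible in the two samples above (`NC_003070.9` becomes `at003070.9`, `gene-AT1G01010` becomes `AT1G01010`). The function name `gff_to_mcscanx` and the `prefix` parameter are hypothetical, and it assumes every gene row has an `ID=gene-...` attribute like the one shown:

```python
def strip_prefix(text, prefix):
    """Return text without a leading prefix, if present."""
    return text[len(prefix):] if text.startswith(prefix) else text


def gff_to_mcscanx(gff_lines, prefix="at"):
    """Turn GFF3 'gene' rows into 4-column lines: chrom, gene, start, end.

    Assumes gene IDs look like 'ID=gene-AT1G01010' and sequence names
    look like 'NC_003070.9', as in the sample above.
    """
    out = []
    for line in gff_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip headers and blank lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9 or cols[2] != "gene":
            continue  # keep only gene features
        attrs = dict(kv.split("=", 1) for kv in cols[8].split(";") if "=" in kv)
        gene_id = strip_prefix(attrs.get("ID", ""), "gene-")
        if not gene_id:
            continue
        chrom = prefix + strip_prefix(cols[0], "NC_")
        out.append("\t".join([chrom, gene_id, cols[3], cols[4]]))
    return out
```
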

Issue:

When I ran MCScanX using:

./MCScanX ../synteny/ortho_mc/blast.tsv

I got the following result:

Reading BLAST file and pre-processing  
Generating BLAST list  
0 matches imported (0 discarded)  
0 pairwise comparisons  
0 alignments generated  
Pairwise collinear blocks written to /synteny/ortho_mc/.collinearity [0.000 seconds elapsed]  
Writing multiple syntenic blocks to HTML files  
Done! [0.000 seconds elapsed]  

It seems like no matches were imported.

Possible Cause & Question:

I suspect that the discrepancy in gene naming conventions between the protein FASTA file (NP_ accessions) and the GFF file (ATxGxxxxx locus IDs) might be the issue.

Does anyone know of a method, tool, or reference file to map NCBI protein accessions (NP_) to TAIR/Araport gene locus IDs (ATxGxxxxx)?
Or is there a better way to resolve this issue for MCScanX?
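One possible approach, sketched under an assumption that is not confirmed by the sample above: that the CDS rows of the same NCBI GFF file carry both a `protein_id=NP_...` attribute and a `locus_tag=ATxGxxxxx` attribute. If so, the GFF itself could serve as the mapping table, and the first two columns of a tab-separated BLAST output could be renamed to match the MCScanX gene list. The function names here are hypothetical:

```python
def protein_to_locus(gff_lines):
    """Build {protein accession: locus ID} from GFF3 CDS rows.

    ASSUMPTION: CDS rows have 'protein_id' and 'locus_tag' attributes.
    Rows lacking either attribute are ignored.
    """
    mapping = {}
    for line in gff_lines:
        if line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9 or cols[2] != "CDS":
            continue
        attrs = dict(kv.split("=", 1) for kv in cols[8].split(";") if "=" in kv)
        if "protein_id" in attrs and "locus_tag" in attrs:
            mapping[attrs["protein_id"]] = attrs["locus_tag"]
    return mapping


def rename_blast_line(line, mapping):
    """Replace query/subject IDs (columns 1 and 2) when a mapping exists.

    IDs without a mapping (e.g. from other species) are left unchanged.
    """
    cols = line.rstrip("\n").split("\t")
    if len(cols) >= 2:
        cols[0] = mapping.get(cols[0], cols[0])
        cols[1] = mapping.get(cols[1], cols[1])
    return "\t".join(cols)
```

Note that several protein isoforms can map to the same locus, so the renamed BLAST file may contain repeated gene pairs; keeping one protein per locus before running BLASTP would avoid that.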

Any help or pointers would be greatly appreciated!

Thanks in advance!
