MCScanX No Matches: Gene ID Mismatch in A. thaliana FASTA & GFF?

Hi everyone,  

I'm running an all-vs-all **BLASTP** as part of my analysis. I merged the FASTA files from the different species into a single file (`merge.fast`) and used it as both the query and the database. The BLASTP run itself completed without errors.  

However, I noticed a discrepancy in gene naming for **Arabidopsis thaliana** between the **protein FASTA file** and the **annotation GFF file**, both downloaded from NCBI. A subset of each file is shown below.  

### **FASTA File Sample:**  
```
>NP_001030613.1 hypothetical protein 1 [Arabidopsis thaliana]
...
>NP_001030614.1 Phosphoglycerate mutase-like family protein [Arabidopsis thaliana]
...
>NP_001030615.2 ECA1-like gametogenesis related family protein [Arabidopsis thaliana]
...
```

### **GFF File Sample:**  
```
NC_003070.9	RefSeq	gene	3631	5899	.	+	.	ID=gene-AT1G01010;Dbxref=Araport:AT1G01010,TAIR:AT1G01010,GeneID:839580
NC_003070.9	RefSeq	mRNA	3631	5899	.	+	.	ID=rna-NM_099983.2;Parent=gene-AT1G01010;Dbxref=Araport:AT1G01010,GenBank:NM_099983.2
...
```



I also converted the GFF file into a **4-column format** as required for MCScanX:  
```
at003070.9	AT1G01010	3631	5899
at003070.9	AT1G01020	6788	9130
...
```

### **Issue:**  
When I ran MCScanX using:  
```
./MCScanX ../synteny/ortho_mc/blast.tsv
```
I got the following result:  
```
Reading BLAST file and pre-processing  
Generating BLAST list  
0 matches imported (0 discarded)  
0 pairwise comparisons  
0 alignments generated  
Pairwise collinear blocks written to /synteny/ortho_mc/.collinearity [0.000 seconds elapsed]  
Writing multiple syntenic blocks to HTML files  
Done! [0.000 seconds elapsed]  
```
It seems like **no matches were imported**.  
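
To confirm that the two files really share no IDs, a quick check along these lines could help. It is only a sketch: it assumes the BLAST output is tab-separated with the query and subject IDs in the first two columns, `id_overlap` is a helper name made up for this post, and the demo rows are toy data modelled on the samples above rather than real output.

```python
def id_overlap(blast_lines, gff_lines):
    """Return (IDs seen in BLAST, IDs seen in the 4-column file, shared IDs)."""
    blast_ids = set()
    for line in blast_lines:
        cols = line.rstrip("\n").split("\t")
        if len(cols) >= 2:
            # Assumed layout: column 1 = query ID, column 2 = subject ID.
            blast_ids.update(cols[:2])

    gff_ids = set()
    for line in gff_lines:
        cols = line.split()
        if len(cols) >= 2:
            # 4-column file: chromosome, gene ID, start, end.
            gff_ids.add(cols[1])

    return blast_ids, gff_ids, blast_ids & gff_ids


# Toy rows for illustration only (the BLAST scores are invented).
blast_demo = ["NP_001030613.1\tNP_001030614.1\t35.0\t100\t65\t0\t1\t100\t1\t100\t1e-10\t50.0"]
gff_demo = ["at003070.9\tAT1G01010\t3631\t5899", "at003070.9\tAT1G01020\t6788\t9130"]

blast_ids, gff_ids, shared = id_overlap(blast_demo, gff_demo)
print(len(blast_ids), len(gff_ids), len(shared))
```

With real files, the same function can be fed open file handles instead of lists. If the shared set is empty, that would be consistent with the `0 matches imported` line.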

### **Possible Cause & Question:**  
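I suspect that the discrepancy in **gene naming conventions** between the **protein FASTA file (NP_ accessions)** and the **GFF file (ATxGxxxxx locus IDs)** might be the issue: the BLAST output carries the NP_ accessions from the FASTA headers, while my 4-column file uses locus IDs, so the two files may have no gene names in common.  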

Does anyone know of a method, tool, or reference file to **map NCBI protein accessions (NP_) to TAIR/Araport gene locus IDs (ATxGxxxxx)?**  
Or is there a better way to resolve this issue for MCScanX?  
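
In case it helps frame the question, here is a rough sketch of the kind of mapping I have in mind. It rests on an assumption I have not verified against my full file: that the `CDS` rows of the NCBI GFF carry both a `protein_id` and a `locus_tag` attribute (the sample above only shows `gene` and `mRNA` rows). The function names are made up, and the `NP_000000.1` accession and CDS coordinates in the demo are placeholders, not real records.

```python
def parse_attributes(field):
    """Turn a GFF3 attribute column ('key=value;key=value') into a dict."""
    attrs = {}
    for item in field.strip().split(";"):
        if "=" in item:
            key, value = item.split("=", 1)
            attrs[key] = value
    return attrs


def protein_to_locus(gff_lines):
    """Map protein accessions to locus IDs using CDS rows.

    Assumes CDS rows have 'protein_id' and 'locus_tag' attributes.
    """
    mapping = {}
    for line in gff_lines:
        if line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != "CDS":
            continue
        attrs = parse_attributes(cols[8])
        protein = attrs.get("protein_id")
        locus = attrs.get("locus_tag")
        if protein and locus:
            mapping[protein] = locus
    return mapping


def rename_blast_row(line, mapping):
    """Replace query/subject IDs in a tabular BLAST row; unknown IDs are kept."""
    cols = line.rstrip("\n").split("\t")
    cols[0] = mapping.get(cols[0], cols[0])
    cols[1] = mapping.get(cols[1], cols[1])
    return "\t".join(cols)


# Placeholder rows: the CDS line is hypothetical, not copied from the real GFF.
gff_demo = [
    "NC_003070.9\tRefSeq\tgene\t3631\t5899\t.\t+\t.\tID=gene-AT1G01010;Dbxref=Araport:AT1G01010",
    "NC_003070.9\tRefSeq\tCDS\t3700\t3900\t.\t+\t0\tID=cds-NP_000000.1;locus_tag=AT1G01010;protein_id=NP_000000.1",
]
mapping = protein_to_locus(gff_demo)
print(rename_blast_row("NP_000000.1\tXP_unknown\t90.0", mapping))
```

One caveat with this approach: several NP_ accessions (isoforms) can share one locus ID, so a renamed BLAST file may contain repeated gene pairs. The alternative would be to keep the NP_ accessions and write those into the second column of the 4-column file instead.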

Any help or pointers would be **greatly appreciated**!  

Thanks in advance!  

