lf.optimise() function error: ValueError: Initial parameter values must evaluate to a finite value, not -inf #2358
I was trying to fit a maximum likelihood function using large Pfam trees and their corresponding alignments, and I keep getting the same error for each Pfam family I try: ValueError: Initial parameter values must evaluate to a finite value, not -inf. I have tried a couple of test files with much smaller alignments and trees, and the code works just fine. I would like to know why the Cogent3 algorithm keeps calculating this -inf value. Thank you.

from cogent3 import load_tree, load_aligned_seqs
from cogent3.evolve.substitution_model import EmpiricalProteinMatrix
from cogent3.parse.paml_matrix import PamlMatrixParser
#Open fasta file, assign molecule type
aln = load_aligned_seqs("Pfam.fasta", moltype="protein")
#Load substitution model
# Use a context manager so the matrix file is closed after parsing
with open("Q.pfam") as matrix_file:
    empirical_matrix, empirical_frequencies = PamlMatrixParser(matrix_file)
sm = EmpiricalProteinMatrix(empirical_matrix, empirical_frequencies)
#Open tree file
tree = load_tree("Pfam.tree")
#Create likelihood function
lf = sm.make_likelihood_function(tree)
lf.set_alignment(aln)
#Optimize and show function
lf.optimise(max_restarts=5, tolerance=1e-9, show_progress=False)
lf
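One possible cause worth ruling out is floating-point underflow. This is an assumption, not something confirmed for Cogent3 in this thread. A likelihood built as a product of many small probabilities can round to 0.0, and its log is then -inf. The sketch below uses made-up numbers and plain Python, not Cogent3 internals, to show the effect and why summing logs avoids it.

```python
import math

# Hypothetical probability contributions; these values are illustrative
# only and are not taken from Cogent3 or from any Pfam data.


def log_of_product(probs):
    """Multiply first, then take the log (prone to underflow)."""
    product = 1.0
    for p in probs:
        product *= p
    # math.log(0.0) raises ValueError, so report -inf explicitly
    return math.log(product) if product > 0.0 else float("-inf")


def sum_of_logs(probs):
    """Take logs first, then add (stays finite for positive inputs)."""
    return sum(math.log(p) for p in probs)


few = [1e-3] * 50    # product is about 1e-150, still representable
many = [1e-3] * 500  # product is about 1e-1500, underflows to 0.0

naive_few = log_of_product(few)
naive_many = log_of_product(many)
stable_many = sum_of_logs(many)
```

If underflow were the cause, the error would be expected to appear only beyond some dataset size, which would be consistent with the smaller test files working.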
Replies: 2 comments 3 replies
Hi @VinnyCa,
Hi @VinnyCa, I don't have a solution yet, and this is a tricky one to figure out! I have established that there appears to be a sensitivity to the number of sequences. If you add the following statements to your code

num = 300
names = list(aln.names)
# uncomment following for reverse order
# names = list(aln.names)[::-1]
names = names[:num]
aln = aln.take_seqs(names)
tree = tree.get_sub_tree(names)
# your other code
# the following ensures it exits quickly if there's no errors (but that's not a properly optimised function)
lf.optimise(max_evaluations=100, limit_action="ignore")

then both work. Changing num, I found that PF00013 fails at 614 sequences for original order, and 496 for revers…

From these, clearly there's some aspect of the data that's causing this issue. It's unclear what: is it the subtree or a subset of the sequences? It would help me out if you could look at the sequences in more detail to see if that's where the issue lies. First, make sure you have installed the latest version of Cogent3. Then add the following statements to the top of your scripts.

import os
os.environ["COGENT3_NEW_TYPE"] = "1"

This forces your code to use the new type objects, which you need to do in order to use the following method. Look and see if any specific sequences are causing this problem by looking at the sequence lengths (e.g. …). You can also try random subsampling to get the smallest possible set to trigger the problem. If I get time next week I'll try again.
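The search for the sequence count at which the error first appears can be automated with a bisection over num. The sketch below is a minimal illustration, not part of Cogent3: smallest_failing_prefix and fake_fails are hypothetical names, and the search assumes that once a prefix of the names fails, every longer prefix fails too. That assumption has not been verified for this data.

```python
def smallest_failing_prefix(names, fails):
    """Return the smallest n such that fails(names[:n]) is True, or None.

    Assumes failure is monotonic in n (once a prefix fails, every longer
    prefix fails too). This is an assumption, not a verified property.
    """
    if not fails(names):
        return None
    lo, hi = 0, len(names)  # invariant: names[:hi] fails
    while lo < hi:
        mid = (lo + hi) // 2
        if fails(names[:mid]):
            hi = mid
        else:
            lo = mid + 1
    return lo


# Stand-in predicate for demonstration only: pretend any subset with
# 614 or more sequences triggers the error.
def fake_fails(subset):
    return len(subset) >= 614


demo_names = [f"seq{i}" for i in range(1000)]
threshold = smallest_failing_prefix(demo_names, fake_fails)
```

In real use, the predicate would take the names, rebuild the alignment and tree with aln.take_seqs(names) and tree.get_sub_tree(names) as in the snippet above, run lf.optimise, and return True when the ValueError is raised. Each probe costs one likelihood setup, so bisection needs roughly log2 of the sequence count in attempts rather than one per candidate value of num.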