Fixed Segments in Generated PDB Files Don't Match Reference PDB File #265

Open
patricia-rocha opened this issue Aug 9, 2024 · 2 comments

@patricia-rocha

While performing motif scaffolding with RFdiffusion, I noticed that the fixed segments in the generated PDB files do not exactly match those in the reference PDB file. For example, I am using the contig [B1-94/14-20/B103-111], and my reference PDB file contains Gly, Ser, Ser at positions 52, 52A, and 53. In the generated PDB files, however, only Gly and Ser are present, at positions 52 and 53; the inserted residue at 52A is discarded.

Questions:

  1. Is this behavior expected, or is it a bug?
  2. Does anyone know a fix or workaround to ensure that the fixed segments, including insertions, are accurately preserved in the generated PDB files?
@roccomoretti
Member

I'm unsure how well RFdiffusion works with insertion codes. You can try explicitly including things in the contig (something like [B1-52/B52A-52A/B53-94/14-20/B103-111]), but I haven't tried that at all. You may be better off just renumbering the input file to remove the insertion codes(*). RFdiffusion will renumber/relabel the output structure anyway.

*) Automated methods exist. I'd personally use Rosetta to do it, because that's what I'm familiar with. Other programs will also do it, though.

@patricia-rocha
Author

Hi @roccomoretti

After further investigation, I identified the cause of the behavior I described. When RFdiffusion parses a PDB file with the parse_pdb_lines function, it reads only the residue sequence numbers and ignores any insertion codes. As a result, in my example, (Gly, 52), (Ser, 52A), (Ser, 53) becomes (Gly, 52), (Ser, 52), (Ser, 53). Duplicate sequence numbers are then removed, so the final sequence is (Gly, 52), (Ser, 53), and the insertion (Ser, 52A) is discarded.
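For anyone who wants to see the effect in isolation, here is a minimal sketch of how dropping the insertion code and then de-duplicating on the residue number loses a residue. It is not the actual parse_pdb_lines code; `atom_line` and `unique_residues` are hypothetical helpers written only for this illustration, and the column slices follow the standard PDB ATOM record layout.

```python
def atom_line(serial, resname, chain, resnum, icode=" "):
    """Build a fixed-column PDB ATOM record for a CA atom (hypothetical helper)."""
    return (f"ATOM  {serial:>5}  CA  {resname:>3} {chain}{resnum:>4}{icode}"
            f"   {0.0:8.3f}{0.0:8.3f}{0.0:8.3f}  1.00  0.00")


def unique_residues(pdb_lines, use_icode):
    """Collect residues in file order, keeping the first record per residue key.

    With use_icode=False the key is (chain, resnum) only, which mimics a
    parser that ignores the insertion code in column 27.
    """
    seen = set()
    residues = []
    for line in pdb_lines:
        if not line.startswith("ATOM"):
            continue
        resname = line[17:20].strip()
        chain = line[21]
        resnum = int(line[22:26])
        icode = line[26].strip()
        key = (chain, resnum, icode) if use_icode else (chain, resnum)
        if key in seen:
            continue  # treated as a duplicate and dropped
        seen.add(key)
        label = f"{resnum}{icode}" if use_icode else str(resnum)
        residues.append((resname, label))
    return residues


example = [
    atom_line(1, "GLY", "B", 52),
    atom_line(2, "SER", "B", 52, "A"),
    atom_line(3, "SER", "B", 53),
]

print(unique_residues(example, use_icode=False))  # [('GLY', '52'), ('SER', '53')]
print(unique_residues(example, use_icode=True))   # [('GLY', '52'), ('SER', '52A'), ('SER', '53')]
```

Keying on the number alone makes 52 and 52A indistinguishable, which matches the missing Ser in the generated files.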

To address this, I have opted to renumber my files without insertion codes.
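In case it helps others, below is a rough sketch of one way to do that renumbering in plain Python. It is an assumption-laden illustration, not the tool I used and not part of RFdiffusion: `renumber_without_icodes` and `atom_line` are hypothetical names, only ATOM/HETATM records are rewritten (TER and other records are copied unchanged), and each inserted residue simply shifts the numbers that follow it in the same chain by one.

```python
def atom_line(serial, resname, chain, resnum, icode=" "):
    """Build a fixed-column PDB ATOM record for a CA atom (hypothetical helper)."""
    return (f"ATOM  {serial:>5}  CA  {resname:>3} {chain}{resnum:>4}{icode}"
            f"   {0.0:8.3f}{0.0:8.3f}{0.0:8.3f}  1.00  0.00")


def renumber_without_icodes(pdb_lines):
    """Remove insertion codes by shifting residue numbers within each chain.

    Every residue carrying an insertion code increases the chain's offset by
    one, so 52, 52A, 53 becomes 52, 53, 54. Existing numbering gaps are kept.
    """
    offsets = {}       # chain -> number of inserted residues seen so far
    last_residue = {}  # chain -> (resnum, icode) of the most recent residue
    out = []
    for line in pdb_lines:
        if not line.startswith(("ATOM", "HETATM")):
            out.append(line)  # non-coordinate records are left untouched
            continue
        chain = line[21]
        resnum = int(line[22:26])
        icode = line[26]
        if last_residue.get(chain) != (resnum, icode):
            # First atom of a new residue: bump the offset if it is an insertion.
            if icode != " ":
                offsets[chain] = offsets.get(chain, 0) + 1
            last_residue[chain] = (resnum, icode)
        new_num = resnum + offsets.get(chain, 0)
        # Columns 23-26 hold the residue number, column 27 the insertion code.
        out.append(f"{line[:22]}{new_num:>4} {line[27:]}")
    return out


example = [
    atom_line(1, "GLY", "B", 52),
    atom_line(2, "SER", "B", 52, "A"),
    atom_line(3, "SER", "B", 53),
]

for line in renumber_without_icodes(example):
    print(line[17:20], line[21], line[22:26].strip())
```

Note that the contig has to be updated to the new numbering. If 52A were the only insertion in chain B, [B1-94/14-20/B103-111] would become [B1-95/14-20/B104-112] under this scheme.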

2 participants