Maybe take a walk? #122

MatthewRalston · 2024-02-20T18:51:26Z

After the 0.7.6 release, the kmerdb project will pivot to a modified .kdb format; no backwards compatibility is explicitly planned.

The goal of the refactor/pivot is to add networkx and/or cugraph to the toolkits available for implementing an assembly algorithm and/or a .kdbg format specification, covering exact .fasta assembly or an approximate 'Eulerian' walk (.fastq) through the rows specified in the "Assembly algorithm prototype" GitHub milestone.


MatthewRalston · 2024-03-05T02:16:04Z

Today, progress was made on generating the graph for the Eulerian walk. The metadata format/schema largely remains the same, and so far the main schema consists of three columns: n1, n2, and w. The weight w is specified for the Eulerian path, however that ends up being implemented.
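To make the three-column idea concrete, here is a minimal sketch of how (n1, n2, w) rows could be produced from a single sequence. It is not the kmerdb implementation: the 2-bit encoding (A=0, C=1, G=2, T=3) and the names `kmer_to_id` and `weighted_edges` are assumptions for illustration, and w is taken to be the number of times two k-mers occur consecutively.

```python
from collections import Counter

# Assumed 2-bit encoding; kmerdb's own encoding may differ.
ENCODING = {"A": 0, "C": 1, "G": 2, "T": 3}


def kmer_to_id(kmer):
    """Encode a k-mer as an integer in [0, 4**k)."""
    idx = 0
    for base in kmer:
        idx = idx * 4 + ENCODING[base]
    return idx


def weighted_edges(seq, k):
    """Return sorted (n1, n2, w) rows for consecutive k-mers in seq.

    n1 and n2 are the ids of adjacent k-mers and w counts how often
    that adjacency was observed (hypothetical definition of the weight).
    """
    ids = [kmer_to_id(seq[i:i + k]) for i in range(len(seq) - k + 1)]
    counts = Counter(zip(ids, ids[1:]))
    return sorted((n1, n2, w) for (n1, n2), w in counts.items())


rows = weighted_edges("ACGTACGT", 3)
for n1, n2, w in rows:
    print(n1, n2, w, sep="\t")
```

With k=3 the repeated ACG -> CGT adjacency shows up as a single row with w=2, which is the kind of multiplicity an Eulerian walk would need to respect.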

- Complete the metadata writing subroutine
- Discourage the direct dumping of a jank structure >:)
- Reconstruct the reader class
- Don't commit atomically until the writing and refactoring is completed

Started reading the wiki article on plessy v Ferguson. I know Americans on the left love to shout from the rooftops that kumbaya for all is here. It's '24 after all. And yet, 50% of the populace was not dissuaded by the use of flagrantly racist language and rhetoric by the ex President Donnie grump (voldy 2.0)

And what's more startling I guess is the disparities between celebrities on television, and the reality of many as and other minority and also white persons in different housing districts than the elite. whoa but just on the deep hip hop dose on the st life thing is making my head ache. if you know you know. people get ignorant about waste and ignorant about love. tytgs.

I was fired for missing one ducking email and bc I wrestle about forgiving you for not defending me on that issue. You're privileged and I got dumped. Steve and Deb, you're no different. Doesn't matter what you thought now or then, I stuck up with your group when you needed an extra head. I got the work done.

Go to hell.

That's what I think of the goddak establishment.

Here I am door dashing and begging my parents to cut my interest rates so I can afford to eat. You pos won't ever understand that.and I hope you never have to. But miss me with that kumbaya ish rn.

on the upside, fuggin hate my brand but love the game. different directions both re self study, metrics, profiling, and graphics. still need a more concrete problem to make the feature on the algorithm biorxiv. that's what's got me stuck in loops re money.

That would be the real assembly algo and the future goal, but we might only have time for a networkx CPU strategy. A cugraph assembler could then leverage the indexing structure .kdb.gi to produce tuples rapidly to Python, which transfers them to the GPU for a cugraph graph traversal after trimming. The networkx assembler would leverage the same thing. This is essentially milestone 2.

because id rather take the right whip out into the country and gather field samples, than to get stuck in a wfh situation churning my money on finding better digital samples when i'd rather do something combining field, wet bench, and then maybe some fastq exploration with maybe a model of the graph and the best case scenarios re: known genome (eco, bsub, cdiff, etc) full assembly (n50, ng50, contig count, orf count/gene count, pfam stats, other orthology/paralogy metrics, contig diversity, read diversity), and/or approximate Eulerian walk (after edge and node trimming strategies, followed by like.... idk yet.)

MatthewRalston · 2024-03-05T23:56:19Z

Is this where I migrate plans from issue to milestone and/or documentation by modifying obj in comments and then official planning checkmarks?

First block could be '\n'-delimited rows: count vector (n1) and index (n2 == n1) [n2 is the 4**k-dimensional 1-tuple/vector].
Second block could be delimited with uWu: an edge 2-tuple vector/array (n1) and a weight (n2) np.array [n1 is the number of possible edges (WOOF); the n2 driver variable may be --sparse or (default) --inclusive]. Inclusive makes the full matrix (don't want) in flattened form and then compresses it. Sparse storage may make n2 more reasonable, but the adjacency structure in unstructured form may make indexing worse. If the algorithm for accessing the index rapidly is written in Python (or Cython), then accessing the index table (.kdbgi) should be trivial. If this feature is developed more, an in-memory solver may be next, provided the --sparse option is developed.
Third block also closed with uWu.

[ --sparse: Data is in the "adjacency list structure" (collapsed or sparse, and, preferred) ]

Just remove the edges where weight=0

[ DEFAULT: Data is the adjacency matrix (full rank of the n1xn1 matrix) ]

Human readability
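The difference between the two storage modes can be sketched in a few lines. This is only an illustration under assumptions: the function names are hypothetical, plain nested lists stand in for whatever array type the .kdbg writer ends up using, and the "sparse" form is simply the full matrix with every weight=0 entry dropped.

```python
def dense_to_sparse(matrix):
    """Keep only the (n1, n2, w) entries of a square matrix where w != 0."""
    return [
        (i, j, w)
        for i, row in enumerate(matrix)
        for j, w in enumerate(row)
        if w != 0
    ]


def sparse_to_dense(edges, n):
    """Rebuild the full n x n adjacency matrix from (n1, n2, w) rows."""
    matrix = [[0] * n for _ in range(n)]
    for n1, n2, w in edges:
        matrix[n1][n2] = w
    return matrix


# Toy 4-node graph: 16 cells in the full matrix, only 3 non-zero edges.
dense = [
    [0, 2, 0, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 0],
    [3, 0, 0, 0],
]
sparse = dense_to_sparse(dense)
print(sparse)
```

The full matrix always stores n*n cells, while the edge list grows only with the number of observed edges. The trade-off noted above still applies: the dense form can be indexed by position, while the sparse form needs a lookup structure or an index file.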

[ Adapt the index function for .kdb.gi or .kdb.i file. ] sike. to backlog if yee dare
[ Describe the graph edge list format in the readme, quickstart and website in the github-pages branch ]

MatthewRalston · 2024-03-18T17:58:51Z

#122 #123 #124 #125 #126

Neighbor construction working out well. A dictionary of dictionaries is being used to focus on local "neighbor" space only: i.e. the 8 adjacent kmers to any id.

This has been spun off in a utility function in kmer.py.
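For readers unfamiliar with the "8 adjacent kmers" idea, here is a minimal sketch. It is not the utility function in kmer.py: it assumes k-mer ids use a 2-bit encoding (A=0, C=1, G=2, T=3, first base most significant), and the name `neighbors` and the dictionary keys are hypothetical. Each id has 4 successors (drop the first base, append one) and 4 predecessors (drop the last base, prepend one).

```python
def neighbors(kmer_id, k):
    """Return the 4 successor and 4 predecessor ids of a k-mer id."""
    high = 4 ** (k - 1)            # place value of the first base
    suffix = kmer_id % high        # the k-mer without its first base
    prefix = kmer_id // 4          # the k-mer without its last base
    return {
        "successors": [suffix * 4 + b for b in range(4)],
        "predecessors": [b * high + prefix for b in range(4)],
    }


def neighbor_table(ids, k):
    """Dictionary of dictionaries covering only the local neighbor space."""
    return {i: neighbors(i, k) for i in ids}


# ACG has id 6 under the assumed encoding.
table = neighbor_table([6], 3)
print(table[6])
```

For ACG the successors are CGA, CGC, CGG, CGT and the predecessors are AAC, CAC, GAC, TAC. Homopolymers such as AAA list themselves as a neighbor, so the 8 entries are not always 8 distinct ids.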

Adding some more documentation to the kmerdb.graph module as it prototypes most of what is required to write and read (some validations) .kdbg files.

Edge list and data structure still in planning.

MatthewRalston · 2024-07-31T02:11:31Z

Need to revive this stale issue. I left off looking at NetworkX and visualizers. I got sidetracked on the dot format and PyDot, and I'd like to add that support.

- PyDot support OR direct interop
- NetworkX
- Cython
- cugraph routines

cuGraph may be needed in the overall assembly algorithm to simplify or accelerate depth-first-search traversals, and to associate inter-node metrics, scores, and an optimizer.

Of course, in order to implement or refine any method of this sort, I need first to be able to check structure and progress made from naive approaches, during the score formulation, weighting, and refinement stage.
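As a naive baseline of the kind described above, a depth-first traversal can be written in plain Python before reaching for NetworkX or cuGraph. This sketch is an assumption-laden stand-in, not a planned kmerdb routine: the graph is a dictionary of dictionaries `{node: {neighbor: weight}}`, the name `dfs_order` is hypothetical, and following the heaviest edge first is just one possible scoring choice.

```python
def dfs_order(adjacency, start):
    """Iterative depth-first preorder over {node: {neighbor: weight}}."""
    seen = set()
    order = []
    stack = [start]
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        order.append(node)
        # Push lighter edges first so the heaviest edge is popped next
        # (assumed heuristic, not a kmerdb scoring rule).
        edges = sorted(adjacency.get(node, {}).items(), key=lambda kv: kv[1])
        for nbr, _weight in edges:
            if nbr not in seen:
                stack.append(nbr)
    return order


graph = {0: {1: 1, 2: 5}, 1: {3: 1}, 2: {3: 2}, 3: {}}
print(dfs_order(graph, 0))
```

Comparing the visit order from a baseline like this against a library traversal on the same edge list is one way to check structure while the scoring and weighting are still being formulated.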

MatthewRalston added this to the Assembly algorithm prototype milestone Feb 20, 2024

MatthewRalston self-assigned this Feb 20, 2024

MatthewRalston added the documentation, enhancement, and help wanted labels Feb 20, 2024

MatthewRalston mentioned this issue Mar 5, 2024

3-tuple i think #124

Merged

MatthewRalston mentioned this issue Mar 20, 2024

Assert subsequent shredded kmers are neighborly, skip if fails (fastq centered) #123

Open

MatthewRalston mentioned this issue Mar 29, 2024

Expanded row metadata for graph format #130

Open

MatthewRalston modified the milestones: edge list tuple, Interface Revision Jul 14, 2024

MatthewRalston added the bug, good first issue, question, wontfix, and dependencies labels Jul 31, 2024

MatthewRalston modified the milestones: Interface Revision, v0.8 D2, node, edge, path formats, node/edge labeling, tetranucleotides Jul 31, 2024

MatthewRalston changed the title ~~Graph algorithms~~ Maybe take a walk? Jul 31, 2024
