kmer counter
Designing small universal k-mer hitting sets for improved analysis of high-throughput sequencing ...
Project status: Under Development
Intel Technologies: Other
Overview / Usage
A basic task in bioinformatics is the analysis of large numbers of DNA sequences. Most analyses index the sequences by their k-long subsequences (k-mers). Many algorithms for DNA sequence analysis suffer from excessive memory usage and long runtimes. With today's advances in DNA sequencing technology, improving the efficiency of this kind of algorithm is an important goal.
Methodology / Approach
We will use an existing k-mer counting algorithm, a pattern detector used in bioinformatics studies, as our starting point. We will create a new algorithm based on the work of our faculty advisor.
The input is a set of DNA strings, and each k-mer is a string x ∈ {A,C,G,T}^k.
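As a point of reference, plain k-mer counting over this input can be sketched in a few lines of Python. This is only an illustrative baseline, not the counting algorithm the project will adopt; the function name `count_kmers` and the choice to skip windows containing non-ACGT characters are assumptions made for the example.

```python
from collections import Counter

ALPHABET = set("ACGT")


def count_kmers(sequences, k):
    """Count every k-long substring over {A,C,G,T} in the given DNA strings."""
    counts = Counter()
    for seq in sequences:
        seq = seq.upper()
        # Slide a window of length k across the sequence.
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            # Assumption: windows with ambiguous bases (e.g. N) are skipped.
            if ALPHABET.issuperset(kmer):
                counts[kmer] += 1
    return counts


if __name__ == "__main__":
    for kmer, n in sorted(count_kmers(["ACGTACGT"], 3).items()):
        print(kmer, n)
```

A dictionary of this kind holds every distinct k-mer in memory at once, which is the cost that minimizer-based counters try to reduce by splitting the k-mers into bins.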
Our workflow consists of the following steps:
- Finding the best algorithm for the project: first, we must find a k-mer counting algorithm that matches our goal and satisfies two critical requirements: it must use minimizers, and it must target k < 13.
- Understanding how to swap the minimizers for universal hitting sets (UHSs) as a new integration, without changing any other part of the algorithm. This may require learning a new programming language.
- Comparing the two algorithms, the one using DOCKS and the one using minimizers, and demonstrating the efficiency gained by using UHSs.
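The swap described in the steps above can be illustrated with a small sketch, under the assumption that the chosen counter bins each k-mer by its smallest m-long substring and that the ordering is the only part that changes. All names here (`minimizer`, `make_uhs_order`, `bin_kmers`, the parameter `m`) are hypothetical, and the set `{"TC"}` is a toy stand-in chosen for the example, not a real universal hitting set or DOCKS output.

```python
from collections import defaultdict


def lexicographic_order(mmer):
    """Default ordering: plain alphabetical comparison of m-mers."""
    return mmer


def make_uhs_order(uhs):
    """Build an ordering that ranks m-mers in the given set before all others."""
    def key(mmer):
        return (0 if mmer in uhs else 1, mmer)
    return key


def minimizer(kmer, m, order=lexicographic_order):
    """Return the smallest m-long substring of kmer under the given ordering."""
    return min((kmer[i:i + m] for i in range(len(kmer) - m + 1)), key=order)


def bin_kmers(sequence, k, m, order=lexicographic_order):
    """Group the k-mers of a sequence into bins keyed by their minimizer."""
    bins = defaultdict(list)
    for i in range(len(sequence) - k + 1):
        kmer = sequence[i:i + k]
        bins[minimizer(kmer, m, order)].append(kmer)
    return dict(bins)


if __name__ == "__main__":
    toy_set = {"TC"}  # toy example only, not a real UHS
    print(bin_kmers("GTCATG", 5, 2))
    print(bin_kmers("GTCATG", 5, 2, order=make_uhs_order(toy_set)))
```

Because only the `order` argument differs between the two calls, the rest of the binning code stays untouched. Comparing the two variants then comes down to measuring the number and size of the resulting bins, along with memory usage and runtime, on the same input.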
Technologies Used
Java, Linux, Python