This week I have been looking further into alignment algorithms and have tested a variation of Smith-Waterman. I found this version on Stack Overflow. However, when testing it I found that it used too much memory for what I needed to do.
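As a rough sketch of one way the memory use could be reduced, assuming the cost came from storing the full scoring matrix and that only the best local score is needed (not the aligned strings themselves), Smith-Waterman can be run with just two rows in memory. The function name and the scoring values (match, mismatch, gap) below are illustrative assumptions, not the ones from the Stack Overflow version.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-1):
    """Return the best local alignment score of a and b.

    Only two rows of the scoring matrix are held at once, so memory
    grows with len(b) rather than len(a) * len(b). The scoring values
    are assumed defaults and use a simple linear gap penalty.
    """
    prev = [0] * (len(b) + 1)  # row i-1 of the matrix
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)  # row i of the matrix
        for j in range(1, len(b) + 1):
            step = match if a[i - 1] == b[j - 1] else mismatch
            curr[j] = max(
                0,                  # start a new local alignment
                prev[j - 1] + step, # align a[i-1] with b[j-1]
                prev[j] + gap,      # gap in b
                curr[j - 1] + gap,  # gap in a
            )
            best = max(best, curr[j])
        prev = curr
    return best


if __name__ == "__main__":
    print(smith_waterman_score("TTACGTTT", "GGACGGG"))
```

The trade-off is that the traceback is lost, so this only gives the score. Recovering the alignment itself in linear space needs a more involved approach such as Hirschberg's divide-and-conquer method.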
I also tested reading in FASTA-format files and stripping the title from the file's header line.
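A minimal sketch of that kind of FASTA reading, assuming the usual layout where each record starts with a ">" header line followed by one or more sequence lines. The function name `read_fasta` is hypothetical, and the "title" here is simply taken to be everything after the ">".

```python
def read_fasta(lines):
    """Yield (title, sequence) pairs from an iterable of FASTA lines.

    The title is the header line with the leading '>' and surrounding
    whitespace stripped. Sequence lines are joined into one string.
    Any sequence lines that appear before the first header are ignored.
    """
    title, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if title is not None:
                yield title, "".join(chunks)
            title, chunks = line[1:].strip(), []
        elif title is not None:
            chunks.append(line)
    if title is not None:
        yield title, "".join(chunks)


if __name__ == "__main__":
    example = [">seq1 example", "ACGT", "TTGA", "", ">seq2", "GGCC"]
    for title, sequence in read_fasta(example):
        print(title, sequence)
```

Because it accepts any iterable of lines, the same function works on an open file handle, for example `with open(path) as handle: records = list(read_fasta(handle))`.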
I also looked at dynamic time warping (DTW), which is used for measuring similarity between sequences. It is often used for speech recognition, as it copes with variations in speed or in the time taken.
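To illustrate how DTW tolerates differences in speed, here is a small sketch of the classic dynamic-programming form, assuming numeric sequences and the absolute difference as the local cost. The function name `dtw_distance` and the choice of cost are assumptions for illustration only.

```python
def dtw_distance(s, t):
    """Return the dynamic time warping distance between two numeric sequences.

    cost[i][j] holds the cheapest cumulative cost of warping the first
    i items of s onto the first j items of t. The local cost is the
    absolute difference between the two items (an assumed choice).
    """
    inf = float("inf")
    n, m = len(s), len(t)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(s[i - 1] - t[j - 1])
            cost[i][j] = d + min(
                cost[i - 1][j],      # s advances, t stays on the same item
                cost[i][j - 1],      # t advances, s stays on the same item
                cost[i - 1][j - 1],  # both advance together
            )
    return cost[n][m]


if __name__ == "__main__":
    # The second sequence has the same shape at half the speed.
    print(dtw_distance([1, 2, 3], [1, 1, 2, 2, 3, 3]))
```

A sequence and a slowed-down copy of it come out with a distance of zero, which is the property that makes DTW useful when the two inputs run at different speeds.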
I experimented with a Java package called Jsequence, but could only get it to compare two sequences and display the resulting difference along with a numerical value for how different the two are.
I researched ClustalW but found that the majority of people did not like it, as it can make mistakes early on and, because of its progressive approach, it is unlikely to correct them nearer completion. It was suggested that I could look at the MUSCLE algorithm. I found that MUSCLE is roughly the same speed but allows for re-optimisation of columns, so it is more likely to correct itself and be more accurate.
Another suggestion was MAFFT. This is about the same speed as MUSCLE but allows multiple sequences to be input.
I also looked at T-Coffee, which like MAFFT allows multiple sequences and is more accurate, but also takes a lot longer to complete.