Ino de Bruijn Investigated kmer composition some more. Turns out low purity contigs seem to mostly have false kmers.  almost 10 years ago

Commit id: 4b729a2824f992f418d34c21cec232a4de81c708

deletions | additions      

       

{  "metadata": {  "name": "",
-  "signature": "sha256:cc8a524cf47b895c70d593384f46f250e734d4ce7ee895aa4bf6a6d4613ab211"
+  "signature": "sha256:e6fa1c359ab68448d7b94e0aa79a8e8dd1f8d49618ff0b50aa3f7859347106b0"
 },  "nbformat": 3,  "nbformat_minor": 0, 

"language": "python",  "metadata": {},  "outputs": [],
-  "prompt_number": 56
+  "prompt_number": 1
 },  {  "cell_type": "code", 

"output_type": "stream",  "stream": "stdout",  "text": [
-  "commit ea42e79505c9ad5e0f136037e2746abc4f7cd4c7\r\n",
+  "commit 2b4eef9ffd7530406816c30b281b635306a9b0db\r\n",
   "Author: Ino de Bruijn \r\n",
-  "Date: Mon Jul 14 20:18:45 2014 +0200\r\n",
+  "Date: Tue Jul 15 18:25:21 2014 +0200\r\n",
   "\r\n",
-  " Add jellyfish interface\r\n"
+  " Further work on counting kmers. Can output tsv now of contig with kmer origin percentage\r\n"
 ]  }  ], 

"language": "python",  "metadata": {},  "outputs": [],
-  "prompt_number": 132
+  "prompt_number": 3
 },  {  "cell_type": "code", 

"language": "python",  "metadata": {},  "outputs": [],
-  "prompt_number": 134
+  "prompt_number": 4
 },  {  "cell_type": "code", 

"outputs": [],  "prompt_number": 137  },  {  "cell_type": "code",  "collapsed": false,  "input": [  "jfr.write_refs_kmer_count_of_fasta(\"/media/milou/glob/projects/masmvali-partdeux/reassembly-filtered-reads/Sample_1ng_even/ref-jellyfish-test/low-purity/contigs-velvetnoscaf31-min500-purity-below-09.fa\",\n",  " 31,\n",  " \"/media/milou/glob/projects/masmvali-partdeux/reassembly-filtered-reads/Sample_1ng_even/ref-jellyfish-test/low-purity/val/contig-ref-kmer-composition-31.tsv\")"  ],  "language": "python",  "metadata": {},  "outputs": [],  "prompt_number": 6  },  {  "cell_type": "markdown",  "metadata": {},  "source": [
-  "Started at 18:00"
+  "Started at 14:00"
 ]  },  { 

],  "prompt_number": 113  },  {  "cell_type": "markdown",  "metadata": {},  "source": [  "After inspecting output.\n",  "\n",  "- Low purity contains sometimes more false kmers than kmers from other species\n",  "- What is the LCA for each incorrect kmer? Maybe should be using Kraken instead"  ]  },  {  "cell_type": "code",  "collapsed": false,