Steven Roberts ipynb  over 9 years ago

Commit id: 39fd54d43e80eb02b7e29487211cc7825385357c


{  "metadata": {  "name": "",  "signature": "sha256:a48e61a48adc563169fa3bed46e537b56eec6fd454ab7103bda0bed9e44a7557" "sha256:170932c4f56fbebfdec9c5af663de2db4f48c1ae2692c73a7ca2d787f1a5a115"  },  "nbformat": 3,  "nbformat_minor": 0, 

"cell_type": "code",  "collapsed": false,  "input": [  "%%bash \n",  "head "\n",  "!head  ./data/2014.06.20*" ],  "language": "python",  "metadata": {},  "outputs": [  {  "output_type": "stream",  "stream": "stderr", "stdout",  "text": [  "shell-init: error retrieving current directory: getcwd: cannot access parent directories: No such file or directory\n",  "head: ./data/2014.06.20*: No such file or directory\n" "==> ./data/2014.06.20.gt1.8_gt3adjactentProbes.gff <==\r\n",  "scaffold100\tOysterV9\tprobes\t636974\t637034\t1\t.\t.\tscaffold100:+:636974\r",  "\r\n",  "scaffold100\tOysterV9\tprobes\t637084\t637143\t1\t.\t.\tscaffold100:+:637084\r",  "\r\n",  "scaffold100\tOysterV9\tprobes\t637209\t637276\t1\t.\t.\tscaffold100:+:637209\r",  "\r\n",  "scaffold102\tOysterV9\tprobes\t303196\t303251\t1\t.\t.\tscaffold102:+:303196\r",  "\r\n",  "scaffold102\tOysterV9\tprobes\t303336\t303394\t1\t.\t.\tscaffold102:+:303336\r",  "\r\n",  "scaffold102\tOysterV9\tprobes\t303521\t303576\t1\t.\t.\tscaffold102:+:303521\r",  "\r\n",  "scaffold1032\tOysterV9\tprobes\t634021\t634085\t1\t.\t.\tscaffold1032:+:634021\r",  "\r\n",  "scaffold1032\tOysterV9\tprobes\t634156\t634218\t1\t.\t.\tscaffold1032:+:634156\r",  "\r\n",  "scaffold1032\tOysterV9\tprobes\t634281\t634333\t1\t.\t.\tscaffold1032:+:634281\r",  "\r\n",  "scaffold1032\tOysterV9\tprobes\t700173\t700230\t1\t.\t.\tscaffold1032:+:700173\r",  "\r\n",  "\r\n",  "==> ./data/2014.06.20.lt1.8_gt3adjactentProbes.gff <==\r\n",  "scaffold100\tOysterV9\tprobes\t804272\t804322\t1\t.\t.\tscaffold100:+:804272\r",  "\r\n",  "scaffold100\tOysterV9\tprobes\t804407\t804457\t1\t.\t.\tscaffold100:+:804407\r",  "\r\n",  "scaffold100\tOysterV9\tprobes\t804532\t804582\t1\t.\t.\tscaffold100:+:804532\r",  "\r\n",  "scaffold100\tOysterV9\tprobes\t804652\t804703\t1\t.\t.\tscaffold100:+:804652\r",  "\r\n",  "scaffold100\tOysterV9\tprobes\t804787\t804844\t1\t.\t.\tscaffold100:+:804787\r",  "\r\n",  
"scaffold100\tOysterV9\tprobes\t804897\t804960\t1\t.\t.\tscaffold100:+:804897\r",  "\r\n",  "scaffold100\tOysterV9\tprobes\t805157\t805213\t1\t.\t.\tscaffold100:+:805157\r",  "\r\n",  "scaffold100\tOysterV9\tprobes\t805287\t805337\t1\t.\t.\tscaffold100:+:805287\r",  "\r\n",  "scaffold100\tOysterV9\tprobes\t805402\t805452\t1\t.\t.\tscaffold100:+:805402\r",  "\r\n",  "scaffold1016\tOysterV9\tprobes\t202657\t202722\t1\t.\t.\tscaffold1016:+:202657\r",  "\r\n"  ]  }  ],  "prompt_number": 13 2  },  {  "cell_type": "code",  "collapsed": false,  "input": [], [  "!wc -l ./data/2014.06.20*"  ],  "language": "python",  "metadata": {},  "outputs": [ 

"output_type": "stream",  "stream": "stdout",  "text": [  "shell-init: error retrieving current directory: getcwd: cannot access parent directories: No such " 164 ./data/2014.06.20.gt1.8_gt3adjactentProbes.gff\r\n",  " 362 ./data/2014.06.20.lt1.8_gt3adjactentProbes.gff\r\n",  " 526 total\r\n"  ]  }  ],  "prompt_number": 4  },  {  "cell_type": "code",  "collapsed": false,  "input": [  "!head ./data/2014.06.20.lt1.8_gt3adjactentProbes.gff\n"  ],  "language": "python",  "metadata": {},  "outputs": [  {  "output_type": "stream",  "stream": "stdout",  "text": [  "scaffold100\tOysterV9\tprobes\t804272\t804322\t1\t.\t.\tscaffold100:+:804272\r",  "\r\n",  "scaffold100\tOysterV9\tprobes\t804407\t804457\t1\t.\t.\tscaffold100:+:804407\r",  "\r\n",  "scaffold100\tOysterV9\tprobes\t804532\t804582\t1\t.\t.\tscaffold100:+:804532\r",  "\r\n",  "scaffold100\tOysterV9\tprobes\t804652\t804703\t1\t.\t.\tscaffold100:+:804652\r",  "\r\n",  "scaffold100\tOysterV9\tprobes\t804787\t804844\t1\t.\t.\tscaffold100:+:804787\r",  "\r\n",  "scaffold100\tOysterV9\tprobes\t804897\t804960\t1\t.\t.\tscaffold100:+:804897\r",  "\r\n",  "scaffold100\tOysterV9\tprobes\t805157\t805213\t1\t.\t.\tscaffold100:+:805157\r",  "\r\n",  "scaffold100\tOysterV9\tprobes\t805287\t805337\t1\t.\t.\tscaffold100:+:805287\r",  "\r\n",  "scaffold100\tOysterV9\tprobes\t805402\t805452\t1\t.\t.\tscaffold100:+:805402\r",  "\r\n",  "scaffold1016\tOysterV9\tprobes\t202657\t202722\t1\t.\t.\tscaffold1016:+:202657\r",  "\r\n"  ]  }  ],  "prompt_number": 5  },  {  "cell_type": "heading",  "level": 1,  "metadata": {},  "source": [  "Merging Adjacent "  ]  },  {  "cell_type": "code",  "collapsed": false,  "input": [  "!bedtools merge -d "  ],  "language": "python",  "metadata": {},  "outputs": [  {  "output_type": "stream",  "stream": "stdout",  "text": [  "bedtools: flexible tools for genome arithmetic and DNA sequence analysis.\r\n",  "usage: bedtools [options]\r\n",  "\r\n",  "The bedtools sub-commands include:\r\n",  "\r\n",  "[ 
Genome arithmetic ]\r\n",  " intersect Find overlapping intervals in various ways.\r\n",  " window Find overlapping intervals within a window around an interval.\r\n",  " closest Find the closest, potentially non-overlapping interval.\r\n",  " coverage Compute the coverage over defined intervals.\r\n",  " map Apply a function to a column for each overlapping interval.\r\n",  " genomecov Compute the coverage over an entire genome.\r\n",  " merge Combine overlapping/nearby intervals into a single interval.\r\n",  " cluster Cluster (but don't merge) overlapping/nearby intervals.\r\n",  " complement Extract intervals _not_ represented by an interval file.\r\n",  " subtract Remove intervals based on overlaps b/w two files.\r\n",  " slop Adjust the size of intervals.\r\n",  " flank Create new intervals from the flanks of existing intervals.\r\n",  " sort Order the intervals in a file.\r\n",  " random Generate random intervals in a genome.\r\n",  " shuffle Randomly redistrubute intervals in a genome.\r\n",  " sample Sample random records from  file or directory\r\n",  "pwd: error retrieving current directory: getcwd: cannot access parent directories: No such using reservoir sampling.\r\n",  " annotate Annotate coverage of features from multiple files.\r\n",  "\r\n",  "[ Multi-way  file or directory\r\n",  "pwd: error retrieving current directory: getcwd: cannot access parent directories: No such file or directory\r\n" comparisons ]\r\n",  " multiinter Identifies common intervals among multiple interval files.\r\n",  " unionbedg Combines coverage intervals from multiple BEDGRAPH files.\r\n",  "\r\n",  "[ Paired-end manipulation ]\r\n",  " pairtobed Find pairs that overlap intervals in various ways.\r\n",  " pairtopair Find pairs that overlap other pairs in various ways.\r\n",  "\r\n",  "[ Format conversion ]\r\n",  " bamtobed Convert BAM alignments to BED (& other) formats.\r\n",  " bedtobam Convert intervals to BAM records.\r\n",  " bamtofastq Convert BAM records to FASTQ 
records.\r\n",  " bedpetobam Convert BEDPE intervals to BAM records.\r\n",  " bed12tobed6 Breaks BED12 intervals into discrete BED6 intervals.\r\n",  "\r\n",  "[ Fasta manipulation ]\r\n",  " getfasta Use intervals to extract sequences from a FASTA file.\r\n",  " maskfasta Use intervals to mask sequences from a FASTA file.\r\n",  " nuc Profile the nucleotide content of intervals in a FASTA file.\r\n",  "\r\n",  "[ BAM focused tools ]\r\n",  " multicov Counts coverage from multiple BAMs at specific intervals.\r\n",  " tag Tag BAM alignments based on overlaps with interval files.\r\n",  "\r\n",  "[ Statistical relationships ]\r\n",  " jaccard Calculate the Jaccard statistic b/w two sets of intervals.\r\n",  " reldist Calculate the distribution of relative distances b/w two files.\r\n",  " fisher Calculate Fisher statistic b/w two feature files.\r\n",  "\r\n",  "[ Miscellaneous tools ]\r\n",  " overlap Computes the amount of overlap from two intervals.\r\n",  " igv Create an IGV snapshot batch script.\r\n",  " links Create a HTML page of links to UCSC locations.\r\n",  " makewindows Make interval \"windows\" across a genome.\r\n",  " groupby Group by common cols. & summarize oth. cols. (~ SQL \"groupBy\")\r\n",  " expand Replicate lines based on lists of values in columns.\r\n",  "\r\n",  "[ General help ]\r\n",  " --help Print this help menu.\r\n",  " --version What version of bedtools are you using?.\r\n",  " --contact Feature requests, bugs, mailing lists, etc.\r\n",  "\r\n"  ]  }  ],  "prompt_number": 10 6  },  {  "cell_type": "code",         

{  "metadata": {  "name": "",  "signature": "sha256:659150d8e947a26b9e0c73c180e2e957dad8ed33b66e9f0c18a2361e4e8f5666" "sha256:170932c4f56fbebfdec9c5af663de2db4f48c1ae2692c73a7ca2d787f1a5a115"  },  "nbformat": 3,  "nbformat_minor": 0, 

"cell_type": "code",  "collapsed": false,  "input": [  "%%bash \n",  "head "\n",  "!head  ./data/2014.06.20*" ],  "language": "python",  "metadata": {},  "outputs": [  {  "output_type": "stream",  "stream": "stderr", "stdout",  "text": [  "shell-init: error retrieving current directory: getcwd: cannot access parent directories: No such file or directory\n",  "head: ./data/2014.06.20*: No such file or directory\n" "==> ./data/2014.06.20.gt1.8_gt3adjactentProbes.gff <==\r\n",  "scaffold100\tOysterV9\tprobes\t636974\t637034\t1\t.\t.\tscaffold100:+:636974\r",  "\r\n",  "scaffold100\tOysterV9\tprobes\t637084\t637143\t1\t.\t.\tscaffold100:+:637084\r",  "\r\n",  "scaffold100\tOysterV9\tprobes\t637209\t637276\t1\t.\t.\tscaffold100:+:637209\r",  "\r\n",  "scaffold102\tOysterV9\tprobes\t303196\t303251\t1\t.\t.\tscaffold102:+:303196\r",  "\r\n",  "scaffold102\tOysterV9\tprobes\t303336\t303394\t1\t.\t.\tscaffold102:+:303336\r",  "\r\n",  "scaffold102\tOysterV9\tprobes\t303521\t303576\t1\t.\t.\tscaffold102:+:303521\r",  "\r\n",  "scaffold1032\tOysterV9\tprobes\t634021\t634085\t1\t.\t.\tscaffold1032:+:634021\r",  "\r\n",  "scaffold1032\tOysterV9\tprobes\t634156\t634218\t1\t.\t.\tscaffold1032:+:634156\r",  "\r\n",  "scaffold1032\tOysterV9\tprobes\t634281\t634333\t1\t.\t.\tscaffold1032:+:634281\r",  "\r\n",  "scaffold1032\tOysterV9\tprobes\t700173\t700230\t1\t.\t.\tscaffold1032:+:700173\r",  "\r\n",  "\r\n",  "==> ./data/2014.06.20.lt1.8_gt3adjactentProbes.gff <==\r\n",  "scaffold100\tOysterV9\tprobes\t804272\t804322\t1\t.\t.\tscaffold100:+:804272\r",  "\r\n",  "scaffold100\tOysterV9\tprobes\t804407\t804457\t1\t.\t.\tscaffold100:+:804407\r",  "\r\n",  "scaffold100\tOysterV9\tprobes\t804532\t804582\t1\t.\t.\tscaffold100:+:804532\r",  "\r\n",  "scaffold100\tOysterV9\tprobes\t804652\t804703\t1\t.\t.\tscaffold100:+:804652\r",  "\r\n",  "scaffold100\tOysterV9\tprobes\t804787\t804844\t1\t.\t.\tscaffold100:+:804787\r",  "\r\n",  
"scaffold100\tOysterV9\tprobes\t804897\t804960\t1\t.\t.\tscaffold100:+:804897\r",  "\r\n",  "scaffold100\tOysterV9\tprobes\t805157\t805213\t1\t.\t.\tscaffold100:+:805157\r",  "\r\n",  "scaffold100\tOysterV9\tprobes\t805287\t805337\t1\t.\t.\tscaffold100:+:805287\r",  "\r\n",  "scaffold100\tOysterV9\tprobes\t805402\t805452\t1\t.\t.\tscaffold100:+:805402\r",  "\r\n",  "scaffold1016\tOysterV9\tprobes\t202657\t202722\t1\t.\t.\tscaffold1016:+:202657\r",  "\r\n"  ]  }  ],  "prompt_number": 13 2  },  {  "cell_type": "code",  "collapsed": false,  "input": [  "pwd" "!wc -l ./data/2014.06.20*"  ],  "language": "python",  "metadata": {}, 

"output_type": "stream",  "stream": "stdout",  "text": [  "Traceback (most recent call last):\n",  " File \"/Users/sr320/anaconda/lib/python2.7/site-packages/IPython/core/ultratb.py\", line 776, in structured_traceback\n", 164 ./data/2014.06.20.gt1.8_gt3adjactentProbes.gff\r\n",  " records = _fixed_getinnerframes(etb, context, tb_offset)\n", 362 ./data/2014.06.20.lt1.8_gt3adjactentProbes.gff\r\n",  " File \"/Users/sr320/anaconda/lib/python2.7/site-packages/IPython/core/ultratb.py\", line 230, in wrapped\n",  " return f(*args, **kwargs)\n",  " File \"/Users/sr320/anaconda/lib/python2.7/site-packages/IPython/core/ultratb.py\", line 259, in _fixed_getinnerframes\n",  " records = fix_frame_records_filenames(inspect.getinnerframes(etb, context))\n",  " File \"/Users/sr320/anaconda/lib/python2.7/inspect.py\", line 1044, in getinnerframes\n",  " framelist.append((tb.tb_frame,) + getframeinfo(tb, context))\n",  " File \"/Users/sr320/anaconda/lib/python2.7/inspect.py\", line 1004, in getframeinfo\n",  " filename = getsourcefile(frame) or getfile(frame)\n",  " File \"/Users/sr320/anaconda/lib/python2.7/inspect.py\", line 454, in getsourcefile\n",  " if hasattr(getmodule(object, filename), '__loader__'):\n",  " File \"/Users/sr320/anaconda/lib/python2.7/inspect.py\", line 483, in getmodule\n",  " file = getabsfile(object, _filename)\n",  " File \"/Users/sr320/anaconda/lib/python2.7/inspect.py\", line 467, in getabsfile\n",  " return os.path.normcase(os.path.abspath(_filename))\n",  " File \"/Users/sr320/anaconda/lib/python2.7/posixpath.py\", line 371, in abspath\n",  " cwd = os.getcwd()\n",  "OSError: [Errno 2] No such file or directory\n" 526 total\r\n"  ]  }  ],  "prompt_number": 4  }, {  "cell_type": "code",  "collapsed": false,  "input": [  "!head ./data/2014.06.20.lt1.8_gt3adjactentProbes.gff\n"  ],  "language": "python",  "metadata": {},  "outputs": [  {  "output_type": "stream",  "stream": "stderr", "stdout",  "text": [  "ERROR: Internal Python error in the inspect 
module.\n",  "Below is the traceback from this internal error.\n",  "\n",  "\n",  "Unfortunately, your original traceback can not be constructed.\n",  "\n" "scaffold100\tOysterV9\tprobes\t804272\t804322\t1\t.\t.\tscaffold100:+:804272\r",  "\r\n",  "scaffold100\tOysterV9\tprobes\t804407\t804457\t1\t.\t.\tscaffold100:+:804407\r",  "\r\n",  "scaffold100\tOysterV9\tprobes\t804532\t804582\t1\t.\t.\tscaffold100:+:804532\r",  "\r\n",  "scaffold100\tOysterV9\tprobes\t804652\t804703\t1\t.\t.\tscaffold100:+:804652\r",  "\r\n",  "scaffold100\tOysterV9\tprobes\t804787\t804844\t1\t.\t.\tscaffold100:+:804787\r",  "\r\n",  "scaffold100\tOysterV9\tprobes\t804897\t804960\t1\t.\t.\tscaffold100:+:804897\r",  "\r\n",  "scaffold100\tOysterV9\tprobes\t805157\t805213\t1\t.\t.\tscaffold100:+:805157\r",  "\r\n",  "scaffold100\tOysterV9\tprobes\t805287\t805337\t1\t.\t.\tscaffold100:+:805287\r",  "\r\n",  "scaffold100\tOysterV9\tprobes\t805402\t805452\t1\t.\t.\tscaffold100:+:805402\r",  "\r\n",  "scaffold1016\tOysterV9\tprobes\t202657\t202722\t1\t.\t.\tscaffold1016:+:202657\r",  "\r\n"  ]  }  ],  "prompt_number": 5  }, {  "cell_type": "heading",  "level": 1,  "metadata": {},  "source": [  "Merging Adjacent "  ]  },  {  "cell_type": "code",  "collapsed": false,  "input": [  "!bedtools merge -d "  ],  "language": "python",  "metadata": {},  "outputs": [  {  "ename": "OSError",  "evalue": "[Errno 2] No such file or directory",  "output_type": "pyerr",  "traceback": "" "stream",  "stream": "stdout",  "text": [  "bedtools: flexible tools for genome arithmetic and DNA sequence analysis.\r\n",  "usage: bedtools [options]\r\n",  "\r\n",  "The bedtools sub-commands include:\r\n",  "\r\n",  "[ Genome arithmetic ]\r\n",  " intersect Find overlapping intervals in various ways.\r\n",  " window Find overlapping intervals within a window around an interval.\r\n",  " closest Find the closest, potentially non-overlapping interval.\r\n",  " coverage Compute the coverage over defined intervals.\r\n",  " map 
Apply a function to a column for each overlapping interval.\r\n",  " genomecov Compute the coverage over an entire genome.\r\n",  " merge Combine overlapping/nearby intervals into a single interval.\r\n",  " cluster Cluster (but don't merge) overlapping/nearby intervals.\r\n",  " complement Extract intervals _not_ represented by an interval file.\r\n",  " subtract Remove intervals based on overlaps b/w two files.\r\n",  " slop Adjust the size of intervals.\r\n",  " flank Create new intervals from the flanks of existing intervals.\r\n",  " sort Order the intervals in a file.\r\n",  " random Generate random intervals in a genome.\r\n",  " shuffle Randomly redistrubute intervals in a genome.\r\n",  " sample Sample random records from file using reservoir sampling.\r\n",  " annotate Annotate coverage of features from multiple files.\r\n",  "\r\n",  "[ Multi-way file comparisons ]\r\n",  " multiinter Identifies common intervals among multiple interval files.\r\n",  " unionbedg Combines coverage intervals from multiple BEDGRAPH files.\r\n",  "\r\n",  "[ Paired-end manipulation ]\r\n",  " pairtobed Find pairs that overlap intervals in various ways.\r\n",  " pairtopair Find pairs that overlap other pairs in various ways.\r\n",  "\r\n",  "[ Format conversion ]\r\n",  " bamtobed Convert BAM alignments to BED (& other) formats.\r\n",  " bedtobam Convert intervals to BAM records.\r\n",  " bamtofastq Convert BAM records to FASTQ records.\r\n",  " bedpetobam Convert BEDPE intervals to BAM records.\r\n",  " bed12tobed6 Breaks BED12 intervals into discrete BED6 intervals.\r\n",  "\r\n",  "[ Fasta manipulation ]\r\n",  " getfasta Use intervals to extract sequences from a FASTA file.\r\n",  " maskfasta Use intervals to mask sequences from a FASTA file.\r\n",  " nuc Profile the nucleotide content of intervals in a FASTA file.\r\n",  "\r\n",  "[ BAM focused tools ]\r\n",  " multicov Counts coverage from multiple BAMs at specific intervals.\r\n",  " tag Tag BAM alignments based on 
overlaps with interval files.\r\n",  "\r\n",  "[ Statistical relationships ]\r\n",  " jaccard Calculate the Jaccard statistic b/w two sets of intervals.\r\n",  " reldist Calculate the distribution of relative distances b/w two files.\r\n",  " fisher Calculate Fisher statistic b/w two feature files.\r\n",  "\r\n",  "[ Miscellaneous tools ]\r\n",  " overlap Computes the amount of overlap from two intervals.\r\n",  " igv Create an IGV snapshot batch script.\r\n",  " links Create a HTML page of links to UCSC locations.\r\n",  " makewindows Make interval \"windows\" across a genome.\r\n",  " groupby Group by common cols. & summarize oth. cols. (~ SQL \"groupBy\")\r\n",  " expand Replicate lines based on lists of values in columns.\r\n",  "\r\n",  "[ General help ]\r\n",  " --help Print this help menu.\r\n",  " --version What version of bedtools are you using?.\r\n",  " --contact Feature requests, bugs, mailing lists, etc.\r\n",  "\r\n"  ]  }  ],  "prompt_number": 14 6  },  {  "cell_type": "code",         

In a direct comparison of methylation within an individual oyster before and after heat shock, approximately 10,000 differentially methylated features were identified. In all oysters, the majority of these features were hypomethylated: for oysters #2, #4, and #6, the number of hypomethylated features was 7224, 6560, and 7645, respectively. To examine regions that were differentially methylated across all three oysters, features were identified where at least 3 adjacent probes were significantly differentially methylated. A total of 362 features were identified as hypomethylated and 164 as hypermethylated.
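The "at least 3 adjacent probes" criterion can be illustrated with a minimal Python sketch. This is not the analysis that produced the counts above. It assumes that "adjacent" means consecutive probes on the same scaffold when sorted by start position, and that each probe already carries a significance flag. The function name `adjacent_runs`, the tuple layout, and the example coordinates are all hypothetical.

```python
from itertools import groupby


def adjacent_runs(probes, min_run=3):
    """Return (scaffold, start, end, n_probes) for each run of at least
    `min_run` consecutive significant probes on the same scaffold.

    Assumption: `probes` is an iterable of
    (scaffold, start, end, is_significant) tuples.
    """
    regions = []
    # Sort so that probes on each scaffold are visited in genomic order.
    ordered = sorted(probes, key=lambda p: (p[0], p[1]))
    for scaffold, group in groupby(ordered, key=lambda p: p[0]):
        # Split each scaffold's probes into stretches that share the same flag.
        for significant, run in groupby(group, key=lambda p: p[3]):
            run = list(run)
            if significant and len(run) >= min_run:
                regions.append((scaffold, run[0][1], run[-1][2], len(run)))
    return regions


# Hypothetical probes, not taken from the data files in the notebook.
example_probes = [
    ("scaffoldA", 100, 150, True),
    ("scaffoldA", 200, 250, True),
    ("scaffoldA", 300, 350, True),
    ("scaffoldA", 400, 450, False),
    ("scaffoldA", 500, 550, True),
    ("scaffoldB", 100, 150, True),
    ("scaffoldB", 200, 250, True),
]

regions = adjacent_runs(example_probes)
print(regions)
```

In this toy input only the first three probes on `scaffoldA` form a qualifying run. The isolated probe at 500 and the pair on `scaffoldB` fall below the threshold of 3.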