Data handled in bioinformatics often has a large number of columns.
For example, here is what a public 10X Genomics dataset (BAM) looks like when viewed with samtools.
kimoton@DESKTOP-BL78EM7:~$ samtools view http://s3-us-west-2.amazonaws.com/10x.files/samples/cell-exp/2.1.0/pbmc8k/pbmc8k_possorted_genome_bam.bam | head -1
[knet_seek] SEEK_END is not supported for HTTP. Offset is unchanged.
[bam_header_read] EOF marker is absent. The input is probably truncated.
ST-K00126:314:HFYL2BBXX:7:2103:14996:4725 272 1 10001 1 3S95M * 0 0 CCGTAGCCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACC <)-7-)7<JF7<FA7---<<A-FFFF-A<<AA----FFAJJJAAAJJAFJJJF<JJJA---FFA-AFFFFJ<FJF-FFFA7FFFJ7JFFAJ7FAFF<A NH:i:3 HI:i:3 AS:i:91 nM:i:1 RE:A:I BC:Z:TCGTCACG QT:Z:AAFFFJJJ CR:Z:TTAACTCGTAGAAGGA CY:Z:AAFFFJJJJJJJJJJJ CB:Z:TTAACTCGTAGAAGGA-1 UR:Z:GTCCGGCGAC UY:Z:JJJJJJJJJJ UB:Z:GTCCGGCGAC RG:Z:pbmc8k:MissingLibrary:1:HFYL2BBXX:7
The record is one very wide line, which makes it hard to read and hard to tell what the data contains.
In cases like this, pipe the output into datamash transpose.
kimoton@DESKTOP-BL78EM7:~$ samtools view http://s3-us-west-2.amazonaws.com/10x.files/samples/cell-exp/2.1.0/pbmc8k/pbmc8k_possorted_genome_bam.bam | head -1 | datamash transpose
[knet_seek] SEEK_END is not supported for HTTP. Offset is unchanged.
[bam_header_read] EOF marker is absent. The input is probably truncated.
ST-K00126:314:HFYL2BBXX:7:2103:14996:4725
272
1
10001
1
3S95M
*
0
0
CCGTAGCCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACC
<)-7-)7<JF7<FA7---<<A-FFFF-A<<AA----FFAJJJAAAJJAFJJJF<JJJA---FFA-AFFFFJ<FJF-FFFA7FFFJ7JFFAJ7FAFF<A
NH:i:3
HI:i:3
AS:i:91
nM:i:1
RE:A:I
BC:Z:TCGTCACG
QT:Z:AAFFFJJJ
CR:Z:TTAACTCGTAGAAGGA
CY:Z:AAFFFJJJJJJJJJJJ
CB:Z:TTAACTCGTAGAAGGA-1
UR:Z:GTCCGGCGAC
UY:Z:JJJJJJJJJJ
UB:Z:GTCCGGCGAC
RG:Z:pbmc8k:MissingLibrary:1:HFYL2BBXX:7
All it does is transpose the record, but the result is much easier to read.
This is the same operation as t() in R. The difference is that it is a Linux command, so it can be chained into a pipeline, which is very convenient.
The fields can also be listed in reverse order.
kimoton@DESKTOP-BL78EM7:~$ samtools view http://s3-us-west-2.amazonaws.com/10x.files/samples/cell-exp/2.1.0/pbmc8k/pbmc8k_possorted_genome_bam.bam | head -1 | datamash reverse | datamash transpose
[knet_seek] SEEK_END is not supported for HTTP. Offset is unchanged.
[bam_header_read] EOF marker is absent. The input is probably truncated.
RG:Z:pbmc8k:MissingLibrary:1:HFYL2BBXX:7
UB:Z:GTCCGGCGAC
UY:Z:JJJJJJJJJJ
UR:Z:GTCCGGCGAC
CB:Z:TTAACTCGTAGAAGGA-1
CY:Z:AAFFFJJJJJJJJJJJ
CR:Z:TTAACTCGTAGAAGGA
QT:Z:AAFFFJJJ
BC:Z:TCGTCACG
RE:A:I
nM:i:1
AS:i:91
HI:i:3
NH:i:3
<)-7-)7<JF7<FA7---<<A-FFFF-A<<AA----FFAJJJAAAJJAFJJJF<JJJA---FFA-AFFFFJ<FJF-FFFA7FFFJ7JFFAJ7FAFF<A
CCGTAGCCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACC
0
0
*
3S95M
1
10001
1
272
ST-K00126:314:HFYL2BBXX:7:2103:14996:4725
Installation
Debian/Ubuntu
sudo apt-get install datamash
RHEL-based distributions
wget http://files.housegordon.org/datamash/bin/datamash-1.0.6-1.el6.x86_64.rpm
sudo rpm -i datamash-1.0.6-1.el6.x86_64.rpm
macOS (Homebrew)
brew install datamash