Juliaでpermutation

本日のお題。

与えられた配列の全permutationを取得する方法について、pythonとjuliaで書いてみる。
1から書いてもいいのですが、実用性を考えてパッケージを使用した方法をご紹介。

Python編

itertoolsパッケージのproduct関数を使用します。
引数にはiterableなオブジェクト及び、取得する配列の長さを与えます。

>>> itertools.product('AGCT', repeat=2)
<itertools.permutations object at 0x7fecc133ce08>

返り値はgeneratorなので、list()関数によりリスト型に変換、
さらにsorted()関数により辞書式順序にソートします。

>>> sorted(list(itertools.product('AGCT', repeat=2)))
[('A', 'A'), ('A', 'C'), ('A', 'G'), ('A', 'T'), ('C', 'A'), ('C', 'C'), ('C', 'G'), ('C', 'T'), ('G', 'A'), ('G', 'C'), ('G', 'G'), ('G', 'T'), ('T', 'A'), ('T', 'C'), ('T', 'G'), ('T', 'T')]

rosalindの入力ファイルに対応させると、こんな感じ。

#!/usr/bin/env python

##############################################################################
#
# Name: lexf.py
# Date: 2019-03-23
# Author: kimoton
# Description:
# Problem
# Assume that an alphabet A has a predetermined order; that is, we write the
# alphabet as a permutation A=(a1,a2,…,ak), where a1<a2<⋯<ak. For instance, the
# English alphabet is organized as (A,B,…,Z).
#
# Given two strings s and t having the same length n, we say that s precedes
# t in the lexicographic order (and write s<Lext) if the first symbol s[j]
# that doesn't match t[j] satisfies sj<tj in A.
#
# Given: A collection of at most 10 symbols defining an ordered alphabet, and a
# positive integer n (n≤10).
#
# Return: All strings of length n that can be formed from the alphabet, ordered
# lexicographically (use the standard order of symbols in the English
# alphabet).
#
# Sample Dataset
# A C G T
# 2
# Sample Output
# AA
# AC
# AG
# AT
# CA
# CC
# CG
# CT
# GA
# GC
# GG
# GT
# TA
# TC
# TG
# TT
##############################################################################

import sys
import itertools
import operator


def get_lines(_file):
    with open(_file, "r") as rf:
        lines = rf.readlines()
    return lines


def get_all_permutations(components_list, length):
    """
    get all possible permutations by components_list
    """
    args = repeated(components_list, length)
    all_perm = vec(collect(product(args...)))
    sorted_all_perm = sorted(all_perm)
    return sorted_all_perm


def main():
    if len(sys.argv) != 2:
        print('usage {0:s} input_file'.format(sys.argv[0]))
        sys.exit(1)
    input_file = sys.argv[1]
    line1, line2 = get_lines(input_file)
    components_list = line1.split()
    length = int(line2)
    permutations = get_all_permutations(components_list, length)
    for i in permutations:
        print("".join(i))
    sys.exit(0)


if __name__ == '__main__':
    main()

Julia編

重複順列の生成にはBaseパッケージのproduct関数を使用します。
Baseパッケージはデフォルトで入っているのでインストールの必要はありません。

引数の受け取りにはArgParseパッケージを使用します。

julia> Pkg.add("ArgParse")
julia> using ArgParse

ポイント

  • Iterators.repeat関数により配列の複製を行います。
  • 可変長引数関数を使って与える引数の数を可変長にします。
  • vec関数によりArray型→ Vector型への変換を行います。
  • sort関数により辞書式順序にソートします。
julia> components_list = ["A", "T", "G", "C"]
4-element Array{String,1}:
 "A"
 "T"
 "G"
 "C"

julia> len = 2
2

julia>args = Iterators.repeat(components_list, len)
Base.Iterators.Take{Base.Iterators.Repeated{Array{String,1}}}(Base.Iterators.Repeated{Array{String,1}}(["A", "T", "G", "C"]), 2)

julia>all_perm = vec(collect(Iterators.product(args...)))
16-element Array{Tuple{String,String},1}:
 ("A", "A")
 ("T", "A")
 ("G", "A")
 ("C", "A")
 ("A", "T")
 ("T", "T")
 ("G", "T")
 ("C", "T")
 ("A", "G")
 ("T", "G")
 ("G", "G")
 ("C", "G")
 ("A", "C")
 ("T", "C")
 ("G", "C")
 ("C", "C")

julia> sort(vec(collect(Iterators.product(args...))))
16-element Array{Tuple{String,String},1}:
 ("A", "A")
 ("A", "C")
 ("A", "G")
 ("A", "T")
 ("C", "A")
 ("C", "C")
 ("C", "G")
 ("C", "T")
 ("G", "A")
 ("G", "C")
 ("G", "G")
 ("G", "T")
 ("T", "A")
 ("T", "C")
 ("T", "G")
 ("T", "T")

rosalindの入力ファイルに対応させると、こんな感じ。

#!/usr/bin/env julia

#=
#
# Name: lexf.jl
# Date: 2019-03-27
# Author: kimoton
# Description:
# Problem
# Assume that an alphabet A has a predetermined order; that is, we write the
# alphabet as a permutation A=(a1,a2,…,ak), where a1<a2<⋯<ak. For instance, the
# English alphabet is organized as (A,B,…,Z).
#
# Given two strings s and t having the same length n, we say that s precedes
# t in the lexicographic order (and write s<Lext) if the first symbol s[j]
# that doesn't match t[j] satisfies sj<tj in A.
#
# Given: A collection of at most 10 symbols defining an ordered alphabet, and a
# positive integer n (n≤10).
#
# Return: All strings of length n that can be formed from the alphabet, ordered
# lexicographically (use the standard order of symbols in the English
                         # alphabet).
#
# Sample Dataset
# A C G T
# 2
# Sample Output
# AA
# AC
# AG
# AT
# CA
# CC
# CG
# CT
# GA
# GC
# GG
# GT
# TA
# TC
# TG
# TT
=#

using ArgParse
using Base

"""
get all possible permutations by components_list
"""
function get_lines(input_file)
    rf = open(input_file, "r")
    lines = readlines(rf)
    return lines
end


"""
get all possible permutations by components_list
"""
function get_all_permutations(component_list, length)
    args = Iterators.repeat([component_list], length)
    all_perm = vec(collect(Iterators.product(args...)))
    sorted_all_perm = sort(all_perm)
    return sorted_all_perm
end


function main()
    s = ArgParseSettings()
    @add_arg_table s begin
        "input"
        help = "input file"
        required = true
    end
    args = parse_args(ARGS, s)
    input = args["input"]
    lines = get_lines(input)
    component_list = [String(i) for i in split(lines[1], " ")]
    length = parse(Int64, lines[2])
    permutations = get_all_permutations(component_list, length)
    println(join([join(i, "") for i in sort(permutations)], "\n"))
end


if occursin(PROGRAM_FILE, @__FILE__)
    main()
end

実行時間の比較

python

$ time python lexf.py test.txt
AA
AC
AG
AT
CA
CC
CG
CT
GA
GC
GG
GT
TA
TC
TG
TT
0.03user 0.28system 0:00.31elapsed 100%CPU (0avgtext+0avgdata 4960maxresident)k
0inputs+0outputs (0major+11053minor)pagefaults 0swaps

julia

$ time julia lexf.jl test.txt
AA
AC
AG
AT
CA
CC
CG
CT
GA
GC
GG
GT
TA
TC
TG
TT
6.48user 0.93system 0:06.60elapsed 112%CPU (0avgtext+0avgdata 201344maxresident)k
0inputs+0outputs (0major+56893minor)pagefaults 0swaps

比較結果

なんということでしょう。。

Pythonの方が216倍近く早く処理できてしまいました。
多分実装の仕方が悪いんでしょうね。。

Julia慣れせねば。