本日のお題。
与えられた配列の全permutationを取得する方法について、pythonとjuliaで書いてみる。
1から書いてもいいのですが、実用性を考えてパッケージを使用した方法をご紹介。
Python編
itertoolsパッケージのproduct関数を使用します。
引数にはiterableなオブジェクト及び、取得する配列の長さを与えます。
>>> itertools.product('AGCT', repeat=2) <itertools.permutations object at 0x7fecc133ce08>
返り値はgeneratorなので、list()
関数によりリスト型に変換、
さらにsorted()
関数により辞書式順序にソートします。
>>> sorted(list(itertools.product('AGCT', repeat=2))) [('A', 'A'), ('A', 'C'), ('A', 'G'), ('A', 'T'), ('C', 'A'), ('C', 'C'), ('C', 'G'), ('C', 'T'), ('G', 'A'), ('G', 'C'), ('G', 'G'), ('G', 'T'), ('T', 'A'), ('T', 'C'), ('T', 'G'), ('T', 'T')]
rosalindの入力ファイルに対応させると、こんな感じ。
#!/usr/bin/env python ############################################################################## # # Name: lexf.py # Date: 2019-03-23 # Author: kimoton # Description: # Problem # Assume that an alphabet A has a predetermined order; that is, we write the # alphabet as a permutation A=(a1,a2,…,ak), where a1<a2<⋯<ak. For instance, the # English alphabet is organized as (A,B,…,Z). # # Given two strings s and t having the same length n, we say that s precedes # t in the lexicographic order (and write s<Lext) if the first symbol s[j] # that doesn't match t[j] satisfies sj<tj in A. # # Given: A collection of at most 10 symbols defining an ordered alphabet, and a # positive integer n (n≤10). # # Return: All strings of length n that can be formed from the alphabet, ordered # lexicographically (use the standard order of symbols in the English # alphabet). # # Sample Dataset # A C G T # 2 # Sample Output # AA # AC # AG # AT # CA # CC # CG # CT # GA # GC # GG # GT # TA # TC # TG # TT ############################################################################## import sys import itertools import operator def get_lines(_file): with open(_file, "r") as rf: lines = rf.readlines() return lines def get_all_permutations(components_list, length): """ get all possible permutations by components_list """ args = repeated(components_list, length) all_perm = vec(collect(product(args...))) sorted_all_perm = sorted(all_perm) return sorted_all_perm def main(): if len(sys.argv) != 2: print('usage {0:s} input_file'.format(sys.argv[0])) sys.exit(1) input_file = sys.argv[1] line1, line2 = get_lines(input_file) components_list = line1.split() length = int(line2) permutations = get_all_permutations(components_list, length) for i in permutations: print("".join(i)) sys.exit(0) if __name__ == '__main__': main()
Julia編
重複順列の生成にはBaseパッケージのproduct関数を使用します。
Baseパッケージはデフォルトで入っているのでインストールの必要はありません。
引数の受け取りにはArgParseパッケージを使用します。
julia> Pkg.add("ArgParse") julia> using ArgParse
ポイント
Iterators.repeat
関数により配列の複製を行います。- 可変長引数関数を使って与える引数の数を可変長にします。
vec
関数によりArray型→ Vector型への変換を行います。sort
関数により辞書式順序にソートします。
julia> components_list = ["A", "T", "G", "C"] 4-element Array{String,1}: "A" "T" "G" "C" julia> len = 2 2 julia>args = Iterators.repeat(components_list, len) Base.Iterators.Take{Base.Iterators.Repeated{Array{String,1}}}(Base.Iterators.Repeated{Array{String,1}}(["A", "T", "G", "C"]), 2) julia>all_perm = vec(collect(Iterators.product(args...))) 16-element Array{Tuple{String,String},1}: ("A", "A") ("T", "A") ("G", "A") ("C", "A") ("A", "T") ("T", "T") ("G", "T") ("C", "T") ("A", "G") ("T", "G") ("G", "G") ("C", "G") ("A", "C") ("T", "C") ("G", "C") ("C", "C") julia> sort(vec(collect(Iterators.product(args...)))) 16-element Array{Tuple{String,String},1}: ("A", "A") ("A", "C") ("A", "G") ("A", "T") ("C", "A") ("C", "C") ("C", "G") ("C", "T") ("G", "A") ("G", "C") ("G", "G") ("G", "T") ("T", "A") ("T", "C") ("T", "G") ("T", "T")
rosalindの入力ファイルに対応させると、こんな感じ。
#!/usr/bin/env julia #= # # Name: lexf.jl # Date: 2019-03-27 # Author: kimoton # Description: # Problem # Assume that an alphabet A has a predetermined order; that is, we write the # alphabet as a permutation A=(a1,a2,…,ak), where a1<a2<⋯<ak. For instance, the # English alphabet is organized as (A,B,…,Z). # # Given two strings s and t having the same length n, we say that s precedes # t in the lexicographic order (and write s<Lext) if the first symbol s[j] # that doesn't match t[j] satisfies sj<tj in A. # # Given: A collection of at most 10 symbols defining an ordered alphabet, and a # positive integer n (n≤10). # # Return: All strings of length n that can be formed from the alphabet, ordered # lexicographically (use the standard order of symbols in the English # alphabet). # # Sample Dataset # A C G T # 2 # Sample Output # AA # AC # AG # AT # CA # CC # CG # CT # GA # GC # GG # GT # TA # TC # TG # TT =# using ArgParse using Base """ get all possible permutations by components_list """ function get_lines(input_file) rf = open(input_file, "r") lines = readlines(rf) return lines end """ get all possible permutations by components_list """ function get_all_permutations(component_list, length) args = Iterators.repeat([component_list], length) all_perm = vec(collect(Iterators.product(args...))) sorted_all_perm = sort(all_perm) return sorted_all_perm end function main() s = ArgParseSettings() @add_arg_table s begin "input" help = "input file" required = true end args = parse_args(ARGS, s) input = args["input"] lines = get_lines(input) component_list = [String(i) for i in split(lines[1], " ")] length = parse(Int64, lines[2]) permutations = get_all_permutations(component_list, length) println(join([join(i, "") for i in sort(permutations)], "\n")) end if occursin(PROGRAM_FILE, @__FILE__) main() end
実行時間の比較
python
$ time python lexf.py test.txt AA AC AG AT CA CC CG CT GA GC GG GT TA TC TG TT 0.03user 0.28system 0:00.31elapsed 100%CPU (0avgtext+0avgdata 4960maxresident)k 0inputs+0outputs (0major+11053minor)pagefaults 0swaps
julia
$ time julia lexf.jl test.txt AA AC AG AT CA CC CG CT GA GC GG GT TA TC TG TT 6.48user 0.93system 0:06.60elapsed 112%CPU (0avgtext+0avgdata 201344maxresident)k 0inputs+0outputs (0major+56893minor)pagefaults 0swaps
比較結果
なんということでしょう。。
Pythonの方が216倍近く早く処理できてしまいました。
多分実装の仕方が悪いんでしょうね。。
Julia慣れせねば。