Commit 855e3a6

v3.5
1 parent 631591f commit 855e3a6

5 files changed: +154 -6 lines changed

JGA_metadata.xlsx

13 Bytes
Binary file not shown.

README.md

Lines changed: 6 additions & 4 deletions
@@ -3,8 +3,8 @@
 ## Japanese
 
 * Bioinformation and DDBJ Center
-* release: 2025-05-19
-* version: v3.4
+* release: 2025-06-23
+* version: v3.5
 
 Tools for generating and checking metadata XML for submission to the databases of the [Bioinformation and DDBJ Center](https://www.ddbj.nig.ac.jp/index-e.html).
 * [DDBJ Sequence Read Archive (DRA)](https://www.ddbj.nig.ac.jp/dra/submission.html): Excel file and scripts for generating and checking Submission, Experiment, Run and Analysis (optional) XML
@@ -13,6 +13,7 @@
 
 ## History
 
+* 2025-06-23: v3.5 Non-ASCII, DRA Experiment Library Layout
 * 2025-05-19: v3.4 Warning when there are 5000 or more objects
 * 2025-02-27: v3.3 Organization bug fix
 * 2025-01-16: v3.2 TEL removed
@@ -331,8 +332,8 @@ TBD
 ## English
 
 * Bioinformation and DDBJ Center
-* release: 2025-05-19
-* version: v3.4
+* release: 2025-06-23
+* version: v3.5
 
 These files are Excel, container images and tools for generation and validation of metadata XML files for databases of [Bioinformation and DDBJ Center](https://www.ddbj.nig.ac.jp/index-e.html).
 * [DDBJ Sequence Read Archive (DRA)](https://www.ddbj.nig.ac.jp/dra/submission-e.html): generate and check Submission, Experiment and Run XML files.
@@ -341,6 +342,7 @@ These files are Excel, container images and tools for generation and validation
 
 ## History
 
+* 2025-06-23: v3.5 Non-ASCII, DRA Experiment Library Layout
 * 2025-05-19: v3.4 Warning to objects > 5000
 * 2025-02-27: v3.3 Organization bug fix
 * 2025-01-16: v3.2 TEL removed

exe/excel2xml_dra

Lines changed: 72 additions & 1 deletion
@@ -20,6 +20,7 @@ require 'date'
 # 2023-12-21 version 2.2 center name changes
 # 2024-07-05 version 3.0 new sheet names with DB prefixes
 # 2025-05-19 version 3.1 Warnings to >5,000 objects
+# 2025-06-23 version 3.2 Non-ASCII, error when layout is missing
 #
 
 # Options
@@ -85,6 +86,30 @@ def clean_number(num)
 
 end
 
+## Check whether every value in a hash or array is ASCII
+def collect_nonascii(value, path = [])
+  case value
+  when String
+    if value.ascii_only?
+      nil
+    else
+      # Extract only the non-ASCII characters
+      nonascii_chars = value.scan(/[^\x00-\x7F]/).uniq
+      { path: path.join("."), value: value, nonascii: nonascii_chars }
+    end
+  when Array
+    value.each_with_index.map do |v, i|
+      collect_nonascii(v, path + ["[#{i}]"])
+    end.compact
+  when Hash
+    value.map do |k, v|
+      collect_nonascii(v, path + [k.to_s])
+    end.compact
+  else
+    nil
+  end
+end
+
 ## Settings
 # XML instruction
 instruction = '<?xml version="1.0" encoding="UTF-8"?>'
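To make the return shape of `collect_nonascii` concrete, here is a minimal standalone sketch. The helper is copied from the hunk so the example runs on its own; the `rows` data is hypothetical and only assumes that sheet rows arrive as an array of arrays of strings.

```ruby
# Copy of the helper added in this commit, so the example is self-contained.
def collect_nonascii(value, path = [])
  case value
  when String
    return nil if value.ascii_only?
    # Keep the offending value plus the distinct non-ASCII characters in it.
    { path: path.join("."), value: value, nonascii: value.scan(/[^\x00-\x7F]/).uniq }
  when Array
    value.each_with_index.map { |v, i| collect_nonascii(v, path + ["[#{i}]"]) }.compact
  when Hash
    value.map { |k, v| collect_nonascii(v, path + [k.to_s]) }.compact
  end
end

# Hypothetical rows (assumption: an array of arrays, one inner array per sheet row).
rows = [
  ["DRX-1", "plain title"],
  ["DRX-2", "caf\u00E9 sample"]
]

# Nested arrays come back nested, so flatten before reporting.
hits = collect_nonascii(rows).flatten.compact
hits.each do |r|
  puts "#{r[:path]}: '#{r[:value]}' (chars: #{r[:nonascii].join})"
end
```

Each hit is a hash with `:path`, `:value` and `:nonascii`; clean rows contribute only empty arrays, which `flatten` removes.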
@@ -279,6 +304,50 @@ puts "\nWARNING: More than 5,000 Experiments. Split your submission to keep the
 puts "\nWARNING: More than 5,000 Runs. Split your submission to keep the number of Runs <= 5,000" if runs_a.size > 5000
 puts "\nWARNING: More than 5,000 Analyses. Split your submission to keep the number of Analyses <= 5,000" if analyses_a.size > 5000
 
+## Non-ASCII
+## Non-ASCII check
+submission_nonascii = collect_nonascii(submission_a).flatten.compact
+if !submission_a.empty? && !submission_nonascii.empty?
+  submission_nonascii.each do |r|
+    puts "Non-ASCII found at Submission: '#{r[:value]}' (chars: #{r[:nonascii].join})"
+  end
+end
+
+experiment_nonascii = collect_nonascii(experiments_a).flatten.compact
+if !experiments_a.empty? && !experiment_nonascii.empty?
+  experiment_nonascii.each do |r|
+    puts "Non-ASCII found at Experiment: '#{r[:value]}' (chars: #{r[:nonascii].join})"
+  end
+end
+
+run_nonascii = collect_nonascii(runs_a).flatten.compact
+if !runs_a.empty? && !run_nonascii.empty?
+  run_nonascii.each do |r|
+    puts "Non-ASCII found at Run: '#{r[:value]}' (chars: #{r[:nonascii].join})"
+  end
+end
+
+analysis_nonascii = collect_nonascii(analyses_a).flatten.compact
+if !analyses_a.empty? && !analysis_nonascii.empty?
+  analysis_nonascii.each do |r|
+    puts "Non-ASCII found at Analysis: '#{r[:value]}' (chars: #{r[:nonascii].join})"
+  end
+end
+
+runfile_nonascii = collect_nonascii(run_files_a).flatten.compact
+if !run_files_a.empty? && !runfile_nonascii.empty?
+  runfile_nonascii.each do |r|
+    puts "Non-ASCII found at Run file: '#{r[:value]}' (chars: #{r[:nonascii].join})"
+  end
+end
+
+analysis_nonascii = collect_nonascii(analyses_a).flatten.compact
+if !analyses_a.empty? && !analysis_nonascii.empty?
+  analysis_nonascii.each do |r|
+    puts "Non-ASCII found at Analysis: '#{r[:value]}' (chars: #{r[:nonascii].join})"
+  end
+end
+
 ## Create XML
 prefix = submission_id + "_"
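The per-type checks in this hunk all follow the same pattern, so one way to keep each label and guard paired with the right collection is to loop over label/collection pairs. This is a sketch, not the script's code: `nonascii_messages` and the `checks` data are hypothetical, and `collect_nonascii` is a compact stand-in for the helper from this commit.

```ruby
# Compact stand-in for the collect_nonascii helper from this commit.
def collect_nonascii(value, path = [])
  case value
  when String
    return nil if value.ascii_only?
    { path: path.join("."), value: value, nonascii: value.scan(/[^\x00-\x7F]/).uniq }
  when Array
    value.each_with_index.map { |v, i| collect_nonascii(v, path + ["[#{i}]"]) }.compact
  when Hash
    value.map { |k, v| collect_nonascii(v, path + [k.to_s]) }.compact
  end
end

# Hypothetical helper: one message per offending value for a labelled object type.
# Wrapping in [ ] lets the same code handle nil, a single hit, or nested arrays.
def nonascii_messages(label, objects)
  [collect_nonascii(objects)].flatten.compact.map do |r|
    "Non-ASCII found at #{label}: '#{r[:value]}' (chars: #{r[:nonascii].join})"
  end
end

# Hypothetical data; in the script these would be submission_a, experiments_a, and so on.
checks = {
  "Submission" => [["DRA submission", "Taro Yamada"]],
  "Experiment" => [["exp-1", "liver \u03B1 cells"]],
  "Run"        => [["run-1", "exp-1"]]
}

messages = checks.flat_map { |label, objects| nonascii_messages(label, objects) }
puts messages
```

Because the label and the collection travel together in one pair, a copied block cannot end up testing one collection while reporting another, and each type is reported exactly once.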

@@ -428,9 +497,11 @@ unless analysis_only
 elsif exp[9] =~ /paired/
   layout.PAIRED
   paired_seq = "paired end sequencing"
-else
+elsif exp[9] =~ /single/
   layout.SINGLE
   paired_seq = "sequencing"
+else
+  raise "Invalid Library Layout: #{exp[0]}"
 end
 } # layout
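The change above replaces a silent fallback to SINGLE with an explicit error for unrecognised layouts. A minimal standalone sketch of the same idea follows. `library_layout` is a hypothetical helper, not part of the script, and it raises `ArgumentError` where the script uses a plain `raise`; the case-sensitive patterns mirror the hunk.

```ruby
# Hypothetical helper mirroring the branch order in the hunk:
# match "paired" first, then "single", otherwise fail loudly.
def library_layout(value, experiment_alias)
  case value.to_s
  when /paired/ then :PAIRED
  when /single/ then :SINGLE
  else
    # Assumption: the alias is enough to locate the offending row.
    raise ArgumentError, "Invalid Library Layout: #{experiment_alias}"
  end
end

puts library_layout("paired", "exp-1")

begin
  library_layout("", "exp-2") # an empty cell no longer falls through to SINGLE
rescue ArgumentError => e
  puts e.message
end
```

Failing on a missing or misspelled value surfaces the problem at conversion time instead of producing XML with a layout the submitter did not choose.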

exe/excel2xml_jga

Lines changed: 76 additions & 1 deletion
@@ -11,6 +11,7 @@ require 'optparse'
 #
 
 # Update history
+# 2025-06-23 non-ascii check
 # 2025-01-16 phone removed
 # 2024-07-05 DB prefix added to sheet name
 # 2024-07-04 Record all sample attributes in the sample attribute elements
@@ -69,6 +70,30 @@ def clean_number(num)
 
 end
 
+## Check whether every value in a hash or array is ASCII
+def collect_nonascii(value, path = [])
+  case value
+  when String
+    if value.ascii_only?
+      nil
+    else
+      # Extract only the non-ASCII characters
+      nonascii_chars = value.scan(/[^\x00-\x7F]/).uniq
+      { path: path.join("."), value: value, nonascii: nonascii_chars }
+    end
+  when Array
+    value.each_with_index.map do |v, i|
+      collect_nonascii(v, path + ["[#{i}]"])
+    end.compact
+  when Hash
+    value.map do |k, v|
+      collect_nonascii(v, path + [k.to_s])
+    end.compact
+  else
+    nil
+  end
+end
+
 ### Settings
 # instruction
 instruction = '<?xml version="1.0" encoding="UTF-8"?>'
@@ -633,6 +658,56 @@ puts "Dataset aliases duplication: #{dataset_aliases_duplicated_a.join(",")}" if
 # objects into an array
 metadata_a = [submission_a, study_a, sample_a, experiment_a, data_a, analysis_a, dataset_a]
 
+## Non-ASCII check
+submission_nonascii = collect_nonascii(submission_h).flatten.compact
+if !submission_h.empty? && !submission_nonascii.empty?
+  submission_nonascii.each do |r|
+    puts "Non-ASCII found at Submission: '#{r[:value]}' (chars: #{r[:nonascii].join})"
+  end
+end
+
+study_nonascii = collect_nonascii(study_h).flatten.compact
+if !study_h.empty? && !study_nonascii.empty?
+  study_nonascii.each do |r|
+    puts "Non-ASCII found at Study: '#{r[:value]}' (chars: #{r[:nonascii].join})"
+  end
+end
+
+dataset_nonascii = collect_nonascii(datasets_a).flatten.compact
+if !datasets_a.empty? && !dataset_nonascii.empty?
+  dataset_nonascii.each do |r|
+    puts "Non-ASCII found at Dataset: '#{r[:value]}' (chars: #{r[:nonascii].join})"
+  end
+end
+
+sample_nonascii = collect_nonascii(samples_a).flatten.compact
+if !samples_a.empty? && !sample_nonascii.empty?
+  sample_nonascii.each do |r|
+    puts "Non-ASCII found at Sample: '#{r[:value]}' (chars: #{r[:nonascii].join})"
+  end
+end
+
+experiment_nonascii = collect_nonascii(experiments_a).flatten.compact
+if !experiments_a.empty? && !experiment_nonascii.empty?
+  experiment_nonascii.each do |r|
+    puts "Non-ASCII found at Experiment: '#{r[:value]}' (chars: #{r[:nonascii].join})"
+  end
+end
+
+data_nonascii = collect_nonascii(datas_a).flatten.compact
+if !datas_a.empty? && !data_nonascii.empty?
+  data_nonascii.each do |r|
+    puts "Non-ASCII found at Data: '#{r[:value]}' (chars: #{r[:nonascii].join})"
+  end
+end
+
+analysis_nonascii = collect_nonascii(analyses_a).flatten.compact
+if !analyses_a.empty? && !analysis_nonascii.empty?
+  analysis_nonascii.each do |r|
+    puts "Non-ASCII found at Analysis: '#{r[:value]}' (chars: #{r[:nonascii].join})"
+  end
+end
+
 ### Create XML
 prefix = submission_id + "_"
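The JGA checks pass hashes such as `submission_h` to `collect_nonascii`, and the helper records a `:path` for each hit that the printed messages do not use. The sketch below shows what that path looks like for a nested hash and how it could be included in a message. The `submission_h` keys here are illustrative assumptions, not the script's real structure, and the helper is a compact stand-in for the one in this commit.

```ruby
# Compact stand-in for the collect_nonascii helper from this commit.
def collect_nonascii(value, path = [])
  case value
  when String
    return nil if value.ascii_only?
    { path: path.join("."), value: value, nonascii: value.scan(/[^\x00-\x7F]/).uniq }
  when Array
    value.each_with_index.map { |v, i| collect_nonascii(v, path + ["[#{i}]"]) }.compact
  when Hash
    value.map { |k, v| collect_nonascii(v, path + [k.to_s]) }.compact
  end
end

# Hypothetical submission hash (keys are illustrative only).
submission_h = {
  "title"    => "Cohort study",
  "contacts" => [{ "name" => "Yamada \u592A\u90CE" }]
}

hits = [collect_nonascii(submission_h)].flatten.compact
hits.each do |r|
  # Including the path tells the submitter which field holds the value.
  puts "Non-ASCII found at Submission (#{r[:path]}): '#{r[:value]}' (chars: #{r[:nonascii].join})"
end
```

Hash keys and array indexes are joined with dots, so the single hit here is located at `contacts.[0].name`.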

@@ -909,7 +984,7 @@ study_f.puts xml_study.STUDY_SET{|study_set|
 # Sample
 sample_f.puts xml_sample.SAMPLE_SET{|sample_set|
 
-for sam in samples_a
+for sam in samples_a
 
 sample_set.SAMPLE("accession" => "", "center_name" => center_name, "alias" => sam[0]){|sample|
 sample.TITLE(sam[2].to_s.strip)

metadata_dra.xlsx

-62.8 KB
Binary file not shown.
