Commit a05d39b

Improve comparison performance with pre-computed priority
Closes: #151 Resolves: #148
1 parent 4a06000 commit a05d39b

12 files changed: +298 -142 lines changed

CHANGELOG.md

+27-24
@@ -1,29 +1,31 @@
11
# Changelog
22

3-
## NEXT / YYYY-MM-DD
4-
5-
- 1 deprecation:
6-
7-
- Deprecated `MIME::Type#priority_compare`. In a future release, this will be
8-
will be renamed to `MIME::Type#<=>`. This method is used in tight loops, so
9-
there is no warning message for either `MIME::Type#priority_compare` or
10-
`MIME::Type#<=>`.
11-
12-
- 1 enhancement:
13-
14-
- Improved the performance of sorting by eliminating the complex comparison
15-
flow from `MIME::Type#priority_compare`. The old version shows under 600
16-
i/s, and the new version shows over 900 i/s. In sorting the full set of MIME
17-
data, there are three differences between the old and new versions; after
18-
comparison, these differences are considered acceptable.
19-
20-
- 1 bug fix:
21-
22-
- Simplified the default compare implementation (`MIME::Type#<=>`) to use the
23-
new `MIME::Type#priority_compare` operation and simplify the fallback to
24-
`String` comparison. This _may_ result in exceptions where there had been
25-
none, as explicit support for several special values (which should have
26-
caused errors in any case) have been removed.
3+
## 3.7.0.pre2 / YYYY-MM-DD
4+
5+
- Deprecated `MIME::Type#priority_compare`. In a future release, this will be
6+
renamed to `MIME::Type#<=>`. This method is used in tight loops, so
7+
there is no warning message for either `MIME::Type#priority_compare` or
8+
`MIME::Type#<=>`.
9+
10+
- Improved the performance of sorting by eliminating the complex comparison flow
11+
from `MIME::Type#priority_compare`. The old version shows under 600 i/s, and
12+
the new version shows over 900 i/s. In sorting the full set of MIME data,
13+
there are three differences between the old and new versions; after
14+
comparison, these differences are considered acceptable.
15+
16+
- Simplified the default compare implementation (`MIME::Type#<=>`) to use the
17+
new `MIME::Type#priority_compare` operation and simplify the fallback to
18+
`String` comparison. This _may_ result in exceptions where there had been
19+
none, as explicit support for several special values (which should have caused
20+
errors in any case) has been removed.
21+
22+
- When sorting the result of `MIME::Types#type_for`, provided a priority boost
23+
if one of the target extensions is the type's preferred extension. This means
24+
that for the case in [#148][issue-148], when getting the type for `foo.webm`,
25+
the type `video/webm` will be returned before the type `audio/webm`, because
26+
`.webm` is the preferred extension for `video/webm` but not `audio/webm`
27+
(which has a preferred extension of `.weba`). Added tests to ensure MIME types
28+
are retrieved in a stable order (which is alphabetical).
2729

2830
## 3.6.2 / 2025-03-25
2931

@@ -375,6 +377,7 @@ there are some validation changes and updated code with formatting.
375377
[issue-127]: https://github.com/mime-types/ruby-mime-types/issues/127
376378
[issue-134]: https://github.com/mime-types/ruby-mime-types/issues/134
377379
[issue-136]: https://github.com/mime-types/ruby-mime-types/issues/136
380+
[issue-148]: https://github.com/mime-types/ruby-mime-types/issues/148
378381
[issue-166]: https://github.com/mime-types/ruby-mime-types/issues/166
379382
[issue-177]: https://github.com/mime-types/ruby-mime-types/issues/177
380383
[mime-types-data]: https://github.com/mime-types/mime-types-data
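
The preferred-extension boost described in the changelog entry for `MIME::Types#type_for` can be illustrated with a small standalone sketch. It does not load the mime-types gem: `Entry`, `boosted_priority`, and `extension_priority_compare` are hypothetical names, and the priority values (15 for a type with one extension, 14 for a type with two) are assumptions chosen to match the bit layout used in `lib/mime/type.rb`, not values read from the real registry.

```ruby
# Hypothetical stand-in for MIME::Type: only the fields the comparison needs.
Entry = Struct.new(:simplified, :preferred_extension, :sort_priority)

# If one of the requested extensions is the entry's preferred extension and
# the top extension bit (bit 3) is set, clear it and set the low three bits,
# so the low nibble becomes 0b0111.
def boosted_priority(entry, exts)
  priority = entry.sort_priority
  if exts.include?(entry.preferred_extension) && priority & 0b1000 != 0
    priority = priority & 0b11110111 | 0b0111
  end
  priority
end

# Compare by boosted priority first, then alphabetically by simplified name.
def extension_priority_compare(a, b, exts)
  cmp = boosted_priority(a, exts) <=> boosted_priority(b, exts)
  cmp.zero? ? a.simplified <=> b.simplified : cmp
end

# Assumed data: video/webm has one extension, audio/webm has two.
video = Entry.new("video/webm", "webm", 0b00001111)
audio = Entry.new("audio/webm", "weba", 0b00001110)

plain = [video, audio]
  .sort_by { |e| [e.sort_priority, e.simplified] }
  .map(&:simplified)
boosted = [video, audio]
  .sort { |a, b| extension_priority_compare(a, b, ["webm"]) }
  .map(&:simplified)

p plain   # => ["audio/webm", "video/webm"]
p boosted # => ["video/webm", "audio/webm"]
```

Under these assumed values, the unboosted order puts `audio/webm` first because it has more extensions; the boost lowers the priority value of `video/webm` to 7, which moves it ahead when looking up `.webm`.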

Rakefile

+10-1
@@ -2,6 +2,7 @@ require "rubygems"
22
require "hoe"
33
require "rake/clean"
44
require "minitest"
5+
require "minitest/test_task"
56

67
Hoe.plugin :halostatue
78
Hoe.plugin :rubygems
@@ -10,6 +11,7 @@ Hoe.plugins.delete :debug
1011
Hoe.plugins.delete :newb
1112
Hoe.plugins.delete :publish
1213
Hoe.plugins.delete :signing
14+
Hoe.plugins.delete :test
1315

1416
spec = Hoe.spec "mime-types" do
1517
developer("Austin Ziegler", "[email protected]")
@@ -24,7 +26,7 @@ spec = Hoe.spec "mime-types" do
2426
val.merge!({"rubygems_mfa_required" => "true"})
2527
}
2628

27-
extra_deps << ["mime-types-data", "~> 3.2015"]
29+
extra_deps << ["mime-types-data", "~> 3.2025", ">= 3.2025.0506.pre2"]
2830
extra_deps << ["logger", ">= 0"]
2931

3032
extra_dev_deps << ["hoe", "~> 4.0"]
@@ -65,6 +67,8 @@ Minitest::TestTask.create :coverage do |t|
6567
RUBY
6668
end
6769

70+
task default: :test
71+
6872
namespace :benchmark do
6973
task :support do
7074
%w[lib support].each { |path|
@@ -174,6 +178,11 @@ namespace :convert do
174178
task docs: "convert:docs:run"
175179
end
176180

181+
task :version do
182+
require "mime/types/version"
183+
puts MIME::Types::VERSION
184+
end
185+
177186
namespace :deps do
178187
task :top, [:number] => "benchmark:support" do |_, args|
179188
require "deps"

lib/mime/type.rb

+114-52
@@ -133,7 +133,8 @@ def to_s
133133
def initialize(content_type) # :yields: self
134134
@friendly = {}
135135
@obsolete = @registered = @provisional = false
136-
@preferred_extension = @docs = @use_instead = nil
136+
@preferred_extension = @docs = @use_instead = @__sort_priority = nil
137+
137138
self.extensions = []
138139

139140
case content_type
@@ -164,6 +165,8 @@ def initialize(content_type) # :yields: self
164165
self.xrefs ||= {}
165166

166167
yield self if block_given?
168+
169+
update_sort_priority
167170
end
168171

169172
# Indicates that a MIME type is like another type. This differs from
@@ -182,60 +185,54 @@ def like?(other)
182185
# simplified type (the simplified type will be used if comparing against
183186
# something that can be treated as a String with #to_s). In comparisons, this
184187
# is done against the lowercase version of the MIME::Type.
188+
#
189+
# Note that this implementation of #<=> is deprecated and will be changed
190+
# in the next major version to be the same as #priority_compare.
191+
#
192+
# Note that MIME::Types no longer compare against nil.
185193
def <=>(other)
186-
if other.nil?
187-
-1
188-
elsif other.respond_to?(:simplified)
194+
return priority_compare(other) if other.is_a?(MIME::Type)
195+
simplified <=> other
196+
end
197+
198+
# Compares the +other+ MIME::Type using a pre-computed sort priority value,
199+
# then the simplified representation for an alphabetical sort.
200+
#
201+
# For the next major version of MIME::Types, this method will become #<=> and
202+
# #priority_compare will be removed.
203+
def priority_compare(other)
204+
if (cmp = __sort_priority <=> other.__sort_priority) == 0
189205
simplified <=> other.simplified
190206
else
191-
filtered = "silent" if other == :silent
192-
filtered ||= "true" if other == true
193-
filtered ||= other.to_s
194-
195-
simplified <=> MIME::Type.simplified(filtered)
207+
cmp
196208
end
197209
end
198210

199-
# Compares the +other+ MIME::Type based on how reliable it is before doing a
200-
# normal <=> comparison. Used by MIME::Types#[] to sort types. The
201-
# comparisons involved are:
202-
#
203-
# 1. self.simplified <=> other.simplified (ensures that we
204-
# do not try to compare different types)
205-
# 2. IANA-registered definitions < other definitions.
206-
# 3. Complete definitions < incomplete definitions.
207-
# 4. Current definitions < obsolete definitions.
208-
# 5. Obselete with use-instead names < obsolete without.
209-
# 6. Obsolete use-instead definitions are compared.
211+
# Uses a modified pre-computed sort priority value based on whether one of the provided
212+
# extensions is the preferred extension for a type.
210213
#
211-
# While this method is public, its use is strongly discouraged by consumers
212-
# of mime-types. In mime-types 3, this method is likely to see substantial
213-
# revision and simplification to ensure current registered content types sort
214-
# before unregistered or obsolete content types.
215-
def priority_compare(other)
216-
pc = simplified <=> other.simplified
217-
if pc.zero? || !(extensions & other.extensions).empty?
218-
pc =
219-
if (reg = registered?) != other.registered?
220-
reg ? -1 : 1 # registered < unregistered
221-
elsif (comp = complete?) != other.complete?
222-
comp ? -1 : 1 # complete < incomplete
223-
elsif (obs = obsolete?) != other.obsolete?
224-
obs ? 1 : -1 # current < obsolete
225-
elsif obs && ((ui = use_instead) != (oui = other.use_instead))
226-
if ui.nil?
227-
1
228-
elsif oui.nil?
229-
-1
230-
else
231-
ui <=> oui
232-
end
233-
else
234-
0
235-
end
214+
# This is an internal function. If an extension provided is a preferred extension either
215+
# for this instance or the compared instance, the corresponding type has its top
216+
# _extension_ bit cleared from its sort priority. That means that a type with between
217+
# 1 and 8 extensions will be treated as if it had 9 extensions.
218+
def __extension_priority_compare(other, exts) # :nodoc:
219+
tsp = __sort_priority
220+
221+
if exts.include?(preferred_extension) && tsp & 0b1000 != 0
222+
tsp = tsp & 0b11110111 | 0b0111
223+
end
224+
225+
osp = other.__sort_priority
226+
227+
if exts.include?(other.preferred_extension) && osp & 0b1000 != 0
228+
osp = osp & 0b11110111 | 0b0111
236229
end
237230

238-
pc
231+
if (cmp = tsp <=> osp) == 0
232+
simplified <=> other.simplified
233+
else
234+
cmp
235+
end
239236
end
240237

241238
# Returns +true+ if the +other+ object is a MIME::Type and the content types
@@ -270,6 +267,13 @@ def hash
270267
simplified.hash
271268
end
272269

270+
# The computed sort priority value. This is _not_ intended to be used by most
271+
# callers.
272+
def __sort_priority # :nodoc:
273+
update_sort_priority if !instance_variable_defined?(:@__sort_priority) || @__sort_priority.nil?
274+
@__sort_priority
275+
end
276+
273277
# Returns the whole MIME content-type string.
274278
#
275279
# The content type is a presentation value from the MIME type registry and
@@ -324,6 +328,7 @@ def extensions
324328

325329
##
326330
def extensions=(value) # :nodoc:
331+
clear_sort_priority
327332
@extensions = Set[*Array(value).flatten.compact].freeze
328333
MIME::Types.send(:reindex_extensions, self)
329334
end
@@ -350,9 +355,7 @@ def preferred_extension
350355

351356
##
352357
def preferred_extension=(value) # :nodoc:
353-
if value
354-
add_extensions(value)
355-
end
358+
add_extensions(value) if value
356359
@preferred_extension = value
357360
end
358361

@@ -405,9 +408,17 @@ def use_instead
405408
attr_writer :use_instead
406409

407410
# Returns +true+ if the media type is obsolete.
408-
attr_accessor :obsolete
411+
#
412+
# :attr_accessor: obsolete
413+
attr_reader :obsolete
409414
alias_method :obsolete?, :obsolete
410415

416+
##
417+
def obsolete=(value)
418+
clear_sort_priority
419+
@obsolete = !!value
420+
end
421+
411422
# The documentation for this MIME::Type.
412423
attr_accessor :docs
413424

@@ -465,11 +476,27 @@ def xref_urls
465476
end
466477

467478
# Indicates whether the MIME type has been registered with IANA.
468-
attr_accessor :registered
479+
#
480+
# :attr_accessor: registered
481+
attr_reader :registered
469482
alias_method :registered?, :registered
470483

484+
##
485+
def registered=(value)
486+
clear_sort_priority
487+
@registered = !!value
488+
end
489+
471490
# Indicates whether the MIME type's registration with IANA is provisional.
472-
attr_accessor :provisional
491+
#
492+
# :attr_accessor: provisional
493+
attr_reader :provisional
494+
495+
##
496+
def provisional=(value)
497+
clear_sort_priority
498+
@provisional = !!value
499+
end
473500

474501
# Indicates whether the MIME type's registration with IANA is provisional.
475502
def provisional?
@@ -552,6 +579,7 @@ def encode_with(coder)
552579
coder["registered"] = registered?
553580
coder["provisional"] = provisional? if provisional?
554581
coder["signature"] = signature? if signature?
582+
coder["sort-priority"] = __sort_priority || 0b11111111
555583
coder
556584
end
557585

@@ -560,6 +588,7 @@ def encode_with(coder)
560588
#
561589
# This method should be considered a private implementation detail.
562590
def init_with(coder)
591+
@__sort_priority = 0
563592
self.content_type = coder["content-type"]
564593
self.docs = coder["docs"] || ""
565594
self.encoding = coder["encoding"]
@@ -573,6 +602,8 @@ def init_with(coder)
573602
self.use_instead = coder["use-instead"]
574603

575604
friendly(coder["friendly"] || {})
605+
606+
update_sort_priority
576607
end
577608

578609
def inspect # :nodoc:
@@ -628,6 +659,37 @@ def simplify_matchdata(matchdata, remove_x = false, joiner: "/")
628659

629660
private
630661

662+
def clear_sort_priority
663+
@__sort_priority = nil
664+
end
665+
666+
# Update the __sort_priority value. Lower numbers sort better, so the
667+
# bitmapping may seem a little odd. The _best_ sort priority is 0.
668+
#
669+
# | bit | meaning | details |
670+
# | --- | --------------- | --------- |
671+
# | 7 | obsolete | 1 if true |
672+
# | 6 | provisional | 1 if true |
673+
# | 5 | registered | 0 if true |
674+
# | 4 | complete | 0 if true |
675+
# | 3 | # of extensions | see below |
676+
# | 2 | # of extensions | see below |
677+
# | 1 | # of extensions | see below |
678+
# | 0 | # of extensions | see below |
679+
#
680+
# The # of extensions is marked as the number of extensions subtracted from
681+
# 16, to a minimum of 0.
682+
def update_sort_priority
683+
extension_count = @extensions.length
684+
obsolete = (instance_variable_defined?(:@obsolete) && @obsolete) ? 1 << 7 : 0
685+
provisional = (instance_variable_defined?(:@provisional) && @provisional) ? 1 << 6 : 0
686+
registered = (instance_variable_defined?(:@registered) && @registered) ? 0 : 1 << 5
687+
complete = extension_count.nonzero? ? 0 : 1 << 4
688+
extension_count = [0, 16 - extension_count].max
689+
690+
@__sort_priority = obsolete | registered | provisional | complete | extension_count
691+
end
692+
631693
def content_type=(type_string)
632694
match = MEDIA_TYPE_RE.match(type_string)
633695
fail InvalidContentType, type_string if match.nil?
