Releases: philss/floki
v0.38.1
Performance
This version contains major performance improvements in the following functions:
- Floki.filter_out/2.
- Floki.find/2 - with some improvements to specific selectors, like classes
  and attribute selectors.
- Floki.text/2.
Those functions are not only faster, but are now using less memory. Please check
the PRs related to this release if you want to better understand the numbers.
- Speed up do_classes_matches? - #649
- Make filter_out faster - #650
- Speed up attribute lookup - #651
- Enable tuple traversal optimization for multiple selectors - #652
- Optimize HTMLTree.to_tuple conversion using Enum.reduce - #657
- Optimize Finder.get_descendant_ids/2 memory usage and speed - #660
- Optimize Finder.get_siblings/2 memory usage and speed - #663
- Optimize FlatText.get/3 memory usage and speed - #664
- Optimize class matching in Floki.Selector - #665
All the improvements in this version were made by Barna Kovacs - @preciz,
so shout out and thanks to him!
Pull requests
- Fix build status badge by @sgerrand in #632
- Update v0.38.0 changelog item for breaking changes by @kianmeng in #634
- Bump earmark from 1.4.47 to 1.4.48 by @dependabot[bot] in #635
- Speed up do_classes_matches? by @preciz in #649
- Make filter_out faster by @preciz in #650
- Speed up attribute lookup by @preciz in #651
- Enable tuple traversal optimization for multiple selectors by @preciz in #652
- Optimize HTMLTree to_tuple conversion using Enum.reduce by @preciz in #657
- Optimize Finder.get_descendant_ids/2 memory usage and speed by @preciz in #660
- Fix unused require Logger warning in parser_test.exs by @preciz in #661
- Optimize Finder.get_siblings/2 memory usage and speed by @preciz in #663
- Optimize FlatText.get/3 memory usage and speed by @preciz in #664
- Optimize class matching in Floki.Selector by @preciz in #665
- Release v0.38.1 by @philss in #662
New Contributors
Full Changelog: v0.38.0...v0.38.1
v0.38.0
Added
- This version adds initial support for the :has pseudo-selector.
  It is a great addition that enables finding elements containing
  matching children.

  Examples for selectors:
  - "div:has(h1)"
  - "div:has(h1, p, span)"
  - "div:has(p.foo)"
  - "div:has(img[src='https://example.com'])"
  - "tr:has(*:fl-contains('TEST'))"

  Note that combinators like ">" are not allowed yet.

  Thank you @bvobart for this feature!
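As a rough illustration of the :has pseudo-selector described above, the sketch below runs two of the listed selector shapes against a small fragment. The markup and variable names are made up for this example, and the second query assumes that a selector list inside :has matches when any of its inner selectors matches, as in standard CSS.

```elixir
# Standalone script: run with `elixir has_example.exs` outside a Mix project.
Mix.install([{:floki, "~> 0.38"}])

# Hypothetical markup, made up for this illustration.
html = """
<div id="a"><h1>Title</h1></div>
<div id="b"><p class="foo">Text</p></div>
<div id="c"><span>Other</span></div>
"""

{:ok, doc} = Floki.parse_fragment(html)

# Keep only the divs that contain an h1, then read their ids.
with_h1 = doc |> Floki.find("div:has(h1)") |> Floki.attribute("id")

# Assumption: a selector list inside :has matches if any inner selector matches.
with_h1_or_foo = doc |> Floki.find("div:has(h1, p.foo)") |> Floki.attribute("id")

IO.inspect(with_h1, label: "div:has(h1)")
IO.inspect(with_h1_or_foo, label: "div:has(h1, p.foo)")
```

Because combinators such as ">" are not supported inside :has yet, this sketch sticks to plain descendant matching.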
Fixed
- Add :style option documentation to Floki.text/2.
  Thanks @s3cur3 for the fix.
- Fix deprecation warnings for upcoming Elixir 1.19.
- Prevent a crash when the selector is an empty string.
Removed
- Remove support for Elixir 1.14 and OTP 23.
- Remove deprecated functions and function clauses
  that were accepting strings (binaries).

  Affected functions:
  - parse/1 - removed function
  - map/2 - removed function
  - attr/4 - removed clause
  - find/2 - removed clause
  - text/3 - removed clause
  - text/3 - removed clause
  - attribute/2 - removed clause
  - filter_out/2 - removed clause

  HTML must be parsed before searching. Functions like Floki.find/2,
  Floki.attribute/2, and other HTML manipulation functions no longer work
  directly with HTML strings. The HTML must be parsed first using
  Floki.parse_fragment/2 or Floki.parse_document/2.

  Before:

      html = "<div class='foobar'><p>Hello</p></div>"
      Floki.find(html, "p")
      Floki.attribute(html, "div", "class")

  After:

      html = "<div class='foobar'><p>Hello</p></div>"
      parsed_html = Floki.parse_fragment!(html)
      Floki.find(parsed_html, "p")
      Floki.attribute(parsed_html, "div", "class")
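When migrating, the non-raising Floki.parse_fragment/2 can be used instead of the bang variant. The sketch below is one possible way to handle its tagged-tuple result before querying. The result map and its keys are hypothetical names chosen for this example, not part of Floki.

```elixir
# Standalone script: run with `elixir migrate_example.exs` outside a Mix project.
Mix.install([{:floki, "~> 0.38"}])

html = "<div class='foobar'><p>Hello</p></div>"

# parse_fragment/2 returns {:ok, tree} or {:error, reason} instead of raising.
result =
  case Floki.parse_fragment(html) do
    {:ok, tree} ->
      # The parsed tree, not the raw string, is what the query functions accept.
      %{
        paragraphs: Floki.find(tree, "p"),
        classes: Floki.attribute(tree, "div", "class"),
        text: Floki.text(tree)
      }

    {:error, reason} ->
      {:error, reason}
  end

IO.inspect(result)
```

Parsing once and reusing the tree also avoids re-parsing the same HTML for every query.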
Pull requests
- Implement parsing rules for the :has pseudo class selector by @philss in #623
- feat: implement :has pseudo-selector functionality by @bvobart in #624
- Bump ex_doc from 0.37.3 to 0.38.2 by @dependabot in #625
- Bump credo from 1.7.11 to 1.7.12 by @dependabot in #619
- Bump benchee from 1.3.1 to 1.4.0 by @dependabot in #618
- Add :style flag to text/2 docs by @s3cur3 in #627
- Remove support for Elixir 1.14 by @philss in #626
- Remove deprecations by @philss in #628
- Remove deprecation warnings for the upcoming Elixir 1.19 by @philss in #630
- Prevent find/2 from crashing with empty selector by @philss in #631
- Prepare to release v0.38 by @philss in #629
New Contributors
Full Changelog: v0.37.1...v0.38.0
v0.37.1
Fixed
Move the regex declaration from a module attribute to inside the function. This fix keeps Floki compatible with the upcoming OTP 28.
Pull requests
- Add Elixir 1.18 to the CI workflow by @philss in #607
- Bump ex_doc from 0.35.1 to 0.36.1 by @dependabot in #606
- Bump ex_doc from 0.36.1 to 0.37.1 by @dependabot in #611
- Fix versions we describe in README.md by @philss in #616
- Bump credo from 1.7.10 to 1.7.11 by @dependabot in #608
- Bump ex_doc from 0.37.1 to 0.37.3 by @dependabot in #615
- Bump fast_html from 2.4.0 to 2.4.1 by @dependabot in #609
Full Changelog: v0.37.0...v0.37.1
v0.37.0
Added
- Add Floki.css_escape/1 - thanks @SteffenDE.
Fixed
- Fix bug propagating identity encoder in raw_html/2 - thanks @andyleclair.
Removed
- Remove support for Elixir 1.13 and OTP 22.
Pull requests
- Drop support for Elixir 1.13 by @philss in #595
- Bump credo from 1.7.8 to 1.7.9 by @dependabot in #596
- Bump credo from 1.7.9 to 1.7.10 by @dependabot in #597
- Bump fast_html from 2.3.0 to 2.4.0 by @dependabot in #599
- Bump dialyxir from 1.4.4 to 1.4.5 by @dependabot in #600
- Bump ex_doc from 0.34.2 to 0.35.1 by @dependabot in #602
- Fix bug propagating identity encoder in raw_html/2 by @andyleclair in #603
New Contributors
- @andyleclair made their first contribution in #603
Full Changelog: v0.36.3...v0.37.0
v0.36.3
This release contains some performance improvements, thanks to @ypconstante.
Fixed
- Stop Floki.get_by_id/2 traversal on first match. Thanks @ypconstante.
- Remove extra whitespaces from nodes without attributes on Floki.raw_html/1.
  Thank you @ypconstante.
- Fix Floki.raw_html/1 typespecs. Thanks @davydog187.
Pull requests
- chore: fix some typos by @tianzedavid in #564
- Bump ex_doc from 0.32.1 to 0.32.2 by @dependabot in #566
- Bump credo from 1.7.5 to 1.7.6 by @dependabot in #565
- Bump html5ever from 0.15.0 to 0.16.1 by @dependabot in #567
- Find without html tree for some pseudo classes by @ypconstante in #568
- Remove stack usage on Finder by @ypconstante in #569
- Remove extra whitespace on node without attributes by @ypconstante in #571
- Stop get_by_id traversal on first match by @ypconstante in #572
- Bump ex_doc from 0.32.2 to 0.33.0 by @dependabot in #570
- Bump ex_doc from 0.33.0 to 0.34.0 by @dependabot in #573
- Bump benchee from 1.3.0 to 1.3.1 by @dependabot in #574
- Bump credo from 1.7.6 to 1.7.7 by @dependabot in #575
- Add OTP 27 and Elixir 1.17 to the CI matrix by @philss in #576
- Bump ex_doc from 0.34.0 to 0.34.1 by @dependabot in #577
- Bump jason from 1.4.1 to 1.4.3 by @dependabot in #579
- Optimize traverse_and_update without accumulator by @ypconstante in #584
- Bump ex_doc from 0.34.1 to 0.34.2 by @dependabot in #581
- Bump jason from 1.4.3 to 1.4.4 by @dependabot in #585
- Bump earmark from 1.4.46 to 1.4.47 by @dependabot in #582
- Copy and paste the benchmark commands by @kianmeng in #587
- Ignore .cover folder by @kianmeng in #586
- Use accumulator on raw_html by @ypconstante in #588
- Optimize raw html padding for small depths by @ypconstante in #589
- Bump credo from 1.7.7 to 1.7.8 by @dependabot in #590
- Bump dialyxir from 1.4.3 to 1.4.4 by @dependabot in #591
- Fix raw_html/2 typespec by @davydog187 in #593
New Contributors
- @tianzedavid made their first contribution in #564
Full Changelog: v0.36.2...v0.36.3
v0.36.2
Added
- Implement the Inspect protocol for the Floki.HTMLTree struct.
  This struct is currently private. Thank you @vittoriabitton.
Fixed
- Fix regression to respect config option :encode in Floki.raw_html/2.
  Thanks @Sgoettschkes.
- Make Floki.raw_html/2 treat the contents of the <title> tag as plain text.
  The idea is to align with parse_document/2.
  Thank you @aymanosman.
Pull requests
- fix typespec of get_by_id/2 by @SteffenDE in #549
- Bump ex_doc from 0.31.1 to 0.31.2 by @dependabot in #553
- raw_html treats the content of title tags as plain text by @aymanosman in #555
- Implement Inspect protocol for HTMLTree by @vittoriabitton in #547
- Bump ex_doc from 0.31.2 to 0.32.0 by @dependabot in #559
- Bump ex_doc from 0.32.0 to 0.32.1 by @dependabot in #560
- fix: read encode_raw_html config as default for raw_html encode option by @Sgoettschkes in #561
- Release 0.36.2 by @JohnnyCurran in #563
New Contributors
- @aymanosman made their first contribution in #555
- @Sgoettschkes made their first contribution in #561
- @JohnnyCurran made their first contribution in #563
Full Changelog: v0.36.0...v0.36.2
v0.36.1
Fixed
- Fix typespec of get_by_id/2.
Pull requests
- fix typespec of get_by_id/2 by @SteffenDE in #549
- Bump ex_doc from 0.31.1 to 0.31.2 by @dependabot in #553
Full Changelog: v0.36.0...v0.36.1
v0.36.0
Added
- Add Floki.get_by_id/2 that returns one element by ID or nil.
  Thanks @SteffenDE.
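A minimal sketch of the ID lookup described above, assuming the function takes a parsed tree and an ID string. The markup and variable names are made up for this example.

```elixir
# Standalone script: run with `elixir get_by_id_example.exs` outside a Mix project.
Mix.install([{:floki, "~> 0.36"}])

# Hypothetical markup, made up for this illustration.
{:ok, doc} = Floki.parse_fragment(~s(<div><p id="intro">Hello</p><p>World</p></div>))

# Returns a single element tuple rather than a list of matches.
found = Floki.get_by_id(doc, "intro")

# Returns nil when no element carries the given ID.
missing = Floki.get_by_id(doc, "nope")

IO.inspect(found, label: "found")
IO.inspect(missing, label: "missing")
```

Compared with Floki.find/2 and an "#id" selector, this avoids unwrapping a one-element list at the call site.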
Changed
- Improve options validation with Keyword.validate!/2.
  This is not a change in APIs, but the error messages and opts validation should be standardized now.
  Thanks @vittoriabitton.
Removed
- Drop support for Elixir v1.12.
Pull requests
- Use adjacent_sibling instead of sibling by @ypconstante in #544
- Update Elixir version requirement to 1.13 by @vittoriabitton in #541
- Improve opts validation with Keyword.validate!/2 by @vittoriabitton in #542
- Bump credo from 1.7.4 to 1.7.5 by @dependabot in #546
- add Floki.find_by_id/2 by @SteffenDE in #548
- Find without html tree for the remaining combinators by @ypconstante in #545
New Contributors
- @vittoriabitton made their first contribution in #541
- @SteffenDE made their first contribution in #548
Full Changelog: v0.35.4...v0.36.0
v0.35.4
Fixed
- Fix regression in the order of elements in Floki.find/2
Pull requests
- Polymorphism on Finder.find by @ypconstante in #522
- Run all selector test with tuple list and html tree by @ypconstante in #523
- Bump fast_html from 2.2.0 to 2.3.0 by @dependabot in #530
- Reduce number of function call on traverse by @ypconstante in #531
- Optimize Floki.children by @ypconstante in #533
- Find without build html tree by @ypconstante in #534
- Bump credo from 1.7.3 to 1.7.4 by @dependabot in #535
- Optimize type selector by using pattern match by @ypconstante in #536
- Raw HTML code clean up by @ypconstante in #538
- Always return find elements in the correct order by @ypconstante in #540
- Find using descendant selector without html tree by @ypconstante in #537
- Release v0.35.4 by @philss in #543
Full Changelog: v0.35.3...v0.35.4
v0.35.3
This release has great performance improvements, thanks to the PRs
from @ypconstante!
Most of the main functions, such as Floki.raw_html/2 and Floki.find/2, are
faster and use less memory. For example, find/2 is roughly twice as fast
and uses about half the memory.
Fixed
- Add :leex to Mix compilers. Fixes the build when running with dev version of Elixir.
  Thanks @wojtekmach.
- Fix Floki.raw_html/2 when a tree using attributes as maps is given.
  Thanks @SupaMic.
- Add a guard to Floki.find/2 so people can have a better error message when an
  invalid input is given. Thanks @Hajto.
- Fix parsers to consider IO data as inputs. This may change in the next version
  of Floki, as I plan to drop support for IO data.
  Thanks @ypconstante.
Removed
- Remove outdated Gleam wrapper code. The external functions syntax in Gleam
  has changed. So now the wrapper is not needed anymore.
  Thanks @michallepicki.
Pull requests
- Add :leex to Mix compilers by @wojtekmach in #502
- Update raw_html.ex to handle :attributes_as_maps option by @SupaMic in #498
- Bump benchee from 1.1.0 to 1.2.0 by @dependabot in #499
- Remove outdated gleam wrapper code by @michallepicki in #500
- find/2 input protection proposal. by @Hajto in #497
- Bump ex_doc from 0.30.9 to 0.31.0 by @dependabot in #503
- Enable parse of IO data html by @ypconstante in #504
- Optimize RawHTML.build_attrs/2 by @ypconstante in #505
- Optimize parse_flag by @ypconstante in #506
- Use recursion instead of Enum.flat_map on find_selectors by @ypconstante in #508
- Optimize HTMLTree.build by @ypconstante in #511
- Always use optimal encoding function by @ypconstante in #512
- Bump credo from 1.7.1 to 1.7.2 by @dependabot in #513
- Optimize selectors matching by @ypconstante in #510
- Bump benchee from 1.2.0 to 1.3.0 by @dependabot in #514
- Use stack on Finder by @ypconstante in #518
- Call self_closing_tags only once on raw_html by @ypconstante in #517
- Optimize id matching by @ypconstante in #519
- Bump dialyxir from 1.4.2 to 1.4.3 by @dependabot in #516
- Include Elixir v1.16 in the build matrix by @philss in #521
- Bump credo from 1.7.2 to 1.7.3 by @dependabot in #520
- Bump ex_doc from 0.31.0 to 0.31.1 by @dependabot in #527
- Optimize leftpad on raw_html by @ypconstante in #526
- Fix pretty raw_html with encoded text by @ypconstante in #525
- Move data extraction for selector matching by @ypconstante in #524
- Improve benchmark files by @philss in #528
- Prepare release v0.35.3 by @philss in #529
New Contributors
- @SupaMic made their first contribution in #498
- @michallepicki made their first contribution in #500
- @Hajto made their first contribution in #497
- @ypconstante made their first contribution in #504
Full Changelog: v0.35.2...v0.35.3