Parser #7

Closed
wants to merge 13 commits into from
Changes from 3 commits
10 changes: 7 additions & 3 deletions BetterMD/__init__.py
@@ -1,4 +1,8 @@
from .elements import *
-from .html import CustomHTML
-from .markdown import CustomMarkdown
-from .rst import CustomRst
+from .parse import Collection, HTMLParser, MDParser, RSTParser
+
+def from_html(html:'str'):
+    return Symbol.from_html(html)
+
+def from_md(md:'str'):
+    return Symbol.from_md(md)

🛠️ Refactor suggestion

Symbol is only available through a star import

Both from_html and from_md reference Symbol, but Symbol is never imported by name. Today it resolves through the star import from .elements, which makes the dependency implicit: if that package later stops exporting Symbol (for example by defining __all__ without it), both functions fail with a NameError when called.

Make the dependency on Symbol explicit:

from .elements import *
+from .elements.symbol import Symbol  # Explicitly import Symbol
from .parse import Collection, HTMLParser, MDParser, RSTParser

def from_html(html:'str'):
    return Symbol.from_html(html)

def from_md(md:'str'):
    return Symbol.from_md(md)
🧰 Tools
🪛 Ruff (0.8.2)

5-5: Symbol may be undefined, or defined from star imports (F405)
8-8: Symbol may be undefined, or defined from star imports (F405)
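The risk behind this comment can be reproduced in isolation. The sketch below is illustrative only: `fake_elements` and the classes in it are hypothetical stand-ins, not BetterMD's real package. It shows that a star import honors `__all__`, so a name can silently drop out of it, while an explicit import keeps working.

```python
import sys
import types

# Hypothetical stand-in for a package like BetterMD.elements (not the real module).
fake = types.ModuleType("fake_elements")
exec(
    "class Symbol: pass\n"
    "class A(Symbol): pass\n"
    "__all__ = ['A']  # Symbol is deliberately left out\n",
    fake.__dict__,
)
sys.modules["fake_elements"] = fake

# A star import only brings in the names listed in __all__.
star_ns = {}
exec("from fake_elements import *", star_ns)
star_names = sorted(name for name in star_ns if not name.startswith("__"))

# An explicit import keeps working regardless of __all__.
explicit_ns = {}
exec("from fake_elements import Symbol", explicit_ns)

print(star_names)               # names visible after the star import
print("Symbol" in explicit_ns)  # whether the explicit import bound Symbol
```

With the explicit import in place, Ruff's F405 warning for these two lines should also no longer apply, since the name is bound by name rather than via the star import.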

25 changes: 15 additions & 10 deletions BetterMD/elements/__init__.py
@@ -1,16 +1,21 @@
from .symbol import Symbol

from .a import A
from .blockquote import Blockquote
from .body import Body
from .br import Br
from .code import Code
from .div import Div
from .h import H1,H2,H3,H4,H5,H6
from .head import Head
from .hr import Hr
from .html import HTML
from .img import Img
from .input import Input
from .li import OL, UL, LI
from .text import Text
from .div import Div
from .p import P
from .span import Span
from .img import Img
from .text_formatting import Strong, Em, Code
from .br import Br
from .blockquote import Blockquote
from .hr import Hr
from .table import Table, Tr, Td, Th
from .input import Input
from .code import Code
from .table import Table, Tr, Td, Th, THead, TBody
from .text_formatting import Strong, Em, Strong as B, Em as I
from .text import Text
from .title import Title
45 changes: 37 additions & 8 deletions BetterMD/elements/a.py
@@ -1,23 +1,52 @@
from BetterMD.rst.custom_rst import CustomRst
from .symbol import Symbol
from ..rst import CustomRst
from ..markdown import CustomMarkdown
from ..html import CustomHTML
import re
import typing as t

class MD(CustomMarkdown['A']):
if t.TYPE_CHECKING:
from ..parse import Collection

class MD(CustomMarkdown):
def to_md(self, inner, symbol, parent):
return f"[{" ".join([e.to_md() for e in inner])}]({symbol.get_prop("href")})"

def verify(self, text:'str'):
if re.findall("\[([^\]]+)\]\((https?:\/\/[^\s)]+)\)", text):
# Case 1: Inline link
return True

elif re.findall("<(https?:\/\/[^\s>]+)>", text):
# Case 2: Automatic Links
return True

elif re.findall("\[([^\]]+)\]\[([^\]]+)\]\s*\n?\[([^\]]+)\]:\s*(https?:\/\/[^\s]+)", text):
# Case 3: Reference Links
return True

return False

class HTML(CustomHTML['A']):
def to_html(self, inner, symbol, parent):
return f"<a href={symbol.get_prop('href')}>{" ".join([e.to_html() for e in inner])}</a>"

class RST(CustomRst['A']):
def to_rst(self, inner, symbol, parent):
return f"`{' '.join([e.to_rst() for e in inner])} <{symbol.get_prop('href')}>`_"

class A(Symbol):
prop_list = ["href"]

refs = {}

🛠️ Refactor suggestion

Use a class variable for shared references with caution.

The refs = {} dictionary is a class attribute, so a single dictionary is shared by the class and all instances of A. Any in-place change made through one instance (for example a.refs[key] = value) is visible through every other instance. If each instance of A should track its own references, create the dictionary in __init__ instead.

-class A(Symbol):
-    ...
-    refs = {}
+class A(Symbol):
+    ...
+    # If you need a unique dictionary on each instance, move refs to __init__
+    def __init__(self, *args, **kwargs):
+        super().__init__(*args, **kwargs)
+        self.refs = {}
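The difference between the two designs is easy to check in isolation. This is a minimal sketch with hypothetical class names (`SharedRefs`, `OwnRefs`); it does not use BetterMD's Symbol at all.

```python
class SharedRefs:
    refs = {}  # class attribute: one dict for the class and every instance


class OwnRefs:
    def __init__(self):
        self.refs = {}  # instance attribute: a fresh dict per object


a, b = SharedRefs(), SharedRefs()
a.refs["home"] = "https://example.com"
leaked = "home" in b.refs  # the entry added via `a` is visible via `b`

c, d = OwnRefs(), OwnRefs()
c.refs["home"] = "https://example.com"
isolated = "home" not in d.refs  # `d` has its own, still empty dict

print(leaked, isolated)
```

Which behavior is right depends on intent. If refs is meant as a document-wide registry of reference-style links (the md_refs, rst_refs and html_refs classmethods in this diff may point that way), the shared class attribute could be deliberate, and moving it to __init__ would break that.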

md = MD()
html = HTML()
rst = RST()
html = "a"
rst = RST()

@classmethod
def md_refs(cls, references: 'list[str]' = None):
pass

@classmethod
def rst_refs(cls, references: 'list[str]' = None):
pass

@classmethod
def html_refs(cls, references: 'list[str]' = None):
pass
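A note on the verify patterns in this file: they are written as ordinary string literals, so sequences such as `\[` and `\/` are invalid string escapes, which recent Python versions warn about. The sketch below restates the same three cases with raw strings. The names `INLINE_LINK`, `AUTO_LINK`, `REFERENCE_LINK` and `looks_like_link` are made up for illustration and are not part of the PR.

```python
import re

# Raw strings pass the backslashes through to the regex engine unchanged.
INLINE_LINK = re.compile(r"\[([^\]]+)\]\((https?://[^\s)]+)\)")   # [text](url)
AUTO_LINK = re.compile(r"<(https?://[^\s>]+)>")                   # <url>
REFERENCE_LINK = re.compile(                                       # [text][id] + [id]: url
    r"\[([^\]]+)\]\[([^\]]+)\]\s*\n?\[([^\]]+)\]:\s*(https?://\S+)"
)


def looks_like_link(text: str) -> bool:
    """Return True if text contains any of the three link forms."""
    patterns = (INLINE_LINK, AUTO_LINK, REFERENCE_LINK)
    return any(pattern.search(text) for pattern in patterns)


print(looks_like_link("[docs](https://example.com)"))
print(looks_like_link("<https://example.com>"))
print(looks_like_link("[docs][1]\n[1]: https://example.com"))
print(looks_like_link("plain text"))
```

Precompiling the patterns once at module level also avoids recompiling them on every verify call, though that is a minor point next to the escaping fix.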
7 changes: 7 additions & 0 deletions BetterMD/elements/body.py
@@ -0,0 +1,7 @@
+from .symbol import Symbol
+
+
+class Body(Symbol):
+    html = "body"
+    md = ""
+    rst = ""
44 changes: 39 additions & 5 deletions BetterMD/elements/code.py
@@ -2,6 +2,7 @@
from .text import Text
from ..markdown import CustomMarkdown
from ..html import CustomHTML
from ..rst import CustomRst

class MD(CustomMarkdown):
def to_md(self, inner, symbol, parent):
@@ -19,17 +20,50 @@ def to_md(self, inner, symbol, parent):
class HTML(CustomHTML):
def to_html(self, inner, symbol, parent):
language = symbol.get_prop("language", "")
if isinstance(inner, Text):
inner = inner.to_html()
inner = "\n".join([i.to_html() for i in inner])

if language:
return f'<pre><code class="language-{language}">{inner}</code></pre>'
return f'<code class="language-{language}">{inner}</code>'

return f"<code>{inner}</code>"

def verify(self, text: str) -> bool:
return text.lower() == "code"

class RST(CustomRst):
    def to_rst(self, inner, symbol, parent):
        language = symbol.get_prop("language", "")

        # Handle inner content
        if isinstance(inner, list):
            content = "".join([
                i.to_rst() if isinstance(i, Symbol) else str(i)
                for i in inner
            ])
        else:
            content = inner.to_rst() if isinstance(inner, Symbol) else str(inner)

        # If it's a code block (has language or multiline)
        if language or "\n" in content:
            # Use code-block directive for language-specific blocks
            if language:
                # Indent the content by 3 spaces (RST requirement)
                indented_content = "\n".join(f" {line}" for line in content.strip().split("\n"))
                return f".. code-block:: {language}\n\n{indented_content}\n\n"

            # Use simple literal block for language-less blocks
            # Indent the content by 3 spaces (RST requirement)
            indented_content = "\n".join(f" {line}" for line in content.strip().split("\n"))
            return f"::\n\n{indented_content}\n\n"

        # Inline code
        # Escape backticks if they exist in content
        if "`" in content:
            return f"``{content}``"
        return f"`{content}`"

class Code(Symbol):
props = ["language"]
html = HTML()
md = MD()
rst = "``"
rst = RST()
nl = True
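One thing worth checking in the RST conversion above: in reStructuredText an inline literal is always written with double backticks, while single backticks mark interpreted text, so the final `return` would not render as code. The sketch below shows the three cases as a standalone function. `code_to_rst` is a hypothetical helper that takes plain strings, not the PR's `to_rst(inner, symbol, parent)` signature, and the three-space indent follows the comment in the diff.

```python
def code_to_rst(content: str, language: str = "") -> str:
    """Render code as an RST inline literal, literal block, or code-block directive."""
    if language or "\n" in content:
        lines = content.strip("\n").split("\n")
        # Indent non-empty lines by three spaces so they belong to the block.
        body = "\n".join(f"   {line}" if line else "" for line in lines)
        header = f".. code-block:: {language}" if language else "::"
        return f"{header}\n\n{body}\n\n"
    # Inline literal: double backticks, regardless of the content.
    return f"``{content}``"


print(code_to_rst("x = 1"))
print(code_to_rst("print(1)", "python"))
print(code_to_rst("a\n\nb"))
```

Stripping only newlines (rather than all whitespace) keeps any leading indentation of the first code line, which matters for languages like Python.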
12 changes: 6 additions & 6 deletions BetterMD/elements/h.py
@@ -15,36 +15,36 @@ def to_rst(self, inner: list[Symbol], symbol: Symbol, parent: Symbol) -> str:

class H1(Symbol):
    html = "h1"
-    md = "#"
+    md = "# "
    rst = RST("=")
    nl = True

class H2(Symbol):
    html = "h2"
-    md = "##"
+    md = "## "
    rst = RST("-")
    nl = True

class H3(Symbol):
    html = "h3"
-    md = "###"
+    md = "### "
    rst = RST("~")
    nl = True

class H4(Symbol):
    html = "h4"
-    md = "####"
+    md = "#### "
    rst = RST("+")
    nl = True

class H5(Symbol):
    html = "h5"
-    md = "#####"
+    md = "##### "
    rst = RST("^")
    nl = True

class H6(Symbol):
    html = "h6"
-    md = "######"
+    md = "###### "
    rst = RST('"')
    nl = True
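The trailing space added here matters because Markdown ATX headings need a space between the `#` run and the text. As a rough illustration of both target formats, the sketch below uses two hypothetical helpers (`heading_to_md`, `heading_to_rst`) and the same underline characters as the RST(...) arguments in this file. It does not reproduce the PR's RST class, whose body is not shown in this hunk.

```python
# Underline characters for H1..H6, taken from the RST(...) arguments in the diff.
RST_UNDERLINES = ["=", "-", "~", "+", "^", '"']


def heading_to_md(text: str, level: int) -> str:
    """ATX heading: '#' repeated `level` times, one space, then the text."""
    return f"{'#' * level} {text}"


def heading_to_rst(text: str, level: int) -> str:
    """RST section title: text followed by an underline at least as long."""
    underline = RST_UNDERLINES[level - 1] * len(text)
    return f"{text}\n{underline}"


print(heading_to_md("Title", 2))
print(heading_to_rst("Title", 1))
```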
6 changes: 6 additions & 0 deletions BetterMD/elements/html.py
@@ -0,0 +1,6 @@
+from .symbol import Symbol
+
+class HTML(Symbol):
+    html = "html"
+    md = ""
+    rst = ""

💡 Verification agent

🧩 Analysis chain

Consider implementing conversion methods for consistency

The class doesn't appear to have any methods for handling the conversion between formats. Based on the PR context, most Symbol subclasses likely implement methods like to_html(), to_md(), or to_rst(). Consider adding these methods to maintain consistency with the rest of the codebase, even if they simply return the content unchanged.


🌐 Web query:

What methods are typically included in Symbol subclasses for format conversion?

💡 Result:

Symbol subclasses typically implement specialized methods for format conversion, leveraging programming language-specific mechanisms to control type coercion and string representation. These methods vary across ecosystems but share common patterns:

1. String Representation Overrides

  • toString()/__str__(): Converts symbol to human-readable string[10][12]
Symbol('foo').toString(); // "Symbol(foo)"[10]
  • toStringTag property (JS): Customizes Object.prototype.toString() output[10][12]
class CustomSymbol {
  get [Symbol.toStringTag]() { return 'MySymbol'; }
}
Object.prototype.toString.call(new CustomSymbol()); // "[object MySymbol]"[12]

2. Format Specification Handlers

  • __format__ method (Python): Controls format() behavior with specifiers[3][4][6]
class MySymbol:
    def __format__(self, format_spec):
        return f"Formatted: {format_spec}"

3. Type Conversion Protocols

  • Explicit conversion methods like toNumber(), toBoolean() (often throwing errors for invalid conversions)[10][14]
const sym = Symbol();
Number(sym); // TypeError: Cannot convert Symbol to number[10]

4. Custom Formatter Integration

  • Conversion flags in format strings (!s, !r in Python)[4][6]
f"{symbol!r}"  # Calls repr() instead of str()[4]

5. Parser Interaction Methods

  • parse methods for reverse conversion (e.g., Java DecimalFormat)[9][13]
DecimalFormat df = new DecimalFormat("#,##0.00");
Number num = df.parse("1,234.56");[9]

6. Localization Support

  • Culture-aware formatting through IFormatProvider (.NET)[7][13]
public string ToString(string format, IFormatProvider provider) {
  // Culture-specific formatting[13]
}

These methods enable precise control over symbol representation while maintaining type safety. JavaScript Symbols particularly restrict implicit conversions, requiring explicit toString() calls[10][12], whereas Python and .NET provide more flexible formatting hooks through standardized interfaces[3][4][13]. The implementation details often involve overriding base class methods or implementing formatting interfaces like IFormattable[13].

Citations:


Conversion Methods Implementation Required

The HTML class in BetterMD/elements/html.py (lines 3–6) currently defines only the format attributes without any explicit conversion methods. Given that other Symbol subclasses in our codebase typically implement conversion methods to handle format transformations (for instance, methods such as to_html(), to_md(), and to_rst()), it is advisable to add similar methods here. This ensures consistency across the codebase and provides a clear API for format conversion—even if the methods simply return the corresponding attribute values.

Suggested changes:

class HTML(Symbol):
    html = "html"
    md = ""
    rst = ""

    def to_html(self):
        return self.html

    def to_md(self):
        return self.md

    def to_rst(self):
        return self.rst
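Before applying this suggestion it is worth checking how the base class renders. The sketch below is built on an assumption: that Symbol turns a string-valued html attribute into a tag wrapped around the rendered children. The `Symbol` and `Text` classes here are simplified, hypothetical stand-ins and not BetterMD's actual implementation. Under that assumption, overriding to_html to return the attribute would replace the rendered element with the bare tag name.

```python
class Symbol:
    """Hypothetical, simplified base class; NOT BetterMD's real Symbol."""

    html = ""

    def __init__(self, inner=None):
        self.inner = inner or []

    def to_html(self):
        # Render the children, then wrap them in the tag named by `html`.
        body = "".join(child.to_html() for child in self.inner)
        return f"<{self.html}>{body}</{self.html}>" if self.html else body


class Text(Symbol):
    def __init__(self, text):
        super().__init__()
        self.text = text

    def to_html(self):
        return self.text


class HTML(Symbol):
    html = "html"  # attribute only; rendering is inherited


class OverriddenHTML(HTML):
    def to_html(self):
        return self.html  # shape of the suggested override


inherited = HTML([Text("hi")]).to_html()
overridden = OverriddenHTML([Text("hi")]).to_html()
print(inherited)
print(overridden)
```

If the real Symbol behaves along these lines, leaving the class as attribute-only is the consistent choice, and the same would hold for Body and the other string-valued elements in this PR.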

22 changes: 3 additions & 19 deletions BetterMD/elements/input.py
@@ -3,32 +3,16 @@
from ..markdown import CustomMarkdown
from ..rst import CustomRst

-class HTML(CustomHTML):
-    def to_html(self, inner, symbol, parent):
-        # Collect all input attributes
-        attrs = []
-        for prop in Input.props:
-            value = symbol.get_prop(prop)
-            if value:
-                # Handle boolean attributes like 'required', 'disabled', etc.
-                if isinstance(value, bool) and value:
-                    attrs.append(prop)
-                else:
-                    attrs.append(f'{prop}="{value}"')
-
-        attrs_str = " ".join(attrs)
-        return f"<input {attrs_str} />"
-
class MD(CustomMarkdown):
    def to_md(self, inner, symbol, parent):
        if symbol.get_prop("type") == "checkbox":
-            return f"- [{'x' if symbol.get_prop('checked', '') else ''}] {inner.to_md()}"
+            return f"- [{'x' if symbol.get_prop('checked', '') else ' '}] {inner.to_md()}"
        return symbol.to_html()

class RST(CustomRst):
    def to_rst(self, inner, symbol, parent):
        if symbol.get_prop("type") == "checkbox":
-            return f"[ ] {inner.to_rst() if inner else ''}"
+            return f"[{'x' if symbol.get_prop('checked', '') else ' '}] {inner.to_rst() if inner else ''}"
        return ""  # Most input types don't have RST equivalents

class Input(Symbol):
@@ -50,6 +34,6 @@ class Input(Symbol):
        "multiple",
        "step"
    ]
-    html = HTML()
+    html = "input"
    md = MD()
    rst = RST()
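The one-character change in the Markdown branch is significant: in GitHub Flavored Markdown a task list marker is `[ ]` or `[x]`, and an empty `[]` is not recognized as a checkbox. A minimal sketch of the intended output, using a hypothetical `checkbox_to_md` helper that takes a plain label instead of the PR's symbol and inner arguments:

```python
def checkbox_to_md(label: str, checked: bool = False) -> str:
    """Render a GFM task list item; unchecked boxes need a space inside the brackets."""
    mark = "x" if checked else " "
    return f"- [{mark}] {label}"


print(checkbox_to_md("write tests"))
print(checkbox_to_md("write parser", checked=True))
```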
7 changes: 6 additions & 1 deletion BetterMD/elements/p.py
@@ -4,4 +4,9 @@ class P(Symbol):
    html = "p"
    md = ""
    rst = "\n\n"
-    nl = True
+    nl = True
+
+class Pre(Symbol):
+    html = "pre"
+    md = ""
+    rst = ""
Comment on lines +9 to +12

🛠️ Refactor suggestion

New Pre class implementation appears incomplete

The new Pre class implementation has empty string values for md and rst properties. In standard Markdown and reStructuredText, preformatted text typically requires special formatting (like code blocks with backticks or indentation). Without proper formatting, the conversion may not reflect preformatted text correctly in these formats.

Consider implementing proper Markdown and RST representations:

class Pre(Symbol):
    html = "pre"
-    md = ""
-    rst = ""
+    md = "```\n"  # Triple backticks for code blocks in Markdown
+    rst = "::\n\n"  # Double colon with double newline for preformatted blocks in RST
+    nl = True  # Ensure a newline after the element like the P class
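One caveat on the suggested values: if string-valued md attributes act as a prefix (the heading values such as "# " suggest this, but it is an assumption about Symbol), then a bare opening fence would never be closed. Preformatted text needs wrapping on both sides in Markdown and indentation of every line in RST. The sketch below shows the target output with two hypothetical helpers, `pre_to_md` and `pre_to_rst`, that work on plain strings.

```python
FENCE = "`" * 3  # built this way so the example can sit inside a fenced block


def pre_to_md(text: str) -> str:
    """Wrap preformatted text in an opening and a closing Markdown fence."""
    body = text.strip("\n")
    return f"{FENCE}\n{body}\n{FENCE}\n"


def pre_to_rst(text: str) -> str:
    """Emit an RST literal block: '::', a blank line, then the indented text."""
    lines = text.strip("\n").split("\n")
    body = "\n".join(f"   {line}" if line else "" for line in lines)
    return f"::\n\n{body}\n\n"


print(pre_to_md("a\nb"))
print(pre_to_rst("a\nb"))
```

In BetterMD terms this probably means Pre needs custom converter classes, like the MD and RST classes defined for Code, rather than plain string attributes.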