Add \lstset{texcl=true} to latex publish template #5
This makes comment lines in code blocks produced by the `publish` command be treated as LaTeX. It fixes a bug where UTF-8 characters in comments caused `pdflatex` to fail when rendering the document, with the error `! Package inputenc Error: Invalid UTF-8 byte sequence.`
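For readers unfamiliar with the option, here is a minimal sketch of what `texcl=true` does in the `listings` package. The document is hypothetical and is not the actual publish template; it only assumes `language=Octave` and a source file saved as UTF-8.

```latex
\documentclass{article}
\usepackage[utf8]{inputenc}
\usepackage[T1]{fontenc}
\usepackage{listings}

% texcl=true: the text of line comments is read as ordinary LaTeX
% (and typeset in the comment style) instead of being copied verbatim.
\lstset{language=Octave, texcl=true}

\begin{document}
\begin{lstlisting}
x = 1;  % größer als null
\end{lstlisting}
\end{document}
```

Because the comment text is handed to LaTeX, `inputenc` gets to decode the multi-byte characters. Per the description above, this is what avoids the reported `Invalid UTF-8 byte sequence` error.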
Thank you for this PR 🙂 This change seems pretty straightforward. I'll apply it soon in the main Octave repo at https://www.octave.org/hg/octave. This is just a mirror of that repo.
@Jaakkonen How would you like to be attributed for your change? Do you prefer anonymity or a different name from the current one in the patch https://github.com/gnu-octave/octave/pull/5.patch?
Link to Savannah: https://savannah.gnu.org/bugs/?61272
Is this change safe? What would happen if the comment contained invalid LaTeX commands, e.g. a stray Can you show an example for which
You can already inject LaTeX commands in title comments (Lines starting with
Running
will fail without this patch and will work with it.
Caveats
However, there are regressions regarding LaTeX control characters in non-title comments:
Running
will work without this patch but will fail with it.
Caveat fixes
Doing some string substitutions gets it back to the expected behavior:
I think these substitutions should be added in the processing before merging this patch.
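The snippets from this comment are not preserved here, so the following is only an illustrative sketch of the kind of regression described, not the commenter's original example. With `texcl=true`, characters such as `_`, `&`, `#` and `%` keep their usual LaTeX meaning inside comment text, so a comment that printed fine verbatim may need escaping. The variable names are hypothetical.

```latex
\documentclass{article}
\usepackage[utf8]{inputenc}
\usepackage{listings}
\lstset{language=Octave, texcl=true}

\begin{document}
\begin{lstlisting}
% The code part of the line stays verbatim, so a_b needs no escaping there.
% The comment text is LaTeX, so the underscore and percent sign are escaped.
y = a_b / 2;  % a\_b is 50\% of the total
\end{lstlisting}
\end{document}
```

An automatic fix would have to apply such escapes to every comment before the listing is written, which is the substitution step proposed above.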
It looks like GitHub decided to use its own proxy email but
Doing the necessary string substitution correctly and automatically looks rather difficult to me. But I might be missing something obvious. Alternatively, we could take a different approach, e.g. the one outlined at https://en.wikibooks.org/wiki/LaTeX/Source_Code_Listings#Encoding_issue
Unfortunately, that would limit the supported characters to the ones explicitly included with a substitution. Maybe we could add the following setting from further up in that article to allow users to explicitly escape to LaTeX:
But maybe choose escape sequences that are less likely to show up in existing code (i.e., not including comment characters).
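The setting referred to is not preserved in this thread. As an assumption, the `escapeinside` option of `listings` is one way to let users escape to LaTeX explicitly. The sketch below uses the delimiters `(*@` and `@*)`, chosen here only because they contain no Octave comment character; they are not taken from the discussion.

```latex
\documentclass{article}
\usepackage[utf8]{inputenc}
\usepackage[T1]{fontenc}
\usepackage{listings}

% Text between (*@ and @*) is handed to LaTeX;
% everything outside the delimiters stays verbatim.
\lstset{language=Octave, escapeinside={(*@}{@*)}}

\begin{document}
\begin{lstlisting}
x = a_b;  % 50% of the total, (*@größer@*) als null
\end{lstlisting}
\end{document}
```

With this approach existing comments keep their literal rendering, and only text the author wraps in the delimiters is interpreted. Multi-byte characters outside the delimiters would still need another fix.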
The `literate` option seems to be the way to go, as it keeps backwards compatibility. I couldn't get the
Admittedly, this issue turns out to be more complicated than I expected 😓 Probably it would be best to leave things as they are (following the plain Matlab design). Another way to go is to make use of the ability to easily customize the octave/scripts/miscellaneous/publish.m Lines 261 to 263 in 963d643
For example, copy and rename the file
Is this a satisfactory solution?
The documentation of
It could be argued that the current behavior (i.e., not interpreting LaTeX commands in some contexts when that is the specified output format) is a bug. So, changing the behavior in that respect could be considered a bug fix, not a regression.
Better documentation is always good, @mmuetzel please improve it as you please 🙂 The mentioned sentence from the Injecting LaTeX markup in comment sections (and expecting the respective HTML counterparts as output), etc., was not supported by Matlab either and only works by coincidence. Again, the only documented supported markup is "Publishing Markup".
Returning to the original problem, an encoding issue with LaTeX listings, the English character limitation is better solved individually for each language or set of special characters used, as described above.
@siko1056: Thanks for clarifying. To make sure I didn't misunderstand again, is the following correct? In that case, the current bug is that additionally in title comments, LaTeX syntax is interpreted when the output format is "latex".
There is no final answer to it. When I worked on publish and compared a lot with Matlab, some "native" (LaTeX / HTML) markup injections (intentionally?) worked (slipped through). In Matlab, mostly only what causes errors or bad output was escaped for the output format. That is what I meant by "works by coincidence". The answer to what works and what does not probably changes with each Matlab release. For my part, I am not very interested in publish anymore, since Jupyter Notebook fills this gap much better these days. This bug only caught my eye because the fix seemed easy, but LaTeX and special characters seem to be a bigger issue which cannot be trivially solved for all potential use cases of "Publishing Markup". However, individual cases can be tackled better with the flexibility of Octave's publish, as described above.
Sorry. I don't understand your answer. Let me re-phrase my question:
be a valid replacement for
? That still leaves open whether "native" formatting will be displayed literally or will be interpreted. As @Jaakkonen pointed out, the current behavior is inconsistent (in that "native" LaTeX formatting is interpreted in title comments but displayed literally in follow-up comments). I understand that you don't want to spend much time on changes in In that case, this additional sentence in the documentation might clarify that:
Regarding the first sentence, there is a difference between a comment and a (let's call it) "Publish comment", which starts with
or without title
Not every comment is a comment 😓 Matlab did not give them a precise name, they use the concept of "section breaks" for the description, which does not exist in Octave
Thus the first sentence does not make much sense here. Regarding the last sentence, this should not be advertised, as users will start relying on it 😇 Only "Publishing Markup" is supported. Of course, please continue the development and change the game to your needs, same with the one-liner change. I don't see such an easy fix and am not working on it now. If you find one, please apply it 🙂
Tbh, I don't understand the current documentation of One more attempt for a documentation change:
Fully agree with this sentence 👍 Basically publish does the same as Jupyter Notebook, but was invented much earlier, before Matlab 2006. Jupyter has a clear distinction between Markdown cells and code cells. Only the code cells must be evaluated by Octave. The Markdown is treated separately. In publish, the concept of Markdown cells is simulated with those comment blocks
while everything else automatically is a code cell. The current implementation of publish embeds those code cells for LaTeX (PDF) output inside \begin{lstlisting} environments, which works quite well for English code. Thus from that point on the control is with pdflatex. During the development, there was an endless cycle of trial and error to fine-tune which characters to escape and when. And I am afraid of getting dragged into this cycle again 😇 It takes lots of time and you always come up with new scenarios in which you never imagined somebody would use this. Since Jupyter exists (I learned of it shortly after "finishing" my work 😆 ) I thought I had wasted a lot of my time on this format (Jupyter simply suits my use cases better, is better supported by other tools, etc.). However, Octave can now also publish documents in a mostly Matlab compatible way 🙂
* scripts/miscellaneous/publish.m: Clarify documentation. Publishing Markup is only interpreted in section comments. Remove sentence that suggests that commands in the output format might be interpreted in that format.
* scripts/miscellaneous/private/__publish_latex_output__.m: Add substitution rules for some multi-byte UTF-8 characters (mostly Latin-based) to LaTeX publish template. This fixes a bug where using these characters in comments would cause `pdflatex` to fail.
See: #5
Letting this sink in a bit, I opted for pushing a change that uses the While this doesn't fix the issue with multi-byte UTF-8 characters in non-section comments completely, it probably avoids it for most (Western) users. I forgot to include the bug number of the "proxy bug" on Savannah. Sorry.
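As a rough illustration of the substitution-rule approach named in the commit message, the sketch below uses the `literate` option of `listings`, following the pattern from the Wikibooks article linked earlier. The character list is a small hypothetical sample and not the set that was actually committed.

```latex
\documentclass{article}
\usepackage[utf8]{inputenc}
\usepackage[T1]{fontenc}
\usepackage{listings}

% Each rule reads {input}{replacement}{output width in characters}.
% Only characters listed here are handled; others still fail.
\lstset{
  language=Octave,
  literate=
    {ä}{{\"a}}1 {ö}{{\"o}}1 {ü}{{\"u}}1
    {Ä}{{\"A}}1 {Ö}{{\"O}}1 {Ü}{{\"U}}1
    {é}{{\'e}}1 {è}{{\`e}}1 {ß}{{\ss}}1
}

\begin{document}
\begin{lstlisting}
x = 1;  % größer als null, a_b and 50% stay literal
\end{lstlisting}
\end{document}
```

Unlike `texcl=true`, this leaves comment text verbatim, so LaTeX control characters in existing comments keep working. The cost is the limitation noted above: coverage stops at the characters that have an explicit rule.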