Need spell checking system

san
Posts: 9
Joined: Mon Feb 02, 2009 1:40 pm

Need spell checking system

Post by san »

I need to check spelling (Russian language) in my .tex files.

The problem is that the spell-checking systems I have seen are very limited: they check the .tex source, not the text that TeX actually produces. For example, I can write the following:

\newcommand{\firstPart}{Hell}
\newcommand{\secondPart}{on}
...
\firstPart{}\secondPart{}, World!

The output will be:

Hellon, World!

The word is misspelled, but it only exists after macro expansion, so a checker that reads the source cannot see it.

There can also be more complicated text-generating macros with loops, conditions, etc. In addition, the same letter can be written as «Я» or as «\CyrYa», and so on. So I need a system that checks spelling on the output of TeX.

It would also be good to catch grammatical errors like «This girls are very pretty».

The best solution I have found is:
1) Produce .pdf from .tex;
2) Do recognition (OCR) of .pdf;
3) Paste recognition result into Microsoft Word;
4) Check spelling in it.

The problems are:
1) Manual, time-consuming work;
2) It is a pain to check spelling in text full of formulas: I need to manually mark most of the formulas as pictures during OCR.

The good thing is that even text inside .eps images is checked (figures sometimes contain legends and labels, which can also have misprints).

What are your suggestions?

barcense
Posts: 10
Joined: Sun Jun 10, 2007 2:16 pm

Need spell checking system

Post by barcense »

You can use the ps2ascii utility, which should be included with your LaTeX distribution, to extract the text from the PDF file into a plain-text file. Then you can run your spell checker on that file.

Best regards.
san
Posts: 9
Joined: Mon Feb 02, 2009 1:40 pm

Re: Need spell checking system

Post by san »

I tested it. ps2ascii produces mojibake instead of Russian letters. How can I configure it to produce UTF-8 output?
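
One thing that may be worth trying on the LaTeX side: when a PDF is produced by pdflatex with T2A-encoded fonts, extraction tools often cannot map the glyphs back to Cyrillic letters. The cmap package embeds character maps in the PDF for this purpose. The sketch below assumes pdflatex, a UTF-8 source file, and babel's russian option, none of which is stated in this thread, and it does not guarantee that ps2ascii itself will then emit UTF-8:

```latex
% Assumed setup: pdflatex, UTF-8 source, T2A font encoding.
\usepackage{cmap}            % load before fontenc so the character maps are embedded
\usepackage[T2A]{fontenc}    % Cyrillic font encoding
\usepackage[utf8]{inputenc}  % assumption: the .tex file is saved as UTF-8
\usepackage[russian]{babel}
```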
san
Posts: 9
Joined: Mon Feb 02, 2009 1:40 pm

Need spell checking system

Post by san »

Following your idea about extracting text from the .pdf, I searched the Internet for "extract text from pdf". I found and tested several free text extractors, and the only one that can deal with Russian letters is Text Mining Tool. Thanks to the author.

Now I want to automate the spell checking. First of all, the extracted text is split by lines (not by paragraphs), and words at the ends of lines are often hyphenated. Also, formulas produce garbage in the extracted text.

So, what do I need to write in the .tex preamble to:
1) Disable hyphenation;
2) Disable formulas;
3) Disable images;
4) Disable line breaks (except for new paragraphs)?
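
For items 1 and 3, a minimal sketch of a separate "spell-check build" preamble might look like the following. It is untested on this document and assumes that images are included through graphicx's \includegraphics; it does not cover formulas or line breaks inside paragraphs.

```latex
% Sketch only: settings to try in a copy of the preamble used just for
% text extraction, not for the final document.

% 1) Hyphenation: make both automatic and explicit-hyphen breaks
%    maximally unattractive to TeX.
\hyphenpenalty=10000
\exhyphenpenalty=10000
% Without hyphenation, justified lines become hard to fill, so allow a
% ragged right margin once the document starts.
\AtBeginDocument{\raggedright}

% 3) Images: make \includegraphics[options]{file} typeset nothing.
%    Assumption: the starred form \includegraphics* is not used,
%    because this simple redefinition does not handle it.
\usepackage{graphicx}
\renewcommand{\includegraphics}[2][]{}
```

The hyphenat package with its none option is another way to switch hyphenation off. Captions and other text around a suppressed image are still typeset, so they remain in the extracted text.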