jblocher
Posts: 7
Joined: Mon Feb 01, 2010 3:08 pm

BibTeX UTF-8 Support

Post by jblocher »

An author in my .bib file has a Slovene caron in his name, which is causing an error (in BibTeX, I think).
Here is a minimal example:

Code:

\documentclass[11pt,notitlepage]{article}
\usepackage[english]{babel}
\usepackage[T1]{fontenc} 
\bibliographystyle{plain} 
\usepackage[utf8]{inputenc}
\begin{document}
I'd like to cite \cite{Cohen:2005p8599}
\bibliography{mwe_fonts}
\end{document}

with this .bib file, called mwe_fonts.bib:

Code:

@article{Cohen:2005p8599,
author = {Randolph Cohen and Joshua Coval and L̆ubos̆ Pastor}, 
journal = {Journal of Finance},
title = {Judging Fund Managers by the Company They Keep},
abstract = {This is really abstract},
rating = {0}
}

The .bbl file looks like this:
\begin{thebibliography}{1}
\bibitem{Cohen:2005p8599}
Randolph Cohen, Joshua Coval, and LÃÜubosÃÜ Pastor.
\newblock Judging fund managers by the company they keep.
\newblock {\em Journal of Finance}.
\end{thebibliography}

And pdflatex gives me this error:
! Package inputenc Error: Unicode char \u8:̆ not set up for use with LaTeX.

Versions:
This is BibTeX, Version 0.99c (TeX Live 2009)
This is pdfTeX, Version 3.1415926-1.40.10 (TeX Live 2009) (format=pdflatex 2009.11.7)

So, it looks like BibTeX isn't converting the characters correctly. Thoughts?

josephwright
Site Moderator
Posts: 814
Joined: Tue Jul 01, 2008 2:19 pm

Re: BibTeX UTF-8 Support

Post by josephwright »

BibTeX is an 8-bit program. Things will go wrong when BibTeX tries to process anything that is represented outside of this range. (Material which is just passed through tends to be OK.)
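
One workaround that stays within 8-bit BibTeX is to keep the .bib entry in plain ASCII and write the accents as LaTeX commands, so that no multi-byte sequences have to pass through BibTeX at all. The following is only a minimal sketch: it assumes the intended accent is a caron (\v), as described in the first post, and the file name mwe_ascii.bib is made up for the example. The extra braces around each accent command are the usual way to let BibTeX treat it as a single character when it sorts and abbreviates names.

```latex
% Sketch only: a trimmed copy of the entry from the first post, with the
% accents written as LaTeX commands (assumed here to be carons, \v).
\begin{filecontents*}{mwe_ascii.bib}
@article{Cohen:2005p8599,
  author  = {Randolph Cohen and Joshua Coval and {\v{L}}ubo{\v{s}} Pastor},
  journal = {Journal of Finance},
  title   = {Judging Fund Managers by the Company They Keep}
}
\end{filecontents*}
\documentclass[11pt,notitlepage]{article}
\usepackage[T1]{fontenc}
\usepackage[utf8]{inputenc}
\usepackage[english]{babel}
\begin{document}
I'd like to cite \cite{Cohen:2005p8599}
\bibliographystyle{plain}
\bibliography{mwe_ascii}
\end{document}
```

Run pdflatex, then bibtex, then pdflatex twice, as with the original example.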

This is a long-standing issue. One possible approach to get round it is the biber-biblatex approach. Simply adding UTF-8 support to BibTeX is unlikely as there are other issues that also need to be addressed.
Joseph Wright
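
For reference, the biber-biblatex approach mentioned above might look roughly like the sketch below. It rests on several assumptions: biber and biblatex are installed, the name is typed with precomposed UTF-8 letters (the entry pasted in the first post appears to use a combining accent after the base letter, which inputenc's utf8 option may still reject even when biber is used), and the file name mwe_utf8.bib is made up for the example.

```latex
% Sketch only: save this file as UTF-8. The entry is a trimmed copy of the
% one from the first post, with precomposed caron letters assumed.
\begin{filecontents*}{mwe_utf8.bib}
@article{Cohen:2005p8599,
  author  = {Randolph Cohen and Joshua Coval and Ľuboš Pastor},
  journal = {Journal of Finance},
  title   = {Judging Fund Managers by the Company They Keep}
}
\end{filecontents*}
\documentclass[11pt,notitlepage]{article}
\usepackage[T1]{fontenc}
\usepackage[utf8]{inputenc}
\usepackage[english]{babel}
\usepackage[backend=biber]{biblatex} % biber replaces bibtex as the backend
\addbibresource{mwe_utf8.bib}        % replaces \bibliography; extension required
\begin{document}
I'd like to cite \cite{Cohen:2005p8599}
\printbibliography                   % no \bibliographystyle with biblatex
\end{document}
```

The run sequence changes to pdflatex, biber, pdflatex, with biber called on the base name of the .tex file.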
paul
Posts: 49
Joined: Thu Apr 08, 2010 5:56 am

BibTeX UTF-8 Support

Post by paul »

cl-bibtex solves the problem...
rf
Posts: 21
Joined: Mon Jul 20, 2009 5:27 pm

BibTeX UTF-8 Support

Post by rf »

interesting -- i hadn't ever heard of cl-bibtex (though i've looked at a variety of bibtex replacements, none of which showed any sign of usability, in the way biber-biblatex does).

one of the general features of these new things is their authors can't be bothered to submit to ctan, so the program doesn't make it to faqs or distributions (though the latter may raise portability issues -- my experience of writing any lisp other than elisp dates to the 1960s, when i was a graduate student...).
pipk
Posts: 6
Joined: Sun May 01, 2011 5:01 pm

Re: BibTeX UTF-8 Support

Post by pipk »

As an addendum, note that cl-bibtex doesn't really sort with full Unicode support the way biber does (and I don't think bibtexu, a recent Unicode BibTeX, does either). Real Unicode sorting needs CLDR support for many common locales. One of the main problems with Unicode support is the sorting of bib entries, not just avoiding mangled Unicode. biber has fully customisable sorting with full Unicode support, plus a choice of case-sensitive and upper-before-lower sorting. biber+biblatex has many very powerful features which BibTeX doesn't. See the summary in the biber PDF doc for details.
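
To make the sorting controls a little more concrete, here is a preamble fragment showing how they are typically requested through biblatex package options. Treat it as a sketch: the option names are those of reasonably recent biblatex releases, the values are purely illustrative, and the biblatex manual for your installed version is the authority on what is accepted.

```latex
% Preamble fragment only; the values are examples, not recommendations.
\usepackage[
  backend=biber,
  sorting=nyt,       % sort by name, then year, then title
  sortlocale=en_US,  % locale whose collation rules biber should apply
  sortcase=true,     % sort case-sensitively
  sortupper=true     % uppercase before lowercase
]{biblatex}
```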
paul
Posts: 49
Joined: Thu Apr 08, 2010 5:56 am

BibTeX UTF-8 Support

Post by paul »

Hmm...FWIW, cl-bibtex uses a function in the global variable *generate-sort-key* to generate the object it uses as a sort key, and then a generic CMP= to compare them. The default value for *generate-sort-key* is #'identity, so it just uses strings, and CMP= on strings uses STRING=, which doesn't do "real Unicode sorting".

If your Lisp implementation has a way to sort strings in a Unicode-correct way, you can easily make CMP= use it. Alternatively, you can make *generate-sort-key* return something other than a string, and, e.g., use the FFI to do the comparison with ICU or something (there's such a thing, specific to CMUCL, included with cl-bibtex; on CMUCL, you can just do

Code:

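;; Collator handle; bound by icu:with-open-collator below.
(defvar *en-collator*)

;; Turn an entry's sort string into an ICU sort key for the open collator.
(defun generate-unicode-sort-key (string)
  (icu:ucol-get-sort-key *en-collator* string))

;; Open an "en" collator and have cl-bibtex generate its sort keys with it.
(icu:with-open-collator (*en-collator* "en")
  (setq bibtex-runtime::*generate-sort-key* #'generate-unicode-sort-key)
  ...)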
and you get proper Unicode sorting...vary "en" as appropriate.)