Whitespaces and zeros before numbers are trimmed unexpectedly #440

@YDX-2147483647

Page numbers, serial numbers, and similar fields may have leading zeros (e.g., 011) or whitespace before the number (e.g., GB/T 7714).

Sometimes it is important to keep these zeros and this whitespace:

  • 011 might mean the 11th file on shelf 0, but 11 might mean the 1st file on shelf 1.
  • The T in GB/T 7714 stands for "recommended"; it is not part of the number 7714.

However, Hayagriva 0.9.1 / Typst v0.14.2 ignores whitespace and leading zeros when parsing numeric values, so they are trimmed unexpectedly.

Example

  • Desired output: GB/X 03792
  • Actual output: GB/X3792
#let bib = ```yaml
key:
  type: report
  serial-number: GB/X 03792
```.text

#let csl = ```xml
<?xml version='1.0' encoding='utf-8'?>
<style xmlns="http://purl.org/net/xbiblio/csl" class="in-text" version="1.0">
  <info>
    <title/>
    <id/>
  </info>
  <citation>
    <layout >
      <text value="irrelevant"/>
    </layout>
  </citation>
  <bibliography >
    <layout>
      <text variable="number"/>
    </layout>
  </bibliography>
</style>
```.text

// #set text(lang: "zh")
#bibliography(
  bytes(bib),
  // style: "gb-7714-2015-numeric",
  style: bytes(csl),
  full: true,
)

The built-in gb-7714-2015-numeric style is also affected.

Current logic

Numeric values are parsed into Numeric { prefix, value, suffix }.

/// A numeric value that can be pluralized.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub struct Numeric {
    /// The numeric value.
    pub value: NumericValue,
    /// A string that is prepended to the value.
    pub prefix: Option<Box<String>>,
    /// A string that is appended to the value.
    pub suffix: Option<Box<String>>,
}

The value is stored as an i32, so leading zeros cannot be represented and are lost.

/// The numeric value.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub enum NumericValue {
    /// A single number.
    Number(i32),
    /// A set of numbers.
    Set(Vec<(i32, Option<NumericDelimiter>)>),
}
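
To make the loss concrete, here is a minimal standalone sketch using only the standard library. It parses a string into an `i32` and formats it back, which is roughly what happens to a value stored as `Number(i32)`. The helper name `round_trip` is hypothetical and is not part of Hayagriva.

```rust
/// Parse a decimal string into an `i32` and format it back.
/// `round_trip` is a hypothetical helper for illustration only;
/// it is not part of Hayagriva.
fn round_trip(text: &str) -> Option<String> {
    // `str::parse::<i32>` accepts leading zeros but does not remember them.
    text.parse::<i32>().ok().map(|n| n.to_string())
}
```

Both `round_trip("011")` and `round_trip("11")` return `Some("11")`, so the two shelf labels from the description become indistinguishable once they are stored as an integer.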

When parsing a numeric value, Hayagriva first matches a non-whitespace prefix, then eats any whitespace before the number (num). Therefore, whitespace between the prefix and the value is discarded.

impl FromStr for Numeric {
    type Err = NumericError;
    fn from_str(value: &str) -> Result<Self, Self::Err> {
        let mut s = Scanner::new(value);
        let prefix =
            s.eat_while(|c: char| !c.is_numeric() && !c.is_whitespace() && c != '-');
        let value = number(&mut s).ok_or(NumericError::NoNumber)?;

fn number(s: &mut Scanner) -> Option<i32> {
    s.eat_whitespace();
    let negative = s.eat_if('-');
    let num = s.eat_while(|c: char| c.is_numeric());
    if num.is_empty() {
        return None;
    }
    num.parse::<i32>().ok().map(|n| if negative { -n } else { n })
}
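
The two excerpts above can be approximated without Hayagriva's `Scanner` to show where the information is dropped. The following is a simplified sketch using only the standard library, not the actual implementation: `parse_serial` and `render` are hypothetical names, suffix handling and number sets are left out, and `render` assumes the output is simply the prefix followed by the integer.

```rust
/// Simplified, std-only stand-in for the parsing steps quoted above.
/// `parse_serial` is a hypothetical name; it is not Hayagriva's API.
fn parse_serial(input: &str) -> Option<(String, i32)> {
    // Step 1: the prefix is everything before the first digit,
    // whitespace character, or '-'.
    let prefix_end = input
        .find(|c: char| c.is_numeric() || c.is_whitespace() || c == '-')
        .unwrap_or(input.len());
    let (prefix, rest) = input.split_at(prefix_end);

    // Step 2: whitespace between prefix and number is skipped and never stored.
    let rest = rest.trim_start();

    // Step 3: an optional minus sign, then the run of digits.
    let (negative, rest) = match rest.strip_prefix('-') {
        Some(r) => (true, r),
        None => (false, rest),
    };
    let digits_end = rest
        .find(|c: char| !c.is_numeric())
        .unwrap_or(rest.len());
    let digits = &rest[..digits_end];
    if digits.is_empty() {
        return None;
    }

    // Step 4: converting to `i32` discards any leading zeros.
    let value = digits.parse::<i32>().ok()?;
    Some((prefix.to_string(), if negative { -value } else { value }))
}

/// Join prefix and value, assuming nothing else was kept (hypothetical helper).
fn render(parsed: &(String, i32)) -> String {
    format!("{}{}", parsed.0, parsed.1)
}
```

With this sketch, `parse_serial("GB/X 03792")` yields the prefix `GB/X` and the value `3792`, and `render` produces `GB/X3792`, matching the actual output reported in the example.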

Links

Importance

According to the survey Hayagriva对GB/T 7714—2015的支持情况 ("Hayagriva's support for GB/T 7714—2015"), this issue accounts for 15% (code_space, whitespace) + 8% (num, leading zero) = 23% of all errors.
(Sorry, the survey is not currently available in English.)

Note

That survey is based on CSL-JSON. It specifies "type": "standard" in CSL-JSON to match type="standard" in the CSL style.

However, according to the following line, it is currently impossible to match type="standard" from *.bib or *.yaml input.

Kind::Regulation | Kind::Standard | Kind::Treaty => false,
