doi2bib
is a simple Python script I wrote to automatically retrieve bibtex information for a given DOI. The script queries Crossref to obtain the bibtex. The script can also handle pre-prints published on Arxiv. Here is the content of doi2bib
along with some explaination of what the script does.
doi2bib
#!/usr/bin/env python3
import sys
import argparse
from urllib import request, error
def get_bibtex(doi,ispreprint):
if ispreprint:
= f"https://arxiv.org/bibtex/{doi}"
url else:
= f"https://api.crossref.org/works/{doi}/transform/application/x-bibtex"
url = request.Request(url)
req
try:
with request.urlopen(req) as response:
return response.read().decode()
except error.HTTPError as e:
return f"HTTP Error: {e.code}"
except error.URLError as e:
return f"URL Error: {e.reason}"
if __name__ == "__main__":
= argparse.ArgumentParser(
parser ="Convert DOI or Arxiv ID to Bibtex"
description
)
parser.add_argument("doi",
help="DOI or Arxiv ID of paper"
)
parser.add_argument("-p",
"--preprint",
help="Treat provided DOI as Arxiv ID",
="store_true",
action=False,
default
)= parser.parse_args()
args
= get_bibtex(args.doi, args.preprint)
bibtex print(bibtex)
- 1
- The script must be provided with a DOI. This can also be an Arxiv ID.
- 2
-
The
-p
or--preprint
flag can be specified to indicate that the provided DOI is an Arxiv ID. - 3
-
The Crossref API is queried with the provided DOI to retrieve the bibtex information. With the
--preprint
flag, the Arxiv API is used.
I pipe the results through the bib-tool CLI, to format the text and generate a unique key. I specify the following key format in my .bibtoolrsc
file.
.bibtoolrsc
print.use.tab=off
fmt.et.al=""
key.format="%-1n(author)%4d(year)%-T(title)"
By default, bibtool uses tabs for indentation. I turn this off. Bibtool adds “.ea” to the author name to indicate “and others”. I prefer to just have the last name of the first author in the key, so I set it to an empty string. I set the format of the key to the last name of the first author, followed by the year of publication and the first meaningful word from the title.
Here is the script in action, I use one of my own publications as an example.
$ doi2bib 10.1145/3522664.3528621 |bibtool -k
@InProceedings{ shome2022data,
series = {CAIN ’22},
title = {Data smells in public datasets},
url = {http://dx.doi.org/10.1145/3522664.3528621},
doi = {10.1145/3522664.3528621},
booktitle = {Proceedings of the 1st International Conference on AI
Engineering: Software Engineering for AI},
publisher = {ACM},
author = {Shome, Arumoy and Cruz, Luís and van Deursen, Arie},
year = {2022},
month = may,
collection = {CAIN ’22}
}
And here is another example using an Arxiv ID (again, one of my own).
$ doi2bib --preprint 2305.04988 |bibtool -k
@Misc{ shome2023towards,
title = {Towards Understanding Machine Learning Testing in
Practise},
author = {Arumoy Shome and Luis Cruz and Arie van Deursen},
year = {2023},
eprint = {2305.04988},
archiveprefix = {arXiv},
primaryclass = {cs.SE}
}