quoth

a command-line tool to record, retrieve, search, and categorize quotes from books

Started learning (and loving) Rust a while ago. I’m porting Peter Shirley’s Ray Tracing mini series to understand how to structure Rust programs without going object-oriented. Making my way through rosalind to solidify my knowledge of Rust’s programming constructs for everyday algorithms. So then I thought I’d look into something I’d never really worked on before: command-line tools. This is an area the language particularly excels at; there’s even a Rust CLI Working Group.

When I’m reading a book and find a noteworthy quote I usually jot it down in a text file. Every now and then someone says something that reminds me of a certain quote and then I dredge up the (long) text file and try to search for it using the author or the book or whatever I remember of the quote. It’s actually pretty easy to do but hey, a quote-taking CLI tool fits my requirements for a Rust learning project just fine. It even lets me talk about our good old raven who won’t say “Nevermore”.

Started out with a list of functionalities to implement -

  • Record quote with author, book, and optional tags
  • Auto-complete quoth commands
  • List recorded quotes / display a random quote, optionally filtered by date, author, book, or tag
  • Delete/change a quote by ID
  • Search by keyword
  • Import/export quotes to/from TSV files
  • Pretty-print quotes

I wanted to use other crates wherever I could, both to get comfortable with the crate ecosystem and as a change from rosalind where most of the algorithms are from scratch (well, except the ones where I could use petgraph, it doesn’t count). The experience was very pleasant and I think I’ll end up using this tool and (series of) post(s) as a sort of usage reference for useful crates. Most of the examples here are shortened/simplified from the full code.

Argument parsing and auto-completion with clap

Parsing command-line arguments was probably the most fun part of the project, since I could go from that list of functionalities to a fully working skeleton tool in around 10 minutes, with the magic that is clap. I chose to use the YAML format for storing the different arguments the tool can take, to keep the code-base clean. Things related to directly modifying quotes (adding, changing, and deleting) are part of the main tool (i.e. you just call quoth to add a quote or use some extra options to modify one):

name: quoth
about: "Add a new quote\nOr use a subcommand to find a quote"
before_help: "Record, retrieve, search, and categorize quotes from books"
args:

- delete:
    help: Delete quote at <INDEX>
    short: d
    long: delete
    value_name: INDEX
    takes_value: true

- change:
    help: Changes quote at <INDEX>
    short: c
    long: change
    value_name: INDEX
    takes_value: true

The other functionalities - searching, listing, displaying a random quote etc. are subcommands, e.g.:

subcommands:
- search:
    about: Finds quotes matching a pattern
    display-order: 1
    args:
    - pattern:
        short: p
        required: true
        index: 1
        value_name: PATTERN
        help: Lists quotes matching <PATTERN>
    - from:
        long: from
        value_name: DATE
        help: Quotes from <DATE>
    - to:
        long: to
        value_name: DATE
        help: Quotes till <DATE>
    - on:
        long: on
        value_name: DATE
        help: Quotes on <DATE>
    - author:
        short: a
        long: author
        value_name: AUTHOR
        takes_value: true
        help: Quotes by <AUTHOR>
        conflicts_with:
        - book
    - book:
        long: book
        short: b
        value_name: BOOK
        takes_value: true
        help: Quotes from <BOOK>
    groups:
    - date-range:
        args:
        - from
        - to
        multiple: true
        conflicts_with:
        - on

The ‘groups’ option allows you to input --from, --to or both, and with the ‘conflicts_with’ option I can specify that --on is incompatible with the other two.
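To make the rule concrete, here is a plain-Rust sketch of the check that clap ends up enforcing for the date arguments. This is not how clap implements it, and the DateFilter enum and date_filter function are names made up for illustration; they are not part of quoth or clap.

```rust
/// Which date filter the user asked for (illustrative type, not part of quoth).
#[derive(Debug, PartialEq)]
enum DateFilter<'a> {
    /// No date argument given: show everything.
    All,
    /// Only quotes recorded on one day (--on).
    On(&'a str),
    /// Quotes in a range (--from and/or --to).
    Range {
        from: Option<&'a str>,
        to: Option<&'a str>,
    },
}

/// Mirror the YAML rules: --from and --to may be combined, --on excludes both.
fn date_filter<'a>(
    from: Option<&'a str>,
    to: Option<&'a str>,
    on: Option<&'a str>,
) -> Result<DateFilter<'a>, String> {
    match (from, to, on) {
        (None, None, None) => Ok(DateFilter::All),
        (None, None, Some(day)) => Ok(DateFilter::On(day)),
        // At least one of from/to is set here, and --on is absent.
        (from, to, None) => Ok(DateFilter::Range { from, to }),
        // --on together with --from and/or --to.
        _ => Err("--on cannot be used with --from or --to".to_owned()),
    }
}
```

With clap doing this at parse time, the subcommand code never sees the conflicting combination; the user gets a usage error instead.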

With the YAML file set up, clap can now already show a neat auto-generated help:

quoth_help.png

Now for the actual parsing. The YAML file is loaded into a clap App, and get_matches() turns the command line into an ArgMatches object, which you can query with matches.is_present("option") to check whether a particular option was used. To check which (if any) subcommand was used, matches.subcommand() returns a tuple of ("subcommand_name", subcommand_matches) that you can match on. Here’s the main argument-parsing logic of quoth (where all those <x>_functions represent the code that would get called in each case):

#[macro_use]
extern crate clap;
use clap::App;

fn main() {
    let yaml = load_yaml!("quoth.yml");
    let matches = App::from_yaml(yaml).get_matches();
    if matches.is_present("delete") {
	delete_quote_function()
    } else if matches.is_present("change") {
	change_quote_function()
    } else {
	match matches.subcommand() {
	    ("list", Some(sub_matches)) => list_function(sub_matches),
	    ("search", Some(sub_matches)) => search_function(sub_matches),
	    ("random", Some(sub_matches)) => random_function(sub_matches),
	    ("config", Some(sub_matches)) => config_function(sub_matches),
	    ("parse", Some(sub_matches)) => parse_function(sub_matches),
	    _ => add_quote_function(),
	}
    }
}

I added shell-completion generation as part of the config subcommand. The documented method for generating completion files assumes you have a build_cli() function in a cli.rs file that makes and returns a clap::App, which can then be called by both the main function and the completion-making function. Unfortunately, load_yaml! is written in such a way that the App created with App::from_yaml can’t be returned from a function: the App borrows from the value load_yaml! returns, and that value is dropped by the time the function returns (here’s an issue about it). My fix was just to write it out again, which is a bit annoying because the macro needs a string literal, so now there are two ugly hard-coded string literals in different parts of the project, the horror! matches.value_of("option") lets you get the value entered for that option (e.g. the author name entered after --author or, in this case, the kind of shell you’re using).

use clap::{App, ArgMatches, Shell};
use failure::Error;
use std::io;
use crate::errors::QuothError;

fn completions<'a>(matches: &ArgMatches<'a>) -> Result<(), Error> {
    let shell = utils::get_argument_value("completions", matches)?.ok_or(
	QuothError::OutOfCheeseError {
	    message: "Argument shell not used".into(),
	},
    )?;
    let yaml = load_yaml!("../quoth.yml");
    let mut app = App::from_yaml(yaml);
    app.gen_completions_to("quoth", shell.parse::<Shell>().unwrap(), &mut io::stdout());
    Ok(())
}

pub fn get_argument_value<'a>(
    name: &str,
    matches: &'a ArgMatches<'a>,
) -> Result<Option<&'a str>, Error> {
    match matches.value_of(name) {
	Some(value) => {
	    if value.trim().is_empty() {
		Err(QuothError::NoInputError.into())
	    } else {
		Ok(Some(value.trim()))
	    }
	}
	None => Ok(None),
    }
}

After this, a couple of commands and we’re good to go with shell completions:

quoth config --completions zsh > ~/.oh-my-zsh/completions/_quoth
exec zsh

quoth_completions.gif

Taking command-line input with dialoguer

To add a quote, the user needs to input the author, book title, any tags they would like to associate with the quote, and finally the quote text itself. dialoguer lets us take in this kind of input, and there’s even an option to use an external editor like vi/emacs which is perfect for long-form quotes.

use dialoguer::{Editor, Input, theme};
use failure::Error;
use crate::errors::QuothError;

/// Takes user input from terminal, optionally has a default and optionally displays it.
pub fn user_input(
    message: &str,
    default: Option<&str>,
    show_default: bool,
) -> Result<String, Error> {
    match default {
	Some(default) => Ok(Input::with_theme(theme::ColorfulTheme::default())
	    .with_prompt(message)
	    .default(default.to_owned())
	    .show_default(show_default)
	    .interact()?
	    .trim()
	    .to_owned()),
	None => Ok(Input::<String>::with_theme(theme::ColorfulTheme::default())
	    .with_prompt(message)
	    .interact()?
	    .trim()
	    .to_owned()),
    }
}

/// Gets input from external editor, optionally displays default text in editor
pub fn external_editor_input(default: Option<&str>) -> Result<String, Error> {
    match Editor::new().edit(default.unwrap_or(""))? {
	Some(input) => Ok(input),
	None => Err(QuothError::EditorError.into()),
    }
}

quoth_quoth.gif
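For comparison, this is roughly what the prompt-with-default behaviour boils down to without dialoguer (no theming, no line editing). It’s a standard-library sketch only; prompt_line is a made-up name, and it takes the reader and writer as parameters so it can be exercised without a real terminal.

```rust
use std::io::{self, BufRead, Write};

/// Print a prompt to `out`, read one line from `input`, trim it, and fall
/// back to `default` when the user just presses Enter.
fn prompt_line<R: BufRead, W: Write>(
    input: &mut R,
    out: &mut W,
    message: &str,
    default: Option<&str>,
) -> io::Result<String> {
    match default {
        Some(d) => write!(out, "{} [{}]: ", message, d)?,
        None => write!(out, "{}: ", message)?,
    }
    out.flush()?;

    let mut line = String::new();
    input.read_line(&mut line)?;
    let answer = line.trim();
    if answer.is_empty() {
        // No input: use the default if there is one, otherwise an empty string.
        Ok(default.unwrap_or("").to_owned())
    } else {
        Ok(answer.to_owned())
    }
}
```

In a real program you would pass a locked stdin and stdout; the empty-string case is where quoth’s own code would presumably return something like its NoInputError.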

Parsing dates with chrono and chrono-english

For each quote we need to store the date on which it was recorded so that you can look back at all the quotes you read in a particular time period. chrono is a crate for dealing with dates and times, and chrono-english parses dates as we would write them in everyday use into the kind that chrono likes. This means you can do stuff like quoth list --from "last friday" --to monday which is pretty neat!

use chrono::{Date, Utc};
use chrono_english::{parse_date_string, Dialect};
use failure::Error;

/// Obtain a date from a (user-inputted) string such as "last Friday"
pub fn parse_date(date_string: &str) -> Result<Date<Utc>, Error> {
    Ok(parse_date_string(date_string, Utc::now(), Dialect::Uk)?.date())
}

/// Check if a quote was recorded within a date range
pub fn in_date_range(quote: &Quote, from_date: Date<Utc>, to_date: Date<Utc>) -> bool {
    // Both ends are inclusive: start of `from_date` up to the last second of `to_date`
    from_date.and_hms(0, 0, 0) <= quote.date && quote.date <= to_date.and_hms(23, 59, 59)
}

Serializing data with serde and serde-json

Now that we have all the information about a quote, we need a way to store it to disk and retrieve it when necessary. serde makes this infinitely easy. A #[derive(Serialize, Deserialize)] makes the Quote struct writable and readable, after which serde-json can be used to actually write and read it. I just append a new quote to the end of the quotes JSON file, which isn’t really the right way to format multiple-entry JSON files but serde reads them just fine, so it’s all good. [EDIT: 1/1/2020] - Don’t need this anymore since quotes are serialized and stored directly in sled.

#[macro_use]
extern crate serde_derive;
use chrono::{DateTime, Utc};
use failure::Error;
use path_abs::{FileRead, PathDir, PathFile};
use serde_json;

#[derive(Serialize, Deserialize, Debug)]
pub struct Quote {
    pub index: usize,
    pub book: String,
    pub author: String,
    pub tags: Vec<String>,
    pub date: DateTime<Utc>,
    pub quote: String,
}

impl Quote {
    fn write(&self, quoth_dir: &PathDir) -> Result<(), Error> {
	let quote_json = serde_json::to_string(self)?;
	let quote_file = PathFile::create(quoth_dir.join(config::QUOTE_PATH))?;
	quote_file.append_str(&quote_json)?;
	Ok(())
    }
    /// Read quotes from a JSON file and return consumable iterator
    fn read_from_file(
	json_file: &PathFile,
    ) -> Result<impl Iterator<Item = serde_json::Result<Self>>, Error> {
	Ok(serde_json::Deserializer::from_reader(FileRead::open(json_file)?).into_iter::<Self>())
    }
}

Importing/Exporting TSV files with csv

I already have a text file of quotes built up over the years and it would be pretty weird to enter them into quoth one by one. So quoth import and quoth export use the csv crate to read/make TSV files of quotes.

use csv;
use std::collections::HashMap;

fn import(quote_index: usize, filename: &PathFile) -> Result<Vec<Quote>, Error> {
    let mut reader = csv::ReaderBuilder::new()
        .delimiter(b'\t')
        .from_path(filename)?;
    let quoth_headers: HashMap<&str, i32> = [
	("TITLE", 0),
	("AUTHOR", 1),
	("TAGS", 2),
	("DATE", 3),
	("QUOTE", 4),
    ].iter().cloned().collect();
    let header_indices: Vec<_> = reader
	.headers()?
	.into_iter()
	.map(|h| quoth_headers.get(h.to_ascii_uppercase().as_str()))
	.collect();
    let mut quotes = Vec::new();
    let mut quote_index = quote_index;
    if [0, 1, 4].iter().all(|x| header_indices.contains(&Some(x))) {
	for record in reader.records() {
	    let mut quote_data = ("", "", "", Utc::now(), String::new());
	    let record = record?;
	    for (entry, index) in record.into_iter().zip(header_indices.iter()) {
		if let Some(i) = index {
		    match i {
			0 => quote_data.0 = entry,
			1 => quote_data.1 = entry,
			2 => quote_data.2 = entry,
			3 => quote_data.3 = utils::parse_date(entry)?.and_hms(0, 0, 0),
			4 => quote_data.4 = entry.into(),
			_ => {
			    return Err(QuothError::OutOfCheeseError {
				message: "Please Reinstall Universe And Reboot".into(),
			    }
			    .into())
			}
		    }
		}
	    }
	    quotes.push(Quote::new(
		quote_index,
		quote_data.0,
		quote_data.1,
		quote_data.2,
		quote_data.3,
		quote_data.4,
	    ));
	    quote_index += 1;
	}
    }
    Ok(quotes)
}

fn export(quotes: &[Quote], filename: &PathFile) -> Result<(), Error> {
    let mut writer = csv::WriterBuilder::new()
	.delimiter(b'\t')
	.from_path(PathFile::create(filename)?)?;
    for quote in quotes {
	writer.serialize(TSVQuote::from(quote))?;
    }
    writer.flush()?;
    Ok(())
}

TSVQuote is a struct with all the same stuff as Quote except the tags are in a comma-separated string, and the date is more human-readable.
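TSVQuote itself isn’t shown here, so the following is only a guess at its shape, written without any external crates. The Quote stand-in keeps its date as a plain String (the real one stores a chrono DateTime), the field names simply follow the TSV headers used in import, and TsvQuote and split_tags are illustrative names.

```rust
/// Stand-in for quoth's `Quote`; the date is a plain string here so the
/// sketch needs no external crates.
struct Quote {
    book: String,
    author: String,
    tags: Vec<String>,
    date: String,
    quote: String,
}

/// Flat row for one TSV line: same data, but tags collapsed into one cell.
#[derive(Debug, PartialEq)]
struct TsvQuote {
    title: String,
    author: String,
    tags: String,
    date: String,
    quote: String,
}

impl From<&Quote> for TsvQuote {
    fn from(quote: &Quote) -> Self {
        TsvQuote {
            title: quote.book.clone(),
            author: quote.author.clone(),
            // Tags become a single comma-separated cell.
            tags: quote.tags.join(","),
            date: quote.date.clone(),
            quote: quote.quote.clone(),
        }
    }
}

/// Split a TAGS cell back into individual tags, ignoring blanks.
fn split_tags(cell: &str) -> Vec<String> {
    cell.split(',')
        .map(str::trim)
        .filter(|tag| !tag.is_empty())
        .map(str::to_owned)
        .collect()
}
```

One consequence of this layout is that a tag containing a comma won’t survive an export/import round trip.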

Key-value databases with sled

Just to make it harder for myself I decided to store key-value databases linking authors to quotes, books to quotes, tags to quotes, and authors to books. Right now it doesn’t really make a discernible performance difference (unless you have a million quotes or something I guess?) but sled looked cool and I have some ideas for improving on this later.

Every time you add a quote, the quote’s index gets added to the (semicolon-separated byte array representation of a) list stored with the sled trees of the corresponding author, book and tags. If it’s a new book then the book gets added to the book list of the corresponding author. This last thing is so that, later, I can see how many books I’ve read from a particular author. [EDIT: 1/1/2020] - Turns out I’d really made it difficult for myself. sled now has databases with multiple trees inside them so I could use a single database to store all the links. No need for an extra Metadata struct any more either. So instead of the get_tree function, I could use self.db.open_tree("author_quote")? etc.

use sled;

/// If key exists, add value to existing values - join with a semicolon
fn merge_index(_key: &[u8], old_indices: Option<&[u8]>, new_index: &[u8]) -> Option<Vec<u8>> {
    let mut ret = old_indices
        .map(|old| old.to_vec())
        .unwrap_or_else(|| vec![]);
    // Only add the separator when there is an existing value to join onto
    if !ret.is_empty() {
        ret.extend_from_slice(&[config::SEMICOLON]);
    }
    ret.extend_from_slice(new_index);
    Some(ret)
}

/// Retrieve a `sled` tree from a given path
fn get_tree(path: &PathDir) -> Result<sled::Tree, Error> {
    let config = sled::ConfigBuilder::new()
	.path(path)
	.merge_operator(merge_index)
	.build();
    let tree = sled::Tree::start(config)?;
    Ok(tree)
}

/// Add a book to the trees and change metadata accordingly
fn add_book(
    trees: &mut Trees,
    author_key: &[u8],
    book_key: &[u8],
    index_key: &[u8],
) -> Result<(), Error> {
    trees
	.author_book_tree
	.merge(author_key.to_vec(), book_key.to_vec())?;
    trees
	.book_quote_tree
	.set(book_key.to_vec(), index_key.to_vec())?;
    trees
	.book_author_tree
	.set(book_key.to_vec(), author_key.to_vec())?;
    trees.metadata.increment_books();
    Ok(())
}

Then when you want to filter quotes by author, book, or tag, the corresponding sled databases are used to figure out which quote IDs to look at. In practice this still means streaming the quotes JSON file and checking ID by ID but since they’re inserted into the databases in sorted order we can break once the last ID is found. Would be nice to have a constant-time lookup to any line in the file but I couldn’t find a simple way to do this (e.g. csv-index may be nice but the index has to be regenerated at each update, including just appending a line, which sounds too messy for something this simple).

[EDIT: 1/1/2020] This part changed drastically when I figured out I could serialize quotes with bincode. So instead of the retrieve_many function, I had these defined on the Quote struct:

impl Quote {
    pub fn to_bytes(&self) -> Result<Vec<u8>, Error> {
	Ok(bincode::serialize(&self)?)
    }
    pub fn from_bytes(bytes: &[u8]) -> Result<Self, Error> {
	Ok(bincode::deserialize(bytes)?)
    }
}

And a sled quote tree that mapped quote indices to serialized quotes. This simplified a lot of the code.

/// Retrieve quote indices from a given book
pub fn get_book_quotes(trees: &Trees, book: &str) -> Result<Vec<usize>, Error> {
    utils::split_indices_usize(
	&trees
	    .book_quote_tree
	    .get(&utils::camel_case_phrase(book).as_bytes())?
	    .ok_or(QuothError::BookNotFound {
		book: book.to_owned(),
	    })?,
    )
}

/// Retrieve many quotes given indices
pub fn retrieve_many(indices: &[usize], quoth_dir: &PathDir) -> Result<Vec<Quote>, Error> {
    let mut indices = indices.to_vec().into_iter().peekable();
    let mut quote_stream = Quote::read(quoth_dir)?;
    let mut quotes = Vec::new();
    // Copy the index out of the peeked reference so it can be compared and moved
    while let Some(&index) = indices.peek() {
        let quote = quote_stream
            .next()
            .ok_or(QuothError::QuoteNotFound { index })??;
        if quote.index == index {
	    quotes.push(quote);
	    indices.next().unwrap();
	}
    }
    Ok(quotes)
}
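utils::split_indices_usize is used above but not shown. Here is a minimal sketch of how such a helper might decode the semicolon-separated list, assuming the indices are stored as ASCII decimal digits and that config::SEMICOLON is the byte b';'. Both are assumptions, and the String error type is a simplification of the failure-based Error used elsewhere.

```rust
/// Assumed value of `config::SEMICOLON`.
const SEMICOLON: u8 = b';';

/// Decode b"3;17;42" into [3, 17, 42]. Empty segments (e.g. from a leading
/// separator) are skipped; anything that is not a number is an error.
fn split_indices_usize(bytes: &[u8]) -> Result<Vec<usize>, String> {
    bytes
        .split(|&byte| byte == SEMICOLON)
        .filter(|chunk| !chunk.is_empty())
        .map(|chunk| {
            let text = std::str::from_utf8(chunk).map_err(|e| e.to_string())?;
            text.parse::<usize>().map_err(|e| e.to_string())
        })
        // Collecting into a Result stops at the first failed segment.
        .collect()
}
```

Because indices are merged in insertion order, the decoded list comes out sorted, which is what lets retrieve_many stop streaming once the last index has been found.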

Pretty-print with console and textwrap

The next task is to display quotes in the terminal with the magnificence that they deserve. console lets you add colors and formatting to terminal text, and textwrap does exactly what it says on the tin, along with conveniently giving you the current width of the terminal.

extern crate console;
extern crate textwrap;
use console::{pad_str, style, Alignment};
use textwrap::{termwidth, Wrapper};

pub fn pretty_print(quote: &Quote) {
    let width = termwidth() - 4;
    let wrapper = Wrapper::new(width)
	.initial_indent("  ")
	.subsequent_indent("  ");
    let quote_text = format!("\"{}\"", &quote.quote);
    println!("\n{}", wrapper.fill(&quote_text));
    println!(
	"{}",
	style(pad_str(
	    &format!("--#{}--", quote.index),
	    width,
	    Alignment::Center,
	    None
	))
	.dim()
    );
    println!(
	"{}",
	style(pad_str(&quote.author, width, Alignment::Right, None)).magenta()
    );
    println!(
	"{}",
	style(pad_str(&quote.book, width, Alignment::Right, None))
	    .cyan()
	    .italic()
    );
}

Here’s how it looks, I even added a tiny raven-looking unicode symbol:

quoth_pretty_print.png
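To give a feel for what the two crates are doing for me here, this is a rough standard-library approximation of the indent-and-wrap and right-align steps. It is not how textwrap or console work internally: it counts chars rather than display width, never breaks long words, and knows nothing about ANSI colour codes. wrap_indented and pad_right are made-up names.

```rust
/// Greedy word wrap: every line starts with `indent` and, where possible,
/// stays within `width` columns (the indent counts towards the width).
fn wrap_indented(text: &str, width: usize, indent: &str) -> Vec<String> {
    let mut lines = Vec::new();
    let mut current = String::from(indent);
    for word in text.split_whitespace() {
        let has_words = current.len() > indent.len();
        // The +1 accounts for the space that would go before `word`.
        if has_words && current.chars().count() + 1 + word.chars().count() > width {
            lines.push(current);
            current = String::from(indent);
        }
        if current.len() > indent.len() {
            current.push(' ');
        }
        current.push_str(word);
    }
    // Keep the last line only if it holds at least one word.
    if current.len() > indent.len() {
        lines.push(current);
    }
    lines
}

/// Right-align `text` in a field of `width` columns; longer text is left as-is.
fn pad_right(text: &str, width: usize) -> String {
    format!("{:>width$}", text, width = width)
}
```

The char-counting shortcut is exactly where the real crates earn their keep: wide characters and styled text would throw this version off.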

Search with regex

I ended up using regex to implement the search feature, though later on I’d love to figure out how to use skim, a fuzzy search crate that also highlights the matched text. Right now it’s a pretty basic multi-word non-contiguous search, could use some fancification.

extern crate regex;
use regex::Regex;

fn search<'a>(quotes: Vec<Quote>, matches: &ArgMatches<'a>) -> Result<(), Error> {
    let pattern = utils::get_argument_value("pattern", matches)?.unwrap();
    // i - case-insensitive
    // m - multi-line mode
    // s - allow . to match \n
    let pattern = Regex::new(&format!(
	r"(?ims){}",
	pattern.split_whitespace().collect::<Vec<_>>().join(".+")
    ))?;
    for quote in &quotes {
	if pattern.is_match(&quote.to_string()) {
	    quote.pretty_print();
	}
    }
    Ok(())
}
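The pattern-building step is easy to look at on its own. This sketch only builds the regex source string, so it runs without the regex crate; build_search_pattern is an illustrative name, not a function in quoth.

```rust
/// Build the regex source for a multi-word, non-contiguous search:
/// the words must appear in order, with at least one character between them.
fn build_search_pattern(query: &str) -> String {
    let words: Vec<&str> = query.split_whitespace().collect();
    // (?ims): case-insensitive, multi-line, and `.` also matches newlines.
    format!("(?ims){}", words.join(".+"))
}
```

Two things follow from this: word order matters (searching for "raven quoth" won’t find a quote where quoth comes before raven), and the words are passed through unescaped, so regex metacharacters in the query keep their regex meaning. The regex crate’s escape function could be applied to each word if a literal search is wanted.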

Error management with failure

I started out using error-chain for error management but switched to failure after seeing this post. I must say, error management in Rust is a breeze. Coming from mostly using Python, where you usually ignore these kinds of things, I was expecting it to be much harder. I made an enum of error types deriving Fail for all the possible quoth-related errors that could occur, including a catch-all for the ones that shouldn’t happen at all (I used this with a custom message instead of unwrap-ping trivial things). All errors, thrown by quoth or by other libraries, are caught by a Result<x, Error> and ?s everywhere. [EDIT: 1/1/2020] - I decided to change again to anyhow and thiserror since they seem the most intuitive at the moment. This (slightly messy) commit shows all the changes, but essentially it’s the same as the code below except that you derive thiserror::Error instead of Fail, and instead of #[fail(display = "I don't know who {} is.", author)] you would write #[error("I don't know who {author:?} is.")].

#[macro_use]
extern crate failure;
use failure::Error;

#[derive(Debug, Fail)]
pub enum QuothError {
    #[fail(display = "I don't know who {} is.", author)]
    AuthorNotFound { author: String },
    #[fail(display = "You haven't written that quote: {}.", index)]
    QuoteNotFound { index: usize },
    #[fail(display = "I haven't read {} yet.", book)]
    BookNotFound { book: String },
    #[fail(display = "You haven't tagged anything as {} yet.", tag)]
    TagNotFound { tag: String },
    /// Thrown when no text is returned from an external editor
    #[fail(display = "Your editor of choice didn't work.")]
    EditorError,
    /// Thrown when explicit Y not received from user for destructive things
    #[fail(display = "{}\nDoing nothing.", message)]
    DoingNothing { message: String },
    /// Catch-all for stuff that should never happen
    #[fail(display = "{}\nRedo from start.", message)]
    OutOfCheeseError { message: String },
}
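For a sense of what the derive saves you from writing, here is a hand-rolled, standard-library-only version of two of the variants, plus a function where ? funnels both a library error and a quoth error into one return type. Box<dyn Error> stands in for failure’s Error here, and parse_index is a made-up helper.

```rust
use std::error::Error;
use std::fmt;

#[derive(Debug)]
enum QuothError {
    AuthorNotFound { author: String },
    QuoteNotFound { index: usize },
}

// Roughly the part that #[fail(display = "...")] generates.
impl fmt::Display for QuothError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            QuothError::AuthorNotFound { author } => {
                write!(f, "I don't know who {} is.", author)
            }
            QuothError::QuoteNotFound { index } => {
                write!(f, "You haven't written that quote: {}.", index)
            }
        }
    }
}

impl Error for QuothError {}

/// Parse a user-supplied quote index and check that it exists.
fn parse_index(text: &str, total_quotes: usize) -> Result<usize, Box<dyn Error>> {
    // A ParseIntError from the standard library is converted by `?`...
    let index: usize = text.trim().parse()?;
    if index < total_quotes {
        Ok(index)
    } else {
        // ...and our own error goes into the same Result via `.into()`.
        Err(QuothError::QuoteNotFound { index }.into())
    }
}
```

The calling code only ever sees one error type and can print it with its Display message, which is the whole appeal.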

Path handling with dirs and path-abs

I’m not entirely sure if this is the community-blessed way to do path handling and configuration file management, especially since I haven’t tested quoth on other OSs yet, so this section (and corresponding code) may be rewritten.

So, quoth needs a directory to store all its data and it needs to know where this directory is. For now I have a config file at $HOME/quoth.txt with a single line containing the location of the quoth data directory (which defaults to $HOME/.quoth). Later on (if (when!) I extend this project) this config file could store a lot more configurable options (e.g. setting a default external editor, separating fiction and non-fiction quotes into different quoth invocations, etc.). dirs is a teeny-tiny crate that lets you find a user’s $HOME directory on Linux, Windows and macOS. path-abs is a crate for path and file operations. Both crates are pretty straightforward to use and quite well documented but here’s an example anyway.

/// Makes config file (default ~/quoth.txt) with a single line containing the location of the quoth directory (default ~/.quoth)
fn make_quoth_config_file() -> Result<(), Error> {
    match dirs::home_dir() {
	Some(home_dir) => {
	    let config_file = PathFile::create(PathDir::new(&home_dir)?.join(config::CONFIG_PATH))?;
	    config_file.write_str(
		&PathDir::new(home_dir)?
		    .join(config::QUOTH_DIR_DEFAULT)
		    .to_str().unwrap(),
	    )?;
	    Ok(())
	}
	None => Err(QuothError::Homeless.into()),
    }
}

A(n incomplete) cheatsheet of sorts:

  • dirs::home_dir() tries to find the $HOME of a user, returning None if it can’t.
  • PathFile::create returns an existing file or makes and returns an empty new one as a PathFile. PathDir::create is similar but for directories. Neither is recursive, so if you give something like PathDir::create("nonexistent-dir-1/nonexistent-dir-2") then you’ll get an Error; recursive creation can be done with the create_all variant.
  • PathDir::new (and the corresponding PathFile::new) makes a new PathDir (or PathFile) object from a path string but errors out if the actual directory (or file) doesn’t already exist.
  • PathDir::join is for joining paths; it doesn’t actually check if the newly constructed file or directory exists.
  • PathFile::write_str is quite destructive, so be careful: it writes a &str to the PathFile’s file, truncating everything in the file if it exists and creating a new file if it doesn’t. [EDIT: 1/1/2020] - You need PathOps for performing operations (like reading and writing) on files now.
  • PathFile::read_string returns a String of the entire file’s contents.
  • A PathAbs represents a possibly nonexistent file or directory (while PathFile and PathDir have to exist or are made during creation).
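If you’d rather not pull in path-abs, the same config round trip can be sketched with just the standard library. The constants mirror the defaults described above (quoth.txt and .quoth), but the function names and the fall-back-to-default behaviour when the config file is missing are my assumptions, and the home directory is passed in rather than looked up with dirs.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Assumed values of `config::CONFIG_PATH` and `config::QUOTH_DIR_DEFAULT`.
const CONFIG_FILE: &str = "quoth.txt";
const DEFAULT_DIR: &str = ".quoth";

/// Write the default data-directory location into `<home>/quoth.txt`.
fn write_default_config(home: &Path) -> io::Result<PathBuf> {
    let data_dir = home.join(DEFAULT_DIR);
    // fs::write creates the file or truncates an existing one.
    fs::write(home.join(CONFIG_FILE), data_dir.to_string_lossy().as_bytes())?;
    Ok(data_dir)
}

/// Read the data directory from the config file, falling back to the
/// default location when the file does not exist yet.
fn read_data_dir(home: &Path) -> io::Result<PathBuf> {
    match fs::read_to_string(home.join(CONFIG_FILE)) {
        Ok(contents) => Ok(PathBuf::from(contents.trim())),
        Err(e) if e.kind() == io::ErrorKind::NotFound => Ok(home.join(DEFAULT_DIR)),
        Err(e) => Err(e),
    }
}
```

What you lose compared to path-abs is the type-level distinction between paths that are known to exist and ones that might not.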

Wrapping up

Good enough start for now, though there’s definitely some more polishing required. One thing that would be handy is if you could auto-complete book and author names based on what you’ve entered before. No idea how to do this though. I plan to keep the examples in this post and the code updated to reflect any changes in the crates/ecosystem, including changing the crate itself if a new community favorite emerges.

I’d like to stretch the example even further and look into plotting graphs of books read per month etc. (tui looks interesting). [EDIT: 1/1/2020] - Did this now! See the stats function in quoth/mod.rs.

Maybe also some basic NLP, with word-clouds and recommending books by comparing your quotes to a database of quotes? Good excuse to search for some more Rust crates.


