Ask HN: A tool for writing English that checks “popularity” of used sentences?
126 points by twa927 on Oct 20, 2016 | 48 comments
As a non-native English speaker, I find that the best way to check grammar is to google whole parts of sentences (in apostrophes - exact match). That's because there are multiple exceptions to language rules, and some wording can just feel "not right" despite being correct.

Is there a tool that does something like this automatically?

I thought about writing such a tool myself, but it seems there are no good-quality, free search engine APIs that allow many calls. Or maybe there are some open APIs to book dumps or something similar?




You might like to check out Writefull: http://writefullapp.com/


Wow, perfect. I've been looking for something like this for years.


Great app. Are you working on Android version as well?


Not my app, I'm afraid :) I just found out about it through my company.


AFAIK, an ex-Googler had that very same itch and he founded http://www.linguee.com to try to solve it.


I've found Linguee very useful for English -> French translation.

I think it draws heavily from the huge corpus of professionally translated EU regulations and documents.


Agreed, it has been extremely useful to me too for translating various Hungarian technical terms into English. Naming classes and database tables is much easier this way, because most often the right term cannot be found in even the most detailed technical dictionaries, but Linguee somehow just knows it. And it also shows the context, so you can be very confident in your choice.


There are quite a few Ngram datasets available https://www.google.com/search?q=download+n-gram+dataset

... these are almost certainly used in many spelling and grammar checkers (to help with cases where the same spelled word is used in different contexts).

http://www.aclweb.org/anthology/W12-0304
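
As a rough Python sketch of how a checker might use such counts (the counts dict, threshold, and function name here are purely illustrative; in practice you would load real frequencies from one of those datasets):

  from typing import Dict, List, Tuple

  def suspicious_trigrams(sentence: str, counts: Dict[str, int],
                          threshold: int = 40) -> List[Tuple[str, int]]:
      """Return (trigram, frequency) pairs rarer than `threshold` in the corpus."""
      words = sentence.lower().split()
      flagged = []
      for i in range(len(words) - 2):
          tri = " ".join(words[i:i + 3])
          freq = counts.get(tri, 0)
          if freq < threshold:
              flagged.append((tri, freq))
      return flagged

  # Toy counts: "despite of the" should surface as a rare (suspicious) trigram.
  demo_counts = {"despite the rain": 5200, "despite of the": 12}
  print(suspicious_trigrams("despite of the rain we left early", demo_counts))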


Yes, I remember trying to use the Google Books Ngram Dataset [1], but it was too tedious for me to set up and maintain a server with the data just for a quick-and-dirty tool (that's why I asked for a ready-made API). Still, using it is probably a nice idea for a more ambitious side project or even a startup.

EDIT: Actually, I would happily pay for a tool that implements the idea. Grammarly has paid plans, but $30/month is too steep for my type of usage, and the grammar checks it performs are not exactly what I need (which is knowing what real people in real situations actually use).

[1] http://storage.googleapis.com/books/ngrams/books/datasetsv2....


We (foxtype) actually have a dev tool that does exactly this.

If we publish it as an online tool, do you think people would find it useful?

We have multiple corpora, some neural-network language models, etc.


LanguageTool has limited support for using Google's n-gram data to find spelling errors. It only uses 3-grams, and only for a list of commonly confused words. I'm not aware of any Free Software that does better.

http://wiki.languagetool.org/finding-errors-using-n-gram-dat...
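
Roughly how that kind of confusion-set check works, as I understand it - a toy Python sketch (the counts and names here are made up, not LanguageTool's actual code):

  from typing import Dict, List

  def best_variant(left: str, right: str, confusion_set: List[str],
                   counts: Dict[str, int]) -> str:
      """Pick the confusion-set member whose surrounding trigram is most frequent."""
      scored = [(counts.get(f"{left} {w} {right}", 0), w) for w in confusion_set]
      return max(scored)[1]

  counts = {"over there now": 900, "over their now": 15, "over they're now": 3}
  print(best_variant("over", "now", ["their", "there", "they're"], counts))
  # -> "there"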


I wonder if there is a tool like this:

1. You enter a sentence

2. It gives out 5 different ways to say the exact same thing.

Such a tool would help not only ESL speakers but also native speakers looking for more relaxed or formal versions of a sentence.


We're building a tool that does something similar, but for email. Currently we're targeting cold sales emails: the idea is that you enter a recipient's email address, we aggregate data about them, and we surface relevant, personable sentences that you can use in the email. You'll also be able to change the tone of these sentences (funny, professional, casual, etc.).

Learn more: http://emailfox.co


That's ... disgustingly creepy on first glance.


How do you mean? A tool to help you relate to the folks you're cold-emailing? It seems like a way to find common ground.

Maybe I am misreading you, but perhaps you think the effect is a dishonest one? That because the language EmailFox helps you find isn't the phrasing you improvised at first blush, it's not your copy?

If the UX is strong, think of the wonders this could do for non-fiction writing!


Something like this will only work if your clients don't screw it up by spamming the same targets over and over.

If you can somehow get it through to your clients that they should only ever spam one target once, then hell, I'm for it.


Yup, agreed. I think over time we'll also try to compensate for this by building an ML model from all the emails being sent, to provide more 'fuzziness'.


I don't want to sound harsh, but the copy on your website does not inspire much confidence in a tool that I am supposed to use for writing.


You're not being harsh at all. I'd appreciate feedback: which parts do you think we should improve?

I'll admit we kind of rushed the sign up page as we've been busy building the product.


It's a pretty hard problem to solve for the general case. It's actually quite rare for sentences to not be unique. Example: https://www.google.de/search?rls=en&q=%22It+gives+out+5+diff...


Yet both "It gives out" and "different ways to say the exact same thing" give many thousands of results. So "5" should be recognized as a template variable or the meaning should be combined from popular fragments.
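
A quick Python illustration of the template-variable idea (the <num> placeholder is just my choice):

  import re

  def normalise(phrase: str) -> str:
      """Replace standalone numerals with a placeholder before the frequency lookup."""
      return re.sub(r"\b\d+\b", "<num>", phrase.lower())

  print(normalise("It gives out 5 different ways to say the exact same thing"))
  # -> "it gives out <num> different ways to say the exact same thing"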


Perhaps one way to make such a tool would be to try replacing each word with one of its synonyms, but that wouldn't change the form of the sentence.

I thought at first that it might be worth using Google Translate (or Bing, etc.) to translate a sentence from English into another language and back again. I was surprised to find that most of the results are grammatically incorrect, and at least one, while almost grammatically correct, has a quite different meaning (Eng. -> Latin -> Eng.).

And unfortunately there wasn't much variation to be seen either.

It seems that the verb "to be" is easily mangled, and some languages seem to require a definite article where the original English does not.

Original English: It's actually quite rare for sentences to not be unique.

Spanish: En realidad es bastante raro para frases para no ser único.

Back to English: It's actually quite rare for phrases not to be unique.

German: Es ist eigentlich ziemlich selten für Sätze nicht einzigartig sein.

Back to English: It's actually quite rare for phrases not to be unique.

Japanese: 文章は一意ではないことは実際には非常にまれです。

Back to English: Sentence is very rare in practice is not unique.

French: Il est en fait assez rare pour les phrases à ne pas être unique.

Back to English: It is actually quite rare for phrases not to be unique.

Polish: To rzeczywiście dość rzadko zdania nie być unikalna.

Back to English: It's actually quite rare sentence not be unique.

Hebrew: זה בעצם די נדיר משפטים לא להיות ייחודיים.

Back to English: It's actually quite rare sentences not be unique.

Italian: In realtà è abbastanza raro per le frasi di non essere unico.

Back to English: It's actually quite rare for the sentences not to be unique.

Latin (but which era?): Suus 'vere non esse unica sententia admodum rarum.

Back to English: It's really not very rare, is a single sentence.

Romanian (which some say is similar to Latin!): Este de fapt destul de rar pentru fraze să nu fie unic.

Back to English: It's actually quite rare for phrases is not unique.
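
For anyone who wants to repeat the experiment, here's a rough script, assuming the unofficial googletrans Python package still behaves the same way (the service behind it changes, so results will drift):

  from googletrans import Translator  # pip install googletrans

  def round_trip(sentence: str, via: str) -> str:
      """Translate English -> `via` -> English and return the result."""
      t = Translator()
      forward = t.translate(sentence, src="en", dest=via).text
      return t.translate(forward, src=via, dest="en").text

  original = "It's actually quite rare for sentences to not be unique."
  for lang in ["es", "de", "ja", "fr", "pl"]:
      print(lang, "->", round_trip(original, lang))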


Check out http://foxtype.com - it does some of that, but with more grammar-like heuristics such as conciseness and complexity.

On a side note, I'm part of a team working on http://emailfox.co, which will suggest 'Smart Sentences' for you when composing an email, based on the recipient, allowing you to write personal, relevant emails faster.


People on mobile: it looks like foxtype is already a product; it's a Chrome extension. [0]

On mobile it just asks for your email so they can tell you when they launch (on mobile?). It's horrible to have landing pages like that. Absolutely useless product page on mobile.

[0] https://chrome.google.com/webstore/detail/foxtype/npcfiblhbj...


I like the idea of composing sentences from high-level pre-checked blocks (the new Google Assistant seems to do this too). But this doesn't fit my use case, because when I pasted a sentence with a grammar error, it didn't tell me there was an error.


Try http://www.netspeak.org/?locale=en - it seems to do some of the things you asked about. It is implemented on top of n-gram corpora.


It looks helpful, but I would like to paste a whole document and be told which fragments look suspicious because of low popularity. The site requires you to insert wildcards and completes only a single n-gram.


I did not get that from your question. This takes parts of a sentence (I think at most 5-grams) and a few operators, e.g.:

If you ask for words similar to 'much' in a fragment:

  'and knows ... #much ...'

  =>
  
  and knows a lot, 3.500, 65,2%
  and knows a lot about, 2.100, 39,5%
  and knows a great deal, 690, 12,6%
  and knows much, 630, 11,5%
  and knows lots, 380, 7,1%
  and knows lots of, 300, 5,5%
  and knows a good deal, 100, 1,9%
  and knows practically, 53, 1,0%
  and knows very much, 45, 0,8%


You could probably use some of the Ngram datasets to figure this out. Parse some books from https://www.gutenberg.org/ or use the Google Ngrams corpus. Pay attention to the year(s) you wish to model English from - grammar and form keep changing!
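
A minimal sketch of that approach using NLTK's bundled Gutenberg sample (note the sample skews towards 19th-century prose, so the counts reflect older usage):

  from collections import Counter

  import nltk
  from nltk.corpus import gutenberg
  from nltk.util import ngrams

  nltk.download("gutenberg", quiet=True)
  words = [w.lower() for w in gutenberg.words("austen-emma.txt") if w.isalpha()]
  trigram_counts = Counter(ngrams(words, 3))

  print(trigram_counts[("i", "do", "not")])   # common pattern, large count
  print(trigram_counts[("i", "does", "not")]) # ungrammatical, essentially zero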


I have been thinking of doing something like this (using Ngrams for grammar checking for non-natives) for a while. I would be happy to fund development if you or somebody else is interested in working on it.


From xkcd itself, an editor that only allows common words: https://xkcd.com/simplewriter/


www.grammarly.com (I haven't tried it, though). In the demo they showed it turning a sentence into a more colloquial one.

I'm a native English speaker, and I'd like to know appropriate punctuation for a given combination of words. I'd like to search through a list.


Thank you, I especially like the macOS editor.


When I'm conflicted about different phrasings (for instance, whether or not there is a hyphen when writing compound words), I usually just do a Google search and go with whichever result has the most hits. That could be a suitable enough proxy for your use case, and perhaps you could just use the Google search service as an API...

Of course, the RIGHT way to do this would be to use the n-gram datasets that people here have suggested :-)


In the FAQ: "Why does Google Books only provide feedback on 5 tokens or less?"

You mean "..feedback only for 5 tokens or FEWER?" Use your app! ;) //runs away


Something like this: http://corpus.byu.edu/bnc/ ?


To improve the qualitative aspects of writing, in this case for job listings primarily, check out https://textio.com/. There's no API, but I think it will help you think about what "popular" language means.


What you want is a language model. This will give you the probability on a word-by-word basis.

Something like [1] is pretty much state-of-the-art. It's worth noting that the kind of writing you are doing changes the probabilities significantly. [2] shows this quite well.

[1] https://colinmorris.github.io/lm-sentences/#/billion_words

[2] https://colinmorris.github.io/lm-sentences/#/brown_romance
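
Those pages use much stronger models, but even a toy bigram model with add-one smoothing over NLTK's Brown corpus shows what "probability on a word-by-word basis" means (nothing here is specific to [1] or [2]):

  import math
  from collections import Counter

  import nltk
  from nltk.corpus import brown

  nltk.download("brown", quiet=True)
  words = [w.lower() for w in brown.words()]
  unigrams = Counter(words)
  bigrams = Counter(zip(words, words[1:]))
  vocab = len(unigrams)

  def word_logprob(prev: str, word: str) -> float:
      """Add-one-smoothed log P(word | prev)."""
      return math.log((bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab))

  sentence = "it is actually quite rare for sentences to not be unique".split()
  for prev, word in zip(sentence, sentence[1:]):
      print(f"P({word} | {prev}) = {math.exp(word_logprob(prev, word)):.6f}")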


Bah, if you have good reason to be confident that your sentence is correct even if English speakers might feel it is wrong, then I say you should just write it anyway.

I like to read such things because they make me think about what is being said and how the language works. If we always use "popular" patterns, then our writing becomes clichéd and boring, and people's eyes will glide right over it.


You have a point. But as a non-native speaker trying to learn a language, you aim to become so fluent that people will not notice you're foreign. You want to be able to play with the language.

A big part of learning a language is becoming familiar with frequent speech patterns and slang. A language is not a sterile set of words with attached grammar but a slippery, gelatinous blob that molds itself to the culture and people. Spoken languages are quite lively. If I want to integrate myself and joke around with natives, I need to learn to mold it the same way natives do. And to learn how to do that, you first have to start by imitating.


Perhaps it also depends on your learning history.

Right now I live in Germany and speak pretty ungrammatically, but from being here I copy a lot of everyday idiom without really understanding it. So what I would like is the opposite of what you are looking for: confidence that my German sentences (especially written ones) are formally correct.

I don't mind if that makes me look like a well taught foreigner. Right now I sound like a badly taught foreigner.


If you can read Chinese, there's an interesting tool:

http://www.pigai.org/guest2016.html

It extracts common phrases from sentences, with explanations, suggestions, and usage counts from a corpus.


I never found such a tool, but if you build it, count me in as a user. Same issue, same solution.


Thanks for the mention above (foxtype.com).

We're currently building an online editor checks


Oops, accidentally sent.

We're currently building an online editor that:

1. Checks for compatibility of words in a sentence (essentially popularity).

2. Gives example sentences for a certain word.

3. Suggests words depending on context.

Language models would be a decent way to check popularity, though it would be noisy. Sentence-level rewrites would be hard unless you make them template-driven.


Incidental: use quotes (") for exact match, not apostrophes (').


https://github.com/rickyhan/bodine

This is a tiny tool I wrote a long time ago. There's also writefullapp.com which is closed source.


I can suggest http://samedaypapers.com/. It always helps me.



