In what sense are these theories a "basis" to NLP? Did they have any influence? Do they bring any practical contributions? I suspect a slight similarity between popular domains (Wittgenstein and NLP) was contrived into an article that seems very light on the W part.
The "Wittgenstein’s theories" that appear here is just that "the meaning of a word is its use in the language". If such a plain concept was all of Wittgenstein’s theories, he would be long forgotten.
For centuries, dictionaries have presented words through one or several explanations as well as quotes and examples. 150 years ago, Émile Littré wrote a wonderful French dictionary that contains 80,000 words and about 300,000 literary quotes. He knew no word has a simple and permanent meaning, and that one needs to know many real world contexts to get a fine view on a word.
Yeah, it's a weird kind of philosophy clickbait or something. An only slightly less valid alternate subheading might be "Football is the basis of all modern NLP"[1]
For Wittgenstein - in philosophy, one always refers to the early or later Wittgenstein; they're totally different, and saying 'for Wittgenstein' without specifying which one doesn't make a lot of sense. The early (Tractatus) one had a picture theory of language.[0]
[1] "One day when Wittgenstein was passing a field where a football game was in progress the thought first struck him that in language we play games with words. A central idea of his philosophy, the notion of a ‘language-game’, apparently had its genesis in this incident." - Norman Malcom, Ludwig Wittgenstein: A Memoir
I can't resist a more entertaining extract from that book:
"My wife once gave him some Swiss cheese and rye bread for lunch, which he greatly liked. Thereafter he would more or less insist on eating bread and cheese at all meals, largely ignoring the various dishes that my wife prepared. Wittgenstein declared that it did not much matter to him what he ate, so long as it was always the same. When a dish that looked especially appetizing was brought to the table, I sometimes exclaimed 'Hot Ziggety!' - a slang phrase that I learned as a boy in Kansas. Wittgenstein picked up this expression from me. It was inconceivably droll to hear him exclaim 'Hot Ziggety!' when my wife put the bread and cheese before him. ...
One of Wittgenstein's favourite phrases was the exclamation, 'Leave the bloody thing alone!' He delivered this with a most emphatic intonation and mock solemnity of expression. It had roughly the signification that the thing in question was adequate and one should not try to improve it. He used it on a variety of occasions: one time meaning that the location of his bed was satisfactory and it should not be moved; another time, that the mending that my wife had done on a jacket of his was sufficient and that she should not try to make it better."
"Ludwig Wittgenstein was almost certainly autistic. Several notable psychiatrists, such as Christopher Gillberg in A Guide to Asperger Syndrome, have written extensively about the evidence backing this assertion."
Thanks, that was very interesting! I don't remember reading anywhere that LW had Asperger's (which is what that writer means by 'autistic', I guess), but it seems to fit perfectly. I've just tried to get to know him without labelling. Reading them, I felt like close friends with people like Nietzsche and Kierkegaard, but never felt that with LW, hehe. I didn't get far into the Tractatus, probably just a few lines. But I've enjoyed everything else he wrote; always thought-provoking. A fascinating character, with a great sense of humour--he had a friend with whom he used to exchange picture postcards with very silly, irrelevant things written on them. Wish I could remember what book I saw those in. Monk, maybe. Also, he was born extremely rich, but believed money corrupts people, so he gave it all away to his siblings. When asked why them, since they were already rich, he said 'They're already corrupted'...
(Hmm, it just struck me, maybe that explains some of Thoreau too? He seemed incapable of friendship in the normal sense, said No to everything, preferred solitude, seemed utterly unlike other people, his 'duties to himself' overruled all others, etc.)
A while ago, I commented here[1] to the effect that there's a good fit between the treatment of meaning in Wittgenstein's Philosophical Investigations (PI) and the neural net / connectionist approach, calling it "decentralised, statistical, emergent" in contrast to more cognitivist ideas.
I was challenged to justify my buzzword-laden characterisation, and I reproduce my response here as it seems relevant to your question:
> Right at the start of PI, the "Augustinian picture" of meaning is set out: "words in language name objects - sentences are combinations of such names" a picture where "every word has a meaning. This meaning is correlated with the word. It is the object for which the word stands." And so (later, #81) we "think that if anyone utters a sentence, and means or understands it, he is operating a calculus according to definite rules." This is the view of language which PI aims to - sorry, buzzword incoming - disrupt.
> By contrast, PI puts the case that there is no central unifying model applicable to all instances of language use, rather: "I am saying that these phenomena have no one thing in common which makes us use the same word for all - but that they are related to one another in many different ways, and it is because of this relationship, or these relationships that we call them all language." (PI #65)
> There follows a discussion of vagueness. In the Augustinian picture where the meaning of a word is an object, and a calculus of these objects is performed, it is difficult to avoid the consequence that meanings are exact. PI uses the example of defining the word "game": "How should we explain to someone what a game is? I imagine that we should describe games to him, and we might add: "This and similar things are called 'games'". And do we know any more about it ourselves? [...] But this is not ignorance. We do not know the boundaries because none have been drawn. [...] we can draw a boundary - for a special purpose. Does it take that to make the concept usable? Not at all! [...] One might say that the concept 'game' is a concept with blurred edges."
> So much for "decentralised" and "statistical". As far as "emergent" goes, I think even with the large variance in readings of PI it's uncontroversial to say that it seeks to ground meaning and understanding in relation to "customs" or social practices rather than in some variety of metaphysical correspondence between language and reality required by different variations of the Augustinian picture. In this sense, meaning emerges from the use of words relative to these cultural forms.
> Connectionism, as an investigative paradigm, (oops, buzzword!) is simpler (I believe) in that it doesn't require the identification of an actual realised model (or "mental mechanism") such as a neural encoding of a "language of thought" or cognitive frames, etc., in the brain - it "just" requires that a bunch of simple elements can result in complex rule-following behaviour without needing to explicitly encode the rules. Hopefully the quotes above will go some way to indicate how this programme is philosophically somewhat in tune with PI.
> (Indeed the extensive sections on samples and teaching language games are eerily reminiscent of descriptions of training neural nets, now that I think of it... "How do I explain the meaning of 'regular', 'uniform', 'same' [...] if a person has not yet got the concepts? I shall teach him to use the words by means of examples and by practice - And when I do this I do not communicate less to him than I know myself." (PI #208))
NLP via word associations might get you closer to capturing meaning in a Wittgensteinian conception than other methods (I happen to think the PI should be required reading for people studying NLP). But he would probably say the inputs into an NLP model would need to capture a much wider range of human culture and behavior beyond co-occurrences and symbols.
And even if you had a very complete model - for him "meaning" something through an utterance was likely something inextricably linked with being human engaged in social acts. (See: the private language argument https://en.wikipedia.org/wiki/Private_language_argument)
One thing I learned from reading Wittgenstein is that language is far more than speech and words. If you want to understand language, or rather "communication", in a Wittgensteinian way, I think you'd have better luck analysing YouTube videos, where body language, environment, context, intonation, prosody, implicit emotion, etc. are far more visible, rather than just analysing text.
It's such a weird but somewhat freeing perspective on language. I once thought that, for instance, whistled languages were absolutely fascinating and perplexing; now they just make perfectly normal sense to me. As long as you can recognize the human through the medium, there is room for language, and that's so cool! There's so much room for playing with communication and interaction that I never thought about, thanks to his writing. It also really makes me understand jazz as a "language" way better. I never understood the almost pretentious way people talked about jazz ("it's different, hard to explain if you never experienced it for yourself!"), but it really, truly is a conversation in music. Super interesting.
> he would probably say the inputs into an NLP model would need to capture a much wider range of human culture and behavior
I get what you're saying - that something deserving the name "language" way outstrips the simple word embedding strategies in TFA, right? and I wholeheartedly agree - but my point is that the position of PI, at least as I read it, is fundamentally opposed to the idea that modeling - what he calls "a mental mechanism" - is how language proceeds. This is one of the major points of divergence between PI and the Tractatus (which posits an explicit correspondence or modeling relation between language and the world.)
The most explicit formulation of this that comes to mind is in PI #689:
> "There must surely be a further, different connexion between my talk and N, for otherwise I should still not have meant HIM."
> Certainly such a connexion exists. Only not as you imagine it: namely by means of a mental mechanism.
The point is very much that PI's emergent view of meaning is quite in tune with the connectionist sentiment that a bunch of simple elements (i.e. neurons, either in the brain or in some AI project) can exhibit the rule learning and rule following capacities at the centre of PI's treatment of language, without any model of the elements of language needing to be explicitly encoded.
The degree to which such a model may or may not exist, embedded within the parameterisation of a language-capable set of neural components, is an interesting question. But even granting its "existence", I think the point would be that the basic properties of the set of components, coupled with the training and samples, provide the capability, and the "model", if we want to grant its existence, is effectively a by-product - epiphenomenal if you like. :)
You’re being reductive in your framing of his early work. At the same time as Wittgenstein was playing with words, the logical positivists were attempting to go the opposite route and find language that both expressed meaningful things about the world and was also rigorously well formed; meaning as a mathematical, logical implication built from empirical observations.
Naturally, they failed. Wittgenstein represents the other end of the spectrum: not only will words not save us from meaninglessness and irrationality, words can lead us straight towards them in the guise of reason.
As always, the celebrated hero is standing on the shoulders of giants; they symbolize the progress rather than single-handedly executing a large leap themselves. You could apply a similar disdain to most scientists and philosophers we learn about.
It's really difficult to overstate how important embeddings are going to be for ML.
Word embeddings have already transformed NLP. For most people I know, when they sit down to work on an NLP task, the first thing they do is use an off-the-shelf library to turn the text into a sequence of embedded tokens. They don't even think about it; it's just the natural first step, because it makes everything so much easier.
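Concretely, that first step often looks something like this. A minimal sketch using gensim's downloader with a pretrained GloVe model; the model name is just one of several off-the-shelf options, not a recommendation:

```python
# Minimal sketch: turn text into a sequence of embedding vectors
# using an off-the-shelf library and pretrained vectors.
import gensim.downloader as api

model = api.load("glove-wiki-gigaword-50")  # 50-dimensional GloVe vectors

tokens = "the quick brown fox".split()
embedded = [model[t] for t in tokens if t in model]
print(len(embedded), embedded[0].shape)  # -> 4 vectors, each shape (50,)
```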
In the last couple years, embeddings for other data types (images, whole sentences, audio, etc.) have started to enter mainstream practice too. You can get near-state-of-the-art image classification with a pretrained image embedding, a few thousand examples, and a logistic regression trained on your laptop CPU. It's astonishing.
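The whole pipeline can be surprisingly small. A hedged sketch, where "image_embeddings.npy" and "labels.npy" are hypothetical stand-ins for whatever pretrained image embedding you computed:

```python
# Hedged sketch: image classification from pretrained embeddings plus a
# logistic regression, cheap enough to train on a laptop CPU.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Hypothetical files: (n_samples, embedding_dim) embeddings from some
# pretrained network, plus integer class labels.
X = np.load("image_embeddings.npy")
y = np.load("labels.npy")

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

clf = LogisticRegression(max_iter=1000)
clf.fit(X_train, y_train)
print("test accuracy:", clf.score(X_test, y_test))
```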
(Note: I work on https://www.basilica.ai , an embeddings-as-a-service company, so I'm definitely a little bit biased.)
It's an exciting time for sure. To the layman this feels like the first real progress we've had towards AI since the 70s. It seems like the field kind of wandered off into the realms of pure mathematics for a few decades with little tangible progress, but now we're getting stories every few weeks about how computers can recognize objects in pictures or compose new music or whatever.
Figuring out how to process context is important for NLP, no question.
But I think this is probably wrong on Wittgenstein. I'm pretty sure his entire point in the Philosophical Investigations was that "meaning" is exactly NOT probabilities of symbol co-occurrence, or just names of objects in the world. Symbols acquire meanings from their use by humans. Accounting for context in NLP via probabilities of occurrence might be useful in better reproducing language, but we should be careful not to say that this is the essence of meaning and language.
>Accounting for context in NLP via probabilities of occurrence might be useful in better reproducing language, but we should be careful not to say that this is the essence of meaning and language.
Yes, and the article actually includes evidence in favor of this and against its own conclusion. It mentions that vector "cat" is closer to vector "dog" than vector "dog" is to vector "dogs," which makes sense if you interpret it as a measure of appearance in sentences but no sense at all if you force it into the mold of "the meaning of words."
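You can poke at this kind of claim yourself with pretrained vectors. A sketch using gensim; whether the cat/dog vs. dog/dogs ordering holds depends on the particular model and corpus:

```python
# Compare cosine similarities under one pretrained model; treat the
# result as an experiment, not a fact about "the meaning of words".
import gensim.downloader as api

wv = api.load("glove-wiki-gigaword-100")
print("cat ~ dog: ", wv.similarity("cat", "dog"))
print("dog ~ dogs:", wv.similarity("dog", "dogs"))
```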
> And it’s now quite clear where the Wittgenstein’s theories jump in: context is crucial to learn the embeddings as it’s crucial in his theories to attach meaning. In the same way as two words have similar meanings they will have similar representations (small distance in the N-dimensional space) just because they often appear in similar contexts. So “cat” and “dog” will end up having close vectors because they often appear in the same contexts: it’s useful for the model to use for them similar embeddings because it’s the most convenient thing it can do to have better performances in predicting the two words given their contexts.
I am actually fine to say that this approach is useful and convenient - and that we can fairly call measuring probabilities of co-occurrence measuring "context" in some sense.
But "context" for Wittgenstein in his account of meaning was clearly not word or symbol occurrences. It was a much broader view of the way that language fits in with human intentions and behavior and the wide variety of uses for a word. I hate to quote Wikipedia, but from the PI article: "Wittgenstein argues that definitions emerge from what he termed "forms of life", roughly the culture and society in which they are used." https://en.wikipedia.org/wiki/Philosophical_Investigations#M...
I'm in total agreement. It's especially confusing that the author would go through the bother of invoking Wittgenstein when it seems like he meant the exact opposite.
The words "dogs" refers to something like a set whereas "dog" (usually) refers to an individual. Thus they have very different meanings, even if those meanings have an obvious and simple relationship.
On the other hand "cat" and "dog" refer to individuals that are similar in many ways (and interchangeable for some common purposes). That is, these two words have somewhat similar meanings.
A dog and cats aren't data structures; they're a single animal of one biological family and multiple animals of another. Data structures belong to a different language game, namely programming jargon.
> A representation of a dog or cat in your mind is a data structure of a kind. Embeddings are about the names, not the named.
The later Wittgenstein's philosophy was about how meaning is language usage. As such, it was not interested in the names, which are arbitrary rules of the language game. In ordinary speech, talk of dog and cats is not about data structures or representations. That sort of talk takes place in a specialized language game like computer or cognitive science. As such, it's an abuse of language to reify the usage in one language game for all language games, as if a word had a universal meaning independent of the game.
The author has seriously misunderstood Wittgenstein's contributions to philosophy of language.
>And it’s now quite clear where the Wittgenstein’s theories jump in: context is crucial to learn the embeddings as it’s crucial in his theories to attach meaning.
Yes, Wittgenstein said context is important for meaning, but that is hardly his unique or even most important contribution to philosophy of language. Wittgenstein's real contribution is in showing that meaning cannot be pinned down like butterflies under glass -- that meaning spontaneously arises in each playthrough of a language-game, and that any effort to find a "canonical", "authoritative" definition is grasping at an illusion.
But word embeddings try to do almost exactly what Wittgenstein says is an illusion -- trying to pin down a canonical n-dimensional vector for each word. To correspond with Wittgenstein's theory, there cannot exist any fixed mapping from a word to a vector. Perhaps each vector could change dynamically, in a way that is in principle uncomputable. But to get there we are going to need a lot more advances than state-of-the-art NLP.
It doesn't mean the approach isn't useful for building systems that we can interact with linguistically, just that we shouldn't kid ourselves into thinking the model has captured meaning.
One interesting concept I read in Wittgenstein was the idea of decomposing a word into its constituent parts. I’ll use the term broom for it because that was the classic example and also the motivation for David Foster Wallace’s novel “Broom of the System.”
So you take “broom” and you could decompose it into “handle” and “bristles”. But then you could decompose it more, by recursively decomposing “handle” into “grains of wood” and “bristles” into “pieces of fiber” (or whatever).
You keep doing this ad infinitum, I guess on down to the summation of a bunch of quarks or whatever.
The question of interest to Wittgenstein was where this process bottoms out: what it would mean, either physically or semantically, to have a word identifying a concept that could not be broken down into further constituent parts.
Wittgenstein was interested in this for the philosophy of language. But I got interested in it by thinking about the decomposition as a mathematical operator,
D(“broom”) = {“handle”, “bristles”}
and then asking what it could mean if this operator D had an “eigenvector” with an “eigenvalue” of 1, so that Dx = x for some non-decomposeable word x.
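As a toy, you can play with D in code. The lexicon below is entirely invented for illustration, with "atom" standing in for a hypothetical non-decomposable word where Dx = x:

```python
# Toy model of the decomposition operator D from the comment above.
D = {
    "broom":    {"handle", "bristles"},
    "handle":   {"grains of wood"},
    "bristles": {"pieces of fiber"},
    "atom":     {"atom"},  # a fixed point: D(x) = {x}, "eigenvalue 1"
}

def decompose(word, depth=3):
    """Recursively decompose until a fixed point or the depth limit."""
    parts = D.get(word, {word})  # undefined words decompose to themselves
    if parts == {word} or depth == 0:
        return parts
    return set().union(*(decompose(p, depth - 1) for p in parts))

print(decompose("broom"))  # bottoms out at {'grains of wood', 'pieces of fiber'}
```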
In some ways, you can see how it could relate to things like word2vec and embedding representations if you could represent a decomposition operator, and define a hierarchical relationship of words as an ordering of how to more or less specifically decompose a word’s representation.
I've always wondered if 'exists' is something like that -- and hence why it can't be a property. (Well, if you believe that Kant guy.)
You can sort of think of all objects -- broom, bristles, quarks, etc -- as being codata that decomposes to some version of "existence existing", an interference pattern of some fundamental object self-interacting.
These older word embedding models (word2vec, GloVe, LexVec, fastText) are being superseded by contextual embeddings ( https://allennlp.org/elmo ) and fine-tuned language models ( https://ai.googleblog.com/2018/11/open-sourcing-bert-state-o... ). These contextual models can infer that "bank" in "I spent two hours at the bank trying to get a loan" is very different from "The ocean bank is where most fish species proliferate."
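For example, with a BERT-style model you can watch the two "bank"s diverge. A sketch using the Hugging Face transformers library; the checkpoint name is just one standard choice:

```python
# Contextual embeddings give "bank" different vectors in different
# sentences, unlike static word2vec/GloVe vectors.
import torch
from transformers import AutoModel, AutoTokenizer

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

def bank_vector(sentence):
    inputs = tok(sentence, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state[0]  # (seq_len, dim)
    idx = inputs.input_ids[0].tolist().index(tok.convert_tokens_to_ids("bank"))
    return hidden[idx]

v1 = bank_vector("I spent two hours at the bank trying to get a loan")
v2 = bank_vector("The ocean bank is where most fish species proliferate")
print(torch.cosine_similarity(v1, v2, dim=0))  # noticeably below 1.0
```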
It's interesting how different this is from 10 years ago, when Chomsky's theories were the basis of all modern NLP, or even 5 years ago, when most NLP used a hybrid of formal grammars + embeddings. I remember attending a tech-talk on part-of-speech tagging in 2011; the state-of-the-art then was a probabilistic shift-reduce parser where the decision to shift vs. reduce at each node was done by a machine-learned classifier.
Wittgenstein emphasized meaning as context and usage before Chomsky, but the actual method was first properly investigated by structural linguists such as JR Firth and Zellig Harris, who was Chomsky's supervisor. Good articles here:
For those interested, I recently wrote a guide on using neural networks for NLP[1].
I wrote the guide with the explicit goal of trying to help people understand NLP (sentence classification) without the need to understand the math.
I am really struggling to find where Wittgenstein fits into any of this at all.
>And it’s now quite clear where the Wittgenstein’s theories jump in: context is crucial to learn the embeddings as it’s crucial in his theories to attach meaning.
That's not at all clear to me. The crucial part of W's tome is that two sentient beings are knowingly engaging in a game where they have 'agreed' on meanings. My guess from reading Philosophical Investigations is that W would only think NLP were possible in formal settings like law, where all players of the game know the rules quite well, and the program could be trained as if it were a player in that game.
I think the point is that the only way to learn what a word means is to see how it is used. Trying to define a word from some kind of first principles, dictionary-style, is not going to be very effective. The best way for a computer to learn what words mean is to analyze a lot of real-world data.
I would love for a computer to be able to ask questions, or at least surface marginal cases for more training, but that seems to be a very uncommon feature at least in these toy examples.
The issue isn't that this approach won't be useful in building systems we can interact with linguistically. The problem is in describing the system as having learned a meaning.
It might seem pedantic or like something only philosophers of language would care about. But it gets to the core of how we should talk and think about the nature of AI as NLP gets more and more sophisticated.
Well it may not be very satisfying, but Wittgenstein's point is that there isn't anything more to understand about the meaning of words than the ability to use words effectively. http://existentialcomics.com/comic/268
I would think that the Tractatus would be more useful to an AI. But Wittgenstein's remarkable ability to shift the paradigm and overextend into a meta level of analysis seems similar to the way AlphaZero and Leela play chess. The tools W uses to understand perception have a more probabilistic and irrational nature than the tools he uses in his previous work. As if he realized that human communication cannot be considered a closed and finite system; hence I cannot see how his ideas are implemented in these applications, yet.
Yes, and the Tractatus, considered as a framework for a closed and finite set of linguistic rules (such as a domain-specific language), has great applicability.
For example, I flipped open my copy (yes I keep a copy on my desk) and opened to 4.122: rules to indicate internal and external relations between objects. Almost reads like a system requirements document.
Yes, because humans don't operate in a limited, context-explicit vocabulary. But your point doesn't destroy the value of the Tractatus for other purposes.
>As human beings speaking English it is quite trivial to understand that a “dog” is an “animal” and that is more similar to a “cat” than to a “dolphin” but this task is far from easy to be solved in a systematic way.
Are they? A dog can be trained like a dolphin, unlike a cat. In the context of training, dogs are more similar to dolphins.
Yes, I suspect you had to cherry-pick a dimension in which dog and dolphin are closer than dog and cat. That would defy the conventional wisdom that dogs are closer to cats than to dolphins, but that conventional wisdom is also what word vector embeddings model.
In the metric space, the distance between dog and cat might be lower than dog and dolphin in many dimensions, but higher in this specific one. A general distance function will have to take all of the dimensions into account, not just those cherry-picked. So the conventional wisdom _and_ your personal belief are both accounted for, and in the context of training "dog" and "dolphin" might be more similar.
I still suspect that's not actually true, and I'd be really surprised if a survey of users found dog and dolphin to be closer than dog and cat in _any_ dimension.
The historic picture makes a little more sense (though this is not something a 5yo would understand).
We call these things embeddings because you start with a very high dimensional space (imagine a space with one dimension per word type, where each word is a unit vector in the appropriate dimension) and then approximate distances between sentences / documents / n-grams in this space using a space with much smaller dimensionality. So we "embed" the high dimensional space in a manifold in the lower dimensional space.
It turns out though that these low dimensional representations satisfy all sorts of properties that we like which is why embeddings are so popular.
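In toy form, with a random matrix standing in for the learned embedding (a numpy sketch, not how word2vec actually trains):

```python
# One dimension per word type (one-hot unit vectors), mapped into a
# much lower-dimensional space by a matrix. A learned matrix would
# place related words near each other; this random one just shows
# the dimensionality reduction.
import numpy as np

vocab = ["cat", "dog", "dogs", "broom"]
V, d = len(vocab), 2                 # vocab size, embedding dimension

one_hot = np.eye(V)                  # each word: unit vector in R^V
E = np.random.randn(V, d)            # embedding matrix (would be learned)

cat_embedding = one_hot[vocab.index("cat")] @ E
print(cat_embedding.shape)           # (2,) instead of (4,)
```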
A word embedding transforms a word into a series of numbers, with the property that similar words (e.g. "dog" and "canine") produce similar numbers.
You can have embeddings for other things, such as pictures, where you would want the property that e.g. two pictures of dogs produce more similar numbers than a picture of a dog and a picture of a cat.
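With made-up numbers, purely to illustrate "similar words produce similar numbers":

```python
# Invented vectors for illustration only; real embeddings are
# learned from data.
import numpy as np

def cos(a, b):
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

dog    = np.array([0.9, 0.1, 0.3])
canine = np.array([0.8, 0.2, 0.3])
teapot = np.array([0.1, 0.9, 0.7])

print(cos(dog, canine))  # high: similar meaning
print(cos(dog, teapot))  # lower: dissimilar
```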
It is indeed a vector space. You don't really choose a basis; an ML tool like word2vec [1] does. And like most advanced applications of ML, exactly how it works is a mystery.
> The reasons for successful word embedding learning in the word2vec framework are poorly understood. Goldberg and Levy point out that the word2vec objective function causes words that occur in similar contexts to have similar embeddings (as measured by cosine similarity) and note that this is in line with J. R. Firth's distributional hypothesis. However, they note that this explanation is "very hand-wavy" and argue that a more formal explanation would be preferable.
Inaccurate. This is absurd. Epigraphy is the basis of all modern NLP/NLU. Add computational epigraphy, neuroscience, linguistics and cognition. Ref: Word2Vec is based on an approach from Lawrence Berkeley National Lab
""Google silently did something revolutionary on Thursday. It open sourced a tool called word2vec, prepackaged deep-learning software designed to understand the relationships between words with no human guidance. Just input a textual data set and let underlying predictive models get to work learning."
“This is a really, really, really big deal,” said Jeremy Howard, president and chief scientist of data-science competition platform Kaggle. “… It’s going to enable whole new classes of products that have never existed before.” https://gigaom.com/2013/08/16/were-on-the-cusp-of-deep-learn...
Lawrence Berkeley National Lab was working on an approach more detailed than word2vec (in terms of how the vectors are structured) since 2005 after reading the bottom of their patent: http://www.google.com/patents/US7987191 The Berkeley Lab method also seems much more exhaustive by using a fibonacci based distance decay for proximity between words such that vectors contain up to thousands of scored and ranked feature attributes beyond the bag-of-words approach. They also use filters to control context of the output. It was also made part of search/knowledge discovery tech that won the 2008 R&D100 award http://newscenter.lbl.gov/news-releases/2008/07/09/berkeley-... & http://www2.lbl.gov/Science-Articles/Archive/sabl/2005/March...
We might combine these approaches as there seems to be something fairly important happening here in this area. Recommendations and sentiment analysis seem to be driving the bottom lines of companies today including Amazon, Google, Nefflix, Apple et al."
Really we are building on the shoulders of giants (calculus, linear algebra, statistics), but it seems like the modern use of recurrent neural networks crystallized in the 80s with the publication of Parallel Distributed Processing by David Rumelhart, James L. McClelland, and the PDP Research Group (which included Geoffrey Hinton), which discussed backpropagation and recurrent neural networks, even providing a handbook with code samples.
Jeffrey Elman (with others) wrote a successor to the PDP books called Rethinking Innateness: A Connectionist Perspective on Development (1997)
His paper Finding Structure in Time (1990) adapted backpropagation to take time into account, backpropagation through time (BPTT):
>Here we briefly discuss three of the findings from Elman (1990). Elman's work was highly significant to our understanding of how languages are acquired and also, once acquired, how sentences are comprehended. Sentences in natural languages are composed of sequences of words that are organized in phrases and hierarchical structures. The Elman network provides an important hypothesis for how neural networks - and, by analogy, the human brain - might be doing the learning and processing of such structures.
>The concept ‘word’ is actually a complicated one, presenting considerable difficulty to anyone who feels they must decide what is a word and what is not. Consider these examples: ‘linedrive’, ‘flagpole’, ‘carport’, ‘gonna’, ‘wanna’, ‘hafta’, ‘isn’t’ and ‘didn’t’ (often pronounced “dint”). How many words are involved in each case? If more than one word, where are the word boundaries? Life might be easier if we did not have to decide where the boundaries between words actually lie. Yet, we have intuitions that there are points in the stream of speech sounds that correspond to places where something ends and something else begins. One such place might be between ‘fifteen’ and ‘men’ in a sentence like ‘Fifteen men sat down at a long table’, although there is unlikely to be a clear boundary between these words in running speech.
> Elman’s approach to these issues, as previously mentioned, was to break utterances down into a sequence of elements, and present them to an SRN. In his letter-in-word simulation, he actually used a stream of sentences generated from a vocabulary of 15 words. The words were converted into a stream of elements corresponding to the letters that spelled each of the words, with no spaces. Thus, the network was trained on an unbroken stream of letters. After the network had looped repeatedly through a stream of about 5,000 elements, he tested its predictions for the first 50 or so elements of the training sequence.
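For the curious, the SRN architecture is tiny by modern standards. Here's a forward-pass-only numpy sketch with invented dimensions (training via backpropagation through time omitted):

```python
# Elman-style simple recurrent network, forward pass only: the hidden
# state at each step feeds back in as "context" at the next step.
import numpy as np

rng = np.random.default_rng(0)
n_in, n_hid, n_out = 26, 16, 26          # e.g. a letter-in-word setup

W_xh = rng.normal(size=(n_hid, n_in))    # input -> hidden
W_hh = rng.normal(size=(n_hid, n_hid))   # context (previous hidden) -> hidden
W_hy = rng.normal(size=(n_out, n_hid))   # hidden -> output (next-element prediction)

def forward(inputs):
    h = np.zeros(n_hid)
    outputs = []
    for x in inputs:                     # one one-hot letter at a time
        h = np.tanh(W_xh @ x + W_hh @ h) # new hidden depends on old hidden
        outputs.append(W_hy @ h)         # predict the next element
    return outputs

letters = [np.eye(n_in)[i] for i in [2, 0, 19]]  # "cat" as one-hot letters
preds = forward(letters)
print(preds[-1].shape)                   # (26,) scores over possible next letters
```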
Schmidhuber (with Hochreiter) developed LSTMs, LeCun developed CNNs, the ideas were refined and processing capabilities developed, and Hinton revived these connectionist ideas, leading up to ImageNet in 2012.
The "Wittgenstein’s theories" that appear here is just that "the meaning of a word is its use in the language". If such a plain concept was all of Wittgenstein’s theories, he would be long forgotten.
For centuries, dictionaries have presented words through one or several explanations as well as quotes and examples. 150 years ago, Émile Littré wrote a wonderful French dictionary that contains 80,000 words and about 300,000 literary quotes. He knew no word has a simple and permanent meaning, and that one needs to know many real world contexts to get a fine view on a word.