## Similarity and sameness

The mathematical model of sign sequences with repetitions (texts) is a multiset. The multiset was defined by D. Knuth in 1969 and later studied in detail by A. B. Petrovsky [1]. The characteristic property of a multiset is that it may contain identical elements. The limiting case of a multiset in which every element has multiplicity one is a set. The set obtained from a multiset by giving each of its elements multiplicity one is called its generating set, or domain. A multiset in which every multiplicity is zero is the empty set.
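As a small illustration of these definitions, a text can be modeled with Python's `collections.Counter`. The sample sentence and the variable names are assumptions made only for this sketch.

```python
from collections import Counter

# A short sample text, assumed here only for illustration.
text = "to be or not to be".split()

# The multiset maps every element to its multiplicity.
multiset = Counter(text)

# The generating set (domain) keeps each element exactly once.
domain = set(multiset)

# A multiset is a plain set only if every multiplicity equals one.
is_plain_set = all(count == 1 for count in multiset.values())

print(multiset["to"], sorted(domain), is_plain_set)
```

Here the word "to" has multiplicity two, so the text is a proper multiset, and its domain is the four-word dictionary of the sentence.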

The problem is to determine whether elements are the same. Similarity depends on which properties of the elements are taken into account. Cucumbers and watermelons look similar in color, but they can hardly be called the same in culinary use, although their botanical descriptions largely coincide.

According to G. Frege, any object that has relations with other objects and their combinations has as many properties (values) as it has such relations. The part of the values that is taken into account is called the meaning in which the object is represented in a given situation. The name given to an object for its short description (a number, symbol, word, picture, sound, or gesture) is called the sign of the object; the sign is itself one of the values.

All possible parts of an object's values (its meanings) correspond to a single sign. This is the main difficulty in recognizing meaning, but it is also what makes it possible to get by with minimal sets of signs. It is not possible to assign a unique sign to each subset of values. The objects of information exchange are therefore minimal sets of signs (notes, an alphabet, the dictionary of a language). The meaning of a sign is usually not calculated but determined intuitively from the contexts (neighborhoods) of the sign.

The solution to the problem of ambiguity of signs is semantic markup of the text. Semantic markup can be explained with an extreme example. On a Russian abacus (schoty), the text is a sequence of identical signs (beads). According to [2], the dictionary of such a text consists of one word. Such texts cannot be used without semantic markup. The dictionary is therefore changed, and the signs are divided into groups: units, tens, hundreds, and so on. The names (numbers) of these groups are unique word numbers. The dictionary **D** consists of the digits from 0 to 9. On such a Cartesian (matrix) abacus, each bead is represented by a matrix unit. For example, the number 2021 is represented on a matrix abacus by the sum of four matrix units:

where the subscripts are the Cartesian coordinates of the matrix word (digits in this case). Identical objects have been transformed into similar ones. The measure of similarity is given by the values of the word coordinates. In addition to positional numbers, repetitions of digits from the dictionary arise when arithmetic operations are performed. Equivalence relations are established:

If an arithmetic operation yields *9 + 1* in some position, then 0 appears in this position and 1 is added to the next digit. On the abacus, all the beads on the wire are shifted back to their original (zero) position, and one bead is added on the next wire. On the matrix abacus, the following transformation is performed:

If a measure of the similarity of signs is set, then the tolerance (similarity) relation can again be turned into an equivalence (sameness) relation with respect to this measure, for example, by rounding numbers. The difference between tolerance and equivalence shows up as a violation of transitivity: for a tolerance relation, transitivity may fail. For example, let element A be similar to B in one sense. If the meaning of B does not coincide with the meaning of element C, then A can be similar to C only on the intersection of their meanings (a shared part of the properties). The transitivity of the relation is restored (closed), but only for this common part of the meaning. Once sameness has been achieved by specifying the meaning, A is equivalent to C. For example, the above transformation (closure) on some coordinates ensures that arithmetic operations can be carried out on the matrix abacus.
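The carry rule on the matrix abacus can be sketched in a few lines. This is only an illustration under an assumed coordinate convention: each matrix unit is written as a pair `(wire, digit)`, with wire 1 holding the units. The function names are hypothetical, and the convention used in [2] may differ.

```python
def to_units(n):
    """Return the matrix units (wire, digit) of a non-negative integer."""
    digits = [int(c) for c in reversed(str(n))]
    return {(wire, d) for wire, d in enumerate(digits, start=1)}


def from_units(units):
    """Read the number back from its matrix units."""
    return sum(d * 10 ** (wire - 1) for wire, d in units)


def add_bead(units, wire=1):
    """Add one bead on a wire; 9 + 1 gives 0 here and one bead on the next wire."""
    digits = dict(units)
    while True:
        d = digits.get(wire, 0) + 1
        if d <= 9:
            digits[wire] = d
            break
        digits[wire] = 0   # all beads on this wire return to the zero position
        wire += 1          # and one bead is added on the next wire
    return set(digits.items())


print(sorted(to_units(2021)))
print(from_units(add_bead(to_units(2099))))
```

With this convention the number 2021 becomes four pairs, one per wire, and adding a bead to 2099 triggers the carry twice before it stops on the hundreds wire.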

Another example of the contextual dependence of signs is chess. It is even stronger in double chess [3]. In this modification of chess, a player may make a finite number of double moves at any point in the game. The game remains consistent. The rest of the rules are the same as in normal chess, with two exceptions: the first move is a single move, and castling is allowed while in check. The variant of the game in which all moves are double was devised by Prof. G. A. Zaitsev.

For chess, the dictionary of the matrix text consists of the numbers of one of the pieces of each color and the move separator (from 1 to 11). A word in a chess text is a matrix unit. Its first coordinate is unique and is the number of a square on the chessboard (from 1 to 64). Its second coordinate is taken from the dictionary. The chess matrix text at any point in the game is the sum of the matrix units, each of which shows a piece at the corresponding place on the chessboard. Repetitions in the text appear both because pieces are duplicated and because, for all pieces except the king, there are constant transitions from similarity to sameness and back during the game. The game consists in carrying out the most effective of these transitions and, in effect, in classifying the pieces. Pawns that are the same at first later become similar only in the rule of the move, and sometimes a pawn becomes the same as a queen.

A tool for analyzing matrix texts is transitivity control, which checks the difference between similarity and sameness. A lack of transitivity control is the algebraic explication of misunderstanding in language texts, of a loss in chess, or of errors in numerical calculations.
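Transitivity control can be illustrated with a brute-force check over a small set of numbers. The sample values, the tolerance threshold, and the function names below are assumptions chosen for the sketch. It contrasts a tolerance relation (closeness within a threshold) with the equivalence obtained by rounding.

```python
from itertools import product


def is_transitive(items, related):
    """Check that related(a, b) and related(b, c) always imply related(a, c)."""
    return all(
        related(a, c)
        for a, b, c in product(items, repeat=3)
        if related(a, b) and related(b, c)
    )


values = [10, 14, 18, 22]


def similar(a, b):
    """Tolerance: the two numbers differ by at most 5."""
    return abs(a - b) <= 5


def same(a, b):
    """Equivalence: the two numbers round to the same ten."""
    return round(a, -1) == round(b, -1)


print(is_transitive(values, similar))  # 10~14 and 14~18 hold, but 10~18 fails
print(is_transitive(values, same))
```

The tolerance relation fails the check, while rounding splits the values into the classes {10, 14} and {18, 22}, on which sameness is transitive.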

Transitivity of relations is a condition for turning a set of objects into a mathematical category. Semantic markup of a text can be carried out by calculating its categories by means of transitive closure. The objects of the category are the contexts of matrix words [2], and the morphisms are the transformation matrices of these contexts.

## Context

The context of the word *E_{k,j}* of the matrix text [2] is its fragment *F^{j}_{i,k}*, the sum of matrix units (words) between two repeated matrix words *E_{i,j}* and *E_{k,j}*:

where the index *D_{R}* means that any index from the right dictionary *D_{R}* of the matrix text [2] can stand in this place, including the signs of the text-forming fragments. The context is all the words of the matrix text between repeated signs of the dictionary *D_{R}*: for example, between repeated words, repeated periods, or marks of paragraphs, chapters, and volumes in language texts, or between phrases, periods, and parts in musical works.

The signs of text-forming fragments look the same, but they too are homonymous signs: their contexts are the fragments (1). The context of a language fragment (its explication or explanation) can be not only a language text but also sound (for example, music), an image (a photo), or both together (video). The context of a musical text can be a language text (for example, a libretto).

Matrix words correspond to their matrix contexts, represented as algebraic objects (1). All possible relations between these objects are the subject of analysis when determining the meaning of words. For the study of such constructions, category theory is useful because it is based on the concept of transitivity.
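The definition of a context can be sketched as follows. The 17-word index sequence is hypothetical: it is chosen to agree with the word numbers quoted in the example later in this text, and it is not the text (5) of [2]. Each word is coded by the number of its first occurrence, and each matrix unit is written as a pair `(position, dictionary index)`. Taking the words strictly between two repetitions is also an assumption of this sketch.

```python
def contexts(text, sign):
    """Return the fragments that follow each occurrence of `sign`.

    Keys are (i, k): i is the position of an occurrence, k is the position of
    the next occurrence, or of the last word of the text for the final one.
    """
    n = len(text)
    hits = [pos for pos, word in enumerate(text, start=1) if word == sign]
    result = {}
    for i, nxt in zip(hits, hits[1:] + [None]):
        stop = nxt if nxt is not None else n + 1   # first position outside the context
        label = (i, nxt if nxt is not None else n)
        result[label] = {(pos, text[pos - 1]) for pos in range(i + 1, stop)}
    return result


# Hypothetical text: every word is replaced by the number of its first occurrence.
text = [1, 2, 3, 4, 1, 3, 7, 8, 2, 1, 3, 12, 4, 1, 3, 16, 7]

fragments = contexts(text, sign=1)
for label, units in fragments.items():
    print(label, sorted(units))
```

The four occurrences of sign 1 yield four fragments labeled (1, 5), (5, 10), (10, 14) and (14, 17), the last one closed by the end of the text.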

## Context category

Let *F_{1}^{j}, ..., F_{n}^{j}* be all the contexts *F^{j}_{i,k}* of the word *E_{j,j} ∈ D_{R}* in the text *P*, and let *D^{j}_{1R}, ..., D^{j}_{nR}* be the right dictionaries of these contexts:

For *k = i + 1* in (1), a special case of the fragment is the matrix word *E_{i+1,D_{R}}*.

The context category **Cat**(*E_{j,j}*) of the text sign *E_{j,j} ∈ D_{R}* is defined as follows:

- The objects of the category are the pairwise multiple [2] contexts *F_{1}^{j}, ..., F_{n}^{j}*.
- For each pair of multiple objects there is [2] a set of morphisms *F_{ij}: F_{i} = F_{ij}F_{j}*; each morphism corresponds to a single pair *F_{i}* and *F_{j}*.
- For a pair of morphisms *F_{ij}* and *F_{jk}*, their composition *F_{ij}F_{jk}* (a product of square matrices) is defined so that if *F_{i} = F_{ij}F_{j}* and *F_{j} = F_{jk}F_{k}*, then *F_{i} = F_{ij}F_{jk}F_{k}* (the transitivity condition).
- For each object *F_{i}*, the identity morphism is defined as the unit matrix *E*: *F_{i} = EF_{i}E*. Associativity in the category follows from the associativity of matrix multiplication.
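The composition rule can be checked on single matrix units, using the standard product *E_{i,j}E_{k,l} = E_{i,l}* for *j = k* and zero otherwise. In this sketch a sum of matrix units is stored as a Python dict from index pairs to coefficients. That representation and the helper names are assumptions, and the index values are borrowed from the example later in the text.

```python
def unit(i, j):
    """The matrix unit E[i,j] as a one-term sum."""
    return {(i, j): 1}


def mul(a, b):
    """Product of two sums of matrix units."""
    out = {}
    for (i, j), x in a.items():
        for (k, l), y in b.items():
            if j == k:                       # E[i,j] * E[k,l] = E[i,l] only if j == k
                out[(i, l)] = out.get((i, l), 0) + x * y
    return {key: c for key, c in out.items() if c != 0}


# Three objects (reduced contexts) and two morphisms between them.
f1, f2, f3 = unit(3, 3), unit(6, 3), unit(11, 3)
f21 = unit(6, 3)     # f2 = f21 * f1
f32 = unit(11, 6)    # f3 = f32 * f2

# Unit matrix of dimension 17, used as the identity morphism.
identity = {(i, i): 1 for i in range(1, 18)}

print(mul(f21, f1) == f2)
print(mul(f32, f2) == f3)
print(mul(mul(f32, f21), f1) == f3)            # transitivity condition
print(mul(mul(identity, f3), identity) == f3)  # identity morphism
```

All four checks print `True`: the composed morphism `f32 * f21` carries the first object directly to the third.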

## Context reduction

The intersection of matrix dictionaries (their common words) is their product:

The proof follows from the defining property of matrix units (6) [2] and the definitions of dictionaries (9) [2] and (15) [2]. When the matrix units of dictionaries are multiplied (in each such unit the two subscripts coincide), the product of matrix words (units) with different indices is zero. Only the common words, whose subscripts match in all the factors of (2), remain in the product (2).

The union of any pair of dictionaries *D_{i}* and *D_{j}* is their sum minus the intersection (2):

By property (10) [2], formula (3) removes the repeated matrix units from the sum *D_{i} + D_{j}*.
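Both statements can be verified on small dictionaries. The sketch below assumes that a dictionary is a sum of diagonal matrix units *E_{j,j}*, stored as a Python dict from index pairs to coefficients. The two sample dictionaries and the helper names are hypothetical.

```python
def dictionary(indices):
    """A dictionary as a sum of diagonal matrix units E[j,j]."""
    return {(j, j): 1 for j in indices}


def combine(a, b, sign=1):
    """Return a + sign * b, dropping terms whose coefficient becomes zero."""
    out = dict(a)
    for key, c in b.items():
        out[key] = out.get(key, 0) + sign * c
    return {key: c for key, c in out.items() if c != 0}


def mul(a, b):
    """Product of sums of matrix units: E[i,j] * E[k,l] = E[i,l] if j == k."""
    out = {}
    for (i, j), x in a.items():
        for (k, l), y in b.items():
            if j == k:
                out[(i, l)] = out.get((i, l), 0) + x * y
    return {key: c for key, c in out.items() if c != 0}


d_i = dictionary([1, 2, 3])
d_j = dictionary([2, 3, 4])

common = mul(d_i, d_j)                                # intersection as a product
union = combine(combine(d_i, d_j), common, sign=-1)   # sum minus intersection

print(sorted(common))
print(sorted(union))
```

The product keeps only the words 2 and 3, and subtracting it from the sum brings the doubled coefficients of these words back to one.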

A minimal dictionary of a matrix text fragment is a dictionary *D_{R}* of the text *P* such that *D_{R}* and *P* are mutually multiple:

For mutually multiple *P* and *D_{R}*, non-zero matrices *F_{PD_{R}}* and *F_{D_{R}P}* exist.

The sums of matrix units *F_{PD_{R}}* and *F_{D_{R}P}* exist if the matrix units of *P* and *D_{R}* contain the same number of second indices (coordinates) and do not contain any other second indices.

The concept of a minimal dictionary is introduced because, by the properties of matrix units, the following always holds:

where *D_{1R}* may consist of words (matrix units) that are missing from *D_{R}* (those very "other" words). For example, the relations *F_{1}^{j} = F_{1}^{j}D_{1R}, ..., F_{n}^{j} = F_{n}^{j}D_{nR}* always hold.

The minimal dictionaries *D_{minR1}, ..., D_{minRn}* of the fragments *F_{1}^{j}, ..., F_{n}^{j}* do not contain matrix words (second indices of matrix units) that are absent from the corresponding text fragment.

Context equivalence classes are defined by common minimal right dictionaries *D_{minR}*. If a pair of contexts has a common minimal dictionary, then these contexts are mutually multiple. Hence there exist matrices that transform them into each other.

If the contexts *F_{1}^{j}, ..., F_{n}^{j}* of the sign word *E_{j,j}* have a common minimal right dictionary *D_{R}*, then they are multiples of each other. From here on, the dictionary of a text fragment means its minimal dictionary.

If the contexts *F_{1}^{j}, ..., F_{n}^{j}* are multiplied on the right by a dictionary *D^{j}_{R}* such that each resulting context has the (minimal) right dictionary *D^{j}_{R}*, then the results are called reduced contexts:

Reduction (multiplication on the right) deletes from each of *F_{1}^{j}, ..., F_{n}^{j}* the matrix units whose second indices are not in *D^{j}_{R}*. If at least one of the dictionary indices is missing from some of the resulting fragments, then that index must not be included in (4).

## Categorization

Contexts with common dictionaries, for example the contexts of the sign word *E_{j,j}* after the reduction (4), are objects of the sign category **Cat**(*E_{j,j}*). By construction, all matrix texts (4) are multiples of each other by (20) [2] and have a common (and minimal) dictionary; therefore, there always exist transformation matrices *F^{j}_{1,k}* that serve as morphisms of the sign category **Cat**(*E_{j,j}*):

Relations (5) are the smallest transitive relations on the set *F_{1}^{j}, ..., F_{n}^{j}* and form the transitive closure of this set, because operation (4) removes from the contexts *F_{1}^{j}, ..., F_{n}^{j}* all matrix words that are not present in the common dictionary *D^{j}_{R}*.

The remaining category axioms hold by the properties of square matrices of the same dimension.
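The reduction (4) that produces the objects of the sign category can be sketched as a right multiplication by the common dictionary. The four contexts below are hypothetical: they are chosen to agree with the word numbers quoted in the example that follows, and they are not taken from [2]. Sums of matrix units are stored as dicts from index pairs to coefficients, and all function names are assumptions.

```python
from functools import reduce


def mul(a, b):
    """Product of sums of matrix units: E[i,j] * E[k,l] = E[i,l] if j == k."""
    out = {}
    for (i, j), x in a.items():
        for (k, l), y in b.items():
            if j == k:
                out[(i, l)] = out.get((i, l), 0) + x * y
    return {key: c for key, c in out.items() if c != 0}


def right_dictionary(fragment):
    """Sum of the diagonal units E[j,j] over the second indices of a fragment."""
    return {(j, j): 1 for (_, j) in fragment}


# Hypothetical contexts, written as sums of units E[position, dictionary index].
contexts = [
    {(2, 2): 1, (3, 3): 1, (4, 4): 1},
    {(6, 3): 1, (7, 7): 1, (8, 8): 1, (9, 2): 1},
    {(11, 3): 1, (12, 12): 1, (13, 4): 1},
    {(15, 3): 1, (16, 16): 1, (17, 7): 1},
]

# Common dictionary: the product of the right dictionaries of all contexts.
common = reduce(mul, (right_dictionary(f) for f in contexts))

# Reduction: multiply every context on the right by the common dictionary.
reduced = [mul(f, common) for f in contexts]

print(common)
print(reduced)
```

Only the word with index 3 is shared by all four contexts, so each reduced context collapses to a single matrix unit with second index 3.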

The transitive closure (5) can be defined for any subset (*m < n*) by finding for *F_{1}^{j}, ..., F_{m}^{j}*, according to (2), their common dictionary *D^{j}_{mR} ⊇ D^{j}_{R}* (*D^{j}_{R}* is a subset of *D^{j}_{mR}* by the properties of (2)). In this case, the transitive closure (5) is performed with the dictionary *D^{j}_{mR}*:

## Example

The matrix text (5) [2] is used as an example. It contains four identical signs of the word «set»: *E_{1,1}, E_{5,1}, E_{10,1}, E_{14,1}*. These four signs in turn have four contexts *F^{1}_{1,5}, F^{1}_{5,10}, F^{1}_{10,14}, F^{1}_{14,17}*:

where *D^{1}_{1}, D^{1}_{2}, D^{1}_{3}, D^{1}_{4}* are the dictionaries of the corresponding contexts. In the last context *F^{1}_{14,17}*, the second index is not the number of a further repetition of the sign, which is absent from the text, but the number of the last word in the text, so that the end of the context is defined.

The problem is to calculate the similarity and difference of the words *E_{1,1}, E_{5,1}, E_{10,1}, E_{14,1}* from the similarity and difference, in some measure (modulus), of their contexts *F^{1}_{1,5}, F^{1}_{5,10}, F^{1}_{10,14}, F^{1}_{14,17}*. The similarity of contexts is determined by the presence of common dictionaries, which are used as the modulus for comparing contexts. The difference is determined by the residues (deductions) of the contexts for the same modulus. The residues define their own equivalence classes (residue classes) and residue categories, since transitive closure can be carried out for them as well.

The common dictionary of the four contexts *F^{1}_{1,5}, F^{1}_{5,10}, F^{1}_{10,14}, F^{1}_{14,17}* according to (2) is:

Transitive closure (4) with the common dictionary as the modulus removes the "extra" words:

Thus, the reduced contexts of the sign word *E_{1,1}* («set») are the four words *E_{3,3}, E_{6,3}, E_{11,3}* and *E_{15,3}*. These words have the same sign *E_{3,3}* («object») in the dictionary obtained by uniting *D^{1}_{1}, D^{1}_{2}, D^{1}_{3}, D^{1}_{4}* according to (3):

where each formula is a successive pairwise union of dictionaries (3).

In the sense of their reduced contexts *E_{3,3}, E_{6,3}, E_{11,3}* and *E_{15,3}*, the words *E_{1,1}, E_{5,1}, E_{10,1}, E_{14,1}* can be the same or different. Setting a comparison measure for *E_{3,3}, E_{6,3}, E_{11,3}* and *E_{15,3}* determines the result of comparing *E_{1,1}, E_{5,1}, E_{10,1}, E_{14,1}*. In the simplest case, if the values of *E_{3,3}, E_{6,3}, E_{11,3}* and *E_{15,3}* are assumed to be the same, then *E_{1,1}, E_{5,1}, E_{10,1}, E_{14,1}* are the same as well. This is the case, for example, when words are understood only as sign letters in a dictionary alphabet and have no context dependence.

To solve the problem of comparing the meanings of words, it is useful to calculate the corresponding sign category of these words. The sign category **Cat**(*E_{3,3}*) consists of the four reduced context objects (10).

The morphisms of **Cat**(*E_{1,1}*) are the four matrices *E_{6,3}, E_{11,6}, E_{11,3}* and *E_{15,3}*:

The composition of morphisms is the relation:

The composition (13) expresses the interval markup of the word *E_{3,3}* (45) [2] in the language of category theory, and the reduction (10) is an example of solving a system of comparisons modulo *F_{m}* (39) [2]. Category theory is useful here because its approach is more general and allows methods from different branches of algebra to be used.

So all four text fragments *F^{1}_{1,5}, F^{1}_{5,10}, F^{1}_{10,14}, F^{1}_{14,17}* are the same (equivalent) in the sense of the sign word *E_{3,3}* (comparable modulo *E_{3,3}*). There are matrix morphisms *E_{15,11}, E_{11,6}, E_{6,3}, E_{15,3}* that convert these texts into each other according to (12). By analogy with a library catalog, all four texts *F^{1}_{1,5}, F^{1}_{5,10}, F^{1}_{10,14}, F^{1}_{14,17}* (objects of the sign category **Cat**(*E_{3,3}*)) are in the same catalog drawer, labeled with the sign *E_{3,3}*. This is an example of a rough classification of texts by keywords. The contextual meaning of the words is not taken into account: as signs, all such words are the same, and all their occurrences in the text can be added up to calculate the significance of keywords by frequency of use.

This result means that, to a first approximation, all four words «set» are contextually related to the word «object». The words «set» *E_{1,1}, E_{5,1}, E_{10,1}, E_{14,1}* are the same or different to the extent that their reduced contexts *E_{3,3}, E_{6,3}, E_{11,3}* and *E_{15,3}* are the same or different.

In [2] it was shown that comparisons modulo a fragment can be carried out for matrix texts. Dividing fragments of matrix texts by other fragments (moduli) can leave residues (deductions), which, like the moduli, are classifying features.

The criterion for the divisibility (multiplicity, ⋮) of fragments of matrix texts is the divisibility (multiplicity) of their right dictionaries (20) [2]. The remainders from dividing the dictionaries of fragments (the residues of the dictionaries) are the dictionaries of the remainders from dividing the fragments themselves.

To calculate the similarities and differences of the words *E_{3,3}, E_{6,3}, E_{11,3}* and *E_{15,3}*, the contexts *F^{1}_{1,5}, F^{1}_{5,10}, F^{1}_{10,14}, F^{1}_{14,17}* must be compared modulo *E_{3,3}*.

The residues (deductions) of each context modulo *E_{3,3}* are then equal to:

It follows from (14) that all *F^{1}_{1,5}, F^{1}_{5,10}, F^{1}_{10,14}, F^{1}_{14,17}* (and hence the words «set» *E_{1,1}, E_{5,1}, E_{10,1}, E_{14,1}*) are incomparable modulo *E_{3,3}*. The residues are not pairwise multiples and do not form any residue class pairwise. This means that all the words *E_{1,1}, E_{5,1}, E_{10,1}, E_{14,1}* differ in meaning (context).

Similarity is found at the next step (for the residues), if the common dictionaries are calculated for pairs of residues by (2) and the reduction (4) is applied. A common dictionary *D^{j}_{res}* for all the residues does not exist:

Equality (15) is the reason why there is no common residue class and no corresponding category **Cat**_{res}(*E_{3,3}*). But some pairs of residues (14) have common dictionaries:

After the reduction (4), these pairs of residues form residue classes and categories named *E_{2,2}*, *E_{4,4}* and *E_{7,7}*. The folder named *E_{2,2}* receives the fragments *F^{1}_{1}* and *F^{1}_{2}*, the folder named *E_{4,4}* receives the fragments *F^{1}_{1}* and *F^{1}_{3}*, and the folder named *E_{7,7}* receives the fragments *F^{1}_{2}* and *F^{1}_{4}*.

The word *E_{8,8}* is an annihilator (zero divisor) of three residues (14):

The word *E_{12,12}* is an annihilator:

The word *E_{16,16}* is an annihilator:

These are words of the matrix text that have no context (the last three terms in the context dictionary (49) [2]): when a residue is multiplied by an annihilator, the product is different from zero if the residue contains this annihilator.
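The residue step can be traced on a hypothetical set of four contexts whose word numbers agree with those quoted above. They are an assumption of this sketch and are not the text (5) of [2]. Matrix units are written as pairs `(position, dictionary index)`, and set operations stand in for the matrix formulas: removing the units of the modulus plays the role of the residue, and intersecting second indices plays the role of the product (2). A word that occurs in exactly one residue is treated here as an annihilator of the other three, which is one reading of the text.

```python
from itertools import combinations

# Hypothetical contexts of the word with index 1, numbered 1..4.
contexts = {
    1: {(2, 2), (3, 3), (4, 4)},
    2: {(6, 3), (7, 7), (8, 8), (9, 2)},
    3: {(11, 3), (12, 12), (13, 4)},
    4: {(15, 3), (16, 16), (17, 7)},
}
modulus = {3}   # common dictionary: the single word with index 3

# Residue of each context: the units that the modulus does not cover.
residues = {
    k: {(pos, j) for (pos, j) in f if j not in modulus}
    for k, f in contexts.items()
}

# Dictionary (set of second indices) of each residue.
words = {k: {j for _, j in r} for k, r in residues.items()}

# No word is common to all four residues.
common_all = set.intersection(*words.values())

# Pairs of residues that share a word go into the folder named by that word.
folders = {}
for a, b in combinations(sorted(words), 2):
    for j in words[a] & words[b]:
        folders.setdefault(j, set()).update({a, b})

# Words that occur in exactly one residue.
annihilators = sorted(
    j for j in set.union(*words.values())
    if sum(j in w for w in words.values()) == 1
)

print(common_all, folders, annihilators)
```

On this data the common dictionary of all residues is empty, the folders 2, 4 and 7 receive the context pairs (1, 2), (1, 3) and (2, 4), and the words 8, 12 and 16 each occur in a single residue.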

So, the problem in this example was to calculate the similarity and difference of the words *E_{1,1}, E_{5,1}, E_{10,1}, E_{14,1}* from the similarity and difference of their contexts *F^{1}_{1,5}, F^{1}_{5,10}, F^{1}_{10,14}, F^{1}_{14,17}* in some measure (modulus).

The solution obtained is that the words *E_{1,1}, E_{5,1}, E_{10,1}, E_{14,1}* (like their contexts *F^{1}_{1,5}, F^{1}_{5,10}, F^{1}_{10,14}, F^{1}_{14,17}*) are comparable modulo *E_{3,3}* and are not comparable (are different) modulo *E_{8,8}*, *E_{12,12}*, *E_{16,16}*.

This means that the reduction (10) should not be performed with the common dictionary (9), which consists of the single sign word *E_{3,3}*. As it turned out, this sign word has different meanings in different places of the text. Taking (16), (17), (18) into account:

The right dictionary *D_{R}* (9) [2] of the text (5) [2] then needs to be extended:

The source dictionary (9) [2] has been converted into the context dictionary (20). By calculating categories, the additional words *E_{8,8}*, *E_{12,12}*, *E_{16,16}* have been added to the sign words *E_{3,3}*, *E_{6,3}*, *E_{11,3}* and *E_{15,3}*. Through these additional words *E_{8,8}*, *E_{12,12}*, *E_{16,16}*, the words *E_{6,3}*, *E_{11,3}* and *E_{15,3}* differ from each other.

The above classification is a categorization of matrix texts by dictionary. In categorization, the classes and their names are calculated as algebraic functions of the text. The categorization was carried out by dictionaries, since the classifying features (category names) were determined by the mutual intersection of dictionaries (2). This categorization does not take the order of words in the text into account, but it can be used later to construct a finer categorization that does. In that case, the comparison moduli are not parts of dictionaries but fragments of contexts. When dictionary fragments are replaced with text fragments, word repetitions may appear in the contexts, and division (the construction of the morphisms of the category) becomes ambiguous [2]. That is why the comparison is first made modulo dictionaries, and similarities and differences (divisors and residues) are determined in this measure. Then, once the similarity and difference of the repeated words in the contexts have been established, the dictionary modulus is replaced with a text fragment, which does take word order into account. The category names are then text fragments.

The general method of calculating classifying features gives an analog of the Chinese remainder theorem (CRT) for matrix texts.

## Chinese Remainder Theorem (CRT)

The Chinese remainder theorem for matrix texts is formulated as follows. Let the following be given:

- *D_{1R}, ..., D_{kR}*: pairwise non-multiple minimal dictionaries of the matrix text fragments *F_{1}, ..., F_{k}*.
- *D_{R} = D_{1R} + ... + D_{kR}*: the right dictionary of some text *P*.
- *D'_{R} = D'_{1R} + ... + D'_{mR}*: the right dictionary of some text *P'*, *m < k*.
- *P' ⊂ P : D'_{R} ⊂ D_{R}* (the text *P'* is a part of *P* in the sense that its dictionary *D'_{R}* is a part of the dictionary *D_{R}*).
- A tuple (*r_{1}, ..., r_{k}*), where *r_{1} ≡ P' (mod D'_{1R}), ..., r_{k} ≡ P' (mod D'_{kR})* (this means that *P' = P'D'_{1R} + r_{1}, ..., P' = P'D'_{kR} + r_{k}*).

Then there is a one-to-one correspondence:

The theorem is proved by induction, using the definition of the multiplicity of polynomials of matrix units and the minimality of the dictionary.

The residue (deduction) tuple (*r_{1}, ..., r_{k}*) is a classifying feature of all possible mutually multiple texts that have the dictionary *D'_{R}* or any part of it. Classifiers of language texts and other sign sequences should be constructed according to (21).

## References

[1] A. B. Petrovsky. Theory of countable sets and multisets. Moscow: Nauka, 2018.

[2] S. B. Pshenichnikov. Algebra of text. ResearchGate preprint, 2021.

[3] S. B. Pshenichnikov. Computer game "Double chess". Certificate of state registration of the computer program No. 920129, 4.12.1992.