Data on bank card payments
To test the model, we used data on payments made with Sberbank cards, tied to the place of registration of the device through which each payment was made. The analysis used payments made within 7 months from July 2020 to December 2021, grouped by working and non-working days of each month and by two-hour intervals from 0 to 24 hours. Information about the age and gender of cardholders, as well as the purpose of each payment, was also available.
A cardholder was counted as having moved from one place to another if he or she made two payments in different places within one hour. Data were collected both on the number of such persons and on the amounts of money spent in each place. City territories were covered by a grid of hexagons (each with an edge of 170 m) with known center coordinates (latitude and longitude). Where necessary, the hexagons were grouped into larger geographical objects (city districts, for example). These objects were treated as the places between which citizens move in equation (2).
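The movement rule described above can be illustrated with a minimal sketch. It is not the authors' pipeline: the record layout `(card_id, unix_time, place)`, the function name `count_moves`, and the choice to compare only consecutive payments of the same card are assumptions made for illustration, and the sample records are hypothetical.

```python
from collections import Counter, defaultdict

ONE_HOUR = 3600  # seconds

def count_moves(payments):
    """Count moves i -> j from (card_id, unix_time, place) records.

    Assumption: a move is registered for each pair of consecutive payments
    by the same card made in different places no more than one hour apart.
    """
    by_card = defaultdict(list)
    for card_id, t, place in payments:
        by_card[card_id].append((t, place))

    moves = Counter()
    for events in by_card.values():
        events.sort()  # chronological order for each card
        for (t0, p0), (t1, p1) in zip(events, events[1:]):
            if p0 != p1 and t1 - t0 <= ONE_HOUR:
                moves[(p0, p1)] += 1
    return moves

# Hypothetical records, for illustration only.
payments = [
    ("card_a", 0, "hex_1"),
    ("card_a", 1800, "hex_2"),   # 30 min later, different place: a move
    ("card_a", 9000, "hex_3"),   # 2 h after the previous payment: not a move
    ("card_b", 100, "hex_2"),
    ("card_b", 2000, "hex_2"),   # same place: not a move
    ("card_c", 500, "hex_1"),
    ("card_c", 3000, "hex_2"),   # a move
]
moves = count_moves(payments)
```

Here `moves` maps each ordered pair of places to the number of registered moves. Grouping hexagons into districts would amount to replacing the place label with a district label before counting.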
These data were used to calculate the main parameters in (2):
(3a)
Here
–number of cardholders who moved from the
place (node) i to the node j,
(3b)
M, as above, is the total number of nodes under consideration, and N is the total number of cardholders who moved during the considered time T:
(3c)
And finally:
(3d)
One can easily show
that for each node:
(3e)
It is convenient to
introduce dimensionless time
(4)
As an example, Figure 1 compares the distribution of the quantity defined by (3d) over the nodes of two Russian cities, Moscow and Ekaterinburg.
The overall number of citizens in flows during T
– ? (5) is equal:

Figure 1: Relative fraction (y-axis) of nodes i whose value defined by (3d) lies in the corresponding interval of values (x-axis).
The probabilities (5) cannot all be shown because there are too many of them. For illustration, Figure 2 presents the probabilities of citizens' movement over a given distance for the same cities as in Figure 1.

Figure 2: The probability of citizens' movement over a given distance.
Methods for forecasting people flows
In this section we discuss possible solutions of the main equation (2) and their consequences, in particular the possible ways of predicting people flows.
Correlation functions of people concentrations: Let us return to equation (2). It can be used to determine the moment-generating function, or it can be solved numerically; we use both approaches below. The results make it possible to predict the probability distribution for the parameters determined in section 2.1. Usually it is sufficient to calculate the first two moments of the random variable, namely the expected value and the correlation function:
(5a)
(5b)
After a standard routine using (2), one can transform (5a,b) to the form:
(6a)
(6b)
Here
(6c)
By direct summation in (6a,b) it can be shown that:
(7)
as it should be, in accordance with (1).
From (6) one can find that, for large N, the expected value is of zero order in 1/N, while the correlation function is proportional to 1/N. As N tends to infinity, the expected value therefore remains finite and the correlation function tends to zero. Moreover, it can be shown that not only the correlation function but also the higher-order moments of the random variables x_i tend to zero as N → ∞. Thus, in the limit of large N:
(8)
Where
–
are the solutions of (6a)
The continuity equation and its solutions: In the limit of large N it is convenient to transform equation (2) by expanding the probability in a Taylor series around the point 1/N = 0, up to first order in 1/N. At zero order one obtains an identity. In the linear approximation, (2) takes the form of a continuity equation:
(9a)
(9b)
Here J can be treated as the probability current. Multiplying its components J_i by the total number of moving citizens gives the values of the flows through node i. The solutions of (9) that correspond to stationary states are of particular interest. In such states
(10a)
And correspondingly:
(10b)
It is easy to show that total current:
(10c)
We consider the case in which, in the stationary state, the net currents between each of the nodes i = 1, 2, ..., M and its surroundings vanish (or, equivalently, the currents out of each node i and into it are equal). In this case one obtains from (10b) the equations for the intensities of stationary flows from the nodes i = 1, 2, ..., M:
(11)
In accordance with (3e) matrix
is stochastic and consequently
has eigenvector
, corresponding to an eigenvalue equal to unity [12,13]. Stationary
concentrations x_i^s can be defined either as
, or
directly from the equation for
, by means of the algorithm like PageRank [12,13]. One can transform
the equation for
to discrete form, taking as the unit of time in (6a) –
defined by (4):
(12a)
Where:
(12b)
And other parameters are defined by (3). Either matrix
, or
can be treated as indecomposable. It is evident that (12) describes a random walk over the vertices of a directed graph, and the element
corresponds to the edge connecting vertex i with vertex k. One can show that for each column of the matrix (12b) the sum of all elements is unity. Thus this matrix can be treated as the stochastic matrix of transition probabilities for this random walk. In the limit
, (12a)
takes the form of an eigenvalue problem for matrix
with the eigenvalue equal to
unity [12,13]. The components of the corresponding eigenvector describe the
stationary state achievable in the limit
,
(13)
It can be shown that this stationary state is stable.
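The iteration behind (12a) and its stationary limit can be sketched as a simple power iteration, in the spirit of PageRank-like algorithms. This is an illustration under stated assumptions rather than the computation used in the paper: the 3-node matrix `W` is hypothetical, the function name `stationary_state` is introduced here, and `W[i][k]` is assumed to hold the probability of moving from node k to node i, so that every column sums to one.

```python
def stationary_state(W, tol=1e-12, max_iter=10_000):
    """Power iteration x <- W x for a column-stochastic matrix W (list of rows)."""
    m = len(W)
    x = [1.0 / m] * m  # start from uniform concentrations
    for _ in range(max_iter):
        x_new = [sum(W[i][k] * x[k] for k in range(m)) for i in range(m)]
        if max(abs(a - b) for a, b in zip(x_new, x)) < tol:
            return x_new
        x = x_new
    return x

# Hypothetical 3-node matrix; each column sums to one.
W = [
    [0.5, 0.2, 0.3],
    [0.3, 0.6, 0.3],
    [0.2, 0.2, 0.4],
]
x_s = stationary_state(W)
```

Because each column of `W` sums to one, the iteration preserves the total concentration, and the returned vector approximates the eigenvector with eigenvalue one. Convergence is guaranteed here because all entries of this example matrix are positive; a general matrix may need the damping used in PageRank.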
The solution of (9) can be found by standard routine [14]:
Here
– an arbitrary function, of M independent
first integrals of the auxiliary system:
(14b)
(14c)
The concrete form of
is defined by the initial
conditions on
(14d)
Let's assume, for example, that the initial distribution is uniform in
some hypercube of the space
:
(15)
and is equal to zero otherwise. One can show, that for
,
exponentially tends to infinity. The region where
is defined by the equations:
(16)
Here
is a submatrix of
(14c) with rank
.
One can show that in the limit
the inequalities (15)
can be satisfied only if
. Thus, in this limit the probability distribution takes the form of a sharp delta-function (δ) peak:
(17)
The feedback effects: Instead of solving the multi-node problem, one can use simpler approximations. For example, the most probable concentrations of citizens in the nodes where intra-city flows cross can be estimated in a two-node model, in which X is the node of interest and Y represents all other nodes (the XY model). To analyse flows between nodes i and j, one can use a three-node model, in which i is node X, j is node Y, and all other nodes are merged into node Z (the XYZ model). For instance, in Ekaterinburg the standard deviation between the concentrations obtained in the multi-node and XY models did not exceed a few percent. Such models are therefore appropriate for simple one- or two-node tasks in which the mutual influence of processes in different nodes is not essential. Moreover, the XY model is convenient for studying the feedback effect, that is, the influence of the concentration of people in the nodes on the probabilities of their movement between these nodes. In the XY model, equation (9) can be reduced to an equation in a single variable and solved analytically. If x and y are the concentrations of citizens in nodes X and Y (x + y = 1), the probability distribution of the concentration x or y is:
, (18a)
where
is the inverse function of:
(18b)
c is a constant,
that does not influence the results,
– the initial probability distribution:
(18c)
Here
is a singular point of (6),
corresponding to stationary state
(18d)
(18e)
The solution (18)
makes it possible to analyse the peculiarities of
for tasks in which the transition probabilities
or flow intensities
in (18c) depend on x (y). The latter results
in a dependence of
. In this case
equation (6) can have several singular points. All of them are solutions of the
transcendental equation:
(19)
Let us consider a uniform initial distribution (all concentrations x are equally probable), expressed through the Heaviside step function. Expanding z(x) in a Taylor series in the vicinity of any solution x_* of (19), one finds:
(20a)
Here:
(20b)
And:
(20c)
Thus, if
, ? is positive and P(x_*, t → ∞) → ∞, while
the interval for
becomes narrower and narrower. For
there are no singularities in the
behavior. To check the features described above, we solved (2) numerically for two opposite patterns of citizens' behavior:
· The avoidance of places with small concentrations of people.
· The tendency to visit such places.
Suppose that X is the investigated place and Y
all other places of the city available for visits. For simplicity, let's assume
that in (18)
.
Figure 3 shows the transition probabilities of the XY model for the two cases above and the time evolution of the corresponding probability distributions.
Figure 3: The transition (X→Y, Y→X) probabilities and the corresponding probability distributions for different patterns of citizens' behavior: a) the avoidance of places with small concentrations of people; b) the tendency to visit places with small concentrations of people. a1, b1: transition probabilities; a2, b2: graphical determination of the singular points, i.e. the solutions of equation (19); a3, b3: results of computer modelling, i.e. numerical solutions of (2) for the XY model with N=1000. The equation was solved recurrently, step by step, from the initial probability distribution up to time step S. The time unit for each step was defined by (4).
It is worth mentioning that, in the case of avoiding places with small concentrations of people, equations (2) and (9) have two stable singular points and one unstable one. When such places are attractive, there is only one stable point. These results agree with the conclusions drawn above on the basis of (20).
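The qualitative picture of stable and unstable singular points can be reproduced with a small sketch. It does not implement the paper's equation (19). It assumes a mean-field balance for the XY model, in which the net flow into X is (1 - x)·p_yx(x) - x·p_xy(x), and it uses hypothetical quadratic feedback functions chosen only for illustration. The names `drift` and `singular_points` are introduced here. A point is classified as stable when the net flow has a negative slope there.

```python
def drift(x, p_xy, p_yx):
    """Net mean-field flow into X: inflow from Y minus outflow from X."""
    return (1.0 - x) * p_yx(x) - x * p_xy(x)

def singular_points(p_xy, p_yx, n_grid=1000, h=1e-6):
    """Return [(x_star, is_stable)] found by a sign-change scan plus bisection."""
    f = lambda x: drift(x, p_xy, p_yx)
    grid = [k / (n_grid - 1) for k in range(n_grid)]
    points = []
    for a, b in zip(grid, grid[1:]):
        if f(a) * f(b) < 0:
            for _ in range(60):  # bisection on the bracketing interval
                mid = 0.5 * (a + b)
                if f(a) * f(mid) <= 0:
                    b = mid
                else:
                    a = mid
            x_star = 0.5 * (a + b)
            slope = (f(x_star + h) - f(x_star - h)) / (2 * h)
            points.append((x_star, slope < 0))  # negative slope: stable
    return points

# Hypothetical feedback: sparsely populated places are avoided.
avoid = singular_points(
    p_xy=lambda x: 0.05 + 0.9 * (1 - x) ** 2,  # leave X more often when X is sparse
    p_yx=lambda x: 0.05 + 0.9 * x ** 2,        # enter X more often when X is crowded
)
# Hypothetical feedback: sparsely populated places are attractive.
attract = singular_points(
    p_xy=lambda x: 0.05 + 0.9 * x ** 2,
    p_yx=lambda x: 0.05 + 0.9 * (1 - x) ** 2,
)
```

Under these assumed feedback functions, the avoidance case gives two stable points separated by an unstable one at x = 0.5, and the attraction case gives a single stable point at x = 0.5, which is the same pattern as described in the text.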