Traffic flow digital twin generation for highway scenario based on radar-camera paired fusion


Coordinate systems

In city-scale highway traffic flow sensing applications, a large number of radars and cameras are deployed at different locations. Since each sensor usually measures target positions in its own local coordinate system, these sensors need to be spatially aligned to make the target positions consistent across sensors. A feasible way to align the sensors is to choose a unified coordinate system (UCS)27,28. The most common UCS is the WGS-84 system (World Geodetic System)29, in which the position of a target is uniquely determined by longitude, latitude, and altitude. When all sensor measurements are converted to WGS-84 coordinates, spatial alignment is achieved for all targets in the area covered by these sensors. A typical conversion process from a sensor local coordinate system to the WGS-84 system is illustrated in Fig. 3. It can be seen from Fig. 3 that the transformation from the sensor local Cartesian (LC) coordinate system, i.e., xyz, to the WGS-84 system requires intermediate coordinate systems. In this work, the intermediate coordinate systems are the local east-north-up (ENU) coordinate system, i.e., \(x'y'z'\), and the Earth-centered Earth-fixed (ECEF) coordinate system, i.e., XYZ. It is worth noting that radar measurements of targets are defined in a local polar (LP) coordinate system, so a target position measured by a radar must first be converted from LP to LC coordinates before it can be converted to the WGS-84 system. According to Fig. 3, the transformation process can be summarized as follows (a short code sketch of this chain is given after the list):

  1. Conversion of the LP coordinates \((R,\theta )\) to LC coordinates xyz with the same origin (this step applies to the radar sensor only).

  2. Conversion of the LC xyz to ENU \(x'y'z'\) with the same origin and the same z-axis.

  3. Conversion of the ENU \(x'y'z'\) to ECEF XYZ.

  4. Conversion of the ECEF XYZ to WGS-84.
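
As a concrete illustration of steps 1-4, the following Python sketch converts a single radar detection to WGS-84 coordinates. The pymap3d package and the heading convention used for the LC-to-ENU rotation are assumptions of this example, not specified by the paper.

```python
import numpy as np
import pymap3d as pm  # third-party package assumed for the ENU -> geodetic conversion

def radar_lp_to_wgs84(R, theta, heading, lat0, lon0, alt0):
    """Steps 1-4: LP -> LC -> ENU -> ECEF -> WGS-84 for one radar detection.

    R, theta         : slant range and azimuth of the target in the radar LP system (radians)
    heading          : rotation of the sensor LC frame about the shared z-axis,
                       measured from east (assumed known from the installation survey)
    lat0, lon0, alt0 : geodetic position of the sensor, i.e., the ENU origin
    """
    # Step 1: LP -> LC (cf. Eq. (6) below).
    x, y = R * np.sin(theta), R * np.cos(theta)
    # Step 2: LC -> ENU, a planar rotation about the common z-axis.
    e = x * np.cos(heading) - y * np.sin(heading)
    n = x * np.sin(heading) + y * np.cos(heading)
    u = 0.0  # ground-plane target, z treated as a constant
    # Steps 3-4: ENU -> ECEF -> WGS-84 (pymap3d chains both conversions).
    lat, lon, alt = pm.enu2geodetic(e, n, u, lat0, lon0, alt0)
    return lat, lon, alt
```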

Figure 3

Coordinate systems for DT generation.

Although the use of WGS-84 can solve the sensor alignment problem in arbitrary scenes, it also increases computational complexity, i.e., each sensor needs to complete the transformation to WGS-84 before post-processing can be performed. In the data fusion of radar and camera, the choice of UCS can be based on the deployment locations of the radar and camera. When the radar and camera are installed close enough to each other, the effect of Earth curvature can be neglected and there is no need to select the WGS-84 system as the UCS. In fact, three coordinate systems can be chosen as the UCS depending on the relative deployment positions of the cameras and radars, as detailed in Fig. 4. For highway sensing, the radar and camera are deployed in pairs at the same location of a site. Therefore, the LC coordinate system is adequate as the UCS for a radar-camera paired site.

Figure 4

UCS selection in three sensor deployment cases.

In this work, radar-camera pairs are used for highway scene sensing, hence LC is chosen as the UCS for sensor fusion. The LC in this work is defined as follows: the y-axis points along the sensor normal, the x-axis points to the right of the sensor and is perpendicular to the y-axis, and the z-axis points upward from the sensor and is perpendicular to both the x-axis and the y-axis. Based on the radar and camera models, the radar measurements are defined in an LP coordinate system, which gives the slant range R and azimuth \(\theta\) of the target, while the camera measurements are defined in the camera coordinate system, which gives the two-dimensional (2D) position of the target. For sensor fusion, the radar and camera outputs need to be transformed from their respective measurement coordinate systems to the UCS. In this case, the radar LP to LC transformation is defined as

$$\begin{aligned} \left\{ \begin{array}{l} x^{r} = R\sin (\theta )\\ y^{r} = R\cos (\theta ) \end{array}\right. . \end{aligned}$$

(6)

As shown in Fig. 2, the camera coordinate to LC conversion is defined as

$$\begin{aligned} \left\{ \begin{array}{l} x^{c} = v^{c}\\ y^{c} = w^{c} \end{array}\right. . \end{aligned}$$

(7)

It is worth noting that since both the radar and the camera measure 2D coordinates of the target in the ground plane, the z coordinate in LC is considered as a constant.
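
For completeness, Eqs. (6) and (7) in vectorized form can be sketched as follows; the function names and array layout are illustrative assumptions, not from the paper.

```python
import numpy as np

def radar_detections_to_lc(R, theta):
    """Eq. (6): vectorized LP -> LC conversion for N radar detections (theta in radians)."""
    return np.stack([R * np.sin(theta), R * np.cos(theta)])   # shape (2, N)

def camera_detections_to_lc(v, w):
    """Eq. (7): the camera ground-plane coordinates map directly to LC."""
    return np.stack([v, w])                                   # shape (2, N)
```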

Figure 5

Sensor deployment schematic for radar-camera calibration.

Adaptive system calibration based on road feature

After determining the UCS, the sensors need to be calibrated before post-processing. The main content of the system calibration is the registration of the radar and camera output data.

In this work, the system calibration of the first case shown in Fig. 4 is considered, i.e., the radar and the camera are paired at the same location. A schematic diagram of the radar and camera normal error is shown in Fig. 5. It can be seen from Fig. 5 that installation and manufacturing errors make the radar normal and the camera normal non-parallel in actual deployment. Due to the error angle \(\beta\) between the radar and camera normals, the target positions detected by the radar and the camera do not coincide even in the same coordinate system. If the radar LC coordinate system is chosen as the UCS, the target position detected by the camera needs to be compensated for the normal error angle before it can be converted to the UCS.

When the radar and camera are installed at the same position, there is a rotational relationship between the radar LC and the camera LC due to the angular error between their normals. Meanwhile, when the focal length of the camera is unknown, the target positions measured by the camera and the radar are related by a scale transformation. Therefore, the relationship between the radar and camera LC coordinates is an affine transformation, and there is no translation in the transformation because the origins of their coordinate systems coincide. Hence, the transformation from camera LC to radar LC is given as

$$\begin{aligned} \begin{aligned} {\textbf{g}}^{r}&=\textbf{E F g}^{c} \\&=\begin{bmatrix} \rho _{x} &{} 0\\ 0 &{} \rho _{y} \end{bmatrix} \begin{bmatrix} \cos \beta &{} -\sin \beta \\ \sin \beta &{}\cos \beta \end{bmatrix} \begin{bmatrix} x^{c}\\ y^{c} \end{bmatrix}, \end{aligned} \end{aligned}$$

(8)

where \({\textbf{E}}\) is the scaling transformation, \(\rho _{x}\) and \(\rho _{y}\) are the scaling factors of the corresponding coordinate axes, \({\textbf{F}}\) is the rotation transformation, and \({\textbf{g}}^{r}=[x^{r},y^{r}]\) and \({\textbf{g}}^{c}=[x^{c},y^{c}]\) are the target coordinates in the radar LC and camera LC, respectively. The affine transformation in Eq. (8) can be solved using point cloud registration techniques30,31. However, with the help of road features in the highway scenario, this registration process can be simplified. In the highway scenario, the intermediate belt is a prominent straight-line feature. If the straight line corresponding to the belt can be localized from the detection results of the radar and the camera respectively, then the angle between the two straight lines is the angular deviation \(\beta\) between the radar normal and the camera normal.
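
For illustration, applying the affine model of Eq. (8) to camera points is a short NumPy operation; the function below is a sketch whose name and arguments are assumptions of this example.

```python
import numpy as np

def camera_lc_to_radar_lc(g_c, rho_x, rho_y, beta):
    """Apply Eq. (8): g_r = E @ F @ g_c, with scaling E and rotation F (beta in radians).
    g_c is a 2xN array of camera LC points."""
    E = np.diag([rho_x, rho_y])
    F = np.array([[np.cos(beta), -np.sin(beta)],
                  [np.sin(beta),  np.cos(beta)]])
    return E @ F @ g_c
```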

The Hough transform is an effective line detection technique that can be used to detect the highway intermediate belt in the radar and camera results32,33. The transform maps each input point to a curve in the Hough parameter space and accumulates votes, so that lines correspond to accumulator peaks and can be obtained by threshold detection. The line function defined in the Hough transform is

$$\begin{aligned} \eta =x \cos (\phi )+y \sin (\phi ) , \end{aligned}$$

(9)

where the coordinates (x, y) describe the target position for the sensors, while each point \((\eta , \phi )\) in the Hough parameter space represents a line in the input 2D position matrix. The score of the corresponding point in the parameter space is computed as

$$\begin{aligned} \begin{aligned} H(\eta , \phi )&=\iint _{L} \delta (x, y) d x d y , \\ \text {with } \delta (x, y)&=\left\{ \begin{array}{l}1, \text{ if } (x, y) \text{ is } \text{ on } L \\ 0, \text{ otherwise } \end{array}\right. , \end{aligned} \end{aligned}$$

(10)

where L denotes the line satisfying Eq. (9). After obtaining all the scores of the parameter space, lines can be extracted wherever \(H(\eta , \phi )\) is greater than a specified threshold, and the line position in the input matrix is

$$\begin{aligned} \left\{ \begin{array}{llrl} x &{} =\eta , &{} &{} \text{ if } \sin (\phi )=0 \\ y &{} =-\cot (\phi ) x+\frac{\eta }{\sin (\phi )}, &{} &{} \text{ otherwise } \end{array}\right. . \end{aligned}$$

(11)
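
In practice, the voting of Eqs. (9)-(11) does not have to be implemented by hand; OpenCV's standard Hough transform performs the same accumulation. The sketch below rasterizes 2D detections onto a grid before voting; the grid size, scaling, and threshold are assumptions of this example.

```python
import cv2
import numpy as np

def dominant_line(points, grid_shape=(512, 512), vote_threshold=50):
    """Return (eta, phi) of the strongest line in a set of 2D detections.
    `points` is an Nx2 array already scaled to integer grid indices."""
    img = np.zeros(grid_shape, dtype=np.uint8)
    for x, y in points.astype(int):
        if 0 <= y < grid_shape[0] and 0 <= x < grid_shape[1]:
            img[y, x] = 255
    # Standard Hough transform: 1-pixel range resolution, 1-degree angle resolution.
    lines = cv2.HoughLines(img, 1, np.pi / 180, vote_threshold)
    if lines is None:
        return None
    eta, phi = lines[0][0]  # take the first (strongest) accumulator peak
    return eta, phi
```

For the camera, the same call can be applied to an edge map of the recorded image (e.g., after Canny edge detection) instead of a rasterized point set.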

It is worth noting that the input matrix can be either target positions detected by the radar or an image recorded by the camera. After obtaining the intermediate belt straight line detected by the radar, i.e., \(\vec {l}^{r}\), and the camera, i.e., \(\vec {l}^{c}\), respectively, the angle \(\beta\) between the two straight lines can be calculated as

$$\begin{aligned} \beta =\arccos \left( \frac{\left| \vec {l}^{r} \cdot \vec {l}^{c}\right| }{\left| \vec {l}^{r}\right| \left| \vec {l}^{c}\right| }\right) , \quad \beta \in \left[ 0^{\circ }, 90^{\circ }\right] , \end{aligned}$$

(12)

and the rotation transformation \({\textbf{F}}\) can then be constructed as in Eq. (8). The affine transformation defined in Eq. (8) is thus simplified as

$$\begin{aligned} \begin{aligned} {\textbf{g}}^{r}&=\textbf{E g}^{c}_{F} \\&=\begin{bmatrix} \rho _{x} &{} 0\\ 0 &{} \rho _{y} \end{bmatrix} \begin{bmatrix} x^{c}_{F}\\ y^{c}_{F} \end{bmatrix}. \end{aligned} \end{aligned}$$

(13)

In this case, the remaining calibration work is to estimate the scaling transformation matrix \({\textbf{E}}\). Some vehicle targets in the highway scenario can be selected as feature points. For instance, the positions of three vehicles detected by the radar and by the camera form two triangles, as shown in Fig. 5a. These two triangles are related by a scale transformation, which can be derived from Eq. (13) as

$$\begin{aligned} \begin{aligned} {\textbf{E}}&= [{\textbf{G}}^{r}({\textbf{G}}^{c}_{F}) ^{*}] [{\textbf{G}}^{c}_{F}({\textbf{G}}^{c}_{F})^{*}]^{-1},\\ \text {with } {\textbf{G}}^{r}&=({\textbf{g}}^{r_{1}},{\textbf{g}}^{r_{2}},\ldots ,{\textbf{g}}^{r_{N}})\\ {\textbf{G}}^{c}_{F}&=({\textbf{g}}^{c_{1}}_{F},{\textbf{g}}^{c_{2}}_{F},\ldots ,{\textbf{g}}^{c_{N}}_{F}), \end{aligned} \end{aligned}$$

(14)

where \(*\) denotes matrix transposition. It is worth noting that the number of targets should satisfy \(N\ge 2\), with the selected target positions not all collinear with the coordinate origin, so that \([{\textbf{G}}^{c}_{F}({\textbf{G}}^{c}_{F})^{*}]^{-1}\) exists.

After the scaling transformation \({\textbf{E}}\) and rotation transformation \({\textbf{F}}\) are obtained, the conversion from camera LC coordinates to radar LC coordinates can be realized in terms of Eq. (8).
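
Putting the pieces together, a possible implementation of this calibration step is sketched below. The function name, the data layout, and the choice of a positive rotation direction for \(\beta\) (Eq. (12) only yields its magnitude) are assumptions of this sketch.

```python
import numpy as np

def calibrate_camera_to_radar(l_r, l_c, G_r, G_c):
    """Estimate E and F of Eq. (8) from road features and matched targets.

    l_r, l_c : 2-vectors, directions of the intermediate-belt line detected
               by the radar and by the camera.
    G_r, G_c : 2xN arrays of matched target positions in radar LC and camera LC
               (N >= 2, not all collinear with the origin).
    """
    # Eq. (12): angle between the two detected lines (sign assumed positive here).
    beta = np.arccos(abs(l_r @ l_c) / (np.linalg.norm(l_r) * np.linalg.norm(l_c)))
    # Rotation transformation F of Eq. (8).
    F = np.array([[np.cos(beta), -np.sin(beta)],
                  [np.sin(beta),  np.cos(beta)]])
    G_c_F = F @ G_c  # rotation-compensated camera points, cf. Eq. (13)
    # Eq. (14): least-squares estimate of the scaling transformation E.
    E = (G_r @ G_c_F.T) @ np.linalg.inv(G_c_F @ G_c_F.T)
    return E, F
```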

Figure 6

Measurement error distributions for different sensors.

Radar-camera fusion detection and tracking

According to their sensing characteristics, the radar is more accurate in distance and velocity measurements, while the camera is more accurate in angle, height, and target class measurements34. The measurement errors of the radar and camera typically follow the distributions shown in Fig. 6. Based on their respective advantages in target measurement, a novel radar-camera fusion framework is proposed in this section.

After the target positions detected by the radar and camera are converted to the UCS, target tracking based on sensor fusion can be realized. Since the goal is to obtain the target trajectory after sensor fusion, the Kalman filter (KF) framework is adopted in this paper to achieve both fusion and subsequent tracking35,36. For radar-camera fusion in traffic applications, the target dynamics and sensor measurements can be modeled as a system with a single state equation and multiple measurement equations

$$\begin{aligned} {\textbf{u}}_{k}= & {} {\textbf{A}} {\textbf{u}}_{k-1} + \mathbf {\varepsilon }, \end{aligned}$$

(15)

$$\begin{aligned} {\textbf{g}}_{k}^{r}= & {} {\textbf{C}} {\textbf{u}}_{k} + \mathbf {\zeta }^{r}, \nonumber \\ {\textbf{g}}_{k}^{c}= & {} {\textbf{C}} {\textbf{u}}_{k} + \mathbf {\zeta }^{c}, \end{aligned}$$

(16)

where k is the discrete time index, \({\textbf{A}}\) is the state transition matrix, \({\textbf{u}}_{k}\) is the state vector, i.e., the target position determined by the target motion equation, \({\textbf{g}}_{k}^{r}\) and \({\textbf{g}}_{k}^{c}\) are the measurement vectors of the radar and camera respectively, and \(\mathbf {\varepsilon }\) and \(\mathbf {\zeta }\) are the process noise and measurement noise with covariance matrices \(\mathbf {\aleph }\) and \(\mathbf {\Re }\) respectively. It is worth noting that the state-to-measurement matrix \({\textbf{C}}\) is equal to the identity matrix, since both the state space and the measurement space are defined in the UCS. Sensor fusion can be achieved by either state-vector fusion or measurement fusion, and the latter has been shown to provide better performance35,36. In this paper, the measurements of the radar and camera are combined to establish a target tracking framework suitable for traffic scenarios.

Figure 7

Fusion Kalman filter framework.

In traffic sensing applications, as shown in Fig. 6, the radar measurement has a larger variance component along the x-axis, while the camera measurement has a larger variance component along the y-axis. After setting the measurement covariance matrices \(\mathbf {\Re }^{r}\) and \(\mathbf {\Re }^{c}\) for the radar and camera according to their measurement error distributions, sensor information fusion is achieved in two parts:

  1. The fusion of the radar and camera target position measurements, computed as

    $$\begin{aligned} \bar{{\textbf{g}}}_{k}={\textbf{g}}_{k}^{r}+\mathbf {\Re }^{r}\left( \mathbf {\Re }^{r}+\mathbf {\Re }^{c}\right) ^{-1}\left( {\textbf{g}}_{k}^{c}-{\textbf{g}}_{k}^{r}\right) . \end{aligned}$$

    (17)

  2. The fusion of the radar and camera measurement covariances, computed as

    $$\begin{aligned} \bar{\mathbf {\Re }}=\left[ \left( \mathbf {\Re }^{r}\right) ^{-1}+\left( \mathbf {\Re }^{c}\right) ^{-1}\right] ^{-1}. \end{aligned}$$

    (18)

With the fused measurement \(\bar{{\textbf{g}}}_{k}\) and fused measurement covariance \(\bar{\mathbf {\Re }}\), and denoting the identity matrix by \({\textbf{I}}\), the implementation flow of the fusion tracking approach is shown in Fig. 7.
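
As a minimal sketch of the fusion tracking loop (variable names follow Eqs. (15)-(18); the exact block order of Fig. 7 may differ), the fusion step and one Kalman filter cycle could look like this:

```python
import numpy as np

def fuse_measurements(g_r, g_c, R_r, R_c):
    """Eqs. (17)-(18): fuse radar and camera measurements and their covariances."""
    W = R_r @ np.linalg.inv(R_r + R_c)
    g_bar = g_r + W @ (g_c - g_r)
    R_bar = np.linalg.inv(np.linalg.inv(R_r) + np.linalg.inv(R_c))
    return g_bar, R_bar

def kf_step(u, P, g_bar, R_bar, A, Q):
    """One predict/update cycle of the fusion Kalman filter with C = I."""
    # Predict with the state model of Eq. (15).
    u_pred = A @ u
    P_pred = A @ P @ A.T + Q
    # Update with the fused measurement (state-to-measurement matrix C = I).
    G = P_pred @ np.linalg.inv(P_pred + R_bar)  # Kalman gain
    u_new = u_pred + G @ (g_bar - u_pred)
    P_new = (np.eye(len(u)) - G) @ P_pred
    return u_new, P_new
```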

DT generation approach for highway scenario

In a nutshell, the DT model generation approach is shown in Fig. 8. For the highway scenario, radar-camera pairs can be deployed at multiple sites along the road, and each pair is responsible for traffic flow sensing in a local area. At a single site, the radar and the camera acquire their respective sensory data. The radar obtains the 2D position and velocity of the target after signal processing, while the camera obtains the position of the target through image processing followed by an image-plane to UCS conversion. In the UCS, the detection data of the two sensors are fused and the target trajectory is obtained by the fusion Kalman filter shown in Fig. 7, which completes the generation of the local traffic flow DT.

When the traffic flow data of all sites are obtained, the traffic flow information of each site is converted to the WGS-84 system by coordinate conversion, and the DT model of the whole highway scenario can be generated.

Figure 8

End-to-end DT generation approach.

Here are some factors to consider in DT model generation:

  1. Besides detecting the 2D position of the target, the radar measures the target velocity more accurately. In practice, the radar velocity information can be output as needed.

  2. Besides detecting the 2D position of the target, the camera provides more accurate measurements of the width, height, and class of the target. This information can be output in practice as attached attributes on demand.

  3. After DT model generation, the target locations in the DT model need to be exported for practical applications. As with the coordinate system chosen for sensor fusion, the model output also requires a coordinate system selected according to the application. Presenting the DT model in the WGS-84 coordinate system is not always required; it depends on whether a large-scale scene needs to be modeled and whether the DT model needs to be fused with a map system. When fusion with a map system is required, the traffic target information needs to be transformed to WGS-84. When it is not required, the target information of a single-site model can be output directly in the LC coordinate system; for small-scale models, such as several intersections, transformation to the ENU coordinate system is sufficient, and for large-scale models, such as city level and above, transformation to ECEF is sufficient.
