SPL assists MongoDB: Find multiple latest by filter criteria
The data for a collection (named category_time) in the MongoDB database is as follows:
[
{
"_id": 1,
"category": "FIRE",
"time": "2024-05-11T07:11:00Z"
},
{
"_id": 2,
"category": "FIRE",
"time": "2024-05-11T08:11:00Z"
},
{
"_id": 3,
"category": "FIRE",
"time": "2024-05-11T09:11:00Z"
},
{
"_id": 4,
"category": "POLICE",
"time": "2024-05-11T07:22:00Z"
},
{
"_id": 5,
"category": "POLICE",
"time": "2024-05-11T08:22:00Z"
},
{
"_id": 6,
"category": "POLICE",
"time": "2024-05-11T09:22:00Z"
},
{
"_id": 7,
"category": "AMBULANCE",
"time": "2024-05-11T07:33:00Z"
},
{
"_id": 8,
"category": "AMBULANCE",
"time": "2024-05-11T08:33:00Z"
},
{
"_id": 9,
"category": "AMBULANCE",
"time": "2024-05-11T09:33:00Z"
}
]
Now input two collect parameters, corresponding to the category collect and time collect, such as [“FIRE”, “AMBULANCE”] and [“2024-05-11T08:15:00Z”, “2024-05-11T09:00:00Z”].
Please combine the members of these two collects into multiple sets of parameters, that is, cross product. In this example, there are four sets, each consisting of a category and a time.
Then use 4 sets of parameters to traverse and search the document, each time finding the record with category equal to the parameter and time earlier than the parameter but closest.
[
{
"category": "FIRE",
"time": "2024-05-11T08:15:00Z",
"last_entry_on_or_before": {
"_id": 2,
"category": "FIRE",
"time": "2024-05-11T08:11:00Z"
}
},
{
"category": "FIRE",
"time": "2024-05-11T09:00:00Z",
"last_entry_on_or_before": {
"_id": 2,
"category": "FIRE",
"time": "2024-05-11T08:11:00Z"
}
},
{
"category": "AMBULANCE",
"time": "2024-05-11T08:15:00Z",
"last_entry_on_or_before": {
"_id": 7,
"category": "AMBULANCE",
"time": "2024-05-11T07:33:00Z"
}
},
{
"category": "AMBULANCE",
"time": "2024-05-11T09:00:00Z",
"last_entry_on_or_before": {
"_id": 8,
"category": "AMBULANCE",
"time": "2024-05-11T08:33:00Z"
}
}
]
The difficulty of this question lies in querying each set of parameters once and selecting the record with the closest time from each query result. I have tried many methods using MongoDB query, but the results are not correct. One nearly-correct way is as follows:
db.category_time.aggregate([
{
"$match": {
"_id": {
"$exists": false
}
}
},
{
"$unionWith": {
"coll": "collection",
"pipeline": [
{
"$documents": [
{
"category": //your input category array here
["FIRE",
"AMBULANCE"
],
//your input time array here
"time": [
"2024-05-11T08:15:00Z",
"2024-05-11T09:00:00Z"
]
}
]
},
{
"$unwind": "$category"
},
{
"$unwind": "$time"
}
]
}
},
{
"$lookup": {
"from": "collection",
"localField": "category",
"foreignField": "category",
"let": {
ts: "$time"
},
"pipeline": [
{
"$match": {
$expr: {
$lte: [
"$time",
"$$ts"
]
}
}
},
{
"$sort": {
"time": -1
}
},
{
"$limit": 1
}
],
"as": "last_entry_on_or_before"
}
},
{
"$unwind": "$last_entry_on_or_before"
}
])
SPL provides the maxp function, which makes it easy to find the closest record in records with time less than the parameter:
A |
|
1 |
=mongo_open@d("mongodb://127.0.0.1:27017/local") |
2 |
=mongo_shell@d(A1, "{'find':'category_time'}") |
3 |
=mongo_close(A1) |
4 |
["FIRE", "AMBULANCE"] |
5 |
["2024-05-11T08:15:00Z", "2024-05-11T09:00:00Z"] |
6 |
>A5.run(~=datetime(~,"yyyy-MM-dd'T'HH🇲🇲ss'Z'")) |
7 |
=A4.conj(A5.new(~:time,A4.~:category)).sort(category,time) |
8 |
>A2.run('time'=datetime('time',"yyyy-MM-dd'T'HH🇲🇲ss'Z'")) |
9 |
=A7.new(category,time,A2.select(category==A7.category && time<=A7.time).maxp(time):last_entry_on_or_before) |
10 |
=json(A9) |
SPL Official Website 👉 https://www.esproc.com
SPL Feedback and Help 👉 https://www.reddit.com/r/esProcSPL
SPL Learning Material 👉 https://c.esproc.com
SPL Source Code and Package 👉 https://github.com/SPLWare/esProc
Discord 👉 https://discord.gg/2bkGwqTj
Youtube 👉 https://www.youtube.com/@esProc_SPL
中文版:https://c.raqsoft.com.cn/article/1741228571163