SPL assists MongoDB: Find multiple latest by filter criteria

The data for a collection (named category_time) in the MongoDB database is as follows:

[
  {
    "_id": 1,
    "category": "FIRE",
    "time": "2024-05-11T07:11:00Z"
  },
  {
    "_id": 2,
    "category": "FIRE",
    "time": "2024-05-11T08:11:00Z"
  },
  {
    "_id": 3,
    "category": "FIRE",
    "time": "2024-05-11T09:11:00Z"
  },
  {
    "_id": 4,
    "category": "POLICE",
    "time": "2024-05-11T07:22:00Z"
  },
  {
    "_id": 5,
    "category": "POLICE",
    "time": "2024-05-11T08:22:00Z"
  },
  {
    "_id": 6,
    "category": "POLICE",
    "time": "2024-05-11T09:22:00Z"
  },
  {
    "_id": 7,
    "category": "AMBULANCE",
    "time": "2024-05-11T07:33:00Z"
  },
  {
    "_id": 8,
    "category": "AMBULANCE",
    "time": "2024-05-11T08:33:00Z"
  },
  {
    "_id": 9,
    "category": "AMBULANCE",
    "time": "2024-05-11T09:33:00Z"
  }
]

Now input two collect parameters, corresponding to the category collect and time collect, such as [“FIRE”, “AMBULANCE”] and [“2024-05-11T08:15:00Z”, “2024-05-11T09:00:00Z”].
Please combine the members of these two collects into multiple sets of parameters, that is, cross product. In this example, there are four sets, each consisting of a category and a time.
Then use 4 sets of parameters to traverse and search the document, each time finding the record with category equal to the parameter and time earlier than the parameter but closest.

[
  {
    "category": "FIRE",
    "time": "2024-05-11T08:15:00Z",
    "last_entry_on_or_before": {
      "_id": 2,
      "category": "FIRE",
      "time": "2024-05-11T08:11:00Z"
    }
  },
  {
    "category": "FIRE",
    "time": "2024-05-11T09:00:00Z",
    "last_entry_on_or_before": {
      "_id": 2,
      "category": "FIRE",
      "time": "2024-05-11T08:11:00Z"
    }
  },
  {
    "category": "AMBULANCE",
    "time": "2024-05-11T08:15:00Z",
    "last_entry_on_or_before": {
      "_id": 7,
      "category": "AMBULANCE",
      "time": "2024-05-11T07:33:00Z"
    }
  },
  {
    "category": "AMBULANCE",
    "time": "2024-05-11T09:00:00Z",
    "last_entry_on_or_before": {
      "_id": 8,
      "category": "AMBULANCE",
      "time": "2024-05-11T08:33:00Z"
    }
  }
]

The difficulty of this question lies in querying each set of parameters once and selecting the record with the closest time from each query result. I have tried many methods using MongoDB query, but the results are not correct. One nearly-correct way is as follows:

db.category_time.aggregate([
  {
    "$match": {
      "_id": {
        "$exists": false
      }
    }
  },
  {
    "$unionWith": {
      "coll": "collection",
      "pipeline": [
        {
          "$documents": [
            {
              "category": //your input category array here
              ["FIRE",
              "AMBULANCE"
            ],
            //your input time array here
            "time": [
              "2024-05-11T08:15:00Z",
              "2024-05-11T09:00:00Z"
            ]
          }
        ]
      },
      {
        "$unwind": "$category"
      },
      {
        "$unwind": "$time"
      }
    ]
  }
},
{
  "$lookup": {
    "from": "collection",
    "localField": "category",
    "foreignField": "category",
    "let": {
      ts: "$time"
    },
    "pipeline": [
      {
        "$match": {
          $expr: {
            $lte: [
              "$time",
              "$$ts"
            ]
          }
        }
      },
      {
        "$sort": {
          "time": -1
        }
      },
      {
        "$limit": 1
      }
    ],
    "as": "last_entry_on_or_before"
  }
},
{
  "$unwind": "$last_entry_on_or_before"
}
])

SPL provides the maxp function, which makes it easy to find the closest record in records with time less than the parameter:



A

1

=mongo_open@d("mongodb://127.0.0.1:27017/local")

2

=mongo_shell@d(A1, "{'find':'category_time'}")

3

=mongo_close(A1)

4

["FIRE", "AMBULANCE"]

5

["2024-05-11T08:15:00Z", "2024-05-11T09:00:00Z"]

6

>A5.run(~=datetime(~,"yyyy-MM-dd'T'HH🇲🇲ss'Z'"))

7

=A4.conj(A5.new(~:time,A4.~:category)).sort(category,time)

8

>A2.run('time'=datetime('time',"yyyy-MM-dd'T'HH🇲🇲ss'Z'"))

9

=A7.new(category,time,A2.select(category==A7.category && time<=A7.time).maxp(time):last_entry_on_or_before)

10

=json(A9)


Question source:https://stackoverflow.com/questions/78529669/mongodb-find-multiple-latest-by-filter-criteria