The Content Search endpoint allows you to quickly and easily search for content from any of your enabled content sources.

With it, you can:

  • Search for content by query string
  • Retrieve content published within a given date range
  • Search for content by publisher or content type
  • Search for content mentioning a specific security, keyword, or topic

📘

Prerequisites

Before calling the content APIs, ensure that

Examples

Date Range for filtering content or scrolling through content

To query the content API for a specific date range, call the endpoint with fromdate and todate date strings in the format YYYY-mm-ddTHH:MM:SS, as demonstrated here:

{
    "maxresults": 1,
    "fromdate": "2023-08-12T01:01:01",
    "todate": "2023-08-15T01:01:01"
}

{
  "resultsfound": 2032,
  "resultsreturned": 1,
  "results": [
    {
      "fxcid": "4407e4f460c243b1a951016a1bf363c8",
      ...,
      "datetimepublished": "2023-08-02T08:27:15+00:00",
      "components": [
        {
          "role": "story",
          ...,
          "content": "Piper Sandler  analyst Brent Bracelin   maintains Freshworks (NASDAQ:<a class=\"ticker\" href=\"https://www.benzinga.com/stock/FRSH#NASDAQ\">FRSH</a>) with a Overweight and raises the price target from $22 to $27."
        }
      ]
    }
  ],
  "timetaken": 5,
  "searchinfo": 0
}

The above shows just a snippet of the response. The full anatomy of the content response objects is given in Anatomy of a Content Object below.
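As a rough sketch, a date-range request body like the one above could be built from Python datetime objects as follows. The helper name date_range_body is hypothetical and not part of the API; only the field names and the date format come from the example above.

```python
from datetime import datetime

# strftime pattern matching the documented YYYY-mm-ddTHH:MM:SS format.
DATE_FORMAT = "%Y-%m-%dT%H:%M:%S"


def date_range_body(start: datetime, end: datetime, maxresults: int = 1) -> dict:
    """Build a request body limited to content published between start and end.

    Hypothetical helper: only the keys and date format are taken from the docs.
    """
    if start > end:
        raise ValueError("start must not be later than end")
    return {
        "maxresults": maxresults,
        "fromdate": start.strftime(DATE_FORMAT),
        "todate": end.strftime(DATE_FORMAT),
    }


# Reproduces the request body shown in the example above.
body = date_range_body(datetime(2023, 8, 12, 1, 1, 1), datetime(2023, 8, 15, 1, 1, 1))
```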

💡

Paging through results

  • We can utilize the date range arguments to page through our results.
  • To achieve this, take the last item from the results list returned and retrieve the datetimepublished value (shown above)
  • Then feed this back into the todate argument of the subsequent API call.
  • Repeat as necessary until you have paged through all the results.
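The paging steps above can be sketched as a small generator. This is an illustration under assumptions, not an official client: the search argument is a hypothetical callable that sends a request body to the endpoint and returns the decoded response, results are assumed to arrive newest first, and todate is assumed to possibly be inclusive, so items are de-duplicated by fxcid. The datetimepublished value is fed back unchanged, as described above; whether the endpoint accepts the UTC offset in that value is also an assumption.

```python
from typing import Callable, Iterator


def iter_results(search: Callable[[dict], dict], body: dict,
                 max_pages: int = 100) -> Iterator[dict]:
    """Yield content items page by page, moving todate back after each page."""
    request = dict(body)  # copy so the caller's body is not mutated
    seen = set()
    for _ in range(max_pages):
        results = search(request).get("results", [])
        fresh = [item for item in results if item["fxcid"] not in seen]
        if not fresh:
            break  # nothing new: no more results, or the page did not advance
        for item in fresh:
            seen.add(item["fxcid"])
            yield item
        # Feed the last item's datetimepublished back in as the next todate.
        request["todate"] = results[-1]["datetimepublished"]


# Stub standing in for the real HTTP call, so the sketch runs offline.
_ITEMS = [
    {"fxcid": f"id{i}", "datetimepublished": f"2023-08-{15 - i:02d}T00:00:00+00:00"}
    for i in range(5)
]


def fake_search(request: dict) -> dict:
    eligible = [it for it in _ITEMS if it["datetimepublished"] <= request["todate"]]
    return {"results": eligible[: request["maxresults"]]}


paged_ids = [
    item["fxcid"]
    for item in iter_results(
        fake_search, {"maxresults": 2, "todate": "2023-08-16T00:00:00+00:00"}
    )
]
```

If todate is inclusive, a maxresults of 1 would return the same item repeatedly, so this sketch stops as soon as a page contains nothing new; a page size larger than the number of items sharing one timestamp avoids that stall.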

Filtering content by text match

The simplest and quickest use case for the endpoint is to search for content referencing a particular query string. Below, we search for content mentioning "Tesla" that was published within a defined date range:

{
    "apikey": "<access_token>",
    "contentquery": "Tesla",
    "maxresults": 2,
    "fromdate": "2023-08-12",
    "todate": "2023-08-15"
}

{
  "resultsfound": 171,
  "resultsreturned": 2,
  "results": [
    {
      "fxcid": "5f3f1961e9234129a299a9a03ef0db71",
      "publisherid": "6f956d5af2d94bc095081c5916a13df6",
      "contenttypeid": "01a65965e78d44ddbc508729f7c22b63",
      "publishercontentid": "1657884957941944320",
      "version": "v1",
      "type": "Text",
      "datetimepublished": "2023-05-14T23:05:50+00:00",
      "unixdatetimepublished": 1684105550,
      "datetimecreated": "2023-05-14T23:05:50+00:00",
      "unixdatetimecreated": 1684105550,
      "datetimeupdated": null,
      "unixdatetimeupdated": null,
      "entities": [],
      "topics": [],
      "components": [
        {
          "role": "Tweet",
          "contentmetadata": {
            "language": "en",
            "slugline": null,
            "headline": null,
            "description": null,
            "keywords": [
              {
                "value": "sspencer_smb",
                "type": "User"
              },
              {
                "value": "WholeMarsBlog",
                "type": "User"
              },
              {
                "value": "twitter.com/WholeMarsBlog/…",
                "type": "url"
              }
            ],
            "authors": [
              {
                "name": "Steven Spencer",
                "username": "sspencer_smb",
                "role": "TwitterUser",
                "profileImageurl": "https://pbs.twimg.com/profile_images/897435372044648449/2v0EJLku_normal.jpg"
              }
            ],
            "securities": []
          },
          "content": "\"The strategy of getting an ordinary Tesla to drive itself with just computer vision. It wasn't so crazy after all.\" $TSLA https://t.co/Z5OpLK4z7R"
        }
      ],
      "images": [],
      "timestampprocessed": "2023-05-14T23:05:56.9817902+00:00",
      "unixtimestampprocessed": 1684105556
    }, ...
  ],
  "timetaken": 323,
  "searchinfo": 0    
}
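Once a response like the one above has been decoded, the displayable text sits in each item's components. A minimal sketch of pulling it out, assuming the response has already been parsed into a Python dict (the function name content_texts is hypothetical):

```python
def content_texts(response: dict) -> list:
    """Collect (datetimepublished, content) pairs from a search response."""
    pairs = []
    for item in response.get("results", []):
        published = item.get("datetimepublished")
        for component in item.get("components", []):
            # components.content holds the text or HTML body of the item.
            pairs.append((published, component.get("content", "")))
    return pairs


# Trimmed-down response in the shape shown above.
sample_response = {
    "resultsfound": 171,
    "resultsreturned": 1,
    "results": [
        {
            "fxcid": "5f3f1961e9234129a299a9a03ef0db71",
            "datetimepublished": "2023-05-14T23:05:50+00:00",
            "components": [{"role": "Tweet", "content": "example body $TSLA"}],
        }
    ],
}
texts = content_texts(sample_response)
```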

Retrieve content tagged with a specific security

To return just content tagged by a specific security value, you can add that value to the securitynames or securityidentifiers fields, like so:

{
    "apikey": "<access_token>",
    "securitynames": ["AAPL"],
    "securityidentifiers": ["AAPL"],
    "maxresults": 2,
    "fromdate": "2023-08-12",
    "todate": "2023-08-15"
}

⚠️

Note

Only content sources that contain the “securities” field in their response will be returned when filtering by security.

Filter by a specific publisher (data source)

To retrieve content from one or more specific publishers (data sources) we should utilize the publisherids field.

To do this, first retrieve the available publishers using the List Publishers endpoint, then pass their IDs in the publisherids field of the content search endpoint.

💡

Getting a list of publishers

The List Publishers endpoint returns a list of publishers and content types that are enabled for your API key. The publisherid field contains the ID that can be used to filter content by publisher.

curl --location 'https://personafinai.azurewebsites.net/v1/content/publishers/list/enabled' \
--header 'Content-Type: application/json' \
--data '{
    "apikey": "<access_token>"
}'

[
    {
        "publisher": "Benzinga Inc",
        "publisherid": "6299a0723d954934980299568664f74a",
        "owner": "Benzinga Inc",
        "contenttype": "Story",
        "datasetname": "Benzinga News",
        "contenttypeid": "daa7632ac81a4cb5a96f4bcee228e527"
    },
    {
        "publisher": "Benzinga Inc",
        "publisherid": "6299a0723d954934980299568664f74a",
        "owner": "Benzinga Inc",
        "contenttype": "Story",
        "datasetname": "Why is it Moving",
        "contenttypeid": "036df6e258e1427aa41f79945f8b7a16"
    }
]
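For reference, the curl call above could be issued from Python roughly as follows, using only the standard library. The URL and request body are taken from the curl example; the function names and the timeout value are illustrative assumptions.

```python
import json
import urllib.request

# URL taken from the curl example above.
PUBLISHERS_URL = "https://personafinai.azurewebsites.net/v1/content/publishers/list/enabled"


def build_publishers_request(apikey: str) -> urllib.request.Request:
    """Build the POST request that lists publishers enabled for an API key."""
    payload = json.dumps({"apikey": apikey}).encode("utf-8")
    return urllib.request.Request(
        PUBLISHERS_URL,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def list_enabled_publishers(apikey: str, timeout: float = 30.0) -> list:
    """Send the request and decode the JSON array of publisher rows."""
    request = build_publishers_request(apikey)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


# Building the request does not touch the network; sending it does.
request = build_publishers_request("<access_token>")
```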

Once you have the ID of the publisher you wish to target, you can filter the results. For example, the following request body, passed to the content search endpoint, will return the latest 10 news items, filtered to the Twitter data source/publisher:

{
  "apikey": "<access_token>",
  "publisherids": [
    "6f956d5af2d94bc095081c5916a13df6"
  ]
}
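Because the publisher list has one row per content type, the same publisherid can appear several times. A small sketch of looking up the distinct IDs for a publisher name and building the request body, assuming the list has already been decoded into Python dicts (both function names are hypothetical):

```python
def publisher_ids_for(publishers: list, name: str) -> list:
    """Return the distinct publisherid values for rows whose publisher matches name."""
    ids = []
    for row in publishers:
        if row["publisher"].lower() == name.lower() and row["publisherid"] not in ids:
            ids.append(row["publisherid"])
    return ids


def publisher_filter_body(apikey: str, publisher_ids: list) -> dict:
    """Build a content search body restricted to the given publishers."""
    return {"apikey": apikey, "publisherids": list(publisher_ids)}


# Rows copied from the List Publishers response shown earlier.
rows = [
    {"publisher": "Benzinga Inc", "publisherid": "6299a0723d954934980299568664f74a",
     "datasetname": "Benzinga News", "contenttypeid": "daa7632ac81a4cb5a96f4bcee228e527"},
    {"publisher": "Benzinga Inc", "publisherid": "6299a0723d954934980299568664f74a",
     "datasetname": "Why is it Moving", "contenttypeid": "036df6e258e1427aa41f79945f8b7a16"},
]
benzinga_ids = publisher_ids_for(rows, "Benzinga Inc")
benzinga_body = publisher_filter_body("<access_token>", benzinga_ids)
```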

Filter by one or more content types

You may want to go one step further than filtering down to a particular publisher, and instead filter down to one or more content sets (types) provided by that publisher. For example, you might want to populate a Benzinga "Why Is It Moving" feed that excludes other Benzinga content sets and content from other publishers.

To do this, first retrieve the available content types using the List Publishers endpoint, then pass their IDs in the contenttypeids field of the content search endpoint.

💡

Getting a list of content types

The List Publishers endpoint returns a list of publishers and content types that are enabled for your API key. The contenttypeid field contains the ID that can be used to filter content by content type.

curl --location 'https://personafinai.azurewebsites.net/v1/content/publishers/list/enabled' \
--header 'Content-Type: application/json' \
--data '{
    "apikey": "<access_token>"
}'

[
    {
        "publisher": "Benzinga Inc",
        "publisherid": "6299a0723d954934980299568664f74a",
        "owner": "Benzinga Inc",
        "contenttype": "Story",
        "datasetname": "Benzinga News",
        "contenttypeid": "daa7632ac81a4cb5a96f4bcee228e527"
    },
    {
        "publisher": "Benzinga Inc",
        "publisherid": "6299a0723d954934980299568664f74a",
        "owner": "Benzinga Inc",
        "contenttype": "Story",
        "datasetname": "Why is it Moving",
        "contenttypeid": "036df6e258e1427aa41f79945f8b7a16"
    }
]

Once you've identified the content type IDs you wish to filter down to, include them in the contenttypeids field to retrieve the content of interest. For example, the request below returns just the Benzinga "Why is it Moving" content set.

{
  "apikey": "<access_token>",
  "contenttypeids": [
    "036df6e258e1427aa41f79945f8b7a16"
  ]
}
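Picking content type IDs by hand is easy to get wrong, since the IDs are opaque. As a sketch, they could instead be looked up by datasetname from the decoded List Publishers response (the function name and the case-insensitive matching are assumptions made for illustration):

```python
def content_type_ids_for(publishers: list, dataset_names: list) -> list:
    """Return the contenttypeid of every row whose datasetname is in dataset_names."""
    wanted = {name.lower() for name in dataset_names}
    return [
        row["contenttypeid"]
        for row in publishers
        if row["datasetname"].lower() in wanted
    ]


# Rows copied from the List Publishers response shown above.
publisher_rows = [
    {"publisher": "Benzinga Inc", "datasetname": "Benzinga News",
     "contenttypeid": "daa7632ac81a4cb5a96f4bcee228e527"},
    {"publisher": "Benzinga Inc", "datasetname": "Why is it Moving",
     "contenttypeid": "036df6e258e1427aa41f79945f8b7a16"},
]
wiim_ids = content_type_ids_for(publisher_rows, ["Why Is It Moving"])
wiim_body = {"apikey": "<access_token>", "contenttypeids": wiim_ids}
```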

Search for a specific piece of content by publisher content id

Often we will want to retrieve a specific piece of content to display on a frontend. For example, if we retrieve a list of popular content items using the Popular Content API, we can pass that list of content IDs into the publishercontentid argument to retrieve the full body of those articles.

{
    "apikey": "<access_token>",
    "publishercontentid": ["33084401", "33089399", "33091769"],
    "maxresults": 2
}
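When the IDs come from a ranked list such as popular content, a frontend will usually want the returned items in that same order. The sketch below assumes the response order is not guaranteed to match the request order, which the source does not state; the function name is hypothetical.

```python
def order_by_requested_ids(results: list, requested_ids: list) -> list:
    """Reorder returned items to follow the order of the requested IDs.

    IDs with no matching item in the response are skipped.
    """
    by_id = {item["publishercontentid"]: item for item in results}
    return [by_id[cid] for cid in requested_ids if cid in by_id]


requested = ["33084401", "33089399", "33091769"]
# Stand-in for the "results" list of a decoded response.
returned = [{"publishercontentid": "33089399"}, {"publishercontentid": "33084401"}]
ordered = order_by_requested_ids(returned, requested)
```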

Anatomy of a Content Object

The content objects returned by the endpoint can have varying components depending on the source (publisher) of the content and the structure of the content on ingestion. However, they all share certain common elements.

Response Object

The API itself will respond with the following major elements:

| Field | Data type | Description |
| --- | --- | --- |
| resultsfound | int | Number of content items found given filter arguments |
| results | list | List of content item objects returned by the API. See Anatomy of a Content Object for details on content item objects |
| timetaken | int | Time for the API to generate response, in milliseconds |

Common elements - Content Object

All content items returned by the content API contain the following common elements:

| Field | Data type | Description |
| --- | --- | --- |
| fxcid | guid | Internal GUID assigned to the piece of content |
| publisherid | guid | Internal GUID assigned to the publisher (data source) of the content |
| publishercontentid | string | ID assigned to the content item by the publisher |
| version | string | |
| type | string | Type of content this includes. Currently supports: “Text” |
| datetimepublished | datetime | Datetime that the content was published. In the format YYYY-mm-ddTHH:MM:SS |
| unixdatetimepublished | int | Datetime that the content was published. Unix epoch value. |
| datetimecreated | datetime | Datetime that the content was added to the system. In the format YYYY-mm-ddTHH:MM:SS |
| unixdatetimecreated | int | Datetime that the content was added to the system. Unix epoch value. |
| datetimeupdated | datetime | Datetime that the content was updated in the system. In the format YYYY-mm-ddTHH:MM:SS |
| unixdatetimeupdated | int | Datetime that the content was updated in the system. Unix epoch value. |
| entities | list | Financial entities that the system has associated with the content. Not currently supported |
| topics | list | Financial topics that the system has associated with the content. Not currently supported |
| components | list | Contains details of the content returned itself |
| components.role | string | Object category that can be used to distinguish between content types. Examples include: “Tweet”, “story” |
| components.contentmetadata | list | Metadata that was included in the content object on ingestion. These values can differ depending on content source. For notable elements see Content MetaData - Notable values below |
| components.content | string | The text body or HTML form of the returned content. Key field for displaying content |
| images | list | Images included in the content |
| timestampprocessed | datetime | |
| unixtimestampprocessed | int | |

Content MetaData - Notable values

The content object returned holds the original object payload/details from the publisher (data source) in the components.contentmetadata element of the content object.

These elements can differ and are largely dependent on data source.

Some notable values contained in the MetaData include:

| Field | Data type | Description |
| --- | --- | --- |
| language | string | Language code for the content |
| headline | string | The headline of the article |
| keywords | list | List of keyword value - type objects contained in the content |
| authors | list | List of author objects including: name (str), username (str), role (str), profileImageurl (url). Note: for tweets, this can be used to extract the author profile image |
| securities | list | A list of security symbols tagged in the content by the source. Includes the following information: name (str), the name of the security or the identifier; identifier (str), the symbol tagged in the content; identifiertype (str), the type of identifier (e.g. Symbol); exchange (str), the symbol exchange (if known) |
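To illustrate how these metadata values might be read, here is a minimal sketch that pulls author profile images and tagged security identifiers out of a single content item. It assumes the item is a decoded dict shaped like the examples earlier on this page, and it treats missing or null metadata as empty; the function names are hypothetical.

```python
def author_profile_images(item: dict) -> list:
    """Return the profileImageurl of every author found in the item's components."""
    urls = []
    for component in item.get("components", []):
        metadata = component.get("contentmetadata") or {}
        for author in metadata.get("authors") or []:
            url = author.get("profileImageurl")
            if url:
                urls.append(url)
    return urls


def tagged_identifiers(item: dict) -> list:
    """Return the distinct security identifiers tagged in the item, in order seen."""
    identifiers = []
    for component in item.get("components", []):
        metadata = component.get("contentmetadata") or {}
        for security in metadata.get("securities") or []:
            identifier = security.get("identifier")
            if identifier and identifier not in identifiers:
                identifiers.append(identifier)
    return identifiers


# Illustrative item; the security entry and image URL are made up for the example.
example_item = {
    "components": [
        {
            "role": "Tweet",
            "contentmetadata": {
                "authors": [{"name": "Example Author", "username": "example",
                             "role": "TwitterUser",
                             "profileImageurl": "https://example.com/avatar.jpg"}],
                "securities": [{"name": "TSLA", "identifier": "TSLA",
                                "identifiertype": "Symbol", "exchange": None}],
            },
        }
    ]
}
images = author_profile_images(example_item)
symbols = tagged_identifiers(example_item)
```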