Sellers.json

Would you like to find out who sells ad space through Google AdSense? Google publishes a sellers.json file that lists them. Most objects have the property "is_confidential": 1, but others include a domain name and other information. Be aware that the file is huge: more than 140 MB and growing, because it lists more than one million sellers. The command below downloads it:

curl https://storage.googleapis.com/adx-rtb-dictionaries/sellers.json -O

Here are several example commands that use the jq tool:

# let's find how many sellers are in the file
$ jq '.sellers | length' sellers.json
1302360

# one concrete seller_id
$ jq '.sellers[] | select(.seller_id == "pub-0000000381088596")'  < sellers.json
{
  "seller_id": "pub-0000000381088596",
  "is_confidential": 1,
  "seller_type": "PUBLISHER"
}

# all non-confidential sellers
$ jq '.sellers[] | select(.is_confidential != 1)'  < sellers.json
...removed...
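
The two filters above can also be turned into counts. The sketch below is only an illustration: it builds a tiny stand-in file (sample_sellers.json, with made-up seller IDs, names, and domains that are not taken from the real file) so that it runs without the 140 MB download. Against the real data, the same jq expressions should work with sellers.json as the input.

```shell
# Tiny stand-in for sellers.json; every entry here is hypothetical.
cat > sample_sellers.json <<'EOF'
{
  "sellers": [
    {"seller_id": "pub-0000000000000001", "is_confidential": 1, "seller_type": "PUBLISHER"},
    {"seller_id": "pub-0000000000000002", "seller_type": "PUBLISHER", "name": "Example One", "domain": "example.com"},
    {"seller_id": "pub-0000000000000003", "seller_type": "BOTH", "name": "Example Two", "domain": "example.tv"},
    {"seller_id": "pub-0000000000000004", "is_confidential": 1, "seller_type": "PUBLISHER"}
  ]
}
EOF

# Collect the matching objects into an array, then take its length.
confidential=$(jq '[.sellers[] | select(.is_confidential == 1)] | length' sample_sellers.json)

# Objects without an is_confidential key compare as null != 1, so they count here.
non_confidential=$(jq '[.sellers[] | select(.is_confidential != 1)] | length' sample_sellers.json)

echo "confidential: $confidential, non-confidential: $non_confidential"
```

On the sample file this prints "confidential: 2, non-confidential: 2".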

# all objects with a domain
$ jq '.sellers[] | select(.domain != null)'  < sellers.json | head -11
{
  "seller_id": "pub-0000074043221914",
  "seller_type": "PUBLISHER",
  "name": "Imago Informationstechnologie GmbH",
  "domain": "automobile.de"
}
...truncated...

# all objects with domain containing .tv
$ jq '.sellers[] | select(.domain != null) | select(.domain | contains(".tv"))'  < sellers.json
{
  "seller_id": "pub-0015244548563708",
  "seller_type": "PUBLISHER",
  "name": "Richard Foster",
  "domain": "streamfree.tv"
}
...truncated...
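
Note that contains(".tv") matches the string anywhere in the domain, so a domain such as news.tv.example.com would match too. If only domains under the .tv top-level domain are wanted, endswith is one possible alternative. The sketch below uses a small made-up file (sample_tv.json, with hypothetical IDs, names, and domains) rather than the real download:

```shell
# Tiny stand-in for sellers.json; every entry here is hypothetical.
cat > sample_tv.json <<'EOF'
{
  "sellers": [
    {"seller_id": "pub-0000000000000001", "is_confidential": 1, "seller_type": "PUBLISHER"},
    {"seller_id": "pub-0000000000000002", "seller_type": "PUBLISHER", "name": "Example One", "domain": "example.tv"},
    {"seller_id": "pub-0000000000000003", "seller_type": "PUBLISHER", "name": "Example Two", "domain": "news.tv.example.com"}
  ]
}
EOF

# Keep only domains that end in ".tv"; -r prints raw strings without quotes.
tv_domains=$(jq -r '.sellers[] | select(.domain != null) | select(.domain | endswith(".tv")) | .domain' sample_tv.json)

echo "$tv_domains"
```

On the sample file this prints only example.tv, while the contains(".tv") version would also return news.tv.example.com.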